library(AquaSensR)AquaSensR requires two input files to use the functions in the package:
- Continuous monitoring data: time series of sensor observations at a site, one column per parameter.
- Data Quality Objectives: parameter-specific data quality objectives used by the four QC checks (gross range, spike, rate of change, and flatline).
Both file types are Excel workbooks (.xlsx). This vignette describes how to import and check each input dataset. It is critical that the input datasets follow the exact specified format. Example files with the correct format are included with the package and are used throughout.
Load the package
Load the package in an R session after installation:
File paths
First, specify the location of the two files by saving their paths to R variables. In practice you will supply paths to your own files, for example:
contpth <- "path/to/your/ContinuousData.xlsx"
dqopth <- "path/to/your/DQO.xlsx"The examples below use the files included with the package:
contpth <- system.file("extdata/ExampleCont1.xlsx", package = "AquaSensR")
dqopth <- system.file("extdata/ExampleDQO.xlsx", package = "AquaSensR")Continuous monitoring data
Use readASRcont() to import continuous monitoring data. The function reads the Excel file, automatically runs a series of checks via checkASRcont(), and then formats the result for downstream use. The tz argument sets the time zone for the output DateTime column (see OlsonNames() for valid values). Be careful to specify the correct time zone, particularly one using a fixed offset (e.g., Etc/GMT+5 for Eastern time) to avoid issues with daylight saving time transitions. If your data are in local time and the time zone observes DST, consider using a time zone like America/New_York that will automatically adjust for daylight savings.
AquaSensR accepts two input formats for the date and time information. The examples below demonstrate both.
Format 1 — separate Date and Time columns (ExampleCont1.xlsx):
contdat <- readASRcont(contpth, tz = "Etc/GMT+5")
#> Running checks on continuous data...
#> Checking column names... OK
#> Checking Date, Time are present... OK
#> Checking at least one parameter column is present... OK
#> Checking date format... OK
#> Checking time format... OK
#> Checking for missing values... OK
#> Checking parameter columns for non-numeric values... OK
#>
#> All checks passed!Format 2 — combined DateTime column (ExampleCont2.xlsx):
contpth2 <- system.file("extdata/ExampleCont2.xlsx", package = "AquaSensR")
contdat2 <- readASRcont(contpth2, tz = "Etc/GMT+5")
#> Running checks on continuous data...
#> Checking column names... OK
#> Checking DateTime is present... OK
#> Checking at least one parameter column is present... OK
#> Checking DateTime format... OK
#> Checking for missing values... OK
#> Checking parameter columns for non-numeric values... OK
#>
#> All checks passed!Both calls return identically structured output (see Output format below).
Format requirements
The continuous monitoring data workbook must follow one of two accepted schemas. Additional unrecognised columns will trigger an error.
Format 1: separate Date and Time columns
| Column | Description |
|---|---|
Date |
Observation date, parseable by lubridate::ymd() (e.g., 2024-06-01) |
Time |
Observation time in 24-hour (e.g., 16:30:33), 12-hour AM/PM (e.g., 4:30:33 PM), or Excel-native format (e.g., 1899-12-31 16:30:33) |
| At least one parameter column | Column name must match a Parameter entry in paramsASR (e.g., Water Temp_C) |
Format 2: combined DateTime column
| Column | Description |
|---|---|
DateTime |
Combined date and time in 24-hour (e.g., 2024-06-01 16:30:33) or 12-hour AM/PM (e.g., 2024-06-01 4:30:33 PM) format |
| At least one parameter column | Column name must match a Parameter entry in paramsASR (e.g., Water Temp_C) |
Currently, AquaSensR allows the following parameters. Note the inclusion of the units in the parameter name. Make sure the parameter name matches the units used in your data.
| Description | Required file name | Units |
|---|---|---|
| Air Temp (C) | Air Temp_C | deg C |
| Air Temp (F) | Air Temp_F | deg F |
| Air BP (psi) | Air BP_psi | psi |
| Air BP (mmHg) | Air BP_mmHg | mmHg |
| Chlorophyll-a (μg/l) | Chlorophylla_ug_l | ug/l |
| Chlorophyll-a (RFU) | Chlorophylla_RFU | RFU |
| Pheophytin (μg/l) | Pheophytin_ug_l | ug/l |
| Pheophytin (RFU) | Pheophytin_RFU | RFU |
| pCO2 (ppm) | pCO2_ppm | ppm |
| Conductivity (μS/cm) | Conductivity_uS_cm | uS/cm |
| Salinity (ppt) | Salinity_ppt | ppt |
| Specific Conductance (μS/cm) | Sp Conductance_uS_cm | uS/cm |
| Cyanobacteria (μg/l) | Cyanobacteria_ug_l | ug/l |
| Phycocyanin (μg/l) | Phycocyanin_ug_l | ug/l |
| Phycoerythrin (μg/l) | Phycoerythrin_ug_l | ug/l |
| DO (mg/l) | DO_mg_l | mg/l |
| DO Adjusted (mg/l) | DO_adj_mg_l | mg/l |
| DO (% Sat) | DO_pctsat | % |
| CDOM (mg/l) | CDOM_mg_l | mg/l |
| FDOM (mg/l) | FDOM_mg_l | mg/l |
| E. coli (#/100ml) | E. coli_#_100ml | #/100ml |
| E. coli (CFU/100ml) | E. coli_CFU_100ml | CFU/100ml |
| Discharge (cfs) | Discharge_cfs | cfs |
| Nitrate (μg/l) | Nitrate_ug_l | ug/l |
| PAR (μmol/m2/s) | PAR_umol_m2_s | umol/m2/s |
| pH | pH_SU | None |
| TDS (mg/l) | TDS_mg_l | mg/l |
| TSS (mg/l) | TSS_mg_l | mg/l |
| Turbidity (NTU) | Turbidity_NTU | NTU |
| Turbidity (FNU) | Turbidity_FNU | FNU |
| Gage Height (ft) | Gage Height_ft | ft |
| Sensor Depth (ft) | Sensor Depth_ft | ft |
| Water Pressure (psi) | Water Pressure_psi | psi |
| Water Pressure (mmHg) | Water Pressure_mmHg | mmHg |
| Water Temp (C) | Water Temp_C | deg C |
| Water Temp (F) | Water Temp_F | deg F |
The list above can also be viewed in R with the paramsASR dataset, which is included in the package and used for the checks.
paramsASR
#> # A tibble: 36 × 6
#> `Parameter Group` Parameter uom Label `WQX Parameter` `WQX Unit of measure`
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Air Temp Air Temp… deg C Air … Temperature, a… deg C
#> 2 Air Temp Air Temp… deg F Air … Temperature, a… deg F
#> 3 Barometric Press… Air BP_p… psi Air … Barometric pre… psi
#> 4 Barometric Press… Air BP_m… mmHg Air … Barometric pre… mmHg
#> 5 Chlorophyll Chloroph… ug/l Chlo… Chlorophyll a … ug/l
#> 6 Chlorophyll Chloroph… RFU Chlo… Chlorophyll a … RFU
#> 7 Chlorophyll Pheophyt… ug/l Pheo… Pheophytin a ug/l
#> 8 Chlorophyll Pheophyt… RFU Pheo… Pheophytin a RFU
#> 9 CO2 pCO2_ppm ppm pCO2… Partial Pressu… ppm
#> 10 Conductivity Conducti… uS/cm Cond… Conductivity uS/cm
#> # ℹ 26 more rowsChecks performed
The readASRcont() function imports the data and runs a series of checks using the checkASRcont() function. The checks evaluate the following and stops with an informative error if any check fails:
-
Column names: all columns are either
Date,Time,DateTime, or a recognised parameter fromparamsASR. -
Required columns present: either
DateandTime(Format 1) orDateTime(Format 2). -
At least one parameter column: at least one column matches an entry in
paramsASR$Parameter. -
Date format (Format 1 only): all values in
Dateparse successfully withlubridate::ymd(). -
Time format (Format 1 only): all values in
Timeare parseable bylubridate::parse_date_time()in 24-hour, 12-hour AM/PM, or Excel-native formats. -
DateTime format (Format 2 only): all values in
DateTimeare parseable bylubridate::parse_date_time()in 24-hour or 12-hour AM/PM formats. -
No missing values: no
NAin any column. - Numeric parameter columns: all parameter columns contain numeric values.
Example: triggering an error
Adding an unrecognised column causes checkASRcont() to stop immediately. The following examples demonstrate this for both formats.
nms <- names(readxl::read_excel(contpth, n_max = 0))
col_types <- ifelse(nms %in% c("Date", "Time", "DateTime"), "text", "guess")
contdat_raw <- suppressWarnings(
readxl::read_excel(
contpth,
col_types = col_types,
na = c("NA", "na", ""),
guess_max = Inf
)
)
contdat_raw$BadColumn <- 1
checkASRcont(contdat_raw)
#> Running checks on continuous data...
#> Error:
#> ! Checking column names...
#> Please correct the column names or remove: BadColumnOutput format
After passing all checks, readASRcont() returns a data frame with the same structure regardless of input format:
-
DateTime: time-zone-awarePOSIXctcolumn - One numeric column per parameter present in the input file
head(contdat)
#> # A tibble: 6 × 8
#> DateTime `Water Temp_C` DO_pctsat DO_mg_l Conductivity_uS_cm
#> <dttm> <dbl> <dbl> <dbl> <dbl>
#> 1 2024-08-14 13:56:33 24.2 76.9 6.44 410.
#> 2 2024-08-14 13:56:43 24.2 76.7 6.43 410.
#> 3 2024-08-14 13:56:53 24.2 76.6 6.42 410.
#> 4 2024-08-14 13:57:03 24.2 76.5 6.41 410.
#> 5 2024-08-14 13:57:13 24.2 76.3 6.4 409
#> 6 2024-08-14 13:57:23 24.2 76.3 6.39 409.
#> # ℹ 3 more variables: TDS_mg_l <dbl>, Salinity_ppt <dbl>, pH_SU <dbl>
head(contdat2)
#> # A tibble: 6 × 8
#> DateTime `Water Temp_C` DO_pctsat DO_mg_l Conductivity_uS_cm
#> <dttm> <dbl> <dbl> <dbl> <dbl>
#> 1 2024-08-14 13:56:33 24.2 76.9 6.44 410.
#> 2 2024-08-14 13:56:43 24.2 76.7 6.43 410.
#> 3 2024-08-14 13:56:53 24.2 76.6 6.42 410.
#> 4 2024-08-14 13:57:03 24.2 76.5 6.41 410.
#> 5 2024-08-14 13:57:13 24.2 76.3 6.4 409
#> 6 2024-08-14 13:57:23 24.2 76.3 6.39 409.
#> # ℹ 3 more variables: TDS_mg_l <dbl>, Salinity_ppt <dbl>, pH_SU <dbl>Data quality objectives
The data quality objectives file includes various information for the quality control checks applied to each parameter (see the quality control vignette for details). Use readASRdqo() to import the data quality objectives. The function reads the workbook, runs checks via checkASRdqo(), and returns a formatted data frame.
dqodat <- readASRdqo(dqopth)
#> Running checks on data quality objectives...
#> Checking column names... OK
#> Checking all columns present... OK
#> Checking at least one parameter is present... OK
#> Checking parameter format... OK
#> Checking Flag column... OK
#> Checking Rate of Change flags... OK
#> Checking columns for non-numeric values... OK
#>
#> All checks passed!Format requirements
The workbook must contain exactly the following columns (all required; thresholds you do not want to apply should be left blank / NA):
| Column | Description |
|---|---|
Parameter |
Parameter name matching paramsASR$Parameter
|
Flag |
Flag level for the thresholds in the row, either “Fail” or “Suspect” |
GrMin |
Gross range, lower threshold |
GrMax |
Gross range, upper threshold |
Spike |
Spike, absolute step size for a flag |
FlatN |
Flatline, consecutive identical observations for a flag |
FlatDelta |
Flatline, maximum within-run absolute change treated as “identical” for a flag |
RoCN |
Rate of change, multiplier applied to the rolling SD (flag if \|diff\| > SD × RoCN) |
RoCHours |
Rate of change, look-back window length in hours |
Checks performed
The readASRdqo() function imports the data quality objectives and runs a series of checks using the checkASRdqo() function. The checks evaluate the following and stops with an informative error if any check fails:
- Column names: Should include only Parameter, Flag, GrMin, GrMax, Spike, FlatN, FlatDelta, RoCN, and RoCHours
- All columns present: All columns from the previous check should be present
-
At least one parameter is present: At least one parameter in the
Parametercolumn matches theParametercolumn inparamsASR -
Parameter format: All parameters listed in the
Parametercolumn should match those in theParametercolumn inparamsASR -
Flag column: The
Flagcolumn should contain only “Fail” or “Suspect” entries - Rate of Change: No entries for Fail Flag rows
-
Numeric columns: All columns except
ParameterandFlagshould be numeric values
Example: triggering an error
Supplying an unrecognised parameter name fails the parameter format check:
# import the data for the example
dqodat_raw <- suppressWarnings(
readxl::read_excel(dqopth, na = c("NA", "na", ""), guess_max = Inf)
)
# introduce a typo in the Parameter column
dqodat_raw$Parameter[1] <- "WaterTemp"
checkASRdqo(dqodat_raw)
#> Running checks on data quality objectives...
#> Checking column names... OK
#> Checking all columns present... OK
#> Checking at least one parameter is present... OK
#> Error:
#> ! Checking parameter format...
#> Incorrect parameter format: WaterTempOutput format
After passing all checks, readASRdqo() returns a data frame with the columns listed in the format requirements table above, with all threshold columns coerced to numeric.
head(dqodat)
#> # A tibble: 6 × 9
#> Parameter Flag GrMin GrMax Spike FlatN FlatDelta RoCN RoCHours
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Water Temp_C Suspect -0.5 28 1.5 60 0.01 6 25
#> 2 Water Temp_C Fail -1 30 2 100 0.01 NA NA
#> 3 DO_pctsat Suspect 0 100 10 30 0.01 6 25
#> 4 DO_pctsat Fail -1 120 25 60 0.01 NA NA
#> 5 DO_mg_l Suspect 2 16 2 30 0.01 6 25
#> 6 DO_mg_l Fail 1 18 4 60 0.01 NA NAThe remaining functions in AquaSensR can now be used after the continuous data and data quality objectives files are successfully imported.