Check data quality objective accuracy data
Value
accdat is returned as is if no errors are found; otherwise, an informative error message is returned prompting the user to make the required correction to the raw data before proceeding.
Details
This function is used internally within readMWRacc to run several checks on the input data for completeness and conformance to WQX requirements. The following checks are made:
Column name spelling: Should be the following: Parameter, uom, MDL, UQL, Value Range, Field Duplicate, Lab Duplicate, Field Blank, Lab Blank, Spike/Check Accuracy
Columns present: All columns from the previous check should be present
Column types: All columns should be characters/text, except for MDL and UQL
Value Range column na check: The character string "na" should not be in the Value Range column; "all" should be used if the entire range applies
Unrecognized characters: Fields describing accuracy checks should not include symbols or text other than <=, ≤, <, >=, ≥, >, ±, "%", "BDL", "AQL", "log", or "all"
Overlap in Value Range column: Entries in Value Range should not overlap for a parameter (excludes ascending ranges)
Gap in Value Range column: Entries in Value Range should not include a gap for a parameter, warning only
Parameter: Should match parameter names in the Simple Parameter or WQX Parameter columns of the paramsMWR data
Units: No missing entries in units (uom), except pH which can be blank
Single unit: Each unique Parameter should have only one type for the units (uom)
Correct units: Each unique Parameter should have an entry in the units (uom) that matches one of the acceptable values in the Units of measure column of the paramsMWR data
Empty columns: Columns with all missing or NA values will return a warning
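Three of the checks listed above (column names, unrecognized characters, and single unit per parameter) can be sketched in base R as follows. This is only an illustration of the logic, not the package's implementation: the helper names check_names, has_unrecognized, and multi_unit, the allowed pattern, and the toy data frame are all hypothetical.

```r
# Hypothetical helpers sketching three of the checks; not the MassWateR code.
required_cols <- c("Parameter", "uom", "MDL", "UQL", "Value Range",
                   "Field Duplicate", "Lab Duplicate", "Field Blank",
                   "Lab Blank", "Spike/Check Accuracy")

# Column names: report anything unexpected (e.g. misspelled) or missing.
check_names <- function(dat) {
  list(unexpected = setdiff(names(dat), required_cols),
       missing    = setdiff(required_cols, names(dat)))
}

# Unrecognized characters: strip digits, decimal points, whitespace, and the
# permitted symbols/keywords; any text left over is flagged.
allowed <- "<=|>=|<|>|\u2264|\u2265|\u00b1|%|BDL|AQL|log|all|[0-9.]|\\s"
has_unrecognized <- function(x) {
  !is.na(x) & nzchar(gsub(allowed, "", x))
}

# Single unit: parameters that appear with more than one uom.
multi_unit <- function(dat) {
  n_units <- tapply(dat$uom, dat$Parameter, function(u) length(unique(u)))
  names(n_units)[n_units > 1]
}

# Toy input (made up for illustration) with two deliberate problems.
toy <- data.frame(
  Parameter = c("DO", "DO", "TP"),
  uom = c("mg/l", "ug/l", "mg/l"),
  `Field Duplicate` = c("< 20%", "approx 10%", "<= 0.01"),
  check.names = FALSE
)

check_names(toy)$missing                 # the seven required columns toy lacks
has_unrecognized(toy$`Field Duplicate`)  # FALSE TRUE FALSE
multi_unit(toy)                          # "DO"
```

The sketch flags leftover text rather than validating the full grammar of each entry, so it is looser than a real check would need to be.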
Examples
# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx',
  package = 'MassWateR')
# accuracy data with no checks
accdat <- readxl::read_excel(accpth, na = c('NA', ''), col_types = 'text')
accdat <- dplyr::mutate(accdat, dplyr::across(-c(`Value Range`), ~ dplyr::na_if(.x, 'na')))
checkMWRacc(accdat)
#> Running checks on data quality objectives for accuracy...
#> Checking column names... OK
#> Checking all required columns are present... OK
#> Checking column types... OK
#> Checking no "na" in Value Range... OK
#> Checking for text other than <=, ≤, <, >=, ≥, >, ±, %, AQL, BQL, log, or all... OK
#> Checking overlaps in Value Range... OK
#> Checking gaps in Value Range... OK
#> Checking Parameter formats... OK
#> Checking for missing entries for unit (uom)... OK
#> Checking if more than one unit (uom) per Parameter... OK
#> Checking acceptable units (uom) for each entry in Parameter... OK
#> Checking empty columns... OK
#>
#> All checks passed!
#> # A tibble: 12 × 10
#>    Parameter   uom   MDL   UQL   `Value Range` `Field Duplicate` `Lab Duplicate`
#>    <chr>       <chr> <chr> <chr> <chr>         <chr>             <chr>
#>  1 Water Temp  deg C NA    NA    all           <= 1.0            <= 1.0
#>  2 pH          s.u.  NA    NA    all           <= 0.5            <= 0.5
#>  3 DO          mg/l  NA    NA    < 4           < 20%             NA
#>  4 DO          mg/l  NA    NA    >= 4          < 10%             NA
#>  5 Sp Conduct… uS/cm NA    NA    < 250         < 30%             < 30%
#>  6 Sp Conduct… uS/cm NA    10000 >= 250        < 20%             < 20%
#>  7 TP          mg/l  0.01  NA    < 0.05        <= 0.01           <= 0.01
#>  8 TP          mg/l  0.01  NA    >= 0.05       < 30%             < 20%
#>  9 Nitrate     mg/l  0.05  NA    all           < 30%             < 20%
#> 10 Ammonia     mg/l  0.1   NA    all           < 30%             < 20%
#> 11 E.coli      MPN/… 1     NA    <50           < log30%          < log30%
#> 12 E.coli      MPN/… 1     NA    >=50          < log20%          < log20%
#> # ℹ 3 more variables: `Field Blank` <chr>, `Lab Blank` <chr>,
#> # `Spike/Check Accuracy` <chr>
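
The example above shows only the passing case. A failed check could be explored along the following lines. This is a sketch that assumes the failure is raised as a standard R error condition (so tryCatch can capture it); the renamed column "units" is a made-up misspelling, and the exact message text is not reproduced here. The block is guarded so it does nothing where MassWateR or readxl is not installed.

```r
# Sketch: deliberately misspell one column to see how a failed check surfaces.
msg <- NULL
if (requireNamespace("MassWateR", quietly = TRUE) &&
    requireNamespace("readxl", quietly = TRUE)) {
  accpth <- system.file("extdata/ExampleDQOAccuracy.xlsx", package = "MassWateR")
  bad <- readxl::read_excel(accpth, na = c("NA", ""), col_types = "text")

  # Hypothetical misspelling of the uom column
  names(bad)[names(bad) == "uom"] <- "units"

  # Capture the error message instead of halting; NULL if no error is raised
  msg <- tryCatch(
    {
      MassWateR::checkMWRacc(bad)
      NULL
    },
    error = function(e) conditionMessage(e)
  )
}
if (!is.null(msg)) cat(msg, "\n")
```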