Check for common errors in respirometry data

inspect() subsets a data.frame object and scans it for errors that may affect the use of various functions in respR. By default, the function scans only the first 2 columns of a data frame and assumes that the first column is time data. A plot of the data is also produced, including a rolling regression plot using a width of floor(0.1 * nrow(df)), for a quick visual inspection of the rate pattern (or stability) of the data. Note that rates for oxygen uptake are returned as negative and plotted on a reversed axis, so higher oxygen uptake rates (more negative values) appear higher on the rate plot.
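The rolling regression described above can be sketched in base R. This is only an illustration of the idea, not the code inspect() uses internally; the data frame `df`, its column names, and the simulated values are assumptions made for the example.

```r
# Hypothetical data: oxygen declining roughly linearly, with a little noise.
set.seed(1)
df <- data.frame(time = 1:200,
                 oxygen = 8 - 0.01 * (1:200) + rnorm(200, sd = 0.02))

# Window width as described above: 10% of the rows, rounded down.
width <- floor(0.1 * nrow(df))

# Fit a linear regression in each window and keep the slope (the rate).
starts <- seq_len(nrow(df) - width + 1)
rates <- vapply(starts, function(i) {
  win <- df[i:(i + width - 1), ]
  unname(coef(lm(oxygen ~ time, data = win))[2])
}, numeric(1))

# Oxygen uptake gives negative slopes; reversing the y-axis puts higher
# uptake (more negative rates) towards the top of the plot.
plot(starts, rates, type = "l", ylim = rev(range(rates)),
     xlab = "window start (row)", ylab = "rate")
```

A flat line in such a plot suggests a stable rate, while a drifting line suggests the rate changes over the course of the experiment.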

inspect(df, time = NULL, oxygen = NULL, plot = TRUE)

Arguments

df	data.frame object. Accepts any object of class `data.frame`.
time	numeric vector. Defaults to NULL. This specifies the column number(s) of the time data to subset.
oxygen	numeric vector. Defaults to NULL. This specifies the column number(s) of the oxygen data to subset.
plot	logical. Defaults to TRUE. Plots the data for quick visual diagnosis. Works only when the subset data frame contains exactly 2 columns.

Value

A list object of class inspect.

Details

Time columns are checked for NA/NaN values, and for whether time is sequential, free of duplicates, and evenly spaced. Oxygen columns are checked only for NA/NaN values. Once the checks are complete, the function returns a list object which may be passed directly to calc_rate(), calc_rate.bg(), calc_rate.ft(), and auto_rate() for further analyses.
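For readers who want to see what checks of this kind look like, here is a minimal base R sketch. The helper `check_time()` is hypothetical and is not part of respR; in particular, treating "sequential" as "never decreasing" and comparing intervals after rounding are assumptions of this sketch, not a description of how inspect() implements its checks.

```r
# Hypothetical helper, not a respR function.
check_time <- function(time) {
  clean <- time[!is.na(time)]          # is.na() is TRUE for both NA and NaN
  d <- diff(clean)                     # intervals between consecutive times
  list(
    has_na         = anyNA(time),                     # any NA/NaN values
    non_sequential = any(d < 0),                      # time goes backwards
    has_duplicates = anyDuplicated(clean) > 0,        # repeated time values
    uneven_spacing = length(unique(round(d, 9))) > 1  # intervals differ
  )
}

good <- check_time(1:10)
bad  <- check_time(c(1, 2, 2, NA, 5, 4, 8))

str(good)
str(bad)
```

Here `good` passes every check, while `bad` fails all four: it contains an NA, a repeated value, a backwards step, and unequal intervals.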

To scan more than two columns, use the time and oxygen arguments to select specific columns of a larger data frame. Note that the function will not produce a plot in this case. This lets you inspect flowthrough respirometry data, which usually contains oxygen values for both inflow and outflow, by specifying a vector of column numbers, e.g. oxygen = c(2,3).

Failed Checks

Most of these checks are for exploratory purposes only; they help diagnose potential issues with the data. For instance, very long experiments could have had sensor dropouts the user is completely unaware of. Other warnings do not indicate a problem at all: for instance, an uneven time warning can result from using decimalised minutes, which is a completely valid time metric.
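The decimalised-minutes case can be reproduced with a few lines of base R. The example assumes a hypothetical logger that records once per second but stores time as minutes rounded to two decimal places; the rounding precision is an assumption made purely for illustration.

```r
# Hypothetical logger: one reading per second, time stored as minutes
# rounded to two decimal places (an assumption for this example).
seconds <- 0:5
minutes <- round(seconds / 60, 2)

minutes        # the stored time values
diff(minutes)  # intervals alternate between about 0.01 and 0.02

# Intervals are even in seconds but not in the rounded minutes.
even_in_seconds <- length(unique(diff(seconds))) == 1
even_in_minutes <- length(unique(round(diff(minutes), 4))) == 1
```

The readings were taken at perfectly regular intervals, yet the stored time column appears unevenly spaced, so a warning of this kind can safely be ignored.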

Failing some of these checks should generally not hinder analysis of the data. respR has been coded to rely on linear regression on exact data values, and makes no assumptions about data spacing. Issues such as missing or NA/NaN values, duplicate values, or uneven time spacing should therefore not cause erroneous results, as long as they do not occur over large regions of the data. The only major potential issue is non-sequential time data, which could cause unpredictable behaviour and incorrect rates to be returned.
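The robustness of regression to gaps can be illustrated with base R's lm(). This sketch shows the general property of linear regression on exact time values; it does not reproduce respR's internal code, and the data and the helper `slope()` are made up for the example.

```r
# Hypothetical data: an exact linear decline in oxygen, slope -0.01.
time <- 0:99
oxygen <- 8 - 0.01 * time

# Helper for this example only: slope of oxygen against time.
slope <- function(t, o) unname(coef(lm(o ~ t))[2])

full <- slope(time, oxygen)

# Keep an irregular subset of rows, giving uneven time spacing.
keep <- c(1:10, 15, 22, 30:60, 75, 99, 100)
gappy <- slope(time[keep], oxygen[keep])

# Blank out a few oxygen values; lm() omits rows with NA by default.
oxygen_na <- oxygen
oxygen_na[c(5, 40, 41)] <- NA
with_na <- slope(time, oxygen_na)

c(full = full, gappy = gappy, with_na = with_na)
```

All three slopes agree, because the regression uses the actual time value of each remaining point rather than its row position.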

Examples

# load the package so that inspect() and the example data sets are available:
library(respR)

# automatically inspect the first 2 columns:
data("sardine.rd")
inspect(sardine.rd)

data("urchins.rd")
inspect(urchins.rd)

# inspect specific columns:
data("urchins.rd")
inspect(urchins.rd, time = 1, oxygen = 4)

# inspect multiple columns:
data("urchins.rd")
x <- inspect(urchins.rd, time = 1, oxygen = c(2:12))
print(x)
x$list$time.min  # check position of errors in data frame

# inspect flowthrough data:
data("flowthrough.rd")
x <- inspect(flowthrough.rd, 1, c(2,3))
x