---
title: "Advanced usage"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In most situations, you will not be dealing with a single FPOD file, but with a large number of files generated by many different pods sitting at different sites. This vignette demonstrates a few useful tips and tricks for dealing with multiple files.

# Reading multiple files

First, load the package and get the path to some example data.

```{r setup}
library(fpod)
fn <- fp_example("gullars_period1.FP3") # <- example FP3 file
```

Let's imagine a scenario where we have 5 different FP3 files that we want to read. Since the `fpod` package only bundles one example data set, we'll "simulate" the multiple-files situation in this vignette by re-reading that same file multiple times and running our combination logic on the duplicated data set.

```{r}
fpod_files <- rep(fn, 5) # simulate 5 FPOD files
basename(fpod_files)
```

As we can see, this gives us a character vector with five filenames. Normally, we wouldn't want to specify our list of files manually in a variable like this, but rather generate it automatically by calling `list.files()` on the directory where we've stored the FP3 files, e.g.:

```{r}
#fpod_files <- list.files("/users/andre/projects/fpod_troms/data", pattern = "FP3$", full.names = TRUE, recursive = TRUE)
```

This would give us a character vector of all FP3 files stored in the specified directory, including those tucked away in subfolders, and it would automatically pick up any files added since the last time the code was run.

Going back to the list we're using for this vignette, we can now use, for example, `lapply` to read our list of files into R.
```{r}
dat <- lapply(fpod_files, fp_read)
str(dat, 2)
```

Alternatively, we can pre-allocate a list of the right length and populate it element by element in a conventional for-loop. Creating an empty vector and growing it iteratively with `c()` might seem intuitive, but it is not recommended: R copies the data internally every time `c()` is called (i.e. in each iteration of the loop), which is highly inefficient. For small data sets it probably won't be noticeable, but the computational cost grows with data size.

```{r}
dat <- vector(mode = "list", length = length(fpod_files))
for (i in seq_along(fpod_files)) {
  dat[[i]] <- fp_read(fpod_files[[i]])
}
str(dat, 2)
```

Personally, I prefer the `lapply` approach because I think it is cleaner, less verbose and possibly more computationally efficient. But both approaches demonstrated here are perfectly valid, as is evident from the identical `str` output above.

# Combining data from multiple files

The next step depends on what we want to do with the data. First, let's say we simply want to combine all NBHF clicks into one potentially enormous data.table. This is pretty simple:

```{r}
library(data.table)
nbhf <- lapply(dat, function(x) {
  x$clicks[species == "NBHF"]
}) |>
  rbindlist()
```

Let's have a look:

```{r}
nbhf[, 1:7] # show only first 7 cols for brevity
```

In some cases, you may run into trouble with empty FP3 files (files with no clicks registered), e.g. if the POD has been activated/deactivated by the tilt trigger, has bad batteries, or has restarted for some other reason. Those files should probably be deleted, but if they aren't, we can add a check in the body of the `lapply` call to handle them gracefully.
```{r}
nbhf <- lapply(dat, function(x) {
  clicks <- x$clicks[species == "NBHF"]
  if (nrow(clicks) == 0) {
    # no NBHF clicks: return a zero-row table with the same columns
    clicks <- clicks[0L]
  }
  clicks
}) |>
  rbindlist()
```

In many cases, however, you might want to summarize detection-positive-minutes (DPMs) for all KERNO-F categories (NBHF, OtherCet and Sonar), and buzz-positive-minutes (BPMs) for NBHF clicks. Here's one way we could do that, again using `lapply`:

```{r}
dpm <- lapply(dat, function(x) {
  nbhf <- x$clicks[species == "NBHF"]
  dolphins <- x$clicks[species == "OtherCet"]
  sonar <- x$clicks[species == "Sonar"]

  nbhf$buzz <- fp_find_buzzes(nbhf)

  nbhf_dpm <- fp_summarize(nbhf)
  dol_dpm <- fp_summarize(dolphins)
  sonar_dpm <- fp_summarize(sonar)

  # checks to handle cases of no detections for each category
  if (all(is.na(nbhf_dpm$pod))) nbhf_dpm[, pod := x$header$pod_id]
  if (all(is.na(dol_dpm$pod))) dol_dpm[, pod := x$header$pod_id]
  if (all(is.na(sonar_dpm$pod))) sonar_dpm[, pod := x$header$pod_id]

  dpm <- merge(nbhf_dpm, dol_dpm, by = c("pod", "time"), suffixes = c("", "_dol"))
  dpm <- merge(dpm, sonar_dpm, by = c("pod", "time"), suffixes = c("", "_sonar"))
  dpm[, -c("bpm_dol", "bpm_sonar")]
}) |>
  rbindlist()

dpm
```

Now we have our FPOD data in a format that is suited for plotting/analyses!
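The suffix handling in the `merge()` calls above can be easy to misread, so here is a minimal sketch on toy data. The tables `nbhf_toy` and `dol_toy` and their column names (`pod`, `time`, `dpm`, `bpm`) are hypothetical stand-ins chosen for illustration, not the actual output of `fp_summarize()`.

```{r}
library(data.table)

# Toy stand-ins for two per-minute summaries (hypothetical columns and values)
minutes <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 60 * 0:2
nbhf_toy <- data.table(pod = 1L, time = minutes,
                       dpm = c(1L, 0L, 1L), bpm = c(0L, 0L, 1L))
dol_toy <- data.table(pod = 1L, time = minutes,
                      dpm = c(0L, 1L, 0L), bpm = c(0L, 0L, 0L))

# Non-key columns that share a name get a suffix:
# "" keeps the left table's names, "_dol" marks the right table's
merged <- merge(nbhf_toy, dol_toy, by = c("pod", "time"), suffixes = c("", "_dol"))

# Drop the right-hand buzz column, mirroring the chunk above
merged <- merged[, -c("bpm_dol")]
names(merged)
```

Note that `merge()` defaults to `all = FALSE`, so only (`pod`, `time`) pairs present in both tables are kept. The toy tables share the same time grid, so no rows are lost here. Whether that holds for real summaries is an assumption worth checking.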