---
title: "Advanced usage"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In practice, you will rarely be dealing with a single FPOD file, but rather with
a large number of files generated by many different PODs deployed at different
sites. This vignette demonstrates a few useful tips and tricks for dealing with
multiple files.

# Reading multiple files

First, load the package and get the path to some example data.

```{r setup}
library(fpod)
fn <- fp_example("gullars_period1.FP3") # <- example FP3 file
```

Let's imagine a scenario where we have five different FP3 files to read. Since the `fpod` package bundles only one example data set,
we'll simulate the multiple-file situation in this vignette by reading that same file several times and running our combination logic on the duplicated data.

```{r}
fpod_files <- rep(fn, 5) # simulate 5 FPOD files
basename(fpod_files)

```

As we can see, this gives us a character vector with five file names. Normally,
we wouldn't specify the files to read manually like this, but rather generate
the vector automatically by calling `list.files()` on the directory where we've
stored the FP3 files, e.g.:

```{r, eval = FALSE}
fpod_files <- list.files("/users/andre/projects/fpod_troms/data",
                         pattern = "\\.FP3$", full.names = TRUE, recursive = TRUE)
```
This would give us a character vector with the paths of all FP3 files stored in
the specified directory, including those tucked away in subfolders. Because the
vector is rebuilt on every run, any files added since the last time the code was
run are picked up automatically.
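
The effect of `pattern` and `recursive` can be checked on a throwaway directory. The sketch below is only illustrative: the directory layout and file names are made up, and the files are empty placeholders rather than real FPOD data.

```{r}
# Build a temporary directory tree with made-up file names (hypothetical layout)
demo_dir <- file.path(tempdir(), "fpod_demo")
dir.create(file.path(demo_dir, "site_a"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("pod1.FP3", "notes.txt"))))
invisible(file.create(file.path(demo_dir, "site_a", "pod2.FP3")))

# recursive = TRUE descends into subfolders; the pattern keeps only ".FP3" files
demo_files <- list.files(demo_dir, pattern = "\\.FP3$",
                         full.names = TRUE, recursive = TRUE)
basename(demo_files)

unlink(demo_dir, recursive = TRUE) # clean up the placeholder files
```

The text file is skipped and the FP3 file in the subfolder is included, which is the behaviour we want when new deployments are dropped into their own folders.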

Going back to the vector of files we're using for this vignette, we can now use, for example, `lapply()` to read all of the files into R.

```{r}
dat <- lapply(fpod_files, fp_read)
str(dat, 2)
```

Alternatively, we can pre-allocate a list of the right length and populate it
element by element in a conventional for-loop. Creating an empty list and
appending to it iteratively with `c()` might seem like an intuitive way of doing
this, but it is not recommended: R has to allocate and copy a new, longer object
every time `c()` is called (i.e. in each iteration of the loop), which is
highly inefficient. For small data sets it probably won't be noticeable,
but the computational cost increases with increasing data size.

```{r}
dat <- vector(mode = "list", length = length(fpod_files))
for (i in seq_along(fpod_files)) { # seq_along() is safe even if the vector is empty
    dat[[i]] <- fp_read(fpod_files[[i]])
}
str(dat, 2)
```
Personally, I prefer the `lapply()` approach because I think it is cleaner, less
verbose and possibly more computationally efficient. Both approaches
demonstrated here are perfectly valid, though, as is evident from the identical
output of `str()` above.
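
To get a feel for the cost of growing an object with `c()`, the two strategies can be timed against each other. This is a minimal sketch using small integers instead of FPOD data; the helper names `grow()` and `prealloc()` are made up for this illustration, and the timings will vary with machine and R version.

```{r}
n_items <- 1e4

# Strategy 1: grow the list with c(); each call allocates a new, longer list
grow <- function(n) {
    out <- list()
    for (i in seq_len(n)) out <- c(out, list(i))
    out
}

# Strategy 2: pre-allocate once, then fill the elements in place
prealloc <- function(n) {
    out <- vector(mode = "list", length = n)
    for (i in seq_len(n)) out[[i]] <- i
    out
}

system.time(res_grow <- grow(n_items))
system.time(res_prealloc <- prealloc(n_items))

# Both strategies produce the same list; only the cost of building it differs
identical(res_grow, res_prealloc)
```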

# Combining data from multiple files
The next step depends on what we want to do with the data. First, let's say we
simply want to combine all NBHF clicks into one, potentially enormous, `data.table`.

This is pretty simple:
```{r}
library(data.table)
nbhf <- lapply(dat, function(x) {
    x$clicks[species == "NBHF"]
}) |> rbindlist()
```
Let's have a look:
```{r}
nbhf[, 1:7] # show only first 7 cols for brevity
```
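
Once clicks from many files sit in one table, it can be useful to record which file each row came from. Whether this is needed depends on which identifying columns the click tables already contain. The sketch below uses toy tables with made-up columns and file names (not real `fp_read()` output) to show how the `idcol` argument of `rbindlist()` turns the names of a list into a column.

```{r}
library(data.table)

# Toy stand-ins for per-file click tables (hypothetical columns and file names)
toy_clicks <- list(
    "site1.FP3" = data.table(species = c("NBHF", "NBHF"), minute = c(1L, 2L)),
    "site2.FP3" = data.table(species = "NBHF", minute = 5L)
)

# idcol adds a column holding the name of the list element each row came from
toy_combined <- rbindlist(toy_clicks, idcol = "file")
toy_combined
```

With real data, the list returned by `lapply()` could be named with `basename()` of the file paths before combining. In this vignette all five names would be identical, since the same example file is read five times.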

In some cases, you may run into trouble with empty FP3 files (files
with no clicks registered), e.g. if the POD has been activated/deactivated
by the tilt trigger, has had bad batteries, or has restarted for some other reason. Those
files should probably be deleted, but if they aren't, we can add a check in the
function passed to `lapply()` to handle them gracefully.
```{r}
nbhf <- lapply(dat, function(x) {
    # Skip files without any clicks; rbindlist() ignores NULL elements
    if (is.null(x$clicks) || nrow(x$clicks) == 0) {
        return(NULL)
    }
    x$clicks[species == "NBHF"]
}) |> rbindlist()
```
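
How `rbindlist()` treats empty inputs can be verified on toy data. The sketch below uses made-up tables, not real `fp_read()` output, and shows that neither `NULL` elements nor zero-row tables contribute rows to the combined result.

```{r}
library(data.table)

toy_full  <- data.table(species = c("NBHF", "Sonar"), minute = c(1L, 2L))
toy_empty <- toy_full[0L] # zero rows, same columns as toy_full

# NULL elements and zero-row tables add nothing to the combined table
toy_result <- rbindlist(list(toy_full, NULL, toy_empty))
nrow(toy_result)
```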

In many cases, however, you might want to summarize detection-positive minutes
(DPMs) for all KERNO-F categories (NBHF, OtherCet and Sonar), and buzz-positive
minutes (BPMs) for NBHF clicks. Here's one way we could do that, again using `lapply()`:
```{r}
dpm <- lapply(dat, function(x) {
    nbhf <- x$clicks[species == "NBHF"]
    dolphins <- x$clicks[species == "OtherCet"]
    sonar <- x$clicks[species == "Sonar"]
    nbhf$buzz <- fp_find_buzzes(nbhf)
    nbhf_dpm <- fp_summarize(nbhf)
    dol_dpm <- fp_summarize(dolphins)
    sonar_dpm <- fp_summarize(sonar)

    # checks to handle cases of no detections for each category
    if (all(is.na(nbhf_dpm$pod))) nbhf_dpm[, pod := x$header$pod_id]
    if (all(is.na(dol_dpm$pod))) dol_dpm[, pod := x$header$pod_id]
    if (all(is.na(sonar_dpm$pod))) sonar_dpm[, pod := x$header$pod_id]
    
    # join the three summaries by pod and time; NBHF columns keep their names
    dpm <- merge(nbhf_dpm, dol_dpm, by = c("pod", "time"), suffixes = c("", "_dol"))
    dpm <- merge(dpm, sonar_dpm, by = c("pod", "time"), suffixes = c("", "_sonar"))
    dpm[, -c("bpm_dol", "bpm_sonar")]
}) |> rbindlist()
dpm
```
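
The suffix handling in the merges above is easier to follow on a small example. The tables below are toy stand-ins: the `dpm` column name and all values are assumptions made for illustration, not taken from real `fp_summarize()` output. An empty suffix leaves the columns of the first table untouched, while the overlapping columns of the second table are renamed.

```{r}
library(data.table)

# Toy per-minute summaries for two categories (hypothetical columns and values)
toy_nbhf <- data.table(pod = 1L, time = 1:3, dpm = c(1L, 0L, 1L), bpm = c(0L, 0L, 1L))
toy_dol  <- data.table(pod = 1L, time = 1:3, dpm = c(0L, 0L, 1L), bpm = c(0L, 0L, 0L))

# Overlapping non-key columns from the second table get the "_dol" suffix
toy_merged <- merge(toy_nbhf, toy_dol, by = c("pod", "time"), suffixes = c("", "_dol"))
names(toy_merged)
```

Note that `merge()` performs an inner join by default, so only combinations of `pod` and `time` present in both tables are kept.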

Now we have our FPOD data in a format that is suitable for plotting and analysis!