vignettes/introduction.Rmd
introduction.Rmd
Geolocation by light has revolutionised animal behavioural studies since its inception, allowing for the tracking of animals using sunset and sunrise estimations. Currently, threshold methodologies use fixed threshold values to determine sunrise and sunset from diurnal light measurements, from which the location is determined (deterministic - via equations). Setting this threshold is time demanding as often not suitable for automated processing at volume due to inherent uncertainties such as contamination in the diurnal light measurements due to clouds and rapid movements around the equator. Here, I present an optimization based methodology which leverages the whole or part of the diurnal light profile and a Bayesian based optimization approach to estimate geolocations from light. The use of additional data during twilight, or a full day, and not only a threshold based sunset or sunrise allows for automation of the process. Including more data in location estimates, based upon a physical model of sky illuminance increases accuracy and provides uncertainty measures for each estimate, allowing for post-hoc further quality control. Convenience and robustness of the method comes at the cost of required computational power compared to fast threshold based approaches.
Keywords: geolocation, tracking, movement ecology, flight, illuminance
Geolocation by light has been a popular animal tracking method in ornithology. It has been successfully used to track long distance migration and foraging behaviour of swifts Hedenström et al. (2019) and other migratory birds, and has revolutionised our understanding of long distance movements.
Geolocation by light is a conceptually easy approach. It relies on the shift of sunrise and sunset times to determine longitude and the length of night (or day) between sunrise or sunset to determine latitude deterministically using astronomical equations (Ekstrom 2004).
However, considerable uncertainty is introduced due to the choice of the threshold in light levels used in determining sunset or sunrise times Joo et al. (2020). The choice of a proper threshold and solar angle is critical as it determines the daylength and therefore estimates of latitude. “Increasing mismatch between light intensity threshold and sun elevation threshold results in a decreasing accuracy in latitudinal estimation.” Livotski. Ongoing research therefore often focuses on the use of ancillary data in addition to threshold based methods (REFERENCE), or iterative methods to constrain the search space of optimal solutions better (Merkel et al. 2016). Twilight free methods have been explored by Bindoff et al. (2018) which uses a hidden markov model approach to constrain parameters using the full light level data and not explicitly sunrise and sunset thresholding.
Given an unknown starting position and a reference daylength to calibrate a threshold on this threshold is often estimated by eye. To improve upon this approach leveraging the full transition profile can be used to estimate locations (Ekstrom 2004). Despite these advances in methodology few computational resources are available to estimate positions in bulk with minimal intervention.
Here, I present the {skytrackr} R package which provides a Bayesian optimization based approach using a physical sky illuminance model Janiczek and DeYoung (1987) to estimate geolocation from light levels. This package and methodology provides robust and automated geolocation estimates, addressing issues regarding the choice of a threshold and sun angle parameters while increasing robustness in case of cloudy or otherwise noisy signals.
The {skytrackr} R package uses a physical sky illuminance model Janiczek and DeYoung (1987) in combination with Bayesian location parameter estimation (Hartig et al. 2023) approach to maximise the log likelihood between observed (logger) and modelled values. Model parameters are limited to latitude, longitude and a sky condition which scales the sky illuminance response for the day. The approach improves upon the template matching as suggested by Ekstrom (2004) by including a physical component to the model rather than considering a piecewise regression based approach.
#> # A tibble: 288 × 5
#> logger date date_time hour lux
#> <chr> <date> <dttm> <dbl> <dbl>
#> 1 CC874 2021-07-11 2021-07-11 00:03:09 0.0525 0.08
#> 2 CC874 2021-07-11 2021-07-11 00:08:09 0.136 0.08
#> 3 CC874 2021-07-11 2021-07-11 00:13:09 0.219 0.08
#> 4 CC874 2021-07-11 2021-07-11 00:18:09 0.302 0.08
#> 5 CC874 2021-07-11 2021-07-11 00:23:09 0.386 0.08
#> 6 CC874 2021-07-11 2021-07-11 00:28:09 0.469 0.08
#> 7 CC874 2021-07-11 2021-07-11 00:33:09 0.552 0.08
#> 8 CC874 2021-07-11 2021-07-11 00:38:09 0.636 0.08
#> 9 CC874 2021-07-11 2021-07-11 00:43:09 0.719 0.08
#> 10 CC874 2021-07-11 2021-07-11 00:48:09 0.802 0.08
#> # ℹ 278 more rows
FIX ME: flipped the optimizer to a Sequential Monte Carlo method (i.e. particle filter method), which seems faster and as accurate - correct below with ref (Hartig et al. 2011).
By default, the DEzs MCMC sampler (Ter Braak and Vrugt 2008) implemented in BayesianTools (Hartig et al. 2023) is used to estimate the posterior distribution of model parameters. For each location I sample the posterior distribution of the parameters (with a thinning factor of 10), and return summary values. I return the median for best estimates of latitude, longitude and sky condition. In addition, the 5-95% quantiles on the geographic position are provided. The package also returns the Gelman-Rubin-Brooks (GRB) metric (Gelman and Rubin 1992), which can be used to assess the convergence of the optimization. Both the quantile ranges and the GRB metric can be used in post-hoc quality control screening.
The method as proposed should perform equally well, or better than, the Ekstrom methodology using template matching (Ekstrom 2004). However, some limitations do apply. The measured light intensities should be translated to physical units in lux to be accurately compared to the physical model. Not all sensors return physical units, with some returning an arbitrary numbers due to data limitations (compression) or hardware choices. In these cases an external calibration to convert arbitrary numbers to lux is necessary. Although performance should improve around the equinoxes, performance will remain poorer than average. However, the addition of geolocation estimation uncertainties as well as GRB metrics make screening outliers easy and consistent. The methodology should compare favourably to the “Twilight-Free” method as described by (Bindoff et al. 2018), as no external packages should be loaded from github.
The package and function descriptions can be found on github at: https://github.com/bluegreen-labs/skytrackr, and can be installed from source using:
if(!require(remotes)){install.packages("remotes")}
remotes::install_github("bluegreen-labs/skytrackr")
To demonstrate the functioning of the package a small demo dataset
comprised of a single day of light logging near a Common swift nest box
in Ghent, Belgium was included (i.e. tag cc874). I will use this data to
demonstrate the features of the package. Note that when multiple dates
are present all dates will be considered. The package is friendly to the
use of R piped (|>) commands. Using all default settings you can just
pipe the data in to the skytrackr()
function. The returned
object will be a data frame containing the best estimate (median) of the
longitude and latitude as well as 5-95% quantile as sampled from the
posterior parameter distribution.
# normally you would first read in the .lux or .glf file using
# cc874 <- stk_read_lux("cc874.lux")
# load the integrated package data
print(head(skytrackr::cc874))
#> logger date date_time hour measurement value
#> 1 CC874 2021-07-11 2021-07-11 00:03:09 0.0525000 lux 0.08
#> 2 CC874 2021-07-11 2021-07-11 00:08:09 0.1358333 lux 0.08
#> 3 CC874 2021-07-11 2021-07-11 00:13:09 0.2191667 lux 0.08
#> 4 CC874 2021-07-11 2021-07-11 00:18:09 0.3025000 lux 0.08
#> 5 CC874 2021-07-11 2021-07-11 00:23:09 0.3858333 lux 0.08
#> 6 CC874 2021-07-11 2021-07-11 00:28:09 0.4691667 lux 0.08
# skytrackr allows for piped data input
# when using the default SMC particle filter
# it is key to specify a known starting location
# to optimally use this method
location <- skytrackr::cc874 |>
filter(
(value > 0.09 & value < 400)
) |>
skytrackr::skytrackr(
iterations = 100,
particles = 30,
start_location = c(51.08, 3.73),
tolerance = 11,
plot = FALSE
)
#> - Estimating locations from light (lux) profiles for logger: CC874!
# print the optimized parameters (i.e. the locations)
print(location)
#> latitude longitude sky_conditions latitude_qt_50 longitude_qt_50
#> 1 51.07991 3.730128 1.808316 51.07996 3.730038
#> latitude_qt_5 latitude_qt_95 longitude_qt_5 longitude_qt_95
#> 1 51.07991 51.08007 3.729858 3.730151
#> sky_conditions_qt_50 grb n date equinox
#> 1 2.071038 NA 16 2021-07-11 NA
Note that the returned location estimates are very close to the original location of the colony (51.8 N/ 3.73 E). These parameters can be fed back into the original (underlying) algorithm to calculate the full diurnal profile. Doing so allows you to compare the modelled (fitted) data to the observed values.
This visually confirms the good fit of the model data with the
observed values. Note that only a subset of the data is used in model
optimization (as defined by the range
parameter). You can
assess the convergence of the model using the grb
values
(i.e the Gelman-Rubin-Brooks metric, Gelman and
Rubin (1992)), which should be lower than 1.1 to indicate the
convergence of the 3 chains used in the MCMC DEz algorithm. This is
further corroborated by considering the upper and lower quantiles and
their spread, which are narrowly defined for both latitude and
longitude.
As a sensitivity analysis I set values to low (effectively 0) values, for a range of 0 - 75% of the dataset. For these subsets I estimate both latitude and longitude using either the full diurnal profile or the default settings using twilight values only. In addition, the same routine was used to introduce bright moments during nighttime (250 lux).
Sensitivity analysis show that especially nighttime bright moments disrupt the positional accuracy of geolocation estimates. While using performance declines steadily when considering the full profile at higher noise levels (c-d).
The package does not mitigate uncertainty introduced by occlusion of the sensor by feathers, or birds roosting in secluded places. However, sensitivity analysis have shown that these factors might have less effect than nighttime illumination. Regardless, performance of the optimization method remains functionally optimal in species such as swifts, which continuously fly during their non-breeding season []. Data assimilation by using pressure measurements at stationary moments [] could be implemented but is currently not. Further improvements can be implemented by masking of locations during optimization (based on step distance and or geographic location, such as a land water mask), and are currently not considered.
The {skytrackr} R package provides an optimization based methodology which leverages the whole or part of the diurnal light profile and a Bayesian based optimization approach to estimate geolocations from light. The use of additional data during twilight, or a full day, allows for automation of the process over threshold based methods. Including more data in location estimates, based upon a physical model of sky illuminance, increases accuracy. The Bayesian framework also provides uncertainty measures for each estimate, allowing for post-hoc quality control.