The skytrackr R package

Abstract

Geolocation by light has revolutionised animal behavioural studies since its inception, allowing for the tracking of animals using sunset and sunrise estimations. Currently, threshold methodologies use fixed threshold values to determine sunrise and sunset from diurnal light measurements, from which the location is determined (deterministic - via equations). Setting this threshold is time demanding as often not suitable for automated processing at volume due to inherent uncertainties such as contamination in the diurnal light measurements due to clouds and rapid movements around the equator. Here, I present an optimization based methodology which leverages the whole or part of the diurnal light profile and a Bayesian based optimization approach to estimate geolocations from light. The use of additional data during twilight, or a full day, and not only a threshold based sunset or sunrise allows for automation of the process. Including more data in location estimates, based upon a physical model of sky illuminance increases accuracy and provides uncertainty measures for each estimate, allowing for post-hoc further quality control. Convenience and robustness of the method comes at the cost of required computational power compared to fast threshold based approaches.

Keywords: geolocation, tracking, movement ecology, flight, illuminance

Background

Geolocation by light has been a popular animal tracking method in ornithology. It has been successfully used to track long distance migration and foraging behaviour of swifts Hedenström et al. (2019) and other migratory birds, and has revolutionised our understanding of long distance movements.

Geolocation by light is a conceptually easy approach. It relies on the shift of sunrise and sunset times to determine longitude and the length of night (or day) between sunrise or sunset to determine latitude deterministically using astronomical equations (Ekstrom 2004).

However, considerable uncertainty is introduced due to the choice of the threshold in light levels used in determining sunset or sunrise times Joo et al. (2020). The choice of a proper threshold and solar angle is critical as it determines the daylength and therefore estimates of latitude. “Increasing mismatch between light intensity threshold and sun elevation threshold results in a decreasing accuracy in latitudinal estimation.” Livotski. Ongoing research therefore often focuses on the use of ancillary data in addition to threshold based methods (REFERENCE), or iterative methods to constrain the search space of optimal solutions better (Merkel et al. 2016). Twilight free methods have been explored by Bindoff et al. (2018) which uses a hidden markov model approach to constrain parameters using the full light level data and not explicitly sunrise and sunset thresholding.

Given an unknown starting position and a reference daylength to calibrate a threshold on this threshold is often estimated by eye. To improve upon this approach leveraging the full transition profile can be used to estimate locations (Ekstrom 2004). Despite these advances in methodology few computational resources are available to estimate positions in bulk with minimal intervention.

Here, I present the {skytrackr} R package which provides a Bayesian optimization based approach using a physical sky illuminance model Janiczek and DeYoung (1987) to estimate geolocation from light levels. This package and methodology provides robust and automated geolocation estimates, addressing issues regarding the choice of a threshold and sun angle parameters while increasing robustness in case of cloudy or otherwise noisy signals.

Implementation

The {skytrackr} R package uses a physical sky illuminance model Janiczek and DeYoung (1987) in combination with Bayesian location parameter estimation (Hartig et al. 2023) approach to maximise the log likelihood between observed (logger) and modelled values. Model parameters are limited to latitude, longitude and a sky condition which scales the sky illuminance response for the day. The approach improves upon the template matching as suggested by Ekstrom (2004) by including a physical component to the model rather than considering a piecewise regression based approach.

#> # A tibble: 288 × 5
#>    logger date       date_time             hour   lux
#>    <chr>  <date>     <dttm>               <dbl> <dbl>
#>  1 CC874  2021-07-11 2021-07-11 00:03:09 0.0525  0.08
#>  2 CC874  2021-07-11 2021-07-11 00:08:09 0.136   0.08
#>  3 CC874  2021-07-11 2021-07-11 00:13:09 0.219   0.08
#>  4 CC874  2021-07-11 2021-07-11 00:18:09 0.302   0.08
#>  5 CC874  2021-07-11 2021-07-11 00:23:09 0.386   0.08
#>  6 CC874  2021-07-11 2021-07-11 00:28:09 0.469   0.08
#>  7 CC874  2021-07-11 2021-07-11 00:33:09 0.552   0.08
#>  8 CC874  2021-07-11 2021-07-11 00:38:09 0.636   0.08
#>  9 CC874  2021-07-11 2021-07-11 00:43:09 0.719   0.08
#> 10 CC874  2021-07-11 2021-07-11 00:48:09 0.802   0.08
#> # ℹ 278 more rows

Modelled and observed illuminance values in log(lux) for three days in early July 2021. Observed values are shown as light grey dots, modelled values for the exact location are shown as a full red line. Longitudinal offset data is shown as the blue full line, while latitudinal offset modelled data for the same period is shown as a full green line.

FIX ME: flipped the optimizer to a Sequential Monte Carlo method (i.e. particle filter method), which seems faster and as accurate - correct below with ref (Hartig et al. 2011).

By default, the DEzs MCMC sampler (Ter Braak and Vrugt 2008) implemented in BayesianTools (Hartig et al. 2023) is used to estimate the posterior distribution of model parameters. For each location I sample the posterior distribution of the parameters (with a thinning factor of 10), and return summary values. I return the median for best estimates of latitude, longitude and sky condition. In addition, the 5-95% quantiles on the geographic position are provided. The package also returns the Gelman-Rubin-Brooks (GRB) metric (Gelman and Rubin 1992), which can be used to assess the convergence of the optimization. Both the quantile ranges and the GRB metric can be used in post-hoc quality control screening.

The method as proposed should perform equally well, or better than, the Ekstrom methodology using template matching (Ekstrom 2004). However, some limitations do apply. The measured light intensities should be translated to physical units in lux to be accurately compared to the physical model. Not all sensors return physical units, with some returning an arbitrary numbers due to data limitations (compression) or hardware choices. In these cases an external calibration to convert arbitrary numbers to lux is necessary. Although performance should improve around the equinoxes, performance will remain poorer than average. However, the addition of geolocation estimation uncertainties as well as GRB metrics make screening outliers easy and consistent. The methodology should compare favourably to the “Twilight-Free” method as described by (Bindoff et al. 2018), as no external packages should be loaded from github.

The package and function descriptions can be found on github at: https://github.com/bluegreen-labs/skytrackr, and can be installed from source using:

if(!require(remotes)){install.packages("remotes")}
remotes::install_github("bluegreen-labs/skytrackr")

Discussion

To demonstrate the functioning of the package a small demo dataset comprised of a single day of light logging near a Common swift nest box in Ghent, Belgium was included (i.e. tag cc874). I will use this data to demonstrate the features of the package. Note that when multiple dates are present all dates will be considered. The package is friendly to the use of R piped (|>) commands. Using all default settings you can just pipe the data in to the skytrackr() function. The returned object will be a data frame containing the best estimate (median) of the longitude and latitude as well as 5-95% quantile as sampled from the posterior parameter distribution.

# normally you would first read in the .lux or .glf file using
# cc874 <- stk_read_lux("cc874.lux")

# load the integrated package data
print(head(skytrackr::cc874))
#>   logger       date           date_time      hour measurement value
#> 1  CC874 2021-07-11 2021-07-11 00:03:09 0.0525000         lux  0.08
#> 2  CC874 2021-07-11 2021-07-11 00:08:09 0.1358333         lux  0.08
#> 3  CC874 2021-07-11 2021-07-11 00:13:09 0.2191667         lux  0.08
#> 4  CC874 2021-07-11 2021-07-11 00:18:09 0.3025000         lux  0.08
#> 5  CC874 2021-07-11 2021-07-11 00:23:09 0.3858333         lux  0.08
#> 6  CC874 2021-07-11 2021-07-11 00:28:09 0.4691667         lux  0.08

# skytrackr allows for piped data input
# when using the default SMC particle filter
# it is key to specify a known starting location
# to optimally use this method
location <- skytrackr::cc874 |>
  filter(
    (value > 0.09 & value < 400)
    ) |>
  skytrackr::skytrackr(
      iterations = 100,
      particles = 30,
      start_location = c(51.08, 3.73),
      tolerance = 11,
      plot = FALSE
    )
#> - Estimating locations from light (lux) profiles for logger: CC874!

# print the optimized parameters (i.e. the locations)
print(location)
#>   latitude longitude sky_conditions latitude_qt_50 longitude_qt_50
#> 1 51.07991  3.730128       1.808316       51.07996        3.730038
#>   latitude_qt_5 latitude_qt_95 longitude_qt_5 longitude_qt_95
#> 1      51.07991       51.08007       3.729858        3.730151
#>   sky_conditions_qt_50 grb  n       date equinox
#> 1             2.071038  NA 16 2021-07-11      NA

Note that the returned location estimates are very close to the original location of the colony (51.8 N/ 3.73 E). These parameters can be fed back into the original (underlying) algorithm to calculate the full diurnal profile. Doing so allows you to compare the modelled (fitted) data to the observed values.

Figure showing the fitted model results (full black line) from optimized parameters (latitude, longitude, sky conditions). Default parameters only use a limited twilight set of parameters. From the observed values (coloured dots) only the ‘True’ values were included in the estimation of the model parameters.

This visually confirms the good fit of the model data with the observed values. Note that only a subset of the data is used in model optimization (as defined by the range parameter). You can assess the convergence of the model using the grb values (i.e the Gelman-Rubin-Brooks metric, Gelman and Rubin (1992)), which should be lower than 1.1 to indicate the convergence of the 3 chains used in the MCMC DEz algorithm. This is further corroborated by considering the upper and lower quantiles and their spread, which are narrowly defined for both latitude and longitude.

Sensitivity analysis

As a sensitivity analysis I set values to low (effectively 0) values, for a range of 0 - 75% of the dataset. For these subsets I estimate both latitude and longitude using either the full diurnal profile or the default settings using twilight values only. In addition, the same routine was used to introduce bright moments during nighttime (250 lux).

Sensitivity analysis show that especially nighttime bright moments disrupt the positional accuracy of geolocation estimates. While using performance declines steadily when considering the full profile at higher noise levels (c-d).

Limitations

The package does not mitigate uncertainty introduced by occlusion of the sensor by feathers, or birds roosting in secluded places. However, sensitivity analysis have shown that these factors might have less effect than nighttime illumination. Regardless, performance of the optimization method remains functionally optimal in species such as swifts, which continuously fly during their non-breeding season []. Data assimilation by using pressure measurements at stationary moments [] could be implemented but is currently not. Further improvements can be implemented by masking of locations during optimization (based on step distance and or geographic location, such as a land water mask), and are currently not considered.

Conclusion

The {skytrackr} R package provides an optimization based methodology which leverages the whole or part of the diurnal light profile and a Bayesian based optimization approach to estimate geolocations from light. The use of additional data during twilight, or a full day, allows for automation of the process over threshold based methods. Including more data in location estimates, based upon a physical model of sky illuminance, increases accuracy. The Bayesian framework also provides uncertainty measures for each estimate, allowing for post-hoc quality control.

Availability and requirements

Project name: skytrackr
Project homepage: https://github.com/bluegreen-labs/skytrackr
Operating system(s): Platform independent
Programming Language: R
License: AGPLv3

References

Bindoff, Aidan D., Simon J. Wotherspoon, Christophe Guinet, and Mark A. Hindell. 2018. “Twilight‐free Geolocation from Noisy Light Data.” Edited by David Orme. Methods in Ecology and Evolution 9 (5): 1190–98. https://doi.org/10.1111/2041-210X.12953.

Bridge, Eli S., Jeffrey F. Kelly, Andrea Contina, Richard M. Gabrielson, Robert B. MacCurdy, and David W. Winkler. 2013. “Advances in Tracking Small Migratory Birds: A Technical Review of Light-Level Geolocation: Light-Level Geolocation Dataloggers.” Journal of Field Ornithology 84 (2): 121–37. https://doi.org/10.1111/jofo.12011.

Ekstrom, Philip A. 2004. “An Advance in Geolocation by Light.” National Institute of Polar Research, no. 58: 210–26.

Gelman, Andrew, and Donald B. Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7 (4). https://doi.org/10.1214/ss/1177011136.

Hartig, Florian, Justin M. Calabrese, Björn Reineking, Thorsten Wiegand, and Andreas Huth. 2011. “Statistical Inference for Stochastic Simulation Models - Theory and Application: Inference for Stochastic Simulation Models.” Ecology Letters 14 (8): 816–27. https://doi.org/10.1111/j.1461-0248.2011.01640.x.

Hartig, Florian, Francesco Minunno, Stefan Paul, David Cameron, Tankred Ott, and Maximilian Pichler. 2023. “BayesianTools: General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics.” https://cran.r-project.org/web/packages/BayesianTools/.

Hedenström, Anders. 2016. “Annual 10-Month Aerial Life Phase in the Common Swift Apus Apus.” Current Biology 26: 3066–70.

Hedenström, Anders, Gabriel Norevik, Giovanni Boano, Arne Andersson, Johan Bäckman, and Susanne Åkesson. 2019. “Flight Activity in Pallid Swifts Apus Pallidus During the Non-Breeding Period.” Journal of Avian Biology 50 (2): 1–8. https://doi.org/10.1111/jav.01972.

Hufkens, Koen. 2022. “Skylight: A Simple Sky Illuminance Model.” https://doi.org/10.5281/zenodo.7222045.

Janiczek, P. M., and J. A. DeYoung. 1987. Computer Programs for Sun and Moon Illuminance With Contingent Tables and Diagrams. Vol. 171. Washington, USA: United States Naval Observatory. https://archive.org/details/DTIC_ADA182110/page/n21/mode/2up.

Joo, Rocío, Matthew E. Boone, Thomas A. Clay, Samantha C. Patrick, Susana Clusella‐Trullas, and Mathieu Basille. 2020. “Navigating Through the <Span Style="font-Variant:small-Caps;">r</Span> Packages for Movement.” Edited by Luca Börger. Journal of Animal Ecology 89 (1): 248–67. https://doi.org/10.1111/1365-2656.13116.

Kearsley, Lyndon, Nathan Ranc, Christoph M. Meier, Carlos Miguel Pacheco, Pedro Henriques, Gonçalo Elias, Martin Poot, et al. 2022. “The Aeroecology of Atmospheric Convergence Zones: The Case of Pallid Swifts.” Oikos, April. https://doi.org/10.1111/oik.08594.

Merkel, Benjamin, Richard A. Phillips, Sébastien Descamps, Nigel G. Yoccoz, Børge Moe, and Hallvard Strøm. 2016. “A Probabilistic Algorithm to Process Geolocation Data.” Movement Ecology 4 (1): 26. https://doi.org/10.1186/s40462-016-0091-8.

Norevik, Gabriel, Susanne Åkesson, Tom Artois, Natalie Beenaerts, Greg Conway, Brian Cresswell, Ruben Evens, Ian Henderson, Frédéric Jiguet, and Anders Hedenström. 2020. “Wind‐associated Detours Promote Seasonal Migratory Connectivity in a Flapping Flying Long‐distance Avian Migrant.” Edited by Jason Chapman. Journal of Animal Ecology 89 (2): 635–46. https://doi.org/10.1111/1365-2656.13112.

Norevik, Gabriel, Giovanni Boano, Anders Hedenström, Roberto Lardelli, Felix Liechti, and Susanne Åkesson. 2019. “Highly Mobile Insectivorous Swifts Perform Multiple Intra‐tropical Migrations to Exploit an Asynchronous African Phenology.” Oikos 128 (5): 640–48. https://doi.org/10.1111/oik.05531.

Ter Braak, Cajo J. F., and Jasper A. Vrugt. 2008. “Differential Evolution Markov Chain with Snooker Updater and Fewer Chains.” Statistics and Computing 18 (4): 435–46. https://doi.org/10.1007/s11222-008-9104-9.

an optimization based approach to geolocation by light

Koen Hufkens