1. INTRODUCTION
Recent reductions in the size and cost of autonomous data-collection
equipment have allowed ecologists to survey their study sites more
thoroughly and efficiently (Acevedo & Villanueva-Rivera, 2006). Much
work has examined the benefits of using camera trap networks to detect
shy, retiring species whose detection probabilities decrease greatly in
the presence of human researchers (O’Connell et al., 2010). However,
many species for which remote surveying techniques are optimal are
difficult to monitor properly with camera traps because of their small
body sizes and/or preference for heavy vegetative cover (Newey et al.,
2015). A number of
these species, particularly interior forest birds, are much easier to
detect via acoustic monitoring techniques due to their frequent,
far-carrying vocalizations, and battery-operated automated recording
units (ARUs) have recently become a cost-effective option for
researchers working with these species (Brandes, 2008). ARUs can operate
in the field for much longer periods than human observers can
(several days or weeks in many cases), efficiently and safely survey
remote areas early in the morning and late at night, and, like camera
traps, minimize disturbance to sensitive species. Audio recorders also
offer a wider detection range than camera traps because they do not
require a direct line of sight, which increases both the area of
coverage and the likelihood of detecting rare species. However, in
order to use audio
recordings from ARUs, vocalizations from the target species must be
detected among large quantities of survey audio. Identifying these
vocalizations efficiently and reliably presents a major challenge when
developing a data-processing pipeline. Although this task remains a
non-trivial consideration in study design, recent
advancements in machine learning (ML) classification techniques, coupled
with dramatic increases in the availability and accessibility of
powerful hardware, have made this process easier than ever (Kahl et al.,
2019). We strongly believe that the application of machine learning
techniques to the processing of large quantities of automated acoustic
event detection data will prove to be a transformative development in
the fields of ecology and conservation, allowing researchers to tackle
biological questions that have previously been impractical to answer.
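
To make the detection step concrete, the sketch below shows one common
shape such a pass can take: a long field recording is split into short,
overlapping windows, each window is scored by a pretrained classifier,
and only high-confidence detections are retained. This is a minimal
illustration, not the pipeline used in this study; the classify_clip
stub, the window lengths, and the confidence threshold are all
hypothetical placeholders.

# Minimal sketch of windowed detection over a long field recording.
# classify_clip is a hypothetical stand-in for any pretrained audio
# classifier (e.g., a BirdNET-style model); plug in a real model there.
import librosa
import numpy as np

WINDOW_S = 3.0    # analysis window length in seconds (assumed)
HOP_S = 1.5       # hop between windows, i.e., 50% overlap (assumed)
THRESHOLD = 0.7   # minimum confidence to keep a detection (assumed)

def classify_clip(clip: np.ndarray, sr: int) -> dict[str, float]:
    """Hypothetical classifier interface: {species_label: confidence}."""
    raise NotImplementedError("plug in a trained model here")

def detect_events(path: str, sr: int = 22050) -> list[dict]:
    """Scan one recording and return high-confidence detection events."""
    audio, sr = librosa.load(path, sr=sr, mono=True)
    win, hop = int(WINDOW_S * sr), int(HOP_S * sr)
    detections = []
    for start in range(0, max(len(audio) - win + 1, 1), hop):
        clip = audio[start:start + win]
        for species, conf in classify_clip(clip, sr).items():
            if conf >= THRESHOLD:
                detections.append({
                    "species": species,
                    "confidence": conf,
                    "start_s": start / sr,
                    "end_s": (start + win) / sr,
                })
    return detections

Overlapping windows guard against vocalizations that straddle a window
boundary, at the cost of roughly doubling the number of classifier
calls.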
Several life history characteristics of tinamous (Tinamidae), a group of
terrestrial birds that occur widely in the Neotropics, make them superb
candidates for field-testing this type of audio processing pipeline.
Although a few species in this family occupy open habitats, most show a
high affinity for interior forest areas with thick vegetative cover
(Bertelli & Tubaro, 2002). They are far more often heard than seen, and
some species vocalize prolifically as part of the dawn and dusk choruses
(Pérez‐Granados et al., 2020). This preference for interior forest,
along with their large body sizes and terrestrial nature, makes tinamous
particularly susceptible to anthropogenic habitat change, both to
outright habitat loss and to increased human hunting pressure in
fragmented forest patches near populated areas (Thornton et
al., 2012). Intensive life history research in the coming years will be
critical to conservation of tinamous and their habitats, and autonomous
recording has the potential to revolutionize this line of inquiry.
Here we present the preliminary results of an ongoing field study that
involves deploying ARUs at lowland Amazonian forest sites in Madre de
Dios, Peru. Although this region tentatively supports among the highest
levels of tinamou alpha diversity in the Neotropics (11 co-occurring
species; eBird, 2017), little research has addressed the biological and
ecological factors that permit such high alpha diversity. We collected
environmental audio of each day’s dawn and dusk
choruses and designed a data pipeline that uses an ML audio classifier
to identify tinamou vocalization events in the audio
data and organize the detections into a spatiotemporal database for
future use in producing occupancy models for the target species. To our
knowledge, this technology has not previously been used to conduct
community-level surveying for tinamous and represents a promising
alternative to camera traps and more traditional point-count surveying
as a means of studying elusive yet highly vocal bird taxa.
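
As an illustration of what the spatiotemporal detection database can
look like, the sketch below uses SQLite from the Python standard
library. The table and column names, the example species, and the
closing query are hypothetical stand-ins rather than the schema used in
this study.

# Illustrative spatiotemporal detection database; all names here are
# assumptions for the sketch, not the study's actual schema.
import sqlite3

conn = sqlite3.connect("tinamou_detections.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS sites (
    site_id   TEXT PRIMARY KEY,
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS detections (
    detection_id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_id      TEXT NOT NULL REFERENCES sites(site_id),
    species      TEXT NOT NULL,   -- e.g., 'Crypturellus undulatus'
    detected_at  TEXT NOT NULL,   -- ISO-8601 timestamp of the event
    confidence   REAL NOT NULL    -- classifier confidence in [0, 1]
);
""")
conn.commit()

# Site-days with at least one detection of a given species; combined
# with the deployment schedule, this yields the detection histories
# that occupancy models consume.
history = conn.execute("""
    SELECT site_id, date(detected_at) AS day
    FROM detections
    WHERE species = ?
    GROUP BY site_id, day
""", ("Crypturellus undulatus",)).fetchall()

Storing per-event confidence scores, rather than only binary
detections, leaves room to revisit the detection threshold later
without re-running the classifier over the raw audio.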