Over the last decade, two trends in the study of animal communication have converged: the growing availability of large-scale behavioral and bioacoustic data, and relatively recent advances in machine learning.
As a result, we’re starting to see the study of animal communication shift from narrowly defined, species-specific, and hypothesis-limited analyses toward data-driven discovery at scale.
We refer to this convergence as Animal Language Processing (ALP), an approach that uses AI to study non-human communication at scale. The term was first used in a presentation given in Japanese by our senior AI researcher Masato Hagiwara in 2024, and it offers a useful framework for describing and organizing the shift we’re observing in the field.
How Animal Language Processing Emerged
For decades, ethologists, field biologists, and bioacousticians have built the foundations that make studying animal signals both possible and worthwhile.
They’ve conducted long-term field studies that link vocal behavior to social context, ecology, and individual life histories, as well as playback experiments that test how animals respond to specific calls, sequences, or contexts. These experiments are vital for grounding any AI-based predictions in real behavior.
We’ve also seen methodological innovations in recording, tagging, and annotation, from autonomous recorders to animal-borne sensors.
This work has produced remarkable findings, such as evidence that elephants address each other with distinct, name-like calls and that female beluga whales have a “come back!” call to coax their curious calves back from wandering too far.
In short, this emerging field builds on decades of accumulated knowledge, methods, and data, and uses modern AI tools to expand what is possible.
What’s Changed: Data, Models, and Scale
Several shifts have made a fundamentally different approach possible:
1. Data volume and diversity
First, advances in sensing technologies have led to an unprecedented accumulation of behavioral and acoustic data across many species and environments. Tools like passive acoustic monitoring, animal-borne tags, and long-term video recording now record communication that was previously too hard to capture and annotate, including continuous, overlapping, and context-rich interactions.
2. Bigger, cross-species questions
In parallel, the field has become more collaborative and interdisciplinary. Researchers are increasingly working across populations, species, and disciplines. This includes large, coordinated efforts like ours and Project CETI, alongside a growing set of partnerships and shared datasets that make it easier to compare findings and build tools across taxa.
Researchers are starting to ask broader questions:
- Which communication features transfer across species?
- Do representations learned on birds, bats, and marine mammals reveal shared structural “axes” of communication that cut across taxa?
- Can we begin to identify where communicative systems converge and diverge across the Tree of Life?
3. Modern machine learning as a discovery engine
AI methods have shifted from task-specific classifiers (e.g., “detect this species,” “label this call type”) toward general-purpose representation learning and foundation models.
Self-supervised and transfer-learning approaches can learn structure directly from raw, largely unlabeled recordings and then reuse those representations across species and tasks (see our recent paper on what features matter for bioacoustic encoding). They also allow biologists to interact with their data in natural language (as with NatureLM-audio), making powerful computational models vastly more accessible.
4. A move from hypothesis-limited to data-driven workflows
Instead of defining call types, units, or grammatical rules entirely in advance, researchers increasingly use models to propose candidate structure: clusters, segmentations, or latent dimensions that can then be interpreted and tested with established ethological and linguistic tools. Hypotheses still matter, but they increasingly emerge from large-scale analysis rather than coming first.
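To make the idea of models proposing candidate structure concrete, here is a minimal Python sketch. It is not a description of any particular group's pipeline. It clusters synthetic per-call feature vectors with plain k-means and returns candidate call types for a researcher to inspect. The two features (duration and peak frequency), the three synthetic call types, and the `kmeans` helper are all assumptions made for illustration. In practice the features would more likely be learned embeddings, and the number of clusters would itself be an open question.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Plain k-means on an (n, d) array; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start from one random call, then
    # repeatedly add the call farthest from all centroids chosen so far.
    centroids = [features[rng.integers(len(features))]]
    for _ in range(k - 1):
        chosen = np.array(centroids)
        dists = np.linalg.norm(features[:, None, :] - chosen[None, :, :], axis=2)
        centroids.append(features[dists.min(axis=1).argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign every call to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its calls (keep it if empty).
        updated = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Synthetic stand-in data: 40 calls from each of three made-up call types,
# described by (duration in seconds, peak frequency in kHz).
rng = np.random.default_rng(42)
type_means = np.array([[0.2, 2.0], [0.8, 6.0], [1.5, 12.0]])
calls = np.vstack([
    rng.normal(loc=mean, scale=[0.05, 0.3], size=(40, 2)) for mean in type_means
])

labels, centroids = kmeans(calls, k=3)
for j, centre in enumerate(centroids):
    print(f"candidate type {j}: {np.sum(labels == j)} calls, "
          f"mean duration {centre[0]:.2f} s, mean peak {centre[1]:.1f} kHz")
```

The cluster labels here are only hypotheses. Whether a cluster corresponds to a meaningful call type still has to be tested against behavior, for example with playback experiments. Because the raw features sit on different scales, a real analysis would usually normalize them before clustering.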
A Working Definition of Animal Language Processing
Against that backdrop, we use “Animal Language Processing” as a descriptive label for a set of practices that are already coalescing.
- AI-focused: Machine learning and generative models are used as primary tools to discover structure in noisy, often unlabeled communication data.
- Data-driven: Analyses start from large-scale, multimodal data like acoustic, behavioral, and environmental signals, instead of relying on predefined units or hand-crafted features.
- Species-agnostic: Methods aim to work across taxa rather than focusing on a small set of well-studied species, opening the door to cross-species comparisons and shared infrastructure.
It’s also important to define how ALP relates to established fields.
While ALP is inspired by work in natural language processing (NLP), it is not “NLP applied to animals.” Nor is it committed to any particular definition of “language”: it does not assume recursion, compositionality, or human-like syntax. Instead, it treats any system of structured signals (vocal, gestural, or multimodal) as data that can be analyzed for regularities, context-sensitivity, and interactional patterns.
We view ALP as complementary to animal linguistics. Linguistics and animal linguistics bring theory, careful concepts, and deep analyses of communication systems. ALP brings computational leverage, cross-species breadth, and the ability to operate at scales that were previously out of reach. Both are needed to decode animal communication.
Ethical and Governance Questions Are Baked In
ALP opens new scientific and societal frontiers by making it possible to study non-human animal communication at a scale and level of detail that was previously unattainable. These advances create both opportunities and risks that extend beyond any single study or methodology. Many of the ethical considerations in ALP are the same ones that animal research in general faces, but they may become harder to manage as data, tools, and use cases grow.
A central concern with these technologies is the question of who uses them and why. As these models and tools become more accessible over time, they could be repurposed for exploitation, including the manipulation of wildlife for tourism, fishing, or surveillance. That creates real risks that ALP practitioners will have to weigh. As ALP moves from observation toward intervention, ethical frameworks must emphasize precaution, minimal harm, and transparency. This is already an active area of discussion, with groups like the MOTH program exploring how existing frameworks hold up and where they may fall short.
An Invitation to Sharpen and Refine
“Animal Language Processing” offers a working definition for a convergence already in motion. We hope ALP can serve as a shared umbrella that makes visible the work many communities are already doing together:
- Ethologists and field biologists bringing decades of behavioral insight and contextual knowledge.
- Bioacousticians and technologists building the sensing and data infrastructures.
- Linguists and philosophers articulating concepts of structure, meaning, and communicative intent.
- AI and ML researchers contributing representation learning, foundation models, and generative tools.
ALP is a provisional definition, meant to be tested, challenged, and refined in practice. We invite the community to share feedback on this framework so that it accurately reflects how this interdisciplinary community is collaborating and converging.
If you have thoughts or reflections on this working definition, we’d love to hear from you.