NatureLM-audio: Our Flagship Model
NatureLM-audio is the world’s first large audio-language model for animal sounds. Trained on a vast and diverse dataset spanning human speech, music, and bioacoustics, it brings powerful AI capabilities to the study of animal communication.
With NatureLM-audio, researchers can:
- Detect and classify thousands of species and vocalizations across diverse taxa
- Recognize species it has never encountered before with better-than-random accuracy
- Answer complex questions about bioacoustic data using natural language
- Analyze massive datasets in minutes rather than months, accelerating research at an unprecedented scale
NatureLM-audio exhibits few-shot learning at inference time, allowing it to generalize to new tasks without retraining. Rather than fine-tuning the model on a new dataset, users can prompt it with just a few labeled examples of the new task, and the model generates meaningful responses based on its existing knowledge.
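To make the idea concrete, here is a minimal sketch of how a few-shot prompt could be assembled for a call-type classification task. The `NatureLMAudio` wrapper, its `generate` method, the `<audio:...>` placeholder syntax, and the file names and labels are all illustrative assumptions, not the published interface; only the prompt-building code below actually runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FewShotExample:
    """One labeled recording used as an in-context example (illustrative only)."""
    audio_path: str
    label: str


def build_few_shot_prompt(examples: List[FewShotExample], task: str) -> str:
    """Format a handful of labeled examples plus a query into a single text prompt.

    The model is never retrained: the examples are shown at inference time,
    and the final line asks it to label the new, unlabeled recording.
    """
    lines = [task]
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example {i}: <audio:{ex.audio_path}> -> {ex.label}")
    lines.append("Query: <audio:query.wav> -> ?")
    return "\n".join(lines)


if __name__ == "__main__":
    examples = [
        FewShotExample("clip_01.wav", "alarm call"),
        FewShotExample("clip_02.wav", "contact call"),
    ]
    prompt = build_few_shot_prompt(
        examples,
        task="Classify the call type in the final recording.",
    )
    print(prompt)
    # A hypothetical model wrapper would then consume the prompt together with
    # the referenced audio, e.g.:
    #   model = NatureLMAudio.load("naturelm-audio")   # hypothetical API
    #   answer = model.generate(prompt)                # hypothetical API
```

The point of the sketch is the workflow, not the API: a researcher supplies a handful of labeled recordings in the prompt itself, and the model adapts to the new task on the fly instead of requiring a fine-tuning run.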
NatureLM-audio also demonstrates emergent capabilities, such as counting the number of individuals in a recording, identifying distress and affiliative calls, and classifying new vocalizations, all without explicit training. It further shows promising domain transfer from human speech to animal communication, supporting our hypothesis that shared representations in AI can help decode animal languages.
We are committed to open science and will release NatureLM-audio in 2025.