To better understand what matters for building a generalizable bioacoustic encoder, we tested 19 models across 26 datasets and a new evaluation benchmark. Our main finding is that a two-stage training approach—self-supervised pre-training followed by supervised post-training, both on a mix of bioacoustic and general audio—delivers the strongest performance. We share our methodology, training details, and key takeaways to help accelerate research in animal communication and conservation.
We’re introducing a no-code demo of NatureLM-audio on HuggingFace Spaces. NatureLM-audio is the first large audio language model tailored for bioacoustics. The new interactive UI lets anyone upload animal sounds, ask questions in plain English, and explore tasks like species ID, audio captioning, and life stage classification. This early beta is designed to allow the community to explore—try it out, share your feedback, and help shape the future of decoding animal communication.
FrogID, the world’s largest frog citizen science project, led by the Australian Museum, evaluated how well Earth Species Project’s open-source NatureLM-audio model identifies frog calls in over 1.3 million recordings. The model achieved near-perfect accuracy distinguishing frog sounds from birds, insects, and human speech, potentially saving 300+ hours of manual validation each year. With strong performance even without fine-tuning, NatureLM-audio shows promise for large-scale frog species identification, invasive species monitoring, and biodiversity conservation.
Earth Species Project and Raincoast Conservation Foundation have partnered on a cutting-edge project to decode orca communication using AI.
Explore Earth Species Project’s 2024 breakthroughs in AI and animal communication—and a bold new strategy to help life on Earth thrive.
NatureLM-audio analyzes complex animal vocalizations across thousands of species with simple, natural language prompts. With state-of-the-art performance on novel benchmarks, it tackles bioacoustics tasks like species classification, life-stage prediction, and even audio captioning, opening up powerful new possibilities for animal communication research.
We’re excited to announce that we’ve secured two significant grants – a $10M grant from technology entrepreneur and LinkedIn co-founder Reid Hoffman and a $7M grant from Waverley Street Foundation, a 501(c)(3) nonprofit investing at the intersection of climate solutions and community priorities.
Sara Keen, Senior Research Scientist in Behavioral Ecology and AI, reflects on a pivotal gathering at the Santa Fe Institute titled "Animals in Translation: Creating Criteria and Frameworks for Decoding Communication in Other Species." The event brought together a diverse group of researchers, writers, and artists to define standards for successfully deciphering non-human communication systems as machine learning and animal behavior studies increasingly intersect.
Senior AI Research Scientist Masato Hagiwara announces the release of BirdAVES, a series of self-supervised animal vocalization encoder models specifically designed for birds. BirdAVES improves performance by over 20% across bird-related datasets and tasks compared to the previous AVES model.