What is ESP?
Founded in 2017, Earth Species Project is a non-profit dedicated to using artificial intelligence to decode non-human communication.
Who is behind ESP?
Earth Species Project was founded by Britt Selvitelle, a member of the founding team at Twitter, and Aza Raskin, who helped found Mozilla Labs and co-founded the Center for Humane Technology. Katie Zacarian joined the organization as a co-founder in 2020 and became CEO in 2022. The team now comprises 12 people, including a seven-person AI research team with deep expertise in fields ranging from mathematics and neuroscience to deep learning, AI and natural language processing.
Many team members have backgrounds in the technology sector, and all are deeply committed to ensuring that rapid technological advances benefit people and planet. See more on their backgrounds on our Team Page.
What is your mission?
Earth Species Project is dedicated to using artificial intelligence to decode non-human communication. We believe that an understanding of non-human languages will transform our relationship with the rest of nature. Along the way, our research will accelerate and deepen the fields of animal and ecological research and drive more effective conservation outcomes. We also aim to contribute to the broader fields of ethology, bioacoustics and machine learning by making all our work publicly accessible through an open access data repository: the Earth Species Project Library.
Who are your science collaborators?
Making progress on decoding animal communication will require the collaboration of many organizations and brilliant minds across the interdisciplinary fields of ethology, AI, linguistics, mathematics and neuroscience. Collaboration is one of our core principles, and we are proud to be partnering with more than 40 biologists and institutions on this journey, including Dr. Ari Friedlaender at the University of California, Santa Cruz, Professor Christian Rutz at the University of St Andrews, Woods Hole Oceanographic Institution and Cornell University. You can see more detail on our Partners Page and find out how to get involved.
What is your approach to decoding non-human communication?
The motivating intuition for ESP was that modern machine learning can build powerful semantic representations of language, which we can use to unlock communication with other species. The field of machine learning is changing rapidly, with the total number of machine learning publications more than doubling from over 200,000 in 2010 to almost 500,000 in 2021 (Stanford AI Index Report 2021). As a result, we regularly test new approaches to delivering on our mission.
We are currently engaged in a number of AI research programs, developed in close collaboration with our partners in biology and machine learning, that lay the foundations for future research. These include:
- Creating unified acoustic and multimodal benchmarks and datasets, vetted by top biologists and AI researchers, to validate our experiments and accelerate the field.
- Developing the first foundation models for animal communication. Like the large language models (LLMs) that have become dominant in human language processing, foundation models are trained on large amounts of data. They can perform difficult predictive tasks and are useful in domains with little annotated data, such as animal communication.
- Turning motion into meaning by automatically discovering behaviors in large-scale data from animal-borne tags. This builds on recent breakthroughs in machine learning that enable automatic multimodal translation across text, images, video, movement data and sound.
- Developing models that can generate novel vocalizations, moving us toward two-way communication between an AI and another species.
Please see our technical roadmap for more information.
What species do you work with?
ESP’s mission is to decode non-human communication in order to transform the way humans relate to the rest of nature. This means we see value in shining a light on the sophistication of communication systems across many different taxa. We are currently working with datasets from a wide range of bird species, primates, cetaceans, elephants, bats and amphibians, as well as dogs and cats.
Some animals’ communication systems are much easier for ML to work with than others, depending on factors such as: 1) the amount of data available, 2) the frequency range (in kHz) of their vocalizations, 3) the ability to study them with controls in captivity, and 4) what is already known about model systems from decades of multidisciplinary study. Developing machine learning approaches with relatively “easier” data and model systems (e.g. zebra finches) will enable us to tackle harder systems under harder conditions (e.g. dolphins in situ).
Cetaceans are particularly interesting because of their long history (34 million years as a socially learning, cultural species), their complex social behavior, and because light does not propagate well underwater, so more of their communication is forced through the acoustic channel. Songbirds are also useful to work with because a large amount of context-annotated bird data is available online.
We also believe that training models on data from many taxa, rather than a single species, will prove more effective in the journey to decoding communication, based on the assumption that prior knowledge from one species can provide insights into the communication systems of others. Our approach is to start with species that are social and for which significant amounts of data are available.
When will you see the results of your research?
We are already seeing significant results, as our machine learning models enable rapid analysis of research data at scale, supporting our understanding of how other species behave and communicate.
Importantly, we have recently published the first-ever benchmark datasets for animal vocalizations and movement: BEANS (Benchmark for Animal Sounds) and BEBE (Behavioral Benchmark). These benchmarks are critical for measuring the progress of the field. We have also published AVES (Animal Vocalization Encoder based on Self-Supervision), the first-ever self-supervised, transformer-based foundation model for animal vocalizations. In the human domain, over 90% of publications in top journals now rely on these kinds of models, but AVES is the first such model for non-human species. In the next few years, we aim to create the same research groundswell in the fields of ethology and conservation.
We are currently working on a wide range of projects with partners, including discovering the vocal repertoires of carrion crows, modeling beluga whales’ contact calls to map their social structure, and generating novel vocalizations in real time as a step toward two-way communication with zebra finches.
In terms of reaching the goal of fully decoding the communication of another species, this is very much the beginning of our journey, and it is still too early to tell when this may be possible and/or what that end goal will look like. We are currently focused on how we can use ML to interpret animal signals and to better understand animal perceptual systems.
How will your research create broader change/impact in the world?
ESP is developing novel machine learning approaches that help biologists and ethologists rapidly analyze and find patterns in the vast troves of data now being gathered by the latest generation of animal-borne sensors, which record vocalizations, movement and environmental context. This has the potential to significantly accelerate the fields of biology and ethology. It could also support animal welfare and conservation efforts: the more we know about and understand a species, the better we will be able to protect it.
How is the project funded?
We are a 501(c)(3) nonprofit, and our work is made possible through the generous support of a community of donors. To make a contribution, please visit Support.