What Can a Primate Face Tell Us? A Scalable Approach to Studying Facial Expressions

Anyone who has spent time observing primates knows how expressive their faces can be. A lip-smack, a brief glance, or a subtle shift of the mouth can signal curiosity, tension, submission, or affiliation. For ethologists, these fleeting expressions offer a rich window into the social dynamics of a group. Studying them, however, has traditionally required painstaking manual work.

In a paper presented at the AI for Non-Human Communication workshop at NeurIPS 2025, Felipe Parodi, a Penn AI Fellow and PhD candidate at the University of Pennsylvania, and his collaborators introduce PrimateFace to address this gap. Trained on faces from more than 60 primate genera, the system can detect faces, track landmarks, and analyze facial movements across a range of species. We recently spoke with Felipe about what motivated the project, what the team discovered by training models across such diverse primate species, and how tools like PrimateFace could help researchers study animal communication and social interaction at scale.

What motivated the development of PrimateFace, and what gap in primate research were you hoping to address?


Machine learning has transformed how we study human faces. We can track expressions, recognize individuals, and measure subtle movements automatically and at scale. For non-human primates, those tools basically didn’t exist. The approaches that did exist were built for one species at a time, which meant starting from scratch for every new primate you wanted to study. Meanwhile, researchers who study primate facial communication have relied on manual coding: watching video frame by frame and labeling every expression by hand. It’s rigorous but incredibly slow, and it creates a real bottleneck for the field. We built PrimateFace to close that gap: a large-scale dataset and suite of AI models that work across the entire primate order, from tarsiers to gorillas.
Video 1: PrimateFace predictions on a juvenile macaque. Left: facial landmark predictions overlaid on the face. Top right: mouth aperture over time. Bottom right: audio spectrogram. Green bar indicates vocalization onset.

How does PrimateFace leverage modern AI/ML methods to capture aspects of animal communication that might be missed by traditional approaches?


At multiple levels. For building the dataset, we used self-supervised models like DINOv2 to intelligently and efficiently select which images to annotate. DINOv2 groups images by visual similarity, and those groups often corresponded to distinct species, so we could maximize coverage without annotating everything. For labeling, we used an iterative approach where models generate candidate annotations that human experts then correct, which made it feasible to work at the scale we needed. The real payoff is ultimately what these models enable. For example, we can extract facial movements from high-speed video, decompose them into their component patterns, and automatically discover recurring “facial syllables,” like lip-smacking or subtle head turns. This pipeline even revealed sex-specific differences in facial behavior that were invisible to the human experimenters who collected the data.
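To give a sense of what that curation step looks like, here is a minimal sketch of embedding-based image selection, assuming a directory of cropped face images. The DINOv2 variant, directory name, and cluster count are illustrative choices, not the settings used in PrimateFace.

```python
# Minimal sketch of embedding-based image curation (assumed details:
# directory "face_crops", ViT-S/14 backbone, 50 clusters).
from pathlib import Path

import torch
from PIL import Image
from sklearn.cluster import KMeans
from torchvision import transforms

# Frozen DINOv2 backbone from torch hub; inference only.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is a multiple of the 14-px patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

paths = sorted(Path("face_crops").glob("*.jpg"))
with torch.no_grad():
    feats = torch.stack([
        model(preprocess(Image.open(p).convert("RGB")).unsqueeze(0))[0]
        for p in paths
    ])  # (n_images, 384) CLS embeddings

# Cluster the embeddings; clusters tend to track visual (often taxonomic)
# similarity. Annotating the image nearest each centroid spreads the
# labeling budget across that variation instead of oversampling one look.
km = KMeans(n_clusters=50, n_init="auto", random_state=0).fit(feats.numpy())
centers = torch.tensor(km.cluster_centers_, dtype=torch.float32)
selected = [paths[i] for i in torch.cdist(centers, feats).argmin(dim=1).tolist()]
print(f"Selected {len(selected)} images for annotation.")
```

In the iterative labeling loop described above, a model trained on these seed annotations would then propose landmarks for the next batch, leaving experts to correct rather than label from scratch.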

Why was it important to train the system on many different primate species rather than focusing on just one?

“One of the most interesting findings was that diversity in training data produces better models, not just for primates, but even for human faces.”

One of the most interesting findings was that diversity in training data produces better models, not just for primates, but even for human faces. Models trained on the full range of primate facial morphology, from baboon muzzles to flat-nosed capuchins to tarsier faces that are essentially giant eyeballs, transferred remarkably well to human face benchmarks. But models trained only on human faces failed badly when applied to other primates. The morphological variation across the primate order acts as a kind of natural data augmentation, forcing the model to learn what a “face” really is in a much deeper way. Training on just one species captures only a narrow slice of that variation.

Image 1: PrimateFace dataset overview. Representative samples from six primate superfamilies (Lemuroidea, Lorisoidea, Tarsioidea, Ceboidea, Cercopithecoidea, Hominoidea), each annotated with 68 facial landmarks.

What does training on many different species tell us about the value of using diverse biological data?


It tells us something the AI community has largely internalized via the bitter lesson: more data is good, and diverse, clean data is great. Models trained on the natural variation across the primate order learned more generalizable representations of faces than models trained on any single species, no matter how well-sampled. That’s a powerful argument for taxonomic breadth in any biological AI project. If you want a model that truly understands faces (or bodies and vocalizations), train it on the full range of what nature has produced, not just the one species we happen to know best.

“If you want a model that truly understands faces (or bodies and vocalizations), train it on the full range of what nature has produced, not just the one species we happen to know best.”

How do you see tools like PrimateFace fitting into broader efforts to use AI to understand animal communication and social interaction?


We hope PrimateFace will provide the foundation that other tools can build on: face detection, landmark tracking, and identity recognition. Our demonstrations show the range: automated time-stamping that compresses days of footage into minutes of review, individual recognition systems that can be set up in under an hour, analysis of how mouth movements coordinate with vocalizations in howler monkeys, and quantification of gaze-following dynamics between interacting infants. Each of these previously required months of manual work or was simply impossible. For the broader animal communication effort (which ESP is at the forefront of!), we think PrimateFace will add another dimension by making facial communication, one of the richest and least quantified channels primates have, something researchers can study at scale.
Image 2: PrimateFace is an integrated ecosystem for primate facial analysis with a scalable, annotation-efficient workflow.
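To make one of those demonstrations concrete, here is a rough sketch of how mouth aperture (as shown in Video 1) could be related to vocalization onsets. It is a minimal illustration, not PrimateFace’s pipeline: the landmark indices assume the common 68-point convention, the file names and frame rates are hypothetical, and the energy threshold stands in for a proper vocalization detector.

```python
# Rough sketch: mouth aperture from tracked landmarks vs. vocalization
# onsets from audio energy. Indices follow the common 68-point scheme
# (inner-lip midpoints 62/66, eye corners 36/45); file names, frame
# rates, and the threshold detector are illustrative assumptions.
import numpy as np

landmarks = np.load("landmarks.npy")  # (n_frames, 68, 2) tracked points
audio = np.load("audio.npy")          # mono waveform aligned to the video
sr, fps = 48_000, 120                 # assumed audio and video rates

# Aperture per frame: inner-lip gap normalized by interocular distance,
# so the signal is invariant to how far the animal is from the camera.
gap = np.linalg.norm(landmarks[:, 62] - landmarks[:, 66], axis=1)
interocular = np.linalg.norm(landmarks[:, 36] - landmarks[:, 45], axis=1)
aperture = gap / interocular

# Crude vocalization onsets: frames where short-time audio energy first
# rises above a threshold (a real pipeline would use a tuned detector).
spf = sr // fps  # audio samples per video frame
energy = np.array([np.mean(audio[i * spf:(i + 1) * spf] ** 2)
                   for i in range(len(aperture))])
voiced = energy > 10 * np.median(energy)
onsets = np.flatnonzero(voiced[1:] & ~voiced[:-1]) + 1

# Compare mean aperture in the 250 ms following each onset.
win = fps // 4
peri = [aperture[t:t + win].mean() for t in onsets if t + win <= len(aperture)]
print(f"{len(onsets)} onsets; mean post-onset aperture: {np.mean(peri):.3f}")
```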

What do you see as the biggest opportunities for deeper collaboration between AI/ML researchers and biologists?


This is really two-fold. The immediate opportunity is in scaling up the tedious parts (e.g., annotation, tracking detections), so that researchers can spend their time thinking, interpreting, and ethologizing rather than data wrangling. But at the end of the day we want to understand what’s being conveyed in all these social signals. Tools like PrimateFace let us measure dynamics at a resolution and scale that were previously infeasible, but the real frontier is connecting these signals to what they might mean. What information is a lip-smack actually transmitting? What might it mean when an older macaque bares his teeth at Punch-Kun? These are questions that require biologists and ML/stats researchers working together, because neither side can answer them alone. Biologists are closer to the underlying ‘intent’ behind an animal’s behavior, while ML researchers can build systems that extract structure at scales no human observer could match. And increasingly, AI systems themselves, through approaches like representation learning, can surface patterns that humans weren’t looking for in the first place. This, I think, is where the real breakthroughs will come: not simply in automating what we already measure, but in revealing dimensions of communication we didn’t know existed.

Felipe Parodi is a Penn AI Fellow and PhD candidate in Neuroscience at the University of Pennsylvania, where he is co-advised by Konrad Kording and Michael Platt. His research sits at the intersection of computational neuroscience, computer vision, and animal behavior, focusing on how primates communicate through facial expressions and social signals in naturalistic settings. He is the lead developer of PrimateFace, an open-source platform for automated facial analysis across the primate order, and he maintains the awesome-computational-primatology resource.

Citation: Parodi, F., Matelsky, J.K., Lamacchia, A.P., Segado, M., Jiang, Y., Regla-Vargas, A., Sofi, L., Kimock, C., Waller, B.M., Platt, M.L., and Kording, K.P. (2025). PrimateFace: A Machine Learning Resource for Automated Face Analysis in Human and Non-human Primates. bioRxiv.
