Today’s conversation is with Dr Simon Greenhill, Senior Scientist in the Department of Linguistic and Cultural Evolution at the Max Planck Institute for the Science of Human History in Jena, and the ARC Centre of Excellence for the Dynamics of Language at Australian National University. Simon’s research primarily focuses on the evolution of languages and linguistic diversity, and what this can tell us about human prehistory. His research mainly uses Bayesian phylogenetic methods and he has helped build a number of large-scale linguistic and cultural databases. He is also one of the editors of Language Dynamics and Change, and he is on the editorial board of the Journal of Language Evolution.

What are your research interests and your particular area of expertise?
Broadly speaking, my research has focused on three main areas.
First, I think the biggest unsolved question we have in human evolution is why and how we generated such massive diversity of languages and cultures. Why do we have more than 7000 very different languages instead of just a few? Why do we have so many ways of building a human society? Where do we see the most diversity? How is this diversity linked to, or generated by, social, cultural, and environmental factors? And conversely: how is this diversity constrained, what configurations are rare, and why? For example, I’ve been involved in studies quantifying global and regional linguistic diversity and how this is generated. Another recent focus has been testing hypotheses about how factors like population size or cultural contact can affect the evolutionary dynamics of language change.
Second, how did the current distribution of human societies get the way it is? My first publication was on “Testing Population Dispersal Hypotheses” and since then I’ve worked to test hypotheses about human prehistory. For example, I used computational phylogenetic methods to identify the relationships between the languages of the large Austronesian family that stretches from Taiwan to Madagascar, Hawaii, Rapanui (Easter Island), and New Zealand. I used this phylogeny to pinpoint the homeland of the languages (and their speakers!) in Taiwan around 5200 years ago and their dispersal since. I’ve subsequently been involved in ongoing projects attempting to shed light on the histories of the Indo-European, Dravidian, Sino-Tibetan, and Uto-Aztecan language families.
Third, you can’t answer interesting questions without good methods and good data. In terms of data, a major component of my work has been building large-scale, open access databases of cross-linguistic and cross-cultural data. I have built and released two of the world’s largest cross-linguistic databases, the Austronesian Basic Vocabulary Database and TransNewGuinea.org, and am involved in many other database projects including POLLEX-Online, Pulotu, D-PLACE, and a number of forthcoming projects (Lexibank, Numeralbank, Parabank, Glottobank). Using these data, I have worked to evaluate how well computational and phylogenetic methods for languages and cultures work.
What originally drew you towards human evolution, and specifically language evolution?
I’ve always been fascinated by languages. My father learnt Russian, German and Maori and had many dictionaries scattered around the house (Dad collected books, amongst many other things, and decided he liked the Collins ‘Gem’ dictionaries). I remember paging through these old dictionaries comparing words and looking up different meanings on rainy afternoons. I later went on to take German and French at school.
However, I ended up going to university to take computer science. I quickly decided I didn’t want to spend my life programming CRMs or doing IT tech support, so I dropped out of that and then did a year or so doing the undergraduate ‘sampler’ taking papers in biology, psychology, anthropology, German literature, political studies, and philosophy.
At the same time I was reading a lot of popular science books and came across three books which enthralled me: Richard Dawkins’s The Selfish Gene, Stephen Jay Gould’s Wonderful Life, and Jared Diamond’s Guns, Germs, and Steel (which had just been published a year or so earlier). With the benefit of hindsight, I now realise that these works are flawed in various ways but at the time they were life changing, for me at least, and I soon converted to a biology/psychology double major.
A few years later, I came across a course, “Evolution, Behavior, and Cognition”, taught by Russell Gray, Mike Corballis, and Fiona Jordan. This course had a major language evolution component and I was immediately hooked and loved every minute of it.
Tell us a bit about your PhD. How did you find your PhD experience?
I did my PhD in the department of Psychology at the University of Auckland. My supervisor was Russell Gray – I badgered him until he took me on as a student. I titled my PhD thesis “The Archives of History: A phylogenetic approach to the study of language,” and it contains a loosely connected set of chapters that all were eventually published as papers: a description of the Austronesian Basic Vocabulary Database, a computational phylogenetic test of hypotheses about Pacific Settlement, an evaluation of how robust phylogenetic methods were to horizontal transmission/borrowing between lineages, and an exploration of rates of change in grammatical features.
I really enjoyed my PhD. It was a lot of hard work but I was in an amazing department and community at Auckland, surrounded by world-leading researchers in psychology, biology, linguistics, anthropology, and computer science. The late, great Pacific Archaeologist, Roger Green, took a particular interest in my career – often popping by my office to give me a paper on a topic he thought I should be educated about and returning the following week to quiz me relentlessly.
I made many good friends and started collaborations with some of them that are still generating new projects a decade later. And I even got to meet both Jared Diamond and Richard Dawkins when they visited the university during my time there (but I chickened out asking them to sign my copies of their books).

After your PhD, what positions have you held and where?
I received my PhD in 2009 and immediately started a postdoc in Alexei Drummond’s Computational Evolution Group in the Computer Science department at Auckland. There I got to spend a lot of quality time learning Bayesian phylogenetic methods while exploring grammatical data from languages. This position led to a series of papers on the rates of change in language structures.
In 2012 I was awarded a Discovery Early Career Research Fellowship and moved to the Department of Linguistics in the College of Asia and the Pacific at Australian National University in Canberra. Being in one of the world’s best field linguistics departments was quite an eye opener for me (I was literally the only person who did not do fieldwork and colleagues used to joke that I was the honorary indoors-linguist). At ANU my goal was to focus on the relationships between the languages of New Guinea. New Guinea is one of the most linguistically diverse parts of the world (>900 languages!), as well as one of the least-studied parts. This project resulted in a large-scale language database, and a series of papers on linguistic diversity and potential explanations for this diversity.
In 2014 we successfully bid for a major project which became the “ARC Centre of Excellence for the Dynamics of Language.” I was the named director of the language evolution component (one of four). One of the main things started there was a project testing a long-standing hypothesis about rates of change in languages being linked to population size (our answer was “sort of,” and “sometimes”).
A few years later I was offered a permanent Senior Scientist position at the Max Planck Institute for the Science of Human History in Jena. I started there in 2016. Since then I’ve been involved in far too many projects to keep track of.
What current projects are you working on?
I’m involved in a lot of projects at the Max Planck Institute in Jena, but a major one that we’re just wrapping up is a phylogeny of the Uto-Aztecan language family. Uto-Aztecan is one of the biggest language families in North America, with between 50 and 70 languages that were spoken from Wyoming and Idaho down to El Salvador. Despite a few hundred years of study there is still ongoing debate about whether they came from the Nevada region almost 9000 years ago, or from California 3000 years ago, or from Mesoamerica 3000 years ago. Our results clearly point to an origin around 3000 years ago in what is now California, which neatly meshes with a lot of the ethnohistorical and ethnobotanical arguments out there. It is also the first time we’ve had a robustly dated language group in North America so I’m hoping it can help shed light on the rest of the Americas, as there are a lot of very interesting language groups there that have important stories to be told (Mayan, Oto-Manguean, Athapaskan, and many more!).
One project that I’ve just started is aiming to further explore the evolutionary dynamics of languages and cultures on a global scale. We’re hoping to use the rates at which different aspects are changing over time to tackle a series of related questions. For example, do the most distinct cultures have the most distinct languages? Or do rates correlate in different parts of the world such that certain aspects of language and culture are more stable over time everywhere? Or does this vary across the globe? I’m looking forward to finding out what this tells us about the bigger picture.
Has COVID-19 affected your research plans?
Yes and no. In general most of my work requires only a laptop (and computer-cluster), so in that sense I’m far luckier than my colleagues who have had their fieldwork or project plans thrown into complete disarray.
However, I deeply miss meeting colleagues and chatting with them. Many of the most fun projects I’ve been involved in were started by a conversation down the corridor. I’m a little worried by all the great projects that will never be started because the initiating conversation didn’t happen.
I was on paternity leave for the first few months of the pandemic as my wife and I apparently thought bringing another child into the world during a global crisis was a good idea. Since then it’s been a bit of a juggling act to get work done at home with two children, but in the grand scheme of things we’re all healthy and safe so I can’t complain.

What achievement are you most proud of?
If I had to rank things then I think my proudest achievement is the Austronesian Basic Vocabulary Database (ABVD). As part of my PhD project I needed language data from many Austronesian languages and while there was lots of data out there it was generally scattered in the primary literature (journal articles, dictionaries, etc), and more was available in the ‘gray’ literature hidden in people’s filing cabinets or private hobby databases. To collect all these data into one location I built a web interface and collated the data.
We started with about 200 languages collected by Bob Blust, but since then the database has grown in size to have more than 300,000 lexical items from more than 1600 language varieties. The ABVD is therefore one of the world’s largest comparative linguistic databases.
My PhD supervisor Russell and I decided very early on (~2005?) to put all these data online and make them open access. Nowadays the open science and open data movement has really reshaped the playing field making it common for all data to be online and reusable. But it was a different story back then (“data are available from the authors on request”), and I think younger academics often don’t realise how unusual that was. At the time the ABVD was unique. Because of this, the ABVD has been cited as leading the way for open databases of comparative linguistic data and the general structure and framework has been copied by a number of major linguistic and cultural databases.
The database also gets used frequently: the website gets about 800 visitors a week, there are links to it all over the web (especially places like Wikipedia), and the paper we wrote describing the database has about 250 citations. I’m always amazed at the new and innovative ways the ABVD has been used by researchers, from cutting-edge computational studies to traditional linguistic and anthropological studies. Even more gratifyingly, I get a few emails a year from speakers of a language in the database thanking me for helping make their language more widely known. One person even told me they were using the wordlist to teach their son some words from his grandparents’ language. Very humbling.
Why do you think studying the evolution of languages is important?
If we want to understand humans, we need to understand language. Language is the strangest thing that people do: We spend a lot of time forcing air through a hole in our heads to send messages to other members of our species. As infants, we come equipped to rapidly learn this trick and as adults we usually spend many hours every day doing this. With a few phonemes we can tell immediately if someone shares our accent, and have powerful clues to where they’re from, their gender, their age, and their social groups.
Language is the best example of a cultural evolutionary system. Language is inextricably tied to culture. Each language is in itself an intricate cultural product that has co-evolved with its speakers to carry priceless information between speakers and across generations (which is why we can use languages to trace history).
More practically, a lot of work on cultural evolution rests on small scale inter-individual studies or rather abstract simulations. Along with these studies I feel we also need to look at the larger macro patterns over thousands of years. It would be great to do that on cultures but this is hard as we don’t really know how to carve culture at her joints so that our units of comparison are consistent across many cultures. Anthropology also largely turned away from large-scale comparison in the 60s and 70s so there’s no real ongoing tradition of comparative data collection on a global scale after that. Linguists, however, have collected vast amounts of comparative data that can be readily used to look at big picture macro-questions. In short, human evolution and cultural evolution need more linguistics!
What would you be if you were not a scientist?
I’m not sure – I’ve wanted to be a scientist for a few decades now and really enjoy it. If I were looking for a job I’d probably try to sell myself as a data scientist (whatever they are). However, one thing I always enjoyed was web development and I did a bit as a sideline when I was doing my PhD. Web development mixes the fun technical side of programming (I used to spend hours making sure my websites were XHTML 1.0 compliant) with a more creative component designing and styling interfaces.
However, now I have two daughters, a 6-year-old (Zoë) and a 10-month-old (Maya), so realistically I’d probably be a stay-at-home dad.