KinDiv: Multilingual Kinship Terms and Lexical Gaps

WHAT DO YOU CALL THE CHILD OF YOUR PARENT’S SIBLING?

IN ENGLISH, it is simple: cousin is the only word you can use.

IN ROMANCE LANGUAGES such as French or Italian, a male cousin is cousin or cugino, and a female cousin is cousine or cugina.

IN CHINESE, there is no word equivalent to cousin. Rather, there are eight separate words depending on the sex and relative age of the cousin, as well as whether the side is patrilineal or matrilineal: 表姐; 表妹; 表哥; 表弟; 堂姐; 堂妹; 堂兄; 堂弟.

IN HINDI, there are also eight words, but they are not structured the same way: instead of the relative age, they depend on the sex of the parent’s sibling: फुफेरा भाई; चचेरा भाई; ममेरा भाई; मौसेरा भाई; चचेरी बहिन; मौसेरा बहिन; ममेरा बहिन.

IN SOME DRAVIDIAN LANGUAGES from South India, such as Kannada or Malayalam, you have to choose from no less than 16 (!) different terms based on all four properties above: പെങ്ങള്; അനുജത്തി; ചേച്ചി; അനിയത്തി; മദനി; നാത്തൂൻ; മുറപ്പെണ്ണ്; ഏട്ടത്തി; അണ്ണൻ; അനുജൻ; ആങ്ങള; അനിയൻ; മച്ചുനൻ; അളിയൻ; മുറച്ചെറുക്കൻ; ചേട്.

This is an example of kinship diversity that we capture in the KinDiv lexical database.
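The term counts in the examples above follow from how many binary properties a language distinguishes. The following is a minimal Python sketch of that arithmetic; the property names are illustrative assumptions and are not taken from the KinDiv schema, and real kinship systems need not form a perfect grid.

```python
from itertools import product

# Hypothetical property names, chosen for illustration only
# (they are NOT KinDiv's own feature names).
PROPERTIES = {
    "cousin_sex": ("male", "female"),
    "relative_age": ("elder", "younger"),
    "lineage": ("paternal", "maternal"),
    "parent_sibling_sex": ("male", "female"),
}

def count_distinctions(*names):
    """Count the combinations that arise when a language
    distinguishes the named properties in its cousin terms."""
    return len(list(product(*(PROPERTIES[n] for n in names))))

# English distinguishes none of the properties: one term.
print(count_distinctions())
# Chinese, as described above: sex, relative age, lineage.
print(count_distinctions("cousin_sex", "relative_age", "lineage"))
# Hindi, as described above: sex of the parent's sibling instead of relative age.
print(count_distinctions("cousin_sex", "lineage", "parent_sibling_sex"))
# Languages that use all four properties.
print(count_distinctions(*PROPERTIES))
```

Each additional binary property doubles the number of potential terms, which is why two languages can both have eight terms and still divide the cousin concept along different lines.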

WHY DOES KINSHIP DIVERSITY MATTER?

FOR A HUMAN SPEAKER, it matters because you are expected to use the correct term; otherwise, people may misunderstand you or think that you are being disrespectful.

FOR A COMPUTER, it matters because naïve translation across languages may produce results that are absurd at best and misleading at worst. Consider the following machine translation output produced by Google Translate:

ENGLISH ORIGINAL: My brother is three years younger than me.
HUNGARIAN TRANSLATION: A bátyám három évvel fiatalabb nálam.
MEANING: My elder brother is three years younger than me.

The absurd result is not merely a consequence of insufficient training data, but of something much deeper: namely, the lack of equivalent words for sibling relationships across the two languages.
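To translate the English sentence correctly, the relative age of the brother has to be read from the rest of the sentence before a Hungarian term is chosen. The sketch below illustrates that decision in Python. The function name and its signature are hypothetical; the two Hungarian nouns (báty for an elder brother, öcs for a younger brother) are standard vocabulary.

```python
# Hungarian nouns for "brother", keyed by age relative to the speaker.
HUNGARIAN_BROTHER = {"elder": "báty", "younger": "öcs"}

def hungarian_brother_term(age_difference_years):
    """Pick the Hungarian noun for the speaker's brother.

    age_difference_years is the brother's age minus the speaker's age
    (a hypothetical input that a translator would have to infer from context).
    Returns None when the relative age is unknown or the ages are equal,
    because neither age-specific term can then be chosen safely.
    """
    if age_difference_years is None or age_difference_years == 0:
        return None
    key = "elder" if age_difference_years > 0 else "younger"
    return HUNGARIAN_BROTHER[key]

# "My brother is three years younger than me": the brother is 3 years younger.
print(hungarian_brother_term(-3))
```

For the example sentence this selects öcs, not báty, so the age-specific noun and the comparison in the sentence agree.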

WHAT IS KINDIV?

KinDiv is a lexical database that formally describes family relationship terms across hundreds of languages. It currently covers over 250 concepts characterising sibling, grandparent, grandchild, aunt/uncle, nephew/niece, and cousin relationships. Compared to other multilingual lexical databases, the novelty of KinDiv is that it provides not only equivalent terms but also lexical gaps, which explicitly record that a language has no equivalent term for a concept. Lexical gaps tell the human or the computer that no perfect translation is possible, and so the correct term needs to be chosen carefully.
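The value of recording gaps explicitly can be illustrated with a toy lookup table. The Python sketch below is an assumption made for illustration: the dictionary layout, the language codes, and the names Status and lookup are hypothetical and do not reflect the actual KinDiv file format. It only shows why "known to have no term" must be kept apart from "no data".

```python
from enum import Enum

class Status(Enum):
    LEXICALISED = "lexicalised"  # the language has one or more terms
    GAP = "gap"                  # the language is known to lack a term
    UNKNOWN = "unknown"          # the toy table has no information

# Toy entries only (hypothetical layout, not KinDiv data files).
# None marks an explicit lexical gap.
LEXICON = {
    ("cousin", "en"): ["cousin"],
    ("cousin", "zh"): None,            # no single word equivalent to "cousin"
    ("elder brother", "hu"): ["báty"],
    ("elder brother", "zh"): ["哥哥"],
    ("elder brother", "en"): None,     # English has no single word for this
}

def lookup(concept, language):
    """Return (status, terms) for a concept in a language."""
    key = (concept, language)
    if key not in LEXICON:
        return Status.UNKNOWN, []
    terms = LEXICON[key]
    if terms is None:
        return Status.GAP, []
    return Status.LEXICALISED, list(terms)

print(lookup("elder brother", "hu"))
print(lookup("elder brother", "en"))
print(lookup("cousin", "fr"))
```

A translation system that receives Status.GAP knows it must paraphrase or pick a more specific term from context, whereas Status.UNKNOWN only says that the table is incomplete.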

HOW DOES KINDIV DIFFER FROM EARLIER RESULTS IN LINGUISTIC TYPOLOGY?

Kinship diversity around the world has been thoroughly explored in the past, in particular by G. P. Murdock in Kin Term Patterns and their Distribution. While KinDiv builds on such works, it extends them in three ways: (1) the coverage of languages and concepts is wider; (2) the data is formal and computer-readable; and (3) KinDiv provides the actual lexicalisations rather than merely signalling their presence or absence in a language.

HOW TO EXPLORE KINDIV DATA

The website of the Universal Knowledge Core multilingual lexical database has an interactive visualisation tool that displays all kinship concepts in all supported languages.

HOW TO OBTAIN KINDIV

KinDiv is freely available for download from the project GitHub page.

AUTHORS

Temuulen Khishigsuren, National University of Mongolia
Khuyagbaatar Batsuren, National University of Mongolia
Gábor Bella, University of Trento