Universal Knowledge Core

A diversity-aware conceptualization of the world

The languages and the corresponding lexicon sizes as they are represented in the UKC.

The Universal Knowledge Core (UKC) is a multilingual, high-quality, large-scale, machine-readable, and diversity-aware lexical resource.

The key design principle underlying the UKC is to maintain a clear distinction between the language(s) used to describe the world as it is perceived and what is being described, i.e., the world itself. The Concept Core (CC) is the UKC representation of the world: a semantic network whose nodes are language-independent concepts. Each concept has a unique identifier that distinguishes it from every other concept. The edges of the network are semantic relations that relate the meanings of concepts; they extend the relations used by the Princeton WordNet (PWN) (e.g., hyponym, meronym).
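As a rough illustration of the Concept Core described above, the following Python sketch models concepts as uniquely identified nodes and semantic relations as typed edges. The class name, method names, relation labels, and numeric identifiers are hypothetical and chosen only for this example; they are not taken from the UKC implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptCore:
    """Toy semantic network whose nodes are language-independent concept IDs."""

    concepts: set[int] = field(default_factory=set)
    # (relation name, source concept ID) -> set of target concept IDs
    edges: dict[tuple[str, int], set[int]] = field(default_factory=dict)

    def add_concept(self, concept_id: int) -> None:
        # Each concept must have an identifier that no other concept uses.
        if concept_id in self.concepts:
            raise ValueError(f"duplicate concept ID: {concept_id}")
        self.concepts.add(concept_id)

    def relate(self, relation: str, source: int, target: int) -> None:
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both concepts must exist before they can be related")
        self.edges.setdefault((relation, source), set()).add(target)

    def related(self, relation: str, source: int) -> set[int]:
        # Return a copy so callers cannot mutate the network by accident.
        return set(self.edges.get((relation, source), set()))


cc = ConceptCore()
for cid in (1, 2, 3):  # hypothetical IDs: 1 = animal, 2 = dog, 3 = tail
    cc.add_concept(cid)
cc.relate("hyponym", 1, 2)  # dog is a hyponym of animal
cc.relate("meronym", 2, 3)  # tail is a meronym (part) of dog
```

Note that no words appear in this structure: the network is built purely over identifiers, which is what keeps it independent of any particular language.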

The Language Core (LC) is the component of the UKC that stores the words, senses, synsets, glosses, and examples for all the languages the UKC supports. In the LC, each synset is associated with exactly one language and, within that language, with at least one word. Synsets are linked to concepts, with the constraint that each synset is linked to exactly one concept.
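The constraints on synsets stated above can be sketched in Python as follows. This is a minimal illustration under assumptions: the `Synset` and `LanguageCore` classes, their field names, the language codes, and the concept IDs are all hypothetical and do not reflect the actual UKC schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Synset:
    language: str           # exactly one language per synset
    concept_id: int         # exactly one concept per synset
    words: tuple[str, ...]  # at least one word in that language
    gloss: str = ""

    def __post_init__(self) -> None:
        if not self.words:
            raise ValueError("a synset needs at least one word")


@dataclass
class LanguageCore:
    synsets: list[Synset] = field(default_factory=list)

    def add(self, synset: Synset) -> None:
        self.synsets.append(synset)

    def lexicalizations(self, concept_id: int) -> dict[str, list[str]]:
        """Map each language to the words that express the given concept."""
        result: dict[str, list[str]] = {}
        for synset in self.synsets:
            if synset.concept_id == concept_id:
                result.setdefault(synset.language, []).extend(synset.words)
        return result


lc = LanguageCore()
lc.add(Synset("eng", 2, ("dog",), "a domesticated canine"))
lc.add(Synset("ita", 2, ("cane",)))
lc.add(Synset("eng", 1, ("animal", "creature")))
```

Because each synset carries a single `language` and a single `concept_id`, the "one language, one concept" constraints hold by construction; only the "at least one word" rule needs an explicit check.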

So far, the UKC has evolved in two ways: first, by importing freely available resources, e.g., high-quality WordNets or dictionaries; and second, through a collaborative platform on which linguistic experts and native speakers continuously build and maintain the lexical resource for an individual language. As of June 2021, it contains over 1,100 languages, over 2 million words, over 3 million word senses, and over 110,000 supra-lingual concepts.

UKC Statistics

Languages
Words
Word Senses
Concepts

Our Project Types

Analytics

Research projects to understand language diversity.

Lexicon Generation

Development of lexical resources for individual languages.

Lexicon Crowdsourcing

Online tools to navigate, explore, and extend lexicons through crowdsourcing.

Lexicon Integration

Integration of lexicons from various languages into the UKC.