IndoUKC

IndoUKC is a semantic knowledge base that shows the cultural similarity and diversity of Indian languages. The dataset covers 18 Indian languages. The IndoUKC dataset can be retrieved at the URL

Current statistics of the dataset
No.  Language   Concepts mapped to UKC  New concepts to be mapped to UKC  Total concepts
1    Assamese   9093                    5864                              14957
2    Bengali    11119                   25210                             36329
3    Bodo       9250                    6520                              15770
4    Gujarati   11105                   24458                             35563
5    Hindi      11113                   27641                             38754
6    Kannada    9956                    12083                             22039
7    Kashmiri   10423                   18967                             29390
8    Konkani    10938                   21422                             32360
9    Malayalam  11184                   16581                             27765
10   Manipuri   9137                    7172                              16309
11   Marathi    10379                   19199                             29578
12   Nepali     7093                    4606                              11699
13   Oriya      10927                   24349                             35276
14   Punjabi    10764                   21577                             32341
15   Sanskrit   8710                    14244                             22954
16   Tamil      9884                    14437                             24321
17   Telugu     9737                    11338                             21075
18   Urdu       10864                   23396                             34260
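
In each row of the statistics, the total is the sum of the concepts already mapped to UKC and the new concepts still to be mapped. The Python sketch below checks that relationship for three rows copied from the table and prints the share of each language's concepts that is already mapped. The names ROWS and mapped_share are hypothetical helpers introduced only for this illustration; they are not part of the IndoUKC dataset or any tooling shipped with it.

```python
# Three rows copied from the statistics table:
# (language, concepts mapped to UKC, new concepts to be mapped, total concepts).
# ROWS and mapped_share are illustrative names, not part of IndoUKC.
ROWS = [
    ("Assamese", 9093, 5864, 14957),
    ("Hindi", 11113, 27641, 38754),
    ("Nepali", 7093, 4606, 11699),
]


def mapped_share(mapped, total):
    """Return the fraction of a language's concepts already mapped to UKC."""
    return mapped / total


for language, mapped, new, total in ROWS:
    # The total column is the sum of the two preceding columns.
    assert mapped + new == total
    print(f"{language}: {mapped_share(mapped, total):.1%} of concepts mapped to UKC")
```

The same loop could be run over all 18 rows to compare how much mapping work remains per language.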