Research in Natural Language Processing (NLP) has advanced rapidly in recent years, producing exciting algorithms for sophisticated processing of text and speech in various languages. However, much of this work focuses on English. The Semitic group of languages includes Arabic, with 422 million speakers (native and non-native) in the Arab world, making it the fifth most spoken language in the world, as well as Amharic (27 million), Hebrew (10 million), Tigrinya (6.7 million), Syriac (1 million) and Maltese (419 thousand). Semitic languages exhibit unique morphological processes, challenging syntactic constructions and various phenomena that are less prevalent in other natural languages. Developing dedicated solutions to these challenges requires a large, high-quality corpus linguistics infrastructure.
The Israeli Association of Human Language Technologies (IAHLT) is a nonprofit organization of leading industry companies, academic institutions and government bodies. It focuses on building and providing an innovative Hebrew & Arabic Corpus Linguistics Infrastructure that enables industry and academia to research and develop innovative NLP & Speech technologies and applications. Through this infrastructure, together with knowledge sharing and networking, IAHLT enhances its members’ ability to innovate and create advanced solutions for the NLP & Speech markets. One-third of the IAHLT budget comes from its industry members and two-thirds from the Israel Innovation Authority and the National Digital Israel Initiative.