Research in Natural Language Processing (NLP) has advanced rapidly in recent years, producing powerful algorithms for sophisticated processing of text and speech in many languages. However, much of this work focuses on English. The Semitic language family includes Arabic, with 422 million speakers (native and non-native) in the Arab world, which makes it the fifth most spoken language in the world. It also includes Amharic (27 million speakers), Hebrew (10 million), Tigrinya (6.7 million), Syriac (1 million) and Maltese (419 thousand). Semitic languages exhibit unique morphological processes, challenging syntactic constructions and various phenomena that are less prevalent in other natural languages. Addressing these challenges with dedicated solutions requires a large, high-quality corpus linguistics infrastructure.
The Israeli Association of Human Language Technologies (IAHLT) is a nonprofit organization of leading industry companies, academic institutions and government bodies, focused on improving how computer systems understand Hebrew and Arabic. IAHLT is building and providing an innovative corpus linguistics infrastructure and open-source tools that enable industry and academia to research and develop NLP and speech technologies and applications. One third of IAHLT's budget comes from its industry members and two thirds from the Israel Innovation Authority and the National Digital Israel Initiative.