An AI group associated with Abu Dhabi’s rulers has launched an advanced Arabic AI tool named Jais. The launch is part of the United Arab Emirates’ (UAE) broader effort to become a leader in artificial intelligence.
The Jais model is a product of a collaborative effort between G42, the UAE’s technology holding company, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and California-based tech firm Cerebras. The tool is open-source and bilingual, catering to over 400 million Arabic speakers globally.
The launch comes as the UAE and Saudi Arabia acquire large numbers of Nvidia chips, which are essential for AI software development, as part of a global competition to secure resources for AI expansion. The UAE previously developed another open-source model, Falcon, using more than 300 Nvidia chips. In a significant deal, Cerebras signed a $100 million contract this year to provide G42 with nine supercomputers.
Andrew Jackson from G42’s Inception highlighted the need for large language models (LLMs) to focus on languages other than English, including Arabic, one of the world’s major languages. He questioned why the Arabic-speaking community should not have access to a dedicated LLM.
While existing advanced LLMs like OpenAI’s ChatGPT, Google’s PaLM, and Meta’s LLaMA can comprehend and generate Arabic text, Jackson argued that the Arabic component in these models is significantly diluted.
According to its developers, Jais surpasses Falcon and other open-source models such as LLaMA in Arabic accuracy. Jais is also designed to understand the region’s culture and context more precisely than most models, which are US-centric, said MBZUAI’s acting provost, Professor Timothy Baldwin.
Baldwin also emphasized that measures were taken to ensure Jais respects cultural and religious sensitivities. Extensive testing was conducted to eliminate harmful, sensitive, offensive, or inappropriate content that does not align with the values of the organizations involved in its development.
Named after the UAE’s highest peak, Jais was trained for 21 days on part of Cerebras’s Condor Galaxy 1 AI supercomputer by a team in Abu Dhabi. G42 has signed up other Abu Dhabi entities, including Abu Dhabi National Oil Company, Mubadala, and Etihad Airways, as launch partners that will use the technology.
Training the model posed challenges because high-quality Arabic language data is scarce online compared with English. To address this, Jais was trained on both Modern Standard Arabic, which is understood across the Middle East, and the region’s diverse spoken dialects, with data sourced from media, social media, and code.
Baldwin said that Jais is clearly superior to existing models in Arabic and, in English, is comparable or even slightly better across a range of tasks.