
Why Arabic LLMs are the Most Underrated Bet in AI

June 22, 2026 · By Mohammed Samir Gaber

I’ll say it plainly: an Arabic-native frontier model is the most asymmetric bet on the AI map in 2026, and almost nobody outside MENA is building toward it. The foundations exist: Falcon, Jais, ALLaM. What’s missing is the company that goes the last mile and ships a frontier-quality Arabic LLM with the right product surface.

The Arabic data problem

Frontier closed models are mostly trained on English-dominated web data. Arabic accounts for roughly 1.5% of Common Crawl despite being the fifth-most-spoken language on earth, with 400M speakers. The result: GPT-6 and Claude 5 are great at Arabic, but not natively great. Subtle dialect handling, formal-versus-colloquial register, and code-switching with English the way real Saudis or Egyptians speak are all still rough edges.
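To make the gap concrete, here is a back-of-the-envelope check using the two figures quoted above. The world population of about 8 billion is a rough assumption added for illustration, not a number from the article's sources, so read the result as an order-of-magnitude estimate only.

```python
# Figures quoted in the paragraph above.
arabic_speakers = 400_000_000
arabic_share_of_crawl = 0.015  # "roughly 1.5% of Common Crawl"

# Assumption (not from the article): world population of about 8 billion.
world_population = 8_000_000_000

# Share of the world's people who speak Arabic.
speaker_share = arabic_speakers / world_population

# Below 1.0 means the language is under-represented in the crawl
# relative to its share of speakers.
representation_ratio = arabic_share_of_crawl / speaker_share

print(f"Share of world population: {speaker_share:.1%}")
print(f"Share of Common Crawl:     {arabic_share_of_crawl:.1%}")
print(f"Representation ratio:      {representation_ratio:.2f}")
```

Under those assumptions the ratio comes out at about 0.3: Arabic has roughly a third of the web-crawl presence its speaker base would suggest.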

What’s already been built

Three serious open-weight Arabic foundation models exist:

Falcon from TII (UAE): large, capable, well-funded
Jais from G42 / MBZUAI: Arabic-first 30B model, strong on dialects
ALLaM from SDAIA (Saudi Arabia): government-backed, Arabic-native instruction tuning

None of these are frontier-equivalent yet. All of them are within striking distance, and all are improving fast. The ceiling is funding and product talent, not technical possibility.

Why this is the asymmetric bet

Three reasons:

The market is enormous and underserved. 400M Arabic speakers across MENA. Default AI experiences for them are second-class today.
Government and enterprise buyers prefer local. Data residency, language fidelity, sovereignty — Saudi Arabia and the UAE will both pay premium rates for Arabic-native AI from a local vendor.
The talent and capital are converging. Saudi PIF, UAE sovereign funds, Aramco’s tech arm, returning Arab AI researchers — the ingredients are now in the region.

What the winning company looks like

Probably not a 1,000-person research lab. More likely a focused 30-person team that takes the best open Arabic foundation, fine-tunes hard on dialect-specific instruction data, and builds a product layer (chat, agents, code, voice) that’s better in Arabic than any frontier closed model. Funded by a sovereign wealth fund or a strategic investor, and headquartered in Riyadh or Abu Dhabi.

What stops it from happening

Two things:

The talent that could build it is mostly inside government-owned labs that are slow to commercialise
Western frontier players keep getting good enough at Arabic to dampen the urgency of building a sovereign alternative

Both are real but neither is permanent. The window stays open through 2027.

The thesis if you build here

The most defensible AI products in MENA over the next decade will use a sovereign Arabic foundation, fine-tuned on local data, deployed inside the region’s data centres, and integrated with Saudi-first APIs (Salla, Foodics, Mrsool, government services). MCP (the Model Context Protocol) is the integration layer; the Arabic model is the brain. Whoever puts both together first owns the next decade.
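As a rough illustration of that split between brain and integration layer, here is a minimal in-process sketch. It assumes MCP refers to the Model Context Protocol and imitates only the general shape of a JSON-RPC 2.0 "tools/call" exchange; it does not use the official MCP SDK or cover the full protocol. The order_status tool and its stubbed data are hypothetical and do not reflect any real Salla, Foodics, or Mrsool API.

```python
import json
from typing import Callable, Dict

# Hypothetical in-process tool registry: tool name -> Python callable.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as a callable tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("order_status")
def order_status(order_id: str) -> str:
    # Stubbed data for illustration; a real tool would call a merchant
    # or government API here.
    fake_orders = {"A-1001": "shipped", "A-1002": "processing"}
    return fake_orders.get(order_id, "unknown")


def _error(req_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle_request(raw: str) -> str:
    """Handle one simplified JSON-RPC 2.0 'tools/call' request."""
    req = json.loads(raw)
    req_id = req.get("id")
    if req.get("method") != "tools/call":
        return _error(req_id, -32601, "Method not found")
    params = req.get("params", {})
    fn = TOOLS.get(params.get("name"))
    if fn is None:
        return _error(req_id, -32602, "Unknown tool")
    text = fn(**params.get("arguments", {}))
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "result": {"content": [{"type": "text", "text": text}]},
    })


# The kind of request the model (the "brain") would emit after reading
# a user message asking about order A-1001.
request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "order_status", "arguments": {"order_id": "A-1001"}},
})
response = json.loads(handle_request(request))
print(response["result"]["content"][0]["text"])  # prints "shipped"
```

The point of the sketch is the separation: the model only decides which tool to call and with what arguments, while everything region-specific lives behind the tool boundary, so either side can be swapped without touching the other.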

Why I’m watching closely

MLO Technologies’ product roadmap has an Arabic-first sub-product line already on the calendar. We’re not building the foundation model — we’re building the application layer that needs one. The cleanest dependency we have is on the Arabic LLM that doesn’t quite exist yet at frontier quality.

Building in this space and want to compare notes? Reach out.
