India Doesn't Just Want to Use AI, It Wants to Build Its Own: $1.2B for Native Models, Hindi Datasets, and the New Route to Digital Sovereignty
In 2026, India doesn't just want to consume artificial intelligence. It wants to produce it — in Hindi, Tamil, Marathi, and 19 other languages. And it's putting money into it.
The Indian government has allocated $1.2 billion (approximately R$5 billion) for the IndiaAI Mission, a program that funds everything from supercomputing clusters to curating public datasets in local languages (source: IndiaAI Mission official release, 2026). The move is not merely technical. It is a geopolitical statement: India aims to become the third pole of global artificial intelligence, challenging the dominance of the US and China.
The plan is already showing results. Indian AI startups raised $3.8 billion in 2025, a 112% increase over 2024 (source: NASSCOM AI report 2026). And the native model OpenHathi, from Sarvam AI, reached 87% accuracy on Hindi tasks, surpassing GPT-4o on regional benchmarks (source: Sarvam AI blog, 2026).
The $1.2 Billion Plan of the IndiaAI Mission
The IndiaAI Mission is not an abstract research program. It is an infrastructure plan with concrete goals. The Indian government wants to build a public dataset of 1 trillion tokens in 22 Indian languages by 2027. Of this total, 200 billion tokens have already been collected (source: Ministry of Electronics & IT, 2026).
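For a sense of scale, the progress figures can be worked through in a few lines of Python. This is only an illustrative sketch using the numbers quoted above. The even split across languages is a hypothetical simplification, since the article does not say how tokens are distributed among the 22 languages.

```python
# Figures quoted in the article (Ministry of Electronics & IT, 2026).
TARGET_TOKENS = 1_000_000_000_000    # 1 trillion tokens by 2027
COLLECTED_TOKENS = 200_000_000_000   # 200 billion collected so far
NUM_LANGUAGES = 22


def dataset_progress(collected: int, target: int) -> dict:
    """Return the collected share and the number of tokens still missing."""
    return {
        "share_collected": collected / target,
        "remaining_tokens": target - collected,
    }


progress = dataset_progress(COLLECTED_TOKENS, TARGET_TOKENS)

# Hypothetical even split across languages (an assumption for illustration;
# the real distribution is not stated in the article).
even_split_per_language = TARGET_TOKENS / NUM_LANGUAGES

print(f"Collected: {progress['share_collected']:.0%}")
print(f"Remaining: {progress['remaining_tokens'] / 1e9:.0f} billion tokens")
print(f"Even split: {even_split_per_language / 1e9:.1f} billion tokens per language")
```

In other words, the program is one fifth of the way to its target, with 800 billion tokens still to collect.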
Beyond data, the program funds the creation of high-performance computing clusters (GPUs) for startups and universities. The idea is that no Indian developer should have to rely exclusively on servers in the US or China to train their models.
The $1.2 billion budget is divided into four pillars:
| Pillar | Investment (US$) | Objective |
|---|---|---|
| Computing Infrastructure | 500 million | GPU clusters for training |
| Public Datasets | 300 million | Curation of 1 trillion tokens in 22 languages |
| Innovation Centers | 250 million | Accelerators and applied research hubs |
| Capacity Building | 150 million | Training 500,000 AI professionals |
Source: IndiaAI Mission official release, 2026.
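As a quick sanity check, the four pillars in the table can be summed and converted to shares of the total. The sketch below uses only the table's figures. The per-trainee number is a rough upper bound that assumes, hypothetically, that the whole Capacity Building budget goes to the 500,000 trainees.

```python
# Pillar budgets in millions of US dollars, as listed in the table above.
PILLARS_USD_MILLION = {
    "Computing Infrastructure": 500,
    "Public Datasets": 300,
    "Innovation Centers": 250,
    "Capacity Building": 150,
}
TRAINEE_TARGET = 500_000  # professionals to be trained, per the table


def pillar_shares(pillars: dict[str, int]) -> dict[str, float]:
    """Return each pillar's fraction of the combined budget."""
    total = sum(pillars.values())
    return {name: amount / total for name, amount in pillars.items()}


total_usd_million = sum(PILLARS_USD_MILLION.values())
shares = pillar_shares(PILLARS_USD_MILLION)

# Assumption for illustration: the entire pillar is spent on training.
usd_per_trainee = PILLARS_USD_MILLION["Capacity Building"] * 1_000_000 / TRAINEE_TARGET

print(f"Total: ${total_usd_million / 1000:.1f} billion")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Capacity Building per trainee: ${usd_per_trainee:.0f}")
```

The pillars add up to the full $1.2 billion, with computing infrastructure taking roughly 42% and capacity building working out to at most $300 per trained professional.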
Each pillar has a timeline. The computing infrastructure pillar, for example, targets 10 operational clusters by the end of 2026. As of May 2026, five were already running.
The Indian government has also created a supervisory committee with representatives from startups, academia, and large tech companies. The idea is to prevent public money from being used solely to subsidize foreign giants.
OpenHathi: The Model that Surpassed GPT-4o in Hindi
The native model drawing the most attention is OpenHathi, developed by Sarvam AI. The name comes from "Hathi" (elephant, in Hindi), and the goal is clear: create a model that understands the real India, not just English.
In tests conducted in May 2026, OpenHathi achieved 87% accuracy on natural language understanding tasks in Hindi. OpenAI's GPT-4o scored 79% on the same benchmark (source: Sarvam AI blog, 2026).
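The gap between the two scores can be read in more than one way, and the short sketch below spells out three of them. It relies only on the two accuracy figures reported by Sarvam AI. The function name and the choice of metrics are illustrative, not part of any published evaluation.

```python
def compare_accuracy(model_pct: int, baseline_pct: int) -> dict:
    """Compare two accuracy scores given as whole percentages."""
    gap_points = model_pct - baseline_pct            # percentage points
    relative_gain = gap_points / baseline_pct        # gain relative to the baseline score
    model_error = 100 - model_pct
    baseline_error = 100 - baseline_pct
    error_reduction = (baseline_error - model_error) / baseline_error
    return {
        "gap_points": gap_points,
        "relative_gain": relative_gain,
        "error_reduction": error_reduction,
    }


# OpenHathi (87%) versus GPT-4o (79%) on the same Hindi benchmark.
result = compare_accuracy(87, 79)

print(f"Gap: {result['gap_points']} percentage points")
print(f"Relative gain: {result['relative_gain']:.1%}")
print(f"Error reduction: {result['error_reduction']:.1%}")
```

The 8-point lead corresponds to a relative gain of about 10.1%, or about 38.1% fewer errors, depending on which framing is used.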
"Models trained predominantly in English fail to capture the cultural and grammatical nuances of Indian languages. OpenHathi is not just a translation — it is a model that thinks in Hindi." — Vivek Raghavan, co-founder of Sarvam AI, in an interview with NeuralPulse, May 2026.
OpenHathi's differentiator is not just performance. It's cost. Sarvam AI claims the model can run on simpler hardware, reducing inference cost by 60% compared to equivalent models from major American companies.
The model was trained on a proprietary dataset of 500 billion tokens, 40% of which are in Hindi and other Indian languages. The remainder is technical and scientific English. The company has already released an open-source version for researchers.
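In absolute terms, the language split of the training data can be computed directly from the two figures above. This is a minimal sketch, and the variable names are illustrative.

```python
TRAINING_TOKENS = 500_000_000_000   # 500 billion tokens, per Sarvam AI
INDIAN_LANGUAGE_PERCENT = 40        # Hindi and other Indian languages

# Integer arithmetic keeps the token counts exact.
indian_language_tokens = TRAINING_TOKENS * INDIAN_LANGUAGE_PERCENT // 100
english_tokens = TRAINING_TOKENS - indian_language_tokens

print(f"Indian languages: {indian_language_tokens / 1e9:.0f} billion tokens")
print(f"English: {english_tokens / 1e9:.0f} billion tokens")
```

That puts the Indian-language portion at 200 billion tokens and the English portion at 300 billion.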
Krutrim and CoRover: The Startups Leading the Charge
The Indian AI startup ecosystem is expanding rapidly. Two companies stand out in the 2026 landscape.
Krutrim, founded by Bhavish Aggarwal (creator of Ola), raised $1 billion in 2025. The company develops a multimodal language model focused on government and educational services. The name "Krutrim" means "artificial" in Sanskrit. The company has already signed contracts with three Indian states to digitize public services using AI.
CoRover.ai raised $200 million in 2025. The startup specializes in multilingual chatbots for the banking and retail sectors. Its platform, called "Bhashini," processes queries in 12 Indian languages with a 92% first-interaction resolution rate (source: NASSCOM AI report 2026).
Total investment in Indian AI startups in 2025 was $3.8 billion. This represents a 112% growth compared to 2024 (source: NASSCOM AI report 2026). Most of the capital came from American and European venture capital funds, but the Indian government also participates with matching funds via the IndiaAI Mission.
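The 2024 baseline is not stated, but it follows from the two reported numbers. The sketch below derives it, assuming the 112% figure is year-over-year growth on the 2024 total. The result is an implied value, not a figure taken from the NASSCOM report.

```python
def implied_previous_total(current_total: float, growth_pct: float) -> float:
    """Back out the prior-year total from the current total and its growth rate."""
    return current_total / (1 + growth_pct / 100)


TOTAL_2025_USD_BILLION = 3.8
GROWTH_PCT = 112

# Implied by the reported figures, not quoted in the article.
total_2024 = implied_previous_total(TOTAL_2025_USD_BILLION, GROWTH_PCT)
absolute_increase = TOTAL_2025_USD_BILLION - total_2024

print(f"Implied 2024 total: ${total_2024:.2f} billion")
print(f"Absolute increase: ${absolute_increase:.2f} billion")
```

Under that reading, 2024 funding was about $1.79 billion, so the sector added roughly $2 billion in a single year.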
Digital Sovereignty: The New Geopolitics of AI
India's move is not isolated. Other Global South countries are also building their own AI infrastructures. Indonesia launched a similar program in 2025. Brazil announced its Brazilian AI Strategy with a budget of R$1 billion.
But India has structural advantages. It has 1.4 billion inhabitants, 600 million of whom use smartphones. The country has one of the largest data markets in the world. And, unlike many countries, India already has a consolidated technology industry.
What is at stake is digital sovereignty. Without native models, India would depend on APIs from American or Chinese companies to apply AI in sensitive sectors like healthcare, education, and public security. This dependency has already raised alarms within the government.
"If we don't build our own infrastructure, we will be handing over the data of 1.4 billion Indians to foreign companies," said Minister of Electronics and IT Ashwini Vaishnaw in a speech at the World Economic Forum in Davos in January 2026.
India is also negotiating data-sharing agreements with other BRICS countries. The idea is to create a "Global South data pool" to train models that reflect non-Western realities.
The Regulatory Challenge and Tension with Big Tech
Building native Indian AI depends not only on money and data. It also depends on regulation. The Indian government is finalizing the Artificial Intelligence Governance Act, which is expected to be voted on in parliament by the end of 2026.
The bill proposes rules for algorithm transparency, personal data protection, and civil liability for damages caused by AI systems. Foreign companies wishing to operate in India will have to store training data on local servers.
Tension with Big Tech is real. Google India and Microsoft India have already expressed concerns about compliance costs. The companies claim that the local server requirement could increase operational costs by up to 30%.
But the Indian government has not backed down. The stance is clear: anyone wanting to operate in the Indian AI market will have to follow Indian rules.
The Future: A Third AI Pole?
India does not want to replace the US or China. It wants to create its own space. An ecosystem where models in Hindi, Tamil, and Bengali are as relevant as models in English.
The numbers show the movement has traction. The 200 billion tokens already collected place India ahead of any other Global South country in terms of available public data. And at $1.2 billion, the IndiaAI Mission is the largest government AI program in the developing world.
But the challenge is immense. India needs to train 500,000 AI professionals by 2027. It needs to ensure GPU clusters do not sit idle. And it needs to regulate without stifling innovation.
If it succeeds, the country will become the third pole of global artificial intelligence. Not by accident, but by design.
NeuralPulse is closely tracking the development of the IndiaAI Mission and its impacts on the global AI market.