Microsoft wants to offer the ‘most complete AI and app agent factory’.
Microsoft has released three new AI foundational models, created in-house, in a move that places the company in direct competition with enterprise AI rivals, despite its deep ties with OpenAI.
The new foundational models target three of the most commercially viable modalities: transcription, voice and images. The models are already powering Microsoft’s products, including Copilot, Bing and Azure Speech, the company said, and will be available in preview via Microsoft Foundry and MAI Playground.
With this, Microsoft is furthering its goals of delivering “the most complete AI and app agent factory”, it said.
‘MAI-Transcribe-1’ is a first-generation speech recognition model expected to deliver “enterprise-grade accuracy” across 25 languages at around 50pc lower GPU cost than alternatives. According to Microsoft, the model records an average ‘word error rate’ of less than 4pc on accuracy benchmarks, compared with 4.2pc for GPT-Transcribe and 4.9pc for Gemini 3.1 Flash. A lower word error rate means fewer transcription mistakes.
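For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. Microsoft has not described its benchmark method in this article, and the function name `word_error_rate` is a hypothetical one chosen for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / reference length.

    Assumes simple whitespace tokenisation; real benchmarks usually
    normalise case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a 4pc word error rate corresponds to roughly one wrong, missing or extra word in every 25 words of reference text.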
‘MAI-Voice-1’ is a speech generation model that, according to Microsoft, can produce 60 seconds of expressive audio in under one second on a single GPU.
Together, the two models are meant to deliver an audio AI stack capable of assisting in call-centre workflows and other voice-driven services, such as providing live captioning, automatic subtitling and converting interactions into structured data for research.
Microsoft’s second-generation image model, ‘MAI-Image-2’, is expected to offer artists a way to “explore” different visual directions. The model was created in “close collaboration” with artists, the company said, and is meant to help enterprises create branding and communication material.
MAI-Image-2 debuted in third spot on the Arena.ai leaderboard for image model families, and is currently ranked fifth.
Microsoft, valued at $2.7trn, already offers several AI-embedded apps and platform services. Its Copilot Studio lets users build agents, while the Foundry services offer a place to train and scale models.
Meanwhile, a recently announced Copilot integration with Anthropic’s Claude Cowork is meant to target the growing demand for autonomous agents.
Microsoft backed OpenAI in its recent $122bn funding round alongside the likes of Amazon, Nvidia and SoftBank. Late last year, the company announced a $10bn investment plan for a data centre in Portugal. It also announced a $37.5bn quarterly capital expenditure bill at the end of January.