Founded by alumni of Google DeepMind and Meta, Paris-based startup Mistral AI has consistently made waves in the AI community since 2023.
Mistral AI first caught the world’s attention with its debut model, Mistral 7B, released in 2023. This 7-billion parameter model quickly gained traction for its impressive performance, surpassing larger models like Llama 2 13B in various benchmarks and even rivaling Llama 1 34B in many metrics. What set Mistral 7B apart was not just its performance, but also its accessibility – the model could be easily downloaded from GitHub or even via a 13.4-gigabyte torrent, making it readily available for researchers and developers worldwide.
The company’s unconventional approach to releases, often forgoing traditional papers, blogs, or press releases, has proven remarkably effective in capturing the AI community’s attention. This strategy, coupled with its commitment to open-source principles, has positioned Mistral AI as a formidable player in the AI landscape.
Mistral AI’s rapid ascent in the industry is further evidenced by its funding success. The company achieved a $2 billion valuation following a funding round led by Andreessen Horowitz. This came on the heels of a $118 million seed round – reportedly the largest in European history – underscoring investors’ confidence in Mistral AI’s vision and capabilities.
Beyond their technological advancements, Mistral AI has also been actively involved in shaping AI policy, particularly in discussions around the EU AI Act, where they’ve advocated for reduced regulation in open-source AI.
Now, in 2024, Mistral AI has once again raised the bar with two groundbreaking models: Mistral Large 2 (also known as Mistral-Large-Instruct-2407) and Mistral NeMo. In this comprehensive guide, we’ll dive deep into the features, performance, and potential applications of these impressive AI models.
Mistral Large 2: The New Flagship Model
Key specifications of Mistral Large 2 include:
- 123 billion parameters
- 128k context window
- Support for dozens of languages
- Proficiency in 80+ coding languages
- Advanced function calling capabilities
The model is designed to push the boundaries of cost efficiency, speed, and performance, making it an attractive option for both researchers and enterprises looking to leverage cutting-edge AI.
Mistral NeMo: The New Smaller Model
While Mistral Large 2 represents the best of Mistral AI’s large-scale models, Mistral NeMo, released in July 2024, takes a different approach. Developed in collaboration with NVIDIA, Mistral NeMo is a more compact 12-billion-parameter model that still offers impressive capabilities:
- 12 billion parameters
- 128k context window
- State-of-the-art performance in its size category
- Apache 2.0 license for open use
- Quantization-aware training for efficient inference
Mistral NeMo is positioned as a drop-in replacement for systems currently using Mistral 7B, offering enhanced performance while maintaining ease of use and compatibility.
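Because Mistral NeMo exposes the same chat interface, switching an existing application over can be as simple as changing the model ID. Here is a minimal sketch using Mistral’s Python client (the import paths below follow the 0.x client API and may differ in other client versions; the API key is a placeholder):

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key="YOUR_API_KEY")  # placeholder; use your own platform key

# Previously: model="open-mistral-7b" – NeMo is intended as a drop-in upgrade
response = client.chat(
    model="open-mistral-nemo-2407",
    messages=[ChatMessage(role="user", content="Summarize quantization-aware training in two sentences.")],
)
print(response.choices[0].message.content)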
Key Features and Capabilities
Both Mistral Large 2 and Mistral NeMo share several key features that set them apart in the AI landscape:
- Large Context Windows: With 128k token context lengths, both models can process and understand much longer pieces of text, enabling more coherent and contextually relevant outputs.
- Multilingual Support: The models excel in a wide range of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Arabic, and Hindi.
- Advanced Coding Capabilities: Both models demonstrate exceptional proficiency in code generation across numerous programming languages.
- Instruction Following: Significant improvements have been made in the models’ ability to follow precise instructions and handle multi-turn conversations.
- Function Calling: Native support for function calling allows these models to interact dynamically with external tools and services.
- Reasoning and Problem-Solving: Enhanced capabilities in mathematical reasoning and complex problem-solving tasks.
Let’s delve deeper into some of these features and examine how they perform in practice.
Performance Benchmarks
To understand the true capabilities of Mistral Large 2 and Mistral NeMo, it’s essential to look at their performance across various benchmarks. Let’s examine some key metrics:
Mistral Large 2 Benchmarks
Code-generation benchmarks compare the proficiency of various LLMs across programming languages. Models such as Mistral Large 2 (24.07), Llama 3.1 405B, and GPT-4o are evaluated on their ability to understand and generate code in languages like Python, C++, Bash, Java, TypeScript, PHP, and C#.
Llama 3.1 405B shows strong performance across multiple languages, with particularly high scores in Python and Java. This proficiency is critical for applications involving code generation, debugging, and software development, making these models valuable tools for developers.
Mistral NeMo Benchmarks
Mistral AI has not published an equally detailed benchmark table for Mistral NeMo, but describes the model as offering state-of-the-art performance in its size category, outperforming recent open-source pre-trained models such as Gemma 2 9B and Llama 3 8B across a variety of tasks.
Multilingual and Coding Prowess
One of the standout features of both Mistral Large 2 and Mistral NeMo is their exceptional multilingual capabilities. This is particularly important in our increasingly globalized world, where AI models need to understand and generate content in multiple languages.
Mistral Large 2, for instance, supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. This multilingual proficiency extends beyond mere translation, encompassing nuanced understanding and generation in these languages.
To illustrate this, let’s look at an example of Mistral Large 2 generating text in multiple languages:
[INST] Generate a short greeting in English, French, German, Spanish, and Japanese. [/INST]
Here are short greetings in the requested languages:
English: Hello! How are you today?
French: Bonjour ! Comment allez-vous aujourd’hui ?
German: Hallo! Wie geht es Ihnen heute?
Spanish: ¡Hola! ¿Cómo estás hoy?
Japanese: こんにちは!今日はお元気ですか?
[INST] Now, write a simple “Hello, World!” program in Python, Java, and JavaScript. [/INST]
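A typical completion for this follow-up (illustrative output, not a verbatim transcript) might be:

Python:
print("Hello, World!")

Java:
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}

JavaScript:
console.log("Hello, World!");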
These examples showcase both the multilingual text generation and coding capabilities of the model.
Both models are available on various platforms such as Hugging Face, Mistral AI’s platform, and major cloud service providers like Google Cloud Platform, Azure AI Studio, Amazon Bedrock, and IBM watsonx.ai.
The Agentic Paradigm and Function Calling
Both Mistral Large 2 and Mistral NeMo embrace an agent-centric design, which represents a paradigm shift in how we interact with AI models. This approach focuses on building models capable of interacting with their environment, making decisions, and taking actions to achieve specific goals.
A key feature enabling this paradigm is the native support for function calling. This allows the models to dynamically interact with external tools and services, effectively expanding their capabilities beyond simple text generation.
Let’s look at an example of how function calling might work with Mistral Large 2:
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Initialize tokenizer and model
mistral_models_path = "path/to/mistral/models"  # Ensure this path is correct
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

# Define a function for getting weather information
weather_function = Function(
    name="get_current_weather",
    description="Get the current weather",
    parameters={
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
            "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The temperature unit to use. Infer this from the user's location.",
            },
        },
        "required": ["location", "format"],
    },
)

# Create a chat completion request with the function
completion_request = ChatCompletionRequest(
    tools=[Tool(function=weather_function)],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

# Encode the request
tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Generate a response
out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=256,
    temperature=0.7,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
result = tokenizer.decode(out_tokens[0])
print(result)
In this example, we define a function for getting weather information and include it in our chat completion request. The model can then use this function to retrieve real-time weather data, demonstrating how it can interact with external systems to provide more accurate and up-to-date information.
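Note that the model’s raw reply here is not free-form prose but a structured tool call, which your application is expected to parse, execute, and feed back to the model. Decoded with the v3 instruct tokenizer, the output looks roughly like this (illustrative; the exact formatting depends on the tokenizer and model version):

[TOOL_CALLS] [{"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}}]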
Tekken: A More Efficient Tokenizer
Mistral NeMo introduces a new tokenizer called Tekken, which is based on Tiktoken and trained on over 100 languages. This new tokenizer offers significant improvements in text compression efficiency compared to previous tokenizers like SentencePiece.
Key features of Tekken include:
- 30% more efficient compression for source code, Chinese, Italian, French, German, Spanish, and Russian
- 2x more efficient compression for Korean
- 3x more efficient compression for Arabic
- Outperforms the Llama 3 tokenizer in compressing text for approximately 85% of all languages
This improved tokenization efficiency translates to better model performance, especially when dealing with multilingual text and source code. It allows the model to process more information within the same context window, leading to more coherent and contextually relevant outputs.
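You can sanity-check these compression claims by comparing token counts yourself. Here is a minimal sketch using Hugging Face tokenizers; it assumes access to the public mistralai model repos (some require accepting the license on Hugging Face before downloading):

from transformers import AutoTokenizer

# Tekken-based tokenizer shipped with Mistral NeMo
nemo_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
# SentencePiece-derived tokenizer used by Mistral 7B
m7b_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Mixed source code and non-English text, where Tekken's gains are largest
text = 'def greet(name):\n    return f"Hello, {name}!"  # 안녕하세요 / مرحبا / Привет'

for label, tok in [("Tekken (Mistral NeMo)", nemo_tok), ("Mistral 7B", m7b_tok)]:
    print(f"{label}: {len(tok.encode(text))} tokens")

Fewer tokens for the same text means more content fits within the 128k context window and fewer tokens need to be processed per request.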
Licensing and Availability
Mistral Large 2 and Mistral NeMo have different licensing models, reflecting their intended use cases:
Mistral Large 2
- Released under the Mistral Research License
- Allows usage and modification for research and non-commercial purposes
- Commercial usage requires a Mistral Commercial License
Mistral NeMo
- Released under the Apache 2.0 license
- Allows for open use, including commercial applications
Both models are available through various platforms:
- Hugging Face: Weights for both base and instruct models are hosted on the Hugging Face Hub
- Mistral AI’s platform: Available as mistral-large-2407 (Mistral Large 2) and open-mistral-nemo-2407 (Mistral NeMo)
- Cloud Service Providers: Available on Google Cloud Platform’s Vertex AI, Azure AI Studio, Amazon Bedrock, and IBM watsonx.ai
For developers looking to use these models, here’s a quick example of how to load and use Mistral Large 2 with Hugging Face transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-Large-Instruct-2407"
device = "cuda"  # Use GPU if available

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Move the model to the appropriate device
model.to(device)

# Prepare input
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain the concept of neural networks in simple terms."}
]

# Encode input
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

# Generate response
output_ids = model.generate(input_ids, max_new_tokens=500, do_sample=True)

# Decode and print the response
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)
This code demonstrates how to load the model, prepare input in a chat format, generate a response, and decode the output.
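One practical caveat: at 123 billion parameters, the weights alone occupy roughly 246 GB even in 16-bit precision, far more than any single GPU provides. In practice you would load the model sharded across several GPUs, for example (a sketch assuming the accelerate package is installed):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Large-Instruct-2407",
    torch_dtype=torch.bfloat16,  # halves memory versus float32
    device_map="auto",           # shards layers across available GPUs
)

With device_map="auto", the explicit model.to(device) call is no longer needed.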