Mistral AI’s new open-source Mixtral 8x7B LLM beats GPT 3.5 and LLaMA 2

Right after announcing a successful funding round that boosted its valuation to $2 billion, French AI startup Mistral AI has released its latest model, Mixtral 8x7B, which outperforms OpenAI’s GPT 3.5 and Meta’s LLaMA 2 on several benchmarks.

Mixtral 8x7B is a fully open-source sparse mixture-of-experts (SMoE) model released under the Apache 2.0 license. The company casually dropped a torrent link on its social media accounts without any comment.

https://twitter.com/MistralAI/status/1733150512395038967

Mistral’s press release says: “Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT3.5 on most standard benchmarks.”
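For readers unfamiliar with the term, a sparse mixture-of-experts layer sends each token through only a few of its expert sub-networks, which is how a large model can run faster than its total size suggests. Mistral's announcement describes eight experts with two selected per token; the toy Python sketch below assumes that top-2 scheme. The function names, the toy experts, and the router scores are all hypothetical and chosen for illustration only; this is not Mistral's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and normalize their weights.

    Returns a list of (expert_index, weight) pairs; the weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def sparse_moe_layer(token, experts, router_logits, k=2):
    """Run the token through only the k selected experts and mix the outputs."""
    routes = top_k_route(router_logits, k)
    output = [0.0] * len(token)
    for index, weight in routes:
        expert_out = experts[index](token)  # the other experts are never called
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, [index for index, _ in routes]


def make_toy_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda vec: [scale * v for v in vec]


# Eight toy experts, mirroring the "8x" in the model name.
experts = [make_toy_expert(s) for s in range(1, 9)]

# Made-up router scores for a single token (assumption, not real data).
router_logits = [0.1, 1.5, -0.7, 0.2, 2.5, -1.2, 0.0, 0.9]

output, used = sparse_moe_layer([1.0, 2.0], experts, router_logits)
print("experts used:", used)
print("mixed output:", output)
```

Only two of the eight toy experts do any work for this token, which is the intuition behind the speed claim in the quote: the compute per token depends on the number of experts selected, not on the total number of experts.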

Mixtral has a context window of 32,000 tokens, which matches the 32K variant of GPT 4 but falls short of GPT 4 Turbo’s 128,000-token window. Mixtral can work with English, French, Italian, German, and Spanish. It also shows “strong performance” in code generation. The model can be fine-tuned to follow instructions, with the instruction-tuned version reaching a score of 8.3 on MT-Bench.

Thanks to its small footprint, Mixtral can also run locally on machines without dedicated GPUs, such as Apple Mac computers with M2 Ultra chips.

Mixtral 8x7B beats GPT 3.5 and LLaMA 2 on four industry-standard benchmarks: MMLU, ARC Challenge, MBPP, and GSM-8K. On other tests, Mixtral is close behind its Meta and OpenAI rivals. Here are the results.

It also performs better than Meta’s LLaMA 2 on hallucination and bias, that is, the tendency of AI models to produce false or opinionated information. This means Mixtral can provide more accurate information than LLaMA 2 70B.

The results shown below were measured on the TruthfulQA, BBQ, and BOLD benchmarks.

Last but not least, unlike OpenAI’s ChatGPT, which follows strict content policies, Mixtral has no safety guardrails, meaning it can be used to produce unsafe or NSFW content that other models would reject. This also means that Mixtral is likely to face challenges from policymakers and regulators.