Meta recently unveiled Llama 3, its latest though not yet its largest open-source AI model, which outperforms Google's Gemini 1.5 Pro. Now Microsoft has raised the bar with Phi-3, its smallest AI model to date, which beats Llama 3, at least on benchmarks.

According to internal tests performed by Microsoft, Phi-3 can beat much bigger language models such as OpenAI's free GPT-3.5 and Mistral's open-source Mixtral 8x7B. Its context length matches that of GPT-4 Turbo at 128K tokens.

But unlike bigger language models, Phi-3 targets consumer devices such as smartphones, which can run such models locally, similar to Google's Gemini Nano. When quantized to 4 bits, it requires only about 1.8 GB of memory and can process over 12 tokens per second on an iPhone 14 equipped with an A16 chip.
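The memory figure follows directly from the parameter count: 3.8 billion parameters at 4 bits each come to about 1.9 GB, or roughly 1.8 GiB, in line with the number cited. A quick back-of-the-envelope sketch:

```python
def quantized_size_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate weight footprint of a quantized model in GiB."""
    return n_params * bits_per_param / 8 / 2**30  # bits -> bytes -> GiB

# Phi-3-mini: 3.8 billion parameters quantized to 4 bits
size = quantized_size_gib(3.8e9, 4)
print(f"{size:.2f} GiB")  # ≈ 1.77 GiB, consistent with the ~1.8 GB cited
```

The same estimate explains why the unquantized model (16-bit weights, roughly 7 GB) is impractical on a phone while the 4-bit version fits comfortably.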

Despite its modest size of 3.8 billion parameters, Phi-3-mini still hits a 69% score on the MMLU language-comprehension benchmark and 8.38 points on MT-Bench.

Microsoft credits the performance of Phi-3 solely to the data used for its training. It builds on the training method used for Phi-1 and Phi-2, and the data consists of web data filtered for educational value alongside synthetic LLM-generated data.

By eliminating less relevant web data, such as sports scores, and concentrating on data that enhances knowledge and reasoning skills, the data set has been optimized to approach what Microsoft considers the “data optimum” for a compact model.

The Phi-3-small and Phi-3-medium models, which have 7 billion and 14 billion parameters respectively and were trained on 4.8 trillion tokens, show similarly strong results when benchmarked against models of their class. While the smaller Phi models beat Llama 3's lower tiers, the bigger Phi-3 versions trail larger rival AI models only slightly in benchmark scores.

Benchmark scores can differ from real-world application performance, and it also remains to be seen how widely the open-source community will embrace the model.

Microsoft has designed Phi-3 to use a block structure and tokenizer similar to Meta's Llama models, aiming to maximize its usability within the open-source community. This compatibility means that tools developed for the Llama 2 model series can be easily adapted for use with Phi-3-mini.
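In practice, that Llama-style layout means Phi-3-mini should load through the same generic code paths that Llama-family models use in common toolchains. A minimal sketch using the Hugging Face `transformers` library, assuming the model is published under a hub id like `microsoft/Phi-3-mini-4k-instruct` (check the hub for the actual release name):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical hub id for illustration; verify the actual name on the Hugging Face hub.
MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"

# The same Auto* entry points used for Llama models apply here, since Phi-3
# reuses a Llama-like block structure and tokenizer. Depending on your
# transformers version, loading may additionally require trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

inputs = tokenizer("Explain small language models in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The practical upshot is that existing Llama-oriented fine-tuning, quantization, and serving tools need little more than a model-id swap to work with Phi-3-mini.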