Inflection 2.5 LLM nearly rivals GPT-4 with only 40% of the training resources

Inflection, an AI startup known for building AI assistants, has recently launched Inflection 2.5. The company says this new version of its large language model (LLM) rivals the performance of leading models such as GPT-4 while requiring only 40% of the resources used to train GPT-4.

Like most of its AI rivals, Inflection has integrated its latest LLM into its own chatbot, Pi, which is available on mobile and on the web. According to Inflection, Pi has been designed to be “empathetic, helpful, and safe.”

Inflection 2.5 reportedly attains nearly 94% of GPT-4’s average performance while using just 40% of the compute that went into training GPT-4. The model has made notable progress especially in STEM areas (science, technology, engineering, and mathematics).

Although Inflection 2.5 can reach GPT-4 levels of performance on the popular MMLU language comprehension benchmark, it needs a more complicated prompting method to do so. That also makes comparisons with Inflection 1 unfair, since the older benchmarks were run with simpler prompts. Inflection 2 scored just under 80% using the industry-standard 5-shot method, while Inflection 1 topped out at 72.7% with simpler prompts.
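For readers unfamiliar with the term, "5-shot" means the model is shown five solved example questions before the question it has to answer. The Python sketch below illustrates the general idea only. The prompt layout, the function names, and the arithmetic demo questions are assumptions made for illustration; they are not Inflection's evaluation setup and are not taken from MMLU.

```python
from typing import List, Tuple

# Hypothetical record type: (question, answer choices, correct letter).
Example = Tuple[str, List[str], str]

LETTERS = "ABCD"

# Made-up demonstration questions, not real benchmark items.
DEMO_EXAMPLES: List[Example] = [
    ("What is 2 + 2?", ["3", "4", "5", "6"], "B"),
    ("What is 3 * 3?", ["6", "9", "12", "15"], "B"),
    ("What is 10 - 4?", ["6", "5", "4", "3"], "A"),
    ("What is 8 / 2?", ["2", "3", "4", "5"], "C"),
    ("What is 5 + 6?", ["9", "10", "12", "11"], "D"),
]


def format_question(question: str, choices: List[str]) -> str:
    """Render one multiple-choice question, ending with an open 'Answer:' slot."""
    lines = [question]
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(
    examples: List[Example], question: str, choices: List[str], k: int = 5
) -> str:
    """Prepend k solved examples to the question the model must answer."""
    if len(examples) < k:
        raise ValueError(f"need at least {k} solved examples, got {len(examples)}")
    # Each solved example shows the question followed by its correct letter.
    blocks = [format_question(q, c) + " " + a for q, c, a in examples[:k]]
    # The final block leaves the answer blank for the model to complete.
    blocks.append(format_question(question, choices))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_few_shot_prompt(DEMO_EXAMPLES, "What is 7 + 1?", ["6", "7", "8", "9"]))
```

Because every model in a 5-shot comparison sees the same kind of prompt, scores obtained this way are easier to compare than scores obtained with a custom prompting technique.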

As for Inflection 2.5’s ability to follow a prompt and hold a conversation, the LLM achieved MT-Bench scores that placed it between GPT-3.5 and GPT-4.

Interestingly, Inflection’s evaluation process revealed that about a quarter of the MT-Bench tasks in the Reasoning, Mathematics, and Coding segments contained errors in the reference answers. The company has addressed these inaccuracies by releasing an updated version of the dataset, called MT-Bench Corrected. The finding highlights how unreliable synthetic benchmarks can be.

Inflection 2.5 was also compared with GPT-4 on a Hungarian maths exam and on the Physics GRE, an exam used for admission to graduate studies in physics. On the physics exam, Inflection 2.5 scored in the 85th percentile among human participants and, with the aid of an enhanced prompting technique, came close to the top score, trailing slightly behind GPT-4.