Anthropic’s latest and greatest Claude 3 does not actually beat GPT-4

OpenAI’s rival startup Anthropic released its latest and greatest AI model, Claude 3, earlier this week. Like most other major large language models (LLMs), Claude 3 comes in tiers: Haiku, Sonnet, and Opus, with the last being the most powerful of the three.

According to the benchmarks released by the company, its flagship model, Opus, surpasses GPT-4 in performance. However, a closer look paints a more complicated picture: Anthropic compared Opus with the original iteration of GPT-4 and left out subsequent updates such as GPT-4 Turbo.

Up until now, OpenAI’s benchmark releases have been limited to the older GPT-4 model, with access restricted to API users. However, some performance results for GPT-4 Turbo, not officially released by OpenAI, have been gathered by AI researcher Lawrence Chan.

Reviewing these statistics reveals a consistent outcome: in every comparison between Claude 3 and GPT-4 Turbo, OpenAI’s model outshines Anthropic’s leading model, albeit by a slim margin. Given how narrow the difference is, the question of which model is superior largely comes down to the specific application and to personal preference.

Even so, this finding indicates that Claude 3 is not, in fact, ahead of GPT-4. It is most likely just another rival that excels in its own specific applications.

More About Claude 3

Regardless of how Claude 3 compares to rival LLMs, it still appears to be a notable step ahead of its predecessor according to Anthropic’s claims.

Opus is said to offer enhanced intelligence while matching the speed of Claude 2.1, and Haiku is described as almost instantaneous in its responses. Sonnet, meanwhile, is said to run at twice the speed of Claude 2 while also being more intelligent.

Anthropic asserts that each tier of Claude 3 is better at analysis and forecasting and can produce complex content and code. The models are also said to excel at conversing in languages other than English, such as Spanish, Japanese, and French.

Additionally, they are skilled at handling diverse types of visual data, ranging from photographs to charts, graphs, and intricate engineering drawings.