Chatbots like ChatGPT, Claude, and a few others have proven valuable in a variety of situations. Thanks to heavy use, OpenAI’s ChatGPT became the fastest-growing consumer app (at the time) to reach 100 million users, and its website continues to generate billions of monthly views.
However, they still have a lot of weaknesses, one of which is solving mathematical problems. And there’s an explanation for that. The large language models that power these chatbots use sophisticated algorithms to produce coherent and often natural-sounding sentences after being trained on vast volumes of textual data.
In contrast, the study of math focuses on numbers, quantities, and equations. It calls for the use of logic, critical analysis, and problem-solving abilities. One needs a solid grasp of mathematical principles and the capacity to use them to resolve challenging problems in order to perform well in math.
In other words, while a language model might be able to produce prose that appears to understand math, it cannot perform math.
You.com, an AI-powered search engine that was launched last year, is (sort of) aiming to change that with its AI agent. Launched on Thursday, YouAgent can write and run Python code to solve STEM questions with high accuracy. The startup claims a higher accuracy than any other ‘pure LLM chatbot’ on the market.
Anyone can use YouAgent by starting their query with “@agent” or “/agent” in You.com’s chat interface. This tells You.com the user wants it to execute Python code to generate the answer, going beyond just text.
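To make the idea concrete, here is a minimal sketch of the kind of Python an agent might write and run for two textbook-style math questions. It is an illustration only, not You.com's actual code: the function names `average_speed` and `solve_quadratic` are hypothetical, and the questions are made up for this example.

```python
import math
from fractions import Fraction


def average_speed(d1, v1, d2, v2):
    """Average speed over two legs: total distance divided by total time."""
    total_distance = Fraction(d1) + Fraction(d2)
    total_time = Fraction(d1, v1) + Fraction(d2, v2)
    return total_distance / total_time


def solve_quadratic(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0, smallest first (empty list if none)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    # A set removes the duplicate when the discriminant is zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


# Hypothetical question 1: 120 miles out at 40 mph, 120 miles back at 60 mph.
print(average_speed(120, 40, 120, 60))

# Hypothetical question 2: solve x^2 - 5x + 6 = 0.
print(solve_quadratic(1, -5, 6))
```

The first question is a classic trap: the tempting answer is 50 mph, the simple mean of the two speeds, while the computed result is 48 mph. Running the calculation means the answer comes from arithmetic rather than from text that merely sounds right.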
In benchmarks on STEM questions, You.com claims YouAgent outperformed other consumer AI systems like Google, ChatGPT, and Bing by 27% on an ACT math test (the mathematics section of the American College Testing standardized test). The company says this is because other chatbots rely solely on the GPT-4 model and cannot perform computations the way YouAgent can through code execution.
The company also shared a few examples to illustrate this. Here’s one of them.
However, You.com admits YouAgent is not perfect, as it hasn’t achieved 100% accuracy on its benchmarks. The system will sometimes try to use code even when it is not needed.
The company plans to continue improving YouAgent’s judgment on when code execution is required. Support for file uploads, image outputs such as plots and graphs, more mathematical and scientific libraries, better formatting, and improved performance are also on the roadmap.