After over a week of teasing its first-ever product, AI startup Rabbit took the stage at CES 2024 to announce the R1 pocket assistant. It is primarily meant to be a voice assistant that can control apps and complete tasks for users.
Design-wise, the Rabbit R1 looks like a Playdate console, with a small screen, a scroll wheel, and a camera. The screen is a tiny 2.88-inch touchscreen, and the camera is rotatable. The scroll wheel also acts as a button to navigate the UI and activate the built-in assistant. The device is powered by a 2.3 GHz MediaTek CPU alongside 4GB of RAM and 128GB of storage.
Rabbit has not talked about battery specifications, but did claim that the R1 can last “all day.”
However, the device’s software and its AI features are what take the spotlight. The operating system is called Rabbit OS, and instead of a Large Language Model (LLM) like ChatGPT, the OS works on a Large Action Model (LAM). Unlike ChatGPT, which only responds to user queries with text or images, Rabbit OS can perform tasks for users, such as making calls, controlling music playback, and more. The best way to describe it is as a universal controller for apps.
Jesse Lyu, the CEO and founder of Rabbit, said: “We wanted to find a universal solution just like large language models. How can we find a universal solution to actually trigger our services, regardless of whether you’re a website or an app or whatever platform or desktop.”
The concept behind Rabbit OS is similar to the likes of Alexa or Google Assistant in terms of functionality and convenience. At its core, Rabbit OS can manage music, summon a ride, purchase groceries, or send messages from a single interface, without making users juggle different apps.
Users can simply voice their requests, and the device takes care of the rest. It’s a paradigm shift that promises to liberate users from the often overwhelming world of digital applications and services.
The R1 device’s on-screen interface is designed with user-friendliness in mind, presenting a series of category-based cards that cater to different needs, such as music, transportation, or video chats. The screen also serves as a means for users to verify the model’s output.
Rabbit’s approach to enhancing the capabilities of its R1 device takes a distinctive path. Rather than follow the conventional strategy of building APIs and persuading developers to integrate their apps, Rabbit has trained its model to navigate existing apps on its own.
The Large Action Model, or LAM, working at its core is a neural network that has been trained on human interactions with popular apps like Spotify and Uber. These interactions served as a blueprint for the model, providing it with a deep understanding of how these apps function. The LAM learned to recognize crucial elements like the Settings icon, how to discern when an order is confirmed, and the whereabouts of search menus. Importantly, this knowledge can be applied universally to any app.
There is also a dedicated training mode within Rabbit OS, which allows users to teach R1 how to execute tasks so they can be repeated in the future. For instance, a user can teach R1 to edit an image inside Photoshop.
According to Lyu, it can take instructions such as: “‘Hey, first of all, go to a software called Photoshop. Open it. Grab your photos here. Make a lasso on the watermark and click click click click. This is how you remove the watermark.’”
The R1 is available for pre-order now for $199 and will start shipping to customers in March.