Large Action (Agentic) Models

Wow, these are pretty cool. The idea is to take multimodal models like ChatGPT and use their combined image and language capabilities to navigate software interfaces.

The Rabbit R1 (link) is the latest example. It uses a Large Action Model that learns how you use your mobile phone and how to navigate its apps, and then performs tasks on your behalf.

It's kind of like combining RPA (robotic process automation) with AI: the model interprets the interface rather than blindly pressing pre-scripted buttons. There is a lot of opportunity here for automated troubleshooting.

I can imagine a use case where complex proprietary software combined with hardware, perhaps in medicine, requires highly skilled troubleshooting support. Rather than depending on people in a contact center, first-level support could be delivered by an AI agent.

If we give the AI agent access to the interface, it can navigate and troubleshoot on its own from a description of the problem. Otherwise, it can walk the user through the steps incrementally via a chat or speech interface.
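To make the "navigate and troubleshoot on its own" idea a bit more concrete, here is a minimal sketch of the observe, decide, act loop such an agent might run. Everything in it is hypothetical: the screens, buttons and transitions stand in for a real app, and `keyword_policy` is a toy keyword matcher standing in for the multimodal model, which would actually look at a screenshot and choose the next action.

```python
from typing import Callable, Optional

# Hypothetical app model: each screen lists the buttons visible on it,
# and TRANSITIONS says which screen a button press leads to.
SCREENS = {
    "home": ["Settings", "Help"],
    "settings": ["Network", "Display", "Back"],
    "network": ["Reset connection", "Back"],
    "resolved": [],
}
TRANSITIONS = {
    ("home", "Settings"): "settings",
    ("settings", "Network"): "network",
    ("settings", "Back"): "home",
    ("network", "Back"): "settings",
    ("network", "Reset connection"): "resolved",
}

# A policy takes the problem description, the current screen and its visible
# buttons, and returns the button to press (or None to give up).
Policy = Callable[[str, str, list[str]], Optional[str]]


def keyword_policy(problem: str, screen: str, buttons: list[str]) -> Optional[str]:
    """Toy stand-in for a multimodal model: prefer a button whose label shares
    a word with the problem description, otherwise fall back to Settings."""
    words = set(problem.lower().split())
    for button in buttons:
        if words & set(button.lower().split()):
            return button
    return "Settings" if "Settings" in buttons else None


def run_agent(problem: str, policy: Policy, start: str = "home",
              max_steps: int = 10) -> list[str]:
    """Observe the screen, ask the policy for an action, press it, repeat.
    Returns the buttons pressed, which doubles as step-by-step instructions
    a chat interface could show the user instead."""
    screen, steps = start, []
    for _ in range(max_steps):
        if screen == "resolved":
            break
        button = policy(problem, screen, SCREENS[screen])
        if button is None or (screen, button) not in TRANSITIONS:
            break  # the agent is stuck; this is where a human would take over
        steps.append(button)
        screen = TRANSITIONS[(screen, button)]
    return steps


print(run_agent("my network connection keeps dropping", keyword_policy))
```

For the network problem this presses Settings, then Network, then Reset connection. The same list of steps covers both modes described above: the agent can execute them itself, or read them out to the user one at a time.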

Ultimately, there is a business model where troubleshooting support is outsourced to a third party that runs and manages AI bots for that purpose. Alternatively, the AI troubleshooting model could be bundled with the technology itself, making a kind of self-healing app.

You could even have the AI generate training packs tailored to the specific issue the individual was having.

You could also identify troubleshooting trends and issues affecting large numbers of people, and even produce a list of recommended improvements to increase reliability and user satisfaction.
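Spotting those trends could start as simply as counting issue categories across support sessions and flagging any category that accounts for a large share of them. The sketch below assumes each session has already been tagged with a hypothetical `issue` label (for example by the agent itself), and the 20% threshold is an arbitrary illustrative choice.

```python
from collections import Counter


def top_issues(tickets: list[dict], min_share: float = 0.2) -> list[tuple[str, int]]:
    """Return issue categories that make up at least `min_share` of all
    tickets, most common first. Each ticket is assumed to be a dict with a
    hypothetical "issue" key holding its category label."""
    counts = Counter(t["issue"] for t in tickets)
    total = sum(counts.values())
    return [(issue, n) for issue, n in counts.most_common() if n / total >= min_share]


# Hypothetical support log: 10 sessions across three issue categories.
tickets = (
    [{"issue": "network drop"}] * 5
    + [{"issue": "login failure"}] * 3
    + [{"issue": "display glitch"}] * 2
)
print(top_issues(tickets, min_share=0.25))
```

The flagged categories are natural candidates for the list of recommended improvements; a real system would probably also track how these counts change over time.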

However, I can only see this working as an outsourced model, since running it in-house would require an AI/MLOps capability, which is hard and potentially costly to build.

I found an interesting link describing one approach to solving this problem with prompts (link).
