Vy

Today, we're excited to share the first glimpse of Vercept with the world.

What is our vision?

We started Vercept with the mission of reinventing how humans use computers—enabling everyone to accomplish orders of magnitude more than what they can do today.

Human-computer interaction is overdue for a fundamental shift. No one should have to navigate a maze of menus or browse frustrating help forums just to do a simple task—using a computer should feel effortless, like commanding an extension of your mind.

We want to reshape how people interact with technology—enabling them to tackle problems once considered too complex, too time-consuming, or too technical to attempt. If you've ever dreamed of doing more, with less effort, and expanding what you're capable of, we're building Vercept for you.

Where are we today?

As researchers and builders, we've spent years inventing AI models that see and act in the world. Now, we've created one that understands your computer screen—and how to interact with it.

In just a few months, with a small, fast-moving team, we've developed a model that bridges vision, language, and action. It understands what's on your screen and intelligently interacts with the right UI elements, responding to your natural-language commands. It works across a wide range of software and platforms. While it's still in its early days, and there's much work left ahead, we're often surprised by how broadly and intuitively it already performs.

This isn't just a demo. It's the foundation for a completely new interaction paradigm, where computers respond to your intentions, not your clicks.

Today, we're launching a technology preview: Vy, a native Mac app that combines our model's advanced interaction capabilities with frontier reasoning agents. Vy runs directly on your machine. It works with your actual software, on your screen. It doesn't require login credentials when assisting you—it can use any website and software that you're already signed in to. You tell it what to do, in your own words, and it gets things done—on your device, on your terms.

This is just the beginning.

We're releasing Vy to show what's possible and invite you to imagine what comes next. Whether you're a developer, an LLM power user, or someone fascinated by intelligent software agents, we believe you'll find something magical here.

In a few days, we'll open up the API so you can build with it too.

Download the app. Try it out. Tell us what works, what doesn't, and what you'd love to see next. We believe in building with the community—not just for it.

Let's bring this magical vision to life together.

Welcome to Vercept.

Questions you might ask

Who should use the Vy app?

  • Anyone interested in glimpsing the future of human-computer interaction.
  • Anyone wanting to streamline long or repetitive tasks.

Who should use the API?

The API (to be released soon) will allow developers to map natural language expressions to an (x, y) location in a screenshot. For example: "Click on the 'x' to close the popup" will be resolved by the API to the correct location, say "(55.7, 35.1)", enabling lower-level software (e.g., pyautogui, playwright, a native OS library) to close the popup. Developers looking for an API that deeply understands screen interfaces, identifies the correct UI elements, and responds accurately to natural-language commands should find our API useful. We envision developers using our API to build a wide range of products and applications, for example: automatic UI test suites, computer and web use agents, RPA solutions, and so on. The API is powered by our state-of-the-art model, VyUI, which we are improving every day. We generally expect it to be more capable and robust than approaches based on the popular set-of-marks paradigm.
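To make the flow concrete, here is a minimal Python sketch of the pattern described above: capture a screenshot, send it with a natural-language instruction to a grounding endpoint, receive an (x, y) location, and hand that to pyautogui to perform the click. The API isn't released yet, so the endpoint URL, request fields, and response shape below are placeholders for illustration, not the actual interface.

```python
# Illustrative sketch only: the endpoint URL, request fields, and response
# shape are assumptions, not the released Vercept API.
import requests
import pyautogui

API_URL = "https://api.example.com/v1/locate"  # hypothetical grounding endpoint
API_KEY = "YOUR_API_KEY"                       # hypothetical credential


def click_from_instruction(instruction: str) -> None:
    """Resolve a natural-language command to screen coordinates and click there."""
    # Capture the current screen so the model can ground the instruction in it.
    pyautogui.screenshot("screen.png")

    # Send the screenshot plus the instruction; assume the service returns
    # pixel coordinates such as {"x": 55.7, "y": 35.1}.
    with open("screen.png", "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"screenshot": f},
            data={"instruction": instruction},
            timeout=30,
        )
    resp.raise_for_status()
    point = resp.json()

    # Lower-level automation performs the actual click at the resolved location.
    pyautogui.click(point["x"], point["y"])


click_from_instruction("Click on the 'x' to close the popup")
```

A client built on Playwright or a native OS library would follow the same shape, substituting only the final click call.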

How does VyUI compare to competitors?

Benchmark          | VyUI  | OpenAI           | Google                   | Anthropic          | Amazon           | Best of the rest
ScreenSpot v1      | 92.0% | 18.3% (GPT-4o)   | 84.0% (Project Mariner)  | 82.9% (CUA)        | -                | 89.5% (UI-TARS-72B)
ScreenSpot v2      | 94.7% | 87.9% (Operator) | -                        | -                  | -                | 94.2% (UI-TARS-1.5-72B)
ScreenSpot Pro     | 50.1% | 23.4% (Operator) | -                        | 17.1% (CUA)        | -                | 61.6% (UI-TARS-1.5-72B)
Showdown Click dev | 78.5% | 64.3% (Operator) | 33.4% (Gemini 2.0 Flash) | 53.7% (3.7 Sonnet) | -                | 77.6% (ACE medium)
GroundUI Web       | 84.8% | 82.3% (Operator) | 35.2% (Gemini 1.5 Pro)   | 82.5% (3.7 Sonnet) | 80.5% (Nova Act) | 64.3% (SeeClick)

Note: Empty cells (-) indicate no data available for that model/benchmark combination.

Model variants are indicated in parentheses where applicable.

Limitations

  • Vy can make mistakes. We recommend monitoring Vy while it works, until you're confident that it can perform your task unmonitored.
  • Vy is slower than we'd like it to be. We expect it to get much faster in the future as we optimize it further and as frontier models become faster and more capable.
  • While Vy is performing a task, you will not be able to use your computer without interrupting it. Heavy users may want to run Vy in a virtualized OS, either on their own machine or in the cloud.