Vy
Today, we're excited to share the first glimpse of Vercept with the world.
What is our vision?
We started Vercept with the mission of reinventing how humans use computers—enabling everyone to accomplish orders of magnitude more than what they can do today.
Human-computer interaction is overdue for a fundamental shift. No one should have to navigate a maze of menus or browse frustrating help forums just to do a simple task—using a computer should feel effortless, like commanding an extension of your mind.
We want to reshape how people interact with technology—enabling them to tackle problems once considered too complex, too time-consuming, or too technical to attempt. If you've ever dreamed of doing more with less effort and expanding what you're capable of, we're building Vercept for you.
Where are we today?
As researchers and builders, we've spent years inventing AI models that see and act in the world. Now, we've created one that understands your computer screen—and how to interact with it.
In just a few months, with a small, fast-moving team, we've developed a model that bridges vision, language, and action. It understands what's on your screen and intelligently interacts with the right UI elements, responding to your natural-language commands. It works across a wide range of software and platforms. While it's still in its early days, and there's much work left ahead, we're often surprised by how broadly and intuitively it already performs.
This isn't just a demo. It's the foundation for a completely new interaction paradigm, where computers respond to your intentions, not your clicks.
Introducing Vy, a native Mac app powered by our model's advanced interaction capabilities and frontier reasoning agents. Vy runs directly on your machine. It works with your actual software, on your screen. It doesn't require login credentials when assisting you—it can use any website and software that you're already signed in to. You tell it what to do, in your own words, and it gets things done—on your device, on your terms.
How does VyUI compare to competitors?
| Benchmark | VyUI | OpenAI | Google | Anthropic | Amazon | Best of the rest |
|---|---|---|---|---|---|---|
| ScreenSpot v1 | 92.0% | 18.3% (GPT-4o) | 84.0% (Project Mariner) | 82.9% (CUA) | - | 89.5% (UI-TARS-72B) |
| ScreenSpot v2 | 94.7% | 87.9% (Operator) | - | - | - | 94.2% (UI-TARS-1.5-72B) |
| ScreenSpot Pro | 63.0% | 23.4% (Operator) | - | 17.1% (CUA) | - | 61.6% (UI-TARS-1.5-72B) |
| Showdown Click (dev) | 78.5% | 64.3% (Operator) | 33.4% (Gemini 2.0 Flash) | 53.7% (3.7 Sonnet) | - | 77.6% (ACE medium) |
| GroundUI Web | 84.8% | 82.3% (Operator) | 35.2% (Gemini 1.5 Pro) | 82.5% (3.7 Sonnet) | 80.5% (Nova Act) | 64.3% (SeeClick) |
Note: A dash (-) indicates no result available for that model/benchmark combination. Model variants are shown in parentheses where applicable.