Vy

Today, we're excited to share the first glimpse of Vercept with the world.

What is our vision?

We started Vercept with the mission of reinventing how humans use computers—enabling everyone to accomplish orders of magnitude more than what they can do today.

Human-computer interaction is overdue for a fundamental shift. No one should have to navigate a maze of menus or browse frustrating help forums just to do a simple task—using a computer should feel effortless, like commanding an extension of your mind.

We want to reshape how people interact with technology—enabling them to tackle problems once considered too complex, too time-consuming, or too technical to attempt. If you've ever dreamed of doing more, with less effort, and expanding what you're capable of, we're building Vercept for you.

Where are we today?

As researchers and builders, we've spent years inventing AI models that see and act in the world. Now, we've created one that understands your computer screen—and how to interact with it.

In just a few months, with a small, fast-moving team, we've developed a model that bridges vision, language, and action. It understands what's on your screen and intelligently interacts with the right UI elements in response to your natural-language commands, across a wide range of software and platforms. It's still early days and there's much work ahead, but we're often surprised by how broadly and intuitively it already performs.

This isn't just a demo. It's the foundation for a completely new interaction paradigm, where computers respond to your intentions, not your clicks.

Today, we're launching a technology preview: Vy, a native Mac app powered by our model's advanced interaction capabilities and frontier reasoning agents. Vy runs directly on your machine. It works with your actual software, on your screen. It doesn't require login credentials when assisting you—it can use any website or application you're already signed in to. You tell it what to do, in your own words, and it gets things done—on your device, on your terms.

This is just the beginning.

We're releasing Vy to show what's possible and invite you to imagine what comes next. Whether you're a developer, an LLM power user, or someone fascinated by intelligent software agents, we believe you'll find something magical here.

In a few days, we'll open up the API so you can build with it too.

Download the app. Try it out. Tell us what works, what doesn't, and what you'd love to see next. We believe in building with the community—not just for it.

Let's bring this magical vision to life together.

Welcome to Vercept.

Questions you might ask

Who should use the Vy app?

  • Anyone interested in glimpsing the future of human-computer interaction.
  • Anyone wanting to streamline long or repetitive tasks.

Who should use the API?

The API (to be released soon) will allow developers to map natural-language expressions to an (x, y) location in a screenshot. For example, "Click on the 'x' to close the popup" will be resolved by the API to the correct location, say (55.7, 35.1), enabling lower-level software (e.g., pyautogui, playwright, a native OS library) to close the popup.

Developers looking for an API that deeply understands screen interfaces, identifies the correct UI elements, and responds accurately to natural-language commands should find our API useful. We envision developers using it to build a wide range of products and applications: automated UI test suites, computer- and web-use agents, robotic process automation (RPA) solutions, and more.

The API is powered by our state-of-the-art model, VyUI, which we are improving every day. We generally expect it to be more capable and robust than approaches based on the popular set-of-marks paradigm.
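To make this concrete, here is a minimal sketch of what building on such a grounding API could look like from Python. The endpoint URL, request fields, and response schema below are purely illustrative assumptions (the API isn't public yet); only the pyautogui calls are real, and the coordinate convention (percentages of the screen) mirrors the example above.

```python
# Hypothetical sketch: ground a natural-language command to an (x, y) screen
# location, then click it with pyautogui. The endpoint, payload, and response
# format are assumptions for illustration, not the released API.
import io

import pyautogui
import requests

API_URL = "https://api.vercept.example/v1/ground"  # placeholder URL

def resolve_and_click(instruction: str) -> None:
    """Resolve `instruction` to a screen location and click it."""
    # Capture the current screen as an in-memory PNG.
    buf = io.BytesIO()
    pyautogui.screenshot().save(buf, format="PNG")
    buf.seek(0)

    # Assumed request: screenshot + command in; (x, y) percentages out,
    # e.g. {"x": 55.7, "y": 35.1}.
    resp = requests.post(
        API_URL,
        files={"screenshot": ("screen.png", buf, "image/png")},
        data={"instruction": instruction},
        timeout=30,
    )
    resp.raise_for_status()
    point = resp.json()

    # Map percentage coordinates onto this display and click.
    width, height = pyautogui.size()
    pyautogui.click(width * point["x"] / 100, height * point["y"] / 100)

resolve_and_click("Click on the 'x' to close the popup")
```

The same resolved location could just as easily drive playwright's mouse APIs or a native OS event library instead of pyautogui.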

How does VyUI compare to competitors?

| Benchmark | VyUI | OpenAI | Google | Anthropic | Amazon | Best of the rest |
|---|---|---|---|---|---|---|
| ScreenSpot v1 | 92.0% | 18.3% (GPT-4o) | 84.0% (Project Mariner) | 82.9% (CUA) | - | 89.5% (UI-TARS-72B) |
| ScreenSpot v2 | 94.7% | 87.9% (Operator) | - | - | - | 94.2% (UI-TARS-1.5-72B) |
| ScreenSpot Pro | 63.0% | 23.4% (Operator) | - | 17.1% (CUA) | - | 61.6% (UI-TARS-1.5-72B) |
| Showdown Click dev | 78.5% | 64.3% (Operator) | 33.4% (Gemini 2.0 Flash) | 53.7% (3.7 Sonnet) | - | 77.6% (ACE medium) |
| GroundUI Web | 84.8% | 82.3% (Operator) | 35.2% (Gemini 1.5 Pro) | 82.5% (3.7 Sonnet) | 80.5% (Nova Act) | 64.3% (SeeClick) |

Note: Empty cells (-) indicate no data available for that model/benchmark combination. Model variants are shown in parentheses where applicable.

Limitations

  • Vy can make mistakes. We recommend monitoring Vy while it works, until you're confident that it can perform your task unmonitored.
  • Vy is slower than we'd like it to be. We expect it to get much faster in the future as we optimize it further and as frontier models become faster and more capable.
  • While Vy is performing a task, you will not be able to use your computer without interrupting it. Heavy users may want to run Vy in a virtualized OS, either on their own machine or in the cloud.

Meet Our Team

Our founding team is world-class at building AI.

Kiana Ehsani

Co-founder

Previously Senior Research Scientist at AI2, leading research in computer vision, deep learning, and AI agents. Recent best papers at CoRL 2024 (PoliFormer), IROS 2024 (HarmonicMM), and ICRA 2024 (RT-X). Ph.D. from UW CSE.

Matt Deitke

Co-founder

Led development of major AI research projects including Molmo, ProcTHOR (NeurIPS Outstanding Paper), and Objaverse. Joined AI2 at 18; Ph.D. dropout from UW CSE.

Luca Weihs

Co-founder

Previously Research Manager and Infrastructure Team Lead at AI2. Led work in AI agents and reinforcement learning. Recent work includes PoliFormer, SPOC, and AllenAct. UW Stats Ph.D. and UC Berkeley Math valedictorian.

Ross Girshick

Co-founder

The 18th most cited person in the history of science. Pioneered computer vision with deep learning (Faster R-CNN, Mask R-CNN, Segment Anything). Previously Research Scientist at Meta AI and AI2, and Postdoc at UC Berkeley.

Oren Etzioni

Co-founder

Founding CEO of AI2 (2013-2022). Pioneer in natural language processing and machine learning. Professor Emeritus at UW with h-index of 100+. Founded multiple successful companies including Farecast (acquired by Microsoft).

Kuo-Hao Zeng

Founding Research Scientist

Previously Research Scientist at AI2. Trains policies with IL/RL in simulation and deploys them to the real world. Led the CoRL 2024 Best Paper (PoliFormer). Ph.D. from UW CSE.

Eric Kolve

Founding Engineer

Previously Principal Software Engineer at the Allen Institute for Artificial Intelligence (AI2), contributing to cutting-edge research in embodied AI and collaborative agent systems.

Harshitha Rebala

Member of Technical Staff

Undergraduate at the University of Washington studying Computer Science. Previously a software engineering intern at Microsoft and a machine learning researcher at KurtLab, UW.

Cam Sloan

Founding Frontend Engineer

Cam Sloan is a software engineer and entrepreneur. He previously co-founded Hopscotch, a user onboarding platform for SaaS companies.

You

Dream role

Self-driven and passionate about building cutting-edge AI products. You'll help bring our advanced AI systems to millions of users worldwide.