
TESS Models

VLA models for computer use, trained to control desktop applications via mouse and keyboard.

TESS Models are Vision-Language-Action (VLA) models trained to operate computers the way a human does: they take screenshots as input and output mouse and keyboard actions.
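
A minimal sketch of what this screenshot-in, action-out interface could look like. The `Action` type and `predict_action` function are illustrative assumptions, not the actual TESS API.

```python
# Illustrative sketch only; names such as Action and predict_action are hypothetical.
from dataclasses import dataclass
from typing import Literal, Optional
from PIL import Image

@dataclass
class Action:
    kind: Literal["click", "move", "type", "key"]
    x: Optional[int] = None     # screen coordinates for mouse actions
    y: Optional[int] = None
    text: Optional[str] = None  # text to type for "type" actions
    key: Optional[str] = None   # key name for "key" actions, e.g. "enter"

def predict_action(model, screenshot: Image.Image, instruction: str) -> Action:
    """One inference step: screenshot + instruction in, a single action out."""
    return model(screenshot, instruction)
```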

Current Focus

  • Training VLA models on computer-use datasets
  • Building infrastructure for large-scale data collection
  • Benchmarking against existing computer-use agents

Architecture

TESS uses a vision encoder that processes screenshots, a language model that reasons over them, and an action head that predicts mouse coordinates and keyboard inputs.
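
A hedged sketch of that three-part layout: a vision encoder producing visual tokens, a language model backbone consuming them alongside text, and an action head decoding mouse coordinates and keyboard tokens. Module names, shapes, and the coordinate/key parameterization are assumptions for illustration, not the published TESS design.

```python
# Sketch of the described architecture; all module names and sizes are assumptions.
import torch
import torch.nn as nn

class ActionHead(nn.Module):
    def __init__(self, hidden_dim: int, num_keys: int = 128):
        super().__init__()
        self.coords = nn.Linear(hidden_dim, 2)       # normalized (x, y) in [0, 1]
        self.keys = nn.Linear(hidden_dim, num_keys)  # logits over keyboard tokens

    def forward(self, h: torch.Tensor):
        return torch.sigmoid(self.coords(h)), self.keys(h)

class TessLikeVLA(nn.Module):
    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module, hidden_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder  # screenshot -> visual tokens
        self.language_model = language_model  # visual + text tokens -> hidden states
        self.action_head = ActionHead(hidden_dim)

    def forward(self, screenshot: torch.Tensor, text_tokens: torch.Tensor):
        visual_tokens = self.vision_encoder(screenshot)
        hidden = self.language_model(visual_tokens, text_tokens)
        # Decode the next action from the final hidden state.
        return self.action_head(hidden[:, -1])
```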