Active
TESS Models
VLA models for computer use, trained to control desktop applications via mouse and keyboard.
TESS Models are Vision-Language-Action models trained to control computers the way humans do: they take screenshots as input and output mouse and keyboard actions.
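Concretely, the screenshot-in / action-out loop can be sketched as a small schema. The action kinds, field names, and `step` function below are illustrative assumptions, not the models' actual interface:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical action schema -- the real TESS action space and field
# names are not specified here; this is only a sketch of the idea.
@dataclass
class Action:
    kind: Literal["click", "move", "type", "key", "scroll"]
    x: Optional[int] = None     # screen coordinates for mouse actions
    y: Optional[int] = None
    text: Optional[str] = None  # text payload for "type" actions
    key: Optional[str] = None   # key name for "key" actions, e.g. "Enter"

def step(screenshot_png: bytes) -> Action:
    """One agent step: screenshot in, action out.

    Placeholder for model inference -- this stub always clicks the
    center of a 1920x1080 display.
    """
    return Action(kind="click", x=960, y=540)

action = step(b"\x89PNG...")
```

An agent loop would repeat this step: capture a screenshot, predict an action, execute it with an OS-level input driver, and observe the new screen.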
Current Focus
- Training VLA models on computer-use datasets
- Building infrastructure for large-scale data collection
- Benchmarking against existing computer-use agents
Architecture
TESS uses a vision encoder to process screenshots, a language model to reason over them, and an action head that predicts mouse coordinates and keyboard inputs.