CUA for Personalized Feeds.
Control your Mac with natural language. Type what you want, watch it happen.
Training diffusion models with flow matching to draw circles using QuickDraw data.
A System I diffusion head for precise click predictions on top of frozen VLMs.
World models for computer use evaluation.
Minecraft computer use agent that perceives at 20 Hz and acts at 30 Hz.
My journey into computer use models and what I've learned about training VLAs for digital agents.
Bipedal Walking Using RL: Deep Dive with Code
Moravec's Paradox: Spelled Out with Code
The Sim2Real Gap for Robot Locomotion Explained
Search maps in natural language for places you like!
24/7 agent that scrapes the internet for potential candidates for your organization.
LLM for enterprise codebase assistance.
My master's thesis with the Rainbow team at Inria, France.
Infra for large-scale web applications.
Vision-based navigation.
Song-to-song conversion powered by CycleGAN.