
MinecraftVLA: Vision-Language-Action Model for Playing Minecraft

A Minecraft computer-use agent that perceives at 20 Hz and acts at 30 Hz.
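To make the two rates concrete, here is a minimal sketch of how action ticks could line up with perceived frames. The 20 Hz and 30 Hz figures come from the description above; everything else is an assumption for illustration. In particular, the pairing rule (each action uses the most recent frame available at its tick) and the helper name `latest_frame_for_action` are hypothetical, not taken from the project.

```python
PERCEIVE_HZ = 20  # perception rate stated in the description
ACT_HZ = 30       # action rate stated in the description


def latest_frame_for_action(action_idx: int) -> int:
    """Return the index of the newest frame available at a given action tick.

    Assumption (not from the project): frame k arrives at k / PERCEIVE_HZ
    seconds, action i fires at i / ACT_HZ seconds, and an action reads the
    latest frame that has already arrived.
    """
    # floor((i / ACT_HZ) * PERCEIVE_HZ), done in integer arithmetic
    return (action_idx * PERCEIVE_HZ) // ACT_HZ


# Frame index used by each of the first six action ticks.
schedule = [latest_frame_for_action(i) for i in range(6)]
print(schedule)  # [0, 0, 1, 2, 2, 3]
```

Under this assumption, every three action ticks consume two frames, so some frames drive two consecutive actions and others drive one.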

Replicated the training recipe from ByteDance's Lumine paper on Minecraft, and created the stage 1/2/3 datasets by following Lumine's training phases, using Qwen3.

MinecraftVLA Training

Datasets: