MinecraftVLA: Vision-Language-Action Model for Playing Minecraft
A Minecraft computer-use agent that perceives at 20Hz and acts at 30Hz.
Replicates the training recipe from ByteDance's Lumine paper on Minecraft. The stage 1/2/3 datasets were created by following Lumine's training phases, using Qwen3.
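
To make the 20Hz perception and 30Hz action rates concrete, here is a minimal timing sketch. It is not this project's control loop: the names `PERCEIVE_HZ`, `ACT_HZ`, and `schedule` are hypothetical, and it only assumes both streams start at t = 0 and tick at fixed intervals. Under those assumptions every 100 ms window holds 2 perception steps and 3 action steps.

```python
from fractions import Fraction

# Rates taken from the description above; the names are illustrative only.
PERCEIVE_HZ = 20
ACT_HZ = 30


def schedule(duration_s=1):
    """Return (time_in_seconds, kind) events for both streams, sorted by time.

    Assumes both streams start at t = 0 and tick at fixed intervals.
    Exact fractions are used so that coinciding timestamps compare equal.
    """
    events = []
    for i in range(PERCEIVE_HZ * duration_s):
        events.append((Fraction(i, PERCEIVE_HZ), "perceive"))
    for j in range(ACT_HZ * duration_s):
        events.append((Fraction(j, ACT_HZ), "act"))
    # When timestamps coincide, list the observation before the action.
    events.sort(key=lambda e: (e[0], e[1] != "perceive"))
    return events


if __name__ == "__main__":
    # Print the events that fall inside the first 100 ms.
    for t, kind in schedule():
        if t < Fraction(1, 10):
            print(f"{float(t) * 1000:6.1f} ms  {kind}")
```

The two streams line up every 100 ms, so a real loop built on these rates would need to decide which observation each action is conditioned on; that policy is not specified here.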

Datasets: