Generative Interactive Game World Modeling

Hunyuan-GameCraft-2

Instruction-following Interactive Game World Model

Research Team

Junshu Tang¹, Jiacheng Liu¹, Jiaqi Li¹, Longhuang Wu, Haoyu Yang, Penghao Zhao, Siruis Gong, Xiang Yuan, Shuai Shao, Qinglin Lu²

Tencent Hunyuan

¹Equal contribution · ²Corresponding author

We are looking for self-motivated interns focused on building game world models. Contact: juliatang@tencent.com.

Playground

Instruction-to-Action Playground

Abstract

Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis toward dynamic, interactive simulation. However, current approaches remain limited by rigid action schemas and high annotation costs, restricting their ability to model diverse in-game interactions and player-driven dynamics.

To address these challenges, we introduce Hunyuan-GameCraft-2, a new paradigm of instruction-driven interaction for generative game world modeling. Rather than relying solely on fixed keyboard inputs, our model lets users control game video content through natural-language prompts, keyboard signals, or mouse signals, enabling flexible and semantically rich interaction within generated worlds.

We formally define the concept of Interactive Video Data and develop an automated pipeline that converts large-scale, unstructured text–video pairs into causally aligned interactive datasets. Built upon a 14B-parameter image-to-video Mixture-of-Experts (MoE) foundation model, our model incorporates a text-driven interaction injection mechanism for fine-grained control over camera motion, character behavior, and environment dynamics.

Extensive experiments demonstrate that our model generates long-horizon, temporally coherent, and causally grounded interactive game videos that faithfully respond to diverse user instructions such as “open the door,” “draw a torch,” or “trigger an explosion.”

Long Video Generation Gallery

Interaction Video Gallery