Unbounded: A Generative Infinite Game of
Character Life Simulation

1Google 2University of North Carolina, Chapel Hill
We follow the life of Archibus, the user's custom wizard character. The user can interact with the generative game using natural language, and Archibus' hunger, energy, and fun meters update accordingly. A spontaneous, unconstrained story unfolds as the user plays, and the character can explore new environments with a myriad of possible actions and unexpected interactions. The game runs at interactive speeds, refreshing every second.

Abstract

We introduce the concept of a generative infinite game, a video game that transcends the traditional boundaries of finite, hard-coded systems by using generative models. Inspired by James P. Carse's distinction between finite and infinite games, we leverage recent advances in generative AI to create Unbounded: a game of character life simulation that is fully encapsulated in generative models. Specifically, Unbounded draws inspiration from sandbox life simulations and allows you to interact with your autonomous virtual character in a virtual world by feeding it, playing with it, and guiding it, with open-ended mechanics generated by an LLM, some of which can be emergent. To develop Unbounded, we propose technical innovations in both the LLM and visual generation domains. Specifically, we present: (1) a specialized, distilled large language model (LLM) that dynamically generates game mechanics, narratives, and character interactions in real time, and (2) a new dynamic regional image prompt adapter (IP-Adapter) for vision models that ensures consistent yet flexible visual generation of a character across multiple environments. We evaluate our system through both qualitative and quantitative analysis, showing significant improvements in character life simulation, user instruction following, narrative coherence, and visual consistency of both characters and environments compared to traditional approaches.


Method

Task Overview

Based on an initial user input, Unbounded sets up the game simulation environments and generates character actions within them. Users can interact with the character through natural language instructions, exploring the game with unlimited options.
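To make this loop concrete, here is a minimal, illustrative sketch of what one interaction turn could look like, assuming a single game-engine LLM call per turn that returns a narrative beat, meter updates, and an image prompt. The function names, schema, meter values, and clamping rule are hypothetical stand-ins for exposition, not the paper's actual interface.

# Hypothetical sketch of one interaction turn in Unbounded-style play.
from dataclasses import dataclass, field

@dataclass
class GameState:
    environment: str = "wizard tower"
    hunger: float = 0.8   # meters assumed to live in [0, 1]
    energy: float = 0.7
    fun: float = 0.5
    history: list = field(default_factory=list)

def call_game_engine_llm(state: GameState, instruction: str) -> dict:
    """Stand-in for the specialized game-engine LLM: it returns the next
    narrative beat, meter deltas, and a prompt for the image generator."""
    return {
        "narrative": f"Archibus follows your request to '{instruction}'.",
        "meter_deltas": {"hunger": -0.1, "energy": -0.05, "fun": 0.2},
        "image_prompt": f"Archibus the wizard in the {state.environment}",
    }

def interaction_turn(state: GameState, instruction: str) -> str:
    out = call_game_engine_llm(state, instruction)
    # Apply the meter updates, clamped to [0, 1].
    for name, delta in out["meter_deltas"].items():
        setattr(state, name, min(1.0, max(0.0, getattr(state, name) + delta)))
    state.history.append((instruction, out["narrative"]))
    # In the full system this prompt would go to the consistent image generator.
    return out["image_prompt"]

state = GameState()
prompt = interaction_turn(state, "play fetch with Archibus")
print(prompt, state.hunger, state.energy, state.fun)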

Regional IP-Adapter with Block Drop for Environment Consistency


(a) We achieve real-time image generation with LCM LoRA, maintain character consistency with DreamBooth LoRAs, and introduce a regional IP-Adapter (shown in (c)) for improved environment and character consistency.

(b) Our proposed dynamic mask generation separates the environment and character conditioning, preventing interference between the two.

(c) Our approach introduces a dual-conditioning and dynamic regional injection mechanism to represent both the character and the environment simultaneously in the generated images.
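As a rough illustration of the dual-conditioning and regional injection idea, the toy sketch below blends a character-conditioned and an environment-conditioned cross-attention branch with a per-token mask: character conditioning is injected inside the mask, environment conditioning outside it. The tensor shapes, the bare-bones attention without heads or projections, and the random stand-in mask are assumptions for exposition, not the actual architecture.

# Toy sketch of regional IP-Adapter injection (assumed shapes and logic).
import torch

def toy_cross_attn(q, kv):
    # q: (B, N, C), kv: (B, M, C); bare-bones attention, no heads or projections.
    attn = torch.softmax(q @ kv.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ kv

def regional_ip_injection(hidden, char_tokens, env_tokens, char_mask):
    """hidden: (B, N, C) latent tokens; char_mask: (B, N, 1) in [0, 1],
    standing in for the dynamically generated character region."""
    char_out = toy_cross_attn(hidden, char_tokens)  # character IP branch
    env_out = toy_cross_attn(hidden, env_tokens)    # environment IP branch
    # Regional injection: character conditioning inside the mask,
    # environment conditioning outside it.
    return hidden + char_mask * char_out + (1.0 - char_mask) * env_out

B, N, M, C = 1, 64, 16, 32
hidden = torch.randn(B, N, C)
char_tokens, env_tokens = torch.randn(B, M, C), torch.randn(B, M, C)
char_mask = torch.rand(B, N, 1)  # stand-in for the dynamic mask from (b)
out = regional_ip_injection(hidden, char_tokens, env_tokens, char_mask)
print(out.shape)  # torch.Size([1, 64, 32])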


Language Model Game Engine with Open-Ended Interactions and Integrated Game Mechanics


Overview of our user-simulation data collection process for LLM distillation.

(a) We begin by collecting diverse topic and character data, filtered using ROUGE-L for diversity (see the sketch after this list).

(b) The World LLM and User LLM interact to generate user-simulation data through multi-round exchanges.
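The ROUGE-L diversity filter in step (a) can be pictured with the sketch below, which keeps a new topic only if its ROUGE-L F-score against every retained topic stays under a threshold. The LCS-based scorer, whitespace tokenization, and threshold value are illustrative assumptions, not the paper's exact setup.

# Illustrative ROUGE-L diversity filter over collected topics (assumed details).
def lcs_len(a, b):
    # Classic dynamic-programming longest common subsequence over word tokens.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f(cand, ref):
    c, r = cand.lower().split(), ref.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return 2 * p * rec / (p + rec)

def diversity_filter(candidates, threshold=0.7):
    """Keep a candidate only if it is not too similar (ROUGE-L F) to any
    already-retained item, so the distillation data covers diverse topics."""
    kept = []
    for cand in candidates:
        if all(rouge_l_f(cand, k) < threshold for k in kept):
            kept.append(cand)
    return kept

topics = ["a wizard exploring a crystal cave",
          "a wizard who explores a crystal cave",
          "a robot chef opening a noodle shop"]
print(diversity_filter(topics))  # drops the near-duplicate second topic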


Results

Comparison with Different Approaches for Maintaining Environment Consistency and Character Consistency

Our regional IP-Adapter with block drop consistently generates images with high character consistency, whereas other methods may fail to include the character or generate characters with inconsistent appearances (Examples 1 & 2). Furthermore, we show that our approach balances environment consistency and character consistency well, while other approaches may generate environments that differ from the environment used for conditioning (e.g., StoryDiffusion in Examples 1 & 3).

Effectiveness of Dynamic Regional IP-Adapter with Block Drop

Conditioning on the environment using IP-Adapter achieves good environment reconstruction, but the character consistency is influenced by the environment style. Introducing block drop improves adherence to the text prompt, resulting in images with the correct spatial layout for both the character and the environment. However, the character's appearance remains influenced by the surrounding environment. By incorporating our proposed regional injection mechanism with our proposed dynamic mask scheme, the generated images achieve strong character consistency while maintaining effective conditioning on the environment.
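A minimal sketch of the block-drop idea follows, assuming the image-prompt conditioning is simply skipped in a chosen subset of attention blocks while text conditioning is always applied, so the text prompt keeps control of the spatial layout there. The toy attention, block indices, and injection scale are illustrative assumptions rather than the exact implementation.

# Hypothetical block-drop sketch: skip IP conditioning in selected blocks.
import torch

def toy_cross_attn(q, kv):
    attn = torch.softmax(q @ kv.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ kv

def block_forward(hidden, text_tokens, image_tokens, block_idx, dropped_blocks,
                  ip_scale=0.6):
    # Text conditioning is applied in every block.
    out = hidden + toy_cross_attn(hidden, text_tokens)
    # Block drop: skip the image-prompt (IP-Adapter) branch in dropped blocks.
    if block_idx not in dropped_blocks:
        out = out + ip_scale * toy_cross_attn(hidden, image_tokens)
    return out

B, N, M, C = 1, 64, 16, 32
hidden = torch.randn(B, N, C)
text_tokens, image_tokens = torch.randn(B, M, C), torch.randn(B, M, C)
dropped = {0, 1}  # hypothetical choice: drop conditioning in the first blocks
for idx in range(4):
    hidden = block_forward(hidden, text_tokens, image_tokens, idx, dropped)
print(hidden.shape)  # torch.Size([1, 64, 32])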

Effectiveness of Distilling Specialized Large Language Model

Our diverse user-simulator interaction data effectively distills Gemma-2B into a capable game engine. In this table, zero-shot inference with small LLMs (i.e., Gemma-2B, Llama-3.2-3B) or a slightly larger LLM (i.e., Gemma-7B) results in lower performance than ours, highlighting the importance of distillation from a stronger LLM for game world and character action simulation. Furthermore, we show that our model achieves performance comparable to GPT-4o, validating the effectiveness of our approach.

More Generative Game Examples


BibTeX

@article{li2024unbounded,
  author    = {Jialu Li and Yuanzhen Li and Neal Wadhwa and Yael Pritch and David E. Jacobs and Michael Rubinstein and Mohit Bansal and Nataniel Ruiz},
  title     = {Unbounded: A Generative Infinite Game of Character Life Simulation},
  journal   = {arXiv preprint arXiv:2410.18975},
  year      = {2024},
  url       = {https://arxiv.org/abs/2410.18975}
}

Acknowledgement

We thank Shiran Zada, Peyman Milanfar, Shlomi Fruchter, and Michael Goin for their thoughtful feedback.