Unity: A General Platform for Intelligent Agents

Arthur Juliani; Vincent-Pierre Berges; Ervin Teng; Andrew Cohen; Jonathan Harper; Chris Elion; Chris Goy; Yuan Gao; Hunter Henry; Marwan Mattar; Danny Lange

TL;DR

RL progress hinges on rich, configurable simulators; existing platforms often trade realism for speed or flexibility. The authors propose a taxonomy of simulators and argue that modern game engines, exemplified by Unity with the ML-Agents Toolkit, constitute a general platform for learning environments with rich sensory, physical, task, and social complexity. They analyze Unity's capabilities, present the toolkit architecture, and survey research enabled by Unity to highlight opportunities and current bottlenecks. The paper also outlines future directions, including effective learning environments, human-in-the-loop training, and scaling agent-human collaboration to support robust progress toward general intelligence.

Abstract

Recent advances in artificial intelligence have been driven by the presence of increasingly realistic and complex simulated environments. However, many of the existing environments provide unrealistic visuals, inaccurate physics, low task complexity, restricted agent perspective, or a limited capacity for interaction among artificial agents. Furthermore, many platforms lack the ability to flexibly configure the simulation, making the simulated environment a black box from the perspective of the learning system. In this work, we propose a novel taxonomy of existing simulation platforms and discuss the highest-level class of general platforms, which enable the development of learning environments that are rich in visual, physical, task, and social complexity. We argue that modern game engines are uniquely suited to act as general platforms and, as a case study, examine the Unity engine and the open-source Unity ML-Agents Toolkit. We then survey the research enabled by Unity and the Unity ML-Agents Toolkit, discussing the kinds of research a flexible, interactive, and easily configurable general platform can facilitate.

Paper Structure

This paper contains 30 sections, 6 figures, and 4 tables.

Introduction
Anatomy of Environments and Simulators
Environment Properties
Simulation Properties
A Survey of Existing Simulators
Common Simulators
Arcade Learning Environment
DeepMind Lab
Project Malmo
Physics Simulators
VizDoom
The Unity Platform
Engine Properties
Environment Properties
Simulation Properties
...and 15 more sections

Figures (6)

Figure 1: The Unity Editor window on macOS.
Figure 2: A Learning Environment (as of version 1.0) created using the Unity Editor contains Agents and an Academy. The Agents are responsible for collecting observations and executing actions. The Academy is responsible for global coordination of the environment simulation.
Figure 3: Images of the fifteen included example environments as of the v0.11 release of the Unity ML-Agents Toolkit. From left to right, top to bottom: (a) Basic, (b) 3DBall, (c) Crawler, (d) Push Block, (e) Tennis, (f) Worm, (g) Bouncer, (h) Grid World, (i) Walker, (j) Reacher, (k) Food Collector, (l) Pyramids, (m) Wall Jump, (n) Hallway, (o) Soccer Twos.
Figure 4: Mean cumulative episodic reward (y-axis) over simulation time-steps (in thousands, x-axis) during training and evaluation. We compare the performance of PPO (blue line) and SAC (red line). Results are based on five separate runs and are shown with 95% confidence intervals. LSTM indicates that an LSTM unit is used in the network. ICM indicates that the Intrinsic Curiosity Module is used during training.
Figure 5: Mean episodic ELO (y-axis) over simulation time-steps (in thousands, x-axis) during training with Self-Play and PPO. In symmetric environments, the ELO of the learning policy is plotted (blue line); in asymmetric environments, the ELO of both learning policies is plotted (blue and red lines). Results are based on five separate runs and are shown with 95% confidence intervals.
...and 1 more figure
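Figure 5 reports self-play progress as ELO (Elo) ratings. For readers unfamiliar with the metric, the following is a minimal sketch of the standard two-player Elo update. It is not taken from the ML-Agents codebase; the K-factor of 16 and the starting rating of 1200 are illustrative assumptions, not values stated in the paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model.

    Ranges from 0 to 1; equal ratings give 0.5.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 16.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    k (assumed to be 16 here) scales how far a single result moves ratings.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two evenly matched policies at an assumed starting rating of 1200.
    learner, opponent = update_elo(1200.0, 1200.0, score_a=1.0)
    print(learner, opponent)
```

With equal ratings and K = 16, a single win moves the ratings to 1208 and 1192. Because each update depends on the gap between the actual and expected result, a learning policy's rating keeps rising only while it beats its opponents more often than its current rating predicts.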