Embodied Agent Interface Challenge @ NeurIPS 2025

Welcome to the Embodied Agent Interface (EAI) Challenge, a NeurIPS 2025 competition that introduces a unified benchmarking framework for evaluating Large Language Models (LLMs) in embodied decision-making tasks. This competition aims to foster reproducible research and rigorous analysis in embodied AI, bridging the gap between language modeling and robotic planning.

📣 Announcements

🧠 Motivation

Despite growing interest in using LLMs for robotics and agent reasoning, current evaluations are fragmented and often limited to final task success rates. Success-only metrics do not reveal which specific reasoning steps break down, which limits both scientific understanding and practical progress.

The Embodied Agent Interface addresses this gap through a modular evaluation framework that standardizes task interfaces and metrics across four core decision-making abilities: goal interpretation, subgoal decomposition, action sequencing, and transition modeling.

🔬 What’s New?

🧪 Benchmark Overview

The benchmark dataset consists of:

🧩 Tasks & Abilities

Participants may compete in one or more of the following modules:

  1. Goal Interpretation: Translate natural language into formal symbolic goals.
  2. Subgoal Decomposition: Break down goals into executable substeps.
  3. Action Sequencing: Generate feasible action trajectories to accomplish goals.
  4. Transition Modeling: Infer preconditions and effects of symbolic actions.

Each module can be tackled independently, with leaderboards and evaluation scripts provided per module.
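To make the module boundaries concrete, here is a minimal sketch of a STRIPS-style symbolic world, assuming ground facts are plain strings and each action carries precondition, add, and delete sets. In this framing, a goal-interpretation output would be a set of goal facts, an action-sequencing output would be an ordered list of actions, and transition modeling would supply each action's preconditions and effects. The `Action` class, `apply`, `check_plan`, and the cup-grasping facts are hypothetical illustrations, not the EAI data format or its evaluation code.

```python
from dataclasses import dataclass


# Hypothetical STRIPS-style action (not the official EAI format):
# preconditions must hold before the action runs; add/delete effects
# describe how the action changes the symbolic world state.
@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset
    add_effects: frozenset = frozenset()
    del_effects: frozenset = frozenset()


def apply(state: frozenset, action: Action) -> frozenset:
    """Return the successor state, or raise if any precondition is unmet."""
    missing = action.preconditions - state
    if missing:
        raise ValueError(f"{action.name}: unmet preconditions {sorted(missing)}")
    return (state - action.del_effects) | action.add_effects


def check_plan(initial: frozenset, plan: list, goal: frozenset) -> tuple[bool, bool]:
    """Simulate a plan and return (executable, goal_reached).

    A plan that hits an unmet precondition is reported as not executable,
    and by convention its goal is treated as not reached.
    """
    state = initial
    for step in plan:
        try:
            state = apply(state, step)
        except ValueError:
            return False, False
    return True, goal <= state


# Hypothetical example: pick up a cup from a table.
walk_to_cup = Action("walk_to(cup)", frozenset(), frozenset({"near(cup)"}))
grasp_cup = Action(
    "grasp(cup)",
    preconditions=frozenset({"near(cup)", "hand_empty"}),
    add_effects=frozenset({"holding(cup)"}),
    del_effects=frozenset({"hand_empty", "on(cup, table)"}),
)
initial = frozenset({"on(cup, table)", "hand_empty"})
goal = frozenset({"holding(cup)"})  # e.g. the symbolic form of "pick up the cup"

print(check_plan(initial, [walk_to_cup, grasp_cup], goal))  # (True, True)
print(check_plan(initial, [grasp_cup], goal))               # (False, False)
print(check_plan(initial, [walk_to_cup], goal))             # (True, False)
```

Reporting executability separately from goal success follows the motivation above: a plan can fail because an action's preconditions were violated or because it runs cleanly but never satisfies the goal, and a single success rate cannot tell those cases apart.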

📊 Evaluation Metrics

We evaluate models on:

An aggregated Average Performance metric summarizes overall model capability across modules.
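As a sketch only, assuming Average Performance is an unweighted arithmetic mean of per-module scores on a shared scale, the aggregation reduces to the function below. The official formula, any weighting, and how modules a team did not enter are handled are not stated here; `average_performance` and the module keys are hypothetical names.

```python
def average_performance(module_scores: dict[str, float]) -> float:
    """Unweighted mean of per-module scores (an assumed aggregation rule)."""
    if not module_scores:
        raise ValueError("at least one module score is required")
    return sum(module_scores.values()) / len(module_scores)


# Placeholder scores for illustration only; these are not benchmark results.
scores = {
    "goal_interpretation": 60.0,
    "subgoal_decomposition": 40.0,
    "action_sequencing": 50.0,
    "transition_modeling": 70.0,
}
print(average_performance(scores))  # 55.0
```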

📅 Timeline

Phase                                            Dates (2025)
Beta Testing                                     July
Competition Launch                               August
Development Phase                                August to mid-November
Final Evaluation                                 Mid- to late November
NeurIPS 2025 Competition Track In-Person Event   Early December

📌 How to Participate

📣 Stay Connected

Let’s build the future of intelligent embodied agents together.