Embodied Agent Interface Challenge @ NeurIPS 2025

Welcome to the Embodied Agent Interface (EAI) Challenge, a NeurIPS 2025 competition that introduces a unified benchmarking framework for evaluating Large Language Models (LLMs) in embodied decision-making tasks. This competition aims to foster reproducible research and rigorous analysis in embodied AI, bridging the gap between language modeling and robotic planning.

🧠 Motivation

Despite increasing interest in using LLMs for robotics and agent reasoning, current evaluations are fragmented and often limited to final task success rates. These approaches fail to reveal specific reasoning failures, limiting scientific understanding and practical progress.

The Embodied Agent Interface addresses this gap through a modular evaluation framework that standardizes task interfaces and metrics across four core decision-making abilities: goal interpretation, subgoal decomposition, action sequencing, and transition modeling.

🔬 What's New?

🧪 Benchmark Overview

The benchmark dataset consists of:

All data, annotations, and code will be released through our GitHub repository and Hugging Face Datasets.

🧩 Tasks & Abilities

Participants may compete in one or more of the following modules:

  1. Goal Interpretation: Translate natural language into formal symbolic goals.
  2. Subgoal Decomposition: Break down goals into executable substeps.
  3. Action Sequencing: Generate feasible action trajectories to accomplish goals.
  4. Transition Modeling: Infer preconditions and effects of symbolic actions.

Each module can be tackled independently, with leaderboards and evaluation scripts provided per module.
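To make the four abilities concrete, here is a minimal sketch of how a symbolic goal, actions with preconditions and effects, and an action sequence fit together. It assumes a STRIPS-style representation in which a state is a set of predicates. The predicate names, the `Action` class, and the `apply_action` and `run_plan` helpers are all hypothetical illustrations, not the challenge's actual data format or evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A symbolic action (hypothetical format, for illustration only)."""
    name: str
    preconditions: frozenset  # predicates that must hold before execution
    add_effects: frozenset    # predicates made true by the action
    del_effects: frozenset    # predicates made false by the action


def apply_action(state, action):
    """Return the successor state, or None if a precondition is unmet."""
    if not action.preconditions <= state:
        return None
    return (state - action.del_effects) | action.add_effects


def run_plan(state, plan, goal):
    """Return (executable, goal_reached) for a sequence of actions."""
    for action in plan:
        state = apply_action(state, action)
        if state is None:
            return False, False
    return True, goal <= state


# Transition modeling: each action lists its preconditions and effects.
pick_up = Action(
    name="pick_up(cup)",
    preconditions=frozenset({("on", "cup", "table"), ("hand_empty",)}),
    add_effects=frozenset({("holding", "cup")}),
    del_effects=frozenset({("on", "cup", "table"), ("hand_empty",)}),
)
place = Action(
    name="place(cup, sink)",
    preconditions=frozenset({("holding", "cup")}),
    add_effects=frozenset({("in", "cup", "sink"), ("hand_empty",)}),
    del_effects=frozenset({("holding", "cup")}),
)

# Goal interpretation: "put the cup in the sink" as a symbolic goal.
initial_state = frozenset({("on", "cup", "table"), ("hand_empty",)})
goal = frozenset({("in", "cup", "sink")})

# Action sequencing: a candidate trajectory, checked for feasibility.
executable, goal_reached = run_plan(initial_state, [pick_up, place], goal)
print(executable, goal_reached)
```

In this toy setup, swapping the two actions makes the plan non-executable, because `place` requires the cup to be held first. Distinguishing that kind of failure from a plan that runs but misses the goal is the sort of fine-grained signal a final success rate alone does not provide.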

📊 Evaluation Metrics

We evaluate models on:

An aggregated Average Performance metric summarizes overall model capability across modules.
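As a rough illustration of how such an aggregate could be computed, the sketch below takes an unweighted mean of one score per module. The equal weighting, the module keys, the `average_performance` helper, and the example scores are all assumptions made for illustration; the official metric definition may differ.

```python
from statistics import mean

# Hypothetical module keys, named after the four abilities listed above.
MODULES = (
    "goal_interpretation",
    "subgoal_decomposition",
    "action_sequencing",
    "transition_modeling",
)


def average_performance(scores):
    """Unweighted mean of per-module scores (assumed aggregation rule)."""
    missing = [m for m in MODULES if m not in scores]
    if missing:
        raise ValueError(f"missing module scores: {missing}")
    return mean([scores[m] for m in MODULES])


# Made-up scores, not real benchmark results.
example_scores = {
    "goal_interpretation": 0.5,
    "subgoal_decomposition": 0.5,
    "action_sequencing": 1.0,
    "transition_modeling": 0.0,
}
print(average_performance(example_scores))
```

Requiring a score for every module keeps the aggregate comparable across entries; how the challenge actually handles participants who enter only some modules is not specified here.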

🚀 Baselines & Starter Kit

We provide baseline implementations using open and proprietary LLMs:

A comprehensive starter kit will include:

📅 Timeline

Phase                 Dates (2025)
Beta Testing          July
Competition Launch    August
Development Phase     August – Mid-October
Final Evaluation      Mid–Late October
NeurIPS Workshop      November

🏆 Awards & Recognition

📌 How to Participate

📣 Stay Connected

Let's build the future of intelligent embodied agents together.