GAIA Agent Evaluation Runner (LangGraph)

  1. Log in to your Hugging Face account with the button below.
  2. Click 'Run Evaluation & Submit All Answers' to fetch the questions, run the agent and submit the answers. This can take several minutes.
  3. If submission fails (the scoring server sometimes returns an HTTP 500 error), click 'Re-submit last answers' to retry without re-running the agent.
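
The retry in step 3 works only if the answers from the last run are kept somewhere. The sketch below shows one way that could be done; it is not this Space's actual code. The class name SubmissionCache, the submit callable (assumed to return an HTTP status code), and the field names task_id, question and submitted_answer are all hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Answer = Dict[str, str]


class SubmissionCache:
    """Keeps the most recent answers so they can be re-submitted
    without re-running the agent (hypothetical helper)."""

    def __init__(self, submit: Callable[[List[Answer]], int]) -> None:
        # `submit` is an assumed callable that sends the answers to the
        # scoring server and returns the HTTP status code it received.
        self._submit = submit
        self._last_answers: Optional[List[Answer]] = None

    def run_and_submit(
        self,
        questions: List[Dict[str, str]],
        agent: Callable[[str], str],
    ) -> int:
        # Run the agent once per question (the slow part).
        answers = [
            {"task_id": q["task_id"], "submitted_answer": agent(q["question"])}
            for q in questions
        ]
        # Cache before submitting, so a failed submission still leaves
        # the answers available for a later retry.
        self._last_answers = answers
        return self._submit(answers)

    def resubmit(self) -> int:
        # Re-send the cached answers; the agent is not called again.
        if self._last_answers is None:
            raise RuntimeError("No cached answers; run the evaluation first.")
        return self._submit(self._last_answers)
```

Caching the answers before the submit call, rather than after a successful one, is what makes the retry possible when the server responds with a 500.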

The model endpoint is preconfigured in config.py, so no secrets are required. Make sure this Space is set to Public; otherwise the scoring server may reject the submission with an HTTP 500 error.

Questions and Agent Answers
