GAIA Agent Evaluation Runner (LangGraph)
- Log in to your Hugging Face account with the button below.
- Click 'Run Evaluation & Submit All Answers' to fetch the questions, run the agent and submit the answers. This can take several minutes.
- If the submission fails (the scoring server sometimes returns an HTTP 500 error), click 'Re-submit last answers' to retry without re-running the agent.
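
The re-submit step above can be pictured as caching the last payload and sending it again when the server responds with a 5xx status. The following is only a minimal sketch under stated assumptions, not this Space's actual code: the names `remember_answers`, `resubmit_last`, and `post`, as well as the payload shape, are hypothetical and chosen for illustration.

```python
import time
from typing import Callable, Optional

# Hypothetical in-memory cache of the most recent submission payload.
_last_payload: Optional[dict] = None


def remember_answers(username: str, answers: list) -> dict:
    """Store the payload so it can be re-sent without re-running the agent."""
    global _last_payload
    # Assumed payload shape; the real scoring API may expect other fields.
    _last_payload = {"username": username, "answers": answers}
    return _last_payload


def resubmit_last(post: Callable[[dict], int], retries: int = 3,
                  delay: float = 0.0) -> int:
    """Re-send the cached payload, retrying while the server answers 5xx.

    `post` is any callable that sends the payload and returns an HTTP
    status code (for example, a thin wrapper around an HTTP client).
    """
    if _last_payload is None:
        raise RuntimeError("No answers cached yet; run the evaluation first.")
    if retries < 1:
        raise ValueError("retries must be at least 1")

    status = 0
    for attempt in range(retries):
        status = post(_last_payload)
        if status < 500:
            break  # success or a client error: retrying will not help
        if attempt < retries - 1:
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    return status
```

Passing the sender in as `post` keeps the retry logic independent of any particular HTTP library, which also makes it easy to exercise with a fake sender.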
The model endpoint is preconfigured in config.py, so no secrets are required.
Make sure this Space is Public; otherwise, the scoring server can reject the submission with an HTTP 500 error.
Questions and Agent Answers