Coming soon
CyberGym-E2E
End-to-end evaluation of AI agents across the full vulnerability lifecycle.
The newest addition to the CyberGym series moves beyond reproducing and exploiting individual vulnerabilities to evaluate agents across the entire attack lifecycle. The paper is available now; the full benchmark, datasets, and leaderboard are in active development.
Full benchmark and leaderboard: in progress.
Check back soon, or read the paper for early details.