• GatlingX | Blog

Beyond benchmarks: Hackbot finds real zero-days without AI slop

Battle-tested in real bug-finding competitions. No empty promises, no wasted time. Exploit PoCs included. How Hackbot achieves zero-day bug hunting without AI slop.

AI slop is eating the world. AI benchmarks for bug finding look impressive, but false-positive rates are so high that verifying AI findings can take longer than finding the bugs yourself.

So where are the real-world results? We are excited to share Hackbot's recent zero-day discoveries, which showcase its exploit PoC generation and its ability to find novel smart contract vulnerabilities beyond benchmarks.

Takeaways

  • Hackbot identifies vulnerabilities that professional security auditors missed in real-world contest environments, going beyond results obtained in experimental benchmarks or lab settings.

  • Hackbot autonomously generates exploit code with no human intervention.

  • The Hackbot CLI is now available in open beta.

#1: Hackbot Beats Professional Humans at Novel Vulnerability Discovery outside of Labs and Benchmarks 

Our earlier work demonstrated that Hackbot outperformed the top 10% of human security researchers in experimental benchmarks. However, benchmark performance often fails to accurately reflect real-world conditions, due to the rapidly evolving landscape of smart contracts. To fill this gap, we evaluate Hackbot in live security competitions, where Hackbot's results can translate into real economic value. Our goal is to demonstrate that Hackbot can effectively function as a security researcher.

Table 2. Bug detection results from Hackbot across seven competitions.

Bug  Target              Severity  Link        Root Cause
1    Forte               High      audit link  Logical error
2    Mighty Finance      High      audit link  Arithmetic error
3    Virtuals Protocol   Low       audit link  Incorrect prices from broken oracles
4    Blackhole           Low       audit link  Arithmetic error
5    Panoptic Hypovault  Low       audit link  Missing restricted checks
6    Space and Time      Low       audit link  Incorrect decimals
7    Space and Time      Low       audit link  Incorrect surplus calculation

Table 2 summarises Hackbot's findings from seven public competitions over the past two months. Overall, Hackbot identified 11 vulnerabilities, 7 of which were confirmed by the developers. Two are high-severity and have earned bug bounty rewards. On average, Hackbot found 1–2 vulnerabilities per competition. We present selected screenshots of our findings with sensitive information removed.

[Screenshots of findings: Forte; Mighty Finance; Virtuals Protocol; Blackhole; Panoptic Hypovault; Space and Time (two findings)]

These findings earned bug bounty rewards from the smart contract developers. We have redacted some technical details to avoid issues with the platforms, but the results speak for themselves. We believe they mark a milestone: Hackbot is not just an academic proof of concept; it is a real, economically viable agent in the wild.
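As a quick sanity check on the figures quoted for Table 2 (11 vulnerabilities found, 7 confirmed, 7 competitions), the short calculation below reproduces the "1–2 per competition" average. The variable names are ours, and the confirmation rate is derived from those numbers rather than stated in the post.

```python
from fractions import Fraction

# Figures quoted in the text accompanying Table 2.
FOUND = 11         # vulnerabilities Hackbot reported
CONFIRMED = 7      # vulnerabilities confirmed by developers
COMPETITIONS = 7   # public competitions entered

# Exact ratios, kept as fractions to avoid rounding surprises.
found_per_competition = Fraction(FOUND, COMPETITIONS)
confirmed_per_competition = Fraction(CONFIRMED, COMPETITIONS)
confirmation_rate = Fraction(CONFIRMED, FOUND)

print(f"found per competition:     {float(found_per_competition):.2f}")
print(f"confirmed per competition: {float(confirmed_per_competition):.2f}")
print(f"confirmation rate:         {float(confirmation_rate):.1%}")
```

Both per-competition averages fall between 1 and 2, which matches the range given in the text.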

#2: Hackbot can exploit code and steal assets

Hackbot is an autonomous system designed to identify vulnerabilities in code, exploit them autonomously, and extract cryptocurrency from victims. Its core capabilities aim to demonstrate how advanced AI-driven agents might evolve in the context of offensive cybersecurity.

A key feature of Hackbot is its ability to auto-synthesise Proofs of Concept (PoCs) for identified vulnerabilities. These PoCs serve three critical purposes:

  1. False Positive Reduction: By synthesising and executing PoCs, Hackbot can validate whether a suspected vulnerability is exploitable, significantly reducing false positives that are common in static scanning methods.

  2. Reproduction in an On-chain Environment: Hackbot generates PoCs as Foundry (forge) tests, which can be executed on a private fork to validate the bug and estimate the real loss.

  3. Prerequisite for Contest Participation: In public competitions and audits, validated PoCs are often required for a finding to qualify. PoCs also help smart contract developers locate and reproduce the bug.

This capability allows Hackbot to autonomously validate its own findings, making it not only a discovery tool but a self-verifying exploitation engine. 
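The false-positive filtering idea can be sketched as follows. This is a minimal illustration, not Hackbot's actual implementation: the `Finding` class, `run_forge_poc`, and `confirm_findings` are hypothetical names, and the sketch assumes each PoC is a forge test that passes only when the exploit succeeds. It relies only on the `--match-test` and `--fork-url` options of `forge test`.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Finding:
    """A suspected vulnerability and its PoC test (hypothetical structure)."""
    title: str
    poc_test: str  # name of the forge test function that exercises the exploit


def run_forge_poc(finding: Finding, repo_path: str,
                  fork_url: Optional[str] = None) -> bool:
    """Run one PoC with `forge test`; True means forge exited with code 0."""
    cmd = ["forge", "test", "--match-test", finding.poc_test]
    if fork_url:
        # Execute against a fork of chain state served by this RPC endpoint.
        cmd += ["--fork-url", fork_url]
    result = subprocess.run(cmd, cwd=repo_path, capture_output=True, text=True)
    return result.returncode == 0


def confirm_findings(findings: List[Finding],
                     runner: Callable[[Finding], bool]) -> List[Finding]:
    """Keep only findings whose PoC runner reports success."""
    return [f for f in findings if runner(f)]


if __name__ == "__main__":
    # Stub runner so the sketch runs without Foundry installed.
    candidates = [Finding("suspected rounding bug", "test_poc_rounding"),
                  Finding("suspected access bug", "test_poc_access")]
    confirmed = confirm_findings(
        candidates, lambda f: f.poc_test == "test_poc_rounding")
    print([f.title for f in confirmed])
```

Passing the runner in as a function keeps the filtering logic separate from the Foundry invocation, so in practice `runner` could be `lambda f: run_forge_poc(f, repo_path, fork_url)`.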

#3: Hackbot CLI: Bringing superhuman hackbot capabilities to all

The Hackbot CLI installs with a single pipx command:

pipx install hackbot

To run Hackbot on a repository, use the following command:

hackbot run --api-key <your_hackbot_key> -s <your_repo_path>

Hackbot shows the estimated cost, and the bug reports become available once you accept the bill.
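For scripted use, the same command can be assembled from Python. This is a minimal sketch that uses only the `run`, `--api-key`, and `-s` arguments shown above; the `HACKBOT_API_KEY` environment variable and both function names are our own assumptions, not part of the CLI.

```python
import os
import subprocess
from typing import List


def build_hackbot_command(api_key: str, repo_path: str) -> List[str]:
    """Assemble the `hackbot run` invocation shown in the post."""
    return ["hackbot", "run", "--api-key", api_key, "-s", repo_path]


def run_hackbot(repo_path: str) -> int:
    """Run Hackbot on a repository and return the process exit code."""
    # HACKBOT_API_KEY is a hypothetical variable name chosen for this sketch;
    # reading the key from the environment keeps it out of scripts.
    api_key = os.environ["HACKBOT_API_KEY"]
    # stdin and stdout are inherited, so any cost prompt stays interactive.
    return subprocess.run(build_hackbot_command(api_key, repo_path)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, with a placeholder key.
    print(" ".join(build_hackbot_command("<your_hackbot_key>", "./my-repo")))
```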

Limitations of the Current Hackbot

  • Nowhere close to 100% coverage. We still need to improve Hackbot to cover as many vulnerabilities as possible.

  • Foundry only. Currently, Hackbot supports only smart contract repositories built with Foundry. Other frameworks, such as Hardhat, and other smart contract ecosystems and languages, such as Solana and Move, are not yet supported. Please contact us if you are interested in support for a specific language or framework.

Significance of Hackbot Capability

Accelerated Development and Product Velocity: Hackbot analyses large and complex smart contracts in ~90 minutes, which is significantly faster than human auditors, who may require days.

Cost Efficiency: Automated vulnerability detection reduces the need for manual code reviews, lowering security assessment costs.

Enhanced Trust: Contracts reviewed by Hackbot can improve confidence among users, developers, and partners.

Conclusion

In conclusion, we have outlined how Hackbot discovered vulnerabilities that no other professional security researcher had found, and how it earns money on public contest platforms just as humans would, but in roughly 1/100th of the time and at roughly 1/10th to 1/100th of the cost. We expect there is significant further progress to unlock.

We will post periodic updates on the vulnerabilities we discover. To keep the affected protocols safe, our announcements of novel vulnerabilities will always lag discovery by approximately two months.

Developers and investors seeking enhanced security and robust blockchain applications can contact us at [email protected] to learn how Hackbot can elevate their smart contract security.