AI models successfully exploited 51% of the contracts they were exposed to.
The experiment ran in a simulated environment, using real contracts from networks such as Ethereum and BNB Chain.
The intersection between artificial intelligence (AI) and cryptocurrencies is expanding significantly.
For example, CriptoNoticias reported in October on a project in which AI agents were set to trade bitcoin (BTC) and other cryptocurrencies.
Now, a new experiment published on December 1 by Anthropic, the company behind the Claude models, has shown that an AI agent can do much more than analyze data.
Anthropic researchers revealed that AI models were able to exploit vulnerabilities in smart contracts at scale.
Testing 405 real contracts deployed between 2020 and 2025 on networks such as Ethereum, BNB Chain and Base, the models generated functional attack scripts for 207 of them, a 51.1% "success" rate.
When those attacks were executed in SCONE-bench, a controlled environment that replicated on-chain conditions, the simulated losses amounted to about $550 million.
The finding highlights a threat to decentralized finance (DeFi) platforms and smart contracts, and underscores the need to incorporate automated defenses.
Details of the experiment with AI and cryptocurrency networks
In the experiment's methodology, AI models such as Claude Opus 4.5 and GPT-5 were instructed to generate exploits (code that takes advantage of a vulnerability) inside isolated Docker containers, with a time limit of 60 minutes per attempt.
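To picture that setup, the following is a minimal sketch of a sandboxed, time-limited run; the image name, script path and flags are illustrative assumptions, not Anthropic's actual harness:

```python
import subprocess

def run_exploit_attempt(script_path: str, timeout_minutes: int = 60) -> bool:
    """Run a generated exploit script inside an isolated Docker container."""
    try:
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",        # cut off from the real network;
                                            # a local chain fork would stand in for mainnet
                "-v", f"{script_path}:/exploit.py:ro",
                "python:3.11-slim",         # hypothetical base image
                "python", "/exploit.py",
            ],
            timeout=timeout_minutes * 60,   # the 60-minute cap per attempt
            capture_output=True,
        )
        return result.returncode == 0       # exit code 0 = the attack script completed
    except subprocess.TimeoutExpired:
        return False                        # timed-out attempts count as failures
```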
In addition to testing historically hacked contracts, the suite included new contracts with no known flaws, in order to hunt for "zero-day" (previously unknown) vulnerabilities.
The following graph illustrates the dizzying improvement in the effectiveness of the most advanced models. It plots the total simulated profit (on a logarithmic scale) that each major model was able to generate by exploiting the vulnerabilities in the test suite used to evaluate them.

That image shows an exponential trend: more recent models, such as GPT-5 and Claude Opus 4.5, achieved hundreds of millions of dollars in simulated profits, well above earlier models such as GPT-4o.
Furthermore, the experiment found that this potential "income" doubles approximately every 0.8 months, underscoring the accelerated pace of progress in offensive capabilities.
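To put that doubling time in perspective, here is the compounding arithmetic (our own back-of-the-envelope extrapolation, not a figure from the study):

```latex
% A 0.8-month doubling time, extrapolated over 12 months:
2^{12 / 0.8} = 2^{15} = 32768
% i.e., roughly a 33,000-fold increase per year, if the trend were to hold.
```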
On the other hand, a second chart details performance on a more challenging subset: vulnerabilities discovered in 2025.

Here, the metric known as "Pass@N" measures success when each model is allowed multiple exploit attempts (N attempts) per contract. The chart shows how total simulated revenue grows steadily as more attempts are permitted (from Pass@1 to Pass@8), reaching $4.6 million.
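For readers unfamiliar with the metric, the sketch below shows one simple way to tally a Pass@N rate, where a contract counts as exploited if at least one of the first N attempts succeeds; the function name and example data are hypothetical, not Anthropic's evaluation code:

```python
def pass_at_n(attempts_per_contract: list[list[bool]], n: int) -> float:
    """Fraction of contracts exploited within the first n attempts."""
    hits = sum(any(attempts[:n]) for attempts in attempts_per_contract)
    return hits / len(attempts_per_contract)

# Example: three contracts, up to 8 recorded attempts each.
results = [
    [False, True] + [False] * 6,   # cracked on the 2nd attempt
    [False] * 8,                   # never cracked
    [True] + [False] * 7,          # cracked on the 1st attempt
]
print(pass_at_n(results, 1))  # 0.33... (Pass@1: only one contract falls on try 1)
print(pass_at_n(results, 8))  # 0.66... (Pass@8: two of three fall within 8 tries)
```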
That second graph confirms that Claude Opus 4.5 was the most effective model in this controlled environment, capturing the largest share of those profits.
Finally, the study indicates that the probability of exploitation correlates not with the complexity of the code but with the amount of funds a contract holds: the models tend to focus on contracts with more locked value and find attacks against them more easily.