OpenAI's GPT-4o ranked as the best AI model for writing Solidity smart contract code by IQ

October 21, 2024

IQ's SolidityBench has launched as the first leaderboard to evaluate LLMs on Solidity code generation. Available on Hugging Face, it introduces two new benchmarks designed to assess and rank the proficiency of AI models at generating smart contract code: NaïveJudge and HumanEval for Solidity.

Developed by IQ's BrainDAO as part of the upcoming IQ Code suite, SolidityBench helps IQ refine its own EVMind LLMs and compare them against generalist and community-created models. IQ Code aims to offer AI models tailored to smart contract code generation and auditing, addressing the growing need for secure and efficient blockchain applications.

As IQ told CryptoSlate, NaïveJudge offers a new approach by tasking LLMs with implementing smart contracts based on detailed specifications derived from audited OpenZeppelin contracts. These contracts serve as the gold standard for correctness and efficiency. The generated code is evaluated against a reference implementation using criteria such as functional completeness, adherence to Solidity best practices and security standards, and optimization efficiency.

The evaluation process uses advanced LLMs, including various versions of OpenAI's GPT-4 and Claude 3.5 Sonnet, as impartial code reviewers. They assess the generated code against rigorous criteria, including implementation of all key functionality, handling of edge cases, error management, proper syntax usage, and overall code structure and maintainability.

Optimization considerations such as gas efficiency and storage management are also evaluated. Scores range from 0 to 100 and provide a comprehensive assessment across functionality, security, and efficiency, reflecting the complexity of professional smart contract development.
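The article does not say how the individual reviewer judgments are combined into the final 0 to 100 score. As a rough illustration only, the sketch below assumes each reviewer returns three sub-scores (functionality, security, efficiency), that these are weighted equally, and that reviewers are averaged. The `Review` class, the `judge_score` function, and the equal weighting are all hypothetical and are not taken from IQ's implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Review:
    """One LLM reviewer's sub-scores, each on a 0-100 scale (hypothetical shape)."""
    reviewer: str
    functionality: float
    security: float
    efficiency: float

    def score(self) -> float:
        # Assumption: the three dimensions are weighted equally.
        return mean([self.functionality, self.security, self.efficiency])


def judge_score(reviews: list[Review]) -> float:
    """Average the per-reviewer scores into a single 0-100 value."""
    if not reviews:
        raise ValueError("at least one review is required")
    for r in reviews:
        for value in (r.functionality, r.security, r.efficiency):
            if not 0 <= value <= 100:
                raise ValueError(f"sub-score out of range for {r.reviewer}: {value}")
    # Assumption: every reviewer counts equally.
    return mean(r.score() for r in reviews)
```

For example, two made-up reviews with sub-scores (80, 70, 60) and (90, 80, 70) average to 70 and 80 respectively, so `judge_score` returns 75.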

Which AI model is best for Solidity smart contract development?

Benchmark results show that OpenAI's GPT-4o model achieved the highest overall score of 80.05, with a NaïveJudge score of 72.18 and HumanEval for Solidity pass rates of 80% for pass@1 and 92% for pass@3.

Interestingly, newer reasoning models such as OpenAI's o1-preview and o1-mini missed out on the top spot, with scores of 77.61 and 75.08, respectively. Models from Anthropic and xAI, including Claude 3.5 Sonnet and grok-2, performed competitively, with overall scores hovering around 74. Nvidia's Llama-3.1-Nemotron-70B had the lowest score in the top 10, at 52.54.

SolidityBench scores for LLMs (Hugging Face)

According to IQ, HumanEval for Solidity adapts OpenAI's original HumanEval benchmark from Python to Solidity and covers 25 tasks of varying difficulty. Each task includes corresponding tests that are compatible with Hardhat, a popular Ethereum development environment, facilitating accurate compilation and testing of the generated code. The evaluation metrics pass@1 and pass@3 measure the model's success on the first attempt and over multiple attempts, providing insight into both precision and problem-solving ability.
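To make the pass@k metrics concrete, here is a minimal sketch of the unbiased pass@k estimator from the original HumanEval work, which for a task with `n` generated samples of which `c` pass the tests gives `1 - C(n-c, k) / C(n, k)`. Whether SolidityBench uses this exact estimator is an assumption; the function names and the example data are illustrative only. When exactly `k` samples are drawn per task, the estimate reduces to "did at least one of the k attempts pass".

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples generated, c of them passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over tasks; each entry is (samples_generated, samples_passed)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With made-up results for four tasks and three samples each, `[(3, 3), (3, 1), (3, 0), (3, 2)]`, the per-task pass@1 values are 1, 1/3, 0 and 2/3, so the benchmark pass@1 is 0.5, while pass@3 is 0.75 because three of the four tasks have at least one passing sample.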

Goals of using AI models in smart contract development

By introducing these benchmarks, SolidityBench seeks to advance AI-assisted smart contract development. It gives developers and researchers valuable insight into the current capabilities and limitations of AI in Solidity development, and encourages the creation of more sophisticated and reliable AI models.

The benchmarking toolkit is intended both to advance IQ Code's EVMind LLMs and to set a new standard for AI-assisted smart contract development across the blockchain ecosystem. The initiative aims to address a critical need in an industry where demand for secure and efficient smart contracts continues to grow.

Developers, researchers, and AI enthusiasts are invited to explore and contribute to SolidityBench, which aims to drive continuous improvement of AI models, promote best practices, and advance decentralized applications.

To learn more, visit the SolidityBench leaderboard on Hugging Face and start benchmarking Solidity generation models.
