#HumanEval

Code Completion and Generation LLM Leaderboard 2026

Rankings of the best LLMs on code completion benchmarks - HumanEval, LiveCodeBench, BigCodeBench, MBPP, and competitive programming - with methodology notes on contamination. Updated April 2026.

awesomeagents.ai/leaderboards/code-comple...

#Leaderboards #CodeCompletion #Humaneval

Looking for evidence of data leakage in the #HumanEval code generation #LLM benchmark? Check out our analysis of a subset of HumanEval tasks vs comparable tasks on #ChatGPT, #Claude and #Llama. @riddhimore.bsky.social

🧑‍💻🔒Run HumanEval safely with Riza

Codegen is a hot topic. Today, many LLMs can generate functions and even apps.

But how do you know which model writes the best code?

Enter #HumanEval.

Our paper "Addressing #DataLeakage in #HumanEval Using Combinatorial Test Design" has been accepted for publication at the Int’l Conference on Software Testing, Verification & Validation (#ICST2025), Short Papers, Vision & Emerging Results.
Co-author: Riddhi More
arxiv.org/abs/2412.01526
#benchmark

🛠️ Achieves top performance in Fill-in-the-Middle (#FIM) tasks: 85.9% average accuracy across languages, 95.3% pass@1 rate

💻 Excels in multiple languages: 86.6% #Python, 78.9% #Cpp, 82.6% #JavaScript accuracy on #HumanEval benchmarks

Grok-1.5, the latest version of Musk's AI model

X.ai announces Grok-1.5, the latest version of its AI model, with a 128k-token context window and improvements on the Math and HumanEval tests.

💡 X AI announces Grok-1.5, the latest version of its artificial intelligence model

gomoot.com/x-ai-annunci...

#Claude @elonmusk #framework #gemini #GPT #Grok #HumanEval #math #mistral #xAI #opus #gpt4 #llm #coding
