#HumanEval hashtag - Bluesky

1 day ago

Code Completion and Generation LLM Leaderboard 2026: rankings of the best LLMs on code completion benchmarks (HumanEval, LiveCodeBench, BigCodeBench, MBPP, and competitive programming), with methodology notes on contamination. Updated April 2026.

awesomeagents.ai/leaderboards/code-comple...

#Leaderboards #CodeCompletion #Humaneval

Awesome Agents

@awesomeagents

4 days ago

Code Completion and Generation LLM Leaderboard 2026: rankings of the best LLMs on code completion benchmarks (HumanEval, LiveCodeBench, BigCodeBench, MBPP, and competitive programming), with methodology notes on contamination. Updated April 2026.

awesomeagents.ai/leaderboards/code-comple...

#Leaderboards #CodeCompletion #Humaneval

Jeremy Bradbury

@jeremybradbury

1 year ago

Looking for evidence of data leakage in the #HumanEval code generation #LLM benchmark? Check out our analysis of a subset of HumanEval tasks vs comparable tasks on #ChatGPT, #Claude and #Llama. @riddhimore.bsky.social

Riza

@riza.io

1 year ago

🧑‍💻🔒Run HumanEval safely with Riza

Codegen is a hot topic. Today, many LLMs can generate functions and even apps.

But how do you know which model writes the best code?

Enter #HumanEval.

Jeremy Bradbury

@jeremybradbury

1 year ago

Our paper "Addressing #DataLeakage in #HumanEval Using Combinatorial Test Design" has been accepted for publication at the Int’l Conference on Software Testing, Verification & Validation (#ICST2025), Short Papers, Vision & Emerging Results.
co-author: Riddhi More
arxiv.org/abs/2412.01526
#benchmark

Micha the DevOp

@michabbb

1 year ago

🛠️ Achieves top performance in Fill-in-the-Middle (#FIM) tasks: 85.9% average accuracy across languages, 95.3% pass@1 rate

💻 Excels in multiple languages: 86.6% #Python, 78.9% #Cpp, 82.6% #JavaScript accuracy on #HumanEval benchmarks

GOMOOT

@meneguzzo68

2 years ago

Grok-1.5, the latest version of Musk's AI model: X.ai announces Grok-1.5, the latest version of its AI model, with a 128k-token context window and improvements on the Math and HumanEval tests

💡 X AI announces Grok-1.5, the latest version of its artificial intelligence model

gomoot.com/x-ai-annunci...

#Claude @elonmusk #framework #gemini #GPT #Grok #HumanEval #math #mistral #xAI #opus #gpt4 #llm #coding
