#Benchmarking hashtag - Bluesky

14 hours ago

Fake Samsung 990 Pro SSD is good enough to fool your benchmarks #Technology #Hardware #StorageDevices #SamsungSSD #FakeTech #Benchmarking

www.techspot.com/news/111861-fake-samsung...


TMLR Published Papers

@tmlr-pub.bsky.social

3 days ago

Grounding Generative Evaluations of Language Models in Unsupervised Document Corpora

Michael Majurski, Cynthia Matuszek

Action editor: Yu Meng

https://openreview.net/forum?id=EvtPh3Msol

#generative #corpora #benchmarking


Knowledge Zone

@knowledgezone.bsky.social

6 days ago

AI and LLM Benchmarks

What are the commonly used LLM benchmarks to measure the efficacy of a language model?

#ITByte: #Performance #Benchmarking is the process of measuring a system's performance against standards or other similar systems.


knowledgezone.co.in/posts/AI-and...
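As a rough illustration of the definition above, measuring a system against a similar system can be sketched as timing two implementations of the same task and reporting the ratio. The function names and the workload here are hypothetical and are not taken from the linked post; only the standard-library `timeit` module is assumed.

```python
import timeit


def baseline_sum(n):
    """Reference implementation: explicit Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def candidate_sum(n):
    """System under test: built-in sum over a range."""
    return sum(range(n))


def benchmark(fn, n=10_000, repeats=5, number=20):
    """Return the best per-call time in seconds across several repeats."""
    times = timeit.repeat(lambda: fn(n), repeat=repeats, number=number)
    return min(times) / number


def compare(n=10_000):
    # Both implementations must agree before their timings mean anything.
    assert baseline_sum(n) == candidate_sum(n)
    base = benchmark(baseline_sum, n)
    cand = benchmark(candidate_sum, n)
    return {"baseline_s": base, "candidate_s": cand, "speedup": base / cand}


if __name__ == "__main__":
    print(compare())
```

Taking the minimum over repeats is one common way to reduce noise from other processes; the absolute numbers will still vary from machine to machine.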


HGPU group

@hgpu.bsky.social

6 days ago

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity t…


#CUDA #Triton #Benchmarking #Package

hgpu.org?p=30694


HGPU group

@hgpu.bsky.social

6 days ago

Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study

We present a cross-architecture evaluation of production LLM inference on AMD Instinct MI325X GPUs, benchmarking four models spanning 235B to 1 trillion parameters across three architectural famili…


#AMD #LLM #Benchmarking

hgpu.org?p=30693


deepseek

@deepseek.activitypub.awakari.com.ap.brid.gy

1 week ago

Original post on hgpu.org

Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study

We present a cross-architecture evaluation of production LLM inference on AMD Inst...

#Computer #science #paper #AMD #Radeon #Instinct #MI325X #Benchmarking #LLM


LLMs

@llms.activitypub.awakari.com.ap.brid.gy

1 week ago

Original post on hgpu.org

MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?

Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet their potential for generating...

#Computer #science #CUDA #paper #Benchmarking #Code #generation #LLM #nVidia #nVidia #A100 […]


Markus Eisele

@myfear.com

1 week ago

Java framework benchmarks are easy to get wrong.

The new Quarkus benchmark results are interesting — but the real story is the engineering work behind them.

Reproducible environments. Transparent methodology. Real collaboration across the team.

buff.ly/RjdXcuY

#Java #Quarkus #Benchmarking


JavaJobs

@javajobs.activitypub.awakari.com.ap.brid.gy

1 week ago

Original post on webpronews.com

Java Isn’t Slow — You Are: Why the JVM’s Raw Speed Means Nothing If Developers Keep Writing Bad Code Java's JVM is among the fastest runtimes available today, but most Java applications n...

#DevNews #HotSpot #JVM #Java #benchmarking #Java #code #optimization […]


@panoptico-digital.bsky.social

1 week ago

What Is Benchmarking? Beat Your Competition at Their Own Game - Digital marketing agency

Discover what benchmarking is, its types, how to implement it step by step, and how AI accelerates results. A practical guide for businesses.

We live in a market where standing still means falling behind. Industry leaders did not get there by luck; they got there because they had the intelligence to look around, compare themselves without hang-ups, learn from the best and…
panoptico.digital/marketing-di...
#DigitalMarketing #benchmarking


Giuseppe Michieli

@gmik69bsky.bsky.social

1 week ago

Developing and #Benchmarking #OneHealth Genomic #Surveillance #Tools for #Influenza A Virus in #Wastewater

etidiohnew.blogspot.com/2026/03/deve...


Daniel Hutchinson

@dhutchinson.bsky.social

2 weeks ago

Many thanks to the editors of @up_johd and the peer reviewers for everything that went into bringing this article to the finish line! 8/8

#digialhumanities #llm #benchmarking #AI #digitalhistory


LLMs

@llms.activitypub.awakari.com.ap.brid.gy

2 weeks ago

Awakari App

Safety Evals: 12 Questions Before You Trust the Pass Rate

A sharper way to read AI safety evaluation results before a reassuring percentage turns into false confidence.

Continue reading on Medium »

#llm-evaluation #ai-safety #mlops #benchmarking #machine-learning


Thilo Muth

@drmuth.bsky.social

2 weeks ago

🔬 New benchmarking study for the proteomics community!
From variability to consensus: PSM rescoring harmonizes peptide identification across search engines and datasets.
Preprint:
doi.org/10.64898/202...

#TeamMassSpec #Proteomics #MassSpectrometry #OpenScience #Benchmarking


TMLR Published Papers

@tmlr-pub.bsky.social

3 weeks ago

There are no Champions in Supervised Long-Term Time Series Forecasting

Lorenzo Brigato, Rafael Morand, Knut Joar Strømmen et al.

Action editor: Devendra Dhami

https://openreview.net/forum?id=yO1JuBpTBB

#benchmarking #forecasting #benchmark


CESGA-HPC

@cesga-hpc.bsky.social

3 weeks ago

Evaluating the performance of quantum devices

Diego Andrade, associate Prof. at the University of A Coruña and researcher at CITIC, leads research lines focused on quantum computing, AI, and high-perform...

⚛️📈 How do we measure quantum progress?

📊 Our new benchmark suite with @udc.gal enables systematic evaluation of quantum platforms.

https://www.youtube.com/watch?v=Mv_qfJAXG0A

#QuantumComputing #Benchmarking #PCCC


HGPU group

@hgpu.bsky.social

3 weeks ago

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU kernels. Current benchmarks focus on the translation of high-level languages into CUDA, overlooking …


#CUDA #LLM #Benchmarking #Package

hgpu.org?p=30630


FunctionalProgramming

@functionalprogramming.activitypub.awakari.com.ap.brid.gy

3 weeks ago

Original post on hgpu.org

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU kernels. Current benchmarks focus on the tr...

#Computer #science #CUDA #paper #Benchmarking #LLM #nVidia #nVidia #A40 #nVidia #GeForce […]


Kubernetes

@kubernetes.activitypub.awakari.com.ap.brid.gy

1 month ago

Original post on franksworld.com

How Enterprises Measure LLM Performance and Cost

Imagine trying to gauge the performance of an engine in real-world conditions. You wouldn’t just rev it up in a static environment and call it a d...

#AI #Large #Language #Models #Red #Hat #AI #benchmarking #AI #performance #evaluation


roxsross

@roxsross.bsky.social

1 month ago

📊 Why we no longer evaluate with SWE-bench Verified

Contamination and mismeasurement of progress in frontier coding.

openai.com/index/why-we-no-longer-e...

#Benchmarking #AIEngineering #CodeGen #RoxsRoss


The Information, Advice and Support Services Network (IASSN)

@iass-network.councilfordisabledchildren.org.uk

1 month ago

Minimum Standards Benchmarking Report 2025–26 📊

A snapshot of how SENDIAS services are meeting national minimum standards. It highlights national trends and supports continuous improvement across SENDIAS.

🔗 councilfordisabledchildren.org.uk/about-us-0/n...

#SENDIAS #SEND #Benchmarking


NimblePros

@nimblepros.com

1 month ago

Gathering benchmarks for your .NET app and aren't sure if you're comparing the right things? In this post and video, Phil will talk you through validating your benchmarks in .NET: https://bit.ly/3Yyg80F

#dotnet #benchmarking


Miguel Filipe

@mfilipe.bsky.social

1 month ago

I benchmarked 8 local LLMs writing Go on my Framework 13 AMD Strix Point

#Benchmarking Local LLMs for coding in Go on my framework13 AMD Strix Point laptop...
msf.github.io/blogpost/ben...


The Gregory Lab @ DukeU

@gregorylab.bskyverified.social

1 month ago

Work from the #DukeMGC will be on display at #AGBT2026:

Tuesday 1:30-3:30, poster #401

Wednesday 4:45-6:15, poster #472

Come find us to chat about our data! 🧬

#AGBT #SpatialTranscriptomics #SingleCell #Benchmarking #LongReadSequencing


TMLR Published Papers

@tmlr-pub.bsky.social

1 month ago

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Jialin Yang, Dongfu Jiang, Tony He et al.

Action editor: Frederic Sala

https://openreview.net/forum?id=buDwV7LUA7

#structured #benchmarking #formats


Christian Klass

@christianklass.bsky.social

1 month ago

Small pre-announcement from today: The Procyon team is working on a new browser-focused benchmark. More about it soon. #Benchmarking


Chinballs Gaming

@chinballs.tv

1 month ago

9070XT: Does it need a better CPU?

YouTube video by Chinballs Gaming

Is your CPU holding back your 9070XT? #benchmarking #AMD #UltraWide #9070XT


ceph.io

@ceph.io

1 month ago

CLAY vs JErasure in Ceph, what’s the real performance story?
Part 4 of this CBT benchmarking series explains why CLAY incurs a write hit but can reduce recovery network traffic by ~50%.

Read more: t.ly/CLAYvsJErasure
#Ceph #Storage #OpenSource #Benchmarking


Juan Sanchez

@juanyobluesky.bsky.social

1 month ago

Advancing AI benchmarking with Game Arena

We’re expanding Game Arena with Poker and Werewolf, while Gemini 3 Pro and Flash top our chess leaderboard.

🎮📊 Game Arena: improvements for AI benchmarking and model evaluation. #Benchmarking #DeepMind


LLMs

@llms.activitypub.awakari.com.ap.brid.gy

1 month ago

Awakari App

I Changed One String and My Model’s Score Dropped 70 Points

Understanding LLM evaluation by experimenting with different stop sequences

Continue reading on Towards AI »

#machine-learning #llm #mlops #artificial-intelligence #benchmarking
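To make the stop-sequence effect concrete, here is a minimal sketch (not the linked article's code) of how an exact-match score can swing when only the stop string changes. The generations, references, and helper names are hypothetical toy data chosen for illustration.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut the generation at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def exact_match_score(generations, references, stop_sequences):
    """Percentage of generations equal to the reference after truncation."""
    hits = 0
    for gen, ref in zip(generations, references):
        answer = truncate_at_stop(gen, stop_sequences).strip()
        hits += answer == ref
    return 100.0 * hits / len(references)


# Hypothetical raw outputs: the answer, then the model keeps writing.
generations = [
    "Paris\n\nQuestion: What is the capital of Spain?",
    "4\n\nQuestion: What is 3 + 3?",
    "blue\nQuestion: What color is grass?",
]
references = ["Paris", "4", "blue"]

# Same generations, same metric; only the stop sequence differs.
score_a = exact_match_score(generations, references, ["\n"])
score_b = exact_match_score(generations, references, ["\n\n\n"])
print(score_a, score_b)  # prints "100.0 0.0"
```

With a single newline as the stop sequence every answer is cut cleanly; with a stop string that never appears, the trailing continuation stays in the answer and every exact match fails, even though the model output is identical.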
