#Benchmark hashtag - Bluesky

@roxsross.bsky.social

2 days ago

📊 Is your pgvector benchmark lying to you? Find out why.

https://thenewstack.io/why-pgvector-benchmarks-lie/

#PostgreSQL #pgvector #VectorEmbeddings #Benchmark

AERCO Intl

@aercoint.bsky.social

3 days ago

Benchmark® E pairs seamlessly with AERCO’s SmartPlate® EV indirect water heater for a fully electric heating and hot water combination plant solution. Learn more: https://ow.ly/bRVQ50Yz5VQ #AERCO #Benchmark

Tom Evans

@tomwe.bsky.social

4 days ago

Closeup image of a carved OS Benchmark on Barton Bridge in Bradford on Avon. The rivet at the apex of the arrow is missing.

Wider view of Barton Bridge in Bradford on Avon with a carved OS Benchmark circled in red. The rivet at the apex of the arrow is missing.

Screenshot from the National Library of Scotland's archive map of Barton Bridge showing the OS Benchmark's position.

I had been looking for this #benchmark on #bradfordonavon's Barton Bridge for some time, always looking on the upright rather than the top. Silly me, there it is!
www.bench-marks.org.uk/bm238065

Aumentativo de Helena ✨

@diilua.bsky.social

5 days ago

Folks, I'd like to do some benchmarking with technology and product teams that are using AI at work, especially Claude Code

Anyone up for a chat about it? ❤️

#benchmark #ia #claudecode

PHOENIX MEDIA

@phoenix-media.bsky.social

5 days ago

🎙️ Last but not least: Anton Gudkov
Among other things, he gives us insights into the use of Claude AI models in a dev context and into benchmark tests of different settings and rule sets. 🧠

#AI #DeveloperExperience #Benchmark #Ecommerce

Les Cast Codeurs

@lescastcodeurs.com

1 week ago

Episode 338 - Le soulèvement des bots de skills (The uprising of the skill bots) #skills #benchmark #mcp #jdk26 #Security #java on https://youtube.com/watch?v=1Av8YU_5beI and as a podcast at lescastcodeurs.com/2026/03/20/lcc-338-le-so...

Bufigol

@bufigol.bsky.social

1 week ago

We've exhausted internet data. Next step? High-quality data and simulated learning environments.
Still a long way to AGI.
#AI #ArtificialIntelligence #Math #Benchmark #MATHVISTA #AGI #Tech #Opinion

Bufigol

@bufigol.bsky.social

1 week ago

New MATHVISTA benchmark: Top AI models score 49.9% on visual math reasoning. Humans: 60.3%. 🧮
A computer is a calculator on steroids, but it can't reason about math. Huge difference.
#AI #ArtificialIntelligence #Math #Benchmark #MATHVISTA #AGI #Tech #Opinion

Westfriesland Praat

@westfrieslandpraat.bsky.social

1 week ago

Read all about it: Opmeer compares its waste collection levy with other municipalities | on Westfriesland Praat, for and by West Frisians | #afvalinzameling #afvalstoffenheffing #benchmark #gemeenteraad #hvc #opmeer #restafval
westfrieslandpraat.nl/benchmark-afvalstoffenhe...

TMLR Published Papers

@tmlr-pub.bsky.social

1 week ago

New #J2C Certification:

Statistical Inference for Generative Model Comparison

Zijun Gao, Han Su, Yan Sun

https://openreview.net/forum?id=PXL6SBxh0q

#generative #inference #benchmark

Sales Science

@thesaasreport.bsky.social

1 week ago

Announcing Gumloop's $50M Series B

Gumloop has raised a $50M Series B to become the automation infrastructure for every company.

🚨SaaS Moves You Missed🚨

Gumloop, an AI agent-builder for knowledge workers, raised a $50M Series B led by Benchmark.

https://www.gumloop.com/blog/series-b

#Gumloop #Benchmark #FundingNews #SaaS

Foundryon

@foundryon.bsky.social

1 week ago

Your park on a casual day vs your park when you start playing Foundryon Relic Hunt.

#DLSS55 #NVIDIA #mobilegaming #PCGaming #TechSky #GamingNews #RTX #GraphicsCards #Benchmark #TechDrama #GPUWars #RelicHunt #Foundryon

Don Curren 🇨🇦🇺🇦

@dbcurren.bsky.social

1 week ago

1 Bloomberg: #Stockmarkets are crashing, globally. Look what’s happened so far this month in #Japan (-7.9%), #SouthKorea (-9.7%), #France (-7.6%), #Switzerland (-8.1%) or #Indonesia (-14%) — using #benchmark #indexes for all. 🧵

SVAR UI

@svarwidgets.bsky.social

1 week ago

We Benchmarked Top React Gantt Chart Libraries So You Don't Have To

Comprehensive benchmark of 6 popular React Gantt chart libraries. Compare loading speed, CRUD operations, live updates, scrolling, and memory usage. Find out which library performs best.

We benchmarked 6 popular React Gantt libraries across:
• loading speed
• scrolling
• CRUD operations
• live updates
• memory usage

🏆 SVAR React Gantt wins at loading, CRUD ops & live updates! See the full breakdown 👇
svar.dev/blog/react-g...

#react #webdev #benchmark #gantt #frontend

PlayFront.de

@playfrontde.bsky.social

1 week ago

Crimson Desert on the base PS5: gameplay check dispels tech concerns

Pearl Abyss shows the first PS5 gameplay of Crimson Desert. Smooth performance on the base console is confirmed. All the details on the tech and the March 19 release.

Crimson Desert on the PS5: stable gameplay without frame-pacing errors. The BlackSpace Engine delivers. 🎮🔥 #CrimsonDesert #PS5 #GamingTech #PearlAbyss #Benchmark #ConsoleGaming

ArcadeDB

@arcadedb.bsky.social

1 week ago

Neo4j Alternatives in 2026

arcadedb.com/blog/neo4j-a...

#neo4j #memgraph #ladybugdb #arangodb #falkordb #database #nosql #dbms #benchmark

roxsross

@roxsross.bsky.social

2 weeks ago

⚡ PNNL and OpenAI partner to streamline federal permitting

They present DraftNEPABench, a benchmark for speeding up infrastructure reviews with AI.

openai.com/index/pacific-northwest-...

#AIcoding #NEPA #Benchmark #RoxsRoss

Diario El Mundo

@elmundo.hn

2 weeks ago

BCIE cuts 95 basis points in three years after raising $2 billion in a historic Benchmark issuance

The Central American Bank for Economic Integration (BCIE) will apply a third consecutive 15-basis-point cut to its interest rates from June 1, 2026, bringing the cumulative reduction to between 80 and 95 basis points over three years. The cuts benefit national budgets through efficiencies achieved after the bank raised $2 billion in the largest Benchmark issuance in its history. First published in Diario El Mundo | Noticias de Honduras y el Mundo.

#Economía #Presupuestos #Benchmark

KillBait News

@kill-bait.bsky.social

2 weeks ago

Early Benchmarks Show Apple's MacBook Neo Outperforming Top x86 CPUs in Single-Core Tests

Benchmark results from Notebookcheck reveal that the new Apple MacBook Neo, powered by the A18 Pro chip, delivers record-breaking single-core performance that surpasses all current x86 processors from Intel and AMD. In Cinebench 2024 testing, the A18 Pro achieved 147 points while consuming only 3.5 to 4 watts. This efficiency is noteworthy, as the test itself lasts roughly ten minutes and taxes a CPU core consistently during the process. The performance figure places Apple’s chip ahead of even high-end desktop CPUs such as Intel’s Core Ultra 9 285K and AMD’s Ryzen 9 9950X3D, not to mention every modern mobile chip from AMD, Intel, and Qualcomm. The A18 Pro also tops Apple’s previous M3 generation, cementing the company’s continued lead in single-core efficiency.

Despite these impressive results, the article notes that Apple’s architectural design includes specialized accelerators that favor workload types optimized for its ecosystem, meaning the raw benchmark may not represent typical real-world usage outside macOS or Apple-optimized software. Notebookcheck suggests that Apple’s tight integration between hardware and software provides a unique advantage versus general-purpose processors. Industry reactions are mixed; some applaud the innovation, while others label the coverage as overly promotional. Regardless, the results signal a new level of competition between Apple’s ARM-based systems and the traditional x86 giants, Intel and AMD.

🤖 AI: It's clickbait ⚠️
👥 Users: It's clickbait ⚠️

#apple #benchmark #cpu


KillBait News

@kill-bait.bsky.social

2 weeks ago

Researchers Develop a Comprehensive Benchmark to Evaluate AI Expertise

As AI systems increasingly excelled at traditional academic benchmarks, researchers recognized the need for more challenging tests. In response, an international team of nearly 1,000 experts developed Humanity's Last Exam (HLE), a 2,500-question assessment covering mathematics, humanities, natural sciences, ancient languages, and other highly specialized fields. Each question was carefully crafted so that current AI models could not solve it, with any solvable questions removed from the final exam. Early testing revealed that even the most advanced AI models struggle significantly, with scores ranging from roughly 2.7% to around 50% for the most capable systems.

Dr. Tung Nguyen from Texas A&M University emphasized that the goal is not to defeat AI but to identify gaps in AI knowledge and provide a durable benchmark for measuring AI progress. The exam demonstrates that high performance on traditional human-focused tests does not equate to genuine intelligence, as AI systems still lack deep, contextual understanding and specialized expertise. Humanity's Last Exam also highlights the importance of human expertise and the value of global, interdisciplinary collaboration in evaluating AI capabilities.

🤖 AI: It's clickbait ⚠️
👥 Users: It's clickbait ⚠️

#ai #benchmark #research


TMLR Published Papers

@tmlr-pub.bsky.social

2 weeks ago

mSOP-765k: A Benchmark For Multi-Modal Structured Output Predictions

Bianca Lamm, Janis Keuper

Action editor: Mohammad Ghavamzadeh

https://openreview.net/forum?id=H7eYL4yFZS

#benchmark #advertisements #modal

KillBait Noticias

@kill-bait-es.bsky.social

2 weeks ago

Evaluating AI models against nonsensical questions

BullshitBench is a benchmark designed to evaluate how AI models respond to questions that are nonsensical or based on incorrect premises. The test checks whether models detect these flawed premises, whether they call out the nonsense directly, and whether they avoid confidently continuing with invalid assumptions. The platform lets you filter the results by different criteria, such as model visibility and the reasoning technique used. It also offers a ranking of models by their ability to clearly reject nonsensical questions, showing each version's improvement in terms of the percentage of correct answers and of errors detected. The data is color-coded by response type: green for clear responses, amber for partial responses, red for accepting the nonsense, plus errors that indicate failures.

The tool is useful for developers and researchers who want to understand the limitations of current language models and improve their critical reasoning, so that models do not give incorrect answers with confidence. BullshitBench also allows models to be compared with one another and their progress to be tracked over time, providing valuable information on the evolution of AI in contexts of complex reasoning and detection of invalid information.

🤖 AI: Not clickbait ✅
👥 Users: Not clickbait ✅

#ia #modelosdelenguaje #benchmark


FierceMind

@ostroumni.bsky.social

2 weeks ago

#Google: #AI agents learn to cooperate on their own - no hardcoded #orchestration needed. Train them against a diverse pool of #opponents and #cooperation emerges as a property of #training.

#Benchmark:
Iterated Prisoner's Dilemma.

Result: stable collaboration

#AI #MultiAgent #MachineLearning

DW Innovation

@dw-innovation.mastodon.social.ap.brid.gy

2 weeks ago

LLMs hallucinate – but not at the same rate. AA-Omniscience data reveals major differences between models and domains.

Well structured and worth checking out: https://artificialanalysis.ai/evaluations/omniscience

#AI #LLM #benchmark

Ahmandonk

@ahmandonk.bsky.social

3 weeks ago

📰 Intel Core Ultra 5 250K Plus benchmark leaks, giving a picture of Arrow Lake Refresh performance

👉 Read the full article here: ahmandonk.com/2026/03/09/intel-core-ul...

#arrowLake #benchmark #cpu #intel

Melamorsicata.it

@melanews.bsky.social

3 weeks ago

Geekbench 6 benchmark results showing iPhone 17e with A19 chip performance compared to iPhone 17.

The first Geekbench 6 benchmarks reveal that the iPhone 17e with the A19 chip is on par with the iPhone 17 for CPU performance. The 17e's 4-core GPU shows a slight drop in graphics performance compared with the 17's 5-core GPU. 📱📊
#iphone17e #benchmark #chipa19

TMLR Published Papers

@tmlr-pub.bsky.social

3 weeks ago

There are no Champions in Supervised Long-Term Time Series Forecasting

Lorenzo Brigato, Rafael Morand, Knut Joar Strømmen et al.

Action editor: Devendra Dhami

https://openreview.net/forum?id=yO1JuBpTBB

#benchmarking #forecasting #benchmark

TMLR Published Papers

@tmlr-pub.bsky.social

3 weeks ago

New #J2C Certification:

Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing ...

Siwei Yang, Mude Hui, Bingchen Zhao, Yuyin Zhou, Nataniel Ruiz, Cihang Xie

https://openreview.net/forum?id=lL1JR6dxG8

#editing #instruction #benchmark

Melamorsicata.it

@melanews.bsky.social

3 weeks ago

MacBook Neo benchmark:
CPU close to the iPhone 16 Pro; A18 Pro chip with a reduced GPU.

Data:
Neo: 3461/8668/31286
iPhone 16 Pro: 3445/8624/32575
M4 Air: 3696/14730/54630

Hardware performance analysis 💻📊

#apple #macbookneo #benchmark
