#DataContamination hashtag - Bluesky

2 months ago

A Survey on Data Contamination for Large Language Models

#DataContamination #AIEvaluation Training–test overlap can inflate LLM scores. The survey examines “data contamination” in #LLMs, defined as unintended overlap between training data & evaluation data that can inflate measured performance & misrepresent true generalization. arxiv.org/html/2502.14...

GetNews.me

@getnews-me.bsky.social

6 months ago

LNE-Blocking: Framework to Counter Data Contamination in LLMs

LNE-Blocking uses Leakage‑Noise Estimation and a Blocking step to tweak greedy decoding, cutting memorized answers while preserving performance. Code is on GitHub. Read more: getnews.me/lne-blocking-framework-t... #llm #datacontamination

Winbuzzer

@winbuzzer.com

8 months ago

Alibaba’s Qwen 2.5 AI Faces Math ‘Cheating’ Allegations Over Contaminated Benchmark Data

#AI #Alibaba #Qwen #AIBenchmarks #DataContamination #MachineLearning

winbuzzer.com/2025/07/21/a...

Paul H

@paulus-maximus.bsky.social

9 months ago

ChatGPT polluted the world forever, like the first atom bomb

Feature: Academics mull the need for the digital equivalent of low-background steel

Extremely interesting article positing that AI-generated training data may have poisoned data sources more widely, creating a data equivalent of the need for #LowBackGroundSteel
#AI #Data #DataContamination
www.theregister.com/2025/06/15/a...

@ccahua.bsky.social

1 year ago

Scrape the work of others and monetize: “They are cheating,” says Cheng Xu, a Ph.D. student at University College Dublin who led a recent survey of data contamination in AI benchmarks. #internet #profit #datacontamination