Parameter Lab (@parameterlab) Bsky

If you care about rigorous evaluation of agentic systems, give MASEval a look!

The harness is an important element of agents. MASEval makes it straightforward to change its components and evaluate their impact.

MASEval is our first software release! parameterlab.github.io/MASEval/

⬇️

1 week ago 1 0 0 0

‼️New paper from Parameter Lab!

⛓️‍💥 We identify privacy collapse, a silent failure mode of LLMs: LLMs fine-tuned on seemingly benign data can lose their ability to respect contextual privacy norms.

Done by @anmolgoel.bsky.social during his internship!

Check it out 👇

2 months ago 3 1 0 0

#llm #ai #efficientai #nlp #mlresearch #reasoning #adaptivecompute | Ahmed Heakl | 19 comments Super excited to share the last work from my internship in Germany 🇩🇪! 🚀 Dr.LLM: Dynamic Layer Routing for LLMs > What if we can reduce computation AND increase accuracy? 🤯 Most prompts don’t need ev...

👏 Proud to share that the paper that Ahmed Heakl authored during his internship at Parameter Lab was accepted at #ICLR2026!

See how 🩺Dr.LLM increases accuracy and decreases inference computations of frozen LLMs: www.linkedin.com/posts/ahmed-...

2 months ago 4 0 0 0

Our #EMNLP2025 paper Leaky Thoughts 🫗 shows that Large Reasoning Models (LRMs) can unintentionally leak sensitive information hidden in their internal thoughts.

📍 Come chat with Tommaso at our poster on Friday 7th, 10:30–12:00 in Hall C3
📄 aclanthology.org/2025.emnlp-m...

4 months ago 2 1 0 0

Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers We study privacy leakage in the reasoning traces of large reasoning models used as personal agents. Unlike final outputs, reasoning traces are often assume...

We challenge the view that reasoning traces are a safe internal part of a model’s process. Our work shows they can leak information, through both deliberate attacks and accidental leakage.

RTAI: researchtrend.ai/papers/2506....
ArXiv: arxiv.org/abs/2506.15674
Code: github.com/parameterlab...

2/2

7 months ago 1 0 0 0

Overall diagram about contextual privacy & LRMs

🫗 An LLM's "private" reasoning may leak your sensitive data!

🎉 Excited to share our paper "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" was accepted at #EMNLP main!

1/2

7 months ago 5 1 1 2

Work done with: Haritz Puerto, Martin Gubri @mgubri.bsky.social, Tommaso Green, Sangdoo Yun, and Seong Joon Oh @coallaoh.bsky.social
#SEO #AI #LLM #GenerativeAI #Marketing #DigitalMarketing #Perplexity #NLProc

9 months ago 1 0 0 0

Key takeaways:
❌ C-SEO doesn’t help improve visibility in AI answers.
🔎 Traditional SEO is your tool for online visibility.
🚀 Our benchmark sets the stage to develop C-SEO methods that might work in the future.

9 months ago 0 0 1 0

🔎 The results are clear: current C-SEO strategies don’t work. This challenges the recent hype and suggests that creators don’t need to game LLMs or produce even more clickbait. Just focus on producing genuinely good content and let traditional SEO do its work.

9 months ago 0 0 1 0

C-SEO Bench evaluates Conversational Search Engine Optimization (C-SEO) techniques on two key tasks:
🔍 Product Recommendation
❓ Question Answering
Spanning multiple domains, it tests both domain-specific performance and the generalization of C-SEO methods.

9 months ago 0 0 1 0

Illustration of a conversational search engine for product recommendation. After a C-SEO method is applied to the third document, its ranking improves by 2 positions.

💥 With the rise of conversational search, a new family of techniques called "Conversational SEO" (C-SEO) has emerged, claiming to boost content inclusion in AI-generated answers. We put these claims to the test by building C-SEO Bench, the first comprehensive benchmark to rigorously evaluate these new strategies.

9 months ago 0 0 1 0

Paper thumbnail.

🔎Does Conversational SEO actually work? Our new benchmark has an answer!
Excited to announce our new paper: C-SEO Bench: Does Conversational SEO Work?

🌐 RTAI: researchtrend.ai/papers/2506....
📄 Paper: arxiv.org/abs/2506.11097
💻 Code: github.com/parameterlab...
📊 Data: huggingface.co/datasets/par...

9 months ago 2 1 1 1

Excited to share that our paper "Scaling Up Membership Inference: When and How Attacks Succeed on LLMs" will be presented next week at #NAACL2025!
🖼️ Catch us at Poster Session 8 - APP: NLP Applications
🗓️ May 2, 11:00 AM - 12:30 PM
🗺️ Hall 3
Hope to see you there!

11 months ago 2 1 0 0

Ready to Join? Send your resume + a short note on why you’re a great fit to recruit@parameterlab.de.
Be part of a team that’s redefining research with AI! #Hiring #DataEngineer #AI #RemoteJobs

1 year ago 0 0 0 0

Why Join Us?
🚀 Make a Difference – Your work directly enhances how research is shared and discovered.
🌍 Flexibility – Choose full-time or part-time, work remotely or locally.
⚡ Innovative Environment – AI, research, and data-driven solutions all in one place.
🤝 Great Team

1 year ago 0 0 1 0

What You Bring:
✅ Proficiency in Airflow & PostgreSQL – Complex workflows and databases.
✅ Strong Python Skills – Clean, efficient, and maintainable code is your thing.
✅ (Bonus) Experience with LLMs – A huge plus as we integrate AI-driven solutions.
✅ Problem-Solving Mindset
✅ Team Spirit

1 year ago 0 0 1 0

What You’ll Do:
✔ Build Scalable Data Pipelines – Design and optimize workflows using tools like Airflow.
✔ Work Closely with AI Experts & Engineers – Collaborate to solve real-world data challenges.
✔ Optimize and Maintain Systems – Keep our data infrastructure fast, secure, and adaptable.

1 year ago 0 0 1 0

Our LLM-powered ecosystem also bridges the gap between cutting-edge research and industry leaders. If you're passionate about data, AI, and making an impact, we’d love to have you on board!

1 year ago 0 0 1 0

ResearchTrend.AI Explore the most trending research topics in AI

👥 We're Hiring: Senior/Junior Data Engineer!

📍 Remote or Local | Full-Time or Part-Time

At ResearchTrend.AI, we’re building a platform that connects researchers and AI engineers worldwide—helping them stay ahead with daily digests, insightful summaries, and interactive events.

1 year ago 2 0 1 1

🔎 Wondering how to prove an LLM was trained on a specific text? The camera-ready version of our Findings of #NAACL 2025 paper is available!
📌 TLDR: long texts are needed to gather enough evidence to determine whether specific data points were included in the training of an LLM: arxiv.org/abs/2411.00154

1 year ago 5 1 0 0

We are delighted to announce that our research paper on the scale of LLM membership inference has been accepted for publication in the Findings of #NAACL2025! 🎉

1 year ago 4 0 0 0

Careers | Parameter Lab Join us at Parameter Lab to shape the future of safe AI. In our dynamic and inclusive environment, we focus not only on our mission but also on fostering your personal growth through rewarding work ex...

There's an internship opening at @parameterlab.bsky.social: parameterlab.de/careers

The research outputs have been quite successful so far: researchtrend.ai/organization...

1 year ago 6 2 2 1

🎉We’re pleased to share the release of the models from our Apricot🍑 paper, accepted at ACL 2024!
At Parameter Lab, we believe openness and reproducibility are essential for advancing science, and we've put in our best effort to ensure it.
🤗 huggingface.co/collections/...
🧵 bsky.app/profile/dnns...

1 year ago 9 3 0 0

GitHub - parameterlab/mia-scaling: Source code of "Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models"

🔗 Links:
Code and results: https://github.com/parameterlab/mia-scaling
Project website: https://haritzpuerto.github.io/scaling-mia/
Paper: https://arxiv.org/pdf/2411.00154

1 year ago 0 0 0 0

🙌 Team Credits: This research was conducted by Haritz Puerto, @mgubri.bsky.social, @oodgnas.bsky.social, and @coallaoh.bsky.social, with support from NAVER AI Lab. Stay tuned for more updates! 🚀

1 year ago 1 0 1 0

🤓 Want More? Check out the MIA for LLMs community page on ResearchTrend.AI: https://researchtrend.ai/communities/MIALM You can see related work, how the community has evolved, and its top authors!

1 year ago 0 0 1 0

💬 What Do You Think? Could MIA reach a level where data owners use it as legal evidence? How might this affect LLM deployment? Let us know! #AI #LLM #NLProc

1 year ago 0 0 1 0

🌐 Implications for Data Privacy: Our findings have real-world relevance for data owners worried about unauthorized use of their content in model training. MIA can also be used to support accountability in LLM evaluation on end tasks.

1 year ago 0 0 1 0

🔎 Better Results in Fine-Tuning: Fine-tuned models show even stronger MIA results. The table shows the performance at sentence level and for collections of 20 sentences, evaluated on Phi-2 fine-tuned for QA (https://huggingface.co/haritzpuerto/phi-2-dcot).

1 year ago 0 0 1 0
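A rough intuition for why collections of sentences are easier to attack than single sentences can be sketched as follows. This is a toy model, not the method from the paper. It assumes per-sentence membership scores are independent, unit-variance Gaussians whose mean is slightly higher for training members, and the effect size of 0.1 is a made-up illustrative value. Under those assumptions, averaging n scores scales the separation by sqrt(n), so the AUC moves away from chance as the collection grows.

```python
import math


def gaussian_auc(effect_size: float) -> float:
    """AUC for separating two unit-variance Gaussians whose means differ by effect_size.

    Uses AUC = Phi(d / sqrt(2)), where Phi is the standard normal CDF,
    which simplifies to 0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1.0 + math.erf(effect_size / 2.0))


def collection_auc(sentence_effect_size: float, n_sentences: int) -> float:
    """AUC when the attack averages n independent per-sentence scores.

    Averaging n scores shrinks the standard deviation by sqrt(n), so the
    standardized separation between members and non-members grows by sqrt(n).
    """
    return gaussian_auc(sentence_effect_size * math.sqrt(n_sentences))


# Hypothetical per-sentence effect size; chosen only for illustration.
SENTENCE_EFFECT_SIZE = 0.1

for n in (1, 20, 100, 1000):
    print(f"{n:>5} sentences -> AUC {collection_auc(SENTENCE_EFFECT_SIZE, n):.3f}")
```

With a weak per-sentence signal, a single sentence is barely better than a coin flip in this toy model, while larger collections accumulate enough evidence to separate members from non-members.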

🔬 Our Testing Setup: We ran experiments using Pythia models (2.8B and 6.9B parameters) with training samples from The Pile dataset, comparing them to validation and test sets. This setup avoids data leakage to ensure a reliable evaluation of MIA.

1 year ago 0 0 1 0

Posts by Parameter Lab