Posts by Cohere Labs

Whether you’re a researcher, builder, or just curious about AI’s cultural limitations, join this conversation!

Learn more: cohere.com/events/coher...

1 month ago 2 0 1 0

Ananya Sahu & Mehrnaz Mofakhami, research scholars at Cohere Labs, will explore:

🌏 How cultural awareness is currently tested in AI and what challenges remain
📜 What we’re learning from Tiny Aya and its support for underrepresented languages
💬 Your diverse perspectives and what you want to see next

1 month ago 1 0 1 0

AI is getting better at math. Better at code. But is it getting better at understanding cultural nuances? 🤔

Join us for “Cultural Awareness in AI — From Knowledge Tests to Social Norms and Beyond”, a conversation on what it means to build AI systems that work at global scale.

1 month ago 8 0 1 2
Cultural Awareness User Perception Survey: Hello! Welcome to the Cultural Awareness Survey! This survey is authored by a team of researchers at Cohere Labs, who investigate cultural understanding in LLMs. Below are the instructions for comple...

Ensure your cultural perspective is represented. cohere.link/FyKPWbQ

1 month ago 1 0 0 0

Does AI truly understand different cultures and languages?

We’re surveying cultural awareness in real-world AI use.
✨ When cultural awareness matters in real-world AI use
💡 Whether AI reflects diverse norms, communication styles & knowledge
🫥 Where AI falls short in cultural understanding

1 month ago 4 0 2 1

1) what? Cohere is here?!!!!
2) this is crazy

2 months ago 80 8 2 2

Woo hoo, who would have thought Canada would produce efficient massively multicultural models

2 months ago 40 2 1 0

🌱Very proud of our team's latest release 😊 meet Tiny Aya, a massively multilingual model with 3.35B parameters.

Tech report here: github.com/Cohere-Labs/...

2 months ago 33 7 1 0

Tiny Aya is small enough to run on a phone and powerful enough to support 70+ languages. That unlocks offline translation, local education tools, community research, and real multilingual experimentation without cloud infrastructure. 📱

2 months ago 16 0 1 1

Tiny Aya shows what smaller models can do. It improves on previous Aya releases and outperforms models of similar size, evidence that smart multilingual design can rival larger models and that focused multilingual research can beat brute-force scaling, achieving more with less.

2 months ago 9 0 1 0

Built for balance: most multilingual models skew toward high-resource languages. Tiny Aya narrows that gap, sustaining stronger performance for underrepresented languages. 📈

2 months ago 13 0 1 2

Despite being smaller, Tiny Aya competes with 4B models across translation, mathematical reasoning, understanding, and generation, with especially strong gains for African languages. 🌍

2 months ago 16 0 1 1

We take a stance for language diversity. Going beyond the one-size-fits-all paradigm, we release not just one instruction-finetuned model balancing all 70 languages (Tiny Aya Global) but also three region-focused models to accompany it 🌐

2 months ago 15 0 1 0

Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are.

Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.

2 months ago 97 15 2 5
NeurIPS 2025 in San Diego. The Leaderboard Illusion: How LLM Rankings Are Gamed YouTube video by Women in AI Research WiAIR

And Research Engineer @shivalika.bsky.social: The Leaderboard Illusion. 😶‍🌫️

This paper reveals systematic biases and transparency gaps in the Chatbot Arena leaderboard.

www.youtube.com/watch?v=URho...

3 months ago 0 0 0 0
NeurIPS 2025 in San Diego. Treasure Hunt YouTube video by Women in AI Research WiAIR

Sr Research Scientist, @juliakreutzer.bsky.social: Treasure Hunt paper. 🗺️

This work introduces a method to improve model performance by adding markers to tokens of the pretraining data, enabling real-time targeting of the long tail using training-time markers.

www.youtube.com/watch?v=K3BU...

3 months ago 0 0 1 0
Women in AI Research Podcast: Celebrating the remarkable contributions of female AI researchers from around the globe

Excited to have two of our papers featured in
@j-novikova-nlp.bsky.social's @wiair.bsky.social podcast, as part of the NeurIPS reflection. ✨

Learn more or subscribe here: women-in-ai-research.github.io and check out this thread 🧵 for our features...

3 months ago 1 1 1 0

What an incredible week it’s been at #NeurIPS2025! 🎉

Today is our last day at the booth. We've had a great week connecting with our community in San Diego.

Join our community to continue to connect with our research team: https://cohere.com/research/open-science/application

4 months ago 2 1 0 0

What's the story of your legend?

Join ML researchers building their legends with 40 cards that capture our shared journey—explore and build yours: https://lab-legends.vercel.app/ 🎯

4 months ago 0 0 0 0

Just 1 day left until #NeurIPS2025 kicks off! The Cohere and Cohere Labs teams are ready to dive into a packed week of research, conversations, and community at the San Diego Convention Center✨

Come visit our booth — we’d love to chat and send you home with some swag!

4 months ago 2 1 0 0

... @markusfreitag.bsky.social, Roman Grundkiewicz, @yupenghou.bsky.social, @phikoehn.bsky.social, @juliakreutzer.bsky.social, Saab Mansour, @sted19.bsky.social, Lorenzo Proietti, Parker Riley, Eduardo Sánchez, @patuchen.bsky.social, Mariya Shmatova, @zouharvi.bsky.social

5 months ago 3 0 0 0

You can find all details in our paper www2.statmt.org/wmt25/pdf/20... or discuss with us next week at the WMT Conference at #EMNLP2025.

Led by @kocmitom.bsky.social, Ekaterina Artemova, Eleftherios Avramidis, Eleftheria Briakou, @pinzhen.bsky.social, @mziizm.bsky.social...

5 months ago 2 0 1 0

⚖️ LLM-as-a-judge: mixed reliability.

Top systems reach ~95% pairwise accuracy on open-ended and summarization tasks.
Smaller ones barely beat coin-flip territory at ~55%.

5 months ago 1 0 1 0

🤖Naturalness is still a significant challenge.

Across open-ended generation and cross-lingual summarization, the biggest weakness isn’t coherence or accuracy but sounding like a native speaker. Many outputs still feel robotic or translated.

5 months ago 1 0 1 0

🧠English isn’t always easiest.

Models like Gemini 2.5 Pro and Claude 4 sometimes did better in Korean, German, or Spanish than in English when solving reasoning tasks.

5 months ago 1 0 1 0

🧩Linguistic reasoning remains the toughest nut. 🥥

Even top models scored below 50% on linguistic reasoning tasks, showing that structured linguistic deduction is still an open challenge.

5 months ago 1 0 1 0

🌐 Language coverage matters.

Models don’t support all languages equally, and this skews rankings. Smaller open models especially struggle with broad coverage, affecting their aggregate ranking ⚠️

5 months ago 1 0 1 0

🧩 Linguistic reasoning on unseen languages
📝 Open-ended generation testing naturalness and usefulness
📘 Cross-lingual summarization
🔁 Machine translation
🧑‍⚖️ LLM-as-a-Judge evaluating outputs of other models

All backed by human evals and public releases of data + outputs!
github.com/wmt-conferen...

5 months ago 1 0 1 0

How well do LLMs handle multilinguality? 🌍🤖

🔬We brought the rigor from Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task spanning 30 languages and 5 subtasks.

5 months ago 3 2 1 0

River, Yinhong and I will all be in person and we look forward to the discussions!

5 months ago 3 1 0 0