
Posts by Marzena Karpinska

Canada Impact+ Research Chairs

Thinking of relocating your lab to Canada? 🇨🇦

There is still time to apply for an #impact+ chair position at #SFU (until April 24th).

This comes with a research award of up to $1M/year for 8 years. Feel free to reach out if you have any questions!

www.sfu.ca/research/imp...

4 days ago

We are happy to see these papers accepted to #ACL2026
Looking forward to the conference! 🎊

1 week ago
GitHub - davidjurgens/hallucinated-reference-finder

If you're reviewing ARR papers and want a tool to help you spot potential hallucinated references, I cooked this up for the ACL SACs and thought I would share it with the broader community: github.com/davidjurgens...

3 weeks ago

This is great, and at the same time I'm so sad it's needed :( Thank you for sharing!

3 weeks ago
TRAILS UMD Post Doctoral Associate Job Description - Spring 2026: The Institute for Trustworthy AI in Law & Society (TRAILS) and the University of Maryland aim to transform the pr...

Come join TRAILS as a postdoc at UMD (and work with folks at GW, MSU & Cornell) to conduct research and scholarship on approaches to AI that advance trust and trustworthiness, alongside a great group of colleagues!

🌐 go.umd.edu/trails-postd...
🗓️ Summer/Fall 2026 start

1 month ago

Last Friday, our "Women in AI Research" event brought together many talented young women! We were amazed by the level of discussion and the project ideas the teams came up with!

Grateful to the organizers @petitegeek.bsky.social's Rosie Lab & #WiCS and our sponsors #WiCS & @wimlworkshop.bsky.social

1 month ago
Workshop StyGenAI: StyGenAI is the first workshop dedicated specifically to the study of style in GenAI-translated content. Hosted at EAMT 2026, the workshop provides a focused forum for examining how large language mod...

Does your LLM have a "style"? 🎨
We are excited to announce #StyGenAI at #EAMT2026, a new workshop dedicated to #style in GenAI translation.
From literary translation to controlling tone and register: if it's about style, it belongs here.
🔗 CFP & Details here: sites.google.com/view/worksho...

2 months ago

This decision was made without consulting a single CU Boulder professor with AI expertise. And AFAIK there was one professor *total* involved.

More info, such as it is, here: https://www.cu.edu/gen-ai

Check the "Guiding Principles" section for entertainment.

2 months ago

'Guiding Principles' looks as if someone out there was already using that subscription 😬

2 months ago

Congratulations to all authors of @sfu-cs-ai.bsky.social papers accepted to @iclr-conf.bsky.social 2026 🥳🥳🥳🥳🥳

Please check out our work in 🧵

2 months ago
Screenshot showing the title and abstract of a talk by Peter West. The text says:
"Title: Mapping the (Jagged) Landscape of LLM Capabilities

Abstract: One key missing piece for the broad adoption of LLMs is intuition, specifically, human intuition about when and how models will succeed or fail across the diverse tasks we might apply them to. Your LLM might write a well-reasoned essay on 14th-century theology, but does that mean it can accurately answer questions on the same topic? This talk will focus on one aspect of my research, which is the characterization of model capabilities to begin to develop these intuitions. I will discuss recent projects that try to identify where these capabilities break down, with a particular focus on high information examples which will necessitate new hypotheses of how exactly artificial intelligence functions.

Speaker info: Peter West is an assistant professor at the University of British Columbia, broadly working on the capabilities and limits of LLMs. For example: the divergence of AI from human intuitions of intelligence, unpredictability and creativity in models, and studying LLMs with a non-interventional natural sciences lens. Peter completed his PhD at the University of Washington's Paul G. Allen School of Computer Science and Engineering. He completed a postdoc at the Stanford Institute for Human-Centered AI. His work has been recognized with best, outstanding, and spotlight papers in NLP and AI conferences."


This week we are excited to host Peter West from @cs.ubc.ca who will talk about "Mapping the (Jagged) Landscape of LLM Capabilities."

2 months ago

🚨 New Study 🚨

@arxiv.bsky.social has recently decided to prohibit any 'position' paper from being submitted to its CS servers.
Why? Because of "AI slop" and allegedly higher ratios of LLM-generated content in review papers compared to non-review papers.

2 months ago

I agree, though I'm afraid we've had conferences starting or taking place during other important holidays that are not US-centric (yet observed in countries that many attendees come from).

3 months ago

If you are interested in working with me, apply here by Jan 19th (a bit last minute): sfu.ca/gradstudies/...
Feel free to reach out with any questions!

3 months ago

Now is probably a good time to share that I left my job at @microsoft.com (will forever miss this team) and moved to Vancouver, Canada, where I'm starting my lab as an assistant professor at the gorgeous @sfu.ca 🏔️
I'm looking to hire 1-2 students starting in Fall 2026. Details in 🧵

3 months ago

Such a terrible idea to replace a bad metric with a potentially worse one. It rests on the assumptions that every field treats author order the same way and that a given order means the same thing everywhere. Metrification made worse :( and it discourages collaboration.

5 months ago

🚨 New paper on AI & copyright

Authors have sued LLM companies for using books w/o permission for model training.

Courts, however, need empirical evidence of market harm. Our preregistered study addresses exactly this gap.

Joint work with Jane Ginsburg from Columbia Law and @dhillonp.bsky.social 1/n 🧵

5 months ago

Well this is sure to be a blockbuster AI article... @jennarussell.bsky.social et al are kicking ass and taking names in journalism, both individuals and organizations.

"AI use in American newspapers is widespread, uneven, and rarely disclosed"
arxiv.org/abs/2510.18774

5 months ago

We didn't; in fact, this research started with our previous project, where we tried to break AI detectors and Pangram was the only one that didn't fail. Since then, we have experimented with it, and it does an extremely good job.
(Some plots in the paper show this well, like Fig. 8 with historical data.)

5 months ago

AI is infiltrating American newsrooms.

Sadly, it is mostly *undisclosed*, meaning that readers are often unaware that they are consuming LLM text.

Even worse, we find some of these texts making it into the print press (undisclosed).

Can we at least be honest about using models for editing?

5 months ago

AI is already at work in American newsrooms.

We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea.

Here's what we learned about how AI is influencing local and national journalism:

5 months ago

📢 Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME) at #EACL2026 🇲🇦

MME focuses on resources, metrics & methodologies for evaluating multilingual systems! multilingual-multicultural-evaluation.github.io

📅 Workshop Mar 24–29, 2026
🗓️ Submit by Dec 19, 2025

6 months ago
Screen cap from linked article, with heading Significance and then text:

Large Language Models (LLMs) are used in evaluative tasks across domains. Yet, what appears as alignment with human or expert judgments may conceal a deeper shift in how "judgment" itself is operationalized. Using news outlets as a controlled benchmark, we compare six LLMs to expert ratings and human evaluations under an identical, structured framework. While models often match expert outputs, our results suggest that they may rely on lexical associations and statistical priors rather than contextual reasoning or normative criteria. We term this divergence epistemia: the illusion of knowledge emerging when surface plausibility replaces verification. Our findings suggest not only performance asymmetries but also a shift in the heuristics underlying evaluative processes, raising fundamental questions about delegating judgment to LLMs.

Sentence starting with "While models often" is highlighted in blue.


I'd love to see someone try to estimate just how much time and money has gone into research that is either fully undermined by reliance on LLMs or fully pointless, because its conclusions are obvious if you start from an understanding of what LLMs actually are.

www.pnas.org/doi/10.1073/...

6 months ago

I'm not sure why people have lost the ability to do related work properly, but if you absolutely need to use AI, at least proofread the output? (And they most likely edited with AI.)
www.pangram.com/history/01bf...

6 months ago

The viral "Definition of AGI" paper tells you to read fake references which do not exist!

Proof: different articles appear at the specified journal/volume/page numbers, and the cited titles exist nowhere in any searchable repository.

Take this as a warning not to use LMs to generate your references!
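For anyone who wants to automate the same check: below is a minimal sketch, not the method behind this post or the hallucinated-reference-finder tool shared above, that looks each suspect title up via the public Crossref API and flags references with no exact-title hit. The function name and matching rule are illustrative only.

```python
# Minimal sketch of the check described above: query a bibliographic
# database (here Crossref) for each cited title and flag titles that
# no indexed work matches. The exact-match rule is deliberately crude.
import requests

def flag_unmatched_titles(cited_titles):
    """Return the subset of cited_titles with no exact match in Crossref."""
    suspects = []
    for title in cited_titles:
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": title, "rows": 5},
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        # Collect the titles of the top search hits, lowercased for comparison.
        indexed = [t.lower() for item in items for t in item.get("title", [])]
        if title.lower() not in indexed:
            suspects.append(title)  # no exact-title hit: verify by hand
    return suspects

print(flag_unmatched_titles(["Attention Is All You Need"]))  # expect []
```

Anything this flags still needs a manual look, since exact-title matching misses legitimate variants and preprints, but it is enough to surface citations whose titles exist nowhere.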

6 months ago
COLM 2025: 9 cool papers and some thoughts. Reflections on the 2025 COLM conference, and a discussion of 9 cool COLM papers on benchmarking and eval, personas, and improving models for better long-context performance and consistency.

π‘΅π’†π’˜ π’ƒπ’π’π’ˆπ’‘π’π’”π’•! A rundown of some cool papers I got to chat about at #COLM2025 and some scattered thoughts

saxon.me/blog/2025/co...

6 months ago

Probably for the best; they had serious overflows because of this...

6 months ago

Wasn't this what ACL did this year?

6 months ago

Come talk with us today about the evaluation of long-form multilingual generation at the second poster session at #COLM2025

πŸ“4:30–6:30 PM / Room 710 – Poster #8

6 months ago