
Posts by Harsh Trivedi


Our AI & Scientific Discovery Workshop (@ NAACL 2025) broadly welcomes papers on all aspects of the scientific discovery process through the lens of AI / NLP.

Paper submission deadline: Jan 30, 2025 (about two weeks away).
We're excited to see you there!

1 year ago 3 1 0 3

Hey Marc! Thanks for this starter pack. Can you please add me to it as well?

1 year ago 7 0 0 0
AppWorld: Reliable Evaluation of Interactive Agents in a World of Apps and People. Happening at 11 AM EST online on Dec 2, 2024


🚨 Happening next Monday, 2 Dec, at @cohere.com! ✨
👋 Anyone can join remotely at this link:
👉 cohere.com/events/coher...
🙏 Thank you @sebruder.bsky.social for helping arrange it!!
📅 Upcoming talks: appworld.dev/talks

1 year ago 8 1 0 0
A plot: the x-axis is the baseline score of rankers (nDCG@10); the y-axis is the delta in model score after expansion is applied.

There are three sets of results, one dataset per shift type: TREC DL (no shift), FiQA (domain shift), and ArguAna (query shift). For each set of results, the chart shows a scatter plot with a trend line. The same trend holds for all three: as the baseline score increases, the delta from expansion decreases.

On TREC DL, the worst models have a base score of ~40 and improve by ~10 points with expansion; the best models score >70, and their performance drops by ~5 points with expansion.

On FiQA, the worst models have a base score of ~15 and improve by ~5 points with expansion; the best models score ~45, and their performance drops by ~3 points with expansion.

On ArguAna, the worst models have a base score of ~25 and improve by >20 points with expansion; the best models score >55, and their performance drops by ~1 point with expansion.
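As a rough sanity check, the endpoint pairs described above can be turned into trend slopes (note: these numbers are hypothetical reconstructions read off the plot description, not the paper's actual data):

```python
# Approximate (baseline nDCG@10, expansion delta) endpoints per dataset,
# reconstructed from the plot description above.
points = {
    "TREC DL (no shift)":    [(40, 10), (70, -5)],
    "FiQA (domain shift)":   [(15, 5),  (45, -3)],
    "ArguAna (query shift)": [(25, 20), (55, -1)],
}

def slope(pts):
    # Slope of the trend line through the two endpoints:
    # delta-of-score change per point of baseline score.
    (x1, y1), (x2, y2) = pts
    return (y2 - y1) / (x2 - x1)

for name, pts in points.items():
    print(f"{name}: trend slope = {slope(pts):.2f}")
```

All three slopes come out negative, matching the described trend: the stronger the baseline ranker, the less (and eventually negatively) expansion helps.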


Using LLMs for query or document expansion in retrieval (e.g., HyDE and Doc2Query) has scores going 📈

But do these approaches work for all IR models and for different types of distribution shift? Turns out it's actually more 📉 🚨

πŸ“ (arxiv soon): orionweller.github.io/assets/pdf/L...

2 years ago 42 6 3 3

Great opportunity to see how (your) new coding-agent methods stack up on real-world user tasks

1 year ago 3 1 0 0

Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models 👇

1 year ago 111 31 2 7

another starter pack, this time for folks (past & current) from Ai2 (@ai2.bsky.social) 😍

go.bsky.app/Qjyc97J

1 year ago 22 5 2 0

I decided to create a Starter Pack for people working on LLM Agents. Please feel free to self-refer as well.

go.bsky.app/LUrLWXe

#LLMAgents #LLMReasoning

1 year ago 15 5 11 0

🚨 We are refreshing the 🌎 AppWorld (appworld.dev) leaderboard with all the new coding and/or tool-use LMs.

❓ What would you like to see included?

🔌 Self-plugs are welcome!!

x.com/harsh3vedi/s...

1 year ago 7 2 0 1

Hi Nikolai! Mind adding me to this starter pack? Thanks!

1 year ago 1 0 1 0
EMNLP 2024 Tutorial: Language Agents: Foundations, Prospects, and Risks

Had a great time doing the language agent tutorial (language-agent-tutorial.github.io) with Yu Su, Shunyu Yao and Tao Yu 😀 #EMNLP2024

Check out our slides here: tinyurl.com/language-age...

1 year ago 33 5 0 0

Hi! Can you please add me to this list? Thank you!

1 year ago 1 0 0 0

Hi Michael! Can you please add me to this list? Thank you!

1 year ago 2 0 0 0

Hi Maria! Can you please add me to the list? Thank you!

1 year ago 0 0 0 0