Posts by Sachin Kumar

Super excited for this workshop. Mark your calendars!

1 year ago 4 0 0 0

We hope this paper encourages more thorough and diverse evaluations of interpretability and steering techniques going forward. (4/4)

1 year ago 0 0 0 0

A common theme we noticed across many of the methods we explored, and in much of the existing literature in this area, is the limited evaluation scope. Many such papers still use Pythia or Llama 1/2, which show very different trends from many of the newer models (for reasons we couldn't pin down). (3/4)

1 year ago 4 0 1 0

This project began nearly a year ago when I was at Ai2. Activation steering and related ideas were incredibly appealing, and we explored applying them to a range of problems. But none of the techniques we tried led to meaningful improvements, which prompted a deeper investigation. (2/4)

1 year ago 0 0 1 0

Really excited for this paper to be out, led by @patqdasilva.bsky.social 👇. Follow him for more exciting work coming soon. (1/4)

1 year ago 0 0 1 0

I am looking for multiple emergency reviewers for December ARR for papers related to: disinformation, prompt engineering, reward modeling, and diffusion LMs. Please let me know if you can help!

1 year ago 2 1 0 0
The Art of Saying No: Contextual Noncompliance in Language Models
Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the ...

We have queries like this in our recent paper: www.arxiv.org/abs/2407.12043

1 year ago 1 0 0 0
3 - Liwei Jiang leads the effort to scale jailbreaking tactics and build adversarially safer LMs (Friday 4.30pm PT):

1 year ago 2 0 0 0
2 - We build tokenizer-free multilingual LMs; led by @orevaahia.bsky.social (Thursday 4.30pm):

1 year ago 2 0 1 0
1 - In the D&B track, we study language model noncompliance beyond only safety; co-led with Faeze Brahman (Thursday 11am PT):

1 year ago 1 0 1 0

En route to Vancouver to attend #NeurIPS2024 and excited to be a part of the following papers 👇!

I am also recruiting multiple PhD students for Fall '25. DM me here or on Whova if you are interested in multilinguality, personalized alignment, or real-use-inspired evals (see website in bio for details).

1 year ago 6 3 1 0
@shocheen.bsky.social and co. will be at the Thursday poster session to present our paper on "Contextual Noncompliance".

1 year ago 10 1 1 0
A photo of Boulder, Colorado, shot from above the university campus and looking toward the Flatirons.

I'm recruiting 1-2 PhD students to work with me at the University of Colorado Boulder! Looking for creative students with interests in #NLP and #CulturalAnalytics.

Boulder is a lovely college town 30 minutes from Denver and 1 hour from Rocky Mountain National Park 😎

Apply by December 15th!

1 year ago 303 136 9 12