
Posts by Guy Dar

All in all, it's hard to say how practically feasible this is to achieve without substantial leakage. Fortunately, there are many free parameters that can be tweaked here, and many variants to consider.

1 year ago 0 0 0 0

This allows an (approximate) causal variant of training data attribution -- understanding which data points contributed to the emergence of a capability!

1 year ago 0 0 1 0

A major advantage of this method over other methods is that it allows ⏳"time travel"⏳
Because we can trace which params were influenced by a data point, we can ablate or manipulate them!
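A minimal sketch of what such an ablation could look like, assuming we had logged, for each training point, the binary parameter mask that was active when it was seen (the function and variable names here are hypothetical):

```python
import numpy as np

def ablate(params: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out exactly the parameters a data point's mask touched,
    approximating 'removing' that point's influence after the fact."""
    return params * (1.0 - mask)

params = np.array([0.5, -1.2, 0.3, 2.0])
mask = np.array([0.0, 1.0, 1.0, 0.0])  # logged mask for some data point
ablated = ablate(params, mask)         # → [0.5, 0.0, 0.0, 2.0]
```

The same logged masks would support softer interventions too, e.g. scaling rather than zeroing the affected parameters.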

1 year ago 0 0 1 0

The idea is related to locality-sensitive hashing (LSH) that sends similar vectors to close buckets. We train the model with a dropout mask that depends on the semantics of the input ("semantic dropout masks") to accomplish that.
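One illustrative way to realize such a semantic dropout mask, using random-hyperplane LSH over input embeddings (the dimensions, voting rule, and names are my own illustrative choices, not details from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

d, p, k = 32, 1024, 8                      # embedding dim, #params, #hyperplanes
hyperplanes = rng.standard_normal((k, d))  # fixed random projections (LSH)
bit_masks = rng.random((k, p)) < 0.5       # each LSH bit "votes" for half the params

def semantic_dropout_mask(embedding: np.ndarray) -> np.ndarray:
    """Binary mask over parameters derived from the input's LSH signature.
    Similar embeddings share sign bits, hence overlapping masks (fuzzy groups)."""
    bits = hyperplanes @ embedding > 0     # (k,) signature bits
    votes = bit_masks[bits].sum(axis=0)    # per-parameter vote count
    threshold = max(1, int(bits.sum()) // 2)
    return (votes >= threshold).astype(np.float32)

x = rng.standard_normal(d)
m = semantic_dropout_mask(x)
```

Because nearby embeddings tend to fall on the same side of most hyperplanes, their masks overlap heavily, while distant inputs draw on different parameter subsets.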

1 year ago 0 0 1 0

In this work, I present a *sketch* of an idea around this. Instead of allocating inputs to rigid groups, we aim for fuzzy membership, such that semantically similar inputs update related subsets of the parameters.

1 year ago 0 0 1 0

For example, gradient routing partitions data points into disjoint groups and updates only a certain region in the network for each group. This method, as well as others, is limited to a predefined set of localizations.
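For concreteness, a toy sketch of that disjoint-region scheme (the partition and names are illustrative, not the published method's details):

```python
import numpy as np

n_params, n_groups = 12, 3
# Fixed disjoint partition: each data group owns one contiguous parameter region.
regions = np.array_split(np.arange(n_params), n_groups)

def routed_update(params, grad, group, lr=0.1):
    """Apply the gradient only inside the group's own parameter region."""
    mask = np.zeros_like(params)
    mask[regions[group]] = 1.0
    return params - lr * grad * mask

params = np.zeros(n_params)
params = routed_update(params, np.ones(n_params), group=0)
# Only the first region moves; the other regions stay untouched.
```

The rigidity is visible here: the set of possible localizations is fixed by the partition before training ever starts.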

1 year ago 0 0 1 0

🚧 New blogpost!! 🚧

📝 "Localization by design via semantic dropout masks"

Many recent works try to localize model behaviors to params and intervene upon them. Acknowledging how hard this is to do after training, several works have tried to train models that allow localization by design.

1 year ago 0 0 1 0

What's in an attention head? 🤯

We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨

A new preprint with Amit Elhelo 🧵 (1/10)

1 year ago 60 13 1 0