#AlignmentWorkshop

If you've ever wondered what an #AlignmentWorkshop is or why you need one (yes, you really do!), then this article’s for you.

You'll also learn how to prep & get everyone (yes, even THAT stakeholder) involved.

www.previousnext.com.au/blog/alignme...

#UserExperience #UX

China classifies AI safety as a national security issue alongside cybersecurity, biological security & natural disasters.

Kwan Yee Ng outlined China’s policies: model registration, safety checks for gen AI, and AGI safety pilots in Beijing, Shanghai, etc. #AlignmentWorkshop

RepE: representations are the weights & activations; engineering is reading, probing & controlling them, like brain scans for AI.

Andy Zou shows how top-down representation engineering improves AI honesty and jailbreak robustness. #AlignmentWorkshop
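
To make "reading, probing & control" concrete, here is a minimal sketch of one RepE-style recipe on toy data: fit a difference-of-means linear probe on hidden activations to read out a concept direction, then nudge an activation along that direction. Everything here (the shapes, the synthetic "honesty" signal, the steering coefficient) is an illustrative assumption, not Zou's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden width of the toy "model" (assumed for illustration)

# Plant a hidden concept direction; in a real model this is unknown.
concept_dir = rng.normal(size=d)

# Simulate activations for 200 prompts labelled honest (1) / dishonest (0):
# honest prompts shift activations along +concept_dir, dishonest along -.
n = 200
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d)) + np.outer(labels * 2 - 1, concept_dir)

# "Reading": a difference-of-means probe, one of the cheapest linear probes.
probe = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
probe /= np.linalg.norm(probe)

# Check the probe: classify by which side of the mean projection we land on.
scores = acts @ probe
acc = ((scores > scores.mean()) == labels).mean()
print(f"probe accuracy: {acc:.2f}")

# "Control": steer one activation along the concept direction, the crude
# core of activation-addition-style interventions.
alpha = 4.0
steered = acts[0] + alpha * probe
print(f"projection before/after steering: {acts[0] @ probe:.2f} -> {steered @ probe:.2f}")
```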

"It’s like doing science without ever doing experiments."

Atticus Geiger critiques SAEs for relying on observational data alone, advocating for experimental methods and counterfactual states to enhance prediction, control, and understanding in AI interpretability. #AlignmentWorkshop
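
The experimental alternative Geiger points toward is the interchange intervention (activation patching): run the model on a base input, splice in a hidden state recorded from a source input, and check whether the counterfactual output follows the source. The two-layer toy network below is invented purely to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny random two-layer network stands in for the real model.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Forward pass; if `patch` is given, overwrite the hidden layer
    with it (the interchange intervention)."""
    h = np.tanh(x @ W1)
    if patch is not None:
        h = patch
    return h @ W2

base, source = rng.normal(size=4), rng.normal(size=4)

# Observation alone: record the hidden state on the source input.
h_source = np.tanh(source @ W1)

# Experiment: a counterfactual run, base input but the source's hidden state.
y_base = forward(base)
y_patched = forward(base, patch=h_source)
y_source = forward(source)

# If y_patched tracks y_source, the patched layer causally carries the
# relevant information rather than merely correlating with it. (Patching
# the whole layer makes the match exact; real work patches finer components.)
print("base:          ", y_base)
print("counterfactual:", y_patched)
print("source:        ", y_source)
```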

"Most people do not, in fact, want to destroy the world. If we give them more information, they will make better decisions."

Beth Barnes shares METR's work on metrics to gauge AI risk, tackling challenges in model cost, elicitation, and transparency. #AlignmentWorkshop

"If you literally catch your AI trying to escape, you have to stop deploying it."

Buck Shlegeris shares strategies for managing misaligned AI, including trusted monitoring and collusion-busting techniques to limit catastrophic risks as capabilities grow. #AlignmentWorkshop
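
As a rough illustration of the trusted-monitoring protocol described here, the sketch below routes every output from a powerful untrusted model past a weaker trusted monitor, sends the most suspicious outputs (plus a small random sample) to human audit, and halts deployment if an escape attempt is confirmed. All models, scores, and thresholds are placeholders, not an actual implementation of the protocol.

```python
import random

AUDIT_BUDGET = 0.02     # fraction of outputs humans can afford to audit
SUSPICION_CUTOFF = 0.9  # monitor score above which we always audit

def untrusted_model(task: str) -> str:
    # Placeholder for the capable but possibly misaligned model.
    return f"solution to {task}"

def trusted_monitor(output: str) -> float:
    # Placeholder: a weaker, trusted model scores suspiciousness in [0, 1].
    return random.random()

def human_audit(output: str) -> bool:
    # Placeholder for expensive human review; the benign stand-in model
    # above never triggers it, so the demo runs to completion.
    return "escape attempt" in output

def run_with_trusted_monitoring(tasks):
    for task in tasks:
        out = untrusted_model(task)
        suspicious = trusted_monitor(out) > SUSPICION_CUTOFF
        if suspicious or random.random() < AUDIT_BUDGET:
            if human_audit(out):
                # "If you literally catch your AI trying to escape,
                # you have to stop deploying it."
                raise RuntimeError("confirmed escape attempt; halting deployment")
        yield out

for result in run_with_trusted_monitoring([f"task-{i}" for i in range(5)]):
    print(result)
```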
