#AlignmentWorkshop

If you've ever wondered what an #AlignmentWorkshop is or why you need one (yes, you really do!), then this article’s for you.

You'll also learn how to prep & get everyone (yes, even THAT stakeholder) involved.

www.previousnext.com.au/blog/alignme...

#UserExperience #UX

China classifies AI safety as a national security issue alongside cybersecurity, biological security & natural disasters.

Kwan Yee Ng outlined China’s policies: model registration, safety checks for gen AI, and AGI safety pilots in Beijing, Shanghai, etc. #AlignmentWorkshop

RepE: representations are the weights & activations; engineering is reading, probing & controlling them, like brain scans for AI.

Andy Zou shows how top-down representation engineering improves AI honesty and jailbreak robustness. #AlignmentWorkshop
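
To make "reading, probing & control" concrete, here is a minimal sketch of one RepE-style recipe on toy data: fit a difference-of-means linear probe on hidden activations to read out a concept direction, then nudge an activation along that direction. Everything here (the shapes, the synthetic "honesty" signal, the steering coefficient) is an illustrative assumption, not Zou's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden width of the toy "model" (assumed for illustration)

# Plant a hidden concept direction; in a real model this is unknown.
concept_dir = rng.normal(size=d)

# Simulate activations for 200 prompts labelled honest (1) / dishonest (0):
# honest prompts shift activations along +concept_dir, dishonest along -.
n = 200
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d)) + np.outer(labels * 2 - 1, concept_dir)

# "Reading": a difference-of-means probe, one of the cheapest linear probes.
probe = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
probe /= np.linalg.norm(probe)

# Check the probe: classify by which side of the mean projection we land on.
scores = acts @ probe
acc = ((scores > scores.mean()) == labels).mean()
print(f"probe accuracy: {acc:.2f}")

# "Control": steer one activation along the concept direction, the crude
# core of activation-addition-style interventions.
alpha = 4.0
steered = acts[0] + alpha * probe
print(f"projection before/after steering: {acts[0] @ probe:.2f} -> {steered @ probe:.2f}")
```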

"It’s like doing science without ever doing experiments."

Atticus Geiger critiques SAEs for relying on observational data alone, advocating for experimental methods and counterfactual states to enhance prediction, control, and understanding in AI interpretability. #AlignmentWorkshop
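
The experimental alternative Geiger points toward is the interchange intervention (activation patching): run the model on a base input, splice in a hidden state recorded from a source input, and check whether the counterfactual output follows the source. The two-layer toy network below is invented purely to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny random two-layer network stands in for the real model.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Forward pass; if `patch` is given, overwrite the hidden layer
    with it (the interchange intervention)."""
    h = np.tanh(x @ W1)
    if patch is not None:
        h = patch
    return h @ W2

base, source = rng.normal(size=4), rng.normal(size=4)

# Observation alone: record the hidden state on the source input.
h_source = np.tanh(source @ W1)

# Experiment: a counterfactual run, base input but the source's hidden state.
y_base = forward(base)
y_patched = forward(base, patch=h_source)
y_source = forward(source)

# If y_patched tracks y_source, the patched layer causally carries the
# relevant information rather than merely correlating with it. (Patching
# the whole layer makes the match exact; real work patches finer components.)
print("base:          ", y_base)
print("counterfactual:", y_patched)
print("source:        ", y_source)
```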

"Most people do not, in fact, want to destroy the world. If we give them more information, they will make better decisions."

Beth Barnes shares METR's work on metrics to gauge AI risk, tackling challenges in model cost, elicitation, and transparency. #AlignmentWorkshop

"If you literally catch your AI trying to escape, you have to stop deploying it."

Buck Shlegeris shares strategies for managing misaligned AI, including trusted monitoring and collusion-busting techniques to limit catastrophic risks as capabilities grow. #AlignmentWorkshop
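
As a rough illustration of the trusted-monitoring protocol described here, the sketch below routes every output from a powerful untrusted model past a weaker trusted monitor, sends the most suspicious outputs (plus a small random sample) to human audit, and halts deployment if an escape attempt is confirmed. All models, scores, and thresholds are placeholders, not an actual implementation of the protocol.

```python
import random

AUDIT_BUDGET = 0.02     # fraction of outputs humans can afford to audit
SUSPICION_CUTOFF = 0.9  # monitor score above which we always audit

def untrusted_model(task: str) -> str:
    # Placeholder for the capable but possibly misaligned model.
    return f"solution to {task}"

def trusted_monitor(output: str) -> float:
    # Placeholder: a weaker, trusted model scores suspiciousness in [0, 1].
    return random.random()

def human_audit(output: str) -> bool:
    # Placeholder for expensive human review; the benign stand-in model
    # above never triggers it, so the demo runs to completion.
    return "escape attempt" in output

def run_with_trusted_monitoring(tasks):
    for task in tasks:
        out = untrusted_model(task)
        suspicious = trusted_monitor(out) > SUSPICION_CUTOFF
        if suspicious or random.random() < AUDIT_BUDGET:
            if human_audit(out):
                # "If you literally catch your AI trying to escape,
                # you have to stop deploying it."
                raise RuntimeError("confirmed escape attempt; halting deployment")
        yield out

for result in run_with_trusted_monitoring([f"task-{i}" for i in range(5)]):
    print(result)
```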
