Agent "security research" (i.e. attacking) is too easy these days. The one universal line about LLM security is to never trust model output. But agents have people blindly executing it. Security research there is like shooting fish in a barrel
Posts by Leon Derczynski
The core issues---studying how to build reliable language technology, how to use computers to better understand how language works, how to evaluate language technology, and how to reason about how language technology sits in its social context---all remain.
Reminds me, I should back up my dropbox
huh, Harry Kim getting beaten up by Lyta Alexander was not on my background TV bingo card for today, but here we are
surprised (and grateful) I somehow still remembered on the first try the arcane incantation for quitting a telnet session. vi has nothing on this imo. muscle memory is weird
getting on arXiv isn't "being published"
come on, it definitely won't, arXiv is still the open door/no bar venue
😂 arXiv is cute when it pretends to have standards!
what thing
Come to LLMSEC at ACL & hear Niloofar's keynote
"What does it mean for agentic AI to preserve privacy?" - Niloofar Mireshghallah, Meta/CMU
(Friday 1st Aug, 11.00; Austria Center Vienna Hall B)
See you there!
#acl2025 #acl2025nlp
the "oyster tower" bit was great. brutal
Or he's pushing a product into which x sunk significant capex?
have the courage to use your own intelligence
logging on
Brazen of them. Sounds extremely awkward for you, I'm sorry. What would they have done with unaltered slides? Cancelled and left a gap in their schedule for no doubt paying participants?
new garak, llm vuln scanner rls (v0.12.0)
* Audio attacks, for multimodal models
* More training data membership inference attacks
* Multilingual attacks can now also use GCP
* Detailed eval summary in one JSONL row/object
+more :)
details: github.com/NVIDIA/garak...
the dying but clinging on battery in the bathroom's Frozen-branded soap dispenser reminds me that it's only 4-5 months til Bublé & Let It Go season. aren't you looking forward
why do academics send and expect so much weekend email and work. not healthy
It's been 2.5 years but ANY SECOND NOW, right?
data indicates students don't like using it, sorry
computer scientists encountering the concept of "desirable difficulty"
remembering the time i checked in to my reasonably classy russian business hotel late with my wife, and the staff said "sir, this... girl.. not allowed"
she's a serious professor
we went through to the room, opened the balcony door, and buried a bottle of champagne in the metre of snow
good times
@jjvincent.bsky.social woah ur really famous! love this attack also. I automate and run it for a living
www.instagram.com/reel/DKz9ezj...
Michael... OK...
Great to see our work uncovering dangerous issues in commercial LLM "therapists" getting some coverage: futurism.com/stanford-the...
I have not updated since Christmas, I see. Guess I'd better put on some summer Bublé
www.youtube.com/watch?v=oYlc...
"natwirkung"
"wirk smorter nat horder"
accents dreamed up by the utterly deranged
(what is going on with that 🇺🇸 vowel sheft)
what is this photoshoot
delete this omg
i need you to understand that "alternate uses" is a terrible test/definition of creativity and has been for some time. it's extremely narrow, very shallow, and misses almost everything we know about creativity