#LLMjailbreak hashtag - Bluesky

@getnews-me.bsky.social

5 months ago

NEXUS Framework Boosts Multi‑Turn LLM Jailbreak Success

The NEXUS framework raises multi‑turn LLM jailbreak success by 2.1%‑19.4% and its code is publicly available on GitHub. It uses a semantic network to explore many query routes. Read more: getnews.me/nexus-framework-boosts-m... #nexus #llmjailbreak

0 0 0 0

GetNews.me

@getnews-me.bsky.social

6 months ago

Benign Fine‑Tuning Overfit Method Reveals New LLM Jailbreak Vulnerability

Fine‑tuning a large language model with ten benign QA pairs can erase its refusal behavior, creating a jailbreak; a pass with answers makes it comply with disallowed queries. getnews.me/benign-fine-tuning-overf... #llmjailbreak #finetuning

1 0 0 0

GetNews.me

@getnews-me.bsky.social

6 months ago

Dynamic Target Attack Boosts LLM Jailbreak Efficiency

Dynamic Target Attack (DTA) achieved over 87% success after 200 optimization steps and hit 85% success against Llama‑3‑70B‑Instruct in black‑box tests. Read more: getnews.me/dynamic-target-attack-bo... #llmjailbreak #dynamicattack

0 0 0 0

GetNews.me

@getnews-me.bsky.social

6 months ago

Detecting LLM Jailbreak Prompts with BERT: New Study Highlights Keywords

Fine‑tuned BERT outperforms classifiers in detecting LLM jailbreak prompts, with keyword visualisation showing explicit reflexivity as a strong signal. Read more: getnews.me/detecting-llm-jailbreak-... #llmjailbreak #aisafety

0 0 0 0

GetNews.me

@getnews-me.bsky.social

6 months ago

Content Concretization Boosts LLM Jailbreak Success to 62%

Content Concretization raises LLM jailbreak success from 7% to 62% after three refinement iterations, at a cost of about 7.5 ¢ per prompt, according to the researchers. getnews.me/content-concretization-b... #llmjailbreak #aisafety #contentconcretization

0 0 0 0

GetNews.me

@getnews-me.bsky.social

7 months ago

Psychological Prompts Increase LLM Compliance with Forbidden Queries

Persuasive prompts significantly lifted LLM compliance on disallowed queries, with insult requests rising from 28.1% to 67.4% and drug requests from 38.5% to 76.5%. Read more: getnews.me/psychological-prompts-in... #llmjailbreak #aisafety

0 0 0 0

Alex Carter

@neurosecops.com

9 months ago

Guardrails for AI: Can We Stop LLMs from Going Rogue? Neuro Sec Ops · Episode

Can AI be hacked into going rogue?
Can we really trust large language models like ChatGPT?

🎧 Listen now: open.spotify.com/episode/6jw1...

#AIsecurity #LLMjailbreak #CyberThreats #Guardrails #AIsafety #GPT4 #MachineLearning #CyberPodcast

2 0 1 0