Posts by Craig Balding

Want to pivot your career to cyber but not sure where to start?
Start by combining your current domain skills with AI security to tackle risks specific to the industry you already operate in.
The best part? You can ask AI endless beginner cyber questions.
Three example beginner project ideas:
Healthcare: build a simple AI model to detect unusual access to patient data
Finance: train an AI model to spot fraudulent transactions using public datasets
Manufacturing: create a basic AI project to predict maintenance issues from machine sensor data
I would start with labeled datasets before later generating synthetic ones that fit a specific scenario. Let me know if this helps.
www.threatprompt.com/post/8-label...
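The healthcare idea above can be sketched in a few lines of stdlib Python: flag users whose daily record-access count is an outlier versus their peers. A robust (median/MAD) score is used rather than mean and standard deviation, because a single heavy accessor would otherwise inflate the baseline and mask itself. All names and counts here are invented.

```python
from statistics import median

def unusual_access(daily_counts, cutoff=3.5):
    """Flag users whose robust z-score (median/MAD based) exceeds cutoff."""
    counts = list(daily_counts.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)  # median absolute deviation
    if mad == 0:  # everyone identical: nothing stands out
        return []
    return [user for user, n in daily_counts.items()
            if 0.6745 * abs(n - med) / mad > cutoff]

# Hypothetical audit-log summary: patient records accessed per user today.
access_log = {"nurse_a": 22, "nurse_b": 25, "dr_c": 19, "dr_d": 24, "temp_e": 240}
print(unusual_access(access_log))  # ['temp_e']
```

Swap the toy dict for counts aggregated from a real (labeled) audit-log dataset and you have a first project to iterate on.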
The largest ever survey about AI Agents was just published by @langbase.
It's a real win for marketing at OpenAI...
"OpenAI leads in tech and marketing applications".
- LLM use for marketing: OpenAI 83% vs. Anthropic 41%
- LLM use for technology: OpenAI 76% vs. Anthropic 87%
76% > 87% = Huh?
Great marketing guys! ;-)
No dummy, add up the numbers for both Tech AND Marketing...
LLM agents are increasingly trusted with critical workflows, yet remain alarmingly vulnerable to manipulation and misuse.
The AgentHarm benchmark, developed by Gray Swan AI and the UK AI Safety Institute, identified…
• Jailbreak Vulnerabilities: Simple, universal jailbreaks increased GPT-4o’s compliance with harmful tasks from 48.4% to 72.7%, while refusal rates dropped sharply…
• Malicious Compliance: LLMs like Mistral Large 2 refused only 1.1% of harmful requests, revealing critical gaps in safety mechanisms.
• Capability Retention: Even when jailbroken, agents maintained full performance in executing complex multi-step tasks.
The benchmark's 110 tasks (covering fraud, cybercrime, and harassment) demonstrate how synthetic tools can safely mimic real-world misuse.
How are you limiting AI agent risk?
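To make figures like "refused only 1.1% of harmful requests" concrete, here is a toy version of the metric: score a batch of agent transcripts and compute the refusal rate. Real benchmarks like AgentHarm grade responses far more carefully; the keyword check below is a deliberately crude stand-in, and all transcripts are invented.

```python
# Crude refusal detector: real evaluations use model- or rubric-based grading.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to assist")

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses) -> float:
    """Fraction of responses classified as refusals."""
    return sum(is_refusal(r) for r in responses) / len(responses)

# Invented agent transcripts for four harmful task prompts.
transcripts = [
    "I can't help with that request.",
    "Sure, here is the script you asked for...",
    "Done. The transfer has been scheduled.",
    "I cannot assist with creating malware.",
]
print(f"{refusal_rate(transcripts):.0%}")  # 50%
```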
AI circuit breakers aim to halt harmful outputs but suffer the classic security trade-off:
“It’s hard to make AI safer without making it less useful.”
Research against a market-leading AI security and safety control found…
- False positives: Increased refusal rates on benign prompts (e.g., 4% to 39% on OR-Bench).
- False negatives: Vulnerable to multi-prompt attacks - jailbroken within 3 hours.
It's currently unclear if AI circuit breakers can keep pace with evolving attack strategies.
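The two failure modes above boil down to simple rates you can track for any guardrail: how often benign prompts get blocked, and how often harmful ones slip through. A minimal sketch, using an invented log of (is_harmful, was_blocked) outcomes:

```python
def guardrail_error_rates(events):
    """events: iterable of (is_harmful: bool, was_blocked: bool) pairs.
    Returns (false_positive_rate, false_negative_rate)."""
    benign = [blocked for harmful, blocked in events if not harmful]
    harmful = [blocked for harmful, blocked in events if harmful]
    fpr = sum(benign) / len(benign)            # benign prompts blocked
    fnr = sum(not b for b in harmful) / len(harmful)  # harmful prompts allowed
    return fpr, fnr

# Hypothetical guardrail log: 4 benign and 4 harmful prompts.
log = [(False, False), (False, True), (False, False), (False, False),
       (True, True), (True, True), (True, False), (True, True)]
fpr, fnr = guardrail_error_rates(log)
print(f"false-positive rate {fpr:.0%}, false-negative rate {fnr:.0%}")
```

Tracking both rates over time is one way to see whether a control is drifting toward "safer but useless" or "useful but leaky".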
AI-powered agents are set to dominate next year.
The difference between chaos and control?
How well adopters navigate Human-in-the-Loop (HITL) integration.
HITL keeps humans involved in critical AI decisions - overseeing, validating, and guiding automated outputs...
What does success look like?
1. Assess agent value: Band-aid, true asset, or unmitigable risk? (Critical for regulated or vulnerable-serving orgs.)
2. Map processes: Chart workflows and benchmark AI performance in different settings...
3. Identify what to surface for key decisions: AI reasoning, inputs, security rules & thresholds.
4. Design for HITL: UX, logging, and metrics matter.
5. Train the human: AI ops + domain expertise = effective oversight.
6. Iterate: Test, learn, adapt...
HITL done right enhances security AND process quality.
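Steps 3 and 4 of the HITL approach can be sketched as a simple dispatch gate: surface the agent's reasoning alongside a risk score, auto-approve low-risk actions, and queue everything else for a human. The `Action` shape, risk scoring, and threshold here are all hypothetical, not any specific framework's API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    risk_score: float  # 0.0-1.0, produced by your own risk model
    reasoning: str     # agent's stated rationale, surfaced for review

def dispatch(action: Action, risk_threshold: float = 0.5) -> str:
    """Auto-approve low-risk actions; route the rest to human review."""
    if action.risk_score >= risk_threshold:
        # Surface what the reviewer needs: the decision and why it was proposed.
        print(f"HUMAN REVIEW: {action.description} ({action.reasoning})")
        return "pending_review"
    print(f"auto-approved: {action.description}")
    return "approved"

dispatch(Action("send status email", 0.1, "routine weekly update"))
dispatch(Action("wire $250k to new payee", 0.9, "invoice matched by OCR"))
```

Logging both branches (step 4) gives you the metrics to tune the threshold over time (step 6).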
Local code-generating LLMs supercharge attackers' "Living off the Land" (LOTL) tactics.
Classic LOTL sidesteps detection by steering clear of adding any traceable tools to a compromised network.
"Living off your local LLM" enables real-time attack script creation within your internal network.
Feed the LLM response to an interpreter and execute without leaving a trace.
Protect your local LLMs - know where they are, harden the hosts, limit access and monitor guardrails for misuse.
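The "know where they are" step can start with a simple inventory sweep: probe hosts for ports commonly used by local LLM servers. Defaults can be changed (Ollama's 11434 is documented; the others below are common-but-configurable conventions), so treat hits as leads to investigate, not proof.

```python
import socket

# Default ports for popular local LLM servers; all are configurable.
LLM_PORTS = {11434: "Ollama", 8080: "llama.cpp server", 1234: "LM Studio"}

def probe(host: str, timeout: float = 0.5) -> dict:
    """Return {port: service_name} for LLM-associated ports open on host."""
    found = {}
    for port, name in LLM_PORTS.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            if sock.connect_ex((host, port)) == 0:  # 0 means connected
                found[port] = name
    return found

for host in ("127.0.0.1",):  # replace with your own subnet's host list
    open_ports = probe(host)
    if open_ports:
        print(host, open_ports)
```

An unexpected hit on an unmanaged workstation is exactly the kind of shadow LLM this post is warning about.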