AI Security — LLM Threats
LLM threats: prompt injection, jailbreaking, training data poisoning, model extraction, indirect injection.
AI Security: LLM-Specific Threats
| Threat | Description | Impact | Mitigation |
|---|---|---|---|
| Prompt Injection | Malicious input overrides system instructions | Data exfiltration, policy bypass | Input sanitisation, output validation |
| Jailbreaking | Bypass safety guardrails via crafted prompts | Generate harmful content | Regular red-teaming, safety classifiers |
| Training Data Poisoning | Inject malicious data into fine-tuning set | Backdoor model behaviour | Data provenance, anomaly detection |
| Model Extraction | Repeated queries to replicate model | IP theft, unmetered use of the cloned model | Rate limiting, query anomaly detection |
| Hallucination Exploitation | Attacker crafts prompt to induce false facts | Misinformation, wrong decisions | RAG grounding, citation requirements |
| Indirect Prompt Injection | Malicious instructions in retrieved documents | Agent hijacking | Sanitise all tool/retrieval outputs |
| Membership Inference | Determine if data was in training set | Privacy violation | Differential privacy, output perturbation |
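
To make the "Rate limiting" mitigation in the Model Extraction row concrete, here is a minimal sketch of a per-client sliding-window limiter. The class name `SlidingWindowLimiter`, its parameters, and the in-memory storage are assumptions made for illustration, not part of any particular API gateway or LLM provider. A real deployment would likely keep counters in shared storage and combine this with query anomaly detection.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `max_requests` per client per window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client_id -> timestamps of requests that were accepted
        self._history = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        timestamps = self._history[client_id]
        # Discard accepted requests that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # over budget: reject and do not record the request
        timestamps.append(now)
        return True
```

Calling `limiter.allow(client_id)` before forwarding each query to the model gives a simple gate. A limiter like this only slows extraction down; an attacker who spreads queries across many accounts would still need to be caught by anomaly detection.
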
Topic Quiz · 1 question
Test your understanding before moving on
1. What is indirect prompt injection in AI agents?
💡 Indirect injection hides attacker instructions in retrieved content (web pages, PDFs) — agents execute them unknowingly.
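
As a rough illustration of sanitising retrieved content before an agent sees it, the sketch below drops documents that contain instruction-like phrases and wraps the rest in delimiters marking them as untrusted data. The pattern list, the `<untrusted_document>` tag, and all function names are hypothetical choices for this example. Pattern matching of this kind is easy to evade, so treat it as one layer alongside output validation, not a complete defence.

```python
import re
from typing import List

# Hypothetical, deliberately small pattern list; real attacks vary widely.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?(system|previous) (prompt|instructions)", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
]


def looks_like_injection(text: str) -> bool:
    """Return True if any suspicious pattern appears in the text."""
    return any(p.search(text) for p in SUSPICIOUS_PATTERNS)


def wrap_untrusted(text: str) -> str:
    """Mark retrieved content as data so the prompt can tell the model not to obey it."""
    return "<untrusted_document>\n" + text + "\n</untrusted_document>"


def prepare_retrieved(docs: List[str]) -> List[str]:
    """Drop documents that match a pattern; wrap the remaining ones."""
    return [wrap_untrusted(d) for d in docs if not looks_like_injection(d)]
```

For example, passing a list of retrieved web pages or PDF text chunks through `prepare_retrieved` before building the agent prompt would filter out a chunk containing "ignore previous instructions" while keeping ordinary content, wrapped so the system prompt can refer to it as untrusted.
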