Prompt Guard
Prompt Guard is a new model for guardrailing LLM inputs against prompt attacks - in particular jailbreaking techniques and indirect injections embedded into third party data. For more information, see our Model card.
Download
Quick Start
PromptGuard is based on mDeBERTa, and can be loaded straightforwardly using the
huggingface transformers
API with local weights or from huggingface:
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
We've added examples using Prompt Guard in the Llama recipes repository. In particular, take a look at the Prompt Guard Tutorial and Prompt Guard Inference utilities.
Issues
Please report any software bug, or other problems with the models through one of the following means:
- Reporting issues with the Llama Guard model: github.com/meta-llama/PurpleLlama
- Reporting issues with Llama in general: github.com/meta-llama/llama3
- Reporting bugs and security concerns: facebook.com/whitehat/info
Licsense
Our model and weights are licensed for both researchers and commercial entities, upholding the principles of openness. Our mission is to empower individuals, and industry through this opportunity, while fostering an environment of discovery and ethical AI advancements.
The same license as Llama 3 applies: see the LICENSE file, as well as our accompanying Acceptable Use Policy.
References
- Llama 3 Paper
- CyberSecEval 3