# Getting Started

## Prerequisites
Please make sure you are running Python 3.9 or newer.

Also ensure you have weggli installed. Its published release is outdated, so we currently use revision
`9d97d462854a9b682874b259f70cc5a97a70f2cc`:

```
cargo install weggli --rev=9d97d462854a9b682874b259f70cc5a97a70f2cc --git=https://github.com/weggli-rs/weggli
```
Set the environment variable `WEGGLI_PATH` to point to the weggli executable. For example, if `weggli` is on your `PATH`, use the command below.

```
export WEGGLI_PATH=weggli
```
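
As a quick sanity check, the sketch below resolves `WEGGLI_PATH` and runs the binary once. It is only an illustration; the fallback to a bare `weggli` on `PATH` is an assumption, not something the benchmarks require.

```python
import os
import subprocess

# Resolve the weggli binary from WEGGLI_PATH; falling back to "weggli" on
# PATH is an assumption made for this illustration only.
weggli = os.environ.get("WEGGLI_PATH", "weggli")

try:
    # Run `weggli --help` just to confirm the binary is reachable.
    result = subprocess.run([weggli, "--help"], capture_output=True, text=True)
    ok = result.returncode == 0
except FileNotFoundError:
    ok = False

print("weggli is available" if ok else "weggli was not found; check WEGGLI_PATH")
```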
## Set up Virtual Environment

Set up your virtual environment and install dependencies:

```
python3 -m venv ~/.venvs/CybersecurityBenchmarks
source ~/.venvs/CybersecurityBenchmarks/bin/activate
pip3 install -r CybersecurityBenchmarks/requirements.txt
```
## Running the Benchmarks

The main benchmark module is `CybersecurityBenchmarks.benchmark.run`. Run

```
python3 -m CybersecurityBenchmarks.benchmark.run --help
```

to get a list of available options.

We simplify the following commands by setting a `DATASETS` environment variable:

```
export DATASETS=$PWD/CybersecurityBenchmarks/datasets
```
Each benchmark can run tests for multiple LLMs. Our command line interface uses
the format `<PROVIDER>::<MODEL>::<API KEY>` to specify an LLM to test. We
currently support APIs from OPENAI, ANYSCALE, and TOGETHER. The following are a
few examples:

```
OPENAI::gpt-4::<API KEY>
OPENAI::gpt-3.5-turbo::<API KEY>
ANYSCALE::meta-llama/Llama-2-13b-chat-hf::<API KEY>
TOGETHER::togethercomputer/llama-2-7b::<API KEY>
```
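
For intuition, such a specification is simply three `::`-separated fields. The snippet below is purely illustrative and is not taken from the benchmark code; the key shown is a placeholder.

```python
# Illustrative only: split a "<PROVIDER>::<MODEL>::<API KEY>" specification
# into its parts. The benchmark's own parsing may differ.
spec = "OPENAI::gpt-4::dummy-api-key"  # placeholder key for illustration
provider, model, api_key = spec.split("::", 2)
print(provider, model)  # -> OPENAI gpt-4
```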
### How to use a different model supported by Anyscale or Together

- Look up the models supported by Anyscale or Together.
- Update the `valid_models` method in `llm.py` with the name of the model you choose to test with (see the sketch after this list).
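
The sketch below shows the kind of change intended. The exact structure of `valid_models` in `llm.py` may differ, and the added model name is only an example.

```python
# Hypothetical sketch: the real valid_models method in llm.py may be shaped
# differently. The point is to include your chosen model name in the list of
# models accepted for the provider.
def valid_models(self) -> list[str]:
    return [
        "meta-llama/Llama-2-13b-chat-hf",
        "mistralai/Mistral-7B-Instruct-v0.1",  # example: newly added model
    ]
```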
### How to run benchmarks for self-hosted models

- Extend `llm.py` and implement your inference logic in `def query(self, prompt: str) -> str:` (see the sketch after this list).
- Make sure the response of your LLM is always a string returned from the `query` function.
- Update the supported providers in the `llm.create` method with the name of the LLM you created in step (1).
- Pass the name of your LLM to the `--llm-under-test` option in the format `LLM Name::model-name::random-string`.
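
Below is a minimal sketch of such an extension. It assumes `llm.py` exposes a base class named `LLM` whose `query` method you override; the `SelfHostedLLM` class name, the HTTP endpoint, and the payload shape are illustrative assumptions, not the benchmark's actual API.

```python
import requests  # assumes the self-hosted model is reachable over HTTP

from .llm import LLM  # assumption: llm.py exposes a base class named LLM


class SelfHostedLLM(LLM):
    """Hypothetical wrapper around a self-hosted model endpoint."""

    def query(self, prompt: str) -> str:
        # Send the prompt to an illustrative local inference endpoint.
        response = requests.post(
            "http://localhost:8000/generate",  # hypothetical endpoint
            json={"prompt": prompt},
            timeout=600,
        )
        response.raise_for_status()
        # The benchmarks expect query() to always return a string.
        return str(response.json()["text"])
```

With this in place, you would register your provider name in `llm.create` and pass something like `SELF-HOSTED::my-model::dummy-key` to `--llm-under-test`.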
### Other notes about running benchmarks

- Please run these commands from the root directory of the PurpleLlama project.
- `run-llm-in-parallel` makes async parallel calls to the specified LLM provider, which speeds up prompt evaluation (see the sketch after this list).
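
For intuition only, the pattern behind such parallel evaluation looks roughly like the sketch below; `query_llm` is a hypothetical coroutine standing in for a single provider call, and this is not the benchmark's implementation.

```python
import asyncio


# Hypothetical coroutine standing in for one LLM API call.
async def query_llm(prompt: str) -> str:
    await asyncio.sleep(1)  # stand-in for network latency
    return f"response to: {prompt}"


async def main() -> None:
    prompts = ["prompt 1", "prompt 2", "prompt 3"]
    # Issue all queries concurrently instead of one after another.
    responses = await asyncio.gather(*(query_llm(p) for p in prompts))
    print(responses)


asyncio.run(main())
```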