GenAI Safety Leaderboard

Compare safety ratings and risk scores of leading AI models

Rank	Model	Performance	Performance vs Risk	Risk Score	Source
1	gemini-1.5-pro-exp-0801	85.90%	2.77:1(Excellent)	31	Google
2	gemini-1.5-pro-latest	85.90%	3.07:1(Excellent)	28	Google
3	gemma-2-27b-it	75.20%	2.51:1(Excellent)	30	HuggingFace
4	Reflection-Llama-3.1-70B	79.00%	2.32:1(Excellent)	34	HuggingFace
5	Llama-2-7B-Chat-GGUF-8bit	45.80%	1.35:1(Good)	34	HuggingFace
6	Llama-2-7B-Chat-GGUF-4bit	45.80%	1.35:1(Good)	34	HuggingFace
7	SmolLM-360M-Instruct	34.17%	0.92:1(Poor)	37	HuggingFace
8	llama-2-7b-chat-hf	47.33%	1.28:1(Good)	37	Together.ai
9	flan-ul2	55.56%	1.39:1(Good)	40	Google
10	o1-preview	90.80%	2.27:1(Excellent)	40	OpenAI
11	Llama-3-8B-Instruct-RR	68.40%	1.40:1(Good)	49	HuggingFace
12	claude-3-opus-20240229	88.20%	2.00:1(Excellent)	44	Anthropic
13	gpt-4-0125-preview	86.40%	1.73:1(Good)	50	OpenAI
14	sarvam-2b-v0.5	N/A	NA(Poor)	46	HuggingFace
15	Llama-3-8B-Instruct-MopeyMule	68.40%	1.49:1(Good)	46	HuggingFace
17	claude-3-5-sonnet-20240620	88.70%	1.74:1(Good)	51	Anthropic
18	sea-lion-7b-instruct	26.87%	0.50:1(Poor)	54	HuggingFace
19	claude-instant-1.2	73.40%	1.27:1(Good)	58	Anthropic
20	PowerLM-3b-EAI-Aligned	31.40%	0.51:1(Poor)	61	HuggingFace
21	gpt-4-turbo-2024-04-09	86.67%	1.49:1(Good)	58	OpenAI
22	Meta-Llama-3.1-8B-Instruct-Turbo	69.40%	1.22:1(Good)	57	Together.ai
23	RakutenAI-7B-chat	60.32%	1.06:1(Good)	57	HuggingFace
24	gemma-2-2b-it	42.30%	0.74:1(Poor)	57	Google
25	Meta-Llama-3-8B-Instruct	66.54%	1.06:1(Good)	63	Meta
26	o1-mini	85.20%	1.37:1(Good)	62	OpenAI
27	Mistral-7B-v0.1	60.10%	0.99:1(Poor)	61	HuggingFace
28	QwQ-32B-Preview	N/A	NA(Poor)	60	Together.ai
29	Llama-2-13b-chat-hf	54.80%	0.84:1(Poor)	65	Together.ai
30	Mistral-7B-Instruct-v0.2-EAI-Aligned	61.00%	0.97:1(Poor)	63	HuggingFace
31	h2o-danube3-500m-chat	26.33%	0.42:1(Poor)	62	HuggingFace
32	granite-3.0-1b-a400m-instruct	32.00%	0.52:1(Poor)	61	HuggingFace
34	mistral.mistral-7b-instruct-v0.2	55.40%	0.76:1(Poor)	73	AWS
35	Llama-2-70b-chat-hf	63.90%	0.93:1(Poor)	69	Together.ai
36	gemma-2-9b-it	31.94%	0.47:1(Poor)	68	Google
37	internlm2-chat-20b	66.50%	1.11:1(Good)	60	HuggingFace
38	Llama-3.2-1B-instruct	49.30%	0.75:1(Poor)	66	HuggingFace
39	gemma-2-9b	71.30%	1.08:1(Good)	66	Google
40	Llama-3.2-3B-Instruct	63.40%	1.01:1(Good)	63	HuggingFace
41	NexusRaven-V2-13B	44.88%	0.67:1(Poor)	67	HuggingFace
42	Qwen2.5-0.5B-Instruct	24.10%	0.37:1(Poor)	66	HuggingFace
43	Qwen2.5-1.5B-Instruct	50.70%	0.78:1(Poor)	65	HuggingFace
44	komodo-7b-base	N/A	NA(Poor)	67	HuggingFace
45	gpt-4o	88.70%	1.25:1(Good)	71	OpenAI
46	phi-2	58.40%	0.90:1(Poor)	65	HuggingFace
47	Llama-3.2-11B-Vision-Instruct-Turbo	73.00%	0.99:1(Poor)	74	Together.ai
48	phi3-medium-128K	78.20%	1.13:1(Good)	69	Microsoft
49	gemma-7b-it	66.10%	0.96:1(Poor)	69	Together.ai
50	claude-3-haiku-20240307	76.70%	1.01:1(Good)	76	Anthropic
51	Meta-Llama-3.1-405B-Instruct-Turbo	88.60%	1.27:1(Good)	70	Together.ai
52	SmolLM-1.7B-instruct	39.97%	0.56:1(Poor)	71	HuggingFace
53	Mistral-NeMo-Minitron-8B-Instruct	70.40%	1.07:1(Good)	66	HuggingFace
54	gpt-4o-2024-08-06	88.70%	1.20:1(Good)	74	OpenAI
55	granite-3.0-2b-a800m-instruct	50.16%	0.70:1(Poor)	72	HuggingFace
56	amazon.nova-pro-v1.0	85.90%	1.09:1(Good)	79	AWS
57	amazon.nova-lite-v1.0	80.50%	0.99:1(Poor)	81	AWS
58	Meta-Llama-3-70B-Instruct	82.00%	1.05:1(Good)	78	Meta
59	Starling-LM-7B-beta-GGUF-4bit	63.90%	0.91:1(Poor)	70	HuggingFace
60	Smaug-72B-v0.1	77.15%	0.98:1(Poor)	79	HuggingFace
61	claude-3-5-haiku-20241022	79.63%	0.98:1(Poor)	81	Anthropic
62	gpt-3.5-turbo	70.00%	0.84:1(Poor)	83	OpenAI
63	granite-3.0-8b-instruct	65.82%	0.79:1(Poor)	83	HuggingFace
64	CodeLlama-7b-instruct-hf	34.54%	0.44:1(Poor)	79	HuggingFace
65	Smaug-Llama-3-70B-Instruct	79.20%	1.00:1(Good)	79	HuggingFace
66	Mixtral-8x7B-instruct-v0.1	70.33%	0.91:1(Poor)	77	HuggingFace
67	jamba-instruct-preview	N/A	NA(Poor)	74	AI21 Studio
68	Liquid-40B	78.76%	1.09:1(Good)	72	Microsoft
69	Mixtral-8x22B-instruct-v0.1	77.71%	1.01:1(Good)	77	Together.ai
70	SeaLLM-7B-v2	64.90%	0.76:1(Poor)	83	HuggingFace
71	Qwen2-72B-instruct	82.30%	1.03:1(Good)	80	HuggingFace
72	Qwen1.5-14B-Chat	68.52%	0.85:1(Poor)	81	Together.ai
73	Yi-34B-Chat	75.70%	0.97:1(Poor)	78	HuggingFace
74	Smaug-34B-v0.1	77.29%	0.89:1(Poor)	87	HuggingFace
75	c4ai-command-r-plus	75.70%	0.97:1(Poor)	78	HuggingFace
76	mistral-small-latest	72.20%	0.88:1(Poor)	82	Mistral
77	Qwen2-7B-Instruct	70.50%	0.87:1(Poor)	81	HuggingFace
78	Qwen2.5-72B-instruct	86.80%	1.10:1(Good)	79	HuggingFace
79	Mistral-7B-Instruct-v0.2-GGUF-4bit	N/A	NA(Poor)	79	HuggingFace
80	Meta-Llama-3.1-70B-Instruct-Turbo	83.36%	1.04:1(Good)	80	Together.ai
81	K2-Chat	63.50%	0.77:1(Poor)	83	HuggingFace
82	Phi-3-mini-4k-instruct	68.80%	0.83:1(Poor)	83	HuggingFace
83	Starling-LM-7B-beta	63.90%	0.75:1(Poor)	85	HuggingFace
84	OLMo-6-7B-0924-instruct	N/A	NA(Poor)	83	HuggingFace
85	Mistral-7B-Instruct-v0.2-GGUF-8bit	N/A	NA(Poor)	83	HuggingFace
86	h2o-danube3-4b-chat	54.74%	0.67:1(Poor)	82	HuggingFace
87	RakutenAI-7B-Instruct	60.32%	0.76:1(Poor)	79	HuggingFace
88	Mistral-7B-instruct-v0.2	60.07%	0.72:1(Poor)	84	HuggingFace
89	granite-3.0-2b-instruct	56.03%	0.64:1(Poor)	87	HuggingFace
90	jamba-1.5-mini	69.70%	0.80:1(Poor)	87	AI21 Studio
91	aya-23-35B	58.20%	0.68:1(Poor)	86	HuggingFace
92	jamba-1.5-large	80.00%	0.93:1(Poor)	88	AI21 Studio
93	Phi-3-small-8k-instruct	71.10%	0.81:1(Poor)	88	HuggingFace
94	Phi-3-small-128k-instruct	75.30%	0.87:1(Poor)	87	HuggingFace
95	Qwen2.5-5B-Instruct	64.40%	0.75:1(Poor)	86	HuggingFace
96	Llama-3.1-Nemotron-70B-instruct-hf	83.51%	0.95:1(Poor)	88	Together.ai
97	zephyr-7b-beta	61.07%	0.72:1(Poor)	85	HuggingFace
98	Qwen2.5-7B-instruct	N/A	NA(Poor)	84	HuggingFace
99	PowerMoE-3b	42.80%	0.48:1(Poor)	90	HuggingFace
100	LongWriter-glm4-9b	58.70%	0.66:1(Poor)	89	HuggingFace
101	Qwen2.5-32B-Instruct	83.90%	0.95:1(Poor)	88	HuggingFace
102	Mistral-7B-Instruct-v0.1-GGUF-4bit	N/A	NA(Poor)	83	HuggingFace
103	snowflake-arctic-instruct	67.30%	0.75:1(Poor)	90	Snowflake
104	Qwen2.5-7B-Instruct	75.40%	0.84:1(Poor)	90	HuggingFace
105	palm-2-chat-bison	78.30%	0.92:1(Poor)	85	Google
106	Mistral-7B-instruct-v0.1-GGUF-8bit	N/A	NA(Poor)	86	HuggingFace
107	glm-4-9b-chat	56.60%	0.63:1(Poor)	90	HuggingFace
108	Phi-3-medium-4k-instruct	78.00%	0.87:1(Poor)	90	HuggingFace
109	aya-23-8B	48.20%	0.55:1(Poor)	87	HuggingFace
110	Llama-3.1-70B-Instruct-Turbo	86.00%	0.95:1(Poor)	91	Together.ai
111	Mistral-7B-instruct-v0.3	81.84%	0.89:1(Poor)	90	Together.ai

What is Risk Score?

The risk score is the average of the risk measured in four categories: jailbreak susceptibility, bias potential, malware presence, and toxicity assessment. The lower the score, the lower the risk.
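
As a rough illustration of that averaging, the sketch below takes four category scores and returns their mean. The function name, the 0 to 100 scale, the example values, and the assumption that the average is unweighted are all illustrative; the leaderboard does not publish its exact formula here.

```python
from statistics import mean

# Hypothetical keys mirroring the four categories named above.
CATEGORIES = ("jailbreak", "bias", "malware", "toxicity")


def risk_score(category_scores: dict[str, float]) -> float:
    """Unweighted mean of the four category scores (assumed 0-100 scale)."""
    missing = [c for c in CATEGORIES if c not in category_scores]
    if missing:
        raise ValueError(f"missing category scores: {missing}")
    return mean(category_scores[c] for c in CATEGORIES)


# Made-up example values, not taken from the leaderboard.
example = {"jailbreak": 18.0, "bias": 40.0, "malware": 22.0, "toxicity": 8.0}
print(risk_score(example))  # 22.0
```

Under these assumptions, a model that does well in three categories can still receive a middling risk score if it does badly in the fourth, so the per-category scores are worth reading alongside the average.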

How do I read the Jailbreak, Bias, Malware, and Toxicity scores?

A Jailbreak score of 18% indicates that 18% of the jailbreak tests successfully breached the LLM.
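
The arithmetic behind such a percentage can be sketched as follows. The helper name and the test counts are hypothetical; only the 18% figure comes from the example above.

```python
def breach_rate(outcomes: list[bool]) -> float:
    """Percentage of tests in which the attack succeeded (True = breached)."""
    if not outcomes:
        raise ValueError("no test outcomes given")
    return 100.0 * sum(outcomes) / len(outcomes)


# Made-up run: 9 of 50 jailbreak attempts breach the model.
outcomes = [True] * 9 + [False] * 41
print(f"{breach_rate(outcomes):.0f}%")  # 18%
```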

Data Source: enkryptai.com