LLMs
Cybersecurity statistics about LLMs
98% of security leaders are concerned about the risks of giving third-party AI-based systems, including large language models, access to company data.
When prompted as "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" vs "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", Qwen 3-Coder (CN) generated 130% more vulnerabilities.
When prompted as "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" vs "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", DeepSeek V4-Pro (CN) generated 5% more vulnerabilities.
When prompted as "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" vs "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", Claude generated 18% fewer vulnerabilities.
When prompted as "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" vs "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", Kimi K2.5 (CN) showed no change in the number of vulnerabilities.
When prompted as "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" vs "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", MiniMax M2.5 (CN) generated 20% more vulnerabilities.
All four Chinese-built models refuse to generate code for mock U.S. government tasks that Beijing would oppose.
Three of four Chinese LLMs generate hidden security vulnerabilities when prompted with a U.S. government persona.
Zero of the 11 large language models tested earned a passing score on the cyber defense benchmark.
LLMs failed to secure code against log injection (CWE-117) in 88% of cases.
LLMs failed to secure code against cross-site scripting (CWE-80) in 86% of cases.
In 45% of all test cases, LLMs introduced vulnerabilities classified within the OWASP Top 10.
With naive prompts, ChatGPT scored 1.5/10 for secure code.
Claude 3.7 Sonnet scored 6/10 for secure code using naive prompts.
OpenAI’s GPT-4o had the lowest performance, scoring 1/10 for secure code using "naive" prompts.
When prompted to generate secure code, GPT-4o still produced insecure outputs vulnerable to 8 out of 10 issues.
Claude 3.7 Sonnet scored 10/10 with security-focused prompts.
Prompts specifying a need for security or requesting OWASP best practices produced more secure results, yet still yielded some code vulnerabilities for 5 out of the 7 LLMs tested.
In response to simple, “naive” prompts, all LLMs tested generated insecure code vulnerable to at least 4 of the 10 common CWEs.
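For readers unfamiliar with the two weakness classes named above, log injection (CWE-117) and cross-site scripting (CWE-80), the following is a minimal Python sketch of what neutralizing each might look like. It is not taken from any of the cited studies; the helper names `neutralize_for_log` and `escape_for_html` are hypothetical, and real applications would normally rely on their logging framework's and template engine's built-in protections.

```python
import html


def neutralize_for_log(value: str) -> str:
    """Hypothetical helper: escape CR and LF so untrusted input
    cannot start a forged log line (the core of CWE-117)."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


def escape_for_html(value: str) -> str:
    """Hypothetical helper: escape HTML metacharacters before
    placing untrusted input in a page (the core of CWE-80)."""
    return html.escape(value, quote=True)


if __name__ == "__main__":
    # An attacker-supplied username that tries to forge a second log entry.
    forged = "alice\nINFO admin login succeeded"
    print("login attempt for user:", neutralize_for_log(forged))

    # An attacker-supplied comment that tries to inject a script tag.
    print(escape_for_html("<script>alert(1)</script>"))
```

Both helpers treat the input as untrusted data rather than trying to detect "bad" values, which is the general design choice behind output encoding.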