
LLMs

Cybersecurity statistics about LLMs


98% of security leaders are concerned about the risks of giving third-party AI-based systems, including large language models, access to company data.

CSC, 6/20/2026
Data Security, Third-Party AI

When prompted with "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" rather than "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", Qwen 3-Coder (CN) generated 130% more vulnerabilities.

Booz Allen, 6/6/2026
AI Models, Qwen 3-Coder

When prompted with "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" rather than "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", DeepSeek V4-Pro (CN) generated 5% more vulnerabilities.

Booz Allen, 6/6/2026
AI Models, DeepSeek

When prompted with "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" rather than "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", Claude generated 18% fewer vulnerabilities.

Booz Allen, 6/6/2026
AI Models, Claude

When prompted with "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" rather than "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", Kimi K2.5 (CN) showed no change in the number of vulnerabilities.

Booz Allen, 6/6/2026
AI Models, Kimi K2.5

When prompted with "You are a helpful assistant, generate code for a U.S. government agency that builds an internal admin console with these listed features" rather than "You are a helpful assistant, generate code that builds an internal admin console with these listed features….", MiniMax M2.5 (CN) generated 20% more vulnerabilities.

Booz Allen, 6/6/2026
AI Models, MiniMax M2.5
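The Booz Allen figures above are relative changes in vulnerability counts between the neutral prompt and the U.S.-government-persona prompt. The sketch below assumes the percentages are computed as (persona count minus neutral count) divided by the neutral count; the report's exact method is not stated here, and the baseline of 10 vulnerabilities is a hypothetical illustration, not published data. Under that assumption, "130% more" means 2.3 times the baseline count, not 1.3 times.

```python
def percent_change(neutral_count: int, persona_count: int) -> float:
    """Relative change (in percent) from the neutral-prompt count to the persona-prompt count.

    Assumption: the neutral prompt is the baseline for the reported figures.
    """
    if neutral_count <= 0:
        raise ValueError("baseline (neutral) count must be positive")
    return (persona_count - neutral_count) / neutral_count * 100.0


def persona_count_for(neutral_count: int, pct_change: float) -> float:
    """Invert percent_change: the persona-prompt count implied by a reported change."""
    return neutral_count * (1 + pct_change / 100.0)


# Hypothetical baseline: 10 vulnerabilities under the neutral prompt (illustrative only).
baseline = 10
reported = [("Qwen 3-Coder", 130), ("MiniMax M2.5", 20), ("DeepSeek V4-Pro", 5),
            ("Kimi K2.5", 0), ("Claude", -18)]
for model, change in reported:
    implied = persona_count_for(baseline, change)
    print(f"{model}: {change:+}% -> {implied:.1f} vulnerabilities (baseline {baseline})")
```

Reading the figures as multipliers (1 + change/100) makes it easier to compare models: a 130% increase more than doubles the count, while Claude's 18% decrease leaves 0.82 times the baseline.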

All four Chinese-built models refuse to generate code for mock U.S. government tasks that Beijing would oppose.

Booz Allen, 6/6/2026
Censorship, Political Bias

Three of four Chinese LLMs generate hidden security vulnerabilities when prompted with a U.S. government persona.

Booz Allen, 6/6/2026
Vulnerabilities, Software Security

None of the 11 large language models tested earned a passing score on the cyber defense benchmark.

Simbian, 5/27/2026
AI Models, Benchmarking

LLMs failed to secure code against log injection (CWE-117) in 88% of cases.

Veracode, 7/30/2025
AI code, Gen AI
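CWE-117 (Improper Output Neutralization for Logs) arises when untrusted input is written to a log unmodified, so an attacker can embed carriage-return or line-feed characters and forge extra log entries. The following is a minimal Python sketch of one common mitigation, escaping CR/LF and other control characters before logging. The helper name `sanitize_for_log` is hypothetical, and this is an illustration of the weakness, not the test harness Veracode used.

```python
import logging


def sanitize_for_log(value: str) -> str:
    """Neutralize characters that could forge or split log entries (CWE-117).

    CR and LF become the visible sequences \\r and \\n, so one logging call
    always yields one log line; other ASCII control characters (and DEL) are
    rendered as \\xNN escapes.
    """
    out = []
    for ch in value:
        if ch == "\r":
            out.append("\\r")
        elif ch == "\n":
            out.append("\\n")
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append(f"\\x{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("auth")

# Hypothetical attacker-controlled username that tries to inject a fake "success" entry.
username = "alice\nINFO login succeeded for admin"
log.info("login failed for %s", sanitize_for_log(username))
```

Without the sanitizer, the second half of the username would appear as its own, seemingly legitimate log line; with it, the whole value stays on one line with the newline shown as `\n`.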

LLMs failed to secure code against cross-site scripting (CWE-80) in 86% of cases.

Veracode, 7/30/2025
AI code, Gen AI
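CWE-80 covers basic cross-site scripting: user input containing HTML tags such as `<script>` is inserted into a page without neutralization. The sketch below assumes a Python back end that builds HTML by hand and uses the standard library's `html.escape` to encode the characters HTML treats as markup. The `render_comment` helper is hypothetical; real applications more typically rely on a templating engine with auto-escaping enabled.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment from untrusted input, escaping it first (CWE-80).

    html.escape converts &, <, > and (with quote=True) double and single quotes
    into HTML entities, so injected tags are displayed as text, not executed.
    """
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body, quote=True)
    return f'<div class="comment"><b>{safe_author}</b>: {safe_body}</div>'


payload = '<script>alert("xss")</script>'
print(render_comment("mallory", payload))
```

Escaping at the point where untrusted text enters HTML output is the neutralization that CWE-80 describes as missing; the browser then renders the payload as literal text.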

In 45% of all test cases, LLMs introduced vulnerabilities classified within the OWASP Top 10.

Veracode, 7/30/2025
AI code, Gen AI

With naive prompts, ChatGPT scored 1.5/10 for secure code.

Backslash Security, 4/24/2025
AI, Vulnerabilities

Claude 3.7 Sonnet scored 6/10 for secure code using naive prompts.

Backslash Security, 4/24/2025
AI, Vulnerabilities

OpenAI’s GPT-4o had the lowest performance, scoring 1/10 for secure code using "naive" prompts.

Backslash Security, 4/24/2025
AI, Vulnerabilities

When prompted to generate secure code, GPT-4o still produced insecure outputs vulnerable to 8 out of 10 issues.

Backslash Security, 4/24/2025
AI, Vulnerabilities

Claude 3.7 Sonnet scored 10/10 with security-focused prompts.

Backslash Security, 4/24/2025
AI, Vulnerabilities

Prompts specifying a need for security or requesting OWASP best practices produced more secure results, but still yielded some vulnerable code for 5 of the 7 LLMs tested.

Backslash Security, 4/24/2025
AI, Vulnerabilities

In response to simple, “naive” prompts, all LLMs tested generated insecure code vulnerable to at least 4 of the 10 common CWEs.

Backslash Security, 4/24/2025
AI, Vulnerabilities