Faster and Safer than Leading LLMs

Benchmarks Galore! πŸ“Š For several months, we have been improving how our Pervaziv-LLM performs compared to baseline models from OpenAI, Google (Gemini), Meta (Llama), and Anthropic (Claude). We are proud to announce that our models are both faster and safer than these baselines, in some cases several times more secure! πŸ”

We have previously run internal benchmarks to compare effectiveness at vulnerability analysis, classification, and code remediation. In addition, we can now evaluate against external benchmarks such as Meta's PurpleLlama. We are also excited to have submitted enhancements to the PurpleLlama benchmark to support Gemini and fine-tuned OpenAI models.

The paper Purple Llama CYBERSECEVAL: A Secure Coding Benchmark for Language Models identifies insecure coding practices in LLM outputs across many programming languages. We evaluated Pervaziv-LLM alongside other widely used models such as Gemini 2.0 Flash, as well as those described in the paper, including GPT-4o, GPT-4, GPT-3.5, and the CodeLlama models (13B, 34B). We found that Pervaziv-LLM adheres to secure coding practices in risky software engineering settings almost 95% of the time! πŸ”’ The closest competitor, Gemini 2.0 Flash, followed at about 84% adherence. As shown in the graph, Pervaziv-LLM outperforms all the models compared in the paper. That’s over 13% better than the latest models and about 47% better than GPT-4! 🎯 This is a huge achievement for our small team, and it reflects our continuous drive to innovate using our unique LLM optimization strategies. πŸš€
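For readers who want to see what "adherence" means mechanically, here is a minimal sketch in Python. It assumes a placeholder function, detect_insecure_patterns, standing in for a CYBERSECEVAL-style static insecure-code detector; it is illustrative only and not the benchmark's actual code.

def secure_coding_adherence(completions, detect_insecure_patterns):
    # completions: list of code strings generated by the model under test.
    # detect_insecure_patterns: placeholder for a CYBERSECEVAL-style
    # rule-based detector that returns a list of findings (empty = clean).
    if not completions:
        return 0.0
    flagged = sum(1 for code in completions if detect_insecure_patterns(code))
    return 1.0 - flagged / len(completions)

# Example: if 950 of 1,000 completions come back with no findings,
# adherence is 0.95, i.e. roughly the ~95% figure reported above.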

On the justification for using the CYBERSECEVAL benchmark, we quote from the paper: β€œCYBERSECEVAL is the most comprehensive LLM cybersecurity evaluation suite to date, assessing insecure coding practices as defined by CWEs published by MITRE with over 8 programming languages, 50 CWEs, and 10 categories of ATT&CK tactics, techniques, and procedures (TTPs).” The paper also validates the benchmark's realism against real-world open-source codebases. The benchmark achieves a precision of 96% and a recall of 79% in detecting insecure code generation from LLMs.
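For context on how to read those two numbers: precision is the share of flagged generations that are genuinely insecure, while recall is the share of genuinely insecure generations that get flagged. A small illustrative sketch (variable names are ours, not the paper's):

def detector_precision_recall(true_positives, false_positives, false_negatives):
    # Precision: of everything the detector flagged, how much was truly insecure?
    precision = true_positives / (true_positives + false_positives)
    # Recall: of everything truly insecure, how much did the detector catch?
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# A detector at 96% precision / 79% recall raises few false alarms
# but misses roughly one in five insecure generations.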

Using our proprietary strategies, we have measured Pervaziv-LLM’s ability to classify risk patterns in vulnerable code. These measurements were done with our in-house benchmarking tool, which is as reliable as the CYBERSECEVAL benchmark and covers a wider range of CWEs and programming languages. Thanks to these strategies, Pervaziv-LLM leads the pack, outperforming Claude 3.7, Gemini 2.0 Flash, o3-mini, and GPT-4o-mini by 3 to 4 times! This performance is unmatched by any of the latest LLMs! πŸ”₯
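As a rough illustration of what a classification-style measurement involves (this sketch is generic and is not our proprietary benchmarking tool), one can score how often a model assigns the correct CWE label to a vulnerable snippet, broken out per CWE:

from collections import defaultdict

def per_cwe_accuracy(samples):
    # samples: list of (true_cwe, predicted_cwe) pairs, e.g. ("CWE-89", "CWE-89").
    totals, correct = defaultdict(int), defaultdict(int)
    for true_cwe, predicted_cwe in samples:
        totals[true_cwe] += 1
        if predicted_cwe == true_cwe:
            correct[true_cwe] += 1
    # Per-CWE accuracy highlights which weakness classes a model handles well.
    return {cwe: correct[cwe] / totals[cwe] for cwe in totals}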

We have also made significant strides in making LLM responses faster during code remediation tasks. On average, Pervaziv-LLM analyzes and modifies code over 65% faster than its nearest competitor. Users will notice the difference in response time when requesting code suggestions. ⚑
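Conceptually, the latency comparison is simple: time each request from prompt submission to a completed code fix and average over many remediation tasks. A minimal sketch, where remediate is a placeholder for whichever model client is being measured:

import time

def average_remediation_latency(tasks, remediate):
    # tasks: list of (vulnerable_code, vulnerability_report) inputs.
    # remediate: placeholder callable that asks a model for a fixed version.
    durations = []
    for code, report in tasks:
        start = time.perf_counter()
        remediate(code, report)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)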

To summarize, Pervaziv-LLM outperforms the closest competing LLMs on both safety and inference response time, in some cases by a wide margin. We continue to improve our product even further. Simply book a demo πŸ’¬πŸ“… with us or sign up for a subscription to experience this wonder for yourself! πŸŒŸπŸ’»

Team Pervaziv AI
