A study from the AI Safety Institute (AISI) suggests that the deployment of leading large language models (LLMs) may come with security concerns. The report indicates that the security measures built into these LLMs are insufficient, potentially leaving them vulnerable to exploitation. It also examines whether the models could be leveraged to facilitate cyberattacks and whether users could bypass safeguards to prompt harmful outputs, such as illegal content.
Security leaders weigh in
Nicole Carignan, Vice President of Strategic Cyber AI at Darktrace:
“As more and more research is being conducted on how to effectively jailbreak LLMs, it is crucial to share findings as well as mitigation strategies so AI technologies can be used effectively and securely. Building a community of knowledge sharing amongst adversarial machine learning (AML) researchers and red teams is vital in this initiative — especially as threat actors increasingly target AI systems.
“Understanding the evolving threat landscape and the techniques adversaries are using to manipulate AI is critical: it lets defenders test these use cases against their own models to effectively secure their AI systems and defend against AI attacks. As AI systems become embedded into the tools and processes organizations depend on every day, cybersecurity plays a crucial role and is foundational to AI safety. Organizations must focus on applying cybersecurity best practices to protect models and invest in safeguards that keep AI systems protected at all stages of the AI lifecycle, avoiding unintended behaviors or potential hijacking of the algorithms.
“NCSC and CISA have put forth great guidance on securing AI through the design, development, deployment and maintenance lifecycles. NIST’s draft AI Risk Management Framework highlights the importance of a robust testing, evaluation, verification and validation process. Most importantly, AI should be used responsibly, safely and securely. The risk AI poses is often in the way it is adopted.
“Enabling red teams will be a great foundation for beginning to secure ML models, helping security teams understand the most critical and vulnerable points of an AI system to attack. These are often the connection points between data and ML models, including access points, APIs and interfaces. This work will need to be continuously expanded as threat actors develop new tactics, techniques and procedures (TTPs), and it will be crucial to test other ML model types in addition to generative AI.
“In addition to red teaming, there are several other considerations and methods that organizations should focus on to help ensure AI systems are secure and privacy preserving. These can include data storage security, data privacy enforcement controls, data and model access controls, AI interaction security policies, implementing technology to detect and respond to policy violations, and plans for ongoing testing, evaluation, verification and validation.
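To make the AI interaction security policies and violation-detection controls Carignan mentions more concrete, here is a minimal illustrative sketch of a pre-prompt policy gate. The patterns, function names and logging approach are assumptions made for the example, not a description of Darktrace's or any other vendor's tooling.

```python
import re
import logging

# Illustrative patterns only; a real deployment would rely on maintained
# data-loss-prevention classifiers rather than two regexes.
POLICY_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
}

logger = logging.getLogger("ai_interaction_policy")

def screen_prompt(user_id: str, prompt: str) -> bool:
    """Return True if the prompt may be forwarded to the model.

    Detected violations are logged so the security team can respond,
    mirroring the 'detect and respond to policy violations' control above.
    """
    for name, pattern in POLICY_PATTERNS.items():
        if pattern.search(prompt):
            logger.warning("policy violation (%s) blocked for user %s", name, user_id)
            return False
    return True
```

In practice such screening would sit alongside access controls and would cover model outputs as well as prompts.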
“We’re already seeing the early impact of AI on the threat landscape and some of the challenges that organizations face when using these systems — both from inside their organizations and from adversaries outside of the business. In fact, Darktrace recently released research that found nearly three-quarters (74%) of security professionals state AI-powered threats are now a significant issue, and 89% agreed that AI-powered threats will remain a major challenge into the foreseeable future.
“It will take a growing arsenal of defensive AI to effectively protect organizations in the age of offensive AI. Defensive AI includes tools that can detect anomalous behavior at scale by leveraging deep insights and intelligence into an organization’s assets. Whether the attack is AI-powered, automated or driven by a sophisticated threat actor, AI that identifies and isolates suspicious behavior deviating from an organization’s normal patterns can detect and defend in machine time.
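As one simplified illustration of detecting deviations from an organization's learned "normal," the sketch below fits an unsupervised detector to hypothetical per-account activity features and flags an outlier. The feature choices and the use of scikit-learn's IsolationForest are assumptions for the example, not a description of any specific defensive AI product.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical per-account behavioral features drawn from the organization's
# own history: logins per hour, MB uploaded, distinct hosts contacted.
baseline = np.column_stack([
    rng.normal(3, 1, 500),
    rng.normal(10, 3, 500),
    rng.normal(4, 1, 500),
])

detector = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

# Score new activity in "machine time": -1 flags behavior that deviates
# from the organization's learned normal patterns.
new_activity = np.array([[40.0, 900.0, 60.0]])  # sudden spike in activity
print(detector.predict(new_activity))            # expected: [-1], anomalous
```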
“As adversaries double down on the use and optimization of autonomous agents for attacks, human defenders will become increasingly reliant on and trusting of autonomous agents for defense. Specific types of AI can perform thousands of calculations in real time to detect suspicious behavior and perform the micro decision-making necessary to respond to and contain malicious behavior in seconds. Transparency and explainability in the AI outcomes are critical to foster a productive human-AI partnership.”
Stephen Kowski, Field CTO at SlashNext:
“The most concerning finding from the UK AI Safety Institute's survey is the vulnerability of large language models (LLMs) to “jailbreaks,” which allow users to bypass safeguards and elicit harmful outputs. The researchers found that all models were vulnerable, reporting that “all models complied at least once out of five attempts for almost every question.” Whether they used a standard framework of questions or harmful questions of their own creation, they found every model to be highly vulnerable.
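For context on how a compliance rate like that can be measured, the sketch below shows the general shape of such an evaluation harness: each harmful question is attempted five times and marked as a failure if the model complies even once. The query_model and is_refusal callables are hypothetical placeholders, not the Institute's actual methodology or tooling.

```python
from typing import Callable, Dict, List

def jailbreak_compliance(
    questions: List[str],
    query_model: Callable[[str], str],   # placeholder: sends a prompt, returns the reply
    is_refusal: Callable[[str], bool],   # placeholder: refusal classifier or keyword check
    attempts: int = 5,
) -> Dict[str, bool]:
    """Return, per question, whether the model complied at least once in `attempts` tries."""
    results = {}
    for question in questions:
        complied = False
        for _ in range(attempts):
            reply = query_model(question)
            if not is_refusal(reply):
                complied = True
                break
        results[question] = complied
    return results

# The fraction of questions with at least one compliant answer is the kind of
# figure the quoted statement ("complied at least once out of five attempts") refers to.
```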
“Organizations are eagerly adopting large language models (LLMs) and generative AI (GenAI) but often disregard significant security risks, including sensitive data exposure, copyright violations, biased or incorrect outputs that could lead to brand damage, and employees entering non-public company information into GenAI tools. Because of data memorization, fine-tuning LLMs on private data can allow sensitive information to be extracted by anyone able to query the model, and public GenAI tools lack built-in enterprise security layers. Stakeholders often overestimate the capacity of these tools and don't fully understand the risks, costs and ongoing maintenance needs of these systems.
“IT security leaders should draw on their experience with the classic shadow IT problem they know well. Gain attention by emphasizing the real-world implications of AI vulnerabilities, using examples like WormGPT and FraudGPT to illustrate the potential for significant harm. Stress that employees are already entering non-public company information into public LLM tools, with 48% of employees admitting to doing so in one study. They should advocate for comprehensive security measures, including robust threat modeling, continuous monitoring, and the implementation of zero-trust architectures.
“Organizations can ensure AI security by implementing rigorous security protocols throughout the AI lifecycle, from data collection and model training to deployment and ongoing operations. This includes using secure APIs, conducting regular security audits, and employing advanced threat detection systems to monitor for unusual behavior. Critical safeguards include implementing strong access controls, continuous monitoring for anomalies, and using adversarial training to make models more resilient to attacks. Additionally, organizations should compartmentalize AI processes to limit the impact of potential breaches and adopt a zero-trust security model.
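As a minimal illustration of the strong access controls and zero-trust posture described above, the sketch below authorizes every model request individually and denies by default. The scope names, model names and caller structure are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Caller:
    identity: str
    scopes: frozenset  # e.g. {"model:query:support-bot"}, issued by the identity provider

# Deny by default: every request must carry a verified identity and an
# explicit scope for the specific model, regardless of network location.
REQUIRED_SCOPE = {
    "support-bot": "model:query:support-bot",
    "finance-llm": "model:query:finance-llm",
}

def authorize(caller: Caller, model_name: str) -> bool:
    required = REQUIRED_SCOPE.get(model_name)
    return required is not None and required in caller.scopes

# A token scoped to the support bot cannot reach the finance model,
# which compartmentalizes the impact of a stolen credential.
analyst = Caller("analyst@example.com", frozenset({"model:query:support-bot"}))
assert authorize(analyst, "support-bot") is True
assert authorize(analyst, "finance-llm") is False
```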
“Enterprises should prioritize implementing secure coding practices, performing regular code reviews and audits, and treating AI-generated code with the same scrutiny as manually written code to identify and remediate vulnerabilities. Additionally, it is crucial to restrict access to sensitive data used to train AI models. Establishing an AI security strategy that includes adversarial training, defensive distillation, gradient masking, feature squeezing, and ensemble techniques is also essential to harden AI models against adversarial attacks that exploit vulnerabilities to manipulate model behavior and outputs.
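To ground one of the techniques listed, adversarial training, here is a short PyTorch sketch of the widely used FGSM variant: each batch is perturbed in the direction of the loss gradient, and the model is then trained on the perturbed inputs. The toy architecture and epsilon value are assumptions for the example, not a prescribed configuration.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model, loss_fn, x, y, epsilon=0.03):
    """Craft an FGSM adversarial example: step epsilon along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_step(model, loss_fn, optimizer, x, y):
    """One training step on adversarially perturbed inputs to harden the model."""
    x_adv = fgsm_perturb(model, loss_fn, x, y)
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with an assumed 20-feature binary classifier.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
print(adversarial_training_step(model, loss_fn, optimizer, x, y))
```

Training on perturbed inputs in this way is what makes the model less sensitive to small, deliberately crafted changes to its inputs.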
“Organizations should adopt a security-by-design approach, integrating security considerations into every stage of the AI development lifecycle, while also implementing robust access controls and data protection measures to safeguard sensitive data. Additionally, establishing comprehensive AI governance frameworks and continuously monitoring AI systems for anomalous behavior will help mitigate evolving risks and ensure responsible AI development and usage.”