Explore key topics and content on security risks from powerful AI models while unleashing growth.

Key topics

  • As frontier AI models continue to develop dangerous capabilities, protecting them from theft and misuse is becoming a critical and neglected mission. Important developments include:

    • RAND authored a playbook for Securing AI Model Weights where they explain the need to secure model weights, define necessary security levels (including SL5 - defending against highly-resourced nation states), and map the current state of lab security, which they estimate at Security Level 2 - secured against individual professional hackers.

    • Situational Awareness argues for increased securitization of leading AI companies.

    Anthropic and Google have published detailed security frameworks outlining their model protection strategies and implementation plans.

  • Organizations are actively assessing AI models to understand potential security implications:

    • Google’s Project Zero identified a non-trivial zero day with their LLM assisted vulnerability researcher.

    • OpenAI's evaluation of their O1 model (codenamed "Strawberry") showed abilities to solve cyber challenges in unexpected ways and access the docker host (see section 4.2.1), but they may have evaluated an earlier version of the model, instead of the model they released.

    • Anthropic updated their Responsible Scaling Policy where they lay out the capability thresholds that trigger different levels of required safeguards.

    • METR (Model Evaluation and Threat Research) wrote up their thoughts on how AI agents might develop rogue populations, and have published some examples of their evals on github.

    Here are some takes on how AI models affect cybersecurity today, along with some examples: end-to-end ransomware, content alteration, and automated deepfakes.

  • The hardware side of ensuring AI is secure and beneficial to society:

    The Institute for Progress is publishing a series on how to create secure data centers for AI training and inference.

So what can you do to enhance the security of AI systems?

Blog