
US Launches Safety Testing Program for Advanced AI Models
The Trump administration is shifting its stance on artificial intelligence oversight, announcing a new collaborative testing and evaluation program for commercial AI systems from major tech firms.
What's Happening
The Center for AI Safety Initiatives (CASI) will conduct comprehensive safety evaluations of AI models from Google (via DeepMind), Microsoft, and xAI. The program departs from Trump's earlier deregulation-focused approach and acknowledges growing national security concerns around AI development.
According to the announcement, the program will cover "testing, collaborative research and best practice development related to commercial AI systems." CASI has already completed 40 evaluations, including assessments of "state-of-the-art models that remain unreleased."
The Significance
This move comes amid several converging pressures:
Military Adoption: Google's Gemini is already being deployed across US defense and military agencies, creating urgency around safety standards for systems already in use.
Unreleased Models: Companies such as Anthropic say they have developed AI systems (for example, its "Mythos" model) too powerful for public release, raising questions about who decides what's safe.
Industry Tension: Anthropic is currently locked in a lawsuit with the Department of Defense over Anthropic's refusal to remove safety guardrails for government use, a direct conflict over who controls AI safety decisions.
What the Companies Say
Microsoft acknowledged the need for collaborative testing, stating that "testing for national security and large-scale public safety risks necessarily must be a collaborative endeavour with governments." Google's DeepMind declined to comment, while xAI and SpaceX representatives did not respond to requests for comment.
This represents a pragmatic middle ground: government involvement without heavy-handed regulation, collaborative rather than adversarial, focused on evaluation rather than prohibition. Whether it's sufficient for the scale of AI deployment underway remains an open question.