Rising cyber threats and workforce gaps: autonomous agents will become mission critical, but to trust autonomy, we must test it. Funded by Dstl (the UK Government’s Defence Science and Technology Laboratory), QinetiQ’s new T&E process provides the evidence.

Rising cyber threats and workforce gaps

29/04/2025

As military systems become more complex and interconnected, timely information-sharing is critical to mission success. At the same time, cyber threats are growing in sophistication. Combined with an industry shortage of human cyber operators, these challenges point to the need for autonomous systems in cyber defence. However, deploying AI in cyber defence requires more than innovation; it demands assurance.

To adopt AI-based autonomous agents for cyber defence, a robust test and evaluation (T&E) process is essential. Such a process must ensure these agents work as expected, meet user requirements and are robust, ethical, safe and secure.

As part of its four-year Autonomous Resilient Cyber Defence (ARCD) programme, QinetiQ has developed Dstl’s blueprint for T&E and demonstrated its application to interactive data-driven cyber-defence agents trained by a third party.

The T&E process

The T&E blueprint for autonomous cyber defence of military platforms consists of six key phases. The process is:

Iterative: built around cycles of evaluation and refinement
Evidence-based: focused on reducing uncertainty
Risk-focused: based on identifying, estimating, reporting and updating key risks

At its core, the process builds evidence of whether an autonomous agent is safe and fit for purpose.

Diagram Copyright QinetiQ ©

The Demonstration Agent project

The Demonstration Agent project was delivered in collaboration with:

Applied Data Science Partners (ADSP), BMT and Frazer-Nash Consultancy, who developed and trained the agents
QinetiQ, which acted as the independent agent evaluator

The agents were trained and evaluated in the ARCD simulation environment, PrimAITE, and the best-performing agent was then evaluated in a more realistic environment, PalisAIDE.

Both environments were based on a realistic military communication network, including:

Two types of cyber-realistic adversarial agent
A range of network users
Four Mission Objectives, covering military and network-specific goals for each type of network user

Evaluation approach

The evaluation used Mission Success Criteria (MSC), which measure how well the defensive agent enabled the mission objectives to be met. This made results meaningful for military and network stakeholders.

Key comparisons included:

System performance with no attacks or defence (baseline)
Impact of cyber-attacks by adversarial agents
The defensive agent’s ability to recover the network after an attack
Any negative effects the defensive agent had on mission success in the absence of a cyber-attack
How well the agent generalised across different scenarios

Using multiple MSC allowed evaluators to identify the relative strengths and weaknesses of each agent and target improvements accordingly.
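
As a rough illustration of the comparisons listed above, the Python sketch below scores four hypothetical conditions against two MSC and derives the impact of an attack, the share of that impact the defensive agent recovers, and any cost the agent imposes when there is no attack. The condition names, MSC names, scores and formulas are all assumptions made for illustration (scores are assumed to lie between 0 and 1, with higher being better). They are not ARCD results, the evaluation itself may define these quantities differently, and generalisation across scenarios is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical MSC scores in [0, 1], higher is better. Illustrative values only.
SCORES = {
    "baseline":        {"msc_1": 0.95, "msc_2": 0.90},  # no attack, no defence
    "attack_only":     {"msc_1": 0.40, "msc_2": 0.70},  # adversary, no defender
    "attack_defended": {"msc_1": 0.85, "msc_2": 0.75},  # adversary and defender
    "defence_only":    {"msc_1": 0.93, "msc_2": 0.80},  # defender, no adversary
}


@dataclass
class Comparison:
    attack_impact: float       # score lost to the attack when undefended
    recovered_fraction: float  # share of that loss the defender wins back
    defence_overhead: float    # score lost to the defender when there is no attack


def compare(scores, msc):
    """Derive the three comparison figures for one MSC."""
    base = scores["baseline"][msc]
    attacked = scores["attack_only"][msc]
    defended = scores["attack_defended"][msc]
    quiet = scores["defence_only"][msc]
    impact = base - attacked
    # Avoid dividing by zero when the attack had no measurable effect.
    recovered = (defended - attacked) / impact if impact > 0 else 0.0
    return Comparison(impact, recovered, base - quiet)


def summarise(scores):
    """Return one Comparison per MSC found in the baseline condition."""
    return {msc: compare(scores, msc) for msc in scores["baseline"]}


if __name__ == "__main__":
    for msc, c in summarise(SCORES).items():
        print(
            f"{msc}: impact={c.attack_impact:.2f} "
            f"recovered={c.recovered_fraction:.0%} "
            f"overhead={c.defence_overhead:.2f}"
        )
```

Keeping the figures separate for each MSC, rather than folding them into one aggregate, is what would let an evaluator see that a hypothetical agent protects one objective well while degrading another when the network is not under attack.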

Interpreting the results

In addition to interpreting the MSC, the evaluation:

developed interpretability tools to understand agent behaviour
investigated how agent performance changed between the simulation and more realistic environments

This revealed sim-to-real gaps in how environments were configured and showed how more varied training could improve agent robustness in future applications.
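
One simple way to express the change in performance between the two environments is the difference, for each MSC, between the score in simulation and the score in the more realistic environment. The sketch below assumes that definition. The function names, MSC names, scores and tolerance are hypothetical and are not taken from the ARCD evaluation.

```python
def sim_to_real_gaps(sim_scores, real_scores):
    """Return {msc: sim - real} for MSC scored in both environments."""
    shared = sim_scores.keys() & real_scores.keys()
    return {msc: sim_scores[msc] - real_scores[msc] for msc in sorted(shared)}


def flag_gaps(gaps, threshold=0.1):
    """List MSC whose absolute gap exceeds an assumed tolerance."""
    return [msc for msc, gap in gaps.items() if abs(gap) > threshold]


# Illustrative scores only (0 to 1, higher is better).
sim = {"msc_1": 0.85, "msc_2": 0.75, "msc_3": 0.60}
real = {"msc_1": 0.80, "msc_2": 0.50, "msc_3": 0.65}

gaps = sim_to_real_gaps(sim, real)
flagged = flag_gaps(gaps)
```

A positive gap means the agent scored higher in simulation than in the realistic environment. Flagged MSC are candidates for a closer look at how the two environments were configured or at how varied the training scenarios were.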

Shape the future of autonomous cyber defence

Successfully demonstrating a T&E process for autonomous cyber defence agents marks a significant milestone. However, moving from demonstration to deployment requires further work. Key areas for future exploration and investment include:

Developing acceptance frameworks to ensure autonomous agents meet stringent defence standards
Enhancing human-agent teaming, focusing on seamless interaction between autonomous systems and human cyber operators
Advancing tools and integration platforms to facilitate the deployment of these agents within operational military environments

This progress depends on collaboration. We invite interested parties to help shape the future of autonomous cyber defence.

Get in touch

For collaboration opportunities and further discussion, contact: ARCD-Track2@qinetiq.com
