
Game Theory for Human–Machine Teaming

Journal Edition
DOI: 10.61451/220106

Abstract

Human–machine teaming (HMT) is central to the modernisation plans of militaries worldwide. A key challenge in adopting machines for HMT in Army is ensuring that they are simultaneously capable of cooperative and collaborative interactions with teammates and commanders, and noncooperative and competitive interactions with adversaries. Most existing machine technologies and artificial intelligence (AI) systems are grounded in single-agent decision-making paradigms that are unsuitable for negotiating such mixed interactions. Game theory, of the ‘serious’ or mathematical kind, is well poised to offer a conceptual, mathematical and algorithmic basis for addressing this challenge, as it is explicitly concerned with decision-making among interacting agents. This article provides an introduction to game theory and its use to conceptualise a framework for HMT in Army. Such game-theoretic frameworks offer a principled foundation for the development of technologies, training and doctrine necessary to realise effective HMT in Army.

Introduction

Machines are increasingly filling important roles on the modern battlefield.[1] Uncrewed aerial vehicles (UAVs), for example, have been employed in various roles in conflicts over the last three decades, as in the Balkans, Afghanistan, Iraq, Syria and Ukraine. Their roles have expanded from intelligence, surveillance and reconnaissance, to delivering ordnance with autonomous object recognition, tracking and homing.[2] Likewise, uncrewed ground vehicles (UGVs), whose lineage can be traced back to the early years of World War I, have come of age in the ongoing 2022 Russia–Ukraine war. Russia and Ukraine now both deploy UGVs as frontline and supporting assets, in roles ranging from casualty evacuation and resupply, to mining and demining.[3]

Despite the growing importance of machines such as UAVs and UGVs to militaries, their significant commercial potential has led to much of their advancement being driven by industry and academia.[4] Many machines for generating asymmetric military advantage are therefore likely to first appear in industry and academia before being adopted for military use.[5] This adoption pathway raises challenges associated with ensuring that new machines (and their associated technologies) meet the needs and requirements of military users. For example, machines in the private sector may be initially conceived and developed as standalone individual systems lacking robust interfaces with humans and other machines, friendly or adversarial. Their effective military use will, however, involve their deployment as members—potentially with control functions—of teams of humans and machines, subject to command by humans, and in competition and conflict with adversaries.[6]

A key challenge in adopting machines for human–machine teaming (HMT) in Army lies in ensuring that they are capable of cooperative and collaborative interactions with teammates and commanders, and noncooperative and competitive interactions with adversaries.[7] This challenge spans technology procurement and development, training, and doctrine development, since it will involve ensuring that machines (and their teams) are capable of understanding and performing real-time immediate decision-making in a manner aligned with Australian Defence Force (ADF) doctrine.[8] Within existing doctrine, such a capability entails ensuring that machines are able to form real-time estimates of situational aspects that enable them to formulate, assess and select courses of action for themselves (and their teammates when exercising control) that achieve command intent in the presence of adversaries.[9]

Most existing artificial intelligence (AI) systems are unsuitable for equipping machines with the requisite immediate decision-making capabilities for HMT, since they are rooted in paradigms of decision-making for single agents operating in isolation without teammates, adversaries or commanders.[10] In single-agent paradigms, effects that are not direct consequences of a machine’s own actions are (implicitly) treated as artefacts of nature during decision-making, introducing significant vulnerabilities.[11] For example, a machine that attributes changes in an adversary’s position to chance will fail to recognise and counter changes in their course of action. Even recent celebrated AI systems that do solve multi-agent decision-making problems, like Chess, Go and Poker, remain grounded in simplified paradigms by assuming purely competitive interactions with adversaries and no (or highly structured) collaboration and communication with teammates.[12]

This article introduces game theory, of the ‘serious’ or mathematical kind, to conceptualise a framework for HMT in Army. The conceptualised framework:

  • explicitly accounts for interactions with teammates, adversaries and commanders
  • is aligned with ADF doctrine on immediate decision-making
  • has key steps aligned with recent fundamental technological developments, enabling early examination of the tractability of game-theoretic HMT.

Such a framework, once fully realised, will therefore offer a conceptual, mathematical, algorithmic and computational foundation for designing, analysing and simulating team dynamics, and developing the requisite machine technologies and human training for effective HMT in Army. While game-theoretic frameworks for HMT have begun to appear in the literature, none currently explicitly consider mappings to ADF doctrine or identify technologies for their implementation.[13]

The key argument of this article is that effective HMT in Army will require the development of new machine technologies and training, and potentially doctrine, that move beyond the single‑agent decision‑making paradigms that currently dominate AI. Game theory is well poised to offer a conceptual (as well as mathematical or algorithmic) basis for such developments due to its nature as the study of decision-making among interacting agents. In particular, it is currently possible to conceptualise a game-theoretic framework of HMT aligned with ADF doctrine on immediate decision-making that serves to highlight areas where significant future research, development and training are required.

This article proceeds by outlining motivations for the development of HMT for modern militaries and examining what is required for machines to transition from supervised tools to autonomous teammates. It then provides a brief introduction to game theory as a principled approach to studying decision-making under mixed cooperation and competition between agents (human or machine). Building on this foundation, the article conceptualises a game‑theoretic framework for HMT structured around ADF doctrine on immediate decision-making, detailing observation and orientation via estimation, and decision and action via the construction and solution of games. Finally, conclusions are provided outlining future research, development, assurance and training efforts that are required to fully realise game‑theoretic frameworks and implementations of HMT in practice.

Why Human–Machine Teaming?

HMT has emerged as a leading means of employing machines for military use because it seeks to leverage the strengths of both humans and machines.[14] The broad aim of HMT is to exploit Moravec’s paradox, which is the empirical observation that many tasks that are difficult for humans are easy for machines, and vice versa.[15] The potential benefits of successfully exploiting Moravec’s paradox to generate military advantage have recently been illustrated through the use of first-person view (FPV) drones in the 2022 Russia–Ukraine war.[16] At their core, FPV drones exploit current advantages in human intelligence, reasoning and guidance, and machine strengths of expendability, manoeuvrability, speed and increasingly autonomous target homing.

HMT is now part of plans to adopt robotic and autonomous systems for defence in the United Kingdom, the United States and Canada.[17] All three branches of the ADF—Army, Navy and Air Force—are seeking to embrace HMT.[18] For Army, HMT offers a means of generating mass and scalable effects in range, lethality, force protection and decision advantage, without equal growth of the human workforce.[19] Beyond the battlefield, HMT will generate advantage for the ADF through contributions to logistics, including warehousing, transportation, engineering and maintenance.[20]

Militaries that fail to implement effective HMT are predicted to be at a considerable disadvantage in future conflicts, including leaving them vulnerable to incurring significant (human) casualties.[21] Recent conflicts provide evidence in support of this prediction. For example, ISIS’s deployment of simple remote-controlled quadrotor drones equipped with grenades in the 2016–2017 Battle of Mosul is estimated to have increased the rate of attrition of Iraqi government forces by up to 23 per cent.[22] Similarly, the introduction of loitering munitions by Russian forces in the 2022 Russia–Ukraine war corresponded to a significant spike in Ukrainian casualties.[23] In both cases, the increased casualty rates associated with the introduction of new technologies and tactics persisted until the introduction of counter tactics and technologies.

The degree and duration of a technological advantage depends on the ability of adversaries to learn from, improve upon and implement innovations.[24] Effective HMT is therefore predicted to generate significant and prolonged asymmetric advantage because of the technical complexity of the requisite machines and the necessary training of human soldiers and commanders.[25] The advantage generated by creating and understanding the technologies and doctrine required for effective HMT is likely to extend to countering HMT. Indeed, several historical examples exist in which adoption of a new sophisticated technology also provided the basis for (and dominated) initial approaches to countering it. For example, the interleaved developments in tank and anti-tank warfare over the half century beginning in World War I culminated in the United States doctrine of the 1960s essentially stating that ‘the best anti-tank weapon is a tank’.[26] A more recent example is the emergence of interceptor drones as counters to large-scale offensive drones in the ongoing 2022 Russia–Ukraine war due to their cost, scalability and effectiveness being commensurate with the threat.[27]

What Is Required for Effective Human–Machine Teaming?

For Army, HMT involves transforming machines from being closely supervised ‘tools’ to being autonomous ‘teammates’ capable of coordinating and collaborating to achieve command intent.[28] It differs from past approaches to integrating humans and machines by sharing tasks that were previously only performed by humans with machines.[29] For example, machines such as artillery have a long history of use as ‘tools’ in Army. Humans have, however, always controlled them by collecting and processing information (i.e. forming estimates) and deciding courses of action. Shifting machines to ‘teammates’ will entail enabling them to collect and process information to form their own estimates, to decide courses of action and to act independently or as part of a team to achieve command intent.[30] It will also entail empowering certain machines to exercise control, subject to tasking by command (which remains a fundamentally human function).[31]

Control, being concerned with ‘coordinating forces towards outcomes determined by Command’, requires objective, empirical and timely situational understanding and an ability to formulate courses of action for teammates.[32] Thus, machines empowered to exercise control must be capable of forming real-time estimates of situational aspects relevant to both themselves and their team, and formulating courses of action for both themselves and their team.[33] These estimates correspond to (running) staff estimates within existing ADF doctrine and include tangible and intangible situational aspects such as the physical configurations (e.g. locations) of friendly forces and adversaries, and their intent, goals, capabilities, strengths, training, limitations, vulnerabilities, morale, leadership and situational awareness (e.g. higher-order or nested estimates of estimates maintained by other friendly and adversary teammates and commanders).[34] In using estimates to formulate, assess and select courses of action, machines must explicitly account for the vulnerabilities of—and vulnerabilities to—potential adversaries in order to be consistent with ADF doctrine.[35]

Since not all machines will exercise control, not all will strictly require estimates of all situational aspects. Similarly, not all machines will strictly need to be capable of formulating, assessing and selecting courses of action for their teammates. Instead, many machines may only need estimates of a subset of situational aspects and may only need to determine their own course of action. However, determining which machines need which estimates, and which machines are best orientated to determine which courses of action, poses a formidable challenge. At a minimum, it is likely that all machines would need to be capable of forming estimates of situational aspects that enable them to determine their own courses of action. In determining their own courses of action, they would also ideally align with, or exploit, the intent, goals, capabilities, strengths, training, limitations, vulnerabilities, morale, leadership and situational awareness of teammates, commanders and adversaries.[36] For example, while loitering munitions may strictly only require estimates of the location and appearance of adversaries to engage targets, their effectiveness is clearly improved if they have sufficient situational awareness and decision-making capabilities to prioritise agents that are most vulnerable to them.

Developing the situational-awareness and decision-making capabilities required by machines to realise effective HMT represents a considerable challenge with existing technologies.[37] The development of simpler situational-awareness capabilities, even without cooperative or competitive decision-making, has proved difficult. Indeed, several aircraft accidents have been directly attributed to autopilots forming erroneous estimates of the physical state (e.g. orientation) of aircraft.[38] Similarly, several high-profile accidents involving cars equipped with advanced perception and self-driving systems have resulted from perception systems missing important environmental cues and forming flawed estimates, leading to a loss in situational awareness and to the car’s control systems selecting inappropriate courses of action.[39]

There is therefore recognition in the open literature that new technologies are required for HMT.[40] However, recent development efforts remain narrowly focused on certain technologies for specific capabilities. For example, much recent effort has been dedicated to developing technologies for realising shared mental models (i.e. synchronising estimates of situational aspects) so that machines can accurately predict and anticipate the actions of teammates to pre-emptively aid them.[41] Similarly, considerable effort has been directed towards technologies for aligning the goals of teammates in order to avoid conflicting goals and associated reductions in team performance.[42] Technologies that enable machines to use their estimates to formulate, assess and decide on appropriate courses of action have received comparatively little attention. Most existing AI technologies are unsuitable because they are grounded in single-agent decision-making and cannot enable machines to decide on courses of action in an explainable and transparent manner.[43] They also do not enable machines to formulate courses of action to solicit information or resolve uncertainty, such as by moving to vantage points or communicating with teammates or commanders.[44]

Game theory offers a promising foundation for developing machine technologies (and training for human teammates and commanders) required for HMT—from realising shared mental models and goal alignment through to formulating courses of action in an explainable and transparent manner.[45] The promise of game theory lies in its providing frameworks (in the form of conceptual, mathematical or algorithmic models) that account for the individual goals, estimates and decision-making processes of all agents in a tactical or operational environment.[46] These frameworks provide a basis for developing game-theoretic technologies that will enable machines to formulate, assess and decide on their own courses of action while, where necessary, considering the impact of other agents, both friendly and adversarial.[47] The role and importance of game theory in comparison to single-agent decision-making paradigms is examined in the following section and facilitates subsequent discussion of game-theoretic frameworks of HMT.

A Brief Primer on Game Theory

Game theory is the study of decision-making between multiple interacting agents, each with potentially different goals or objectives, capabilities, limitations, intentions and situational awareness.[48] It differs from the better-known concept of optimisation by explicitly considering the impact of multiple (other) agents and their decisions on outcomes when determining the ‘best’ decision or action for an agent.[49] Optimisation, in contrast, involves selecting an agent’s ‘best’ decision or action as evaluated against their own goals or objectives without considering the impact of other agents and their decisions. Where an agent seeks to determine a sequence or course of action in which the order of actions is important, optimisation branches into optimal control (e.g. single-agent reinforcement learning) and game theory branches into dynamic (or differential) game theory (e.g. multi-agent reinforcement learning). Decision-making paradigms are summarised in Table 1.

Table 1: Summary of decision-making paradigms

|                                           | Single agent                                  | Multiple agents                                                               |
|-------------------------------------------|-----------------------------------------------|-------------------------------------------------------------------------------|
| Single decision or action                 | Optimisation                                  | Static game theory                                                            |
| Sequence of decisions or course of action | Optimal control (e.g. reinforcement learning) | Dynamic or differential game theory (e.g. multi-agent reinforcement learning) |

Optimisation and optimal control treat any effects that are not direct consequences of the agent’s own actions as artefacts of nature.[50] Such considerations are appropriate when solving problems with outcomes that are not coupled to the decisions or actions of other agents, such as ‘What is the shortest path to reach an objective?’. However, they are inappropriate when the decisions and actions of others impact what is ‘best’, such as determining ‘What is the safest path to reach an objective?’ when an adversary can render certain paths unsafe. Using optimisation or optimal control in situations with multiple agents constitutes committing the Robinson Crusoe fallacy, where the deliberate decisions, actions and effects of other agents are erroneously overlooked and attributed to the randomness of nature.[51]
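The contrast can be made concrete with a toy route-selection game (the scenario and all costs below are illustrative inventions, not drawn from doctrine or the literature): an agent chooses between two routes to an objective while an adversary chooses one route to ambush. Single-agent optimisation picks the shortest route and is punished for its predictability; the game-theoretic solution randomises between routes so that the adversary cannot profit from anticipating the choice.

```python
# Toy zero-sum route-selection game (all numbers hypothetical).
# The agent (row) picks route A or B; the adversary (column) ambushes A or B.
# Entries are costs to the agent, which it minimises and the adversary maximises.
COST = [
    [10, 2],   # agent takes route A: ambushed on A / A is clear
    [4, 10],   # agent takes route B: B is clear / ambushed on B
]

def naive_shortest_path_cost(cost):
    """Single-agent optimisation: pick the route with the lowest clear-road
    cost while ignoring the adversary. A best-responding adversary then
    ambushes exactly that route (the Robinson Crusoe fallacy in action)."""
    clear = [cost[0][1], cost[1][0]]       # clear-road costs of routes A and B
    route = 0 if clear[0] <= clear[1] else 1
    return cost[route][route]              # adversary ambushes the chosen route

def mixed_equilibrium(cost):
    """Mixed-strategy Nash equilibrium of a 2x2 zero-sum game via the
    indifference condition: the agent randomises so that the adversary
    gains nothing by ambushing either route."""
    (a, b), (c, d) = cost
    p = (d - c) / (a - b - c + d)          # probability of taking route A
    value = a * p + c * (1 - p)            # expected cost at equilibrium
    return p, value

p, value = mixed_equilibrium(COST)
print(naive_shortest_path_cost(COST))      # 10: predictable, and punished
print(round(p, 3), round(value, 3))        # 0.429 6.571: unpredictable, cheaper
```

The equilibrium cost (46/7 ≈ 6.57) is worse than the shortest route’s nominal cost of 2, but far better than the cost of 10 incurred by a predictable agent facing a responsive adversary.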

Despite its name, game theory is not solely concerned with trivialities, entertainment or wargaming—it may be used to create strategies for such situations, but it is more broadly the mathematical (or algorithmic) study of competition, cooperation and/or conflict between multiple interacting agents.[52] A game in the context of game theory is a situation in which multiple agents interact with the aim of achieving their own individual or team objectives (e.g. maximising rewards or minimising costs).[53] These individual or team objectives may conflict with those of other agents in a game, and in an extreme case may simply correspond to trying to ensure that other agents receive a worst-case outcome. In contrast to optimisation, the concept of ‘best’ or ‘worst’ in game theory is ambiguous, since cooperation and conflict between agents and their objectives introduce (implicit or explicit) trade-offs as to which individual or combined objectives can be optimised when selecting decisions and actions.[54]

A classic example illustrating the differences between game theory and optimisation is the prisoner’s dilemma game.[55] In the prisoner’s dilemma game, two prisoners are each given the choice to either remain silent or testify. If both prisoners choose to remain silent, they are subject to a light sentence (e.g. one year of prison). If one prisoner chooses to remain silent but the other chooses to testify, then the silent prisoner is subject to a heavy sentence (e.g. three years of prison) while the testifying prisoner is released immediately without any sentence. If both prisoners testify, they are both subject to a moderate sentence (e.g. two years of prison). The prisoners must make their choice simultaneously without cooperating (i.e. without communicating or knowing the choice of the other prisoner and without caring about the other’s sentence).

If the prisoners cooperate, their best option is to remain silent and receive only a light sentence, corresponding to a game-theoretic solution termed a Pareto optimal solution. More generally, a solution is Pareto optimal if no agent’s outcome can be improved without worsening another’s; when agents cooperate to optimise a shared objective, a game reduces to a (single-agent) optimisation problem whose solutions are Pareto optimal. However, a dilemma arises when the prisoners cannot enforce or be assured of cooperation. If one prisoner remains silent, the other prisoner has an incentive to testify, which means that the silent prisoner no longer has an incentive to remain silent. This ultimately leads to both prisoners concluding that testifying is preferable to remaining silent. Said another way, a prisoner who elects to testify never has an incentive to unilaterally change their mind to remain silent, while a prisoner who elects to remain silent always has an incentive to unilaterally change their mind to testify; thus, both prisoners choose to testify. This outcome is termed the Nash equilibrium, which is an equilibrium due to both prisoners having no incentive to unilaterally deviate from testifying.

Nash equilibria are considered the ‘best’ or ‘optimal’ solutions to games where agents act concurrently and do not explicitly cooperate or coordinate. As the prisoner’s dilemma illustrates, Nash equilibria need not correspond to Pareto optimal solutions. Agents often incur an additional cost (or inefficiency) under a Nash equilibrium as compared to a Pareto optimal solution. This inefficiency is quantified by the price of anarchy, formally the ratio of the cost incurred at the worst Nash equilibrium to the cost of an optimal solution. The prisoner’s dilemma therefore highlights the importance of aligning (or crafting) the goals and objectives of agents such that the resulting price of anarchy is small (or bounded) and teammates do not receive bad outcomes even when they cannot explicitly coordinate or collaborate. Such considerations are especially relevant for implementing effective HMT in disrupted, disconnected, intermittent and low-bandwidth environments. They are also important when determining how to integrate standalone machines and technologies procured from industry and academia to avoid creating teaming interactions that result in large prices of anarchy.
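The dilemma, and the inefficiency it induces, can be verified mechanically. The sketch below encodes the sentences from the example above and checks every choice profile for the Nash property (no profitable unilateral deviation) and Pareto optimality (no profile better for one prisoner without being worse for the other); the price of anarchy is computed here as the ratio of the total sentence at the worst Nash equilibrium to the best achievable total.

```python
# The prisoner's dilemma from the text, with sentences in years of prison
# (lower is better). Each entry maps (choice_1, choice_2) to (years_1, years_2).
SILENT, TESTIFY = "silent", "testify"
PAYOFF = {
    (SILENT, SILENT):   (1, 1),
    (SILENT, TESTIFY):  (3, 0),
    (TESTIFY, SILENT):  (0, 3),
    (TESTIFY, TESTIFY): (2, 2),
}

def is_nash(c1, c2):
    """A profile is a Nash equilibrium if neither prisoner can reduce their
    own sentence by unilaterally switching their choice."""
    for alt in (SILENT, TESTIFY):
        if PAYOFF[(alt, c2)][0] < PAYOFF[(c1, c2)][0]:
            return False
        if PAYOFF[(c1, alt)][1] < PAYOFF[(c1, c2)][1]:
            return False
    return True

def is_pareto_optimal(c1, c2):
    """A profile is Pareto optimal if no other profile makes one prisoner
    better off without making the other worse off."""
    y1, y2 = PAYOFF[(c1, c2)]
    for (a1, a2) in PAYOFF.values():
        if a1 <= y1 and a2 <= y2 and (a1, a2) != (y1, y2):
            return False
    return True

nash = [p for p in PAYOFF if is_nash(*p)]
pareto = [p for p in PAYOFF if is_pareto_optimal(*p)]

# Price of anarchy: total years at the worst Nash equilibrium divided by
# the minimum achievable total years (mutual silence).
worst_nash = max(sum(PAYOFF[p]) for p in nash)
best_total = min(sum(v) for v in PAYOFF.values())
print(nash)                      # [('testify', 'testify')]
print(worst_nash / best_total)   # 2.0
```

The sole Nash equilibrium (both testify) is not Pareto optimal, and the team collectively serves twice as many years as it would under the cooperative outcome.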

The prisoner’s dilemma is a static game in the sense that the agents (i.e. prisoners) only interact once. Where multiple interacting agents must select a sequence of actions and the order of actions is important (i.e. where the agents select courses of action), game theory branches into dynamic game theory. A dynamic game is therefore a situation in which multiple agents interact repeatedly (in time) with the aim of maximising their individual or team objectives and rewards, subject to constraints and limitations imposed by the (time) evolution of the states of the agents and the state of their environment.[56] It is in this dynamic form that game theory has perhaps had the most impact on machines for military affairs.[57]

Dynamic (or differential) game theory was originally developed by Rufus Isaacs in the 1950s to determine optimal guidance manoeuvres for missiles against aircraft that employ optimal evasive manoeuvres.[58] It was subsequently used to develop the US Navy’s doctrine on how a surface ship should manoeuvre to keep a submarine under surveillance for the maximum period of time while the submarine seeks to escape surveillance in the minimum period of time.[59] In these dynamic games, the agents (i.e. vehicles) select their courses of action with perfect knowledge of the (immutable) capabilities and limitations of all platforms, including their maximum speeds and turn rates and the physical aspects of their situation (e.g. their time-varying dynamic physical states, including their position and heading). The agents do not cooperate and have conflicting objectives, rendering the resulting ‘best’ courses of action (i.e. manoeuvres or trajectories) Nash equilibria. As in static games, a unilateral deviation from a course of action that is a Nash equilibrium in a dynamic game by any agent leads to their receiving a worse outcome.[60] For example, in the case of a submarine attempting to escape surveillance by a surface ship, if the submarine deviates from a Nash equilibrium by performing a different manoeuvre (or following a different trajectory), the surface ship will be able to keep it under surveillance for a longer period of time. Conversely, if the pursuing surface ship deviates from a Nash equilibrium, the evading submarine will be able to escape more quickly.
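A minimal numerical sketch of such a game (the speeds, positions and capture radius below are illustrative) is the classical simple-motion pursuit-evasion game, in which each agent moves at constant speed in any direction. Its saddle-point (Nash equilibrium) strategies are pure pursuit and pure evasion along the line of sight, and the equilibrium capture time is the initial separation divided by the speed difference:

```python
import math

def capture_time(p, e, vp, ve, dt=1e-3, capture_radius=1e-2):
    """Simulate the saddle-point strategies of a simple-motion pursuit-evasion
    game: the pursuer heads straight at the evader (pure pursuit) while the
    evader flees directly along the same line of sight (pure evasion).
    Assumes vp > ve so capture is guaranteed; returns the capture time."""
    t = 0.0
    while True:
        dx, dy = e[0] - p[0], e[1] - p[1]
        dist = math.hypot(dx, dy)
        if dist <= capture_radius:
            return t
        ux, uy = dx / dist, dy / dist                    # line-of-sight direction
        p = (p[0] + vp * ux * dt, p[1] + vp * uy * dt)   # pursue
        e = (e[0] + ve * ux * dt, e[1] + ve * uy * dt)   # evade
        t += dt

# Initial separation 10, pursuer speed 2, evader speed 1: the equilibrium
# capture time is 10 / (2 - 1) = 10 (up to the time step and capture radius).
t = capture_time(p=(0.0, 0.0), e=(10.0, 0.0), vp=2.0, ve=1.0)
print(round(t, 1))  # 10.0
```

Deviating from these strategies only hurts the deviator: an evader that does not flee along the line of sight is captured sooner, and a pursuer that does not head straight at the evader captures later, which is precisely the Nash equilibrium property of dynamic games described in the text.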

There are now myriad pursuit-evasion and surveillance-evasion dynamic games in the literature, with various types and numbers of agents.[61] For example, by modifying the platform dynamics of the agents in the surveillance-evasion dynamic game between a surface ship and submarine, a dynamic game more closely addressing surveillance of a ground vehicle by a quadrotor drone has recently been posed and solved.[62] Such dynamic games have proved particularly suited to developing (optimal and robust) machine technologies for single-mission autonomy. However, their broader use in the development of technologies for HMT has remained limited, especially in contrast to the widespread use of (single-agent) optimisation, optimal control and reinforcement learning.

Perhaps surprisingly, game theory has played only a minor role in the development of celebrated AI systems like OpenAI Five, AlphaStar and MuZero for playing board games and computer games such as Chess, Go, Dota 2 and StarCraft II.[63] Game theory has played a more significant role in the development of subsequent AI systems like Pluribus for poker, CICERO for Diplomacy, and those developed under a variety of Defense Advanced Research Projects Agency (DARPA) challenges and academic-industry challenges.[64] However, these ‘games’ are simplistic in that they evolve periodically with fixed rules, have a known number of players with little difference in possible actions and facilitate easy communication between teammates. The development of AI systems in such ‘games’ thus introduces the potential to commit the ludic fallacy in applying them to real-world tasks such as HMT, where simplifications may be so drastic that any derived insights are likely to be flawed or misleading.[65]

Furthermore, the interactions between agents in the ‘games’ solved by celebrated AI systems are purely competitive or cooperative. Situations in which agents must negotiate a mix of competing and cooperating agents (as in HMT) pose significant unsolved challenges for the existing state-of-the-art algorithmic approaches used to train these celebrated AI systems.[66] For example, relatively simple, though technical, counterexamples exist showing that the popular multi-agent reinforcement learning technique known as self-play fails to find optimal (Nash equilibrium) solutions in games that involve conflict and cooperation.[67] These failures lead to deficiencies in AI systems that can be exploited by adversaries.[68]

Despite the limitations of existing AI systems, a range of more sophisticated game-theoretic techniques and insights exists across applied mathematics, engineering and economics. Indeed, game theory fundamentally generalises beyond purely cooperative or purely competitive interactions to mixed commander–subordinate and cooperative–adversarial interactions.[69] This generality has opened significant opportunities to exploit game theory in the development of effective HMT.

Frameworks for HMT grounded in game theory have begun to appear in the open literature.[70] These frameworks take the form of conceptual (or mathematical, algorithmic or computational) models in which all humans and machines in a team are considered (i.e. modelled or abstracted) as agents with their own individual characteristics, including perception sensors or systems, estimates, goals, intent, capabilities, experiences and (cognitive or computational) decision-making and action processes. They are game-theoretic in the sense that agents are considered to form their estimates and make decisions based on what enables them to best achieve their individual goals, given what they know or observe about other agents. Implicit within these frameworks is that agents may elect to synchronise their data, estimates, goals, intent, decision-making and action if it is in their interest (or the interest of their team).

Game-theoretic frameworks for HMT serve two key purposes. Firstly, they provide a conceptual (and, with further development, mathematical, algorithmic and computational) foundation for designing and analysing team dynamics and behaviours, enabling ‘What if?’ questions of HMT to be posed and addressed via analysis, simulation and wargaming. Secondly, they provide a foundation for developing the requisite machine technologies for HMT by providing algorithmic and computational models with which machines can formulate, assess and select courses of action by predicting and anticipating the behaviours of teammates, adversaries and commanders based on estimates of their situational and individual aspects. Existing game-theoretic frameworks for HMT have been developed without joint consideration of adversaries and hierarchical (human) command, (imperfect) situational awareness, alignment with doctrine, and technological realisation. Nevertheless, consideration of these aspects, which are of clear importance to Army, appears feasible given the generality of the underlying game theory.[71]

Conceptualising a Game-Theoretic Framework for Human–Machine Teaming in Army

A game-theoretic framework for HMT with adversaries that seek to deceive and compete can be conceptualised within the observe–orient–decide–act (OODA) loop formulation of (human) decision-making that underlies the immediate decision-making process (IDMP) of existing ADF doctrine.[72] To conceptualise this framework, consider a set of agents—human or machine—that may be grouped into teams and may be friendly or adversarial. Each agent is considered to (repeatedly) observe (some of) the other agents and the environment, orient itself with respect to other agents and the environment, and then decide and implement courses of action. For machines, this process could be realised exactly in hardware and software through situational-awareness and decision-making technologies, forming a candidate blueprint for the development of machine technologies for HMT. For humans, this process may serve only as an abstraction or model, though it is plausibly aligned with doctrine and training through the IDMP.[73]

The steps of the OODA loop for each agent involve game-theoretic considerations and are conceptualised as follows.

Game-Theoretic Observation and Orientation

In observing and orientating itself, each agent is considered to repeatedly perform the following three key steps:

  1. Scope and frame their situation, including identifying relevant situational aspects.
  2. Form estimates of the identified relevant situational aspects.
  3. Form higher-order estimates of other agent estimates (and their estimates of estimates ad infinitum).

In scoping and framing their situation, each agent is considered to first identify other observed and (potentially) unobserved agents (teammates, friendly forces, adversaries and/or commanders). It is then considered to assign a state to all agents (including itself) that quantifies both their tangible and intangible properties and aspects such as their goals, intent, desired end states, strengths, capabilities, limitations, training, morale and leadership. The state of the j-th agent from the perspective of the i-th agent at time t is thus a mathematical object denoted by x_j^i(t), with x_i^i(t) being the state it assigns to itself. Similarly, each agent assigns a state to its environment (e.g. the location of inanimate objects), with the state of the environment from the perspective of the i-th agent being e^i(t). Agents may employ dynamic models to describe the time-evolution of states (e.g. computational and physics-based models of machines, or cognitive behavioural and decision-making models of humans).[74]
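As a minimal illustration of the scoping and framing step, the following Python sketch represents the states an agent might assign to itself, to other agents and to the environment. The class and field names are illustrative assumptions for this article, not part of any established HMT framework, and the particular tangible and intangible aspects chosen are only examples.

```python
from dataclasses import dataclass

# Hypothetical sketch of per-agent states: what agent i assigns to agent j,
# covering tangible and intangible aspects. All names are illustrative.

@dataclass
class AgentState:
    position: tuple        # tangible: location on the battlefield
    capability: float      # tangible: e.g. remaining ordnance or endurance
    goal: str              # intangible: estimated intent or desired end state
    morale: float = 1.0    # intangible: scalar proxy in [0, 1]

@dataclass
class WorldView:
    """Everything one agent tracks at a given time: states of all agents
    (including itself) and of the environment."""
    time: float
    agents: dict           # states assigned to each agent, keyed by identifier
    environment: dict      # e.g. locations of inanimate objects

view = WorldView(
    time=0.0,
    agents={
        "self": AgentState((0.0, 0.0), 1.0, "hold position"),
        "adversary-1": AgentState((4.0, 2.5), 0.6, "advance", morale=0.7),
    },
    environment={"obstacle": (2.0, 1.0)},
)
```

A dynamic model of state evolution would then be a function mapping such a `WorldView` at one time to a predicted `WorldView` at the next.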

Each agent is subsequently considered to maintain an estimate of the states of all agents and the environment, along with an estimate of how they may evolve in time, using observational data from their perception and/or communication systems. These estimates may be deterministic (i.e. sets of possible states) or probabilistic (i.e. probabilities assigned to different possible states). The estimate that the i-th agent maintains about the states of all agents and its environment at time t is thus either a set or a probability distribution, denoted by b^i(t), over the combined agent and environment states (x_1^i(t), …, x_N^i(t), e^i(t)), where N is the total number of agents (this estimate is also called the agent’s belief). For machines (or using Bayesian models of the brain), estimates of physical and tangible aspects could include those computed by state estimation and sensor fusion algorithms such as Bayesian filters and smoothers (e.g. Kalman or particle filters).[75] Estimates of more abstract aspects such as agent goals and intent could include those computed by bespoke algorithms such as those from inverse dynamic game theory concerned with computing agent objectives from observations of their decisions and actions.[76]
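As a concrete sketch of maintaining such a probabilistic estimate, the following implements a one-dimensional Kalman filter, the simplest of the Bayesian filters cited above, tracking another agent’s scalar position from noisy observations. The noise parameters and observation values are purely illustrative.

```python
# Minimal 1-D Kalman filter: one agent's probabilistic estimate of another
# agent's (scalar) position, held as a Gaussian mean and variance and
# updated with each noisy observation. All parameter values are illustrative.

def kalman_step(mean, var, z, q=0.1, r=0.5):
    """One predict-update cycle.
    mean, var : current Gaussian belief over the other agent's position
    z         : new noisy position observation
    q         : process-noise variance (how much the agent may have moved)
    r         : observation-noise variance (sensor quality)
    """
    # Predict: the other agent may have moved, so uncertainty grows.
    var_pred = var + q
    # Update: blend prediction and observation by their relative certainty.
    k = var_pred / (var_pred + r)      # Kalman gain
    mean_new = mean + k * (z - mean)
    var_new = (1.0 - k) * var_pred
    return mean_new, var_new

mean, var = 0.0, 10.0                  # diffuse initial belief
for z in [2.1, 1.9, 2.0, 2.2]:         # noisy observations near position 2.0
    mean, var = kalman_step(mean, var, z)

print(mean, var)  # mean converges towards 2.0 as the variance shrinks
```

The same predict-update structure generalises to particle filters for non-Gaussian estimates, and it is this recursive structure that the inverse filters discussed below mirror.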

To support game-theoretic decision and action, agents must also maintain higher-order estimates—that is, estimates of estimates, and estimates of estimates of estimates, and so on—to maintain an awareness of what other agents know, and what other agents know about what others know, and so on. These higher-order estimates are necessary, to some degree, to create opportunities for, and defend against, deception. For example, an agent may need to act more cautiously if it knows that an adversary knows that it is ignorant of hazards or obstacles in its environment, or that it has only partial knowledge of an adversary’s capabilities. Likewise, an agent may benefit from its teammates and commanders knowing that they do not accurately know the state of their environment. Recent inverse filtering algorithms have shown the tractability of computing estimates of estimates from observations of either the estimates themselves or the actions of agents that arise from such estimates.[77] These inverse filters are surprisingly similar to existing (forward) filters and Bayesian (state) estimation algorithms, suggesting a natural means of extending them to computing higher-order estimates.
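A minimal sketch of a second-order estimate, assuming a single binary situational aspect (‘hazard present?’): the agent holds its own first-order estimate, together with a distribution over the adversary’s possible first-order estimates. All probabilities are invented for illustration.

```python
# First- and second-order estimates over a binary situational aspect.
# A first-order estimate is a probability; a second-order estimate is a
# distribution over the OTHER agent's possible first-order estimates.

# Agent i's own (first-order) estimate that a hazard is present.
p_hazard = 0.8

# Agent i's second-order estimate of the adversary's belief: perhaps the
# adversary scouted the area (and so holds a confident estimate of 0.9),
# or did not (and holds only the prior of 0.5).
second_order = {
    0.9: 0.3,   # probability 0.3 the adversary scouted
    0.5: 0.7,   # probability 0.7 the adversary holds only the prior
}

# The adversary's expected estimate, from i's perspective.
expected_adversary_estimate = sum(b * p for b, p in second_order.items())

# If i is confident the hazard exists (0.8) but expects the adversary to be
# unsure (about 0.62), that asymmetry is exactly the kind of knowledge gap
# that creates opportunities for deception or surprise.
knowledge_gap = p_hazard - expected_adversary_estimate
print(expected_adversary_estimate, knowledge_gap)
```

Third- and higher-order estimates repeat the same construction (distributions over distributions), which is why truncating the hierarchy at a finite order, as discussed below, matters so much for tractability.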

There exists a large and growing literature on technologies and approaches that enable autonomous agents to model other agents. This literature provides insight into the practical implementation of the three key steps of game-theoretic observation and orientation, including how to select important states and how to compute estimates of them.[78] With the exception of higher-order estimates, the three key steps of game-theoretic observation and orientation are therefore likely to be tractable (at least in approximate forms) to implement on machines using modest onboard or edge computing architectures. The core challenge lies in computing higher-order estimates since, in theory, an infinite number of higher-order estimates are required (i.e. estimates of estimates ad infinitum). Recent studies have, however, suggested that higher-order estimates may play a diminishing role in decision-making (i.e. past a certain order, higher-order estimates may no longer impact the selection of actions). These studies raise the important practical possibility of only needing to compute a finite number of higher-order estimates without introducing any (significant) vulnerabilities.[79] As a result, game-theoretic observation and orientation currently appears likely to impose only modest additional requirements on the hardware and sophistication of machines.

Game-Theoretic Decision and Action

In deciding on and taking courses of action, each agent is considered to repeatedly perform the following three key steps:

  1. Construct a partial and/or incomplete information game using its estimates of other agents and the environment.
  2. Solve its constructed game to formulate and evaluate potential courses of action.
  3. Act to implement the course of action that best achieves its goals.

In using its estimates to construct a game, each agent is implicitly considered to decide its course of action using partial and/or incomplete information. Games with partial and/or incomplete information are well studied and, in principle, can be solved mathematically or algorithmically by machines (or trained humans). Solving such games (as with all games) involves identifying possible equilibria (Nash or otherwise) and then selecting the equilibrium that is ‘best’ in terms of the agent’s own goals and objectives, or under a secondary criterion such as Pareto optimality (or extensions of Pareto optimality computed only within teams or friendly forces, e.g. against command intent).
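The solution step can be sketched for the simplest case of a two-player game with finite action sets, where pure-strategy Nash equilibria can be found by brute-force enumeration and the agent then selects the equilibrium best for its own objective. The payoff matrix here is invented purely for illustration.

```python
import itertools

# Brute-force enumeration of pure-strategy Nash equilibria in a small
# two-player game, followed by selection of the equilibrium best for
# player 0. The payoff numbers are invented for illustration.

# payoffs[a0][a1] = (payoff to player 0, payoff to player 1)
payoffs = [
    [(3, 3), (0, 4)],
    [(4, 0), (1, 1)],
]

def pure_nash(payoffs):
    """Return all action profiles from which no player can profitably
    deviate unilaterally."""
    n0, n1 = len(payoffs), len(payoffs[0])
    equilibria = []
    for a0, a1 in itertools.product(range(n0), range(n1)):
        best_0 = all(payoffs[a0][a1][0] >= payoffs[d][a1][0] for d in range(n0))
        best_1 = all(payoffs[a0][a1][1] >= payoffs[a0][d][1] for d in range(n1))
        if best_0 and best_1:
            equilibria.append((a0, a1))
    return equilibria

eqs = pure_nash(payoffs)
# Select the equilibrium best for player 0's own objective.
best = max(eqs, key=lambda e: payoffs[e[0]][e[1]][0])
print(eqs, best)  # this matrix is a prisoner's dilemma: unique equilibrium (1, 1)
```

Realistic partial and incomplete information games are vastly larger and also require mixed strategies and belief-dependent payoffs, but the underlying logic of enumerate-then-select is the same.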

Selecting actions by solving games formulated with estimates, rather than with perfect knowledge of situational aspects, implicitly accounts for the potential of agents to manipulate both their own estimates, through communication or reconnaissance, and those of adversaries, through bluff and deception. Specifically, courses of action that solve partial and/or incomplete information games formed with potentially imperfect estimates are known to include actions whose sole purpose may be to resolve or increase uncertainty, rather than to change the actual underlying situation. This phenomenon is known as the dual-control effect in the language of stochastic optimal control theory, in recognition of the fact that actions can serve the dual purpose of both changing the underlying states of agents and the environment, and manipulating the uncertainty (or estimates) associated with them.[80]

The dual-control effect means that, by selecting courses of action that solve games formulated with estimates, agents will naturally arrive at courses of action that improve their own estimates, and degrade those of adversaries, to the degree necessary to achieve their objectives. For example, an agent’s optimal course of action may include actions such as moving to vantage points, actively looking for landmarks, or communicating with teammates or friendly forces. Conversely, an agent’s optimal course of action may include actions whose purpose is to deceive, mislead or remain hidden from an adversary (and vice versa).
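The dual-control effect can be illustrated with a toy decision problem, using invented payoffs and probabilities: an action whose only purpose is to resolve uncertainty (scouting from a vantage point) outperforms acting immediately on the current estimate, even though scouting changes nothing in the underlying situation.

```python
# Toy illustration of the dual-control effect. All numbers are invented.

p_clear = 0.5          # agent's current estimate that a route is clear
payoff = {             # payoff of each action in each underlying state
    ("advance", "clear"): 10, ("advance", "blocked"): -20,
    ("hold",    "clear"):  0, ("hold",    "blocked"):   0,
}
scout_cost = 1         # cost of moving to a vantage point first

def expected(action, p):
    """Expected payoff of an action under belief p that the route is clear."""
    return p * payoff[(action, "clear")] + (1 - p) * payoff[(action, "blocked")]

# Option A: act immediately on the current (uncertain) estimate.
act_now = max(expected(a, p_clear) for a in ("advance", "hold"))

# Option B: scout first (assumed to reveal the state perfectly), then take
# the best action in whichever state is revealed.
after_scout = (p_clear * max(payoff[(a, "clear")] for a in ("advance", "hold"))
               + (1 - p_clear) * max(payoff[(a, "blocked")] for a in ("advance", "hold"))
               - scout_cost)

print(act_now, after_scout)  # scouting first yields the higher expected payoff here
```

An agent solving the full sequential game would arrive at the scouting action automatically, without it ever being hand-coded as an ‘information-gathering’ behaviour; that is the dual-control effect in miniature.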

By selecting courses of action through the process of solving games, each agent implicitly gives consideration to all (known or perceived) risks and opportunities. This consideration extends to examining how its own courses of action may affect or support those of teammates, exploit vulnerabilities in those of adversaries and guard against potential courses of action employed by adversaries. This game-theoretic process of deciding courses of action resembles the mission analysis and course-of-action development considerations in the IDMP of existing ADF decision-making doctrine.[81] However, rather than limiting analysis to an adversary’s most likely and most dangerous courses of action, in principle this game-theoretic process entails the computation of all possible courses of action for adversaries, teammates and other friendly forces.

In solving games, it may be necessary (and defensible) to limit consideration to finite sets of actions and courses of action, including those most likely and most dangerous, for practical reasons of computational tractability, trust and assurance. Sets of actions and courses of action already permeate the literature on HMT, for individual agents as well as for whole teams, though not in the context of being potential solutions to games.[82] The usefulness of finite sets in reducing the computational complexity of solving games is clear, since they reduce the action-space to a finite dimension. They may similarly be useful for trust and assurance purposes, since a finite action-space renders agent behaviour more predictable.[83] For similar reasons, consideration may also be limited to courses of action that an agent with bounded rationality can conceptualise; for example, to those that are easy to evaluate or understand by agents with a constrained ability to consider higher-order estimates and consequences.
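Restricting the game to finite shortlists of courses of action (COAs) can be sketched as follows, with invented COAs and payoffs: the friendly agent evaluates its shortlist against the adversary’s shortlist and selects the security (minimax) COA, the one with the best guaranteed payoff against the adversary’s most damaging response.

```python
# Security (minimax) selection over finite COA shortlists.
# COA names and payoffs are invented for illustration.

friendly_coas = ["flank left", "frontal", "hold and observe"]
adversary_coas = ["defend", "counterattack"]

# payoff[f][a] = payoff to the friendly agent when it plays COA f
# and the adversary plays COA a
payoff = {
    "flank left":       {"defend": 6, "counterattack": -4},
    "frontal":          {"defend": 8, "counterattack": -9},
    "hold and observe": {"defend": 1, "counterattack":  2},
}

def security_coa(friendly, adversary, payoff):
    """COA maximising the worst-case payoff over the adversary's shortlist."""
    worst = {f: min(payoff[f][a] for a in adversary) for f in friendly}
    return max(friendly, key=lambda f: worst[f])

choice = security_coa(friendly_coas, adversary_coas, payoff)
print(choice)  # "hold and observe": guaranteed at least 1 against either response
```

The computation scales with the product of the shortlist sizes, which is exactly why finite sets keep game solution tractable, and why enumerable shortlists also make the agent’s behaviour auditable for assurance purposes.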

The potential for agents to perform control actions (or a human agent to act as a commander) is also implicitly encoded by selecting actions through the solution of games based on estimates. Specifically, a situational state that agents can maintain about other (friendly) agents is whether they are a commander or a controller. The role of each agent can then be considered in determining how it is treated in the solution of the game. For example, the estimated goals or intent of a commander or controller can be prioritised in the solution of an agent’s game. If an agent is particularly unsure of a commander’s or controller’s goals or intent, the dual-control effect means that its best course of action may inherently be to take actions to resolve this uncertainty, such as by communicating with, or moving to observe, its commander or controller. Similarly, if an agent is exercising control (or command), its best course of action may inherently be to directly communicate or broadcast estimates and actions to others.
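One simple way such prioritisation could enter an agent’s game is as a weighting of the commander’s estimated intent within the agent’s own objective. The weight, actions and values below are invented for illustration; this is a sketch of the idea, not a prescribed mechanism.

```python
# Weighting a commander's estimated intent into an agent's evaluation of
# candidate actions. All values are invented for illustration.

own_value = {"resupply": 5, "screen flank": 3}        # value to the agent itself
commander_value = {"resupply": 1, "screen flank": 6}  # estimated value to commander

w = 0.7   # priority given to estimated command intent (role-dependent)

def score(action):
    """Blend the agent's own objective with the commander's estimated intent."""
    return (1 - w) * own_value[action] + w * commander_value[action]

best = max(own_value, key=score)
print(best)  # command intent dominates, so the agent screens the flank
```

When the agent is highly uncertain about `commander_value`, the dual-control effect described above would favour actions (such as communicating with the commander) that first resolve that uncertainty.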

Summary, Variations and Extensions of the Framework

In summary, it is possible to conceptualise a game-theoretic framework of HMT that is aligned with the OODA-loop formulation of the IDMP in existing ADF doctrine. In this framework, each agent (human or machine) is considered to repeatedly perform game-theoretic observation and orientation, and game-theoretic decision and action.

Game-theoretic observation and orientation involves agents:

  1. scoping and framing their situation, including identifying relevant tangible and intangible situational aspects already present in the IDMP
  2. forming estimates of the identified relevant situational aspects, potentially through the use of Bayesian filters
  3. forming higher-order estimates of other agent estimates (and their estimates of estimates ad infinitum), potentially through the use of recent inverse filters.

Game-theoretic decision and action involves agents subsequently:

  1. constructing a partial and/or incomplete information game using their individual estimates of other agents and the environment
  2. solving their constructed game for all equilibria of interest (Nash or otherwise)
  3. acting to implement the course of action corresponding to the equilibrium that best achieves their goals (or command intent).

Importantly for Army, this framework offers a basis for the joint consideration of adversaries and hierarchical (human) command; (imperfect) situational awareness; alignment with doctrine; and technological realisation. Nevertheless, there remains the potential to expand and vary the scope of the game-theoretic framework. For example, it could be extended to include non-combatant or non-aligned agents. It could also be varied to use the act–sense–decide–adapt (ASDA) cycle or Cynefin abstractions of (human) decision-making instead of the OODA loop as a basis for ordering or prioritising the steps of game-theoretic observation and orientation, and game-theoretic decision and action.[84]

More generally, the conceptualised framework highlights that there remains a considerable amount of research and development required to realise effective HMT, particularly if the employed machines are to have high levels of autonomy. For example, new fundamental mathematical tools and algorithms are needed to construct agent estimates and solve (partial and/or incomplete information) games defined in terms of agent estimates. New machine technologies will also be required to realise scalable game-theoretic observation, orientation, decision and action—ranging from scoping and developing suitable sensors and sensor suites, to selecting appropriate computing architectures and algorithm implementations to balance tractability with game-theoretic performance (or vulnerability).

Conclusion

This article has argued that realising effective HMT in Army requires moving beyond the single‑agent decision-making paradigms and technologies that dominate AI. Instead, realising effective HMT will require embracing approaches that explicitly account for both cooperative interactions with teammates and commanders, and noncooperative or competitive interactions with adversaries. Game theory, being the study of interactions between multiple agents, friend or foe, is a strong candidate approach. Indeed, it is possible now to conceptualise game-theoretic HMT frameworks in which machines reason about intent, uncertainty and adversarial behaviour in a manner consistent with ADF doctrine on immediate decision‑making.

While frameworks for HMT grounded in game theory have appeared in the open literature, they either consider purely cooperative interactions between humans and machines, or do not explicitly consider game-theoretic issues associated with imperfect situational awareness, such as the need for estimates and higher-order estimates. Furthermore, the existing literature does not discuss how the key steps of game-theoretic HMT align with, or can be realised within, ADF doctrine. The conceptualised framework therefore provides a starting point for designing and implementing machine technologies and (human) training for HMT. With further (mathematical and technical) development, it will also provide a quantitative means of analysing and simulating HMT concepts. Significant future work is also required to examine the implications of game theory for the issues of trust, ethics, testing, evaluation, verification and validation in HMT.

For Army, game-theoretic HMT will need to be realised through a progressively staged process, given the relative immaturity of the available frameworks and technologies.

  • In the short term, Army should map and relate game-theoretic concepts and terminology to existing wargaming, experimentation and doctrine (with and without HMT) in order to explicitly evaluate and develop game theory as a decision-making paradigm for warfighting.
  • In the medium term, game‑theoretic concepts and thinking should be applied to develop frameworks, tactics and doctrine for narrowly scoped but repeatable tasks and missions involving HMT with existing machine technologies and soldier training. This will mirror the adoption pathway of differential game theory in the US Navy.
  • In the longer term, Army should seek to explicitly avoid the Robinson Crusoe fallacy by ensuring that new machine technologies, training and doctrine for HMT incorporate game‑theoretic considerations from their inception.

Endnotes

[1] PW Singer, Wired for War: The Robotics Revolution and Conflict in the 21st Century (Penguin, 2009); Mick Ryan, War Transformed: The Future of Twenty-First-Century Great Power Competition and Conflict (Naval Institute Press, 2022); Ash Rossiter and Peter Layton, Warfare in the Robotics Age (Lynne Rienner, 2024).

[2] Anthony King, ‘Robot Wars: Autonomous Drone Swarms and the Battlefield of the Future’, Journal of Strategic Studies 47, no. 2 (2024): 191–213, at: https://doi.org/10.1080/01402390.2024.2302585.

[3] Abbey Fenbert, ‘Ukraine Approves New “Murakha” Ground Robot for Combat Use’, The Kyiv Independent, 29 June 2025, at: https://kyivindependent.com/ukraine-approves-new-murakha-ground-robot-for-combat-use; Vikram Mittal, ‘Russia And Ukraine Turn to Ground Robots for Frontline Resupply’, Forbes, 6 June 2025, at: https://www.forbes.com/sites/vikrammittal/2025/06/06/russia-and-ukraine-turn-to-ground-robots-for-frontline-resupply.

[4] Rossiter and Layton, Warfare in the Robotics Age, p. 18.

[5] Mick Ryan, Human-Machine Teaming for Future Ground Forces (Center for Strategic and Budgetary Assessments, 2018), p. 10.

[6] Australian Army, Robotic & Autonomous Systems Strategy v2.0 (Commonwealth of Australia, 2022); Alex Neads, David J Galbreath and Theo Farrell, From Tools to Teammates: Human-Machine Teaming and the Future of Command and Control in the Australian Army (Australian Army Research Centre, 2021); Tate Nurkin and Julia Siegel, Battlefield Applications for Human-Machine Teaming: Demonstrating Value, Experimenting with New Capabilities and Accelerating Adoption (Atlantic Council, Scowcroft Center for Strategy and Security, 2023); Sidharth Kaushal et al., Leveraging Human–Machine Teaming (London: Royal United Services Institute for Defence and Security Studies, 2024).

[7] Australian Army, Robotic & Autonomous Systems, pp. 13–14.

[8] In this article, decision-making is considered the rapid process of determining short-term or immediate actions and/or courses of action, performed as an individual or controller, rather than during planning or as a commander.

[9] Werner Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part I: Fundamental Concepts’, ACM Transactions on Cyber-Physical Systems 8, no. 1 (2024); Klaus Bengler et al., ‘A References Architecture for Human Cyber Physical Systems, Part II: Fundamental Design Principles for Human-CPS Interaction’, ACM Transactions on Cyber-Physical Systems 8, no. 1 (2024); Werner Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part III: Semantic Foundations’, ACM Transactions on Cyber-Physical Systems 8, no. 1 (2024); Australian Defence Force, Decision-Making and Planning Processes (Commonwealth of Australia, 2025), pp. 25–33.

[10] National Academies of Sciences, Engineering, and Medicine, Human-AI Teaming: State-of-the-Art and Research Needs (The National Academies Press, 2022), at: https://doi.org/10.17226/26355; Robert W Andrews et al., ‘The Role of Shared Mental Models in Human-AI Teams: A Theoretical Review’, Theoretical Issues in Ergonomics Science 24, no. 2 (2023): 155–157.

[11] Zelai Xu et al., ‘Learning Global Nash Equilibrium in Team Competitive Games with Generalized Fictitious Cross-Play’, Journal of Machine Learning Research 26, no. 44 (2025): 1–30.

[12] Adam Lerer et al., ‘Improving Policies via Search in Cooperative Partially Observable Games’, Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 5 (2020): 7187–7194, at: https://doi.org/10.1609/aaai.v34i05.6208.

[13] Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part I’; Bengler et al., ‘A References Architecture for Human Cyber Physical Systems, Part II’; Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part III’; Takuma Adams, Andrew C Cullen and Tansu Alpcan, ‘Quantisation Effects in Adversarial Cyber-Physical Games’, in Jie Fu, Tomáš Kroupa and Yezekael Hayel (eds) Decision and Game Theory for Security, Lecture Notes in Computer Science 14167 (Cham: Springer, 2023), pp. 153–171, at: https://doi.org/10.1007/978-3-031-50670-3_8; Andrew C Cullen, Tansu Alpcan and Alexander C Kalloniatis, ‘Game-Theoretic Analysis of Adversarial Decision Making in a Complex Socio-Physical System’, Dynamic Games and Applications 15, no. 3 (2025): 709–728, at: https://doi.org/10.1007/s13235-024-00593-4; Genshe Chen et al., ‘Game Theoretic Approach to Threat Prediction and Situation Awareness’, Journal of Advances in Information Fusion 2, no. 1 (2007): 35–48.

[14] Ryan, Human-Machine Teaming for Future Ground Forces; Nurkin and Siegel, Battlefield Applications for Human-Machine Teaming; Jean-Marc Rickli, Federico Mantellassi and Quentin Ladetto, What, Why and When? A Review of the Key Issues in the Development and Deployment of Military Human-Machine Teams (Geneva Centre for Security Policy, 2024); Kaushal et al., Leveraging Human–Machine Teaming.

[15] Peter Layton, Algorithmic Warfare: Applying Artificial Intelligence to Warfighting (Canberra: RAAF Air Power Development Centre, 2018), p. 59.

[16] Kateryna Stepanenko, The Battlefield AI Revolution Is Not Here Yet: The Status of Current Russian and Ukrainian AI Drone Efforts (Institute for the Study of War, 2025), pp. 1–3, at: https://understandingwar.org/backgrounder/battlefield-ai-revolution-not-here-yet-status-current-russian-and-ukrainian-ai-drone.

[17] British Army Approach to Robotics and Autonomous Systems (UK Ministry of Defence, 2022), at: https://www.army.mod.uk/media/15790/20220126_army-approach-to-ras_final.pdf; Chris Gordon, ‘USAF Leaders See “Human-Machine Teams”—Not Robots—as Future of Airpower’, Air & Space Forces Magazine, 15 December 2024, at: https://www.airandspaceforces.com/usaf-leaders-see-human-machine-teams-not-robots-as-future-of-airpower; Madison Cameron et al., Human-Machine Teaming Research Roadmap for Future Autonomous System Integration, DND-1144.1.20-01 (Defence Research and Development Canada, 2024), at: https://cradpdf.drdc-rddc.gc.ca/PDFS/unc481/p818476_A1b.pdf.

[18] Australian Army, Robotic & Autonomous Systems, pp. 13–14; Royal Australian Navy, RAS-AI Strategy 2040: Warfare Innovation Navy (Sea Power Centre Australia, 2020), at: https://www.navy.gov.au/sites/default/files/2024-02/RASAI-Strategy-2040.pdf; Royal Australian Air Force, HACSTRAT—A Strategic Approach for Air and Space Capability (Commonwealth of Australia, 2021), at: https://www.airforce.gov.au/sites/default/files/2022-09/hacstrat_full_version_2021%5B1%5D.pdf.

[19] Ryan, Human-Machine Teaming for Future Ground Forces, pp. 11–15; Australian Army, Robotic & Autonomous Systems, pp. 13–14.

[20] Robin Smith, ‘Future Land Warfare Collection 2021: Joint Logistics Through Robotic and Autonomous Systems—Opportunities and Risks’, Land Power Forum, 29 July 2021.

[21] Kaushal et al., Leveraging Human–Machine Teaming, p. 7.

[22] Vikram Mittal, ‘Estimating Attrition Coefficients for the Lanchester Equations from Small-Unit Combat Models’, The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, ahead of print, 9 November 2023, at: https://doi.org/10.1177/15485129231210301.

[23] Vikram Mittal and James E Fenn, ‘Using Combat Simulations to Determine Tactical Responses to New Technologies on the Battlefield’, The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, ahead of print, 22 March 2024, at: https://doi.org/10.1177/15485129241239364.

[24] Rossiter and Layton, Warfare in the Robotics Age, pp. 16–17.

[25] Mittal, ‘Estimating Attrition Coefficients’, p. 9; Mittal and Fenn, ‘Using Combat Simulations to Determine Tactical Responses’, p. 3; Rossiter and Layton, Warfare in the Robotics Age, pp. 16–17.

[26] James E Shelton, ‘Let’s Bridge the Gap’, Armor, December 1964.

[27] Vikram Mittal, ‘Ukraine and Russia Race to Deploy Advanced Interceptor Drones’, Forbes, 9 September 2025, at: https://www.forbes.com/sites/vikrammittal/2025/09/09/ukraine-and-russia-race-to-deploy-advanced-interceptor-drones.

[28] Matthew Johnson and Alonso Vera, ‘No AI Is an Island: The Case for Teaming Intelligence’, AI Magazine 40, no. 1 (2019): 16–28; Joseph B Lyons et al., ‘Human–Autonomy Teaming: Definitions, Debates, and Directions’, Frontiers in Psychology 12 (2021): 589585; Neads, Galbreath and Farrell, From Tools to Teammates, pp. 1–3.

[29] Australian Defence Force, ADF Concept for Command and Control of the Future Force (Commonwealth of Australia, 2019); Neads, Galbreath and Farrell, From Tools to Teammates, pp. 2–3.

[30] ADF Concept for Command and Control, pp. 17–18; Neads, Galbreath and Farrell, From Tools to Teammates, pp. 19–35; Australian Defence Force, Command (Commonwealth of Australia, 2024).

[31] ADF Concept for Command and Control, pp. 17–18; ADF, Command, p. 59.

[32] ADF Concept for Command and Control, p. 18.

[33] ADF Concept for Command and Control, pp. 23–27; ADF, Command, p. 59.

[34] ADF, Decision-Making and Planning Processes, pp. 87–90.

[35] Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part I’; Bengler et al., ‘A References Architecture for Human Cyber Physical Systems, Part II’; Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part III’; ADF, Decision-Making and Planning Processes, pp. 25–33.

[36] Ibid.

[37] National Academies of Sciences, Engineering, and Medicine, Human-AI Teaming, pp. 31–32; Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part I’, pp. 2–5.

[38] Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part I’, pp. 4–5; Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part III’, p. 4.

[39] Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part I’, p. 4.

[40] James C Walliser et al., ‘Team Structure and Team Building Improve Human–Machine Teaming with Autonomous Agents’, Journal of Cognitive Engineering and Decision Making 13, no. 4 (2019): 258–278, at: https://doi.org/10.1177/1555343419867563; Lyons et al., ‘Human–Autonomy Teaming’, p. 11; Mengyao Li and John D Lee, ‘Modeling Goal Alignment in Human-AI Teaming: A Dynamic Game Theory Approach’, Proceedings of the Human Factors and Ergonomics Society Annual Meeting 66, no. 1 (2022): 1538–1542; Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part I’, p. 28.

[41] Andrews et al., ‘The Role of Shared Mental Models in Human-AI Teams’, p. 130.

[42] Li and Lee, ‘Modeling Goal Alignment in Human-AI Teaming’, p. 1538.

[43] National Academies of Sciences, Engineering, and Medicine, Human-AI Teaming, pp. 33–42.

[44] Lerer et al., ‘Improving Policies via Search in Cooperative Partially Observable Games’, pp. 7187–7188; National Academies of Sciences, Engineering, and Medicine, Human-AI Teaming, p. 18.

[45] Li and Lee, ‘Modeling Goal Alignment in Human-AI Teaming’, p. 1539; Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part III’, pp. 3–22.

[46] Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part III’, pp. 3–22.

[47] Ibid.

[48] Tamer Basar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, 2nd edition (Academic Press, 1999).

[49] Ibid., pp. 1–2; George Tsebelis, ‘The Abuse of Probability in Political Analysis: The Robinson Crusoe Fallacy’, American Political Science Review 83, no. 1 (1989): 77–91.

[50] Tsebelis, ‘The Abuse of Probability in Political Analysis’, p. 77.

[51] So called because Robinson Crusoe mistakenly believed he was living alone in nature on an island. See Tsebelis, ‘The Abuse of Probability in Political Analysis’, p. 77.

[52] Bundeswehr Doctrine Centre, Wargaming Handbook (Hamburg, 2024), p. 14.

[53] Basar and Olsder, Dynamic Noncooperative Game Theory, pp. 1–2.

[54] Ibid., p. 11.

[55] Ibid., p. 83.

[56] Ibid., pp. 1–2.

[57] Rufus Isaacs, Differential Games: Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimisation (Dover Publications, 1965).

[58] Rufus Isaacs, Differential Games I: Introduction (Santa Monica CA: RAND Corporation, 1954), at: https://www.rand.org/pubs/research_memoranda/RM1391.html.

[59] James G Taylor, Application of Differential Games to Problems of Naval Warfare: Surveillance-Evasion: Part I (Naval Postgraduate School, 1970), p. 6.

[60] Basar and Olsder, Dynamic Noncooperative Game Theory, p. 11.

[61] Ibid., pp. 423–469; Joseph Lewin, Differential Games: Theory and Methods for Solving Game Problems with Singular Surfaces (Springer Science & Business Media, 2012); Isaac E Weintraub, Meir Pachter and Eloy Garcia, ‘An Introduction to Pursuit-Evasion Differential Games’, 2020 American Control Conference (ACC) (IEEE, 2020), pp. 1049–1066; Sarjana Oradiambalam Sachidanandam et al., ‘Decision-Making Aid for Human-Machine Teaming in Multiplayer Pursuit-Evasion Games’, AIAA SciTech 2025 Forum (American Institute of Aeronautics and Astronautics, 2025), at: https://doi.org/10.2514/6.2025-2272.

[62] Philipp Braun, Timothy L Molloy and Iman Shames, ‘Prying Pedestrian Surveillance-Evasion: Minimum-Time Evasion from an Agile Pursuer’, Journal of Guidance, Control, and Dynamics 48, no. 9 (2025).

[63] Oriol Vinyals et al., ‘Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning’, Nature 575, no. 7782 (2019): 350–354; Christopher Berner et al., ‘Dota 2 with Large Scale Deep Reinforcement Learning’, arXiv 1912.06680 (2019); Julian Schrittwieser et al., ‘Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model’, Nature 588, no. 7839 (2020): 604–609.

[64] Noam Brown and Tuomas Sandholm, ‘Superhuman AI for Multiplayer Poker’, Science 365, no. 6456 (2019): 885–890; Meta Fundamental AI Research Diplomacy Team (FAIR) et al., ‘Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning’, Science 378, no. 6624 (2022): 1067–1074; Justin Drake et al., ‘Human-Powered AI Gym: Lessons Learned as the Test and Evaluation Team for the DARPA SHADE Program: Human-Powered AI Gym’, in Practice and Experience in Advanced Research Computing 2024: Human Powered Computing (Association for Computing Machinery, 2024), pp. 1–5; Nolan Bard et al., ‘The Hanabi Challenge: A New Frontier for AI Research’, Artificial Intelligence 280 (2020): 5; Michael Novitzky et al., ‘Aquaticus: Publicly Available Datasets from a Marine Human-Robot Teaming Testbed’, 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI) (IEEE, 2019), at: https://doi.org/10.1109/hri.2019.8673176; Philipp Braun et al., ‘Capture the Flag Games: Observations from the 2022 Aquaticus Competition’, IFAC-PapersOnLine 56, no. 2 (2023): 11363–11368, at: https://doi.org/10.1016/j.ifacol.2023.10.420.

[65] Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable (Random House and Penguin, 2007), p. 309.

[66] Lerer et al., ‘Improving Policies via Search in Cooperative Partially Observable Games’, pp. 7187–7188; Xu et al., ‘Learning Global Nash Equilibrium in Team Competitive Games’, pp. 1–4.

[67] Xu et al., ‘Learning Global Nash Equilibrium in Team Competitive Games’, pp. 8–9.

[68] Ibid., pp. 1–4.

[69] Basar and Olsder, Dynamic Noncooperative Game Theory, p. 11.

[70] Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part I’; Bengler et al., ‘A References Architecture for Human Cyber Physical Systems, Part II’; Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part III’; Thom Hawkins and Daniel Cassenti, ‘A Utilitarian Approach to the Structure of Human-AI Teams’, 28th International Command & Control Research and Technology Symposium (December 2023); Oradiambalam Sachidanandam et al., ‘Decision-Making Aid for Human-Machine Teaming in Multiplayer Pursuit-Evasion Games’; Steven Dennis, Fred Petry and Donald Sofge, ‘Game Theory Approaches for Autonomy’, Frontiers in Physics 10 (2022), at: https://doi.org/10.3389/fphy.2022.880706; Y Li et al., ‘Differential Game Theory for Versatile Physical Human–Robot Interaction’, Nature Machine Intelligence 1, no. 1 (2019): 36–43, at: https://doi.org/10.1038/s42256-018-0010-3.

[71] Basar and Olsder, Dynamic Noncooperative Game Theory, pp. 1–11.

[72] ADF, Decision-Making and Planning Processes, pp. 25–33.

[73] Ibid., pp. 25–33.

[74] Damm et al., ‘A Reference Architecture of Human Cyber-Physical Systems—Part III’, pp. 2–3; Li and Lee, ‘Modeling Goal Alignment in Human-AI Teaming’, pp. 1538–1539.

[75] S Särkkä and L Svensson, Bayesian Filtering and Smoothing (Cambridge University Press, 2023).

[76] Timothy L Molloy et al., Inverse Optimal Control and Inverse Noncooperative Dynamic Game Theory: A Minimum-Principle Approach (Springer Nature, 2022).

[77] V Krishnamurthy and M Rangaswamy, ‘How to Calibrate Your Adversary’s Capabilities? Inverse Filtering for Counter-Autonomous Systems’, IEEE Transactions on Signal Processing 67, no. 24 (2019): 6511–6525, at: https://doi.org/10.1109/TSP.2019.2956676; Himali Singh, Arpan Chattopadhyay and Kumar Vijay Mishra, ‘Inverse Extended Kalman Filter—Part II: Highly Nonlinear and Uncertain Systems’, IEEE Transactions on Signal Processing 71 (2023): 2952–2967, at: https://doi.org/10.1109/tsp.2023.3304756; Himali Singh, Arpan Chattopadhyay and Kumar Vijay Mishra, ‘Inverse Particle Filter’, IEEE Transactions on Signal Processing 73 (2025): 1922–1938, at: https://doi.org/10.1109/tsp.2025.3556702; Himali Singh, Kumar Vijay Mishra and Arpan Chattopadhyay, ‘Inverse Cubature and Quadrature Kalman Filters’, IEEE Transactions on Aerospace and Electronic Systems 60, no. 4 (2024): 5431–5444, at: https://doi.org/10.1109/taes.2024.3394453; Himali Singh, Kumar Vijay Mishra and Arpan Chattopadhyay, ‘Inverse Unscented Kalman Filter’, IEEE Transactions on Signal Processing 72 (2024): 2692–2709, at: https://doi.org/10.1109/tsp.2024.3396626; Himali Singh, Arpan Chattopadhyay and Kumar Vijay Mishra, ‘Inverse Extended Kalman Filter—Part I: Fundamentals’, IEEE Transactions on Signal Processing 71 (2023): 2936–2951, at: https://doi.org/10.1109/tsp.2023.3304761.

[78] Stefano V Albrecht and Peter Stone, ‘Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems’, Artificial Intelligence 258 (2018): 66–95, at: https://doi.org/10.1016/j.artint.2018.01.002.

[79] Yuxiang Guan, Iman Shames and Tyler H Summers, ‘Best Response Convergence for Zero-Sum Stochastic Dynamic Games with Partial and Asymmetric Information’, version 1, preprint, arXiv, 2025, at: https://doi.org/10.48550/ARXIV.2501.06181.

[80] Y Bar-Shalom and E Tse, ‘Dual Effect, Certainty Equivalence, and Separation in Stochastic Control’, IEEE Transactions on Automatic Control 19, no. 5 (1974): 494–500, at: https://doi.org/10.1109/tac.1974.1100635.

[81] ADF, Decision-Making and Planning Processes, pp. 29–31.

[82] Hussein A Abbass and Robert A Hunjet, ‘Smart Shepherding: Towards Transparent Artificial Intelligence Enabled Human-Swarm Teams’, in Hussein A Abbass and Robert A Hunjet (eds), Shepherding UxVs for Human-Swarm Teaming: An Artificial Intelligence Approach to Unmanned X Vehicles (Cham: Springer, 2020), pp. 12–14.

[83] Kate J Yaxley et al., ‘Life Learning of Smart Autonomous Systems for Meaningful Human-Autonomy Teaming’, in Holly AH Handley and Andreas Tolk (eds), A Framework of Human Systems Engineering: Applications and Case Studies (Hoboken NJ: Wiley, 2020), pp. 45–47.

[84] Justin Kelly and Mike Brennan, ‘OODA Versus ASDA: Metaphors at War’, Australian Army Journal 6, no. 3 (2009): 39–51; David J Snowden and Mary E Boone, ‘A Leader’s Framework for Decision Making’, Harvard Business Review 85, no. 11 (2007): 71.