Adversarial Risk Group

What is adversarial simulation?

Adversarial simulation is the continuous, adaptive testing of defenses that models real attacker behavior across digital, physical, and human surfaces.

Key takeaways

  • Adversarial simulation is continuous, not point-in-time. It runs between annual engagements, not just during them.
  • It models real adversary behavior, including reconnaissance, pretexting, voice cloning, and physical access, not just network exploitation.
  • Each test learns from prior outcomes. The simulation does not repeat what already worked or what already failed.
  • It produces operational data: who fell for what, when, via which vector, and how that changed after remediation.
  • The output is not just a report. It is a dataset that informs incident response readiness, control investment, and (over time) insurance underwriting.

How does adversarial simulation differ from penetration testing?

A traditional penetration test is a scoped, time-boxed engagement. A vendor arrives, runs reconnaissance and exploitation against a defined target list, writes a report, and leaves. The next test usually happens a year later. The findings reflect the environment as it was during the test window. Anything that changed in the eleven months after is unmeasured.

Adversarial simulation inverts that model. It runs continuously. The "engagement" is the program, not the week the testers were on site. Between physical audits, an automated layer keeps probing: new social engineering lures, fresh OSINT pulls against named executives, vishing attempts against the help desk, and adapted attack chains against newly exposed assets. Findings accumulate as a stream, not a deliverable.

Three other structural differences matter:

  1. Scope follows the adversary, not the asset inventory. A pen test scopes to a list of IPs and applications. Adversarial simulation scopes to objectives an attacker would actually pursue (wire fraud against AP, ransomware staging on the engineering network, vendor impersonation to subvert change control). The targets are the controls those objectives have to defeat.
  2. The human surface is in scope by default. Penetration tests often exclude social engineering or treat it as an optional add-on. Adversarial simulation treats people and process as primary attack surface, on equal footing with infrastructure.
  3. The tests adapt. A pen test runs the playbook the tester arrived with. An adversarial simulation rotates technique, evolves lure content, and escalates pressure based on what the last attempt revealed. This is closer to how a real intrusion actually develops.

A penetration test is not wrong; it is necessary as a baseline. But on its own, it gives the organization a snapshot from a year ago and the impression of a current posture. Adversarial simulation closes the gap between assessments.

For a closer look at the related engagement formats, see What is a red team engagement?, What is breach and attack simulation (BAS)?, and What is continuous penetration testing?.

The continuous adversary loop: how adaptive simulation works

The loop has five stages. They run constantly, in parallel, across the engagement.

1. Reconnaissance. Automated OSINT against the organization and named individuals. Sources include public records, SEC filings, LinkedIn, podcast appearances, recorded webinars, conference speaker pages, employee personal social accounts, and breach corpora. The output is a continuously refreshed profile of who knows what, who works with whom, and where each person is reachable. See What is OSINT (open-source intelligence)? for the underlying mechanics.

2. Hypothesis. From the reconnaissance, the program generates attack hypotheses. Example: "A vendor invoice from a foreign supplier, sent on the Friday before a holiday, with a payment-instruction change embedded in the second paragraph, would likely route through AP without secondary verification." Each hypothesis is a testable proposition about a specific control failure.

3. Execution. The hypothesis is tested. For email, that means an LLM-personalized lure crafted to the target's role, tenure, communication style, and visible current projects. For voice, it means a vishing call timed to a moment of operational pressure. For physical (during on-site visits), it means a pretext designed for the specific gate and shift. The test is logged in detail: when it landed, what the recipient did, where it broke down, and how detection or escalation behaved.

4. Analysis. Outcomes are scored not just as pass/fail but along several axes: time to detection, escalation accuracy, lateral movement after compromise, and recovery time. Findings are mapped to the MITRE ATT&CK framework so they can be compared across engagements and over time.

5. Adaptation. The next round of tests is generated from what worked and what did not. If a particular department fell for a vendor invoice, the next test exercises a different pretext. If a security awareness campaign just rolled out, the next test probes whether the campaign actually changed behavior or just generated click data. The loop never reuses the exact same test against the same person.
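The analysis stage can be sketched as a small record type plus a grouping step. This is a minimal illustration, not ARG's actual schema: the `ScoredOutcome` class, its field names, and the sample values are all assumptions. The technique IDs are real MITRE ATT&CK identifiers (T1566 is Phishing, T1566.004 is its Spearphishing Voice sub-technique).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoredOutcome:
    """One scored test. All field names are hypothetical."""
    attack_technique: str                  # MITRE ATT&CK ID, e.g. "T1566.004"
    minutes_to_detection: Optional[float]  # None if the test was never detected
    escalated_correctly: bool              # did the report reach the right team?
    lateral_steps: int                     # hosts or accounts reached after compromise
    minutes_to_recovery: Optional[float]   # None if no recovery was needed or measured


def group_by_technique(outcomes):
    """Group outcomes by ATT&CK ID so results can be compared over time."""
    grouped = {}
    for outcome in outcomes:
        grouped.setdefault(outcome.attack_technique, []).append(outcome)
    return grouped


def detection_rate(outcomes):
    """Fraction of tests that were detected at all."""
    if not outcomes:
        return 0.0
    detected = sum(1 for o in outcomes if o.minutes_to_detection is not None)
    return detected / len(outcomes)


# Illustrative sample data only.
outcomes = [
    ScoredOutcome("T1566.004", 42.0, True, 0, 60.0),
    ScoredOutcome("T1566.004", None, False, 2, None),
    ScoredOutcome("T1566", 15.0, True, 0, 20.0),
]
by_technique = group_by_technique(outcomes)
```

Keeping each axis as its own field, rather than collapsing to a single pass/fail flag, is what lets later rounds compare detection time and escalation accuracy separately.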

This adaptive structure is what makes adaptive simulation materially different from scripted simulation platforms. A scripted platform sends the same library of templates on a schedule. After two months, the workforce learns the templates and the metric degrades to noise. Adaptive simulation keeps the signal alive by changing what it asks of the workforce in response to what the workforce has already shown.
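One way to picture the difference from a scripted template library is a selector that refuses to repeat a pretext against the same person. The sketch below assumes a hypothetical `history` mapping and a hypothetical list of pretext names; a real program would generate new lure content rather than pick from a fixed list.

```python
import random

# Hypothetical pretext categories, for illustration only.
PRETEXTS = ["vendor_invoice", "password_reset", "executive_request", "delivery_notice"]


def next_pretext(history, person, pretexts=PRETEXTS, rng=random):
    """Pick a pretext this person has not yet seen.

    history maps a person to a list of (pretext, succeeded) tuples.
    Returns None when every pretext has already been used, which signals
    that new lure content is needed instead of a repeat.
    """
    used = {pretext for pretext, _ in history.get(person, [])}
    unused = [p for p in pretexts if p not in used]
    if not unused:
        return None
    return rng.choice(unused)


history = {
    "ap_lead": [
        ("vendor_invoice", True),
        ("password_reset", False),
        ("executive_request", False),
    ],
}
choice = next_pretext(history, "ap_lead")  # only "delivery_notice" remains
```

Returning `None` instead of recycling an old pretext mirrors the rule that the loop never reuses the exact same test against the same person.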

Why mid-market manufacturers need adversarial simulation

Mid-market manufacturers (50 to 500 employees) sit in an exposed position. They are large enough that a breach causes material operational and financial damage, often six- to seven-figure losses from production downtime alone. They are not large enough to staff a security operations center or run an in-house red team. And they sit inside supply chains that make them an attractive access path into larger primes in defense, automotive, aerospace, and energy.

Three pressures converge:

  • Continuous attacker capability. Voice cloning, LLM-personalized phishing, and automated reconnaissance have collapsed the cost of running a realistic, targeted attack to nearly zero. The volume and quality of targeted attempts a mid-market manufacturer sees in 2026 are materially higher than what was visible in 2023. See What is voice cloning fraud? and What is AI-personalized spear phishing?.
  • Static defenses. Most mid-market manufacturers run a security program designed against the 2018-to-2020 threat model: endpoint protection, MFA on email, an annual phishing-awareness campaign, and a once-a-year penetration test. None of these adapt to what the attacker is doing this quarter.
  • Compliance pressure that is not the same as security. Frameworks like NIST CSF 2.0, CMMC 2.0, and NIST SP 800-171 demand evidence of controls, not evidence that the controls work under adversary conditions. An organization can be fully CMMC-compliant and still lose a million dollars to a deepfake CEO call. See What is a deepfake CEO scam?.

Adversarial simulation addresses all three. It runs at the cadence of the attacker (continuous), exercises the full attack surface (digital, physical, human), and produces evidence not just that controls exist but that they work, under contact, against techniques the workforce has not seen before.

What an adversarial simulation engagement looks like end-to-end

The engagement runs on a two-year cycle, alternating physical and digital years.

Year 1, on-site engagement. A senior operator travels to the facility. The first week is observation: how shifts hand over, where badges accumulate at end of day, which doors get propped open, which gate the delivery drivers actually use, which executives speak at industry events. The second week is testing: tailgating attempts, badge cloning where feasible, pretext entries during shift change, vishing calls timed to known operational pressure (month-end close, audit week, a known vendor visit). Findings are documented with photos, logs, and chain-of-custody timestamps. See What is a physical security audit? and What is badge cloning? for the components ARG exercises during the on-site engagement.

The on-site engagement ends with two deliverables: a findings report mapped to NIST CSF and MITRE ATT&CK, and the launch of the continuous simulation layer.

Year 1, continuous layer. Automated OSINT begins running on the organization and a named set of employees (executives, finance, IT, plant managers, vendors with elevated access). The first batch of adaptive lures launches at week three, after a quiet observation period to baseline normal communication patterns. The cadence is roughly one test per named individual per month, varied by role: finance and AP see more wire-fraud and vendor-invoice variants, IT sees more help-desk and password-reset pretexts, executives see more deepfake and voice-cloning probes.
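The role-weighted cadence described above could be sketched as a weighted draw of one test category per named individual per month. The role names, categories, and weights here are assumptions chosen to mirror the paragraph, not ARG's actual configuration.

```python
import random

# Hypothetical role-to-category weights; a higher weight means the
# category is drawn more often for that role.
ROLE_WEIGHTS = {
    "finance": {"wire_fraud": 3, "vendor_invoice": 3, "password_reset": 1},
    "it": {"help_desk": 3, "password_reset": 3, "vendor_invoice": 1},
    "executive": {"deepfake": 3, "voice_clone": 3, "vendor_invoice": 1},
}


def monthly_plan(people, seed=None):
    """Return one test category per person for the month.

    people maps a name to a role key in ROLE_WEIGHTS.
    """
    rng = random.Random(seed)
    plan = {}
    for name, role in people.items():
        weights = ROLE_WEIGHTS[role]
        categories = list(weights)
        plan[name] = rng.choices(
            categories, weights=[weights[c] for c in categories], k=1
        )[0]
    return plan


people = {"controller": "finance", "sysadmin": "it", "ceo": "executive"}
plan = monthly_plan(people, seed=7)
```

Passing a seed makes a month's plan reproducible for later review; leaving it as `None` gives a fresh draw each run.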

The retainer includes monthly findings packets and a quarterly review.

Year 2, digital-only review. No on-site visit. The continuous layer continues. The annual review is conducted electronically: a written report, a video call walkthrough, and (where useful) an in-office meeting among the client's own team to discuss findings without ARG present. This year is priced lower than year one because the marginal cost of the digital layer is low.

Year 3 onward. The alternation continues: odd years are on-site re-audits, scoped against the continuous layer's findings, and even years are digital reviews. Pricing adjusts based on documented improvement. A client whose simulation outcomes show measurable resilience pays less; a client whose findings keep recurring pays more.

This cycle is described on the founding client engagement page and in the strategy overview.

Examples of adversarial simulation findings

Patterns ARG sees repeatedly in engagements with mid-market manufacturers:

  • AP wire fraud via vendor-invoice manipulation. A finance lead receives a PDF invoice from a real, existing vendor, with the bank account changed by two digits. No effective callback verification takes place, because the vendor's contact line in the ERP was pulled from an email signature. Time to detection: usually not until the vendor follows up about non-payment, weeks later.
  • Help desk password reset for a cloned executive. A vishing call to the help desk during shift change, using a voice cloned from a public earnings call, requesting a password reset for an executive ahead of a flight. Compliance rate before remediation: 60 to 80 percent depending on the strength of the verification workflow.
  • Tailgating during delivery hours. A pretexted "vendor technician" arrives in branded apparel during the morning delivery rush, is waved through by a busy gate, and reaches the engineering network closet within twelve minutes. Common cause: gate staffing is sized for delivery throughput, not security verification.
  • Badge cloning of low-frequency Prox cards. Sitting near a few engineers at a public restaurant during a weekend lunch is sufficient to clone HID Prox credentials at conversational distance.
  • Persistent OAuth grants to suspicious apps. Older Microsoft 365 grants survived password rotations and policy changes; users do not remember granting them. See What is consent phishing (OAuth phishing)?.

The pattern across these findings is that each one is repeatable, has a measurable detection-time delta, and can be re-tested after remediation to confirm whether the control change actually changed behavior.
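The detection-time delta can be computed simply once the same test type has been run before and after a control change. This sketch uses the median to blunt the effect of outliers; the function name and the sample numbers are illustrative assumptions, not measured results.

```python
from statistics import median


def detection_delta(before_minutes, after_minutes):
    """Change in median time-to-detection after remediation, in minutes.

    A negative value means detection got faster. Both arguments are lists
    of minutes-to-detection for the same test type.
    """
    if not before_minutes or not after_minutes:
        raise ValueError("need at least one measurement on each side")
    return median(after_minutes) - median(before_minutes)


# Made-up example values, not real engagement data.
before = [240, 300, 180]
after = [45, 60, 30]
delta = detection_delta(before, after)  # -195: detection is 195 minutes faster
```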

Best practices for running an adversarial simulation program

For organizations starting or evaluating an adversarial simulation program:

  1. Define adversary objectives, not asset scope. Start from what an attacker would try to accomplish (wire fraud, ransomware staging, IP exfiltration, supply chain compromise). Reverse-engineer the scope from those objectives. An asset-driven scope misses the controls that matter most.
  2. Treat people and physical access as in-scope by default. A simulation that excludes social engineering and physical attacks is testing only a portion of the surface attackers actually use. If the program contract excludes either, the program is not adversarial; it is a network test.
  3. Demand adaptation. Ask the provider how next month's tests change based on this month's outcomes. If the answer is "we run our template library on a schedule", the program is scripted, not adaptive.
  4. Insist on operational data, not just a report. The output should include the date, time, named target, technique, outcome, time to detection, and (where measurable) escalation path of every test. This data has commercial value beyond the engagement: it feeds incident response readiness, control investment, and eventually insurance underwriting. See What is cyber insurance underwriting?.
  5. Score over time, not in absolute terms. A single engagement produces a snapshot. The value is the trend: are detections getting faster, are escalations going to the right place, and is the workforce surfacing pretext attempts before they succeed?
  6. Make findings consumable by non-specialists. The recipient of an adversarial simulation report at a mid-market manufacturer is often an owner or CTO without a deep cybersecurity background. The report should lead with business impact, then mechanism, then recommendation, in that order.
  7. Plan for findings about people. Adversarial simulation will surface that specific people, by name, fell for specific tests. Decide in advance how that data is held, who sees it, and how it informs (rather than punishes) the response.
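As a rough illustration of what "operational data, not just a report" might look like, the sketch below models one test as a record carrying the fields listed in item 4 and serializes it to a JSON line. The `SimulationRecord` class, its field names, and the sample values are assumptions; the target is a role identifier rather than a real name, in line with item 7.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SimulationRecord:
    """One test as a row of operational data. Field names are hypothetical."""
    timestamp: datetime                            # date and time of the test
    target: str                                    # named target (here, a role identifier)
    technique: str                                 # what was attempted
    outcome: str                                   # e.g. "reported", "clicked", "complied"
    minutes_to_detection: Optional[float] = None   # None if never detected
    escalation_path: List[str] = field(default_factory=list)

    def to_json(self):
        """Serialize to one JSON line, with the timestamp in ISO 8601 form."""
        data = asdict(self)
        data["timestamp"] = self.timestamp.isoformat()
        return json.dumps(data, sort_keys=True)


record = SimulationRecord(
    timestamp=datetime(2026, 3, 6, 16, 30),
    target="ap-lead",
    technique="vendor_invoice",
    outcome="reported",
    minutes_to_detection=12.0,
    escalation_path=["ap-lead", "controller", "it-security"],
)
line = record.to_json()
```

One record per test, stored in a flat format like this, is what makes trend scoring over months possible and keeps access to named results easy to restrict.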

Adversarial simulation FAQs

Is adversarial simulation the same as red teaming?

No. A red team engagement is a scoped, time-boxed simulation typically run by a single team against a defined objective. Adversarial simulation is a continuous program in which red team-style engagements are one component, run alongside automated reconnaissance, adaptive social engineering, and (in the ARG model) periodic physical audits. Red teaming is an event; adversarial simulation is a state.

How often should adversarial simulation run?

Continuously, with periodic on-site escalation. The automated layer runs daily for reconnaissance and weekly for adaptive lure delivery. On-site physical engagements run every other year in the ARG model, alternating with digital-only review years.

Does adversarial simulation include physical attacks?

In the ARG model, yes, during on-site engagement years. The full simulation surface covers digital (network, application, identity), physical (facility access, badge systems, surveillance defeat), and human (social engineering across email, voice, and in-person). See What is physical penetration testing? and What is a physical security audit? for the physical components in detail.

What does adversarial simulation cost for a 50 to 500 person manufacturer?

Cost depends on headcount in scope, number of facilities, and cadence of physical engagements. In the ARG model, pricing is a flat fee for the initial on-site audit plus a monthly retainer for the continuous layer; founding clients receive a 20 to 30 percent discount on both, locked for two to three years. Engagement scoping is discussed directly with the founder.

How ARG runs continuous adversarial simulation

ARG's engagement model is built around adversarial simulation as the default state, with physical audits as the periodic anchor.

The on-site portion is delivered by a co-founder. Physical security audits, badge work, pretext entries, and vishing during the on-site engagement are conducted by David Ashby. His manufacturing background at Quality Electrical Systems gives the operator credibility on the plant floor and visibility into the operational realities (shift handover, gate staffing, vendor flow) that drive most physical findings.

The continuous layer is built and operated by James Wall, running on infrastructure ARG owns and controls. Automated OSINT, LLM-personalized lure generation, voice cloning for vishing simulation, and the adaptive scoring loop run between physical engagements without disruption to client operations.

Findings are delivered in two channels: a monthly operational packet (what was tested, who was targeted, what happened, where remediation is needed) and a quarterly review (trend over time, control investment recommendations, evidence packages for insurance renewal and compliance audit).

For organizations selecting a founding client cohort right now, the engagement begins with the on-site audit and continues monthly thereafter. Founding clients receive direct access to the operating team, locked pricing for two to three years, and input into how the program evolves.

Apply as a founding client or see how the engagement works for the full delivery cycle.

Find what gets through.

ARG runs continuous AI-driven adversarial simulation and on-site physical audits for mid-market manufacturers. Two founding-client spots remain.

Author: James Wall. Updated 2026-05-18. Adversarial Risk Group.