What is a phishing simulation?
Phishing simulation is the practice of sending realistic phishing-style messages to employees to measure detection and to find gaps in technical and procedural controls.
Key takeaways
- Phishing simulation is adversarial testing, not HR training. The point is to measure what real attackers would accomplish, not to administer compliance content.
- Scripted, schedule-driven phishing platforms produce useful data for about two months. After that, the workforce learns the platform, not phishing.
- The metrics that matter are detection rate, escalation accuracy, time to surface, and trend per role, not aggregate click rate.
- Effective simulation rotates pretext, channel, lure family, and timing per round, against named individuals, with outcomes feeding the next round.
- The output should be operational evidence: what landed, on whom, when, via which vector, and what changed across consecutive rounds.
How does a phishing simulation actually work?
A phishing simulation, run as adversarial testing, has six elements.
1. Scope and authorization. The executive sponsor signs off on the engagement: in-scope targets, out-of-scope individuals (sometimes executives, depending on engagement design), allowed pretext families, prohibited content (anything that would cause psychological harm: fake medical results, terminations, family emergencies), and the response path if a simulation produces unintended consequences.
2. Reconnaissance. Public information collected on the organization and the in-scope individuals: roles, projects, vendors, communication patterns, visible events. The reconnaissance feeds pretext construction. See What is OSINT (open-source intelligence)?.
3. Pretext and lure generation. Pretexts written for each target, calibrated to current context. Lure infrastructure stood up: sender domains registered, mailboxes warmed, landing pages built, link tracking in place. Where AI-personalized lures are in use, language models generate per-target content from the reconnaissance profile. See What is AI-personalized spear phishing?.
4. Delivery. Messages sent at times calibrated to each target's normal activity. Infrastructure rotates between rounds to avoid being filtered as a known sender. Where the engagement is multi-channel, vishing, smishing, and OAuth-grant variants run alongside email.
5. Measurement. Outcomes logged per test: delivered, opened, clicked, credential submitted, OAuth grant approved, action taken, surfaced to IT, surfaced to manager, blocked by email security, missed by email security. Each outcome is tied to a specific control (or absence of one).
6. Reporting and adaptation. The next round is generated from the prior round's outcomes. Recently exposed pretexts are down-weighted per target. New pretext families are introduced. Difficulty rises for cohorts that surfaced the prior round and falls for cohorts that did not.
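The adaptation step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ARG's implementation: the family labels, the geometric recency penalty, the three-round window, and the 1-to-5 difficulty scale are all hypothetical choices made for the example.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

# Hypothetical pretext family labels; a real engagement would use its own scoped list.
PRETEXT_FAMILIES = ["vendor_invoice", "it_help_desk", "exec_impersonation", "hr_portal", "oauth_grant"]


@dataclass
class TargetHistory:
    # Rounds since each family was last used against this target (1 = last round).
    # Families never used against the target are simply absent.
    rounds_since: dict[str, int] = field(default_factory=dict)


def family_weights(history: TargetHistory, families: list[str],
                   penalty: float = 0.5, window: int = 3) -> dict[str, float]:
    """Down-weight families this target saw recently; full weight once outside the window."""
    weights = {}
    for fam in families:
        since = history.rounds_since.get(fam)
        if since is None or since >= window:
            weights[fam] = 1.0
        else:
            # With the default window of 3: since=1 -> penalty**2, since=2 -> penalty**1.
            weights[fam] = penalty ** (window - since)
    return weights


def choose_family(history: TargetHistory, families: list[str], rng: random.Random) -> str:
    """Draw next round's pretext family for one target, favoring families it has not seen lately."""
    weights = family_weights(history, families)
    return rng.choices(families, weights=[weights[f] for f in families], k=1)[0]


def next_difficulty(current: int, cohort_surface_rate: float,
                    threshold: float = 0.5, lowest: int = 1, highest: int = 5) -> int:
    """Raise difficulty for a cohort that surfaced last round; lower it for one that did not."""
    if cohort_surface_rate >= threshold:
        return min(current + 1, highest)
    return max(current - 1, lowest)


if __name__ == "__main__":
    history = TargetHistory({"vendor_invoice": 1, "hr_portal": 2})
    print(family_weights(history, PRETEXT_FAMILIES))
    # {'vendor_invoice': 0.25, 'it_help_desk': 1.0, 'exec_impersonation': 1.0, 'hr_portal': 0.5, 'oauth_grant': 1.0}
    print(choose_family(history, PRETEXT_FAMILIES, random.Random(7)))
```

Weighting rather than hard exclusion keeps a recently used family possible but unlikely, so targets cannot infer that a pretext they just saw is "safe" for the next few rounds.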
The simulation is not a discrete event; it is a continuous program. The reporting cadence is monthly operational and quarterly strategic.
Why scripted phishing simulations stop working after the second month
Platform-driven phishing simulations rely on a template library delivered on a schedule. The first month or two produces useful data because the templates are new to the workforce. After that, three failure modes set in:
- Template learning. The workforce recognizes the platform's templates and senders. Click rates collapse. The platform reports "improved performance"; the workforce has learned the platform, not phishing.
- Selection bias. Users who fall for templates are auto-enrolled in remedial training. The training is generic. The metric measures who is auto-enrollable, not who can detect a real targeted attack.
- Detection signal pollution. Email security tools and SOC analysts learn the platform's infrastructure and start auto-quarantining or filtering its messages. Tests fail to deliver; the program "succeeds" because it is filtered, not because the workforce improved.
The gap between scripted simulation performance and real-attack performance widens every month the scripted platform runs. A workforce reporting a 2 percent click rate against the platform routinely falls for AI-personalized spear phishing and voice-cloning vishing on the first attempt.
Adaptive simulation does not have these failure modes because the simulation does not run on a template library or a fixed schedule. See What is adaptive simulation?.
What metrics matter in a phishing simulation (and which are vanity)
The right metrics measure detection and response improvement. The wrong metrics measure platform performance.
Metrics that matter:
- Detection rate by department, role, and tenure. Who is surfacing pretext attempts, who is not, and how that is trending. Comparable across rounds.
- Time to surface. From delivery to first report to IT or manager. Should fall over time as workforce reflexes sharpen.
- Escalation accuracy. When a pretext is surfaced, does it route to the right place quickly and trigger the right downstream action?
- Workflow holds. When a pretext drives a high-impact action (wire, vendor change, MFA reset), do workflow controls stop it independently of the user's judgment?
- Repeat exposure. Of the targets who fell for a pretext last round, how many surfaced the next round. The cohort-level trend is the real measure of improvement.
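Three of these metrics can be computed directly from a per-test outcome log. The sketch below assumes a hypothetical `Outcome` record with one row per target per round; field names are illustrative, not taken from any particular platform. Repeat exposure is measured only over targets who were actually tested in the next round.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Outcome:
    target: str
    role: str
    round_no: int
    delivered_at: datetime
    surfaced_at: datetime | None  # first report to IT or a manager, if any
    fell_for: bool  # pretext landed: click, credential submission, grant approval, or action taken


def detection_rate_by_role(outcomes: list[Outcome], round_no: int) -> dict[str, float]:
    """Share of tested targets per role who surfaced the pretext in a given round."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # role -> [surfaced, tested]
    for o in outcomes:
        if o.round_no != round_no:
            continue
        counts[o.role][1] += 1
        if o.surfaced_at is not None:
            counts[o.role][0] += 1
    return {role: surfaced / tested for role, (surfaced, tested) in counts.items()}


def median_time_to_surface(outcomes: list[Outcome], round_no: int) -> timedelta | None:
    """Median delay from delivery to first report, over targets who reported at all."""
    deltas = [o.surfaced_at - o.delivered_at for o in outcomes
              if o.round_no == round_no and o.surfaced_at is not None]
    return median(deltas) if deltas else None


def repeat_exposure(outcomes: list[Outcome], prev_round: int, next_round: int) -> float | None:
    """Of targets who fell for last round's pretext and were tested again, the share who surfaced."""
    fell_prev = {o.target for o in outcomes if o.round_no == prev_round and o.fell_for}
    tested_next = {o.target for o in outcomes if o.round_no == next_round}
    surfaced_next = {o.target for o in outcomes
                     if o.round_no == next_round and o.surfaced_at is not None}
    cohort = fell_prev & tested_next
    if not cohort:
        return None
    return len(cohort & surfaced_next) / len(cohort)
```

Grouping by department or tenure instead of role only changes the key used in `detection_rate_by_role`.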
Metrics that are vanity (or worse):
- Aggregate click rate. Compresses everything that matters into a number that moves for the wrong reasons.
- Open rate. Reflects email client behavior (image preloading, link prefetching), not workforce decisions.
- "Risk score per user". Implies precision the data does not support and creates a punitive frame that damages trust.
A simulation program reporting only click rate is reporting on itself, not on the security posture.
Examples of effective phishing simulation campaigns
What good looks like in mid-market manufacturer engagements:
- Vendor-invoice pretext rotation across AP. Three rounds of progressively more contextually accurate vendor-invoice variants. By round four, the AP team is surfacing attempts within hours and the workflow change (callback verification to ERP-sourced vendor numbers) holds against the next pretext family. See What is business email compromise (BEC)?.
- OAuth consent grant pretext against engineering. A lure asking engineering managers to grant access to a "document review tool". The outcome showed that 11 of 18 managers would have granted high-scope access; remediation is OAuth policy tightening plus a detection for rogue OAuth grants mapped to MITRE ATT&CK. See What is consent phishing (OAuth phishing)?.
- HR-portal credential harvest. A pretext that the payroll provider has migrated and employees need to "re-enroll". Phishing-resistant MFA blocks the credential abuse, but the workforce surface rate still matters because the same pretext could arrive over SMS on personal devices outside MFA scope.
- Cross-channel pretext: email then voice. An email pretext lands; two days later the same target receives a vishing call referencing the email. Multi-channel pretexts produce lower detection rates than single-channel ones; measuring this gap is the point of the exercise.
- Targeted high-risk-role program. AP, payroll, IT help desk, and executive assistants run on a higher-cadence, higher-difficulty track than the general workforce. Detection metrics tracked per role; remediation prioritized to the workflow each role owns.
The pattern: each campaign is designed to surface a specific workflow gap or detection gap, with remediation that produces measurable improvement on the next round.
How to scope a phishing simulation engagement
For an organization scoping a program:
- Decide on the test population. Whole organization, or a tiered approach with high-risk roles on a higher cadence. For mid-market manufacturers, tiered is usually better.
- List pretext families in scope. Vendor invoice, IT help desk, executive impersonation, HR portal, OAuth grant, M&A or legal, smishing. Out-of-scope pretexts (anything causing psychological harm) named explicitly.
- Set channel scope. Email, voice, SMS, OAuth, in-person (during on-site engagements). Multi-channel produces more realistic data; single-channel is faster to stand up.
- Define authorization and confidentiality. Who knows the program is running, who does not. Standard pattern: executive sponsor, head of IT, head of HR for legal review, no one else. The IT lead does not know specific timing or pretexts.
- Define the response when a pretext succeeds. No punitive consequence for individuals; remediation focuses on workflow and detection. The commitment is in writing, signed by the executive sponsor.
- Decide reporting cadence. Monthly operational, quarterly strategic. Annual rollup for board and insurance renewal.
- Integrate with insurance renewal. Discuss with the cyber broker which evidence underwriters reward; structure reporting so the renewal package is a byproduct, not extra work. See What is cyber insurance underwriting?.
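The scoping decisions above can be captured as a single record and checked before any round goes out. This is a hypothetical sketch: the `EngagementScope` fields, the prohibited-family labels, and the specific checks are illustrative assumptions, not a prescribed schema.

```python
from __future__ import annotations

from dataclasses import dataclass

# Off-limits regardless of effectiveness (illustrative labels).
PROHIBITED_FAMILIES = frozenset({"medical_result", "termination", "family_emergency"})


@dataclass
class EngagementScope:
    sponsor: str
    in_scope_targets: set[str]
    out_of_scope_targets: set[str]
    pretext_families: set[str]
    channels: set[str]
    informed_parties: set[str]  # who knows the program exists (not its timing)
    no_punitive_commitment_signed: bool = False

    def validate(self) -> list[str]:
        """Return a list of scoping problems; an empty list means the scope is usable."""
        problems = []
        banned = self.pretext_families & PROHIBITED_FAMILIES
        if banned:
            problems.append(f"prohibited pretext families in scope: {sorted(banned)}")
        overlap = self.in_scope_targets & self.out_of_scope_targets
        if overlap:
            problems.append(f"targets listed as both in and out of scope: {sorted(overlap)}")
        if not self.no_punitive_commitment_signed:
            problems.append("no-punitive-consequence commitment not signed by sponsor")
        if self.sponsor not in self.informed_parties:
            problems.append("executive sponsor missing from informed parties")
        return problems

    def targets(self) -> set[str]:
        """Named individuals who may actually receive a pretext."""
        return self.in_scope_targets - self.out_of_scope_targets
```

Treating an out-of-scope listing as overriding an in-scope one (in `targets`) is a deliberately conservative choice: an ambiguous individual is excluded rather than tested.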
Best practices for phishing simulation that does not destroy trust
Phishing simulation can corrode workforce trust if run badly. The same data and outcomes are accessible without that cost.
- No fake medical, family, or HR-emergency pretexts. Off-limits regardless of effectiveness. The cost to workforce trust is not worth the marginal data.
- No public shaming. Outcomes per user are confidential; aggregate trends are public. Performance frames are cohort-level (department, role, tenure), not name-level.
- No punitive consequences. Auto-enrolling into "remedial training" is corrosive; the workforce learns to avoid the report-phishing button. Remediation focuses on workflow and detection, not on individuals.
- Transparent program existence. The workforce knows a simulation program exists. They do not know which week or what pretext. The transparency creates an "always on" baseline awareness that helps real detection.
- Surface real attacks at the same time. The report-phishing button feeds the same queue as the simulation; real attacks get the same treatment. The workforce learns the habit, and the habit applies whether the message is simulated or real.
- Communicate outcomes informationally. "Here is the pretext, here is what happened, here is the workflow change that addresses it." Not "you fell for it, here is your training assignment."
- Measure long-term, not per-campaign. Quarter-over-quarter and year-over-year cohort comparisons reflect program effectiveness. Single-campaign results reflect noise.
Phishing simulation FAQs
Is phishing simulation the same as security awareness training?
No. Security awareness training is education delivered to employees (videos, quizzes, posters) intended to change baseline understanding. Phishing simulation is adversarial testing: sending messages designed to land, then measuring what happened. Training is HR. Simulation is security operations.
How often should phishing simulations run?
Continuously, with pretext, channel, lure family, and timing varied per target per round. Quarterly campaigns produce vanity metrics because the workforce learns the campaign cadence. Adaptive simulation that the workforce cannot anticipate is the only model that keeps the signal alive.
Should the IT team know simulations are happening?
Yes for general awareness, no for specific timing. The IT team knows the program is running and has the authorization documentation; they do not know which week the next round goes out or which pretext is being used. Otherwise, allowlisting and tip-offs defeat the measurement.
What is the difference between phishing simulation and KnowBe4-style platforms?
Platform-driven phishing simulation ships a template library on a campaign schedule, optimized for HR-administered training credit. Adversarial phishing simulation generates targeted, rotating pretexts from current OSINT against named individuals, measures workflow and detection outcomes, and does not auto-enroll users in training. The platforms produce click rates; adversarial simulation produces detection improvement.
How ARG runs phishing simulation as adversarial testing, not HR training
ARG runs phishing simulation as one channel inside continuous adversarial simulation, not as a standalone product or HR module. The program is operated by James Wall on infrastructure ARG owns and controls; lures and pretexts are generated per target from current OSINT, not pulled from a template library.
The cadence is roughly one test per named individual per month, varied by role: finance and AP see vendor-invoice and CEO-impersonation variants more often, IT sees help-desk and password-reset pretexts more often, executives see deepfake and voice-cloning probes more often. Channels rotate per round (email, voice, SMS, OAuth) and across rounds (a finance lead may receive an email pretext one round, a vishing call the next, a multi-channel coordinated pretext the round after that).
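Role-weighted family selection with channel rotation across rounds might be planned as below. This is a sketch under assumptions: the role profiles, weights, and family labels are hypothetical, and OAuth-grant and coordinated multi-channel variants are left out to keep the example short. The one rule it enforces is that no one sees the same channel in two consecutive monthly rounds.

```python
from __future__ import annotations

import random

# Assumed channel set for the sketch (OAuth and multi-channel omitted for brevity).
CHANNELS = ("email", "voice", "sms")

# Hypothetical role profiles: pretext family -> relative weight.
ROLE_FAMILY_WEIGHTS = {
    "finance": {"vendor_invoice": 3, "ceo_impersonation": 3, "it_help_desk": 1},
    "it": {"help_desk": 3, "password_reset": 3, "vendor_invoice": 1},
    "executive": {"voice_clone": 3, "ceo_impersonation": 1},
}

# Families that only fit some channels; unlisted families fit any channel.
FAMILY_CHANNELS = {"voice_clone": {"voice"}}


def plan_round(role: str, last_channel: str | None, rng: random.Random) -> tuple[str, str]:
    """Pick (channel, family) for one person's next round."""
    # Rotate across rounds: never reuse the channel this person saw last round.
    channel = rng.choice([c for c in CHANNELS if c != last_channel])
    weights = ROLE_FAMILY_WEIGHTS[role]
    # Keep only families compatible with the chosen channel.
    families = [f for f in weights if channel in FAMILY_CHANNELS.get(f, set(CHANNELS))]
    family = rng.choices(families, weights=[weights[f] for f in families], k=1)[0]
    return channel, family


def plan_year(role: str, rng: random.Random, months: int = 12) -> list[tuple[str, str]]:
    """Roughly one test per named individual per month."""
    plan: list[tuple[str, str]] = []
    last: str | None = None
    for _ in range(months):
        channel, family = plan_round(role, last, rng)
        plan.append((channel, family))
        last = channel
    return plan
```

Choosing the channel first and then filtering families keeps channel-specific pretexts (such as a cloned-voice call) from being paired with a channel that cannot carry them.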
Outcomes feed the same monthly operational packet as the rest of the engagement: technical control validation, OSINT trend, identity drift, and physical-engagement findings (during on-site weeks). The quarterly review surfaces what changed across the workforce, which workflow controls held, and which need investment. The annual rollup is structured to drop into renewal underwriting evidence with minimal translation. See What is cyber insurance underwriting?.
For founding clients, phishing simulation is part of the monthly retainer alongside vishing, smishing, OAuth-grant testing, and (during on-site engagement weeks) physical pretext entries. Pricing is locked for two to three years.
Apply as a founding client or see how the engagement works for the full delivery cycle.
Find what gets through.
ARG runs continuous AI-driven adversarial simulation and on-site physical audits for mid-market manufacturers. Two founding-client spots remain.