Forensic Expense Audit System — Case Study

The Problem

The organisation used Pleo cards across 48 employees — a mix of department heads, operations staff, and senior management. Over the course of a quarter, 796 transactions were processed through the system, totalling a significant sum of corporate spend.

No systematic review had been done. Notes were inconsistent or missing. Some transactions had no business justification. And there were patterns that, to a trained eye, looked wrong — but nobody had looked.

The Risk

Without systematic review, expense fraud, policy breaches, and wasteful spending go undetected. In an AIM-listed company, this is a governance failure with real consequences — both financial and reputational.

The Audit Framework

I designed a 10-dimension forensic audit framework — each dimension targeting a specific category of financial risk:

AI Transaction Classification

GPT-4o classified every transaction by type (hotel, meal, transport, ATM, gift, mystery merchant) and assigned a policy compliance score based on the company's expense policy rules.

Benford's Law Analysis

Applied Benford's Law to the leading digits of all transaction amounts. Statistically significant deviation from the expected distribution flags potential manipulation — a standard forensic accounting technique.

Velocity Pattern Detection

Identified employees with unusual transaction frequency, clustering of spend near policy limits, or repeated transactions to the same merchant in short windows.

Cross-Employee Collusion Analysis

Mapped merchant usage across employees — looking for cases where multiple employees used the same unusual merchant, a known signal of coordinated fraud.

Note Quality Scoring

Scored every transaction note from 0–10 on specificity, business justification, and completeness. Identified cardholders with systemic poor documentation — itself a governance risk.

ATM & Cash Withdrawal Profiling

Cash withdrawals are the highest-risk expense category — untraceable and difficult to justify. All ATM transactions were profiled individually with manual flags for CFO review.

Sample Risk Findings

The following represents the type of findings the system surfaced (anonymised for this case study):

Category	Finding	Risk	Action
ATM Withdrawals	4 withdrawals in 9 days, total £840, no notes	HIGH	CFO review required
Velocity	11 transactions to same restaurant in 3 weeks	HIGH	Manager investigation
Benford's Law	Significant deviation in £40–£50 range (near limit)	MED	Policy limit review
Note Quality	3 cardholders averaging <2/10 note score	MED	Training required
Mystery Merchant	Unidentifiable merchant, £220, no category match	HIGH	Receipt requested
Cross-employee	2 employees, same unusual vendor, different dates	MED	Monitoring flagged

The Output

796

Transactions reviewed

Cardholders profiled

High-risk flags

Audit dimensions

The final deliverable was a CFO and audit committee report — structured with an executive summary, risk heat map, individual cardholder profiles, and a recommended action list. Every high-risk item was traceable back to a specific transaction with evidence.

The Impact

The audit directly informed changes to the company's expense policy — stricter note requirements, lower ATM limits, and a new approval layer for transactions above £150. These changes were implemented within 4 weeks of the report being delivered.

What I Learned

Forensic analysis is fundamentally about pattern recognition at scale. Humans can't review 800 transactions rigorously. AI can — and it doesn't get tired or miss the 743rd row.
Benford's Law is surprisingly robust. Even with a relatively small dataset (796 transactions), the deviation pattern was statistically meaningful and identified a real behavioural pattern around policy limits.
The note quality score was the most actionable finding. It's not dramatic, but poor documentation is a systematic governance failure that compounds over time.
A risk score is only useful if it's actionable. The report was structured around what leadership could actually do — not just what was wrong, but who needed to be called, what needed to be changed, and by when.

PythonOpenAI APIPandasExcelBenford's LawForensic AuditRisk ScoringGPT-4o