Technical Report | January 2026 | 28 pages

Explainable AI in Sanctions Screening: Moving Beyond Fuzzy Matching

Abstract

Sanctions screening in financial institutions relies predominantly on fuzzy string matching algorithms -- Levenshtein distance, Jaro-Winkler similarity, and phonetic encoding -- to identify potential matches against regulatory watchlists. These methods produce false positive rates ranging from 2% to 15% of screened transactions, generating substantial operational overhead in alert investigation. This paper presents a contextual entity resolution approach that combines named entity recognition (NER), graph-based relationship analysis, and transaction pattern features to achieve a 67% reduction in false positives while maintaining a zero false negative rate on our held-out test set. Critically, we demonstrate that SHAP-based feature importance explanations generated by this system satisfy the model risk management requirements of Federal Reserve SR 11-7 and OCC Bulletin 2011-12, making explainable AI sanctions screening viable for regulated deployment.

Introduction

Sanctions screening is a regulatory obligation for every financial institution that processes cross-border transactions. The screening process compares transaction parties -- originators, beneficiaries, and intermediaries -- against lists maintained by OFAC (Specially Designated Nationals), the European Union (EU Consolidated List), the United Nations Security Council, and HM Treasury, among others. A match, or potential match, triggers an alert that must be investigated by a compliance analyst before the transaction can proceed.

The industry-standard approach to sanctions screening uses fuzzy string matching to accommodate variations in name transliteration, spelling, and formatting. An originator named "Mohammed Al-Rahman" might appear on a sanctions list as "Muhammad Al Rahman" or "Mohamed Alrahman." Fuzzy matching algorithms assign similarity scores to these comparisons, and transactions exceeding a configurable threshold generate alerts.
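The threshold mechanism described above can be illustrated with a minimal sketch. It uses a normalized Levenshtein similarity only; production screening engines typically combine several algorithms. The normalization rule and the 0.85 threshold are assumptions chosen for illustration, not values taken from any screening product.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Lowercase and keep letters only, so 'Al-Rahman' equals 'Al Rahman'."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


ALERT_THRESHOLD = 0.85  # hypothetical; real thresholds are institution-specific


def raises_alert(party: str, listed: str,
                 threshold: float = ALERT_THRESHOLD) -> bool:
    """True when the name pair scores at or above the alert threshold."""
    return similarity(party, listed) >= threshold
```

Under these assumptions, both spelling variants in the example above score above the threshold against "Mohammed Al-Rahman", which is the intended behavior. The same rule would also flag an unrelated person who happens to share the name, because nothing other than the string enters the score.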

The fundamental problem with this approach is its reliance on string similarity as a proxy for identity. Two individuals with phonetically similar names from the same country are flagged as potential matches regardless of whether any other evidence connects them to the sanctioned party. The result is a flood of false positive alerts that consume analyst time, delay legitimate transactions, and create compliance fatigue -- a state in which analysts, overwhelmed by volume, may fail to adequately investigate genuine matches.

Limitations of Fuzzy String Matching

We analyzed 2.4 million sanctions screening alerts generated across twelve financial institutions over a 24-month period. Of these alerts, 94.3% were ultimately cleared as false positives after manual review. The average investigation time per alert was 22 minutes, representing approximately 880,000 analyst-hours in total, of which roughly 830,000 were spent on alerts that did not result in a filing or transaction block.
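The workload figures can be reproduced with a few lines of arithmetic. The sketch below assumes that every alert takes the reported 22-minute average regardless of its outcome, which the data summary does not state explicitly; the variable names are illustrative.

```python
ALERTS = 2_400_000
FALSE_POSITIVE_PER_MILLE = 943  # 94.3%, kept as an integer to avoid rounding noise
MINUTES_PER_ALERT = 22


def analyst_hours(alert_count: int,
                  minutes_per_alert: int = MINUTES_PER_ALERT) -> float:
    """Convert an alert count into analyst-hours at a fixed average handling time."""
    return alert_count * minutes_per_alert / 60


false_positive_alerts = ALERTS * FALSE_POSITIVE_PER_MILLE // 1000
total_hours = analyst_hours(ALERTS)                         # all alerts
false_positive_hours = analyst_hours(false_positive_alerts)  # cleared alerts only

print(f"{false_positive_alerts:,} false positive alerts")
print(f"{total_hours:,.0f} analyst-hours across all alerts")
print(f"{false_positive_hours:,.0f} analyst-hours on false positives")
```

On these inputs, the 880,000-hour figure corresponds to all 2.4 million alerts; the share attributable to cleared alerts is slightly lower, at about 830,000 hours.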

The false positive distribution reveals systematic weaknesses in string-based matching. Approximately 41% of false positives are attributable to common name collisions -- individuals with names that are statistically frequent in their country of origin matching against sanctioned parties. Another 28% result from transliteration variance, where multiple valid romanizations of Arabic, Cyrillic, or CJK names produce above-threshold similarity scores. The remaining 31% are attributable to partial name matches, address field contamination, and encoding artifacts.

More concerning is the opacity of these systems. When an analyst receives a fuzzy match alert, the only information provided is the similarity score and the matched fields. There is no explanation of why the system considers this a potential match beyond string similarity, no contextual information about how likely a true match is given the transaction context, and no ranking of alerts by probable relevance. Analysts are forced to treat all above-threshold alerts as equally likely matches, which is demonstrably inefficient.

Contextual Entity Resolution Architecture

Our approach replaces single-dimensional string matching with a multi-feature entity resolution model. The system operates in three stages. In the first stage, Named Entity Recognition extracts structured identity features from both the transaction message and the sanctions list entry. Beyond name strings, the NER module identifies nationality indicators, date-of-birth references, organizational affiliations, and address components. These structured features enable precise comparison rather than whole-string fuzzy matching.
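A minimal sketch of what the first stage hands to the later stages might look like the following. The NER extraction itself is not shown; the `IdentityRecord` fields, the function names, and the use of `difflib.SequenceMatcher` as the name comparator are all assumptions made for illustration, not the system's actual schema or algorithm.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, Optional


@dataclass(frozen=True)
class IdentityRecord:
    """Hypothetical structured output of the NER stage for one party."""
    name: str
    nationality: Optional[str] = None
    birth_year: Optional[int] = None
    organization: Optional[str] = None


def text_similarity(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Similarity in [0, 1], or None when either side is missing."""
    if a is None or b is None:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_match(a, b) -> Optional[float]:
    """1.0 for equal values, 0.0 for different values, None when unknown."""
    if a is None or b is None:
        return None
    return 1.0 if a == b else 0.0


def compare(party: IdentityRecord,
            listed: IdentityRecord) -> Dict[str, Optional[float]]:
    """Field-by-field comparison instead of one whole-string score."""
    return {
        "name": text_similarity(party.name, listed.name),
        "nationality": exact_match(party.nationality, listed.nationality),
        "birth_year": exact_match(party.birth_year, listed.birth_year),
        "organization": text_similarity(party.organization, listed.organization),
    }
```

Keeping missing fields as `None` rather than 0.0 is a deliberate choice in this sketch: an absent date of birth is not evidence of a mismatch, and a downstream model can treat the two cases differently.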

The second stage introduces graph context from the correspondent banking network. For each potential match, the system queries the Neo4j relationship graph to determine whether the transaction parties have any known connections to sanctioned entities, sanctioned jurisdictions, or previously flagged transaction patterns. A name match between an originator and a sanctioned individual carries significantly different risk depending on whether the originator's transaction history shows patterns consistent with the sanctioned party's known financial network.
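The graph-distance idea can be sketched without a database. The example below swaps the Neo4j query for a breadth-first search over an in-memory adjacency dictionary; the node names, the `max_hops` cutoff, and the function name are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set


def shortest_path_to_sanctioned(graph: Dict[str, Iterable[str]],
                                start: str,
                                sanctioned: Set[str],
                                max_hops: int = 4) -> Optional[int]:
    """Hop count from start to the nearest sanctioned node, or None if no
    sanctioned node is reachable within max_hops."""
    if start in sanctioned:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor in visited:
                continue
            if neighbor in sanctioned:
                return hops + 1
            visited.add(neighbor)
            queue.append((neighbor, hops + 1))
    return None


# Illustrative relationship graph; every identifier here is made up.
example_graph = {
    "originator_A": ["bank_1"],
    "bank_1": ["originator_A", "shell_co"],
    "shell_co": ["bank_1", "listed_party"],
    "listed_party": ["shell_co"],
    "originator_B": ["bank_2"],
    "bank_2": ["originator_B"],
}
example_sanctioned = {"listed_party"}
```

In this toy graph, `originator_A` sits three hops from the listed party while `originator_B` has no path at all, so two originators with identical name scores would receive different graph features.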

The third stage applies a gradient-boosted decision tree ensemble trained on historical alert outcomes. The features include NER similarity scores (name, address, DOB, nationality), graph distance metrics (shortest path to known sanctioned entities, jurisdiction risk scores), transaction behavioral features (amount distribution, frequency, corridor patterns), and temporal features (proximity to sanctions list updates, time since last clear alert). The model outputs a match probability and a feature importance vector computed using SHAP (SHapley Additive exPlanations).
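To make the feature importance vector concrete, the sketch below computes exact Shapley values by enumerating feature coalitions. It is a stand-in, not the system's method: a three-feature toy scoring function replaces the gradient-boosted ensemble, a single baseline instance replaces the background data set, and brute-force enumeration replaces the tree-specific algorithm in the SHAP library. All feature names and weights are invented for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(model: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley attribution of model(instance) - model(baseline).

    Features outside a coalition are held at their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition: set) -> float:
        mixed = {k: instance[k] if k in coalition else baseline[k] for k in names}
        return model(mixed)

    attributions = {}
    for feature in names:
        others = [k for k in names if k != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {feature}) - value(coalition))
        attributions[feature] = total
    return attributions


def toy_match_score(f: Dict[str, float]) -> float:
    """Hypothetical scoring function with one interaction term."""
    return (0.5 * f["name_similarity"]
            + 0.2 * f["same_jurisdiction"]
            + 0.3 * f["name_similarity"] * f["dob_match"])


instance = {"name_similarity": 0.9, "same_jurisdiction": 1.0, "dob_match": 0.0}
baseline = {"name_similarity": 0.5, "same_jurisdiction": 0.0, "dob_match": 0.5}
attributions = shapley_values(toy_match_score, instance, baseline)
```

The property that matters for explanation is additivity: the attributions sum to the difference between the score for this instance and the score for the baseline, so every point of the output is accounted for by a named feature.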

Key Findings

On a held-out test set of 340,000 screening events with 127 confirmed true matches, the contextual entity resolution model achieved a 67% reduction in false positive alerts while detecting all 127 true matches (a 100% true positive detection rate). SHAP explanations were evaluated by 14 compliance officers across four institutions; 92% rated the explanations as "sufficient for regulatory justification" and 78% rated them as "superior to existing alert context." The average alert investigation time dropped from 22 minutes to 8 minutes when SHAP explanations were provided, a 64% reduction in review time per alert.

Regulatory Compliance and Explainability

Federal Reserve SR 11-7 and OCC Bulletin 2011-12 establish model risk management requirements for models used in BSA/AML compliance. These requirements mandate that institutions document model assumptions, validate model performance on an ongoing basis, and provide explanations for model outputs that are comprehensible to compliance officers and examiners.

SHAP values satisfy these requirements because they provide mathematically grounded, additive feature attributions. For each screening decision, the system generates a ranked list of features that contributed to the match probability, with positive contributions (increasing match likelihood) and negative contributions (decreasing match likelihood) clearly distinguished. An explanation might read: "Match probability elevated by: name similarity (Jaro-Winkler 0.89, +0.23), shared jurisdiction of incorporation (UAE, +0.18), transaction amount in 95th percentile (+0.09). Match probability reduced by: DOB mismatch (1967 vs 1984, -0.31), no graph path to sanctioned entity network (-0.14), regular transaction pattern consistent with 18-month history (-0.08)."
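Turning a vector of attributions into the kind of ranked explanation quoted above is straightforward. The sketch below is one possible rendering, using the contribution values from that example; the function names and the output wording are assumptions, not the system's actual formatter.

```python
from typing import Dict


def render_explanation(contributions: Dict[str, float]) -> str:
    """Split contributions by sign and rank each group by magnitude."""
    raised = sorted(((n, v) for n, v in contributions.items() if v > 0),
                    key=lambda item: -item[1])
    lowered = sorted(((n, v) for n, v in contributions.items() if v < 0),
                     key=lambda item: item[1])

    def fmt(items) -> str:
        return ", ".join(f"{name} ({value:+.2f})" for name, value in items) or "none"

    return (f"Match probability elevated by: {fmt(raised)}.\n"
            f"Match probability reduced by: {fmt(lowered)}.")


def net_shift(contributions: Dict[str, float]) -> float:
    """Additivity: the signed contributions sum to the total shift in output."""
    return sum(contributions.values())


# Contribution values taken from the example explanation in the text.
example_contributions = {
    "transaction amount in 95th percentile": 0.09,
    "DOB mismatch": -0.31,
    "name similarity": 0.23,
    "regular transaction pattern": -0.08,
    "shared jurisdiction": 0.18,
    "no graph path to sanctioned entity network": -0.14,
}

print(render_explanation(example_contributions))
```

Because the attributions are additive, an analyst can also read off the net effect directly. In the quoted example the raising factors total +0.50 and the lowering factors total -0.53, so the evidence taken together moves the score slightly downward.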

This level of transparency enables compliance officers to make informed decisions and provides examiners with a clear audit trail of the model's reasoning. The system logs every SHAP explanation alongside the screening decision, creating an immutable record that satisfies audit requirements.
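The report does not say how immutability is achieved. One common way to make such a log tamper-evident is a hash chain, in which each entry commits to the hash of the entry before it. The sketch below illustrates that technique under this assumption; the entry layout and function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(previous_hash: str, payload: Dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], decision: str,
                 explanation: Dict[str, float]) -> Dict:
    """Append a screening decision and its explanation to the chained log."""
    previous_hash = log[-1]["hash"] if log else GENESIS_HASH
    payload = {"decision": decision, "explanation": explanation}
    entry = {
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": _digest(previous_hash, payload),
    }
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    previous_hash = GENESIS_HASH
    for entry in log:
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["hash"] != _digest(previous_hash, entry["payload"]):
            return False
        previous_hash = entry["hash"]
    return True
```

A chain like this only detects tampering; it does not prevent it. In practice it would be paired with write-once storage or periodic anchoring of the latest hash in a separate system.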

Conclusion

Fuzzy string matching has served as the foundation of sanctions screening for over two decades, but its limitations are increasingly untenable as sanctions lists grow in size and complexity. The contextual entity resolution approach described in this paper demonstrates that incorporating identity structure, relationship graphs, and behavioral features can dramatically reduce false positive rates without compromising detection sensitivity.

The critical enabler for regulated deployment is explainability. SHAP-based feature importance scores provide the transparency required by model risk management frameworks, enabling compliance officers to understand, validate, and justify screening decisions. As regulatory expectations for AI governance continue to evolve, explainability will transition from a competitive advantage to a baseline requirement for sanctions screening technology.