In data we trust

When we screen data, we often hit on data patterns that we know are not true matches. If a pattern occurs frequently enough, we can choose to ignore it going forward.

In general, there are two types of false positive reduction (FPR) strategies: whitelisting and rules-based processing. Whitelisting generally means that a particular record is marked as not a match: either to a specific matching phrase, to specific watchlist listings, or across the board as a trusted record. Rules-based processing is more generic; if a specified series of conditions is met in either the data being matched or the watchlist data being matched against, the match is ignored. The match being ignored can be either to a specific matching phrase, to a specific listing or set of listings, or to a generic class of matches.

Was that vague enough for you? I thought it was…

Let’s consider a number of cases. First, let’s assume our client (account #34543) is Maria Rodriguez Gomez, and it matches Maria Elda Rodriguez Pulido (matching phrase Maria Rodriguez), who has 2 listings on the OFAC list. We could whitelist the record in one of three ways:

Account 34543 is not a match to the matching phrase Maria Rodriguez, but if the record matches something else (e.g. Maria Gomez), it will stop for review for that.
Account 34543 is not a match for the 2 specific records in the OFAC list today. If a third one gets added for the same matching phrase, the record will stop for review for that new listing.
Account 34543 is considered a good record, and will not stop for any matching phrase.
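To make the three scopes concrete, here is a minimal sketch of how they might sit side by side. Everything in it is an assumption made for illustration: the Whitelist class, its field names, and the listing IDs "L1" through "L3" are hypothetical and do not come from any particular screening product.

```python
from dataclasses import dataclass, field


@dataclass
class Whitelist:
    # Hypothetical containers, one per whitelisting scope.
    # Matching phrases are stored in lower case.
    phrase_exemptions: set = field(default_factory=set)   # (account, phrase)
    listing_exemptions: set = field(default_factory=set)  # (account, listing id)
    trusted_accounts: set = field(default_factory=set)    # account

    def should_stop(self, account, phrase, listing_id):
        """Return True if the match still needs to stop for review."""
        if account in self.trusted_accounts:
            return False  # scope 3: record is trusted outright
        if (account, phrase.lower()) in self.phrase_exemptions:
            return False  # scope 1: cleared against this matching phrase
        if (account, listing_id) in self.listing_exemptions:
            return False  # scope 2: cleared against this specific listing
        return True


# Scope 1: cleared for the phrase, but a different phrase still stops.
by_phrase = Whitelist(phrase_exemptions={("34543", "maria rodriguez")})
stops_for_gomez = by_phrase.should_stop("34543", "Maria Gomez", "L9")

# Scope 2: cleared for today's two listings, but a new third one still stops.
by_listing = Whitelist(listing_exemptions={("34543", "L1"), ("34543", "L2")})
stops_for_new_listing = by_listing.should_stop("34543", "Maria Rodriguez", "L3")
```

The order of the checks does not matter here, since any one exemption is enough to clear the match; what matters is how much each scope lets through.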

It should be noted that whitelisting is a functional capability (as opposed to a specific system capability) because it can be implemented in a number of technical ways. For example, the third option above could be accomplished using rules-based functionality (e.g. ignore all matches for account #34543), or it could be done by a “Mark this record as good” functionality.

Now, let’s consider rules processing. Imagine if we have a large number of clients located in Santa Ana, California. These records all match the Albanian National Army, which is also listed by the ANA acronym. One could write a number of rules to ignore these matches, depending on the nature of the data (and our available tools, of course):

Ignore the match ANA in the City field if the City field equals Santa Ana
Ignore all matches in the City field to listings that aren’t cities
Ignore all matches to ANA unless the matched field equals ANA
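As a rough sketch, the three rules above could be written as small predicates over a match record. The record layout (the keys phrase, field_name, field_value and listing_type) is an assumption for illustration; a real screening tool will have its own rule syntax and field names.

```python
def rule_city_equals_santa_ana(match):
    """Ignore ANA in the City field when the City field equals Santa Ana."""
    return (
        match["phrase"].upper() == "ANA"
        and match["field_name"] == "city"
        and match["field_value"].strip().upper() == "SANTA ANA"
    )


def rule_city_field_non_city_listing(match):
    """Ignore any City-field match to a listing that is not a city."""
    return match["field_name"] == "city" and match["listing_type"] != "city"


def rule_ana_unless_exact(match):
    """Ignore matches to ANA unless the whole matched field is ANA."""
    return (
        match["phrase"].upper() == "ANA"
        and match["field_value"].strip().upper() != "ANA"
    )


def is_suppressed(match, rules):
    """A match is ignored if any active rule fires for it."""
    return any(rule(match) for rule in rules)


# Hypothetical match record: a Santa Ana client hitting the ANA acronym.
santa_ana_hit = {
    "phrase": "ANA",
    "field_name": "city",
    "field_value": "Santa Ana",
    "listing_type": "organization",
}
suppressed = is_suppressed(santa_ana_hit, [rule_city_equals_santa_ana])
```

The three rules get progressively broader: the first only covers Santa Ana, the second covers every non-city listing that hits the City field, and the third covers ANA wherever it appears as part of a longer value.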

In both cases, we have identified patterns in the data we screened that we are comfortable saying are not matches to the matching phrase.

There is another way to leverage patterns in data – but, in this case, we are leveraging patterns in the entity listings. We could eliminate patterns based on things we expect to be there, but aren’t.

Example 1: If we match to Juan Cruz (assuming this is the matching phrase for the OFAC listing Juan M. de la Cruz), we could write a rule ignoring all matches that don’t include “de la Cruz”.

Example 2: If we match to Maria Rodriguez (from the example above, Maria Elda Rodriguez Pulido), we could write a rule to ignore matches when the name doesn’t contain “Maria Rodriguez”, “Maria E Rodriguez” or “Maria Elda Rodriguez”.

Example 3: If we match the Iranian city of Kerman, we could write a rule to ignore Kerman if the country field was not Iran. Of course, we could also exclude the matches where the country was USA, or the state was CA or TX (there are Kermans in both states), too.
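Here is one way the three examples might look in code. This is a sketch under assumptions: the helper names are made up, name comparison is done on whole words after lower-casing and dropping periods, and the Kerman rule treats a blank country as "don't ignore", which is a conservative choice of mine rather than something the examples require.

```python
def contains_tokens(name, variant):
    """True if the words of `variant` appear in `name` as a contiguous run."""
    name_tokens = name.lower().replace(".", "").split()
    variant_tokens = variant.lower().replace(".", "").split()
    n = len(variant_tokens)
    return any(
        name_tokens[i:i + n] == variant_tokens
        for i in range(len(name_tokens) - n + 1)
    )


def ignore_if_missing(name, required_variants):
    """Ignore the match when none of the expected variants is in the name."""
    return not any(contains_tokens(name, v) for v in required_variants)


def ignore_kerman(country):
    """Ignore a Kerman match when the country is populated and is not Iran."""
    value = country.strip().upper()
    return value != "" and value != "IRAN"


# Example 1: the listing is Juan M. de la Cruz, so "de la Cruz" is expected.
ignore_juan_cruz = ignore_if_missing("Juan Cruz", ["de la Cruz"])

# Example 2: the acceptable forms of Maria Elda Rodriguez Pulido.
MARIA_VARIANTS = ["Maria Rodriguez", "Maria E Rodriguez", "Maria Elda Rodriguez"]
ignore_maria_luisa = ignore_if_missing("Maria Luisa Rodriguez", MARIA_VARIANTS)

# Example 3: a Kerman in California.
ignore_kerman_usa = ignore_kerman("USA")
```

Comparing whole words rather than raw substrings keeps “Maria E Rodriguez” from accidentally matching inside “Maria Elena Rodriguez”.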

Depending on your risk tolerance and your data quality, you may or may not be able to write any of the above rules. For example, if your country field has blank values, and those aren’t predictable defaults (e.g. all blanks mean the account is in the US), you couldn’t write the Kerman rule based on Iran not being in the country field. Or, if country values sometimes show up in your city field, you may not be able to write a rule to exclude non-city matches in your city field (from the ANA example above). How big “sometimes” needs to be for you to throw the baby out with the bathwater, though, is up to you. If the exceptional records are a small subset of the whole, it may be better to find them and screen them differently than to give up real time and cost savings.
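One way to put a number on “sometimes” is to measure how many records would break a rule before you commit to it. The sketch below assumes records are simple dictionaries with a country key; the function name and the sample data are invented for illustration.

```python
def exception_rate(records, is_exception):
    """Share of records for which a proposed rule would be unsafe."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for r in records if is_exception(r)) / len(records)


def has_blank_country(record):
    """A blank country makes the 'country is not Iran' rule unreliable."""
    return not (record.get("country") or "").strip()


# Made-up sample: one of four records has no usable country value.
sample = [
    {"account": "1", "country": "USA"},
    {"account": "2", "country": ""},
    {"account": "3", "country": "Iran"},
    {"account": "4", "country": "Mexico"},
]
blank_share = exception_rate(sample, has_blank_country)  # 0.25 for this sample
exceptions = [r for r in sample if has_blank_country(r)]
```

If the share is small, the exceptions list is the subset you would route to different screening while the rule handles everything else.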

Filed under: False Positive Reduction, Technology
