When we screen data, we often get matches on data patterns that we know to be non-matches. If a given pattern occurs frequently enough, we can choose to ignore it going forward.
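As a minimal sketch of spotting such patterns, assume each reviewed alert is stored as a tuple of (field, matching phrase, field value, disposition). The snippet counts how often the same pattern was cleared as a non-match and surfaces the ones above a threshold as candidates for suppression. The record layout, the `candidate_patterns` helper, the disposition labels, and the threshold are all hypothetical illustrations, not features of any particular screening product.

```python
from collections import Counter

# Hypothetical reviewed alerts: (field, matching phrase, field value, disposition)
REVIEWED_ALERTS = [
    ("city", "ANA", "Santa Ana", "false_positive"),
    ("city", "ANA", "Santa Ana", "false_positive"),
    ("city", "ANA", "Santa Ana", "false_positive"),
    ("name", "Maria Rodriguez", "Maria Rodriguez Gomez", "false_positive"),
    ("city", "Kerman", "Kerman", "escalated"),
]


def candidate_patterns(alerts, min_count):
    """Return (pattern, count) for patterns cleared as non-matches at least min_count times."""
    counts = Counter(
        (field, phrase, value)
        for field, phrase, value, disposition in alerts
        if disposition == "false_positive"  # only count alerts a reviewer cleared
    )
    return sorted((pattern, n) for pattern, n in counts.items() if n >= min_count)


print(candidate_patterns(REVIEWED_ALERTS, min_count=3))
# [(('city', 'ANA', 'Santa Ana'), 3)]
```

Where to set the threshold is a policy decision; the count only tells you which patterns are worth a closer look.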
In general, there are two types of false positive reduction (FPR) strategies: whitelisting and rules-based processing. Whitelisting generally means that a particular record is marked as not a match, whether to a specific matching phrase, to specific watchlist listings, or generally as a trusted record. Rules-based processing is more generic: if a specified set of conditions is met in either the data being screened or the watchlist data it is matched against, the match is ignored. The ignored match can be to a specific matching phrase, to a specific listing or set of listings, or to a generic class of matches.
Was that vague enough for you? I thought it was…
Let’s consider a number of cases. First, let’s assume our client (account #34543) is Maria Rodriguez Gomez, and it matches Maria Elda Rodriguez Pulido (matching phrase Maria Rodriguez), who has 2 listings on the OFAC list. We could whitelist the record in one of three ways:
- Account 34543 is not a match to the matching phrase Maria Rodriguez, but if the record matches something else (e.g. Maria Gomez), it will still stop for review on that match.
- Account 34543 is not a match to the two specific OFAC listings that exist today. If a third listing is added for the same matching phrase, the record will stop for review on that new listing.
- Account 34543 is considered a good record and will not stop for any matching phrase.
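The three whitelist scopes differ only in what the exception is keyed on. Here is a minimal sketch, assuming each alert carries the account number, the matching phrase, and an identifier for the watchlist listing it hit. The `Alert` and `Whitelist` types and the `LISTING-n` IDs are hypothetical; real screening tools store these exceptions in their own formats.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alert:
    account: str
    phrase: str      # matching phrase that fired, e.g. "Maria Rodriguez"
    listing_id: str  # hypothetical ID of the watchlist listing that was hit


@dataclass
class Whitelist:
    # Option 1: (account, matching phrase) pairs that are known non-matches
    phrase_exceptions: set = field(default_factory=set)
    # Option 2: (account, listing ID) pairs that are known non-matches
    listing_exceptions: set = field(default_factory=set)
    # Option 3: accounts trusted outright, which never stop for any phrase
    trusted_accounts: set = field(default_factory=set)

    def suppresses(self, alert: Alert) -> bool:
        """Return True if any whitelist entry covers this alert."""
        return (
            alert.account in self.trusted_accounts
            or (alert.account, alert.phrase) in self.phrase_exceptions
            or (alert.account, alert.listing_id) in self.listing_exceptions
        )


# Option 1: the phrase is cleared, but a different phrase still stops.
by_phrase = Whitelist(phrase_exceptions={("34543", "Maria Rodriguez")})
print(by_phrase.suppresses(Alert("34543", "Maria Rodriguez", "LISTING-1")))  # True
print(by_phrase.suppresses(Alert("34543", "Maria Gomez", "LISTING-9")))      # False

# Option 2: today's two listings are cleared; a third one added later still stops.
by_listing = Whitelist(listing_exceptions={("34543", "LISTING-1"), ("34543", "LISTING-2")})
print(by_listing.suppresses(Alert("34543", "Maria Rodriguez", "LISTING-2")))  # True
print(by_listing.suppresses(Alert("34543", "Maria Rodriguez", "LISTING-3")))  # False

# Option 3: the account is trusted and nothing stops.
trusted = Whitelist(trusted_accounts={"34543"})
print(trusted.suppresses(Alert("34543", "Maria Gomez", "LISTING-9")))  # True
```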
It should be noted that whitelisting is a functional capability (as opposed to a specific system capability) because it can be implemented in a number of technical ways. For example, the third option above could be accomplished using rules-based functionality (e.g. ignore all matches for account #34543), or it could be done by a “Mark this record as good” functionality.
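To illustrate that the third option is a function rather than a feature, the sketch below implements it two ways: as a generic rule ("ignore all matches for account #34543") and as a "mark this record as good" flag. Both produce the same decisions. The `Alert` type, the `RULES` list, and the `TRUSTED_ACCOUNTS` set are hypothetical stand-ins for whatever a given system provides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Alert:
    account: str
    phrase: str


# Implementation A: a generic rule engine with one rule for this account
Rule = Callable[[Alert], bool]
RULES: List[Rule] = [lambda a: a.account == "34543"]


def suppressed_by_rules(alert: Alert) -> bool:
    return any(rule(alert) for rule in RULES)


# Implementation B: a "mark this record as good" flag stored per account
TRUSTED_ACCOUNTS = {"34543"}


def suppressed_by_flag(alert: Alert) -> bool:
    return alert.account in TRUSTED_ACCOUNTS


alerts = [
    Alert("34543", "Maria Rodriguez"),
    Alert("34543", "Maria Gomez"),
    Alert("99999", "Maria Rodriguez"),
]
print([suppressed_by_rules(a) for a in alerts])  # [True, True, False]
print([suppressed_by_flag(a) for a in alerts])   # [True, True, False]
```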
Now, let’s consider rules processing. Imagine we have a large number of clients located in Santa Ana, California. These records all match the Albanian National Army, which is also listed under its acronym, ANA. Depending on the nature of the data (and, of course, our available tools), we could write any of several rules to ignore these matches:
- Ignore the match ANA in the City field if the City field equals Santa Ana
- Ignore all matches in the City field to listings that aren’t cities
- Ignore all matches to ANA unless the matched field equals ANA
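Each of the three rules can be expressed as a predicate that returns True when a match should be ignored. This is a sketch that assumes a match record exposes the field that matched, that field's full value, the matching phrase, and a listing category; the `Match` type and the `listing_type` attribute in particular are assumptions, since not every watchlist or tool exposes a listing category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    field_name: str    # record field that matched, e.g. "city"
    field_value: str   # full value of that field in the screened record
    phrase: str        # matching phrase that fired, e.g. "ANA"
    listing_type: str  # hypothetical listing category, e.g. "entity" or "city"


def ignore_ana_in_santa_ana(m: Match) -> bool:
    # Rule 1: ignore ANA in the City field when the City field equals Santa Ana
    return (
        m.field_name == "city"
        and m.phrase == "ANA"
        and m.field_value.strip().casefold() == "santa ana"
    )


def ignore_non_city_listings_in_city(m: Match) -> bool:
    # Rule 2: ignore any City-field match to a listing that isn't a city
    return m.field_name == "city" and m.listing_type != "city"


def ignore_ana_unless_exact(m: Match) -> bool:
    # Rule 3: ignore ANA unless the matched field is exactly "ANA"
    return m.phrase == "ANA" and m.field_value.strip().upper() != "ANA"


RULES = [ignore_ana_in_santa_ana, ignore_non_city_listings_in_city, ignore_ana_unless_exact]

santa_ana = Match("city", "Santa Ana", "ANA", "entity")
exact_ana = Match("name", "ANA", "ANA", "entity")

print([rule(santa_ana) for rule in RULES])  # [True, True, True]
print([rule(exact_ana) for rule in RULES])  # [False, False, False]
```

Any one of the three rules suppresses the Santa Ana alerts; they differ in how much else they suppress. Rule 2, for instance, would silence a City-field match to any listed entity, which is only safe if the City field reliably holds city names.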
In both the whitelisting and the rules examples, we have identified patterns in the screened data that we are comfortable declaring are not matches to the matching phrase.
There is another way to leverage patterns, but in this case we leverage patterns in the entity listings rather than in our own data: we can eliminate matches based on things we would expect to be present, but aren’t.
Example 1: If we match to Juan Cruz (assuming this is the matching phrase for the OFAC listing Juan M. de la Cruz), we could write a rule ignoring all matches that don’t include “de la Cruz”.
Example 2: If we match to Maria Rodriguez (from the example above, Maria Elda Rodriguez Pulido), we could write a rule to ignore matches when the name doesn’t contain “Maria Rodriguez”, “Maria E Rodriguez” or “Maria Elda Rodriguez”.
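Examples 1 and 2 share one shape: for a given matching phrase, keep the match only if the screened name contains at least one expected form from the listing. A minimal sketch, assuming names are compared after casefolding, turning punctuation into spaces, and checking for whole-word runs; the `EXPECTED_FORMS` table and helper names are hypothetical.

```python
import re

# Hypothetical per-phrase requirements: a match on the phrase is kept only if
# the screened name contains at least one of the expected forms.
EXPECTED_FORMS = {
    "Juan Cruz": ["de la Cruz"],
    "Maria Rodriguez": ["Maria Rodriguez", "Maria E Rodriguez", "Maria Elda Rodriguez"],
}


def normalize(text: str) -> str:
    """Casefold, turn punctuation into spaces, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def ignore_if_expected_form_missing(phrase: str, screened_name: str) -> bool:
    """Return True (ignore the match) when no expected form appears as whole words."""
    forms = EXPECTED_FORMS.get(phrase)
    if forms is None:
        return False  # no rule defined for this phrase, so never suppress
    padded_name = f" {normalize(screened_name)} "
    return not any(f" {normalize(form)} " in padded_name for form in forms)


print(ignore_if_expected_form_missing("Juan Cruz", "Juan Cruz Martinez"))           # True
print(ignore_if_expected_form_missing("Juan Cruz", "Juan M. de la Cruz"))           # False
print(ignore_if_expected_form_missing("Maria Rodriguez", "Maria Gomez Rodriguez"))  # True
print(ignore_if_expected_form_missing("Maria Rodriguez", "Maria E. Rodriguez"))     # False
```

This normalization does not strip accents, so “Rodríguez” would not contain “Rodriguez”; how to treat such variants is part of deciding whether the rule is safe for your data.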
Example 3: If we match the Iranian city of Kerman, we could write a rule to ignore Kerman if the country field is not Iran. Of course, we could also exclude the matches where the country is USA, or where the state is CA or TX (there are Kermans in both states).
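The two Kerman variants behave quite differently on incomplete data. A short sketch, assuming country and state are free-text fields that may be blank or missing; the helper names and the literal values "Iran" and "USA" are assumptions about how those fields are populated.

```python
from typing import Optional

# States that, per the example above, contain a city named Kerman.
US_STATES_WITH_KERMAN = {"CA", "TX"}


def clean(value: Optional[str]) -> str:
    """Treat None and whitespace-only values as empty; compare case-insensitively."""
    return (value or "").strip().upper()


def ignore_kerman_broad(country: Optional[str]) -> bool:
    # Broad form: ignore the Kerman match whenever the country field is not Iran.
    return clean(country) != "IRAN"


def ignore_kerman_narrow(country: Optional[str], state: Optional[str]) -> bool:
    # Narrow form: ignore only when the country is USA or the state is CA or TX.
    return clean(country) == "USA" or clean(state) in US_STATES_WITH_KERMAN


print(ignore_kerman_broad("Iran"), ignore_kerman_broad("USA"), ignore_kerman_broad(""))
# False True True
print(ignore_kerman_narrow("USA", None), ignore_kerman_narrow(None, "TX"), ignore_kerman_narrow(None, None))
# True True False
```

The broad form also ignores a record whose country field is blank, since a blank is “not Iran”; the narrow form suppresses only records that positively look American.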
Depending on your risk tolerance and your data quality, you may or may not be able to write any of the rules above. For example, if your country field has blank values, and those blanks don’t map to a predictable default (e.g. all blanks mean the account is in the US), you couldn’t write the Kerman rule based on Iran not being in the country field. Or, if country values sometimes show up in your city field, you may not be able to write a rule to exclude non-city matches in your city field (from the ANA example above). How big “sometimes” needs to be before you throw the baby out with the bathwater, though, is up to you. If the exceptional records are a small subset of the whole, it may be better to find them and screen them differently than to give up real time and cost savings.
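One way to act on that last suggestion, sketched under the assumption that records are dictionaries with a `country` key: measure how many records have a blank country, apply the Kerman rule only where the field is populated, and route the rest for different handling. The field names and helper functions are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def has_country(record: Record) -> bool:
    """True when the country field is populated (not None, not blank)."""
    return bool((record.get("country") or "").strip())


def partition_for_kerman_rule(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into those the rule can safely judge and exceptions to screen differently."""
    eligible = [r for r in records if has_country(r)]
    exceptions = [r for r in records if not has_country(r)]
    return eligible, exceptions


def ignore_kerman(record: Record) -> bool:
    # Intended only for eligible records, so a blank country never counts as "not Iran".
    return (record.get("country") or "").strip().upper() != "IRAN"


records = [
    {"id": "1", "city": "Kerman", "country": "USA"},
    {"id": "2", "city": "Kerman", "country": "Iran"},
    {"id": "3", "city": "Kerman", "country": ""},
    {"id": "4", "city": "Kerman", "country": None},
]

eligible, exceptions = partition_for_kerman_rule(records)
print(f"{len(exceptions) / len(records):.0%} of records need separate handling")
# 50% of records need separate handling
print([r["id"] for r in eligible if ignore_kerman(r)])  # ['1']
```

The printed share is the “how big is sometimes” number: if it is small, screening the exceptions separately preserves the rule’s savings for everything else.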
