Security analysts spend disproportionate amounts of time investigating suspicious transactions. False positives – or transactions incorrectly flagged as fraudulent – cost both the analysts and the employer time and money.
Traditionally, transaction monitoring begins with using a rules-based system. The system scans customer transactions – deposits, fund transfers, payments, purchases, etc. – for red flags typical of money laundering. Upon a transaction being flagged, the system generates an alert, and the transaction data is sent to an investigation team for review.
The problem lies in the inaccuracy of this traditional approach. The vast majority of flags are false positives – well over 90% (in the use case that we discuss later, this percentage was over 99%). Besides resulting in sunk costs, false alarms may also result in poor customer experience in the form of transactional delays and rejections.
Transactional delays and rejections decrease customer satisfaction, and with the number of banking alternatives available today, FIs risk churn if the underlying problem goes unaddressed.
In a recent AI in Business podcast, CEO Daniel Faggella sat down with an AI veteran, Dr. Scott Nowson, to discuss these challenges. Dr. Nowson is the AI Leader at PricewaterhouseCoopers (“PwC”) Middle East. PwC is one of the largest professional services firms in the world, with global revenues reaching $50 billion in 2022.
Throughout our conversation with Dr. Nowson, we reference a specific use case wherein his team was tasked with reducing the number of false positives for a banking client. Among the actionable points discussed were:
- Feature engineering to reframe the problem of false positives: Why the time-consuming process of observing experts at work pays off when building and refining a fraud solution, helping define the lowest achievable false-positive rate.
- Build beta-version alerts around your KPIs: A three-step process for demonstrating stakeholder value while implementing an iterative solution.
Expertise: innovation management, machine learning, natural language processing, informatics.
Brief Recognition: Before heading up AI operations for PwC in the Middle East, Scott worked for several years in AI- and analytics-centric management roles for Accenture and Xerox. Scott holds a Ph.D. in Informatics and an MSc in Cognitive Science and Natural Language from The University of Edinburgh.
Using Feature Engineering to Reframe the Problem of False Positives
In this particular use case, Scott states that the banking client was dealing with a false positive rate of over 99% on a volume of 10,000 monthly alerts. Each alert required a minimum of 20-30 minutes of investigation. The bank needed a vast team to accommodate such a large workload, resulting in high costs.
The real trouble was that the business problem itself had a problem, says Scott. “In the client’s mind, the problem they wanted us to solve was: Can you find the needle in the haystack? But in any data problem, finding 1% is just insane.”
As such, Scott reframed the business problem for the stakeholders, emphasizing the need to discover what characterizes the false positives rather than hunting directly for the 1% of “authentic” fraud cases. Even then, 10 to 20% of cases would still need to be investigated to ensure a comprehensive, effective system.
To evaluate the central issue for PwC’s banking client, Dr. Nowson says he implemented an approach he’d learned at Xerox in his work with ethnographers: immersing oneself in employees’ daily work. He explains what this approach accomplishes and why it is necessary, noting that it gathers intelligence that informs the eventual solution:
“It is feature engineering. We don’t like to talk about this anymore because it’s so time-consuming in the data scientist’s life. But you can’t do deep learning in this environment because of compliance. You have to be able to go to the regulator and say, at any point being audited, why you missed that.”– Director/AI Leader at PwC Middle East, Dr. Scott Nowson
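The compliance constraint Dr. Nowson describes – being able to tell a regulator exactly why the system made a call – is why hand-engineered features paired with a transparent scoring rule beat deep learning here. A minimal sketch of that idea follows; the feature names, thresholds, and weights are hypothetical illustrations, not PwC's actual model:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    hour: int              # hour of day (0-23)
    country_risk: float    # 0.0 (low) to 1.0 (high), from a lookup table
    monthly_velocity: int  # transactions by this customer in past 30 days

def engineer_features(tx: Transaction) -> dict:
    """Each feature mirrors a cue a human analyst uses, so an auditor
    can be shown exactly why a transaction scored the way it did."""
    return {
        "large_round_amount": tx.amount >= 9000 and tx.amount % 1000 == 0,
        "off_hours": tx.hour < 6 or tx.hour > 22,
        "high_risk_country": tx.country_risk > 0.7,
        "unusual_velocity": tx.monthly_velocity > 50,
    }

# Transparent linear scoring: every weight is inspectable and explainable,
# unlike an opaque deep-learning model, which this compliance setting rules out.
WEIGHTS = {"large_round_amount": 2.0, "off_hours": 0.5,
           "high_risk_country": 1.5, "unusual_velocity": 1.0}

def fraud_score(tx: Transaction) -> float:
    feats = engineer_features(tx)
    return sum(WEIGHTS[name] for name, on in feats.items() if on)

tx = Transaction(amount=9000, hour=3, country_risk=0.9, monthly_velocity=12)
print(fraud_score(tx))  # 4.0: large_round_amount + off_hours + high_risk_country
```

Because the score is a sum of named, human-meaningful features, "why was this flagged?" always has a concrete answer – the auditability requirement the quote describes.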
When his team spent time with agents, they noticed process improvement opportunities. Beyond this, the team was able to observe just how simple it was for agents to dismiss the false flags. As such, Scott’s team raised a question: Why doesn’t AI eliminate the false positives and thus eliminate the required effort?
Moreover, Scott says, “If we’re going to be having to do feature engineering, let’s at least learn from humans – what features we’re using and scale them.”
Upon observing the agents’ workflows, the PwC team noticed that the maximum number of transactions an agent could simultaneously examine was six. Scott remarks that this was an ample opportunity for machine learning implementation: “The whole point about machine learning is big data. It can look over two years [of financial transaction data] in seconds, and it can see much bigger patterns.”
Another challenge with fraud is that it is constantly evolving; there are new ways of committing fraud all the time. As such, training a model on existing data to counteract future fraud patterns isn’t a viable solution on its own.
To overcome this ‘arms race’ problem, Scott’s team architected a solution that first examines fraud trends – “the new normal” – and then examines the data. “It’s about looking for a new normal and adjusting that,” says Scott.
He explains his approach:
“What we’ve typically done then is taken this post transaction monitoring system and applied it before the transaction monitoring is done. You’re then able to come up with new rules and new ways of detecting what looks like fraud. There’s an order to it. You start with finding the normal, and those mechanisms that you’ve built in – using 99% of the data – will help you build a framework that will then let you be better at finding the 1%.”– Director/AI Leader at PwC Middle East, Dr. Scott Nowson
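The "find the normal first" ordering Scott describes can be sketched as a simple baseline model: characterize the ~99% of benign activity, then flag only what deviates from that baseline, refitting as the "new normal" shifts. The figures and the z-score threshold below are illustrative assumptions, not the actual system:

```python
import statistics

def fit_normal(amounts):
    """Learn the 'normal' baseline from historical (mostly benign) amounts.
    Refitting this periodically is the 'adjusting the new normal' step."""
    return statistics.mean(amounts), statistics.stdev(amounts)

def is_anomalous(amount, mean, stdev, z_threshold=4.0):
    """Only transactions far outside the learned baseline raise an alert,
    shrinking the stream of false positives fed to investigators."""
    return abs(amount - mean) / stdev > z_threshold

# Hypothetical transaction amounts standing in for 99% of the data.
history = [100, 120, 95, 110, 105, 98, 130, 115, 102, 90]
mean, stdev = fit_normal(history)

print(is_anomalous(108, mean, stdev))     # near the baseline -> False
print(is_anomalous(50_000, mean, stdev))  # far outside it -> True
```

The design choice matches the quote's ordering: the mechanisms built from the 99% (the baseline) become the framework that makes finding the 1% tractable.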
Alerting Stakeholders to KPIs in Iterative Versions
A reasonable question then becomes: how do you show stakeholders evidence of progress using such a phased approach to reducing false positives? Where stakeholders in Dr. Nowson’s use case expected the team to find the 1% of fraud amidst the data – the proverbial “needle in the haystack” – Scott emphasized the importance of the other 99% of the data.
Reframing the problem in a ‘glass half full’ context – or, more accurately, around the haystack rather than the needle – proved invaluable in getting stakeholders to understand that false positives were their true business problem. He and his team then shifted their focus to delivering short-term improvements while gradually improving the model’s accuracy.
Scott described his team’s three-step process for demonstrating short-term value while improving upon the final solution:
- First, measure the statistical confidence of early model iterations and compare it to stakeholder requirements.
- Second, develop and refine a beta version of the software wherein each alert is tagged with what the algorithm would have predicted.
- Third, continue to train the model on additional data and the insights gained.
In Scott’s case, step one involved producing metrics such as model error rates and classification percentages. These numbers must then be gauged against the stakeholders’ tolerance for risk, as a single compliance failure can result in a fine reaching millions of dollars. As model accuracy improves in the second step, the system is continually refined until automation can be “turned on” gradually. New data and rules are applied to the model as required.
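The beta "tagging" step above amounts to running the model in shadow mode: the rules engine still generates every alert, each one is annotated with what the model would have decided, and the key risk metric – how much true fraud the model would have auto-dismissed – is measured before any automation is enabled. A minimal sketch, with toy alert data and a hypothetical dismissal rule:

```python
def shadow_tag(alerts, model):
    """Attach the model's would-be decision to each live alert.
    No alert is suppressed yet; investigators still see everything."""
    return [{**a, "model_would_dismiss": model(a)} for a in alerts]

def error_rate(tagged):
    """Share of alerts the model would have wrongly dismissed (real fraud
    missed) - the figure gauged against stakeholders' risk tolerance."""
    missed = sum(1 for a in tagged
                 if a["model_would_dismiss"] and a["is_fraud"])
    return missed / len(tagged)

# Toy alerts: 'is_fraud' is the ground truth established later by analysts.
alerts = [
    {"id": 1, "score": 0.2, "is_fraud": False},
    {"id": 2, "score": 0.9, "is_fraud": True},
    {"id": 3, "score": 0.1, "is_fraud": False},
    {"id": 4, "score": 0.3, "is_fraud": False},
]
model = lambda a: a["score"] < 0.5  # hypothetical "safe to dismiss" rule

tagged = shadow_tag(alerts, model)
print(error_rate(tagged))  # 0.0: no real fraud would have been auto-dismissed
```

Only once this measured error rate sits comfortably inside the stakeholders' risk tolerance would automatic dismissal be "turned on", and then only gradually.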
According to Scott, juxtaposing these results in such a way proved very persuasive in closing gaps in understanding with less technical partners: “The stakeholders want to know that you’re just messing around at the bottom end, the easy end. You’re proving the value. Then they have the confidence when you go and mess up the high value [the 1%].”
When prompted for his final advice for leaders who may want to reduce false positives and adopt a similar scheme, Scott shares:
“One of the key [lessons] for me … is getting everyone involved and understanding all sides. And so long as you can communicate openly between the old and new, explain what you’re doing, why you’re doing it, show the value quickly – talk their language.”