How to overcome overfitting in machine learning based fraud mitigation for banks

AI fraud-mitigation solutions often focus on a few types of fraud and become incapable of spotting new ones, a problem known as ‘overfitting’. To avoid it, banks need far more instances of fraud than a single bank can possibly supply. JOËL WINTEREGG says the answer lies in collaboration and the human touch.

Artificial intelligence in banking has been hailed as a game-changer. Applications, we’re told, will bring down costs, increase revenues, improve the customer experience and cut fraud. Companies claiming to have AI solutions to help with each of these abound, as do articles repeating their promises. The reality, particularly in the area of fraud, is quite different.

Many solutions fail to live up to the hype, let down not just by their approach but more fundamentally by their very limited data sets and training models, which often suffer from overfitting. Overfitting describes what happens when a model fits a limited set of training data too closely: it performs well on cases like those it has already seen but cannot generalize to new ones.

The problem with overfitting and fraud lies with the sheer volume of data required to train an effective AI model – volumes of proprietary data that banks just can’t gather on their own.

Think Google Images: to be able to identify a picture of a cat, the Google model had to look at 5m different pictures of felines and 5m different pictures without them. From that exercise it became adept at spotting what is and isn’t a cat. While there are plenty of cat pictures for the data scientists at Google to use, banks really struggle to get enough examples of the different types of fraud to feed their models.

There’s no way a bank gets to see 5m different examples of fraud. In practice, a big global bank might see perhaps 200 frauds in a bad year (excluding card fraud) across all its systems. This includes mobile and e-banking channels, internal processes and traditional transactions. An AI model working with this data will never learn how to spot anything other than a fraud that has already been identified.

Perhaps a bank has identified a specific fraud type, such as scams related to Microsoft phishing emails. Over time, the bank’s machine-learning-based anti-fraud solution will recognize this fraud type and become expert at detecting it. However, whenever a new scam type comes knocking at the bank’s door, such as an invoice scam, the solution will be helpless. This is an example of overfitting. Effectively there is no new learning. The AI is simply disappearing down a rabbit hole rather than widening its experience and knowledge to identify new types of fraud.
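To make the phishing example concrete, here is a deliberately tiny Python sketch. It is not how any real anti-fraud product works: the "model" simply memorizes words that appear in known fraud messages, and every message, function name and threshold below is invented for illustration. It shows the failure mode described above: a detector trained on one fraud type catches variations of it but waves a new type through.

```python
def learn_fraud_tokens(fraud_messages, legit_messages):
    """Return words seen in fraud examples but never in legitimate ones."""
    fraud_words = {w for msg in fraud_messages for w in msg.lower().split()}
    legit_words = {w for msg in legit_messages for w in msg.lower().split()}
    return fraud_words - legit_words


def looks_fraudulent(message, fraud_tokens, min_hits=2):
    """Flag a message containing at least `min_hits` learned fraud words."""
    hits = sum(1 for w in set(message.lower().split()) if w in fraud_tokens)
    return hits >= min_hits


# Invented training data: the only fraud type seen so far is Microsoft phishing.
known_fraud = [
    "microsoft account suspended verify password now",
    "microsoft security alert verify your password",
]
legit = [
    "your monthly statement is ready",
    "payment to your landlord was completed",
]
tokens = learn_fraud_tokens(known_fraud, legit)

# A variation of the known scam shares vocabulary with the training examples.
seen_variant = "verify your microsoft password today"
# An invoice scam shares none of it, so the memorized words are useless.
new_scam = "urgent invoice attached please pay supplier to new bank details"

print(looks_fraudulent(seen_variant, tokens))  # caught
print(looks_fraudulent(new_scam, tokens))      # missed
```

The detector scores perfectly on everything it was trained on, which is exactly why its performance on the training data says little about how it will cope with the next scam.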

In some industries, the lack of proprietary data sets is overcome by inter-company co-operation. Companies share anonymized data among themselves to help enrich AI algorithms so the models can learn better. This is exactly what banks need to do. But fraud is still viewed as a shameful event that needs to be kept as quiet as possible – a dirty little secret – and this prevents the sharing of data.

The irony is that by failing to collaborate, individual banks simply can’t get enough fraud cases and the fraudsters continue to win. The power of AI lies in having a sufficient number of fraud cases and models built on the right algorithmic approach.

But it doesn’t stop there. These algorithms need to encompass human behavior in their training. This, along with data sets from multiple banks, will open out the lens, enabling the systems to spot more types and variations of fraud. Any effective AI-driven fraud-mitigation solution must have this element.

Users of the bank’s systems – the customers and its staff – have habits and processes that they follow 99.9 percent of the time. This behavior can be used to build profiles. These profiles can be added to the AI system so that when an anomalous event occurs – a fraudulent transaction – it can easily identify and block it and send an alert. In fact, it is far more straightforward to collect data on users’ behavior to build an accurate profile than it is to collect it on fraudsters.
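As a rough illustration of profiling, the Python sketch below builds a profile from a user's past transactions and lists the ways a new transaction departs from it. It is a minimal sketch under stated assumptions, not a production design: the `Profile` fields, the z-score threshold of 3 and the sample history are all hypothetical, and a real system would draw on far richer behavioral signals.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Profile:
    """Summary of one user's habitual behavior (hypothetical structure)."""
    mean_amount: float
    std_amount: float
    known_countries: frozenset


def build_profile(history):
    """Build a profile from past (amount, country) transactions."""
    amounts = [amount for amount, _ in history]
    countries = frozenset(country for _, country in history)
    return Profile(mean(amounts), stdev(amounts), countries)


def anomaly_reasons(profile, amount, country, z_threshold=3.0):
    """Return the ways a transaction deviates from the user's profile."""
    reasons = []
    # How many standard deviations the amount lies from the user's norm.
    z_score = abs(amount - profile.mean_amount) / max(profile.std_amount, 1e-9)
    if z_score > z_threshold:
        reasons.append("unusual amount")
    if country not in profile.known_countries:
        reasons.append("new country")
    return reasons


# Invented history for one user: (amount, destination country).
history = [(120.0, "CH"), (95.0, "CH"), (130.0, "CH"), (110.0, "FR"), (105.0, "CH")]
profile = build_profile(history)

print(anomaly_reasons(profile, 115.0, "CH"))   # habitual payment, no reasons
print(anomaly_reasons(profile, 9500.0, "US"))  # departs from the profile twice
```

Note that nothing here requires a single example of fraud: the profile is learned entirely from legitimate behavior, which is the data a bank has in abundance.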

So data quality and extraction are key, but only part of the solution. When it comes to fraud, banks need to adopt a holistic approach that combines high-quality data, masses of data on fraud types and 360-degree user profiles. Any AI fraud-mitigation solution that doesn’t have all three elements will be ineffective, particularly when it comes to stopping new types of fraud.
