Self-driving cars, image recognition, speech recognition, medical diagnoses and playing chess – machine learning has been making waves everywhere and continues to improve technologies across multiple industries, especially in cybersecurity. The end goal of such applications is increased automation, self-learning and, effectively, reduced reliance on humans.
Modern fraud solutions – such as CashShield – have begun to incorporate machine learning algorithms in the fraud screening process. This improves operational efficiency and increases the accuracy of pinpointing fraudsters in the sea of unique users. This is a step away from traditional fraud detection solutions that are predominantly rules-based. Rules-based solutions are often heavily reliant on manual labor to configure, update and make sense of scores from the rules. Machine learning fraud solutions have improved the situation significantly. They have allowed for greater automation and greatly reduced the need for manual reviews.
But there is a caveat: most modern fraud solutions built with machine learning can greatly reduce the need for manual reviews, but cannot eliminate it completely.
Human involvement in fraud prevention: boon or bane?
Industry experts commonly agree that overreliance on manual reviews and long manual review times are both undesirable. Despite this, manual reviews are still commonplace for most companies, whether or not they have adopted fraud tools designed with machine learning algorithms.
Proponents of keeping the manual review process worry that machines will make mistakes, passing fraudulent transactions and failing genuine ones. And of all the nightmares a merchant could possibly have, rejecting genuine customers and risking losing them forever is probably one of the worst.
False positives are a tricky problem to solve, however. The fraud screening process may contain overly strict controls or rule sets that turn away genuine customers. Risk-averse manual reviewers who fear increased chargeback rates will reject borderline transactions as well. After all, human judgement is not impervious to human error.
When manual reviews cost more than they save
Keeping a manual review team is costly. Hiring a manual review team costs money, and finding the right people is difficult. On top of that, training the team to ensure they are well updated on recent fraud trends or newer fraud tools can put a strain on resources too. Many merchants report that maintaining a team dedicated solely to fraud prevention is unjustified, and would rather divert resources to other parts of the business to generate revenue.
As your business expands and receives more transactions a day, scaling the manual review team becomes an important consideration. You must improve operational efficiency to ensure that more transactions can be processed without delay. But, reality check: not everyone can afford to invest extra resources solely into fraud mitigation. This is particularly so for short promotional periods that last a day, or a week at most.
To put it simply, this is how manual reviews cost more than they save:
Assuming that sales usually peak during the summer sale and the year-end sales, a company might choose to expand the manual review team halfway through the year to meet the incoming demand. However, the sales dip between Black Friday and the ensuing holiday season will leave that extra capacity unused, and the extra hires will therefore take a toll on operational costs. Meanwhile, lost opportunities might occur all year round, especially when the existing manual review team is unable to handle an unexpected influx of transactions. As a result, your business suffers from lost revenue.
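The trade-off above can be put into rough numbers. The sketch below is only an illustration: every figure in it (review volumes, reviewer throughput, salary, revenue per transaction) is hypothetical, and it assumes that any transaction the team cannot review in time is simply lost.

```python
def review_team_costs(monthly_reviews, reviewers, reviews_per_reviewer,
                      cost_per_reviewer, revenue_per_review):
    """Return (idle_cost, lost_revenue) for a fixed-size review team.

    All inputs are hypothetical planning figures, not industry data.
    """
    capacity = reviewers * reviews_per_reviewer
    idle_cost = 0.0
    lost_revenue = 0.0
    for demand in monthly_reviews:
        if demand < capacity:
            # Quiet month: part of the team is paid but has nothing to review.
            unused = capacity - demand
            idle_cost += unused / reviews_per_reviewer * cost_per_reviewer
        else:
            # Peak month: the overflow cannot be reviewed (assumed lost).
            lost_revenue += (demand - capacity) * revenue_per_review
    return idle_cost, lost_revenue


# Three quiet months and one peak month, with a team sized in between.
idle, lost = review_team_costs(
    monthly_reviews=[1000, 1000, 3000, 1000],
    reviewers=2,
    reviews_per_reviewer=1000,
    cost_per_reviewer=3000.0,
    revenue_per_review=20.0,
)
print(f"idle cost: {idle:.0f}, lost revenue: {lost:.0f}")
```

With these made-up figures, the team is both too large for the quiet months and too small for the peak, so the business pays on both sides.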
Putting fraud and machine learning together
We have all heard the term machine learning somewhere, but what do we really know about it? And more importantly, how can it prevent fraud?
Let’s understand how the fraudster works:
He has bought a list of 10,000 stolen credit card numbers off the Dark Web. Now, he intends to use the numbers to make purchases on his favorite e-commerce stores. Of course, the fraudster doesn’t want to get caught. To avoid detection, he needs to ensure that the transactions made with the stolen credit card numbers do not seem like they were made by the same person.
Using easily obtainable tools from the Dark Web, the fraudster can mask and change his IP address. More sophisticated fraudsters can also make micro-changes to the device fingerprint, making it harder to trace the various transactions back to one source. For example:
- Transaction A was made from an iPhone 6S
- Transaction B was made from an iMac computer
- Transaction C was made from a Linux computer
The fraudster can make these changes in mere seconds. And he can make the transactions one after another or minutes apart, randomly. Now we ask: how do machines do what humans can’t do?
Given the sheer volume of data that must be analyzed to match fraud patterns, machine learning has become indispensable.
Call on the machines to take on a bigger role
Most fraud systems have at least the basics: they use historical data sets of known fraud patterns to train the machines, so that they are able to predict and capture (or block) the same type of fraud patterns. This is commonly known as supervised machine learning.
Typically, a supervised machine learning model “learns” to recognize patterns and make predictions, constantly refining its accuracy by processing and analyzing emerging data. This is done by collecting a colossal amount of data, then labelling the data based on previous incidents of fraudulent behavior. Next, data scientists train the model to recognize and predict the same anomalies in future outcomes. Unlike static and inflexible rules, supervised machine learning is able to keep up with the increasing volume, velocity, complexity and variety of data today. Having machines train themselves automatically based on historical data drastically reduces the effort required to keep fraud detection up to date.
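As a minimal sketch of the label, train and predict loop described above, the example below fits a tiny logistic regression in plain Python. It is not how any particular fraud product works: the two features (a scaled count of recent orders and a billing/shipping mismatch flag), the toy data and the training settings are all assumptions made for illustration.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def fraud_probability(model, x):
    """Score one transaction with a trained (weights, bias) model."""
    weights, bias = model
    return _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical labelled history: [recent orders (scaled 0..1), address mismatch]
history = [[0.1, 0], [0.2, 0], [0.1, 1], [0.3, 0],   # labelled genuine
           [0.9, 1], [0.8, 1], [1.0, 0], [0.7, 1]]   # labelled fraud
labels = [0, 0, 0, 0, 1, 1, 1, 1]

model = train_logistic(history, labels)
print(fraud_probability(model, [0.9, 1]))  # resembles past fraud
print(fraud_probability(model, [0.1, 0]))  # resembles past genuine orders
```

Retraining on newly labelled transactions is what keeps such a model current, in contrast to a rule that someone has to rewrite by hand.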
Most fraudsters do not stop at one place. In fact, if a fraudster is blocked on one platform, he quickly moves on to a different platform to try his luck. When an attack succeeds, he becomes audacious, reusing the winning attack on other websites to maximize his profits. Therefore, some fraud solutions go further and share known fraud behavioral patterns across various merchants and companies.
In the face of new fraud trends
However, if we only train the machines to capture fraud based on historical data, what happens to new fraud attacks that have left no trace in any historical data?
To fill this gap, CashShield incorporates unsupervised machine learning, which allows the system to identify fraud patterns without known (or historical) data. With each incoming transaction, the CashShield system analyzes millions of data points within seconds. A large part of the analysis focuses on the user’s behavior, to identify good behavior as much as bad behavior. Even seemingly negligible data points such as the device battery level, the browser version and whether or not the user has connected to social media are all clues. This helps in identifying whether a fraudster is attempting to trick the system by making micro-changes to the transactions. Real-time pattern recognition analyzes each incoming transaction to identify fraudulent patterns. The analysis can even detect coordinated fraud attacks or new attacks launched by the fraudster.
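To make the idea of spotting patterns without labels more concrete, here is a deliberately simple stand-in. It is not CashShield's method and is far cruder than a real unsupervised model: it only groups transactions that share one attribute and flags groups where several different cards appear within a short time window, even though the device changes each time. The field names (`id`, `card`, `device`, `shipping_address`, `timestamp`) and the thresholds are assumptions.

```python
from collections import defaultdict


def find_linked_clusters(transactions, link_key, min_cards=3, window_seconds=600):
    """Flag groups sharing `link_key` that use many distinct cards in a short window.

    No fraud labels are needed; the signal is the pattern itself.
    """
    by_key = defaultdict(list)
    for tx in transactions:
        by_key[tx[link_key]].append(tx)

    suspicious = {}
    for key, group in by_key.items():
        group.sort(key=lambda tx: tx["timestamp"])
        start = 0
        for end in range(len(group)):
            # Shrink the window until it spans at most `window_seconds`.
            while group[end]["timestamp"] - group[start]["timestamp"] > window_seconds:
                start += 1
            window = group[start:end + 1]
            if len({tx["card"] for tx in window}) >= min_cards:
                suspicious[key] = sorted(tx["id"] for tx in window)
                break
    return suspicious


# Hypothetical stream: A, B and C change device and card but share an address.
stream = [
    {"id": "A", "card": "c1", "device": "iPhone 6S", "shipping_address": "12 Elm St", "timestamp": 0},
    {"id": "B", "card": "c2", "device": "iMac", "shipping_address": "12 Elm St", "timestamp": 60},
    {"id": "C", "card": "c3", "device": "Linux", "shipping_address": "12 Elm St", "timestamp": 200},
    {"id": "D", "card": "c4", "device": "Android", "shipping_address": "9 Oak Ave", "timestamp": 120},
]
print(find_linked_clusters(stream, "shipping_address"))
```

A production system would weigh many such links at once rather than rely on a single shared attribute, but the principle of letting the data reveal the cluster is the same.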
Machine learning can help us speedily trawl through massive data sets and flag potentially fraudulent behavior, but in the end, machine learning systems often just produce a probability score (or what we call the “fraud score”). Even if the system automatically passes or fails up to 95% of transactions, the remaining 5% of borderline transactions would still require manual reviews or human decisions for completion.
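The gap between a score and a decision can be sketched as follows. The two thresholds are hypothetical; in practice each merchant would tune them, and the band between them is exactly where manual review remains.

```python
def route(fraud_score, pass_below=0.2, fail_above=0.8):
    """Turn a fraud score in [0, 1] into a decision (thresholds are assumptions)."""
    if fraud_score < pass_below:
        return "pass"
    if fraud_score > fail_above:
        return "fail"
    return "review"  # borderline: a human still has to decide


def review_share(scores):
    """Fraction of transactions that end up in the manual review queue."""
    return sum(1 for s in scores if route(s) == "review") / len(scores)


scores = [0.05, 0.10, 0.50, 0.95]
print([route(s) for s in scores])
print(review_share(scores))
```

Narrowing the band reduces the review queue, but only by forcing more automatic passes or fails on transactions the score alone cannot settle.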
To achieve full automation, we must move beyond just relying on machine learning for the answers.
Striving to achieve full machine automation
When we first conceptualized CashShield a decade ago, we always imagined it as a fully automated solution. This was to lessen the stress on the operational team, as well as to provide instant delivery for our digital goods merchants.
Through a unique application of high frequency trading (HFT) algorithms, combined with our machine learning models, the core CashShield system is able to make sense of the fraud score and automate decisions to pass or fail a transaction in real-time.
Think about it: accepting a transaction is extremely similar to investing in a stock. Both have a potential return with a risk of default (or loss). Applying financial modelling (such as a return to risk ratio) allows the system to view all the transactions as part of an investment portfolio. Consequently, the algorithm maximizes the merchant’s returns based on an optimal risk level.
Take a look at the chart:
Red denotes high risk transactions, yellow denotes borderline risk transactions and green denotes low risk transactions. Comparing the return to risk ratio, the system can accept a riskier transaction with a great potential return, as long as the higher risk is offset by other low risk transactions within the portfolio. This allows the system to be more aggressive in accepting transactions to maximize revenue as well.
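The portfolio idea can be illustrated with a small sketch. This is not CashShield's algorithm, which the article does not disclose; it is one plausible reading in which each transaction has an expected gain and an expected loss, and a transaction is accepted only if the whole portfolio keeps a minimum return to risk ratio. The field names, the figures and the `min_ratio` target are all assumptions.

```python
def expected_gain(tx):
    # Profit earned only if the transaction turns out to be genuine.
    return (1 - tx["p_fraud"]) * tx["profit"]


def expected_loss(tx):
    # Loss (goods plus chargeback) suffered only if it turns out to be fraud.
    return tx["p_fraud"] * tx["loss"]


def select_portfolio(transactions, min_ratio=2.0):
    """Greedily accept transactions while total gain / total loss >= min_ratio."""
    ranked = sorted(
        transactions,
        key=lambda tx: expected_gain(tx) / max(expected_loss(tx), 1e-9),
        reverse=True,
    )
    accepted, gain, loss = [], 0.0, 0.0
    for tx in ranked:
        new_gain = gain + expected_gain(tx)
        new_loss = loss + expected_loss(tx)
        # Written without division so a zero expected loss is handled safely.
        if new_gain >= min_ratio * new_loss:
            accepted.append(tx["id"])
            gain, loss = new_gain, new_loss
    return accepted


# Hypothetical transactions: two green, one yellow, one red.
portfolio = [
    {"id": "low1", "p_fraud": 0.01, "profit": 10, "loss": 50},
    {"id": "low2", "p_fraud": 0.02, "profit": 10, "loss": 50},
    {"id": "borderline", "p_fraud": 0.10, "profit": 20, "loss": 60},
    {"id": "risky", "p_fraud": 0.30, "profit": 100, "loss": 120},
]
print(select_portfolio(portfolio))      # risky one is accepted alongside the others
print(select_portfolio(portfolio[3:]))  # on its own it falls short of the target
```

In this toy setup the high risk transaction misses the target ratio when judged alone, yet is accepted once the low risk transactions have built up enough margin in the portfolio.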
Humans or machines?
Supporters of manual reviews do not trust machines to be fault-free and 100% perfect. And we agree: comparing fraud risk to financial risk, 0% risk means 0% returns. A business should take some risk, but just enough to maximize its potential.
But let us not forget the other benefits of full machine automation: keeping one step ahead of fraudsters without compromising consumer experience and growth. Without having to dedicate extra manpower and resources to managing fraud, a business can commit them to other areas. With a fully automated fraud system, a business can streamline its operations to handle large volumes of data, scaling aggressively and easily on demand without a huge jump in costs.