Are machine learning models cracking under the sudden behavioral changes caused by the coronavirus pandemic? It’s a question posed by a recent MIT Technology Review article, among other publications.

While MIT’s observation holds true for many models, this hasn’t been the case at Feedzai. We’ll show you why in this post.

To understand how our machine learning models are faring in a post-pandemic world, it’s important to realize that how a model is designed and built can determine its ability to respond to system shocks. At Feedzai, we follow best practices to build robust machine learning models.

Characteristics of robust machine learning models

  • Rely on individual behavior. The type of features we build rely on individual behavior, not generalized cohorts. We teach the model how to characterize fraud vs. non-fraud on each instance of every behavior.
  • Avoid overfitting. We pay attention to model degradation over time. We’re also obsessed with running simulations and production scenarios months into the future, which means we avoid overfitting. This ensures the model is generalized enough to handle changes in behavior and unseen patterns.
  • Include peak volumes. Our datasets include peak volumes (e.g., Black Friday, promotions), so even the stockpiling of toilet paper or hand sanitizer reads as “normal” to the model.
  • Use historical and real-time data. We build and train models using several months of historical data. Once we deploy the models, we update their features and profiles with real-time information.
  • Profile entity risk dynamically. We avoid hard-coding data entities, like risky merchant categories, emails, IPs, payment types, ATMs, locations, etc. Instead, we compute those items dynamically and in real time, which is useful in detecting shifts in fraud strategies. It’s also key to adjusting to new macro-trends in consumer behavior.
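To make the last point concrete, here is a minimal sketch of dynamic entity risk scoring. It is our own illustrative assumption, not Feedzai’s implementation: it keeps a rolling fraud rate per entity and shrinks it toward a global prior when evidence is thin, so risk is computed from the data stream rather than hard-coded.

```python
from collections import defaultdict

class DynamicEntityRisk:
    """Track per-entity fraud rates from a stream of labeled events,
    instead of hard-coding a static list of risky merchant categories."""

    def __init__(self, prior_fraud_rate=0.001, prior_weight=100):
        # Smoothed (Bayesian-style) prior: unseen entities start near the base rate.
        self.prior_fraud_rate = prior_fraud_rate
        self.prior_weight = prior_weight
        self.counts = defaultdict(lambda: [0, 0])  # entity -> [fraud, total]

    def update(self, entity, is_fraud):
        fraud, total = self.counts[entity]
        self.counts[entity] = [fraud + int(is_fraud), total + 1]

    def risk(self, entity):
        fraud, total = self.counts[entity]
        # Shrink toward the prior when we have little evidence for this entity.
        return (fraud + self.prior_fraud_rate * self.prior_weight) / (total + self.prior_weight)

risk = DynamicEntityRisk()
for _ in range(50):
    risk.update("mcc:7995", True)   # hypothetical risky merchant category seeing fraud
    risk.update("mcc:5411", False)  # hypothetical grocery category seeing legit traffic
print(risk.risk("mcc:7995") > risk.risk("mcc:5411"))  # True
```

Because the score is recomputed as events arrive, a merchant category that suddenly attracts fraud rises in risk without anyone editing a list.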

Credit card use case

To truly understand how robust machine learning models work, let’s examine a credit card use case.

Start with Raw Data

We feed the model granular data from thousands of cards based on information from the payment gateway. This includes each card’s behavior throughout the year — where it’s used, what time of day it’s used, what the transaction amounts are, what types of vendors it frequents, etc. It’s an endless list of options, and we do this for every card.

Sample appropriately

We process billions of transactions. While we could technically use the data from all of those transactions, the expense related to machine time and cluster usage makes that an inefficient option. Instead, most big data machine learning applications use a technique called data sampling. Data sampling, much like its name implies, uses a sample subset of data points to identify patterns and trends in the larger data set. But sampling data is not without its challenges. You must do it in a way that maintains the integrity of the underlying data while allowing for the computation of the right features.

At Feedzai, we implement smart sampling techniques that satisfy these constraints and give us a 3% to 6% lift in fraud detection performance when compared with traditional sampling approaches.
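One common way to sample while preserving the integrity of the data (this sketch is our assumption, not Feedzai’s proprietary technique) is to keep every rare fraud case, downsample the abundant legitimate transactions, and carry a weight so aggregate statistics stay unbiased:

```python
import random

def stratified_sample(transactions, legit_rate=0.05, seed=7):
    """Keep every fraud case (they are rare and precious) and downsample
    legitimate transactions, recording a weight so statistics stay unbiased."""
    rng = random.Random(seed)
    sample = []
    for tx in transactions:
        if tx["is_fraud"]:
            sample.append({**tx, "weight": 1.0})
        elif rng.random() < legit_rate:
            # Each kept legit row stands in for 1/legit_rate original rows.
            sample.append({**tx, "weight": 1.0 / legit_rate})
    return sample

# Hypothetical stream: 1 fraud in every 500 transactions.
txs = [{"id": i, "is_fraud": i % 500 == 0} for i in range(10_000)]
sample = stratified_sample(txs)
print(sum(t["is_fraud"] for t in sample))  # all 20 fraud rows survive
```

The weight column is what lets downstream features (averages, rates) be computed as if the full dataset were still present.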

Understand behavior profiling at the level of the payment

To detect fraud, we must detect changes in customer behavior. Fortunately, criminal transactions look quite different from legitimate ones. Criminals typically want to spend money quickly before customers or organizations report the card lost or stolen. This usually results in a short-term spike in spending.

Measuring behavior allows us to compute a specific card’s six-week or eight-week average spending and compare it with the average spending from the previous week, or even a single day. We can dig deeper still, computing more complex statistics, such as the probability of a specific person spending a certain amount of money during a particular time frame. In this way, we can differentiate between criminal and authentic behavior.
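A minimal version of such a behavior feature might compare a card’s recent average daily spend to its six-week baseline. The window lengths and data shapes below are illustrative assumptions:

```python
from datetime import datetime, timedelta

def spending_ratio(transactions, now, short_days=7, long_days=42):
    """Compare recent average daily spend against a longer baseline.
    A ratio well above 1.0 flags a short-term spike in spending."""
    short_cutoff = now - timedelta(days=short_days)
    long_cutoff = now - timedelta(days=long_days)
    short_total = sum(amount for ts, amount in transactions if ts >= short_cutoff)
    long_total = sum(amount for ts, amount in transactions if ts >= long_cutoff)
    short_avg = short_total / short_days
    long_avg = long_total / long_days
    return short_avg / long_avg if long_avg else 0.0

now = datetime(2020, 6, 1)
history = [(now - timedelta(days=d), 100.0) for d in range(1, 43)]   # steady $100/day
history += [(now - timedelta(hours=h), 500.0) for h in range(1, 4)]  # sudden burst
print(spending_ratio(history, now) > 2.0)  # the burst pushes the ratio well above 1
```

In production such features would be maintained incrementally in real-time profiles rather than recomputed from raw history on every transaction.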

Assign labels to every transaction

Every single transaction has a fraud/non-fraud label associated with it. Labels allow the model to develop an intimate understanding of fraud and its early indicators. In that way, when the card has a change in behavior (e.g., increase in spending), the model won’t necessarily raise an alert. It knows that particular card because of our previous data inputs, which we used to train the model. The model understands what fraud looks like in the context of every specific behavior.
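In its simplest form, labeling amounts to joining transactions against confirmed fraud reports such as chargebacks. The field names below are hypothetical:

```python
def label_transactions(transactions, confirmed_fraud_ids):
    """Attach a fraud / non-fraud label to every transaction so the model
    can learn what fraud looks like in the context of each behavior."""
    fraud_ids = set(confirmed_fraud_ids)
    return [{**tx, "label": int(tx["id"] in fraud_ids)} for tx in transactions]

txs = [{"id": 1, "amount": 40.0}, {"id": 2, "amount": 950.0}, {"id": 3, "amount": 12.5}]
labeled = label_transactions(txs, confirmed_fraud_ids=[2])
print([t["label"] for t in labeled])  # [0, 1, 0]
```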

The model also learns base risk factors. These are risk behaviors independent of individual consumer behavior because they are common fraud vectors, such as high-velocity spending, transactions in risky merchant types (e.g., gambling websites), or late-night ATM withdrawals.

It’s the combination of individual behavior and base risk factors that allow the model to build a precise assessment of the true risk of that particular transaction. It takes everything into account along with the context in which the transaction occurs, looking at spending habits, recent previous activity, merchant of transaction (sometimes even the item), and even the hour in which the transaction occurs. As you can see, the machine learning model doesn’t just consider one factor. Rather, the model looks at hundreds of elements simultaneously to determine if a transaction is suspicious or not.

Putting it all together: an example of robust machine learning for fraud detection

For example, Allan and Bob are both thirty-five-year-old single men living in Oakland, California, USA. Allan typically spends $1,000/week, and Bob typically spends $500/week. The model does not trigger an alert when Allan spends his $1,000. However, when Bob spends $1,000, the model triggers multiple alerts. Still, merely spending an additional $500 a week is not the definitive factor (maybe his dishwasher broke down). It is the combination of multiple factors that raised the alarm. These could include:

  • Higher spending (a change in Bob’s normal behavior)
  • Conducting multiple quick transactions of similar amounts (a common fraud indicator)
  • Spending at a merchant known for high fraud rates (a higher likelihood that the transaction is fraudulent)
  • Shopping during hours when Bob is usually sleeping (a change in Bob’s normal behavior)
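A weighted combination of signals like these can be sketched with a toy logistic score. The feature names and weights below are illustrative assumptions, not Feedzai’s actual model:

```python
import math

def transaction_risk(features, weights):
    """Combine many signals at once: no single factor decides;
    the score aggregates individual-behavior and base-risk features."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic squash into [0, 1]

# Illustrative weights a trained model might learn (assumed values).
weights = {
    "spend_vs_baseline": 1.5,  # individual behavior: deviation from normal spend
    "velocity": 1.2,           # base risk: rapid repeat transactions
    "merchant_risk": 2.0,      # base risk: risky merchant category
    "odd_hour": 0.8,           # base risk: activity while the cardholder sleeps
    "bias": -4.0,              # most transactions are legitimate
}

normal = {"spend_vs_baseline": 0.1, "velocity": 0.0,
          "merchant_risk": 0.1, "odd_hour": 0.0, "bias": 1.0}
suspect = {"spend_vs_baseline": 1.0, "velocity": 1.0,
           "merchant_risk": 0.9, "odd_hour": 1.0, "bias": 1.0}
print(transaction_risk(normal, weights) < transaction_risk(suspect, weights))  # True
```

Like Bob’s case above, no single feature in `suspect` is damning on its own; it is their combination that pushes the score over the line.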

How Feedzai’s machine learning models behaved at the start of the crisis

As we previously shared in Coronavirus Economy: How to Fight Fraud & Friction to Survive, generally speaking, the start of the crisis resulted in a steep drop in the number of transactions across all sectors except groceries, beer, wine, and liquor.
We predicted our models wouldn’t have any issues with the drastic drop in the number of transactions, and that’s exactly what happened; we didn’t have to adjust our models. There are a few reasons for this.

First, the transactional behavior exhibited during lockdown wasn’t unknown to the model; it was a lower volume of already learned behavior. The same is true for increases in transactions for certain categories, such as groceries. While people might be buying more groceries than they typically do, the model had already learned grocery merchant codes, time of day, frequency, and so on. Further, other people have always transacted this way (making large grocery purchases), which exemplifies non-fraud behavior. This means the model has already learned about this type of transactional behavior. It’s not unlike how the model understands spikes caused by Black Friday or Cyber Monday. The model has already learned that type of behavior and takes it into account as just one of the factors it considers. If anything, the model sees less activity as less risky, since it runs counter to how fraud typically unfolds.

Secondly, our models are biased toward authentic transactions, which results in a bias toward low fraud scores. This bias exists because the vast majority of transactions are non-fraudulent. Even if the model were to see something entirely new, such as a new merchant category code (MCC), chances are that it would provide a low fraud risk score. That’s why we’re not worried about triggering a slew of false positives. If anything, we’d be more concerned that we might miss fraud. But this isn’t an issue because fraud has distinctive markers, such as behavior changes, risky MCCs, sudden spikes in spending, high velocity, and geolocation inconsistencies, all of which the model is sensitive to.

How Feedzai’s machine learning models are faring as countries come out of lockdown

Almost three months into the crisis, the drastic changes seen at the start of the global lockdowns aren’t a factor as governments use a phased approach toward recovery. We can see the gradual easing of restrictions in the graph below.

[Graph: transactions dropped during the coronavirus lockdowns in March and April, then slowly increased in May]

A gradual increase in transaction velocity is an ideal situation for our machine learning models because the model can easily adapt to gradual changes in customers’ transactional behavior.

Think of it like this. Say our friend Maria becomes vegan. Maria didn’t go from eating steak for dinner on Sunday night to tofu on Monday night. Instead, she probably decided to give up red meat. Then a little later, Maria became a pescatarian. Later still, she gave up fish, and even after that, she gave up dairy. Her journey to veganism was gradual. And this is reflective of human changes in general. Data scientists build models to understand that type of incremental change.

Fraud stands in stark contrast to this natural, gradual behavior. Fraud is fast and drastic. Fraudsters know that they must quickly use stolen cards or credentials before the account holder checks their wallets or accounts. And so, machine learning models look for drastic behavior changes.

Here at Feedzai, we haven’t had to change thresholds; the models are handling the changes fairly easily. We can see from the charts below that even though there is a change in volume, alert rates have stayed flat, and fraud detection has remained unchanged. That’s what we want to see. These graphs illustrate that our models are unaffected by the change in the rate of transactions.

[Graph: transaction volumes starting to increase while fraudulent transactions remain flat]

First, we can see that while overall transactions dropped, fraud remained at the same levels, as many fraud operations were already run remotely. This reinforces the need to keep all fraud systems at top performance or risk incurring severe fraud losses.

Second, we can see that our models kept behaving as expected, maintaining a very stable number of alerts, while also keeping the same levels of fraud detection.

[Graph: alerted transactions remained flat vs. fraud detection]

If economies went from lockdown to wide open overnight, that would be more challenging. But from a machine learning perspective, the gradual reopening creates the perfect environment to continue catching fraud in a business-as-usual manner.

Key Takeaways

At the start of the pandemic, we saw a drastic decrease in the number of transactions. Because Feedzai builds models with robust characteristics (relying on individual behavior, avoiding overfitting, including peak volumes, and using historical and real-time data to update risk profiles dynamically), our models held up well to the changes, and we haven’t had to adjust them. As global economies slowly come out of lockdown, we see a gradual increase in transactions. From a machine learning model point of view, this is the perfect scenario. The models can easily adjust to these incremental changes. A drastic uptick in transactions could affect the models, but even so, that impact would be automatically contained and quickly mitigated, all thanks to our built-in model robustness.