At Banks, A Data Science Arms Race

Only in recent years have some banks created data science teams, and those teams are growing rapidly. In this new climate, banks are finding that the tools that make their data scientists happy are the same tools that will help them beat fraud.

In-house data scientists and engineers have become a focal point. For example, 70% of junior hires in the equity trading unit at Royal Bank of Canada Capital Markets are engineers rather than hires with a strictly business background, according to a CNBC article. This past March, OCBC Bank Singapore launched an in-house artificial intelligence unit. The AI unit’s goals include providing ML-driven personalized products and applying AI to services such as loans, financing, and wealth advisory.

Finding qualified, experienced AI experts is not easy given the shortage of AI talent. Some banks are working around that shortage by acqui-hiring data science teams. Earlier this year, TD Bank brought on 17 data scientists by acquiring Layer 6, a Toronto-based AI startup. Capital One acquired the AI startup Notch, adding 16 AI-experienced professionals to its Center for Machine Learning.

Bringing machine learning in

The rising sophistication of fraud calls for rising sophistication in data science. Banks are exploring a variety of in-house data science strategies as they seek to become more aggressive against today’s constantly evolving criminal threats.

The result is that machine learning has gone from “outskirts” to “table stakes.” A recent research study, co-conducted by Tabb Group on behalf of Squirro, found that 83% of the banks surveyed have evaluated AI and ML solutions, and 67% have actively deployed them.

The differentiator today is not having machine learning, but supplying the bank’s fraud-fighting practitioners with the most advanced tools.

Numerous machine learning tools are available, many of them general purpose and built for a variety of use cases. Many are open source software, and numerous commercial ML tools exist as well.

There’s also a proliferation of companies that focus on a single layer of fraud detection, using techniques like biometrics or device fingerprinting. Despite the availability of so many tools, data science teams are likely to find that fraud advances faster than any single solution can be implemented.

Rather than purchasing point solutions, many banks are seeking fraud partners whose underlying machine learning infrastructure is extensible and future-proof, such as an open machine learning platform. These banks treat machine learning vendors as providers of capabilities rather than solutions, because additional use cases and fraud solutions can be wired on as criminals evolve.

Other banks are seeking to create their own machine learning solutions in-house, rather than going to a vendor.

For example, the Development Data Group at the World Bank had a specific problem it wanted to solve with machine learning: measuring a person’s height from a photo in which that person holds a reference object. The photos would be taken during respondent interviews. The team ended up developing an in-house ML solution, which it explains in a detailed blog post.
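The geometry behind a reference-object measurement can be sketched as a simple proportion: the object’s known real size fixes a centimeters-per-pixel scale, which is then applied to the person. This is a minimal illustration, not the World Bank team’s method (their blog post describes that); it assumes the person and the reference object stand at about the same distance from the camera and that pixel lengths have already been measured. The function name and parameters are hypothetical.

```python
def estimate_height_cm(person_px: float, reference_px: float, reference_cm: float) -> float:
    """Estimate a person's height from pixel measurements in a single photo.

    Hypothetical sketch: assumes the person and the reference object lie in
    roughly the same plane facing the camera, so one scale applies to both.
    """
    if person_px <= 0 or reference_px <= 0 or reference_cm <= 0:
        raise ValueError("all measurements must be positive")
    # Scale derived from the object whose real length is known.
    cm_per_px = reference_cm / reference_px
    return person_px * cm_per_px


# A 30 cm reference object spans 120 px; the person spans 680 px.
print(estimate_height_cm(680, 120, 30))  # 170.0
```

In practice, perspective, lens distortion, and posture break the same-plane assumption, which is part of why a learned model can be worth building instead of relying on this ratio alone.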

Building a custom in-house ML solution is difficult for any use case, but it is especially difficult for fraud prevention, as we discuss in our ebook, Misconceptions About Building a Machine Learning Platform for Risk.

A clamoring for great data science tools

At the leading edge of today’s data science is OpenML, which lets data scientists combine the best of both worlds: all of their preferred existing methods plus the fraud-specific innovations of a third party.

Many AI-enabled fraud detection platforms require data scientists to work in a single data science environment dictated by the vendor, which limits their ability to innovate and experiment with the most current approaches.

Feedzai’s OpenML engine embraces the data science community by allowing in-house teams to “bring their own machine learning” to our comprehensive platform. OpenML is based on a microservices architecture. It includes an SDK for Python, R, and Java. It integrates closely with many commonly used data science and machine learning tools, such as H2O, RStudio, and DataRobot. And it lets teams leverage pre-written machine learning libraries from any open source project, such as Spark’s MLlib, scikit-learn, and TensorFlow. Future external libraries and scoring frameworks can also be leveraged as they become available.
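The “bring your own machine learning” idea can be illustrated with a generic plug-in pattern: the platform defines a small scoring contract, and any model that satisfies it can be registered, whichever library trained it. This sketch is hypothetical; `ScoringModel`, `ModelRegistry`, `AmountThresholdModel`, and their methods are invented for illustration and are not Feedzai’s OpenML SDK API.

```python
from typing import Dict, Mapping, Protocol


class ScoringModel(Protocol):
    """Hypothetical contract a platform could require of any plugged-in model."""

    def score(self, transaction: Mapping[str, float]) -> float:
        """Return a fraud risk score between 0.0 and 1.0."""
        ...


class ModelRegistry:
    """Hypothetical registry that treats every registered model interchangeably."""

    def __init__(self) -> None:
        self._models: Dict[str, ScoringModel] = {}

    def register(self, name: str, model: ScoringModel) -> None:
        self._models[name] = model

    def score(self, name: str, transaction: Mapping[str, float]) -> float:
        risk = self._models[name].score(transaction)
        # The platform validates the contract, not the model's internals.
        if not 0.0 <= risk <= 1.0:
            raise ValueError(f"model {name!r} returned out-of-range score {risk}")
        return risk


class AmountThresholdModel:
    """Toy in-house model that flags large amounts; it stands in for a model
    trained with any external library."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def score(self, transaction: Mapping[str, float]) -> float:
        return 1.0 if transaction["amount"] > self.threshold else 0.0


registry = ModelRegistry()
registry.register("amount-rule", AmountThresholdModel(threshold=5000.0))
print(registry.score("amount-rule", {"amount": 7200.0}))  # 1.0
print(registry.score("amount-rule", {"amount": 120.0}))   # 0.0
```

Because the surrounding pipeline depends only on the `score` contract, a team could swap in a model built with a different library without changing the rest of the flow, which is the extensibility argument the banks above are making.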


RELATED:
Meet Feedzai’s OpenML Engine: “Bring Your Own Machine Learning” to Fight Fraud
OpenML Demo: Watch Us Train a Model Outside Feedzai, and Bring it On In
Misconceptions About Building a Machine Learning Platform for Risk