Feedzai and Google DeepMind Scientists Use Deep Learning to Extend Research in Natural Language Processing


Feedzai data scientist Luís Marujo recently co-authored a new paper with Wang Ling from Google DeepMind, Chris Dyer and Alan W. Black from Carnegie Mellon University, and Isabel Trancoso from Instituto Superior Técnico. The paper has been well received and is published by MIT Press Journals (http://www.mitpressjournals.org/doi/pdfplus/10.1162/COLI_a_00249).

In this blog post, we interview Luís Marujo about his research and get his perspective on the fast-emerging branch of machine learning called ‘deep learning’.

Q. What do you do at Feedzai?

I work as a data scientist in the Feedzai Research team, where my overarching goal is to conduct applied scientific research. The major difference between this kind of research and traditional scientific research is the objective. Applied research is more pragmatic because it focuses on developing cutting-edge technology to improve products. This is not necessarily the case in traditional scientific research, where the goal is to generate new knowledge and share it through publicly available publications.

An example of applied research is investigating ways to improve the Feedzai machine learning models used to detect fraud in financial transactions. In my recent work, I pursued this objective by incorporating the most recent cost-sensitive methods into Feedzai Random Forest models so that, as far as possible, our models perform at their best.
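To illustrate what "cost-sensitive" means in general, here is a minimal sketch of a cost-sensitive decision rule. It is not Feedzai's implementation, and the interview does not say which methods were used; the function names and cost values are hypothetical. The idea is that a missed fraud usually costs far more than a wrongly blocked transaction, so the decision threshold should reflect those costs instead of defaulting to 0.5.

```python
def cost_sensitive_threshold(cost_false_positive: float,
                             cost_false_negative: float) -> float:
    """Fraud probability above which blocking has the lower expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def flag_as_fraud(fraud_probability: float,
                  cost_false_positive: float,
                  cost_false_negative: float) -> bool:
    """Pick the action (allow or block) with the lower expected cost."""
    # If we allow the transaction, we pay only when it turns out to be fraud.
    expected_cost_if_allowed = fraud_probability * cost_false_negative
    # If we block it, we pay only when it turns out to be legitimate.
    expected_cost_if_blocked = (1.0 - fraud_probability) * cost_false_positive
    return expected_cost_if_blocked < expected_cost_if_allowed


# Hypothetical costs: blocking a legitimate payment costs 5,
# missing a fraudulent one costs 95.
print(cost_sensitive_threshold(5.0, 95.0))
print(flag_as_fraud(0.10, 5.0, 95.0))  # 10% fraud risk is already worth blocking
print(flag_as_fraud(0.02, 5.0, 95.0))  # 2% fraud risk is not
```

With these assumed costs, a transaction is flagged once its estimated fraud probability exceeds 5 percent, which is much lower than the 50 percent cut-off a cost-blind classifier would use.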


Q. Can you give us some background on your research and whether it is relevant to fraud detection?

Before joining Feedzai, I was a dual-degree Ph.D. student in Language Technologies at Carnegie Mellon University (CMU) and the Instituto Superior Técnico (IST). During my final year at CMU, I attended ICML 2015, one of the top machine learning conferences, where it was said that “NLP (Natural Language Processing) is like a rabbit in the headlights of the deep learning machine, waiting to be flattened” (https://mobile.twitter.com/sgouws/status/619880524329414656). Since then I have attended NAACL, ACL, and EMNLP, some of the most important conferences in NLP, and I have observed the almost overwhelming influence of Deep Learning on the field. I have also contributed to this trend with one of my publications at ACL, in which I used word embeddings to perform automatic keyword extraction from Twitter data. I foresee that this trend will continue, since Deep Learning has become a very hot topic.

While this paper does not include Deep Learning technology, it deals with parallel corpus gathering, which was later used by a new Deep Learning translation architecture. A parallel corpus is a collection of texts in one language together with their translations into one or more other languages. In most cases, parallel corpora contain data from only two languages. In this situation, a parallel corpus vaguely resembles labeled financial transactions for fraud detection: the sentence in one language plays the role of the transaction information, and the same sentence in the other language plays the role of the label. To clarify the comparison a little more: sentences and financial transactions are both represented by a set of features (e.g., words, word vectors), and a label is a binary value or a feature. We can then generalize and view machine translation as the problem of finding a function that maps one set of features into another set of features. Similarly, fraud detection tries to find a function that maps features into a single feature, the label (legit or fraud).
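The analogy can be written down as two function signatures. This is only a toy sketch: the type aliases, the threshold rule, and the two-word lexicon are hypothetical stand-ins for trained models, and none of it comes from the paper.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]

# Fraud detection: a set of features is mapped to one label.
FraudDetector = Callable[[Features], str]
# Machine translation: a set of features is mapped to another set of features.
Translator = Callable[[List[str]], List[str]]


def toy_fraud_detector(transaction: Features) -> str:
    # Hypothetical rule standing in for a trained classifier.
    return "fraud" if transaction["amount"] > 1000.0 else "legit"


# Hypothetical two-entry Portuguese-to-English lexicon.
TOY_LEXICON = {"bom": "good", "dia": "day"}


def toy_translator(source_words: List[str]) -> List[str]:
    # Word-by-word lookup standing in for a trained translation model.
    return [TOY_LEXICON.get(word, word) for word in source_words]


print(toy_fraud_detector({"amount": 2500.0}))  # one label out
print(toy_translator(["bom", "dia"]))          # a sequence of features out
```

In both cases the learning problem is to find the function from examples: labeled transactions for the detector, sentence pairs from a parallel corpus for the translator.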

This paper was also part of the Ph.D. work of its first author, my colleague and friend Wang Ling. In fact, the work was so interesting that Wang and one of his advisors, Chris Dyer, were invited to present it and later to join the Google DeepMind team. This team is now quite famous for having developed a system capable of beating Go champions using Deep Learning techniques!


Q. Why are we hearing so much about Deep Learning now?

Deep Learning is fulfilling some of the promises made in the 1980s about neural networks becoming the state-of-the-art machine learning solution. At that time, there was simply not enough computational power to fulfill that objective. Only recent advances in hardware, such as very fast and cheap graphics cards and memory, made the necessary computational power available. This computational power, combined with big data and with efficient, flexible algorithms based on multiple layers of abstraction, explains why deep learning produces highly accurate results in many fields, including Speech Recognition, Computer Vision, and Machine Translation. These results surpassed the previous state-of-the-art solutions, drawing the attention of the research communities and top software companies.

Another big advantage of deep learning methods is that they have an architecture based on hidden layers. These layers can extract higher-level features directly from the data without manual feature engineering. Removing feature engineering from the machine learning pipeline is very important because it is a manual, time-consuming task. Unfortunately, in many cases some manual feature engineering is still necessary to achieve the best results.
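A tiny example of a hidden layer acting as a feature extractor is sketched below. The weights are hand-picked purely for illustration (in a real network they would be learned from data), and the function names are hypothetical. The hidden layer turns the two raw inputs into two intermediate features, "at least one input is on" and "both inputs are on", and the output unit combines them to compute XOR, something a single unit applied to the raw inputs cannot do.

```python
def step(x: float) -> int:
    """Threshold activation: the unit fires (1) when its input is positive."""
    return 1 if x > 0 else 0


def hidden_layer(a: int, b: int) -> tuple:
    """Two hidden units that build higher-level features from raw inputs."""
    either = step(a + b - 0.5)  # fires when at least one input is 1 (OR)
    both = step(a + b - 1.5)    # fires only when both inputs are 1 (AND)
    return either, both


def network(a: int, b: int) -> int:
    """Output unit: 'either but not both', i.e. XOR of the raw inputs."""
    either, both = hidden_layer(a, b)
    return step(either - both - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", network(a, b))
```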

Deep learning is not applicable to all problems because it requires large datasets (compared with other methods, such as Random Forests, applied to the same problem) and some kind of spatial locality. For example, speech has phonemes that compose words, text has words that are part of phrases and sentences, and images have pixels that combine with nearby pixels to describe an object. Applying transformations to these pixels, words, and phonemes can represent more general concepts. This does not necessarily hold for problems where the order of the features is not important. For example, to predict the price of a car, one may have the car brand, number of wheels, and county, among other features. Combining these features does not necessarily make it possible to extract more general information, such as whether the car is a sports car or a van.
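The role of locality can be seen with a small sliding filter, the basic operation behind convolutional layers. This is a minimal sketch with a hypothetical function name and toy data: the filter looks only at neighboring positions, so it finds meaningful patterns when neighbors are related, and its output changes completely when the same values are put in a different order.

```python
def slide_filter(signal, kernel):
    """Apply the kernel to every window of neighboring values in the signal."""
    width = len(kernel)
    return [
        sum(signal[start + offset] * kernel[offset] for offset in range(width))
        for start in range(len(signal) - width + 1)
    ]


# Responds with +1 where the value rises and -1 where it falls.
edge_kernel = [-1, 1]

ordered = [0, 0, 1, 1, 0]    # e.g. pixel intensities along a row
reordered = [1, 0, 0, 1, 0]  # the same values in a different order

print(slide_filter(ordered, edge_kernel))    # [0, 1, 0, -1]
print(slide_filter(reordered, edge_kernel))  # [-1, 0, 1, -1]
```

For the ordered signal, the filter marks where the bright region starts and ends. For tabular features such as car brand and number of wheels, the column order is arbitrary, so there is no neighborhood structure for such a filter to exploit.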


Q. Why did you choose to come to work for Feedzai?

Several factors made me come to work for Feedzai. Some of them were serendipitous: when I was returning home from my Ph.D. thesis defense at CMU (in Pittsburgh, PA, USA), I caught the same flight from Newark to Lisbon as Sérgio, a former assistant professor at IST who taught me Software Engineering and who is now a Senior Software Engineer at Feedzai.

I also met Paulo Marques (Feedzai’s CTO) multiple times during CMU|Portugal events and we talked about event detection, which is an imbalanced multi-class classification problem. This means that there were some points of contact between my Ph.D. work and fraud detection, which is an imbalanced binary classification problem.

Lastly, the company’s growth has been running at 300 percent year-on-year. This fact indicates amazing career opportunities and at the same time reveals the high quality of our human resources. On top of this, I am able to work on Machine Learning problems from my favorite city in the world: Lisbon.
