Life as a data scientist building tech products
This blog post is the second part of the “3 Shades of Data Science” Tech Tuesday blog series.
Working in Product
Imagine you’re walking across a rope bridge that’s suspended one hundred feet over perilous rapids. With each step you take, the bridge sways, creaks, and shakes. But you keep going, hoping to make it to the other side, which isn’t exactly a picnic either. It’s a thick jungle whose depths are concealed by the vastness of foliage. Get the picture?
That’s what building a product that targets a particular domain (like fraud), with very demanding end-users (like data scientists!), can feel like.
Ideally, a product company’s goals should be somewhere along the lines of:
- Build great stuff.
- Acquire numerous customers.
- Become successful.
But more often than not, this is only a dream. The reality is decidedly different. It looks more like this:
- Build great stuff while you’re onboarding new customers.
- Build faster, better! The market is changing, and new competitors with innovative ideas pop up overnight.
- Manage differing customer requests due to cultural or geographical differences.
- Troubleshoot use-case differences that might seem subtle at first glance, but can impact your priorities as you move forward.
So how do you adapt to this ever-changing world while you build your product?
Let’s talk about vision. As perfectly illustrated by the famous vision statements of global companies such as Uber, the purpose of a company is definitely not about the technology.
“Uber was founded with a simple vision: tap a button, get a ride. We’ve since evolved into a transportation network that helps you get from point A to point B…”
Any references to technology? Nope.
Global, uber-successful companies indeed state a clear purpose without mentioning the tech needed to realize their vision. But to become successful, they eventually have to cross the rope bridge, AKA build stuff.
Now that we’ve said that, we have reached the point in this blog post where I’m going to tell you something you’ve never read before: there’s a lot of data out there. Yes, shocking, I know. But seriously, the value of data can’t be overstated. As Wired uncovered, data is worth billions. As a consequence, new tech tends to be heavily based on data. More recently, it’s also based on artificial intelligence (AI) components.
Enter the data scientists
Data scientists’ responsibilities are as diverse as everything data-related. For instance, data scientists drive business decisions, maintain awareness of state-of-the-art algorithms, and keep up with the latest tech out there. But that’s not all. They also:
- produce customer-facing work;
- perform super-charged A/B testing;
- digest terabytes of data; and
- monitor all of the above.
But what if the product is a data science platform?
Then it’s also the responsibility of the data scientists to help choose which features should be implemented, review code with engineering, and build product features that are more data science-centric.
My current team’s vision is to detect more fraud by combining data from multiple customers and vendors. If you need a more tangible metaphor, it’s this: you can try 100 restaurants to find the bad ones, or you can use an aggregator, where many people rate everything.
After both successful and failed experiences, we have amassed a few insights as to the best approach for these data science problems.
Here are some of the techniques and principles our data team applies.
Focus on the big picture
- Avoid going down the rabbit hole. Practical example: Start coding right away… NOT! Doing design sprints (or the equivalent) during the early stages is extremely helpful.
- Perfect is the enemy of good. Practical example: A particular project was good enough to ship to engineering, but then the DS team worried about improving memory usage, which led to them testing a multitude of scenarios. In the end, testing dragged on without meaningful improvements. It would have been more efficient and timely to deliver the project, and then monitor it to see if we could provide incremental improvements.
Map the “small picture” vs. the “big picture” view
- Which algorithm should we use?
- Will customers want this?
- Will this work?
- How can we ensure this will meet privacy and security expectations?
- What data do we need?
- Whom do we partner with?
- How do we market and sell this?
I’ll admit it’s not exactly motivating to end up with more questions than we started with. But that is precisely what makes things easier moving forward. As general advice, keep your experiments as “atomic” as possible. Your work (literally) needs to integrate with Engineering. Also, product managers need to understand your work to integrate it (figuratively) into sprints, roadmaps, communications with stakeholders, and such.
Context might be one of the toughest challenges. You might be wondering why this data science blog post seems to be turning into a UX blog post. The truth is that data scientists’ work will affect the user experience.
Here are a few examples of how context impacts the way we design experiments in our B2B scenario:
- How will the model score be used? How will data scientists use it? Fraud analysts? What about Operations? All of the above? And don’t forget how it’s going to affect the end-user (for example, someone whose card registration is blocked in a mobile app) vs. our direct customer (the business) and its goals.
- Different industries. When adding this new feature to the product, will it work the same for banks, acquirers, issuers, and networks? Probably not. So let’s focus on areas that we predict may be more highly impacted.
- Different use cases. Similarly, when trying to predict risk, can we generalize the feature engineering pipeline to solve account opening, transaction monitoring, and anti-money laundering at the same time?
- Different geographies. Classic issue. Can this be applied to data written right-to-left? And what happens when we have Cyrillic mixed with other characters? Will the scheduled computation of aggregations work for different time zones?
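To make the geography questions above a little more concrete, here is a minimal Python sketch of the kind of quick checks one could run before trusting a feature across regions. It is only an illustration under stated assumptions: the helper names (scripts_in, has_rtl, daily_counts) and the sample values are hypothetical, and guessing a character's script from its Unicode name is a rough approximation rather than a proper script detector.

```python
import unicodedata
from collections import Counter
from datetime import datetime, timedelta, timezone


def scripts_in(text):
    """Return a rough set of scripts used by the letters in `text`.

    Assumption: the first word of a character's Unicode name
    (e.g. "LATIN", "CYRILLIC", "ARABIC") approximates its script.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def has_rtl(text):
    """True if any character has a right-to-left bidirectional class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def daily_counts(timestamps_utc, tz):
    """Count events per local calendar day for the given tzinfo object."""
    return Counter(ts.astimezone(tz).date().isoformat() for ts in timestamps_utc)


# Hypothetical sample: two events half an hour either side of midnight UTC.
events = [
    datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc),
]
utc_days = daily_counts(events, timezone.utc)
# A fixed UTC-5 offset stands in for a customer in another region.
local_days = daily_counts(events, timezone(timedelta(hours=-5)))
```

With this sample, the two events fall on different days in UTC but on the same day at UTC-5, which is exactly the kind of shift that can silently change a daily aggregation. In practice you would likely pass a real IANA zone (for instance via the standard library's zoneinfo module) rather than a fixed offset, so that daylight saving changes are handled too.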
Challenge your assumptions
Feedzai data scientists probably use “challenge your assumptions” as their number one mantra because it covers the whole end-to-end data science workflow. Plus, it isn’t exclusive to working in Product. Here are a few learnings based on real-life situations:
- Carrying out performance tests with a data pipeline is highly complex, especially when you can’t transfer data from secure environments, and anonymizing the data would break essential relationships within it.
- Conduct code reviews and results reviews. Under pressure, people sometimes skip over results that look fine. Reviewing them not only prevents underwhelming surprises in production, but it also helps you focus on what’s valuable from the experiment.
- Ask questions. You will gain tremendous insights from talking with other people within your organization. Even asking how they view the work you do and how it affects them can be enlightening. In B2B, you can also ask questions of customers whom you have strong relationships with.
I have saved one obvious but critical piece of advice for the end: learn from mistakes. Not just yours, but everybody’s mistakes.
In the end, the challenge for product data scientists is to focus on using data to bring the Product team’s vision to life and to learn via continuous experimentation.