Data and Algorithm Acquisition Can Improve Machine Learning

Machine learning consists of three structural elements that are core to its function: the data received for analysis, the algorithms that turn that information into structural adaptations, and the visualized results produced when the first passes through the second. By training a machine to pick up patterns, businesses can identify anomalies or trends that inform decisions, such as determining whether a transaction discrepancy is a slight deviation in a customer's behavior or a sign of fraud. While building out data sources and creating the models they feed is often necessary, sometimes purchasing them can be just as effective, especially if you gain both in the acquisition.
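To make those three elements concrete, here is a minimal sketch of the fraud scenario above. The purchase history, the z-score rule and the threshold are all invented for illustration; a real fraud system would use far richer features and models.

```python
import statistics

# The "data": hypothetical daily purchase amounts for one customer.
history = [42.0, 38.5, 45.0, 40.0, 44.0, 39.0, 41.5, 43.0]

def is_anomalous(amount, history, threshold=3.0):
    """The "algorithm": flag a transaction whose z-score against the
    customer's own purchase history exceeds the threshold."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(amount - mean) / stdev > threshold

# The "result": a slight deviation passes, a wild outlier gets flagged.
print(is_anomalous(47.0, history))   # small deviation from the customer
print(is_anomalous(900.0, history))  # a possible sign of fraud
```

The point is not the particular rule but the pipeline: raw transactions go in, a model transforms them, and a yes/no signal comes out the other side for a person to act on.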

IBM’s acquisition is about more than the weather

In late October of this year, IBM acquired the digital and technology assets of the Weather Company, which runs the Weather Channel and the website Weather Underground. The purchase is part of a plan to expand the capabilities of its machine learning platform, Watson, to include weather prediction and analysis. According to business blog Quartz, the deal included not only Weather Underground but also extensive weather data drawn from 3 billion weather forecast reference points, more than 40 million smartphones and 50,000 daily airplane flights. While this is an incredible amount of data, what many sources left unmentioned is that the acquisition likely included the company’s weather prediction models, which it spent decades crafting and honing.

This is not to say that IBM doesn’t know how to build forecast models on its own. In fact, it helped produce several of them over the last two decades in a collaborative effort with the National Oceanic and Atmospheric Administration, the federal government’s weather agency. However, as the acquisition indicates, the company wants to expand its capabilities in that field without creating sophisticated new models from scratch. After all, the Weather Company likely already linked its data sources directly to its models, so the purchase makes sense from an economic standpoint.

Taking the best of both worlds

This acquisition reopens a question that constantly looms in the background of the analytics industry: Is it better to have more data or better algorithms? The answers vary. Some, like Garret Wu of Data Informed, say a combination of more data with simple algorithms is usually enough, because the former can eventually inform the latter when given enough room to do so. Others, like machine learning scholar Xavier Amatriain, argue that if a model is too simple, merely dumping in more data will only confirm that the model has high bias: it underfits the patterns, and no volume of additional examples will fix that. The problem is even more significant if the data quality is poor.
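Amatriain's point can be demonstrated with a toy experiment. The setup below is invented for illustration: a high-bias model (predicting one constant everywhere) is compared with a higher-capacity one (a 1-nearest-neighbor lookup) on a simple curved target. Adding data barely helps the biased model, while the flexible one keeps improving.

```python
import random

random.seed(0)

def target(x):
    return x * x  # the true pattern the models try to learn

TEST_POINTS = [i / 100 for i in range(100)]

def mse_constant(n):
    """High-bias model: predict the mean of the training labels everywhere."""
    xs = [random.random() for _ in range(n)]
    ys = [target(x) for x in xs]
    pred = sum(ys) / n
    return sum((target(x) - pred) ** 2 for x in TEST_POINTS) / len(TEST_POINTS)

def mse_nearest(n):
    """Higher-capacity model: answer with the label of the nearest example."""
    xs = [random.random() for _ in range(n)]
    ys = [target(x) for x in xs]
    err = 0.0
    for x in TEST_POINTS:
        nearest = min(range(n), key=lambda i: abs(xs[i] - x))
        err += (target(x) - ys[nearest]) ** 2
    return err / len(TEST_POINTS)

for n in (10, 1000):
    print(f"n={n}: constant mse={mse_constant(n):.4f}, "
          f"1-NN mse={mse_nearest(n):.4f}")
```

Going from 10 to 1,000 examples leaves the constant model's error essentially unchanged, because its mistake is baked into its structure, not its sample size.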

IBM’s answer to the question is simply to seek both data and models at the same time, especially when they have a symbiotic relationship with one another. In doing so, you get a flexible, high-variance model rather than a biased one, while having enough data to keep that variance in check. This is especially important in areas where granular details matter, such as fraud detection. An extensive amount of data may not discern a fraudulent purchase from a real transaction that’s merely unusual if the model is too simplistic. On the other hand, if too little information is fed into a complex model, you won’t get any tangible results at all, leaving problematic purchases undetected. By having a large amount of data combined with an array of basic and advanced algorithms working in concert, you’ll get a much clearer picture with fewer false positives. That will keep your customers happy and your financial data safe.
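A rough sketch of "algorithms working in concert" for fraud detection might look like the following. Everything here is hypothetical: the detectors, the customer statistics and the transactions are invented, and real systems would combine many more signals in more sophisticated ways.

```python
# Hypothetical transaction records: (amount, merchant_seen_before)
transactions = [
    (45.0, True),    # routine purchase
    (950.0, True),   # unusual size, but at a familiar merchant
    (950.0, False),  # unusual size AND an unknown merchant
]

AVG, STD = 50.0, 20.0  # assumed stats from the customer's purchase history

def amount_rule(amount, known_merchant):
    """Basic detector: flag any amount far from the customer's average."""
    return abs(amount - AVG) / STD > 3

def merchant_rule(amount, known_merchant):
    """Contextual detector: flag purchases at merchants never seen before."""
    return not known_merchant

def flag(amount, known_merchant):
    # Raise a fraud alert only when both detectors agree, trading a
    # little sensitivity for far fewer false positives.
    return amount_rule(amount, known_merchant) and merchant_rule(amount, known_merchant)

for amount, known in transactions:
    verdict = "fraud alert" if flag(amount, known) else "ok"
    print(f"${amount:.2f} known={known} -> {verdict}")
```

Here the large-but-unusual purchase at a familiar merchant passes, while the same amount at an unknown merchant is flagged: the basic rule alone would have alarmed on both, which is exactly the kind of false flag the combined approach avoids.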