BitLightning | ML Engineering: How to Leverage Machine Learning for Business

03 Mar

By Cory Smith. Tags: machine learning, machine learning algorithms, machine learning architectures

Some time ago I attended a conference for entrepreneurs to learn about new and exciting technologies. The attendees were smart, forward-thinking business leaders. Because our lives are inundated with technology, in a sense we’re all technical. But in the context of what I do, I would describe the audience as non-technical.

One of the guest speakers was Sam Altman, who was invited to speak about artificial intelligence. While you may know him from his leadership at Y Combinator, what you might not know is that he’s now the CEO of OpenAI, an AI research lab he co-founded with Elon Musk and others, responsible for the famous GPT-3 natural language processing software.

One question he received was, “I have problem X in my business, and I want to use AI to solve it.” I’m going to paraphrase how he answered: “You don’t need a data scientist for that. We already have a machine learning algorithm that solves that problem. You just need a machine learning engineer.”

To really understand the point he was trying to make, let’s take a look at the history of Machine Learning, and how it has evolved in recent years.

The Early Days of Machine Learning

In the summer of 1951, with funding from the Air Force Office of Scientific Research, Marvin Lee Minsky built one of the first neural network machines. It featured 40 randomly connected nodes. Each node had an input for a signal and an output that propagated a signal on from the node. When a node propagated a signal, a capacitor would engage a clutch. The clutch on each node connected to a common motor. When an operator activated the motor as a reward, the machine could identify which nodes had recently been activated based on whether each node’s clutch was engaged. The machine didn’t have any practical utility, but it did serve as a proof of concept of a machine that could learn from inputs.

In the 1980s backpropagation was rediscovered, and it would eventually become the foundation for training multi-layered neural networks. For almost a decade, from 1985 to the early 1990s, machine learning remained solidly in the research labs of various universities. Then, in the mid-1990s, Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) were introduced and started gaining popularity. This seems to be where deep learning really took off, as LSTM RNNs were better than any previous machine learning technology at filtering out noise in data while still remembering precise signal data points over long sequences. This made them much more versatile than previous machine learning architectures, and this was probably the beginning of what we now think of as Deep Learning.

The mid-1990s were really the birth of several machine learning architectures that are the foundation of so much of today’s artificial intelligence. Not just neural networks, but also support-vector machines and the random forest algorithm came out of the data science research of this decade.

Sometime around 2010, Netflix built its recommender engine. According to people who worked on the project, it took an estimated 2,000-plus hours of engineering. Data scientists architected, trained, and tuned over 100 algorithms that work in unison to make finding great content just a little easier for Netflix’s end users. As you look at the history of machine learning from the 1950s through the 1980s to the 2010s, you can start to see a pattern regarding who was taking on this work: large companies, large research laboratories in tech companies, and large computer labs at top universities.

These projects were extremely expensive and took years to complete. The engineers working on these problems were data scientists by the strictest definition.

It was also in the early 2010s when IBM developed a natural language processing system that could understand questions in English and return answers. It was called Watson, and it was the first machine to ever win the quiz show Jeopardy! The project is estimated to have cost IBM over $1 billion and took around 3 years to complete.

University of Texas: MD Anderson Cancer Center

In October of 2013, IBM announced that it was “leveraging the IBM Watson cognitive computing system for its quest to eradicate cancer”. IBM Watson was initially touted as a game-changer for the medical community and was expected to improve patient outcomes by helping doctors make more accurate diagnoses.

At the University of Texas MD Anderson Cancer Center, the proof of concept seemed promising, with Watson showing evidence of being able to learn from medical records. But according to Ars Technica, due to the lack of a standardized schema for medical data and a failure to collaborate with MD Anderson’s IT operations department, the team wasn’t able to get a production system working. The data they had been using wasn’t good enough to train a production model, and they had no path to integrate with a new EMR system that was implemented mid-project.

In 2017, when MD Anderson Cancer Center threw in the towel, it had paid IBM and PricewaterhouseCoopers a combined $62 million to develop and implement the Watson for Oncology platform. Sadly, it had nothing to show for it.

Evolution and Present Day Machine Learning

In the early 2010s, as these university researchers entered the workforce, knowledge of these powerful machine learning architectures started to spread throughout top tech companies. But the work was still primarily done by data scientists who were building neural networks by hand using libraries like NumPy, a numerical computing library that’s really good at linear algebra.

Two software projects seem to be responsible for really democratizing machine learning, making it available to anyone with a basic understanding of the underlying concepts and some software aptitude: TensorFlow, released in 2015, and Pandas, which first appeared in the late 2000s and had matured into a standard tool by the same time.

TensorFlow is an open-source library for machine intelligence that was released by Google. It allows developers to build systems that use deep learning and other machine learning architectures and models. TensorFlow has been used in projects such as Google Translate, Waymo, and Google Photos. By leveraging TensorFlow, developers no longer needed to code and debug common neural network architectures using NumPy linear algebra functions. What used to take a few hundred lines of complex NumPy instructions could now be written in a handful of lines using the TensorFlow library.
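To give a sense of what building a network by hand with NumPy involved, here is a minimal sketch of just the forward pass of a tiny two-layer network. The layer sizes, weights, and function names are made up for illustration. A real hand-built network would also need a loss function, backpropagation, and a training loop, which is where the line count grows.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: negative values become zero.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    # Each layer is a matrix multiply, a bias add, and a nonlinearity.
    hidden = relu(inputs @ w_hidden + b_hidden)
    return sigmoid(hidden @ w_out + b_out)

# Hypothetical sizes: 3 input features, 4 hidden units, 1 output.
rng = np.random.default_rng(seed=0)
w_hidden = rng.normal(size=(3, 4))
b_hidden = np.zeros(4)
w_out = rng.normal(size=(4, 1))
b_out = np.zeros(1)

batch = np.array([[0.2, 0.7, 0.1],
                  [0.9, 0.1, 0.4]])
predictions = forward(batch, w_hidden, b_hidden, w_out, b_out)
print(predictions.shape)  # (2, 1): one prediction per input row
```

In TensorFlow’s Keras API, a layer like the hidden one above is created with a call such as `tf.keras.layers.Dense(4, activation="relu")`, and the library handles the weights and the training math for you.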

But algorithms and neural network architectures are only half of the puzzle. What has proven to be equally important, and in some ways harder to master, is the training data.

On the data side, we have Pandas. Pandas has become the de facto standard for data preprocessing, as it allows you to input, transform, and output data in pretty much any format you can imagine.

I’ll talk more about both of these tools later on, but the main point I’m trying to illustrate is that while machine learning is an incredibly complex technology, in the last 20 years it has gone from billion-dollar research labs to universities, to big tech, and now finally within reach of enterprises, small businesses, and thanks to the open-source ecosystem, even hobbyists.

We’ve gone from IBM Watson running on a multi-million dollar supercomputer to TensorFlow Lite running on a $500 smartphone.

If the 2010s were the decade of deep learning, the 2020s will be the decade of ML Engineering. ML Engineering is about taking this body of readily available tools, algorithms, and models, building software around them, and applying them to business problems.

These days, if you want to leverage machine learning for your business, you need two things: data and an ML engineer.

The Three Machine Learning Paradigms

Looking from a broader perspective, almost all of these models use learning methods that can be classified into three types: supervised learning, unsupervised learning, and reinforcement learning. Here is a brief look at each:

Supervised Learning

This is probably the most popular method used by data scientists to train machines. It tries to predict an output from a given input, based on answers provided by a human expert. Supervised learning continues until the model achieves a desired level of accuracy on the training data, measured by comparing the answers provided by the human expert with its own answers. Supervised learning models include Linear Regression, Random Forest, KNN, Decision Tree, and Logistic Regression.
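As a rough illustration of supervised learning, here is a minimal sketch of a k-nearest neighbors (KNN) classifier written in plain Python. The fruit measurements and labels are invented for the example, and `knn_predict` is a hypothetical helper, not a library function. The last lines measure accuracy by comparing the model’s answers with the human-provided labels.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    # Sort the labeled examples by distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # The prediction is the most common label among the k nearest examples.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up training data: (weight in grams, diameter in cm), labeled by a human.
points = [(150, 7.0), (160, 7.5), (170, 7.2), (10, 2.0), (12, 2.2), (9, 1.9)]
labels = ["apple", "apple", "apple", "grape", "grape", "grape"]

print(knn_predict(points, labels, (155, 7.1)))  # apple
print(knn_predict(points, labels, (11, 2.1)))   # grape

# Accuracy on the training data: share of answers that match the expert's labels.
predicted = [knn_predict(points, labels, p) for p in points]
accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
print(accuracy)  # 1.0
```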

Unsupervised Learning

In unsupervised learning, no answers are fed into the algorithm. Instead, the algorithm creates a model based on correlating data points, or mimicking patterns. This method is commonly used to cluster data points into different groups, find anomalies in data, or generate creative data. Examples of unsupervised learning include K-means and Isolation Forest.

Reinforcement Learning

Unlike supervised learning, where the machine learns from a specific set of data, reinforcement learning models are exposed to an environment where they learn from trial and error. In the end, the machine tries to make the best decision based on experience. The Markov Decision Process is the framework usually used to describe these problems, and Q-learning is a well-known example of a reinforcement learning algorithm.
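To make learning from trial and error concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm, on a made-up environment: a corridor of five positions where reaching the right end earns a reward. The environment, the constants, and the function names are all assumptions chosen for illustration.

```python
import random

N_STATES = 5          # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    # Environment: moving off the left end leaves the agent in place.
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore by acting at random. Q-learning is off-policy, so it
            # still learns the value of the best action in each state.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
# The learned policy picks the highest-valued action in each non-goal state.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: always move right, toward the reward
```

Nobody tells the agent that moving right is correct. It discovers that purely from the rewards it experiences.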

These learning methods are applied to build machine learning architectures that solve different types of problems, and over the years, some patterns have emerged regarding what types of problems can be solved with machine learning.

Classification: Classifies the things into different categories.

Regression: Predicts a numerical value of the thing.

Clustering: Groups the things according to similarities that it determines.

Anomaly Detection: One of these things is doin’ its own thing, while the rest of these things are kinda the same.

Example Use Cases for Machine Learning Algorithms

Now that we have some idea of how machines are trained and how they’re used, here is an overview of some of the classical ML and deep learning models:

Classification

Logistic regression, Random Forest, K-NN, Gradient Boosting Classifier, and Neural Networks are some well-known examples of classification models. Classification models are effective in classifying a given set of data into different classes.

If there are only two classes, it’s known as binary classification. For instance, binary classification can tell whether or not a person has a certain disease. If there are more than two classes, it’s known as multinomial (or multi-class) classification. It can be used to classify stocks into classes such as buy, hold, and sell. We come across classification algorithms daily when using Google, Yahoo, and MSN mailboxes: the mailbox classifies each incoming message as “spam” or “not spam”.

Jian Yang’s famous “Hotdog / Not a Hotdog” algorithm from the TV show Silicon Valley is binary classification. An improved algorithm that could correctly label different foods would be multinomial classification.

Regression Models

Linear regression, K-NN, Random Forest, and Neural Networks are examples of regression models. These models establish a relationship between a dependent variable and one or more independent variables, which is then used to predict a numerical value. The machine is trained to predict numbers such as house values, call volumes, and inventory sales.

Zillow, a popular online real estate marketplace, uses its trademarked Zestimate algorithm to estimate the price of a house according to its location, characteristics, and similar homes in the vicinity.
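As a toy version of this kind of price prediction, here is a minimal sketch of linear regression with a single input, fitted by ordinary least squares in plain Python. The floor areas and prices are made up, and this is in no way how Zillow’s model works. It only shows the basic idea of learning a numeric relationship from examples.

```python
def fit_line(xs, ys):
    # Ordinary least squares for one input variable: y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: floor area in square feet and sale price in dollars.
areas = [1000, 1500, 2000, 2500]
prices = [200_000, 290_000, 410_000, 500_000]

slope, intercept = fit_line(areas, prices)
estimate = slope * 1800 + intercept  # predicted price of an 1,800 sq ft house
print(round(estimate))  # 360200
```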

Clustering

K-Means, DBSCAN, Hierarchical Clustering, Gaussian Mixture Models, and BIRCH are examples of clustering models. As the name suggests, these models are used to assemble data into clusters and groups. A real-life example of clustering includes finding similar topics and users on Twitter based on hashtags.

It’s quite common to confuse clustering and classification models because they seem relatively similar. In the case of clustering, there is no predefined label attached to input while classification uses predefined labels.

To reinforce this, let’s look at the k-means clustering algorithm.

With supervised learning and a data set labeled by a human expert, I could train a classification algorithm on a set of 100 images of 10 different fruit types, and it would then label a new image with the type of fruit.

By contrast, using clustering to group that same set of images, I could tell a k-means algorithm to sort them into 10 groups and it might group them by fruit type. It might also find something else to group them by. I could also tell the same algorithm to create 6 groups and it might group them by color. Or I could tell the algorithm to create 2 groups and it might group them based on whether the stems are attached. Often I won’t know how it will group things until I try, and even then it may not be obvious just looking at the groups.
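Here is a minimal sketch of that behavior using a bare-bones k-means in plain Python. Instead of images, each item is a pair of made-up numeric features, and the centroids start from the first k points rather than random picks so the result is repeatable. Both of those are simplifications for the example.

```python
import math

def kmeans(points, k, iterations=20):
    # Simplification: start from the first k points instead of random picks.
    centroids = list(points[:k])
    groups = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        for i, group in enumerate(groups):
            if group:
                centroids[i] = tuple(sum(c) / len(group) for c in zip(*group))
    return groups

# Made-up features for six items; there are three natural pairs in the data.
items = [(1.0, 1.0), (9.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 1.2), (9.2, 8.8)]

print([len(g) for g in kmeans(items, 3)])  # [2, 2, 2]
print([len(g) for g in kmeans(items, 2)])  # [2, 4]
```

Asked for three groups, the algorithm finds the three pairs. Asked for two, it merges two of the pairs into one group, and nothing in the code says which grouping is the "right" one.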

Recommendation engines often use clustering algorithms. These models are used to recommend something, which can include suggestions such as the “next item to buy” and “related videos to watch”. Netflix uses recommender systems to suggest what to watch based on the viewing habits of the user, which I talked about earlier on.

Anomaly Detection

Isolation Forest, Minimum Covariance Determinant, Local Outlier Factor, and One-class SVM are examples of anomaly detection algorithms. These are used to find outliers in the dataset. They try to differentiate rare items, events, and observations to detect an anomaly.

Anomaly detection is regularly used to flag suspicious fraud events in credit card transactions. Amazon uses this model to identify products that are no longer fresh. Amazon also uses an end-to-end machine monitoring system, Amazon Monitron, that detects anomalies in vibration or temperature to check when a specific system requires maintenance. Google uses anomaly detection to identify memory leaks in microservices by looking at how much memory an application uses relative to CPU.

Figure: an anomaly detection algorithm correctly identifies that the girl in the top right corner is the only one with pigtails.
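The algorithms named above are fairly involved, so as a stand-in here is a minimal sketch of a much simpler statistical rule: flag any value whose robust z-score, based on the median and the median absolute deviation, exceeds a threshold. The transaction amounts are made up, and the 3.5 threshold is a common convention rather than anything tuned.

```python
import statistics

def find_anomalies(values, threshold=3.5):
    # Robust z-score built from the median and the median absolute deviation (MAD).
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # at least half the values equal the median; no scale to compare against
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]

# Made-up card transactions in dollars; one of them is doing its own thing.
amounts = [23.5, 19.9, 25.0, 21.3, 22.8, 980.0, 24.1, 20.7]
print(find_anomalies(amounts))  # [980.0]
```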

Popular Machine Learning Tools

So I’ve covered the different ways that machines can learn, as well as the more commonly used algorithms. And while the math behind these algorithms is solid, the code implementation is incredibly complex. Luckily the community has published a comprehensive set of tools to make the implementation of these machines much easier.

Here is a look at some of the best real-world machine learning tools that have allowed engineers to quickly and more easily implement the above algorithms to create innovative products. While there are tons of proprietary tools on the market, I’m mostly going to talk about open-source software because that’s what I’m best at.

TensorFlow

TensorFlow is one of the most popular deep learning libraries. The machine learning toolset is practical for building large-scale neural networks using data flow graphs.

Use cases of TensorFlow include voice recognition, text-based applications, image recognition, video detection, and time-series algorithms. Among other things, it’s been used to build neural networks for understanding audio signals, so it is increasingly used by telecom handset manufacturers and by customer relationship management vendors. Google is already using the technology for text summarization and a technique known as sequence-to-sequence (S2S) learning. In fact, the S2S technique is the technology underlying Google Translate.

PyTorch

This open-source tool was developed by Facebook for the Python language. It has solutions for image classification, style transfer, predictive analysis, and natural language processing. PyTorch is particularly attractive for ML developers because it’s easy to learn compared to similar tools. It also offers dynamic computation graphs, which make it easier to debug models and to work with inputs of varying size.

Salesforce uses PyTorch for natural language processing because PyTorch makes deep learning models, such as recursive neural networks, easier to implement. Due to the ease of learning, PyTorch is very popular with researchers and educational institutions. Researchers at Stanford use it to research new algorithm approaches whereas Udacity, an educational organization, utilizes PyTorch to teach AI innovators useful skills.

Keras

Keras is a deep-learning API created by Google engineer François Chollet. Written in Python, it was designed so that it could integrate multiple back-ends for computation, and TensorFlow uses Keras as its official high-level API. While it may be a bit slower, Keras is extremely beginner-friendly, which makes it a top choice for both amateur and research ML Engineering projects.

Designed for humans, Keras often requires few lines of code and explains errors clearly. Yelp, Uber, and Netflix use Keras. Besides solving complex problems, it’s widely used to create and test basic models such as forecasting heart disease, predicting stocks, and detecting face masks.

Natural Language Toolkit

NLTK, the Natural Language Toolkit, is designed to work with human language data. It contains a huge set of text processing libraries for tokenization, stemming, tagging, parsing, and semantic reasoning. A community-driven project, the NLTK platform also offers a comprehensive hands-on guide for learners.

It supports a large number of languages compared to similar resources. Over the years, NLTK has enabled many breakthroughs in text analysis. Researchers are using it for sentiment analysis to filter product reviews. It’s equally popular for creating chatbots that can help answer customer queries without human intervention.

OpenCV

OpenCV is used mainly for image processing and computer vision tasks. More than 500 algorithms and 5000 supported functions allow users to create intelligent models for almost any imaginable computer vision task. The built-in library supports a variety of algorithms to detect faces, identify objects, extract 3D models, and stitch images together.

Almost all major tech companies including Google, Yahoo, Apple, and Microsoft use OpenCV in ML Engineering projects. From stitching street view images together in Google Maps to checking runway debris in Turkey and inspecting labels inside factories, OpenCV is becoming a universal standard. OpenCV supports C++, Python, Java, and MATLAB interfaces.

Pandas

Pandas is a Python package used for data cleaning and analysis. Since its introduction by Wes McKinney, it has evolved into a powerful MLOps tool that can perform almost any type of data manipulation. I love it because it represents data efficiently and allows me to organize that data in groups for easy discovery and exploration.

Economists use Pandas to get important insights into the complex data patterns that were once impossible to interpret. Similarly, Spotify and Netflix use Pandas extensively to provide useful and relevant recommendations to their audience. Neuroscientists are also benefiting from this tool as it helps them identify important trends that affect our nervous system.

It would probably be easier to list the companies that don’t use Pandas, as it has become a de facto standard for data preprocessing. If you can think of a data format, Pandas probably has a way to import it, manipulate it, and export it.
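Here is a minimal sketch of that import, manipulate, and export cycle. The CSV content and column names are invented for the example, and an in-memory string stands in for a real file.

```python
import io
import pandas as pd

# Made-up CSV export standing in for a real data source.
raw_csv = io.StringIO(
    "customer,plan,monthly_spend\n"
    "acme,pro,120.0\n"
    "globex,basic,\n"
    "initech,pro,95.5\n"
)

df = pd.read_csv(raw_csv)                               # import
df["monthly_spend"] = df["monthly_spend"].fillna(0.0)   # fill in the missing value
df["is_pro"] = (df["plan"] == "pro").astype(int)        # encode a category as a number
by_plan = df.groupby("plan")["monthly_spend"].mean()    # organize and explore by group

print(by_plan)
print(df.to_json(orient="records"))                     # export to another format
```

Steps like filling in missing values and turning categories into numbers are typical of the preprocessing that happens before data reaches a model.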

Featuretools

Featuretools is yet another open-source Python framework that can be used to prepare data for machine learning. Using deep feature synthesis, anyone can combine raw data to create meaningful features for machine learning and predictive analysis. Various APIs ensure that only valid data is used for modeling.

This framework can work alongside other ML tools such as Pandas. For example, you can load Pandas dataframes to create features that would otherwise take a lot of time to build by hand. Featuretools is mostly used in feature engineering: sifting through hordes of data from multiple dissimilar sources to figure out which features will have the best impact on the algorithm. The newly organized data is then used to train ML models.

Jupyter Notebooks

Jupyter Notebooks are quickly becoming a go-to ML tool for data analysis and visualization. Their multi-language support and availability make them popular among data scientists, machine learning enthusiasts, data engineers, and scholars. Users can easily create a document called a Notebook, share their work, and let others join in. Picture a Google Doc with IDE features that allow you to alternate between documentation blocks, code blocks, and blocks that contain the output of a previously run code block.

Jupyter Notebooks are used by data scientists and mathematicians to share projects involving data visualization. They are also beneficial for teaching purposes because they allow students to seamlessly manipulate and interact with the given data. Bloggers, computer programmers, and data scientists can also share Notebooks, allowing others to download and recreate the experience.

This is by no means an exhaustive list. There are lots of other machine learning tools and models that can solve most problems, but it is important to note that thoughtful projects do not rely solely on models and tools. Instead, I recommend leveraging MLOps and ML Engineering for an efficient and pragmatic path to a successful outcome at all points of the ML project lifecycle.

Beyond Models and Algorithms

Machine learning workflows have all of the complexity of the software development lifecycle and the DevOps lifecycle, with some additional complexity added for big data.

Machine learning operations, or MLOps for short, is the methodology of breaking down these challenges and making them manageable.

Why Invest in MLOps

Let’s say you’re working on a proof of concept to use machine learning to solve a business problem. You collect some data, run the data through preprocessing, run it through a deep neural network, and look at the result. Let’s say the accuracy of your model came in at 75% and you need an accuracy of 95% to effectively use the model in production.

What would you do in this situation? Has the proof of concept failed? Do you need to use a different architecture? Do you need to use a different machine learning approach? Do you just need to tune some hyperparameters? Do you need more data? Do you need cleaner data?

You can quit now and cut your losses, or you can continue to invest in a solution. But if you continue, you have 5 possible paths to take, and maybe only one or two of them will actually help you. But they could all incur significant time and expense, eroding your capital and patience, and prolonging your ROI.

Let’s say you choose the “trying a different machine learning approach” strategy. What implications will that have on your preprocessing? How much will you have to refactor the outputs to get them to a usable format?

A solid machine learning operations model will help you implement good abstractions at each point in the MLOps lifecycle, so you can swap out different technologies and processes at each stage. It also helps you measure the effectiveness of different strategies so you can make pragmatic data-driven decisions about how to improve the effectiveness and accuracy of your machine learning model.
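One way to picture those abstractions is a pipeline in which every stage is a plain function with an agreed signature, so a stage can be swapped without touching the rest, plus a single evaluation function that scores each variant the same way. The sketch below is a toy version of that idea. All of the names, the two stand-in "models", and the data are hypothetical.

```python
from typing import Callable, Sequence

# Agreed interfaces between stages. Anything matching these can be swapped in.
Preprocessor = Callable[[Sequence[float]], Sequence[float]]
Model = Callable[[Sequence[float]], int]

def scale_to_unit(row):
    # Preprocessing stage: rescale a row so its largest value is 1.0.
    top = max(row) or 1.0
    return [value / top for value in row]

def threshold_model(row):
    # Stand-in model A: predict 1 when the average value is high.
    return 1 if sum(row) / len(row) > 0.5 else 0

def majority_model(row):
    # Stand-in model B: predict 1 when most values are high.
    return 1 if sum(v > 0.5 for v in row) > len(row) / 2 else 0

def evaluate(preprocess: Preprocessor, model: Model, rows, labels) -> float:
    # Accuracy: the share of examples where the prediction matches the label.
    predictions = [model(preprocess(row)) for row in rows]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

rows = [[8, 9, 10], [1, 2, 10], [7, 6, 10], [4, 4, 10]]
labels = [1, 0, 1, 0]

# Swap the model stage and measure each option with the same yardstick.
for name, model in [("threshold", threshold_model), ("majority", majority_model)]:
    print(name, evaluate(scale_to_unit, model, rows, labels))
# threshold 0.75
# majority 1.0
```

Because both models accept the output of the same preprocessing step, trying a different approach becomes a one-line change, and the accuracy numbers give you something concrete to base the decision on.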

Above I talked about the failure at MD Anderson Cancer Center. There were two main issues identified: challenges with the data, and the inability to integrate with the new EMR system. Both of these are examples of problems that MLOps was created to solve.

Getting Started with ML Engineering

Just as the industrial revolution displaced the practice of manual one-off manufacturing of goods, the artificial intelligence revolution will displace much of the cognitive services sector. It’s been predicted that 10 years from now there will be two types of businesses: businesses that use AI, and businesses that offer old-school “artisan” services. If you manage a business, this might be a fork in the road. If you want to be able to scale your business, which path you should take seems pretty clear.

Looking at all of the algorithms and tools that I’ve mentioned, you can start to see why machine learning solutions are much more feasible and accessible today than they were 10 years ago.

In my experience, the biggest challenge companies are facing these days is around their data. The problem of “How do we create a machine that can learn?” has been solved. The problem that’s unique to each business is: “How do we prepare our data so that the machine can learn from it?”

For many business problems, there are machine learning algorithms just sitting around, waiting to be used. Most startups and enterprises don’t need a data scientist to leverage this technology. They need a software engineer who knows how to work with data, has a good understanding of algorithms, and knows how to set up good machine learning pipelines. In other words, they need a machine learning engineer.

How We Help Companies Leverage Machine Learning

By now you probably have a good intuition for what’s possible with today’s machine learning technology and how it can help your business. My question for you is: do you think you’ll be able to remain competitive if you don’t leverage this technology for your business?

Machine learning can be incredibly intimidating and confusing, especially if you’re not familiar with the terminology or concepts. It can be hard to know where to start or what tools to use.

BitLightning is here to help. We offer machine learning services and support for businesses of all sizes. Our team has experience with a variety of open source and cloud solutions, so we can help you get started quickly and efficiently. Let us help you take your business to the next level with machine learning.

If you want help evaluating your problem for machine learning, you can schedule a free consultation on the calendar below.

If you’d like to be notified when I publish new content, you can sign up for our mailing list.

Thanks for reading.