Delivering an introductory course on machine learning opens your eyes to the common misconceptions and blind spots new entrants bring to the subject. While I’d consider myself a novice in many areas, I have been working with data for my entire (albeit short-ish) career. In that time I’ve seen data put to a variety of purposes in business settings, sometimes with success and sometimes without.
Ultimately, machine learning success stories are achieved by less-than-glamorous means and don’t make for good content online. Here are a few of the questions I hear during my workshop sessions:
“When do we start talking about deep learning and neural networks?”
Although my course is titled “Machine Learning and Deep Learning Fundamentals with Python,” participants often want to skip straight to the “good stuff.” Yet the problems where deep learning shines are typically not average business problems: not every business will find high value in image classification or natural language processing. Often we are restricted in the complexity of the predictive models we can use because of their “black box” calculations (see neural networks in credit scoring, for example), and the passage of GDPR has added further restrictions on the predictive modeling practices of most mid-sized and large businesses. The fact is that most business problems can be addressed with simpler machine learning models, and reaching immediately for a deep learning algorithm is overkill.
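To make the point about simpler models concrete, here is a minimal sketch (assuming only NumPy; the synthetic “credit” data and feature names are invented for illustration) of a logistic regression fit by plain gradient descent. Each learned weight maps directly to one input feature, the kind of transparency a credit-scoring reviewer can audit and a deep network cannot easily provide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two features, only the first genuinely predictive.
X = rng.normal(size=(500, 2))
true_w = np.array([2.0, 0.0])
y = (1 / (1 + np.exp(-(X @ true_w))) > rng.uniform(size=500)).astype(float)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Plain gradient descent on the logistic log-loss; returns the weights."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)     # average gradient step
    return w

w = fit_logistic(X, y)
# w[0] comes out large and positive, w[1] near zero: each weight is
# readable at a glance as a directional effect of its feature.
```

The model is less flexible than a neural network, but every coefficient can be explained to a regulator or a business stakeholder in one sentence.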
“Why do I need to work with a domain expert? All the answers lie within the data, right?”
Wrong! This is a common misconception, and I believe it stems in part from the hype found in news and media coverage of machine learning, deep learning, and the like. We are treated only to a superficial account of what must really happen to achieve those successes. Nearly every predictive modeling project involves feature selection and engineering: to predict an outcome or event, we need to know something about how it could relate to the other variables we want to use in the prediction task. We could throw every variable we have into the model and hope for the best, or we could engage with those who have tacit knowledge of the problem we wish to solve. There are plenty of examples of spurious correlations, and we could easily build a model that seems to perform well while failing to capture the true nature of the relationship. Yes, data speaks, but it often needs a translator!
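Spurious correlation is easy to demonstrate. The sketch below (assuming only NumPy; the data is pure noise, chosen for illustration) draws 200 random “features” for only 30 observations and shows that the best of them correlates noticeably with a completely unrelated target, exactly the kind of relationship a domain expert would reject on sight.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 30, 200

X = rng.normal(size=(n_samples, n_features))  # 200 columns of pure noise
y = rng.normal(size=n_samples)                # target unrelated to any of them

# Pearson correlation of each noise feature with the target.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
best = np.abs(corrs).max()
# With 200 candidates and only 30 points, the best |correlation| is
# sizeable by chance alone, despite there being no real relationship.
```

A model built on that “best” feature would look good in-sample and fail in production; a translator with domain knowledge is what filters such candidates out.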
“Which algorithms really matter? Why don’t we just learn about the ones that work best and call it a day?”
Well, that would make things much simpler, wouldn’t it? Yet the fact is that no single machine learning algorithm works best on every problem; this is commonly referred to as the “No Free Lunch Theorem.” There are several reasons for this, but two come to mind: idiosyncrasies in the data and problem being addressed, and restrictions on model complexity. Every industry, sub-industry, geography, customer base, and individual business has its own nuances, as do the problems and resources they bring to machine learning solutions. And as in banking, insurance, healthcare, and other regulated industries, we need a variety of models and techniques at our disposal for when we run into legal requirements that demand model interpretability. In practice we can often achieve optimal results from ensemble models rather than deep learning models, and the former are much easier to understand.
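The “no free lunch” idea can be sketched with two toy classifiers on two toy problems (a hypothetical illustration, assuming only NumPy): a linear least-squares rule and a 1-nearest-neighbour rule, each of which wins on the data that suits its inductive bias.

```python
import numpy as np

rng = np.random.default_rng(7)

def linear_acc(Xtr, ytr, Xte, yte):
    """Least-squares linear classifier: predict by the sign of X @ w."""
    w, *_ = np.linalg.lstsq(Xtr, 2 * ytr - 1, rcond=None)
    return np.mean((Xte @ w > 0).astype(int) == yte)

def knn_acc(Xtr, ytr, Xte, yte):
    """1-nearest-neighbour classifier: copy the label of the closest point."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return np.mean(ytr[d.argmin(axis=1)] == yte)

def make_linear(n):
    """Linearly separable labels: a linear model's home turf."""
    X = rng.normal(size=(n, 2))
    return X, (X[:, 0] + X[:, 1] > 0).astype(int)

def make_xor(n):
    """XOR-style labels: no linear boundary exists."""
    X = rng.normal(size=(n, 2))
    return X, ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

results = {}
for name, maker in [("linear", make_linear), ("xor", make_xor)]:
    Xtr, ytr = maker(200)
    Xte, yte = maker(200)
    results[name] = (linear_acc(Xtr, ytr, Xte, yte),
                     knn_acc(Xtr, ytr, Xte, yte))
# The linear rule is near-perfect on the linear problem but falls to
# roughly chance on XOR, where the nearest-neighbour rule wins easily.
```

Neither algorithm is “the best”; the winner is decided by how well its assumptions match the data, which is the whole point of the theorem.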
Training others in machine learning and deep learning has itself been a learning experience for me. I hope to instill in those I instruct the realities of the profession and realistic expectations of project outcomes.