Big data, data mining and predictive analytics have recently drawn growing attention in business circles. This comes as no surprise, given the advantages they offer as businesses continue to seek a competitive edge. Having learned of their value, you have probably wondered how to introduce these concepts in your own business, but you may not be sure how or where to begin. Some consider analytics to be like magic, too complicated to understand: algorithms, machine learning and statistical models that only a skilled data scientist can make sense of, or something just for techies.
However, there is no need to be overwhelmed. In a nutshell, it is simply taking historical data and applying models (simple abstractions of the real world) to gain insights into what has occurred and to predict what may happen in the future. Knowing the current state of affairs, its contributing factors and what may happen next opens up a myriad of opportunities, such as acting proactively, improving operations, detecting fraud and assessing default risk. The knowledge gleaned from an analytics exercise increases a firm's competitive advantage and its efficiency in allocating resources.
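To make the idea of "historical data plus a model" concrete, here is a minimal sketch in Python. It assumes a straight-line trend is a reasonable model, and the monthly sales figures are made up purely for illustration; a real project would use your own data and quite possibly a different model.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical historical data: sales (in thousands) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# Insight: how fast have sales been growing per month?
slope, intercept = fit_line(months, sales)

# Prediction: what might month 7 look like if the trend holds?
forecast_month_7 = slope * 7 + intercept
print(f"Growth per month: {slope:.1f}, forecast for month 7: {forecast_month_7:.1f}")
```

The slope is the insight (what has occurred) and the forecast is the prediction (what may happen), which are the two outcomes described above.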
For a successful analytics project, start by identifying the problem you want to solve and understanding the business requirement; this is a very important step that should not be skipped. Another important success factor is a framework, process or guide that provides structure within which to execute the project. Having a framework in place increases the chances of success. Although various frameworks and processes are used in analytics projects, one of the most widely used is the Cross-Industry Standard Process for Data Mining (CRISP-DM).
CRISP-DM is a non-proprietary, freely available framework for structuring a data mining project. It defines a data mining or analytics project as a life cycle consisting of six phases (see illustration). The life cycle begins with Business Understanding, followed by Data Understanding, Data Preparation, Modeling, Evaluation and finally Deployment, with some phases looping back to earlier ones. Each phase is further broken down into tasks and outputs. You will quickly realize that, followed in strict sequence, this amounts to a waterfall approach, which invariably leads to trying to hit a moving target. In our experience leading data mining teams that build models, access to meaningful data in the right format is a major critical success factor. We also found that adopting an iterative process across the whole framework yielded the best results. Contact us if you want to know more about how to set up your data mining projects for success or improve their delivery.
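As a rough illustration of the iterative approach described above, the six CRISP-DM phases can be sketched as a loop that repeats until evaluation succeeds. The function `run_project` and its `evaluate` callback are hypothetical names invented for this sketch; they are not part of CRISP-DM or of any library.

```python
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]


def run_project(evaluate, max_iterations=5):
    """Cycle through the phases, deploying only once evaluation passes.

    `evaluate` is a hypothetical callback: it receives the iteration
    number and returns True when the model meets the business goal.
    """
    log = []
    for iteration in range(1, max_iterations + 1):
        # Every pass revisits all phases up to and including Evaluation.
        for phase in PHASES[:-1]:
            log.append((iteration, phase))
        if evaluate(iteration):
            log.append((iteration, "Deployment"))
            break
    return log


# Example: assume the model only meets the goal on the second pass.
log = run_project(lambda iteration: iteration >= 2)
for iteration, phase in log:
    print(f"Iteration {iteration}: {phase}")
```

The point of the sketch is that Deployment is reached only after Evaluation passes, and a failed evaluation sends the team back to Business Understanding instead of forward.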