Data Mining – What, Why, When
One of the best ways to learn about any topic is to start with fundamental questions such as What and Why: the good old Socratic method. In this series of articles on data mining, I plan to approach the topic in that fashion.
What is Data Mining?
Simply put, data mining is the process of sifting through large data sets to identify and describe patterns and to discover and establish relationships, with the intent of predicting future trends based on those patterns and relationships.
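To make the definition concrete, here is a minimal sketch in Python of the three steps it names: spotting a pattern, quantifying a relationship, and using it to predict. The data (monthly ad spend and units sold) and the function names are made up for illustration, and a simple straight-line fit stands in for the far richer models that real data mining tools build.

```python
from math import sqrt

# Hypothetical data: monthly ad spend (in $1000s) and units sold (in 1000s).
ad_spend = [1, 2, 3, 4, 5]
units_sold = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def pearson_correlation(xs, ys):
    """Measure how strongly two variables move together (-1 to 1)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares straight line: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept


# Step 1 and 2: identify the pattern and establish the relationship.
r = pearson_correlation(ad_spend, units_sold)
slope, intercept = fit_line(ad_spend, units_sold)

# Step 3: predict an outcome for a value not yet observed.
predicted = slope * 6 + intercept

print(f"correlation: {r:.3f}")
print(f"predicted units sold at ad spend 6: {predicted:.2f}")
```

With only five rows this is easy to do by eye. The point of data mining is to automate the same steps across millions of rows and many variables at once.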
Why is data mining relevant now? Haven’t we been ‘mining’ data from time immemorial?
Yes and no. It is true that data has always been analyzed to identify patterns and predict outcomes, but the volume of data that organizations have to deal with has exploded in recent times with the advent of big data. These large data sets make it almost impossible to identify multi-dimensional patterns using traditional techniques or tools. Data mining in its modern form, aided by the latest tools and faster processing, automates discovering patterns, establishing relationships, and building predictive models, which makes the whole process efficient.
What are some of the specific benefits of data mining?
The broad benefit of identifying hidden patterns, establishing the consequent relationships, and building predictive models can be applied to many functions and contexts in organizations.
Specifically, customer-focused functions can mine customer data to acquire new customers, retain existing customers, and cross-sell to them. Other examples are improving lead conversion rates, building sales prediction models, and developing new products and services.
Financial sector companies can build fraud-detection models and risk mitigation models. Energy and manufacturing companies can come up with proactive maintenance models and quality detection models. Retailers can build stock placement and replenishment models for their stores and assess the effectiveness of promotions and coupons. Pharmaceutical companies can mine large chemical compound data sets to identify agents for the treatment of diseases.
What skills are needed for data mining?
Data mining sits at the intersection of statistics (the analysis of numerical data), artificial intelligence / machine learning (software and systems that perceive and learn like humans, based on algorithms), and databases. Translated into technical skills, this means competency in Python, R, and SQL, among others. In my opinion, a successful data miner should also have business context and knowledge and other so-called soft skills (teamwork, business acumen, communication, etc.) in addition to these technical skills.
Why? Remember that data mining is a tool whose sole purpose is to achieve a business objective (increase revenues / reduce costs) by accelerating predictive capabilities. Pure technical skill will not accomplish that objective without some business context.
A supporting data point comes from Meta Brown’s book “Data Mining For Dummies”, where she states:
“A data miner’s discoveries have value only if a decision maker is willing to act on them. As a data miner, your impact will be only as great as your ability to persuade someone — a client, an executive, a government bureaucrat — of the truth and relevance of the information you have to share. This means you’ve got to learn to tell a good story — not just any story, but one that honestly conveys the facts and their implications in a way that is compelling for your decision maker.”
I hope this gives you an overview of data mining and where it can be applied. In the next article, I plan to go over various data mining techniques. Until then, goodbye.
About the author:
Ramesh Dontha is Managing Partner at Digital Transformation Pro, a management consulting and training organization focusing on Big Data, Data Strategy, Data Analytics, Data Governance/Quality and related Data management practices. For more than 15 years, Ramesh has put together successful strategies and implementation plans to meet/exceed business objectives and deliver business value. His personal passion is to demystify the intricacies of data-related technologies and the latest technology trends and make them applicable to business strategies and objectives. Ramesh can be reached on LinkedIn, on Twitter (@rkdontha1), or via email: rkdontha AT DigitalTransformationPro.com