The day when the computer becomes a data scientist
Relax, that day has not come yet. No computer is threatening your position, and you can skip the help-wanted ads in the newspaper for now (who reads newspapers today anyway?!).
So, when should data scientists start worrying about losing their jobs? Probably not in the coming years, but they should be aware that parts of their work can be done by a computer algorithm.
Sounds a bit surprising, right? After all, data scientist is one of the most popular and in-demand jobs in the tech industry. So why am I so pessimistic about its future? Why am I writing an article that may discourage those who are enthusiastic about becoming data scientists?
Before answering that, let’s briefly discuss what a typical data scientist is: a person who can study, analyze and interpret huge amounts of data using mathematical and statistical methods as well as Machine Learning and Neural Network algorithms. With these methods, a skilled data scientist can make predictions in almost every field.
The data scientist usually starts every project by digging into the data (using charts, scatter plots, histograms and other visual tools), then cleaning it by dropping irrelevant variables and filling in missing values, a step also known as preprocessing. The next step is choosing the right classifier or regression method, followed by picking the right features in the data in order to get the most accurate prediction. In between, the data scientist tests different combinations of classifier parameters to obtain the most accurate and efficient prediction mechanism.
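To make the preprocessing step a bit more concrete, here is a minimal sketch in plain Python. It assumes the data arrives as a list of dictionaries, that missing values are marked with `None`, and that filling gaps with the column mean is acceptable; the function name `preprocess` and the toy columns are made up for illustration only.

```python
from statistics import mean

def preprocess(rows, irrelevant=()):
    """Drop irrelevant columns, then fill missing values (None) with the column mean."""
    # Keep only the columns that were not flagged as irrelevant.
    columns = [c for c in rows[0] if c not in irrelevant]
    # Compute each kept column's mean from the values that are present.
    # Assumption: every kept column has at least one non-missing value.
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None)
        for c in columns
    }
    # Build cleaned rows, substituting the column mean for every missing value.
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]

# Hypothetical toy data: "id" carries no predictive information.
raw = [
    {"id": 1, "age": 30, "income": 40.0},
    {"id": 2, "age": None, "income": 50.0},
    {"id": 3, "age": 50, "income": None},
]
clean = preprocess(raw, irrelevant=("id",))
```

In a real project the choice of which columns are irrelevant, and how to fill the gaps, is exactly the kind of judgment call the data scientist makes after exploring the data.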
All of these steps and methods demand strong analytical and comprehension skills from the person who applies them, and right now it doesn’t look like a computer can do all of them better than a human being.
Nevertheless, the computer plays an important role in many parts of the data scientist’s projects. A good example is cross-validation in the Model Selection module, where an algorithm ‘finds’ the best classifier or the best classifier parameters. That saves time for the scientist, who can skip this part and focus on other challenges.
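For readers who have not seen it in action, the idea behind cross-validated parameter search can be sketched in a few lines of plain Python. Libraries such as scikit-learn ship ready-made versions of this in a `model_selection` module (for example `GridSearchCV`); the sketch below is not that implementation, only an illustration of the principle. The one-dimensional k-nearest-neighbours classifier, the toy data and all function names are assumptions chosen to keep the example short.

```python
from collections import Counter

def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, k, n_folds=5):
    """Average accuracy of k-NN over n_folds train/validation splits."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample (offset by `fold`) forms the validation set.
        valid = data[fold::n_folds]
        # All remaining samples form the training set.
        train = [d for i, d in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in valid)
        scores.append(hits / len(valid))
    return sum(scores) / n_folds

def select_best_k(data, candidates, n_folds=5):
    """Return the candidate k with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(data, k, n_folds))

# Hypothetical toy data: one numeric feature, two classes.
data = [(x, "low" if x < 10 else "high") for x in range(20)]
candidates = [1, 3, 5]
best_k = select_best_k(data, candidates)
```

The scientist only supplies the list of candidates; the loop over folds and parameters, which is the tedious part, is left entirely to the machine.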
Another potential ‘data science’ task that the computer will be able to do in the near future (or maybe already does in some existing algorithms) is the initial scan and analysis of the data it receives. Using deep learning, the computer will be able to decide which parameters and features are important and which are not. It could even create histograms and scatter plots for every combination of features and, by image recognition, determine the type of dependency between one feature and another.
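A very modest version of this automatic scan can already be written today. The sketch below ranks features by the absolute value of their Pearson correlation with the target. Note that this is a deliberately simple stand-in, not the deep learning or image recognition approach imagined above, and it only detects linear dependencies. The feature names and numbers are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has zero spread, so treat it as uncorrelated.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def rank_features(features, target):
    """Sort feature names by |correlation| with the target, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: which column helps predict the exam score?
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [42, 38, 44, 38, 42],
    "constant": [7, 7, 7, 7, 7],
}
exam_score = [52, 58, 61, 70, 74]
ranking = rank_features(features, exam_score)
```

Here `hours_studied` comes out on top, `shoe_size` far behind, and the constant column last, which is the kind of first-pass triage a human would otherwise do by eyeballing scatter plots.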
Now, think about how many companies are using big data analytics, and how many different types of classifiers and regressions have been developed in so many fields so far. It is quite possible that there is an ML algorithm for almost every new need. Think about the option of storing every developed ML algorithm in a database, so that whenever someone without data science skills needs to study and learn from data, the computer chooses the best classifier or analysis method for their needs from the database!
I guess you understand where I’m going with this: almost every part of the data scientist’s work can be, or will be able to be, done by a Machine/Deep Learning algorithm, or in other words by Artificial Intelligence (AI), and it really should not surprise us. AI, which is one of the most important application and research fields within the tech industry, is taking over, and yes, data science plays a super important role in developing it! AI is responsible for many tasks that machines can do by themselves. Who would have thought a few years ago that computers would be able to code themselves? But thanks to AI they can, and they will improve with time. Computers are replacing and will continue to replace humans in many fields, so it’s probably a matter of time before a computer will also be a data scientist.
Think about it! All you would need to do is give the computer the data you have and tell it what you want to know based on this data (see Fig. 1). The computer will have a simple, friendly interface that receives your data (you can just paste it from your Excel file), and with a smart AI algorithm it will give you the most accurate answer or prediction!
Fig. 1: A scheme that describes the steps needed for an AI machine to provide a prediction. In the described scenario, all available (unprocessed) data is given to the computer along with the feature we want to know or predict. These two inputs will be processed by a sophisticated AI algorithm, which will provide the most accurate prediction based on the data. The algorithm will basically do all the data scientist’s tasks.
Sounds awesome for the tech companies, but not for the future data scientists…
Ok, wait! Don’t panic, dear data scientists (or data scientists to be)! Your future is not that bad! I know I described a pessimistic scenario for data science, but I want to clarify that I was talking about the current tasks and projects of data scientists, which will probably be performed by the computer. I did not say that the data scientist career will ‘die’. It is quite possible that it will evolve into a different ‘line of work’: perhaps the future data scientists will ‘supervise’ the computer and monitor its operations when it processes the data, perhaps they will develop new AI methods, or will focus on studying new theoretical and mathematical models (or perhaps they will do something else…). What is certain is that the future data scientists will deal with different tasks and challenges. I am not worried about the ‘transformation’ they will have to undergo. I believe they are skilled enough to learn and adapt to the ‘new’ data science future.
Whether you like it or not, future data science will be quite different from the present one. That is, as mentioned, thanks to AI, which ‘kills’ many occupations and changes professions, but at the same time ‘creates’ new positions in the tech industry. Data science (if it will still be called that) is one of the occupations that will probably not ‘die’ but will change, and it will also have a crucial role in future AI technology.
What will be the role and the new duties of the future data scientists? We can only guess or speculate (or develop an AI algorithm that will be able to predict that).