Over the past few years, the internet has been inundated with thousands of articles proclaiming the new age of data and how it interacts with and drives artificial intelligence (AI). As a result, the three terms data science, machine learning, and deep learning have transitioned almost overnight from buzzwords to standard vocabulary, and have become synonymous with the direction in which society is moving. But how many people can really articulate the differences between those sacred terms?
In the olden days, it was called statistics. But it has since morphed and grown, like Thanos’s chin, into ‘data science’. Today, top-flight universities offer degrees in it, and everyone is calling it a career path that will never fail.
The first recorded reference to ‘data science’ was by Peter Naur, a Danish computer pioneer, at the White Hart Tavern in 1960, when he used it as a replacement for the term ‘computer science’. Actually, the White Hart Tavern part is not true; he was probably in his office talking to a grad student, but it still makes for a good story.
One of the more modern references to it was by C. F. Jeff Wu in his 1997 lecture at the University of Michigan, “Statistics = Data Science?”. In Dr Wu’s universe, data science moved somewhat past traditional statistics, combining the trio of data collection, modelling and analysis and, significantly, decision making.
Turing Award winner Jim Gray looked at it in 2007 as a ‘fourth paradigm of science’, augmenting the normal scientific methods by including the dimension of ‘data-driven analysis’. But it was not until 2012, when the Harvard Business Review published its article “Data Scientist: The Sexiest Job of the 21st Century”, that things really began to take off.
In the end, it is hard to describe succinctly and completely what a data scientist does, because the job crosses over into the artificial intelligence area. But it certainly starts with some hardcore statistical concepts and a really solid knowledge of the programming language Python, which has many libraries and functions that make statistical analysis easier.
At the same time, it goes beyond statistics. Instead of just collecting and analyzing data using tried and true statistical methods, the data scientists ask the all-important question: What if?
What if we looked at the data from a different perspective? What if we extended the modeling that our test data has given us in a number of independent ways? What if we let a machine analyse the data with no rules to guide it? What will it show in terms of relationships?
In the end, it is the input generated by the data scientists that will feed the two tools they use to turn data into decisions: machine learning and deep learning.
For machine learning, Wikipedia says it best:
“Machine Learning is a field of computer science that uses statistical techniques to give computer systems the ability to learn . . . without being explicitly programmed.”
Much of the data intelligence work prior to this required extensive programming to help the computer account for every eventuality. And, of course, those efforts were doomed from the beginning because not every eventuality can ever be considered.
Machine learning is different. In this case, a framework is set up that feeds statistically relevant information into the computer and lets it make decisions, lets it learn, from that data, rather than from programmer instructions. In other words, machine learning lets the computer discover what is probably true and what is probably false on its own based on data provided.
The more data fed in and the higher the quality of that data, the more the machine will learn. And when it is done learning, data can be fed in and decisions are spat out by the machine.
A relatively simple example of a machine learning system is the spam filter attached to email inboxes. By looking at the various words that make up the email, and evaluating the probability of a given word or group of words being a danger, it is able to make a decision as to what should be filtered out and what should be left in.
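The word-probability idea behind such a filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a naive Bayes classifier with add-one smoothing, which is one common way to build a spam filter but not necessarily what any particular email provider runs. The four training messages and the names `train` and `spam_probability` are made up for this example.

```python
import math
from collections import Counter

# Tiny hand-made training set (hypothetical examples, not real data).
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def train(examples):
    """Count how often each word appears in spam and in legitimate mail ("ham")."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def spam_probability(text, word_counts, label_counts):
    """Estimate P(spam | words) with naive Bayes and add-one smoothing."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # Start from the prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_messages)
        words_in_label = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps an unseen word from zeroing the product.
            count = word_counts[label][word] + 1
            score += math.log(count / (words_in_label + len(vocabulary)))
        log_scores[label] = score
    # Turn the two log scores back into a probability for "spam".
    highest = max(log_scores.values())
    weights = {k: math.exp(v - highest) for k, v in log_scores.items()}
    return weights["spam"] / sum(weights.values())

word_counts, label_counts = train(TRAINING)
print(spam_probability("claim your free money", word_counts, label_counts))
print(spam_probability("agenda for lunch", word_counts, label_counts))
```

Nothing in the code says that ‘free’ is suspicious or that ‘agenda’ is harmless. Those judgments come entirely from the counts in the training messages, so feeding in more and better examples changes the decisions without anyone rewriting the program.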
Machine learning is still in elementary school. With the spam filter, for example, we have all seen cases where legitimate emails are marked as spam and where messages that should have been caught are not. But most of the time, it is close enough to the target. And as more and more spam filters start using true machine learning, that should improve.
Of all the words encountered in this article, none is more forbidding, more laden with the unknown, more likely to send a terrifying chill down the length of one’s back, than deep learning. Is this indeed what will unleash SkyNet on this unsuspecting world? There are some people who would swear it will. But it will not. Or at least not yet. Not for a few years.
Again, Wikipedia nails it by calling deep learning:
“part of a broader family of machine learning based on learning data representations as opposed to task-specific algorithms.”
Deep learning is part of machine learning, but it is not built around an algorithm hand-crafted for one specific task. It is not used to define a system that will tell one what city in each state is the capital. Or the largest. Or the most fun. Instead, it learns its own representations of the data.
Deep learning comes in several architectures, among them deep neural networks, deep belief networks, and recurrent neural networks. And it has been applied to things as freaky weird as computer vision, speech recognition, natural language processing, audio recognition, social network filtering (something that is way overdue), etc. Broad things for a broad approach.
As noted above, deep learning is a subset of machine learning, a subset that is focused on two things.
The first is patterns rather than rules or facts. The machines are taught to look for patterns or even just portions of a pattern.
The second is mimicking the behaviour of neurons, particularly those in the neocortex of the brain.
What difference does this make? One of the hallmarks of neurons in the human brain is that none of them works alone. There is not, for example, a neuron that is responsible for recognising a dog versus a cat. Instead many neurons will work together, each one perhaps only responding to one very small part of the patterns the brain has for ‘dog’ and ‘cat’. But working together they are able to reach a consensus on whether it is a dog or cat.
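The ‘many small voters’ idea can be illustrated with a toy network in Python. This is a sketch under loud assumptions: the four yes/no features, the weights, and the names `neuron` and `dog_score` are all invented for illustration, and the weights are set by hand, whereas a real deep learning system learns them from data and uses far more units and layers.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs passed through a sigmoid."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)

# Hypothetical input features, each 0 or 1:
#   [barks, wags_tail, purrs, retracts_claws]
# Each hidden neuron responds to just one small part of the pattern.
# The weights are hand-picked for illustration, not learned.
HIDDEN_LAYER = [
    ([4.0, 0.0, 0.0, 0.0], -2.0),  # fires when the animal barks
    ([0.0, 4.0, 0.0, 0.0], -2.0),  # fires when it wags its tail
    ([0.0, 0.0, 4.0, 0.0], -2.0),  # fires when it purrs
    ([0.0, 0.0, 0.0, 4.0], -2.0),  # fires when it retracts its claws
]

# The output neuron pools the votes: the first two push towards 'dog',
# the last two push towards 'cat'.
OUTPUT_WEIGHTS = [3.0, 3.0, -3.0, -3.0]
OUTPUT_BIAS = 0.0

def dog_score(features):
    """Return a value near 1 for 'dog' and near 0 for 'cat'."""
    hidden = [neuron(features, weights, bias) for weights, bias in HIDDEN_LAYER]
    return neuron(hidden, OUTPUT_WEIGHTS, OUTPUT_BIAS)

print(dog_score([1, 1, 0, 0]))  # dog-like input
print(dog_score([0, 0, 1, 1]))  # cat-like input
```

No single unit here knows what a dog is. Each one reports on a fragment, and the answer emerges only when the output neuron combines those reports.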
That is what deep learning is working on. It requires tremendous computing power, plus an in-depth understanding of how the brain works, something that is still being studied and an arena where knowledge is constantly growing and changing.
In the End
Data science is rooted in statistics, but data scientists go beyond just linear regressions. Remember? They are sexy. The new data science goes beyond analysis to prediction, and looks at data in ways that the traditional techniques do not. And the main reason for this is not a breakthrough in the mathematics but the adoption of powerful computers to quickly run analyses that would have been impractical in the past.
Machine learning is about using data to let the computer learn on its own. Sometimes this learning is directed, as when rules are included or other parameters are set which guide how the machine makes its decisions, and sometimes it is undirected, as when the machine digs into the data and sees what it can find. It is all about separating intelligence from programming. In the machine learning world, the data is the teacher, not the coder.
Deep learning is a subset of machine learning. It is based on data, but it uses a particular algorithm type, one that acts similarly to the neurons in a human brain. Will it result in a positronic brain and the three laws of robotics? Hard to say. But it is the future.