It feels good when people appreciate your work. Usually it’s as simple as a text with a couple of 👍 or ❤️ or even sharing my previous blog post on their WhatsApp status. Little things like that go a long way in motivating people. So if you’re reading this and just remembered someone who deserves your appreciation, go ahead, send them a text or shoot them an email. It means a lot.
Anyways, here I am again, curiously, with yet another blog post titled “The Why of”. I think the idea of Start with the Why is really getting to me ‘cause I’m already petting an idea called The ‘Why’ of Religion (God save me). The reason I decided to write this despite the abundance of learning material on ML is that there is still, to a degree, a lack of clarity as to why machine learning is powerful. You see, machine learning has become a powerful hammer these days and to someone with a hammer, everything looks like a nail. To avoid this, it is key to understand the ‘why’ of machine learning, i.e., the primary challenge ML solves. This is what we will be discussing in this post.
So let’s get started
“Once more unto the breach, dear friends, once more”
- William Shakespeare
Let’s begin with the most basic definition of a computer program.
From a user’s perspective, every computer program can be explained using 3 things:
- Stuff you have
- Magic
- Stuff you need
You take the stuff you have and give it to a computer program which does magic on it and returns the stuff you need. For example, you need to add 2 numbers, say 32 and 10. Here, 32 and 10 are the stuff you have. You feed them to a program which conjures up some unknown magic and returns 42, the stuff you need.
This may seem like a very simplistic explanation. But every program you know, no matter how complex it seems, can be broken into simpler sub-programs that consist of these 3 parts.
From the programmer’s perspective though, things are far less magical because it is their job to make the magic happen. When a user tells him that he has some “stuff”, and that he needs some “other stuff” it is the programmer who decides how to do this. In fact, around the world, that’s essentially what programmers do.
Let’s look at some actual code. Here we have the Python code to take two numbers and display their sum.
    stuff_he_has_1 = int(input())  # Receive the first number (input() returns text, so convert it)
    stuff_he_has_2 = int(input())  # Receive the second number
    stuff_he_needs = stuff_he_has_1 + stuff_he_has_2  # Magic
    print(stuff_he_needs)  # Show the sum (the stuff he needs)
Note that the crucial bit of the above code is line 3, where the actual addition happens. This is the “magic” that happens unknown to the user. The rest of the code is to take in data from the user and display the result.
Technically, stuff he has, magic and stuff he needs can be called input, process and output respectively. And from now on, we shall use those terms. Input is what goes into the program, output is what comes out of it and process converts input to output. Simple enough, right? So now, we can establish the following facts.
- A computer program can be conceptually explained using 3 things: Input, Process, Output
- Processing is the series of steps that convert input to output
- It is the duty of the programmer to define the correct process.
“These are the facts of the case, and they are undisputed”
- A Few Good Men (1992)
But, what if …
…the programmer cannot define the process ¯\_(ツ)_/¯
There are scenarios where the steps to convert input to output are not cut-and-dried.
Consider a situation where a user says she has the specifications of a mobile phone she wants to buy. She wants an estimate of the price that she will have to pay for such specifications. For example, the user says she wants a phone with 4GB RAM, dual camera, fingerprint sensor etc., and she asks you for an estimate of how much she’ll have to spend on it.
- The inputs are the features of the phone (stuff she has)
- The output is the estimated price (stuff she needs)
This is a trivial problem if you know the exact formulae used by whoever sets the price (the phone companies). If you know exactly how much each feature adds to the price of a phone, you can easily write a program for that. But you don’t know that. If you are someone who has been following cell phone prices, you might be able to give a ballpark estimate. But even then, making a computer program replicate it is quite difficult, because we did not calculate it, we guessed. We gave an estimate. Thus writing an equivalent line 3 for this problem is going to be awkward at best.
Now let’s consider another problem. The user gives you a word and wants to know the part of speech (Noun, adjective, verb etc) of the word.
- The input: A single English word
- The output: Part of speech
You can probably do this, but can you write a program that does the same? When you see a word like “apple” and identify it as a noun, you are able to do so because “apple” means something to you beyond just a sequence of letters. You know it’s a thing. You know it’s most probably red. You know it’s sweet. None of this information can be picked up from the sequence of letters A-P-P-L-E. But that information is the only reason you are able to identify apple as a noun. That information is necessary to cast the “magic”. The problem is, the computer does not have this information. To it, the word “apple” or “monkey” is just a sequence of characters which holds no meaning. If I have a file that contains the word “banana” and I search for “fruit”, I’m not going to find a match, because neither “banana” nor “fruit” holds any meaning to the computer like it does for a human. Given this situation, it is difficult to write a ‘line 3’ (remember the Python code?) for this problem.
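To make the “banana” versus “fruit” point concrete, here is a minimal sketch of a literal text search. The file contents and the `contains` helper are made up for illustration; the file is held in a string so the snippet runs on its own.

```python
# Hypothetical file contents, kept in a string so the sketch is self-contained.
file_contents = "I ate a banana for breakfast."


def contains(text, query):
    """Return True if `query` appears in `text` as a literal run of characters."""
    return query.lower() in text.lower()


found_banana = contains(file_contents, "banana")  # same characters, so it matches
found_fruit = contains(file_contents, "fruit")    # the meaning is invisible to the search

print(found_banana, found_fruit)
```

The search only compares characters, so it has no way of knowing that a banana is a fruit.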
Even though at first sight these two problems seem different, they suffer from the same challenge: the inability to express the output in terms of the input. In the first problem, we do not know the steps required to convert the features of the phone to an estimate of the price. In the second problem, we don’t know the steps to extract the meaning of a word from the sequence of letters that form the word. And without those steps (line 3) it is impossible to construct a computer program for the task.
That, right there, is the “why” of machine learning, because machine learning allows you to programmatically generate your ‘line 3’.
Now that we have established where ML does its best job, we can examine it a bit more closely. As we said earlier, the problem is coming up with the process part, the line 3. The solution here is to let the machine come up with the process on its own, i.e., build a process that generates processes 🤯. Process-ception, I know, but bear with me and take a glance at the diagram below.
Let us consider a very simple example. I know we’ve already had quite a few of those, but I promise this will help with the intuition.
See the input-output pairs below
- (10, 15) -> 25
- (5, 5) -> 10
- (100, 100) -> 200
Now guess what goes in the blank
- (20, 40) -> _______
This is an easy guess, 60. By looking at the 3 examples shown, you learned that the process here is to add the two input numbers. And once you knew that, you simply applied the process to the new input to get your required output. It is important to note that the input (20, 40) never appeared in the examples provided to you. And yet you were able to arrive at the correct answer because you did not remember example inputs and the corresponding sums, you learned the process by which the input was converted to output. This is the crux of ‘how’ machine learning works.
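The guessing game above can be sketched in Python. This is only an illustration, not how any particular ML library works: the name `learn_process` is made up, and it assumes the unknown process is a weighted sum of the two inputs, then picks the weights that best fit the examples (ordinary least squares, solved by hand for two weights).

```python
def learn_process(examples):
    """Return a function fitted to ((a, b), output) examples.

    Assumption for this sketch: output = w1 * a + w2 * b for unknown weights.
    The weights come from the 2x2 least-squares normal equations.
    """
    s_aa = sum(a * a for (a, b), _ in examples)
    s_ab = sum(a * b for (a, b), _ in examples)
    s_bb = sum(b * b for (a, b), _ in examples)
    s_ay = sum(a * y for (a, b), y in examples)
    s_by = sum(b * y for (a, b), y in examples)

    det = s_aa * s_bb - s_ab * s_ab
    if det == 0:
        raise ValueError("the examples do not pin down a unique process")

    w1 = (s_ay * s_bb - s_ab * s_by) / det
    w2 = (s_aa * s_by - s_ab * s_ay) / det

    def process(a, b):
        # This is the generated 'line 3'.
        return w1 * a + w2 * b

    return process


examples = [((10, 15), 25), ((5, 5), 10), ((100, 100), 200)]
process = learn_process(examples)

answer = process(20, 40)  # (20, 40) never appeared in the examples
print(round(answer))      # 60
```

Notice that nobody typed `a + b` anywhere. Both weights come out as 1 purely from the three examples, which is the “process that generates processes” idea in miniature.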
The example is trivial to us humans, but the human brain is a fine piece of machinery. Making machines do this is really difficult. In fact, most of the research that goes into machine learning these days is really about making computers do things that 5-year-olds can do easily.
Let’s go back to the cell phone price prediction again. Let’s say you have a list of phones along with their features. But now you also have a list of the corresponding prices. So in one column you have the features of the phone and in the other column you have the price of the phone. If you can figure out from these columns the process by which the input features are converted into output prices, you’re practically sorted.
- Input: Mobile phone features, mobile phone price
- Output: The process that converts features to price
It’s okay if that sounds confusing. It is even okay to feel that it is cheating, because if you know the prices of phones, why do you need a program to find them for you, right? But remember our example. The process is not simply remembering each phone and its price, it is learning to convert the features of a phone into a price estimate. This means that it can predict the price of a phone which it has not seen before.
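Here is a rough sketch of the difference between remembering and learning. Everything in it is an assumption made for illustration: the numbers are invented toy data (not real phone prices), a phone is reduced to a single feature (RAM in GB), and the learned process is assumed to be a straight line.

```python
# Invented toy data: (RAM in GB, price). These are NOT real market prices.
known_phones = [(2, 200.0), (4, 300.0), (8, 500.0)]

# Approach 1: remember. Only works for phones that are already in the list.
memorised = dict(known_phones)


# Approach 2: learn a process, here a line: price = slope * ram + base.
def fit_line(points):
    """Fit a straight line to (x, y) points with simple least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    base = mean_y - slope * mean_x
    return lambda x: slope * x + base


estimate_price = fit_line(known_phones)

unseen_ram = 6  # no 6GB phone in the list
print(memorised.get(unseen_ram))          # None: nothing to recall
print(round(estimate_price(unseen_ram)))  # an estimate from the learned line
```

The lookup table has nothing to say about the 6GB phone, while the fitted line still produces an estimate. Real phones have many more features and messier prices, so treat this only as the shape of the idea.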
Machine learning algorithms are responsible for figuring out the process that converts inputs to outputs. This is ‘what’ ML does and there are numerous such algorithms in existence today. But I will not discuss the workings of any of those here. The idea of this article is to give an intuitive feel for the essence of machine learning. What we discussed here might be something that you want to keep in the back of your head when reading about actual machine learning algorithms.
Before I wrap up, there is a little disclaimer that I wish to add. The analogies that I presented in this article are suited to a particular class of machine learning algorithms called supervised machine learning, where both the inputs and the outputs are available. There is another class of algorithms, called unsupervised machine learning, where only the inputs are available. The analogies may seem a bit awkward when you encounter those, but the core idea remains the same: the process is not explicitly programmed but generated from data. That’s a little something to keep in mind.
That’s all folks
I hope this article was worth a read. I have tried my best to distill my interpretation of ML in this article, but again, a writer is never content with his words. There always seems to be a gap between what you were trying to tell the reader and what you actually wrote. I hope that gap wasn’t too big. If you have any feedback or any questions that arose after reading this post, please shoot a mail to firstname.lastname@example.org.
Goodbye until next time.