Tuesday, August 30, 2011

Delving into Theory!

Disclaimer: theory is not my forte, since I did not study math as an undergrad. This post may be biased toward what I think theory means in the specific area of Machine Learning.


Do you like that iPod you have? Do your smartphone apps keep you busy and entertained? How much do you use a computer for your daily work? Do you know whom we owe the computer and all these electronic devices to?
Quantum Mechanics (QM)! Materials engineers have to take several courses on QM to create transistors, which electrical engineers then use to build computers (roughly speaking), and computer scientists use these computers to create algorithms and spawn companies like Google, Facebook, or Foursquare, thus creating worldwide connectivity and interaction. All of this because of QM.

As you can see from the example, a pure theory like QM spawned an enormous amount of wealth and applications. We can say the same about Machine Learning. Let's take an example: Support Vector Machines (SVMs).

There are two ways to do research with SVMs: one is to do research on SVMs, and the other is to do research using SVMs. In the latter case, you will probably be an application man. But in the former, you will deal with the deepest construction of the SVM. You'll care not only about how it works, but also about how to improve it: which kind of kernel you have to use and, in the extreme, how to design an entirely new kernel for the data you have.

Kernel? What's a kernel? It's safe to assume that some people who use SVMs have no idea what a kernel is or means. But they do know SVMs are pretty good at classification. SVMs are one of the (few) algorithms that you can take off the shelf and run over your data without pre-processing or knowledge of the algorithm... a magic box that classifies.
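
To make the "magic box" point concrete, here is a minimal sketch of that off-the-shelf usage. I'm assuming Python with scikit-learn and its built-in iris toy dataset; no kernel knowledge is needed at all:

    # Off-the-shelf SVM: fit, predict, done -- no kernel knowledge required.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = SVC()                       # default settings: the "magic box"
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # classification accuracy on held-out data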

But if you know what a kernel is, you'll have incredible tools at your disposal. Knowing that, you can now choose the best one (I said choose, not guess) for the data you are using. You can also go as far as trying to increase the classification capabilities of the SVM, thus obtaining incredibly good results. The catch... you'll probably spend more time modifying the SVM than analyzing your data. But in any good journal or conference on Learning Theory, they'll care little about the test data and a lot about the algorithm. (Note to myself: it is no use mentioning these qualities of your algorithm at an application conference; most of them would not care about the algorithm, only the results.)
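
As a rough sketch of what "choose, not guess" can look like in practice (again assuming scikit-learn and the same toy data as above), you can compare the standard kernels by cross-validation instead of picking one blindly:

    # Compare the standard kernels by cross-validation and keep the best one.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    for kernel in ["linear", "poly", "rbf", "sigmoid"]:
        scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
        print(kernel, scores.mean())  # the kernel with the best mean score wins

Of course, knowing the theory tells you in advance which kernels are even worth putting in that list for your particular data.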

As you can see, when you do research ON an ML topic, you do not care deeply about the application at hand. Rather, you care about the algorithm's performance, or about the best way to represent specific data: images, characters, sounds, etc.

Doing research in ML theory, as I mentioned in the last post, is deep and hard work. You will rarely delve into a specific application, but you'll know tons of different theories. It's also likely that you'll spend most of your time reading books rather than writing actual programs or running simulations. At best, you'll be doing it 50/50.

Note: In a previous post, someone asked me to provide references for the claim: "While they (Theory man) go over 200 year old proofs, you'll (application man) go over 10 year old proofs, and while their proofs are 20 pages long, yours will be about 1 page (average)."
I really do not have time to go over different applications, their theories, and proofs. But bear in mind that while most proofs in Machine Learning papers fit within a 20-page paper, the proof of Fermat's Last Theorem, announced in 1993, runs over 100 pages. If you read Bishop's book on Machine Learning, you'll find that most of the basic principles have "easy" proofs of about half a page.


Next time, we'll go over the dreadful task of writing papers: when to do it, and why to do it.

Until next time
