The Machine Learning Journey

Monday, February 27, 2012

An introduction to linear regression - Cost Function (ML for the Layman)

I've tried keeping away from posting tutorials on ML topics. Mainly because I did not feel well prepared to do it yet. I hardly think I'm prepared now, but I definitely can give it a better shot, and with your feedback, I can ,at least, get an idea on where I can improve.

Disclaimer: Even though this introduction will be basic, and I mean as basic as it can get. You'll still need some knowledge on matrix operations, basic algebra and calculus to get through some of the explanations.

Imagine you want to sell your car, let's say it is a Prius 2007 with 20,000 miles. It is in a very good condition and you would like to do a survey of how much does your car costs in the market. You definitively want to get, as a seller, the best price for it.

So, how do you price a car? If you know nothing about cars (like me), you can go to someone that has a better idea, in our case, it is the internet.

We can see that the price is set by things, like the age, the maker, the mileage, the overall condition, etc. We will call this things "features". So a set of features is basically the characteristics of our car or any object.

So which features to choose? For the sake of simplicity let's choose year and mileage. It's important to notice that there are whole researches around how to choose features, but we are first learners here, so we do not care about that. It is important to know, however, that having many features is not always better than having few features.

Now, that you have chosen a set of features, how do you compare one car to another? We now go and look for data. For car comparison, we can go to different web sites where you can see different combinations of these features for different cars. Your local newspaper's classifieds or craiglist.

Let's create a mock data set of 5 cars using only 2 features, age and mileage. All of them are Prius:

$year=(2007,2005,2006,2007,2010)$
$mileage=(50000, 60000,54000,40000,20000)$
$price=(12000,9700,10500,13000,20000)$

It is intuitive to think that the price of our car will be directly related to age and mileage. A low mileage and a recent year increases the price, while an old car with a high mileage has a lower price. So there should be an abstract way to write this relation.

To model this kind of data, we use linear regression, which states that a variable is the resutl of a linear combination of other variables. That is, our price actually obeys to some combination of mileage and year.

For the first element (12,000 USD for a 2007 model with 50,000 miles):

$12000=a_1*(50000)+a_2*(2007)$

A second characteristic, is that every element has to share those $a_1$ and $a_2$ variables, so we can use those values with our own Prius, there would be no point in describing individual values for each car, when we want to find the best price for our car, so we can write a general equation:

$Price=a_1*mileage+a_2*year$

How do we find the $a_1$ and $a_2$ that solve this model? How do I know that "1" and "1" are not good choices? Well, for starters, summing up the year and the mileage does not seem like a good idea.

We need some function that'll tell us how bad or how good our prediction is. This is usually called the "cost" function. How do we build a cost function, well, our intuition tells us that we have to compare (rest) the truth with our guess. So for our guess of "1" and "1" for $a_1$ and $a_2$:

$Cost=(12000-(1*50000+1*2007))$

We can see there is something off here, since we are looking for the minimum cost, we can make the values in $a$ large enough and we will have incredibly low values (negative values), so we squared them to have a nice function that has a lower bound (it does not go lower than certain value).

$Cost^2=(12000-(1*50000+1*2007))^2$

We now have an intuition that the best we can do is $cost=0$, since a squared number can never be negative, we cannot do any better than that.

This cost, however, is only the cost for 1 example, we need the cost for all our cars. So we sum all of them:

$Total cost^2=\sum_{i=1}^5(Price_i-(1\times Mileage_i+1\times year_i))^2$

We now have the intuition that our choice of 1 and 1 may not be a very good choice at all, just for kicks, lets put how much the cost would be for different values for $a_1$ and $a_2$:

a1,a2

Cost

0,0

917,340,000

0,1

675,707,259

1,0

6,595,340,000

1,1

7,252,615,259

The costs are terrible! There is, however, something happening for "0,1", since the cost decreased compared to "0,0". But there has to be something better than just random guessing right?

Our guess of 1,1 is just terrible. But we gained an intuition, we see that our solution has to be near the "0,1". Also, some of you may notice by now that no matter which values do we put, there is no way we are going to get a zero. It's just not possible. Not without some extra help, which we will call an offset.

Next time, we'll talk about optimization or how to stop you from trying everypossible combination by hand, how to use the offset and a bit about comparing pears and apples.

See you later

Remember to visit my webpage www.leonpalafox.com. And if you want to keep up with my most recent research, you can tweet me at @leonpalafox.
Also, check my Google+ account, be sure to contact me, and send me a message so I follow you as well.

Thursday, January 26, 2012

Geoff Hinton Memes

A week ago, Yann LeCun started a ML related meme, by writing the Geoff Hinton facts (ala Chuck Norris Facts).

I'm writing them here as they appear, and if you have more, please send me yours:

Geoff Hinton doesn't need to make hidden units. They hide by themselves when he approaches

Deep Belief Nets actually believe deeply in Geoff Hinton.

Geoff Hinton uses an infinite amount of training data for each experiment - twice.

Others prove theorems. Geoff Hinton proves axioms.

Geoff Hinton once built a neural network that beat Chuck Norris on MNIST.

Geoff Hinton discovered how the brain really works. Once each year for the last 25 years.

Markov random fields think Geoff Hinton is intractable.

If you defy Geoff Hinton, he will maximize your entropy in no time. Your free energy will be gone even before you reach equilibrium.

Geoff Hinton can make you regret without bounds.

Geoff Hinton can make your weight decay(your weight, but unfortunately not mine)

Geoff Hinton doesn't need support vectors. He can support high-dimensional hyperplanes with his pinky finger.

Geoff Hinton frequents Bayesians.

Most farm-houses are surrounded by nice fields. Geoff Hinton's house is surrounded by mean fields.

All kernels that ever dared approaching Geoff Hinton woke up convolved.

The only kernel Geoff Hinton has ever used was a kernel of truth.

After an encounter with Geoff Hinton, support vectors become unhinged

Geoff Hinton's generalizations are boundless.

Geoff Hinton goes directly to third Bayes.

Never interrupt one of Geoff Hinton's talks: you will suffer his wrath if you maximize the bargin'.

Wednesday, September 28, 2011

The importance of socializing

Socializing is an odd term. People in Computer Science are usually regarded as socially incompetent, or at least plain out weird and introverted. Funny, though, that most of the modern tools (Twitter, Facebook, Google+) people use to interact were created by these socially impaired persons.
But, then again, people from CS are usually the ones that post more actively and have many friends in Facebook and Twitter. So, they do socialize.....

Yet, we are not here to point out this oddities, but to speak on the importance of socializing (which is different from team work) in a research environment. Also, I'll address and recommend some particular sites fro the ML community.

I've found that I feel motivated when I discuss different topics with different people. And even if the problem is different, you still feel good. How many times have you discussed with your adviser or a colleague and found out something you did not consider. Sometimes our thoughts are different when we externalize them. A good idea may seem dumb when spoken out-loud, or a bad idea may have a touch of genius when others hear it.

Teaching something you like is another way to socialize. Prepare a presentation on some random topic, and you'll see you can learn a lot from that topic (if presented to the right audience). I highly endorse making the student present their work or a random topic to an audience. It forces them to understand the material, and in the process they learn a bit on expressing their ideas.

But in this modern age, we have a myriad of tools to achieve these interactions without getting out of our desks (which I have yet to decide whether is a good or a bad thing). We have social networks, where we can find good researchers, we have online lectures, were we can socialize with other people watching the lecture, and thankfully machine learning has been an early adopter as well as an active player.

On social networks, you can have different interactions depending the network you use. I find Twitter lists to be a great source of Machine Learning researchers, feel free to follow mine. Google+ recently releases a feature that allows the users to share their circles, and I have a somewhat good circle of Machine Learning Researchers (Andrew Ng, Nir Friedman and Yan LeCun among others). The important thing about these lists and circles are not only the researchers, but the enthusiastic community of grad students that post, discuss and share ideas and innovations.

Now, I've found that while great, these networks are not really suited for a deep theoretical discussion, and with the advent of QA sites like Quora, I think we have a better forum to externalize doubts and consultations on ML and other topics.

I think the top QA venue in Machine Learning is Metaoptimize (lets plug the ad here). It is a place, where most of the people are devoted Machine Learning students (unlike Reddit's Machine Learning sub-reddit, where most of the people are ML enthusiasts).

In Metaoptimize, you usually will get good answers for most of your questions, and if you don't, you get at least a link and a starting point to keep looking for an answer. People there have their fundamentals right, so if you have specific questions on the intuition and maths of a problem, you'll get your answer there. Most of the top contributors in Metaoptimize are also in my Twitter and Google+ lists.

Metaoptimize is definetily a step in the right direction, but sill, I think more can be done to share and to create a broader community, Andrew Ng and Norvig have recently opened free courses to the world, in which they teach ML and AI. They haven't started yet, but a hefty amount of students are rallying to them.

Our world is open, do not think that doing research in a desk involves only reading books and papers, socializing is a very important part of it, and one you should embrace.

That's everything for tonight

Remember to visit my webpage www.leonpalafox.com. And if you want to keep up with my most recent research, you can tweet me at @leonpalafox.
Also, check my Google+ account, be sure to contact me, and send me a message so I follow you as well.

Friday, September 16, 2011

Every time you write a bad paper an angel cries

"Writing is an art, every word has a specific meaning, and each expressions should hold an objective."

Scientific writing, unlike any other kind of writing has a very specific goal: To report your findings in a clear and concise way. Your opinion while important, has to leave most of the space for the cold, hard, facts. I read once that in writing, academics are like civil engineers, their work is to make sturdy structures that will have a solid foundation, leave the niceties and decorations to the architects.

But how do you know what to write? Usually a good paper's results can be replicated by an informed reader. (In the NIPS reviewers poll, that is one of the conditions). Thus, your paper needs enough information so a fellow researcher does not need to ask you any questions.

Sound easy right? Truth is, it isn't.

Sadly, most people thing that because they can speak, they can write. A large number of people, think that writing the way they speak is the way to do a research paper (or any online publication). However, clarity is a luxury we usually forget when we speak, for example, while saying that there was "a lot" of people in the market is a great expression when telling a story, in a research paper saying that they had "a lot" of data is non representative of the amount, and thus ambiguous.

A research paper, unlike speaking, allows us to rewrite our sentences several times. Remember that is impossible to ask for clarification when reading an article (unless you send an email), so you have to write as clear as possible. All those times you've had to clarify what you meant with a sentence when speaking, are dead sentences in writing. My rule of thumb is that a reader has to understand the paper without me sitting next to them.

But how do we learn to write well? If you are lucky, your advisor is still very prolific, and a lifetime of reading and writing papers have given him at least a good idea of how to structure a paper. With time, you too will have a hold of the most common rules of writing a research paper. Something I tend to do is check for verb consistency, is every noun's action being described by a verb, or not. If you have free nouns in your paper, you will have confusion. If you're not sure, keep yourself from writing complex statements, and instead hold to the basics.

There are other tools you can use to improve your writing. "The elements of Style" by Strunk and White and "On Writing Well" by Zinsser are 2 great books which offer great and basic advice on how to structure a good piece. While the books are mostly oriented towards people writing real literature (for me academic journals are more like technical reviews), they do help you to structure your sentences so they become less ambiguous (clarity, clarity, clarity).

My advise is to practice, you can try writing a blog, writing papers for small conferences, or even mock papers (there is no rule against them) and look for your mistakes, look for things you could write in a better way. Try asking people to read your documents and see how much sense it made to them. Really, I think you have to keep practicing over and over, until you get a firm grasp on paper writing.

Another really important thing is to review. If your paper has typos, it speaks ill of you and your research, it shows you as a sloppy author and thus a sloppy researcher. Take your time to re read your papers. I usually spend around 3 days writing a blog post, first I write it in what we could barely call English, then I re-read the entry and I try to give sense by taking out sentences, useless words and adding clarity. Then I use tools like Microsoft spell checking and style checking to look for the use of odd sentences and passive voice (extreme use of passive voice in a paper adds confusion and usually is better to write using an active form)

It is fair to say I spend 2 to 3 times more doing the reviewing than the writing of the piece, but I know it helps me to write better the next time. So I don't look at it as a burden, but rather as practice.

If you have any other advise on writing let the comment section hear them.

Remember to visit my webpage www.leonpalafox.com. And if you want to keep up with my most recent research, you can tweet me at @leonpalafox.
Also, check my Google+ account, be sure to contact me, and send me a message so I follow you as well.

Tuesday, September 6, 2011

Starting to write

What is the best way to start a book?

Many people will tell you that the best way is to gather experience and then write your book. Others will tell you to write as you gather experience. Truth is..... I do not know yet who is right. While it is true that a lot of great books are written after experience has been learn, it is also true that a lot of great books are written on the go.

So how does this translates to papers in Machine Learning?

For most graduate programs, you'll need to write Conference and Journal papers. In some universities, though, they are not really picky on which journals or conferences, as long as it's published. Others, do care to which conference do you go. Not to mention that your Professor probably has already a set of conferences where he is a regular, thus, asking you to write a paper for such conferences.

So, back to the basic question, do you write as you go, or do you write once you finished every experiment, theory and survey? (at the end of your PhD). Some people will tell you the former is the better, while others, will tell you that is good to have a ton of papers, since most committees will rarely look at the papers themselves, but just at the sheer number of publications.

I think a fair balance is the best policy, you do have to wait until you have good results to publish, but you also have to publish enough so you get to go to different conferences and get feedback from experts in the area you are working in, or at least a good networking. Be careful to remember that really good journal or conferences won't accept papers of half-done research or quick baked results.

Remember, though, that most journals will take their sweet long time to accept your paper, or even reject it. So, if you wait until the end to submit it, you'll definitively will have troubles getting your paper accepted by the end of your PhD. (We are speaking that some IEEE journals take around 7 to 8 months to give you any feedback)

Lastly and most important, when in Rome, do as Romans do. Regardless on your opinion, truth is, you are a simple student. Thus, if you prefer to wait, and your committee wants you to publish.....you'll either publish or go. On the other hand, if the University wants you to go to specific conferences (acceptance rates around 20 or 30 %), you will have to wait, even if you badly want to publish any incremental gain you''ve had.

Good luck writing your first paper, and in the next post, we'll talk about on how to write, and why it is very important for you to be extra careful when writing a paper.

Remember to visit my webpage www.leonpalafox.com. And if you want to keep up with my most recent research, you can tweet me at @leonpalafox.
Also, I've recently started using more and more my Google+ account, be sure to contact me, and send me a message so I follow you as well.

Tuesday, August 30, 2011

Delving in Theory!

Disclaimer: theory is not my forte, since I did not study a math undergrad. This post may be biased of what I think Theory means in the specific area of Machine Learning.

Do you like that iPod you have , does your smartphone apps keep you busy and entertained. How much do you use a computer for your daily work? Do you know whom do we owe the computer and all the electronic devices to?
Quantum Mechanics (QM)! Material Engineers have to take several courses on QM to create transistors, which electric engineers then use to create computers (roughly speaking), and the Computer Scientists use this computers to create algorithms and spawn companies like Google, Facebook or Foursquare. Thus creating worldwide connectivity and interaction. All of this because QM.

Like you could see in the example, a plain theory like QM spawned an enormous amount of wealth and applications. And we can say the same about Machine Learning. Let's put an example, Support Vector Machines (SVM).

There are 2 ways to do research with SVM's, one is to do research on SVM's and the other is to do research using SVM's. In the latter case, you probably will be an application man. But in the former, you will take care of the deepest construction of the SVM. You'll care not only on how it works, but also on how to improve it. Which kind of kernel do you have to use, and in the extreme, design an entirely new kernel to work with the data you have.

Kernel? What's a kernel? It's safe to assume that some people whom uses SVM's have no idea of what a kernel is, or means. But they do know SVM's are pretty good doing classification. SVM's are one of the (few) algorithms that you can take off the shelf and run it over your data without pre-processing or knowledge of the algorithm...... is a magic box that classifies.

But if you knew what a kernel was, you'll have incredible tools at your disposition. Knowing that, now you can choose the best one (I said choose, not guess) for the data you are using. You can also go as far as to try to increase the classification capabilities of the SVM. Thus obtaining incredibly good results. The catch...... you'll probably spend more time modifying the SVM than analyzing your data. But in any good journal or conference on Learning Theory, they'll care little about the test data, and car a lot on the algorithm. (Note to myself: is no use to mention these qualities of your algorithm in an application conference, most of them would not care about the algorithm, but the results)

AS you can see, when you do research ON a ML topic, you do not care deeply about the application at hand. Rather, you care about the algorithm's performance. Or you also care about the best way to represent specific data: images, characters, sounds, etc.

Doing research in ML theory, as I mentioned in the last post, is deep and hard work. You will rarely delve into a specific application, but you'll know tons of different theories. It's also likely that you'll spend most of your time reading books and not doing actual programs or simulations. At the best, you'll be doing it 50/50.

Note: In a previous post, someone asked me to put references to the claim: "While they (Theory man) go over 200 year old proofs, you'll (application man) go over 10 year old proofs, and while their proofs are 20 pages long, yours will be about 1 page (average)."
I really do not have time to go over different applications, their theories and proofs. But bare in mind, while most of the proofs of some of the Machine Learning Papers fit in a 20 page paper, Fermat's Theorem Proof, published in 1993 was about 100 pages long. If you read Bishop's Book on Machine Learning, you'll find most of the basic principles have "easy"proofs of about half a page.

Next time, we'll go over the dreadful fact of writing papers. When to do it, and why to do it.

Until next time

Monday, August 15, 2011

The Application Man!!!

Ask any company how much would they pay for a software, that would allow them to predict in a more reliable way how their products are going to sell. Or how much the competitors are capable of pushing the prices down before losing all profit. Let me tell you...... a lot.

Application, definitively the sweet spot of Machine Learning, it is where most of the money is made and is the thing that most people will relate to when they hear you do something AI related. Barely no one has heard of PageRank, yet everyone knows the Google Company, the same happens with pretty much every machine learning application out there, or any other theory. For example, Quantum Mechanics are the basis for most of the modern electronics we have.

So, if you don't really care on how the algorithms are working, and just care about applying the algorithms for the fame and glory, maybe developing an application might be right for you. Not to say you must entirely disregard the math, just that the math you'll have to go over won't be as dense as the ones statisticians have to use. While they go over 200 year old proofs, you'll go over 10 year old proofs, and while their proofs are 20 pages long, yours will be about 1 page (average).

I can't really emphasize how important is for you to learn the math behind the applications, even if you do not know exactly how is it working, at least is good that you have an idea of what the algorithm is doing. This way, if you have any errors, you can look for solutions in the right places instead of changing variables praying for something good to occur. Also you have bragging rights that you know more math than your peers at undergrad working as software engineers.

There are different machine learning applications, ranking, natural language processing, image processing, activity recognition, etc. However each of this problems have the difficulties and challenges. And different people may be suited for different applications.

How to find the application that best suits you? In my personal point of view, go over you passion. If you like dinosaurs, maybe you could apply recognition algorithms to detect structures and patterns in x-ray scans in the bones. If however, you like financial data, there are some work using Game Theory, and probability to increase your profit.

It would be crazy to try and list every laboratory that has an application and uses machine learning to solve it, instead, I'll list some of the most common machine learning algorithms, and how can you use them.
Of course this list is not exclusive, and there are thousands of different algorithms for the different applications, this is just to give you a head start of where to look and which algorithms you may find interesting to go over. I've chosen some of the most recent algorithms used to solve this problems, those published in ICML and NIPS from the last 10 years.

Financial Data: Markov Chains, Learning, Regression, Gaussian Processes, SVM
Robotics: Kalman Filter, Markov Chain Monte Carlo Methods, Markov Decision Process (Reinforcement Learning), SVM
Biology: Network Structure, Clustering, Network Parameters, Dirichlet Processes, Indian Buffet Processes, SVM
Vision: Markov Random Fields, Belief Networks, Neural Networks, SVM
Natural Language Processing: Conditional Random Fields, Latent Dirichlet Allocation, Mixture Models, SVM

Note: SVM's can solve everything, from a Biology inference to your dishwasher, and are very good out of the box using algorithms.

In future posts we will go over the algorithms and explain them. So if you have a particular interest, stay tuned, because this is just getting interesting.

If you wish to contact me, you can always do so at my Twitter account @leonpalafox, my Google+ account, or my personal webpage www.leonpalafox.com

Take care, and see you next time