The Machine Learning Journey

Monday, May 28, 2012

Reading List for the ICML 2012

The list of accepted papers for ICML 2012 is out, and following some of my colleagues, I'll post the papers that caught my eye at first glance:

(Disclaimer: Since my research tends to be on nonparametric statistics, I tend to gravitate towards papers on those topics.)

Gaussian Process Regression Networks
Andrew Wilson, David Knowles, Zoubin Ghahramani

Abstract: We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the nonparametric flexibility of Gaussian processes. GPRN accommodates input (predictor) dependent signal and noise correlations between multiple output (response) variables, input dependent length-scales and amplitudes, and heavy-tailed predictive distributions. We derive both elliptical slice sampling and variational Bayes inference procedures for GPRN. We apply GPRN as a multiple output regression and multivariate volatility model, demonstrating substantially improved performance over eight popular multiple output (multi-task) Gaussian process models and three multivariate volatility models on real datasets, including a 1000 dimensional gene expression dataset.

Quick Opinion: Based on the abstract and a quick reading of the arXiv version, this sure looks like a nice variation on Gaussian Processes. Merging Bayesian Neural Networks with GPs seems like a good idea for specific problems like Gene Regulatory Network inference, given that some people have been using Recursive Neural Networks for such tasks.

Modeling Images using Transformed Indian Buffet Processes
Ke Zhai, Yuening Hu, Jordan Boyd-Graber, Sinead Williamson

Abstract: Latent feature models are attractive for image modeling; images generally contain multiple objects. However, many latent feature models ignore that objects can appear at different locations, or require pre-segmentation of images. While the transformed Indian buffet process (tIBP) provides a method for modeling transformation-invariant features in simple, unsegmented binary images, in its current form it is inappropriate for real images because of computational constraints and modeling assumptions. We combine the tIBP with likelihoods appropriate for real images. We also develop an efficient inference scheme using the cross-correlation between images and features that is both theoretically and empirically faster than existing inference techniques. We demonstrate that, using our method, we are able to discover reasonable components and achieve effective image reconstruction in natural images.

Quick Opinion: I could not find the PDF, so I'm going by the abstract alone. The paper seems pretty interesting, although I'm curious how they extended the tIBP with the new likelihoods.

Revisiting k-means: New Algorithms via Bayesian Nonparametrics
Brian Kulis, Michael Jordan

Abstract: Bayesian models offer great flexibility for clustering applications—Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for shared clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-means and mixtures of Gaussians, we show that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters. We generalize this analysis to the case of clustering multiple data sets through a similar asymptotic argument with the hierarchical Dirichlet process. We also discuss further extensions that highlight the benefits of our analysis: i) a spectral relaxation involving thresholded eigenvectors, and ii) a normalized cut graph clustering algorithm that does not fix the number of clusters in the graph.

Quick Opinion: I think this extension was something that was missing in ML. I'm particularly intrigued by this paper; I remember reading about how k-means can be seen as a relaxation of a mixture of distributions with circular Gaussians.

Friday, March 30, 2012

Machine Learning and Memes

Inspired by this great post

This is the story of how life goes through the peer-review process in the ML community.

First, I usually start looking for ideas to do new research on, and more often than not, I'm like this:

Then, I start reading papers on the topic, and find a lot of papers like:

Then, I finally send a paper to a journal or conference to be peer reviewed and I find different types of reviewers:

The grammar defender, who will punish every little grammar mistake and recommend you have your paper checked by the editors of The New Yorker.

The organizer who wants to increase the perceived quality of the journal/conference by being overly strict when doing reviews.

The guy who knows nothing of your topic, but still, looks at the comparison tables with other works and asks:

The author of one of the papers you cite, who thinks no one has more authority on the topic than him/her:

But sometimes, you find someone who gives you some hope that this area may get better over time:

So keep submitting, and maybe you'll be lucky enough that all three reviewers of your paper are like the last guy.

Good luck

Monday, February 27, 2012

An introduction to linear regression - Cost Function (ML for the Layman)

I've tried keeping away from posting tutorials on ML topics, mainly because I did not feel well prepared to do it yet. I hardly think I'm prepared now, but I can definitely give it a better shot, and with your feedback, I can at least get an idea of where I can improve.

Disclaimer: Even though this introduction will be basic, and I mean as basic as it can get, you'll still need some knowledge of matrix operations, basic algebra and calculus to get through some of the explanations.

Imagine you want to sell your car; let's say it is a 2007 Prius with 20,000 miles. It is in very good condition, and you would like to survey how much your car is worth on the market. As a seller, you definitely want to get the best price for it.

So, how do you price a car? If you know nothing about cars (like me), you can ask someone who has a better idea; in our case, that is the internet.

We can see that the price is set by things like the age, the make, the mileage, the overall condition, etc. We will call these things "features". So a set of features is basically the characteristics of our car, or of any object.

So which features should we choose? For the sake of simplicity, let's choose year and mileage. It's important to note that there are whole lines of research on how to choose features, but we are beginners here, so we do not care about that. It is important to know, however, that having many features is not always better than having few.

Now that you have chosen a set of features, how do you compare one car to another? We go and look for data. For car comparison, we can go to different web sites that list different combinations of these features for different cars, such as your local newspaper's classifieds or Craigslist.

Let's create a mock data set of 5 cars using only 2 features, year and mileage. All of them are Priuses:

$year=(2007,2005,2006,2007,2010)$
$mileage=(50000, 60000,54000,40000,20000)$
$price=(12000,9700,10500,13000,20000)$

It is intuitive to think that the price of our car will be directly related to age and mileage. A low mileage and a recent year increase the price, while an old car with a high mileage has a lower price. So there should be an abstract way to write this relation.

To model this kind of data, we use linear regression, which states that a variable is the result of a linear combination of other variables. That is, our price obeys some combination of mileage and year.

For the first element (12,000 USD for a 2007 model with 50,000 miles):

$12000=a_1*(50000)+a_2*(2007)$

A second characteristic is that every element has to share the same $a_1$ and $a_2$, so that we can use those values with our own Prius. There would be no point in fitting individual values for each car when what we want is the best price for ours. So we can write a general equation:

$Price=a_1*mileage+a_2*year$

How do we find the $a_1$ and $a_2$ that solve this model? How do I know that "1" and "1" are not good choices? Well, for starters, summing up the year and the mileage does not seem like a good idea.
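To make the "1" and "1" guess concrete, here is a minimal sketch of the general equation as a function. The name `predict_price` and the argument order are my own choices for illustration; they are assumptions, not part of any library.

```python
def predict_price(a1, a2, mileage, year):
    """Linear model from the text: Price = a1 * mileage + a2 * year."""
    return a1 * mileage + a2 * year


# First car in the mock data set: 50,000 miles, year 2007, real price 12,000 USD.
guess = predict_price(1, 1, 50000, 2007)
print("predicted:", guess, "actual:", 12000)
```

With both coefficients set to 1 the model simply adds the mileage to the year, which is why the prediction lands so far from the real price.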

We need some function that will tell us how bad or how good our prediction is. This is usually called the "cost" function. How do we build a cost function? Our intuition tells us that we have to compare the truth with our guess, that is, subtract one from the other. So for our guess of "1" and "1" for $a_1$ and $a_2$:

$Cost=(12000-(1*50000+1*2007))$

We can see there is something off here. Since we are looking for the minimum cost, we could make the values in $a$ large enough to get incredibly low (negative) costs. So we square the difference to get a nice function with a lower bound (it never goes below a certain value).

$Cost^2=(12000-(1*50000+1*2007))^2$

We now have the intuition that the best we can do is $cost=0$: since a squared number can never be negative, we cannot do any better than that.

This cost, however, is only the cost for one example; we need the cost for all our cars. So we sum all of them:
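A quick sketch of why we square, using only the first car from the mock data set. The helper name `residual` is hypothetical, and the larger guesses (10 and 100) are values I picked for illustration.

```python
def residual(a1, a2, mileage, year, price):
    """Truth minus guess for one car (the unsquared 'cost')."""
    return price - (a1 * mileage + a2 * year)


# First car from the mock data set.
mileage, year, price = 50000, 2007, 12000

# Growing coefficients push the raw difference further below zero,
# while the squared version can never drop below zero.
for a in (1, 10, 100):
    r = residual(a, a, mileage, year, price)
    print(f"a1=a2={a}: difference = {r}, squared = {r ** 2}")
```

The raw difference keeps getting "cheaper" as the guess gets worse, which is exactly the problem squaring removes.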

$Total cost^2=\sum_{i=1}^5(Price_i-(1\times Mileage_i+1\times year_i))^2$

We now have the intuition that our choice of 1 and 1 may not be a very good one at all. Just for kicks, let's see how much the cost would be for different values of $a_1$ and $a_2$:

a1, a2    Cost
0, 0      917,340,000
0, 1      675,707,259
1, 0      6,595,340,000
1, 1      7,252,615,259
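The costs in the table can be reproduced with a few lines of Python. This is only a sketch: the variable names and the `total_cost` helper are my own, and the data is the mock data set from above.

```python
# Mock data set from the text (five Prius listings).
years = [2007, 2005, 2006, 2007, 2010]
mileages = [50000, 60000, 54000, 40000, 20000]
prices = [12000, 9700, 10500, 13000, 20000]


def total_cost(a1, a2):
    """Sum of squared errors for Price = a1 * mileage + a2 * year."""
    return sum(
        (price - (a1 * mileage + a2 * year)) ** 2
        for price, mileage, year in zip(prices, mileages, years)
    )


# The four guesses from the table.
for a1, a2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"a1={a1}, a2={a2}: cost = {total_cost(a1, a2):,}")
```

Trying other pairs by hand is just a matter of calling `total_cost` with different values.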

The costs are terrible! There is, however, something happening for "0,1", since the cost decreased compared to "0,0". But there has to be something better than just random guessing, right?

Our guess of 1,1 is just terrible. But we gained an intuition: our solution has to be near "0,1". Also, some of you may have noticed by now that no matter which values we plug in, there is no way we are going to get a cost of zero. It's just not possible, not without some extra help, which we will call an offset.

Next time, we'll talk about optimization, or how to stop you from trying every possible combination by hand, how to use the offset, and a bit about comparing pears and apples.

See you later

Remember to visit my webpage www.leonpalafox.com. And if you want to keep up with my most recent research, you can tweet me at @leonpalafox.
Also, check my Google+ account, be sure to contact me, and send me a message so I follow you as well.

Thursday, January 26, 2012

Geoff Hinton Memes

A week ago, Yann LeCun started an ML-related meme by writing the Geoff Hinton facts (à la Chuck Norris Facts).

I'm writing them here as they appear, and if you have more, please send me yours:

Geoff Hinton doesn't need to make hidden units. They hide by themselves when he approaches.

Deep Belief Nets actually believe deeply in Geoff Hinton.

Geoff Hinton uses an infinite amount of training data for each experiment - twice.

Others prove theorems. Geoff Hinton proves axioms.

Geoff Hinton once built a neural network that beat Chuck Norris on MNIST.

Geoff Hinton discovered how the brain really works. Once each year for the last 25 years.

Markov random fields think Geoff Hinton is intractable.

If you defy Geoff Hinton, he will maximize your entropy in no time. Your free energy will be gone even before you reach equilibrium.

Geoff Hinton can make you regret without bounds.

Geoff Hinton can make your weight decay (your weight, but unfortunately not mine).

Geoff Hinton doesn't need support vectors. He can support high-dimensional hyperplanes with his pinky finger.

Geoff Hinton frequents Bayesians.

Most farm-houses are surrounded by nice fields. Geoff Hinton's house is surrounded by mean fields.

All kernels that ever dared approaching Geoff Hinton woke up convolved.

The only kernel Geoff Hinton has ever used was a kernel of truth.

After an encounter with Geoff Hinton, support vectors become unhinged.

Geoff Hinton's generalizations are boundless.

Geoff Hinton goes directly to third Bayes.

Never interrupt one of Geoff Hinton's talks: you will suffer his wrath if you maximize the bargin'.

Wednesday, September 28, 2011

The importance of socializing

Socializing is an odd term. People in Computer Science are usually regarded as socially incompetent, or at least plain weird and introverted. Funny, though, that most of the modern tools people use to interact (Twitter, Facebook, Google+) were created by these supposedly socially impaired people.
But then again, people from CS are usually the ones who post most actively and have many friends on Facebook and Twitter. So, they do socialize...

Yet we are not here to point out these oddities, but to talk about the importance of socializing (which is different from teamwork) in a research environment. I'll also recommend some particular sites for the ML community.

I've found that I feel motivated when I discuss different topics with different people. Even if the problem is not your own, you still feel good. How many times have you talked with your adviser or a colleague and found out something you had not considered? Sometimes our thoughts are different when we externalize them. A good idea may seem dumb when spoken out loud, or a bad idea may have a touch of genius when others hear it.

Teaching something you like is another way to socialize. Prepare a presentation on some random topic, and you'll see you can learn a lot about that topic (if presented to the right audience). I highly endorse having students present their work or a random topic to an audience. It forces them to understand the material, and in the process they learn a bit about expressing their ideas.

But in this modern age, we have a myriad of tools to achieve these interactions without leaving our desks (I have yet to decide whether that is a good or a bad thing). We have social networks, where we can find good researchers, and we have online lectures, where we can socialize with other people watching the lecture. Thankfully, machine learning has been an early adopter as well as an active player.

On social networks, you can have different interactions depending on the network you use. I find Twitter lists to be a great source of Machine Learning researchers; feel free to follow mine. Google+ recently released a feature that allows users to share their circles, and I have a somewhat good circle of Machine Learning researchers (Andrew Ng, Nir Friedman and Yann LeCun, among others). The important thing about these lists and circles is not only the researchers, but the enthusiastic community of grad students who post, discuss and share ideas and innovations.

Now, I've found that, while great, these networks are not really suited for deep theoretical discussion, and with the advent of Q&A sites like Quora, I think we have a better forum to air doubts and questions on ML and other topics.

I think the top Q&A venue in Machine Learning is Metaoptimize (let's plug the ad here). It is a place where most of the people are devoted Machine Learning students (unlike Reddit's Machine Learning sub-reddit, where most of the people are ML enthusiasts).

On Metaoptimize, you will usually get good answers to most of your questions, and if you don't, you at least get a link and a starting point to keep looking for an answer. People there have their fundamentals right, so if you have specific questions on the intuition and maths of a problem, you'll get your answer there. Most of the top contributors on Metaoptimize are also in my Twitter and Google+ lists.

Metaoptimize is definitely a step in the right direction, but still, I think more can be done to share and to create a broader community. Andrew Ng and Peter Norvig have recently opened free courses to the world, in which they teach ML and AI. The courses haven't started yet, but a hefty number of students are flocking to them.

Our world is open. Do not think that doing research at a desk involves only reading books and papers; socializing is a very important part of it, and one you should embrace.

That's everything for tonight


Friday, September 16, 2011

Every time you write a bad paper an angel cries

"Writing is an art, every word has a specific meaning, and each expression should serve a purpose."

Scientific writing, unlike any other kind of writing, has a very specific goal: to report your findings in a clear and concise way. Your opinion, while important, has to leave most of the space for the cold, hard facts. I once read that, in writing, academics are like civil engineers: their work is to build sturdy structures on a solid foundation and leave the niceties and decorations to the architects.

But how do you know what to write? Usually, a good paper's results can be replicated by an informed reader (in the NIPS reviewers poll, that is one of the conditions). Thus, your paper needs enough information that a fellow researcher does not need to ask you any questions.

Sounds easy, right? Truth is, it isn't.

Sadly, most people think that because they can speak, they can write. A large number of people think that writing the way they speak is the way to write a research paper (or any online publication). However, clarity is a luxury we usually forget when we speak. For example, saying that there were "a lot" of people in the market is a great expression when telling a story, but in a research paper, saying that you had "a lot" of data says nothing about the actual amount, and is thus ambiguous.

A research paper, unlike speaking, allows us to rewrite our sentences several times. Remember that it is impossible to ask for clarification when reading an article (unless you send an email), so you have to write as clearly as possible. All those times you've had to clarify what you meant by a sentence when speaking are dead sentences in writing. My rule of thumb is that a reader has to understand the paper without me sitting next to them.

But how do we learn to write well? If you are lucky, your advisor is still very prolific, and a lifetime of reading and writing papers has given them at least a good idea of how to structure a paper. With time, you too will get a hold of the most common rules of writing a research paper. Something I tend to do is check for verb consistency: is every noun's action described by a verb, or not? If you have free nouns in your paper, you will have confusion. If you're not sure, keep yourself from writing complex statements, and instead stick to the basics.

There are other tools you can use to improve your writing. "The Elements of Style" by Strunk and White and "On Writing Well" by Zinsser are two great books that offer solid, basic advice on how to structure a good piece. While the books are mostly oriented towards people writing real literature (for me, academic journals are more like technical reviews), they do help you structure your sentences so they become less ambiguous (clarity, clarity, clarity).

My advice is to practice. You can try writing a blog, writing papers for small conferences, or even mock papers (there is no rule against them). Look for your mistakes, and look for things you could write in a better way. Try asking people to read your documents and see how much sense they made to them. Really, I think you have to keep practicing over and over, until you get a firm grasp on paper writing.

Another really important thing is to review. If your paper has typos, it speaks ill of you and your research; it shows you as a sloppy author and thus a sloppy researcher. Take your time to reread your papers. I usually spend around 3 days writing a blog post. First I write it in what we could barely call English, then I reread the entry and try to make sense of it by taking out sentences and useless words and adding clarity. Then I use tools like Microsoft's spell checker and style checker to look for odd sentences and passive voice (heavy use of passive voice in a paper adds confusion, and it is usually better to write in the active voice).

It is fair to say I spend two to three times longer reviewing a piece than writing it, but I know it helps me write better the next time. So I don't look at it as a burden, but rather as practice.

If you have any other advice on writing, let the comment section hear it.


Tuesday, September 6, 2011

Starting to write

What is the best way to start a book?

Many people will tell you that the best way is to gather experience and then write your book. Others will tell you to write as you gather experience. Truth is... I do not know yet who is right. While it is true that a lot of great books are written after experience has been gained, it is also true that a lot of great books are written on the go.

So how does this translate to papers in Machine Learning?

For most graduate programs, you'll need to write conference and journal papers. Some universities are not really picky about which journals or conferences, as long as the work is published. Others do care which conferences you go to. Not to mention that your professor probably already has a set of conferences where he is a regular, and will ask you to write papers for those.

So, back to the basic question: do you write as you go, or do you write once you have finished every experiment, theory and survey (at the end of your PhD)? Some people will tell you the latter is better, while others will tell you that it is good to have a ton of papers, since most committees will rarely look at the papers themselves, but just at the sheer number of publications.

I think a fair balance is the best policy. You do have to wait until you have good results to publish, but you also have to publish enough that you get to go to different conferences and get feedback from experts in your area, or at least do some good networking. Keep in mind that really good journals and conferences won't accept papers with half-done research or half-baked results.

Remember, though, that most journals will take their sweet time to accept your paper, or even to reject it. So, if you wait until the end to submit, you will definitely have trouble getting your paper accepted by the end of your PhD. (We are talking about some IEEE journals taking around 7 to 8 months to give you any feedback.)

Lastly, and most important: when in Rome, do as the Romans do. Regardless of your opinion, the truth is that you are a simple student. Thus, if you prefer to wait and your committee wants you to publish... you'll either publish or go. On the other hand, if the university wants you to go to specific conferences (with acceptance rates around 20 or 30%), you will have to wait, even if you badly want to publish every incremental gain you've had.

Good luck writing your first paper. In the next post, we'll talk about how to write, and why it is very important to be extra careful when writing a paper.
