Thursday, October 1, 2015

How I began learning machine learning

Today a friend asked me how to start learning machine learning. I told her how I started and which resources I think are great, and I am listing them here in the hope that they are also useful to others who want to learn machine learning. Machine learning, like most other disciplines, is a matter of accumulation, so read more and practice more. You will soon find that machine learning is not that difficult and is fun to learn. Good luck, and let me know if you have more questions.

I started learning machine learning 2 or 3 years ago. Here's how I learned it:

Python and R are the most popular tools in data science, especially for small prototype projects, so I learned Python first and now use it for all my work. R is really good too, especially for statistical analysis, where it may offer better or easier ways to solve a problem than Python, but I just haven't had time to learn R. So if you have time, learn both; otherwise, choose one. I recommend Python. After all, it is a general-purpose programming language that can do almost everything you want.

Scala is another language I am learning. It is a functional programming language designed for scaling up projects (at least I think the name looks like scale ^)^). It is used in Apache Spark, which, along with Hadoop, is one of the most popular frameworks for distributed computing. If you want to do data science, you'd better learn Spark for big data analysis. Spark has other APIs too, e.g. Python, Java, and R, but Scala is the best option if you are planning to use Spark in the future.

Machine learning:
Below are the resources I used to learn ML (only the good ones):

Online courses:
(1) Learning from Data, a very good course (and book too) on the basics, from Caltech. It is a little more theoretical than Ng's course on Coursera, but it will give you a very nice foundation in machine learning.
(2) I guess Ng's machine learning course is the most popular one, and it is more practical. I think this is also the one you were talking about. It is a nice course as a starting point, but he uses Octave, which you will like if you use MATLAB. Anyway, after this course you will only have grasped a very shallow part of ML. On YouTube you can find the video recordings of Ng's one-semester course, which has much more content than the Coursera version, so you can take it afterwards.
(3) You also need a good statistical background, which I didn't have when I started out, so I also took some statistics courses online. You can also read a related book instead. The one I took is Statistics One on Coursera, which is kind of boring, but useful.
(4) I also took Stanford's Introduction to Statistical Learning online and read through the book. It is also a very good introductory ML course.
(5) Mining of Massive Datasets, another great course from Stanford, which covers more of the data science applications used in industry.

Books, my favorite part. I have collected almost all the popular machine learning books by now ^)^ But I have not read all of them ^)^
(1) Learning from Data, which I talked about above.
(2) An Introduction to Statistical Learning, which I talked about above. You can use it while taking the class.
(3) Machine Learning by Tom Mitchell, a very classic book. Even though it is a bit outdated, it is still one of the best introductory books on ML.
(4) Machine Learning: An Algorithmic Perspective. A nice book that focuses on Python implementation. I especially like the neural network chapter, very clear!
(5) Applied Predictive Modeling, one of my favorites. It covers a lot of material that is not in the above books but is very useful when you are doing an ML project.
(6) All of Statistics, a very nice and readable book for catching up on statistics.
(7) Python Machine Learning, a very nice practical book if you want to quickly build models in Python.

After the above introductory books, you will have a very good working knowledge of machine learning. I then recommend the two bible-level books:
(8) Pattern Recognition and Machine Learning. This is the most famous ML book, and it leans heavily on the Bayesian point of view. Even today I haven't finished it from cover to cover; I have only read the parts I was really interested in. But it will take your ML skills to a higher level.
(9) The Elements of Statistical Learning. It is a great book too, but also very statistical ^)^ Neither this book nor the PRML book is good for beginning ML; both are for improving your skills later.

Like you said, the best way to learn machine learning is to work through easy projects (but you need to understand the underlying theory too), so the following are the ones I think are good for beginners:
(1) Kaggle is definitely a good place to play around. It has well-defined questions and datasets, and it also has a community that helps you improve your skills when you ask questions.
(2) The UCI Machine Learning Repository has many datasets you can try out.
(3) My favorite machine learning library is sklearn. Browse their example gallery and follow their tutorial, and you will soon be able to apply it to various problems.
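
To give a feeling for how little code a first sklearn model needs, here is a minimal sketch. It is not taken from their tutorial: the dataset (the small iris data that ships with sklearn), the model (logistic regression), the 75/25 split, and the function name train_and_score are all just my assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(test_size=0.25, seed=0):
    """Fit a classifier on the iris data and return its held-out accuracy.

    The function name and default values are hypothetical, chosen only
    for this example.
    """
    # Load the bundled iris dataset: 150 flowers, 4 features, 3 classes.
    X, y = load_iris(return_X_y=True)

    # Hold out part of the data so the score reflects unseen examples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Logistic regression is an assumed choice; any sklearn classifier
    # with fit/predict could be swapped in here.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Compare predictions on the held-out set with the true labels.
    return accuracy_score(y_test, model.predict(X_test))


print(f"held-out accuracy: {train_and_score():.2f}")
```

Once this runs, a good exercise is to replace the model with another classifier from the example gallery, or the iris data with a dataset from Kaggle or UCI, and see how the score changes.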
