Tags:: scikit-learn, machine learning

Location:: A05/A06

Duration:: July 23, 2014 10:00 - 13:00

Language:: English

Target-audience:: Expert

This tutorial introduces the basics of machine learning and shows how common learning tasks can be accomplished with scikit-learn. By the end of the tutorial, participants will be ready to apply scikit-learn's wide variety of machine learning algorithms to their own data sets.

Tutorial objective

Machine learning develops algorithms that learn from previously seen data in order to make predictions about future data. The field is progressing rapidly and is the focus of many startups, which leverage the user-centric data accumulated by Internet services.

Scikit-learn is a Python module that builds upon scientific Python tools such as numpy and scipy to deliver machine learning tools for the non-specialist.

Tutorial outline

Basics of numpy and matplotlib for manipulating and visualizing data
Basic concepts of machine learning
Simple classification examples
Simple regression examples
Measuring model performance: cross-validation
Extracting features from text
Linear models
Random forests and boosted trees
On-line learning to tackle big data
Dimensionality reduction: clustering and projections
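
To give a flavor of the classification and cross-validation items in the outline, here is a minimal sketch, not taken from the tutorial material itself. It assumes a recent scikit-learn release, where the cross-validation helpers live in `sklearn.model_selection`; the choice of the iris data set and of logistic regression is purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load a small built-in data set: 150 flowers, 4 features, 3 classes.
iris = load_iris()
X, y = iris.data, iris.target

# A simple linear classifier; max_iter is raised so the solver converges.
clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: fit on 4/5 of the data, score on the held-out 1/5,
# and repeat so that every sample is used for testing exactly once.
scores = cross_val_score(clf, X, y, cv=5)

print("Accuracy per fold:", scores)
print("Mean accuracy: %.3f" % scores.mean())
```

Any other scikit-learn estimator can be swapped in for `clf` without changing the rest of the sketch, since estimators share the same `fit`/`predict` interface.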

Teaching method

This is a hands-on course. Students are strongly encouraged to work along with the trainer at the interactive prompt, and there will be exercises for students to complete on their own. Experience shows that this active involvement is essential for effective learning.

Software used

Please bring your laptop with the operating system of your choice (Linux, Mac OS X, Windows). In addition to Python 2.6, 2.7, or 3.x, you will need:

IPython (for interactive work with scientific plotting)
Matplotlib
scikit-learn and its dependencies (numpy, scipy)

If installing these requirements, which have compiled dependencies, proves difficult, consider using Anaconda (http://continuum.io/downloads) or Canopy (https://www.enthought.com/products/canopy/).
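
One way to confirm before the session that the packages listed above can be imported is a short script like the following. This is only a sketch: the helper name `check_environment` is hypothetical, and the script checks that each module can be imported, not that its version is recent enough.

```python
import importlib
import sys

# Import names of the packages listed above ("sklearn" is scikit-learn).
REQUIRED = ["IPython", "matplotlib", "numpy", "scipy", "sklearn"]


def check_environment(modules=REQUIRED):
    """Map each module name to its version string, or None if it is missing."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_environment().items():
        print("%-12s %s" % (name, version if version else "MISSING"))
```

Any package reported as `MISSING` needs to be installed before the tutorial.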

Intended Audience

Python programmers who would like to build predictive engines from data.

Audience level

Programmers with good Python knowledge. No prior knowledge of machine learning, scikit-learn or scientific programming is needed.

Session: An introduction to Machine learning with Scikit-learn