Session: An introduction to Machine learning with Scikit-learn

July 23, 2014 10:00 - 13:00

This tutorial will introduce the basics of machine learning and show how these learning tasks can be accomplished using scikit-learn. By the end of the tutorial, participants will be poised to take advantage of scikit-learn's wide variety of machine learning algorithms to explore their own data sets.

Tutorial objective

Machine learning develops algorithms that learn from previously seen data in order to make predictions about future data. The field is progressing quickly and is the focus of many startups, which leverage the user-centric data accumulated via Internet services.

Scikit-learn is a Python module that builds upon scientific Python tools such as NumPy and SciPy to deliver machine learning tools for the non-specialist.

Tutorial outline

  • Basics of numpy and matplotlib for manipulating and visualizing data
  • Basic concepts of machine learning
  • Simple classification examples
  • Simple regression examples
  • Measuring model performance: cross-validation
  • Extracting features from text
  • Linear Models
  • Random forests and boosted trees
  • On-line learning to tackle big data
  • Dimensionality reduction: clustering and projections
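
To give a flavour of the classification and cross-validation items above, here is a minimal sketch. It is not taken from the tutorial material: the iris data set, the choice of logistic regression, and the 5-fold setting are assumptions made for illustration, and the import path assumes a recent scikit-learn release (releases from around the time of this session exposed the same helper under `sklearn.cross_validation`).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load a small built-in data set: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# A simple linear classifier; max_iter is raised so the solver converges.
clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: fit on 4/5 of the data, score on the held-out 1/5,
# and repeat so that each fold is used once for testing.
scores = cross_val_score(clf, X, y, cv=5)

print("accuracy per fold:", scores)
print("mean accuracy:", scores.mean())
```

Averaging the per-fold scores gives an estimate of how the model would perform on data it has not seen, which is the point of the "measuring model performance" item.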

Teaching method

This is a hands-on course. Students are strongly encouraged to work along with the trainer at the interactive prompt. There will be exercises that students need to do on their own. Experience shows that this active involvement is essential for effective learning.

Software used

Please bring your laptop with the operating system of your choice (Linux, Mac OS X, Windows). In addition to Python 2.6, 2.7, or 3.x, we need:

  • IPython (for interactive work with scientific plotting)
  • Matplotlib
  • scikit-learn and its dependencies (numpy, scipy)

If installing these requirements, which have compiled dependencies, proves difficult, consider using a prepackaged distribution such as Anaconda or Canopy.
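
One possible way to check the setup before the session is the short script below. It is an illustrative sketch rather than an official checker: the helper name `installed_versions` is hypothetical, and it assumes Python 2.7 or later (for `importlib.import_module`). Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib

# Import names of the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ("IPython", "matplotlib", "sklearn", "numpy", "scipy")


def installed_versions(names=REQUIRED):
    """Map each import name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" if a package does not expose __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


versions = installed_versions()
for name, version in versions.items():
    print(name, version if version is not None else "NOT INSTALLED")
```

Any package reported as not installed needs to be added before the tutorial.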

Intended Audience

Python programmers who would like to build predictive engines from data.

Audience level

Programmers with good Python knowledge. No prior knowledge of machine learning, scikit-learn or scientific programming is needed.