EuroPython 2014 - Session: Topic Modeling For Fun and Profit

Tags:: information retrieval, natural language processing, topic modeling

Location:: A05/A06

Duration:: July 24, 2014 14:00 - 17:00

Language:: English

Target-audience:: Advanced

This is a hands-on workshop on extracting and using semantic topics from large collections of natural-language text.

By the end, participants will have built an application for efficiently processing, indexing and querying the entire English Wikipedia, using wondrous Python tools.

The workshop assumes knowledge of intermediate Python concepts (classes, generators, iterators).

20 min: motivation & dataset: the English Wikipedia
30 min: NLP: tokenization, lemmatization (textblob)
60 min: topic modeling (gensim)
60 min: document indexing, querying, parallelization (gensim)
10 min: cushion/extra: "Wikipedia similarity" web app (flask)
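
To give a feel for the pattern behind the indexing and querying steps in the agenda above, here is a minimal, dependency-free sketch. It is not the workshop's code: it uses only the standard library, a three-sentence toy corpus in place of Wikipedia, a regular expression in place of TextBlob's tokenizer, and plain bag-of-words cosine similarity in place of a gensim topic model. All function names are hypothetical. What it does show is why generators matter: documents are streamed one at a time rather than loaded together.

```python
import math
import re
from collections import Counter


def stream_documents(texts):
    """Yield one lowercase token list per document, one document at a time."""
    for text in texts:
        yield re.findall(r"[a-z]+", text.lower())


def to_bow(tokens):
    """Turn a token list into a bag-of-words vector (token -> count)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, index):
    """Rank indexed documents by similarity to the query, best first."""
    query_bow = to_bow(next(stream_documents([query])))
    scores = [(doc_id, cosine(query_bow, bow)) for doc_id, bow in enumerate(index)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for the Wikipedia dump (hypothetical texts).
TOY_CORPUS = [
    "Python is a programming language",
    "Topic models find themes in text collections",
    "A python is also a large snake",
]

# Build the index by consuming the stream document by document.
index = [to_bow(tokens) for tokens in stream_documents(TOY_CORPUS)]
ranking = most_similar("programming language", index)
print(ranking[0])  # (document id, similarity) of the best match
```

A topic model would replace the raw word counts with a much shorter vector of topic weights per document, so that texts can match even when they share no exact words; the streaming and ranking steps would keep the same shape.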

Install beforehand: IPython, NumPy, SciPy, TextBlob and Gensim; optionally also Flask and Angular (all open source).

Linux, OS X and Windows are all fine.

Speakers: