Session: Topic Modeling For Fun and Profit

Location:
A05/A06
Duration:
24 July 2014, 14:00 - 17:00
Language:
English
Audience:
Advanced

This is a hands-on workshop for extracting and utilizing semantic topics from large collections of natural language texts.

By the end, participants will have built an application for efficiently processing, indexing and querying the entire English Wikipedia, using wondrous Python tools.

The workshop assumes knowledge of intermediate Python concepts (classes, generators, iterators).
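
To gauge the assumed level, here is a minimal sketch of a pattern that combines those three concepts: a class whose `__iter__` is a generator, so a corpus can be iterated repeatedly without being held in memory. The class name `StreamedCorpus` and the one-document-per-line layout are assumptions made for illustration, not workshop material.

```python
import io


class StreamedCorpus:
    """Iterate over documents one at a time (hypothetical helper for illustration)."""

    def __init__(self, open_stream):
        # open_stream: zero-argument callable returning a fresh iterable of lines,
        # so the corpus can be iterated more than once.
        self.open_stream = open_stream

    def __iter__(self):
        # Generator: yields one tokenized document per non-empty line.
        for line in self.open_stream():
            tokens = line.lower().split()
            if tokens:
                yield tokens


# Assumed toy data standing in for a large file on disk.
raw = "Human machine interface\n\nGraph of trees\n"
corpus = StreamedCorpus(lambda: io.StringIO(raw))

first_pass = list(corpus)
second_pass = list(corpus)  # a fresh iterator each time, unlike a bare generator
```

Wrapping the generator in a class matters when an algorithm needs several passes over the data: a bare generator is exhausted after one pass, while this object can be iterated again.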

  • 20 min: motivation & dataset: the English Wikipedia
  • 30 min: NLP: tokenization, lemmatization (textblob)
  • 60 min: topic modeling (gensim)
  • 60 min: document indexing, querying, parallelization (gensim)
  • 10 min: cushion/extra: "Wikipedia similarity" web app (flask)
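
The tokenization, indexing and querying stages of this agenda can be illustrated with a plain-Python stand-in. This is a rough sketch using TF-IDF vectors and cosine similarity from the standard library only; it is not the textblob/gensim code used in the workshop, it skips lemmatization and topic modeling entirely, and all function names and sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; no lemmatization in this stand-in.
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    # Scale a sparse vector (dict) to unit L2 length; empty vectors pass through.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def build_index(docs):
    """Return (idf, list of L2-normalized TF-IDF vectors) for the documents."""
    bows = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for bow in bows for term in bow)
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    return idf, [normalize({t: c * idf[t] for t, c in bow.items()}) for bow in bows]


def query(text, idf, index):
    """Rank document positions by cosine similarity to the query text."""
    bow = Counter(tokenize(text))
    # Terms never seen at indexing time are ignored.
    q = normalize({t: c * idf[t] for t, c in bow.items() if t in idf})
    sims = [sum(w * doc.get(t, 0.0) for t, w in q.items()) for doc in index]
    return sorted(enumerate(sims), key=lambda pair: -pair[1])


# Assumed toy documents; the workshop itself uses the English Wikipedia.
docs = [
    "Topic models find latent themes in text collections",
    "Flask is a small web framework for Python",
    "Similarity queries rank documents against an index",
]
idf, index = build_index(docs)
ranking = query("web framework", idf, index)
```

Here `ranking` is a list of `(document position, similarity)` pairs, best match first. A topic model would replace the raw TF-IDF vectors with lower-dimensional topic vectors before the similarity step.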

Install beforehand: IPython, NumPy, SciPy, TextBlob and Gensim, plus optionally Flask and Angular (all open-source).

Linux, OSX and Windows are all fine.