EuroPython 2014 - Session: Topic Modeling For Fun and Profit

Tags:: information retrieval, natural language processing, topic modeling

Location:: A05/A06

Time:: 24 July 2014, 14:00 - 17:00

Language:: English

Audience:: Advanced

This is a hands-on workshop for extracting and utilizing semantic topics from large collections of natural language texts.

By the end, participants will have built an application for efficiently processing, indexing and querying the entire English Wikipedia, using wondrous Python tools.

The workshop assumes knowledge of intermediate Python concepts (classes, generators, iterators).

20 min: motivation & dataset: the English Wikipedia
30 min: NLP: tokenization, lemmatization (textblob)
60 min: topic modeling (gensim)
60 min: document indexing, querying, parallelization (gensim)
10 min: cushion/extra: "Wikipedia similarity" web app (flask)
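
To give a feel for the indexing and querying part of the agenda, here is a minimal sketch in pure standard-library Python. It is not the workshop's code and does not use the gensim or textblob APIs: the three-document corpus, the function names (`tokenize`, `stream_bows`, `cosine`, `query`) and the crude regex tokenizer are all assumptions made for illustration. It streams documents through a generator, turns each into a bag-of-words vector, and ranks them against a query by cosine similarity.

```python
import math
import re
from collections import Counter

# Toy stand-in for the Wikipedia dump; these texts are invented for illustration.
DOCS = {
    "Python": "Python is a programming language used for scripting and data analysis",
    "Snake": "A python is a large snake that lives in tropical forests",
    "Java": "Java is a programming language used for enterprise software",
}


def tokenize(text):
    """Lowercase and split on non-letters (a crude substitute for real tokenization and lemmatization)."""
    return re.findall(r"[a-z]+", text.lower())


def stream_bows(docs):
    """Yield (title, bag-of-words) one document at a time instead of building everything up front."""
    for title, text in docs.items():
        yield title, Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query(text, index):
    """Return (score, title) pairs sorted from most to least similar."""
    q = Counter(tokenize(text))
    return sorted(((cosine(q, bow), title) for title, bow in index), reverse=True)


# The "index" here is just a list of vectors held in memory.
index = list(stream_bows(DOCS))
results = query("programming language for data analysis", index)
best_score, best_title = results[0]
print(best_title)
```

Raw word counts only match documents that share exact words with the query. A topic model would replace these vectors with lower-dimensional topic vectors, so that documents can match on related vocabulary as well.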

Install beforehand: IPython, NumPy, SciPy, TextBlob, Gensim + optionally Flask+Angular (all Python & open-source).

Linux, OSX and Windows are all fine.


Speaker: