Faculty of Political Science
University of Milan

Career Development Observatory & Research Management Seminar
The Collective Action of Data Collection: Innovations in Gathering Political Data
'Hands-on' session introducing some of the software tools that facilitate
working with heterogeneous data sources.

Friday 13th of March, 2009

Holger Döring
University of Konstanz
Holger.Doering@uni-konstanz.de


Web scraping example

 * requires Python 2.4, 2.5 or 2.6
   + does not work with Python 3.0 
 * makes use of the Beautiful Soup HTML parser
   (http://www.crummy.com/software/BeautifulSoup)
 * use the Firebug add-on for Firefox to explore the structure of web pages
 * see also http://en.wikipedia.org/wiki/Web_scraping
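As a rough illustration of the basic scraping step (this is not the workshop's getfiles.py), the sketch below collects the absolute addresses of all PDF documents linked from a page. It is written for Python 3, not the 2.x versions the workshop scripts require, and it swaps Beautiful Soup for the standard library's html.parser module so that it runs without extra packages. The names LinkCollector and collect_links, the ".pdf" filter and the example URL are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose target ends with a given suffix."""

    def __init__(self, base_url, suffix=".pdf"):
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.suffix):
            # Resolve relative links against the address of the page itself
            self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url, suffix=".pdf"):
    """Return absolute URLs of all links in `html` ending in `suffix` (hypothetical helper)."""
    parser = LinkCollector(base_url, suffix)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = """
    <html><body>
      <a href="docs/report1.pdf">Report 1</a>
      <a href="/archive/report2.PDF">Report 2</a>
      <a href="index.html">Home</a>
    </body></html>
    """
    for link in collect_links(page, "http://www.example.org/data/"):
        print(link)
```

On a live site the HTML string would come from something like `urllib.request.urlopen(url).read().decode("utf-8")` (the encoding is an assumption; check what the page declares), and each link could then be saved with `urllib.request.urlretrieve`. With Beautiful Soup the search loop becomes a call to `findAll('a')` (`find_all` in the current bs4 package); Firebug is how you would find out which tags and attributes to filter on.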

 * I'll explain only the script 'getfiles.py'
 * 'htmltocsv.py' requires a somewhat deeper knowledge of Python
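The kind of conversion the name 'htmltocsv.py' suggests, turning an HTML table into CSV, might be sketched as follows. This is a hypothetical illustration, not the contents of the workshop script: it assumes the data sit in ordinary <tr> rows of <td>/<th> cells, again uses Python 3's html.parser instead of Beautiful Soup, and the names TableParser and html_table_to_csv are chosen here for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of <td>/<th> cells, row by row, from the tables in a page."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows (lists of cell strings)
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()      # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()     # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._finish_row()          # flush a table left open at the end of the input

    def _finish_cell(self):
        if self._cell is not None:
            # Join fragments (e.g. text split by <b> tags) and collapse whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the rows of all tables in `html` as CSV text (hypothetical helper)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    page = """
    <table>
      <tr><th>Party</th><th>Seats</th></tr>
      <tr><td>Party <b>A</b></td><td>1,234</td></tr>
    </table>
    """
    print(html_table_to_csv(page), end="")
```

Because html.parser only reports individual tags, the class has to keep track of the current row and cell itself, and it merges all tables on a page into one list of rows. Beautiful Soup instead builds a tree of the whole document and copes far better with badly broken markup, which is the main reason to prefer it on messy real-world pages.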

