Faculty of Political Science
University of Milan

Career Development Observatory & Research Management Seminar
The Collective Action of Data Collection: Innovations in Gathering Political Data
Hands-on session introducing some of the software tools that facilitate
working with heterogeneous data sources.

Friday 13th of March, 2009

Holger Doering
University of Konstanz
Holger.Doering@uni-konstanz.de


PARLGOV DATABASE

 * data provided in folder 'data'
 * web interface to the data at http://dev.parlgov.org/
   + user: pg_guest -- password: pg_guest_$?
 * the documentation is not yet complete
   + for documentation of tables and variables, see
     http://dev.parlgov.org/doc/tables/
 * coding of Italian elections will be finished by the beginning of the workshop
 * please let me know about any issues, bugs, or suggestions you have


PREPARATION (not required but helpful)
 * browse through http://dev.parlgov.org/ (see above)
 * explore data tables in folder 'data/csv'
 * skim papers in folder 'papers'
   + Hix/Marsh -- only data section and Model 7 relevant
   + Doering -- skim; Jackman -- for an R perspective on the issue
   + Christen -- just to know how name matching may work
 * optional
   + install Firefox add-on SQLite Manager and explore 'data/parlgov-public.db'
   + run R scripts
 

STRUCTURE OF HANDS-ON SESSION

1. Working with ParlGov I -- no new software
  * introduces you to ParlGov and its data
  * required data provided in folder 'data/csv'
  * I'll introduce the web interface at dev.parlgov.org
  * no knowledge of 'non-standard' software needed
    + requires a spreadsheet programme (e.g. Excel, OpenOffice Calc)

2. Working with ParlGov II -- the optimal toolbox
  * introduces the full potential of ParlGov, its data and software
  * required data provided in file 'data/parlgov-public.db'
    + SQLite3 file (www.sqlite.org)
  * we'll do some simple exploration of the database
    + requires SQLite Manager add-on for Firefox
      (addons.mozilla.org/en-US/firefox/addon/5817)
  * more important: I'll demonstrate usage in R (cran.r-project.org)
    + first: we do some simple studies of government formation
      * Is the median party more likely to be a government member?
        + getting data from ParlGov database in R
        + determine median parties with R functions
        + files are provided in 'script' folder
    + second: I'll provide an example of creating a replication data set with ParlGov
      * we use a paper by Hix and Marsh (2007) and create a data set to study
        the second-order effect in EP elections
      * files are provided in 'script/epsecond-order' folder
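
The first exercise above (get data from the SQLite file, then determine the
median party) can be sketched as follows. This is an illustration in Python,
not the R code from the 'script' folder. The table 'party_seats', its columns
and all values are hypothetical and do not reflect the actual ParlGov schema
(see http://dev.parlgov.org/doc/tables/ for the real tables). The median party
is taken here to be the party holding the median seat when parties are ordered
from left to right.

```python
import sqlite3


def median_party(rows):
    """Return the party holding the median seat on a left-right scale.

    rows: iterable of (party, seats, left_right) tuples for one parliament.
    If two parties share the median exactly (a 50/50 split), the one
    further to the right is returned; an empty input returns None.
    """
    ordered = sorted(rows, key=lambda r: r[2])  # order parties left to right
    total = sum(seats for _, seats, _ in ordered)
    cumulative = 0
    for party, seats, _ in ordered:
        cumulative += seats
        # first party whose cumulative seats exceed half of all seats
        if cumulative > total / 2:
            return party
    return None


# In-memory stand-in for 'data/parlgov-public.db'.
# Table name, column names and data are hypothetical.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE party_seats "
    "(election_id INTEGER, party TEXT, seats INTEGER, left_right REAL)"
)
con.executemany(
    "INSERT INTO party_seats VALUES (?, ?, ?, ?)",
    [
        (1, "Left", 30, 2.0),
        (1, "Centre", 25, 5.0),
        (1, "Right", 45, 8.0),
    ],
)
rows = con.execute(
    "SELECT party, seats, left_right FROM party_seats WHERE election_id = ?",
    (1,),
).fetchall()
con.close()

print(median_party(rows))  # Centre
```

To run this against the real file, replace ":memory:" with the path to
'data/parlgov-public.db' and adapt the query to the documented tables.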

3. Advanced issues -- web scraping and record linkage techniques (presentation only)
  * I'll give a short overview of Python for political methodologists
    + read Doering 2008 to get an introduction (folder 'papers')
  * example scripts provided in folder 'advanced'
  * web scraping example
    + downloading and converting data into CSV from
      http://elezionistorico.interno.it/index.php?tp=C
    + using Firebug add-on in Firefox to explore structure of html page
  * record linkage example
    + combining information on Italian MPs and MEPs
      - http://legislature.camera.it
      - http://www.europarl.europa.eu/members/archive.do?language=EN
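
A minimal sketch of the web scraping step (HTML table to CSV), using only the
Python standard library. It is not one of the scripts in folder 'advanced'.
The sample HTML and its contents are made up; the real structure of the
elezionistorico.interno.it pages has to be inspected first (e.g. with Firebug)
and is not assumed here. Fetching the page (for example with urllib.request)
is left out so that the sketch runs offline.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # text outside of a cell (e.g. whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._row is not None and self._cell is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table cells found in html as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample page, not copied from the real site.
SAMPLE = """
<table>
  <tr><th>Lista</th><th>Voti</th></tr>
  <tr><td>Partito A</td><td>1.234</td></tr>
  <tr><td>Partito B</td><td>567</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE)
print(csv_text)
```

The parser treats all tables on a page as one; a real page with several
tables or nested tables would need an extra filter on the table of interest.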
    

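A minimal sketch of the record linkage step: matching person names from two
lists that spell and order names differently. It uses the generic string
similarity from Python's difflib, which is not necessarily the technique
discussed in the Christen paper or used in folder 'advanced'. All names are
invented, and the threshold of 0.85 is an arbitrary assumption that would
need tuning on real data.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, strip accents and sort the name tokens.

    Sorting makes 'ROSSI Mario' and 'Mario Rossi' compare as equal.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = no_accents.lower().replace(",", " ").split()
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Similarity of two names after normalisation, between 0 and 1."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each name in left with its best match in right, if similar enough."""
    links = {}
    for name in left:
        best = max(right, key=lambda cand: similarity(name, cand), default=None)
        if best is not None and similarity(name, best) >= threshold:
            links[name] = best
    return links


# Invented example names, standing in for the MP and MEP lists.
mps = ["ROSSI Mario", "BIANCHI Lucia", "VERDI Niccolò"]
meps = ["Mario Rossi", "Nicolo Verdi", "Anna Neri"]

links = link_records(mps, meps)
print(links)
```

Comparing every name with every other name is fine for a few hundred
records; larger lists usually call for some form of blocking (e.g. comparing
only names that share a first letter) before the pairwise comparison.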
