Election Results Scraper

From Reporting Cookbook: www.forjournalists.com/cookbook

This Python script adapts Derek Willis' College Football Penalty Finder script to download county-level results for the 2004 U.S. presidential election.

The results come from USA Today's election results pages. As with the ESPN.com play-by-play logs, the URL pattern is consistent from page to page, so the script only has to plug in each state's two-letter code to complete the URL.
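
The URL construction can be sketched roughly as follows. This is a minimal Python 3 sketch: the helper name build_url is hypothetical, while the base URL and the trailing query string are copied from the script on this page.

```python
# Base URL and trailing query string as used in the script on this page.
URL_BASE = 'http://www.usatoday.com/news/politicselections/vote2004/PresidentialByCounty.aspx?sp='
URL_SUFFIX = '&oi=P&rti=G&tf=l'


def build_url(state):
    """Return the results-page URL for a two-letter state code (hypothetical helper)."""
    return URL_BASE + state + URL_SUFFIX


print(build_url('MO'))
```

Keeping the fixed parts of the URL in named constants makes it easier to adjust the script if the site's address scheme changes.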

For each state's results page, the script pulls out the county name, the number of precincts in the county, the number of precincts reporting, and the vote totals for Bush, Kerry and Nader. It then writes these to a text file, one line per record, with the state code added as the first field. Fields are separated by a | character.
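
The extract-and-write step might look something like this in Python 3. To keep the sketch short it uses a simplified pattern and a made-up table row rather than the real page markup; the names SAMPLE_HTML, ROW and to_lines are hypothetical, and the numbers are placeholders, not real results.

```python
import re

# Simplified stand-in for the page markup: one row with six <td> cells.
# The row and its numbers are made-up placeholders, not real results.
SAMPLE_HTML = ('<tr><td><b>Example County</b></td><td>10</td><td>9</td>'
               '<td>100</td><td>90</td><td>5</td></tr>')

# Non-greedy groups capture the county name and the five numeric cells.
ROW = re.compile(r'<td><b>(.*?)</b></td>' + r'<td>(.*?)</td>' * 5)


def to_lines(state, html):
    """Return one pipe-delimited line per matched row, state code first."""
    return ['|'.join((state,) + row) for row in ROW.findall(html)]


print(to_lines('XX', SAMPLE_HTML))
```

Because findall returns one tuple per matched row, prepending the state code and joining on | gives exactly one output line per county.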

A word of caution: the USA Today data is not 100% accurate. For example, Maine's results page lists more counties than actually exist. My guess is that it was pulling in precinct results instead.
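
One way to catch this kind of problem is to count the scraped rows per state and compare them with a known county count. The following is a rough Python 3 sketch under stated assumptions: the function names are hypothetical, the expected counts must be supplied from a source you trust (Maine has 16 counties), and the sample lines are made-up placeholders.

```python
from collections import Counter


def rows_per_state(lines):
    """Count records per state code in pipe-delimited lines (state is field 0)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[line.split('|', 1)[0]] += 1
    return counts


def suspicious_states(lines, expected):
    """Return {state: (found, expected)} where the scraped row count is too high."""
    counts = rows_per_state(lines)
    return {s: (counts[s], n) for s, n in expected.items() if counts[s] > n}


# Maine has 16 counties; the sample lines below are made-up placeholders.
expected = {'ME': 16}
sample = ['ME|Place %d|1|1|0|0|0' % i for i in range(20)]
print(suspicious_states(sample, expected))
```

Any state that shows up in the result is worth checking by hand before the numbers are used in a story.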

Thanks to Derek and Ryan Konig for helping me work out the kinks. --Mizzousundevil 23:34, 17 January 2007 (UTC)

# import required modules
import re
import urllib.request

# set up base url for the county-level results pages
url_base = 'http://www.usatoday.com/news/politicselections/vote2004/PresidentialByCounty.aspx?sp='
results = open('results.txt', 'w')

# create pattern to match one table row, collecting the county name and the
# five numeric columns described above (all captured as text)
countyname = re.compile("""<td class="notch_.*?" width=.153.><b>(.*?)</b></td><td class="notch_.*?" align="Right" width="65">(.*?)</td><td class="notch_.*?" align="Right" width="70">(.*?)</td><td class="notch_.*?" align="Right" width="60">(.*?)</td><td class="notch_.*?" align="Right" width="60">(.*?)</td><td class="notch_.*?" align="Right" width="60">(.*?)</td>""")

# list of state codes (50 states plus DC)
States = ['AL', 'AK', 'DC', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE','FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY']

# set up loop to fetch each state's results page
for state in States:
    # the page encoding is an assumption; undecodable bytes are replaced rather than raising
    SState = urllib.request.urlopen(url_base+state+'&oi=P&rti=G&tf=l').read().decode('utf-8', 'replace')
    RResults = countyname.findall(SState)
    for county, votes1, votes2, votes3, votes4, votes5 in RResults:
        # one line per county: state code, county name, then the five columns in page order
        results.write('\n'+state+'|'+county+'|'+votes1+'|'+votes2+'|'+votes3+'|'+votes4+'|'+votes5)
results.close()
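
Once results.txt exists, the pipe-delimited records can be read back for analysis. The following is a minimal Python 3 sketch, not part of the original script: the function name read_results and the field names are assumptions based on the description on this page, and the sample data is a made-up placeholder.

```python
import csv
import io

# Field names are assumptions based on the description on this page.
FIELDS = ['state', 'county', 'precincts', 'reporting', 'bush', 'kerry', 'nader']


def read_results(handle):
    """Yield one dict per record from a pipe-delimited file-like object."""
    reader = csv.reader(handle, delimiter='|', quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == len(FIELDS):  # skips blank lines and malformed rows
            yield dict(zip(FIELDS, row))


# Made-up placeholder data standing in for results.txt.
sample = io.StringIO('\nXX|Example County|10|9|100|90|5\n')
for record in read_results(sample):
    print(record['county'], record['bush'])
```

To read the real file, pass an open file handle instead of the sample, for example open('results.txt', newline=''). All values come back as text, so convert them to numbers before doing any arithmetic.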

Attribution-Noncommercial-Share Alike 3.0 Unported
This page was last modified 23:37, 17 January 2007.