This algorithm converts 5-digit standard U.S. ZIP codes into their congressional districts and geographical coordinates (latitude and longitude). Download the most recent version of the plugin here.
This plugin supports only U.S. ZIP codes. It converts 5-digit ZIP codes to the congressional districts they belong to. It is distributed as an external plugin because the dataset is so large. The dataset is based on the 2012 election (113th Congress).
Note for developers: Please take a look at the ZIP code wiki here to better understand how the U.S. ZIP+4 code system works. The first five digits of a ZIP code are called the Uzip. The last four digits of a ZIP+4 code identify a more specific delivery segment, such as a Post Office box; see here.
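As an illustration of the ZIP+4 structure described above, the following minimal sketch splits an input string into its 5-digit part and its optional 4-digit part. The helper name `split_zip` and the accepted input formats (with or without a hyphen) are assumptions made for this example, not taken from the plugin's source code.

```python
import re

# Hypothetical helper: the name and accepted formats are assumptions,
# not the plugin's actual implementation.
_ZIP_PATTERN = re.compile(r"^(\d{5})(?:-?(\d{4}))?$")


def split_zip(raw):
    """Return (uzip, plus4) for a ZIP or ZIP+4 string, or None if invalid."""
    match = _ZIP_PATTERN.match(raw.strip())
    if match is None:
        return None
    # group(2) is None when only the 5-digit part is present.
    return match.group(1), match.group(2)


print(split_zip("47405-1234"))  # ('47405', '1234')
print(split_zip("47405"))       # ('47405', None)
print(split_zip("4740"))        # None
```

Since the plugin works on 5-digit ZIP codes only, a caller would keep just the first element of the returned pair.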
The main challenge of the implementation is the design of the mapping model used to look up congressional districts from ZIP codes. This means understanding the metadata file (provided by GovTrack) and creating a mapping model that has constant, O(1), lookup time and is easy to manage. The implementation details are documented in the source code.
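One common way to get constant-time lookup is a hash map keyed by the 5-digit ZIP code. The sketch below shows that idea only; it is not the plugin's actual mapping model, and the function names and sample rows are made-up placeholders rather than real ZIP-to-district data.

```python
from collections import defaultdict


def build_lookup(rows):
    """Build a dict mapping each 5-digit ZIP code to its list of districts."""
    lookup = defaultdict(list)
    for zip_code, district in rows:
        # Skip duplicate (ZIP, district) pairs.
        if district not in lookup[zip_code]:
            lookup[zip_code].append(district)
    return dict(lookup)


# Made-up placeholder rows, not real ZIP-to-district data.
SAMPLE_ROWS = [
    ("00001", "XX-01"),
    ("00002", "XX-01"),
    ("00002", "XX-02"),  # one ZIP code can span several districts
]

LOOKUP = build_lookup(SAMPLE_ROWS)


def districts_for(zip_code):
    """Average-case O(1) hash lookup; unknown ZIP codes give an empty list."""
    return LOOKUP.get(zip_code, [])


print(districts_for("00002"))  # ['XX-01', 'XX-02']
print(districts_for("99999"))  # []
```

Storing a list per ZIP code keeps the multiple-district case explicit, so the caller can decide how to report it.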
The following provides a high-level view of the design.
The output table contains all columns of the input table with three new columns (Congressional district, latitude and longitude).
Here is a four-step guide to using the plugin:
5-digit ZIP codes with multiple congressional districts, empty entries, and invalid ZIP codes that could not be geocoded are listed in warning messages on the console.
The output of this algorithm is the original input table with three additional columns (congressional district, latitude, and longitude). ZIP codes that could not be geocoded have blank entries in these columns.
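The output behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: the column names, the `geocode_rows` function, and the sample data are hypothetical, and the sketch assumes that a ZIP code with multiple districts is left blank, which may differ from what the plugin does.

```python
import sys

# Made-up placeholder data: ZIP -> list of (district, latitude, longitude).
GEO = {
    "00001": [("XX-01", 10.0, -20.0)],
    "00002": [("XX-01", 10.0, -20.0), ("XX-02", 11.0, -21.0)],
}

# Assumed names for the three added columns.
NEW_COLUMNS = ("Congressional district", "Latitude", "Longitude")


def geocode_rows(rows, zip_column):
    """Return copies of rows with three added columns; unresolved rows stay blank."""
    output = []
    for row in rows:
        new_row = {**row, **{name: "" for name in NEW_COLUMNS}}
        zip_code = (row.get(zip_column) or "").strip()
        matches = GEO.get(zip_code, [])
        if len(matches) == 1:
            new_row.update(zip(NEW_COLUMNS, matches[0]))
        elif not zip_code:
            print("Warning: empty ZIP code entry", file=sys.stderr)
        elif len(matches) > 1:
            # Assumption: ambiguous ZIP codes are reported and left blank.
            print(f"Warning: {zip_code} maps to multiple districts", file=sys.stderr)
        else:
            print(f"Warning: {zip_code} could not be geocoded", file=sys.stderr)
        output.append(new_row)
    return output


table = [{"Name": "a", "ZIP": "00001"}, {"Name": "b", "ZIP": "abcde"}]
for out_row in geocode_rows(table, "ZIP"):
    print(out_row)
```

Every input row appears in the output, so the row count is unchanged and failures are visible as blank cells.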
In our benchmark, the plugin processes 50,000 ZIP codes per second.
Geomap the congressional districts
The geocoding algorithm was authored, implemented, integrated, and documented by Chin Hua Kong. Many thanks to the Sprint team for their advice and suggestions. Many thanks to GovTrack for providing the ZIP-to-district mapping data and the districts' geolocation information. Thanks also to Carl Malamud and Aaron Swartz, who made the data available on WATCHDOG.NET for GovTrack.
It was interesting to work on this algorithm starting from zero knowledge of ZIP codes and congressional districts. A lot of exploration and analysis was done during development, which made the design and preparation period in Sprint longer than expected. Many mapping databases are available for sale. However, we were lucky to find GovTrack, which provides free data and a web service for the mapping. Many revisions and improvements were made during development, which made the plugin better and more accurate. It was fun and worth it for the knowledge I gained. I now have a better understanding of the ZIP code system and the concept of congressional districts.
The data used to power this plugin was originally sourced from the GovTrack.us website. As of the 113th Congress, GovTrack no longer supports or updates the district-to-geolocation or ZIP-code-to-district data. We have recently updated the data to reflect the most current 113th Congress data using the following sources:
The data from this site must be parsed correctly before being used in this tool. We used the following Python scripts to parse this data before including it in the plugin. Refer to the script comments for documentation:
For the convenience of users, we have already pulled this new data, parsed it, and included it in the most recent build. For anyone who wants to use legacy data for the 112th Congress, however, those data files may be found here: