A Development Story (redistricting)
Feb. 16th, 2010 12:58 am
Last week I discovered that the Census Bureau has changed the format in which it releases the geometry that goes with its data. I use these geometry files to draw my maps of congressional districts. The old format was a fixed-field-width ASCII text format; it was pretty lame, but at least it was well documented. The new format uses 'industry standard' binary formats that the Census Bureau didn't bother to specify: they assumed the reader already knew the container formats and only documented the data inside them. It wasn't too hard to look up the specifications, and from two authoritative sources and one hackish one I found what I needed about the "ESRI Shapefile" and "dBase 3" formats.
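For anyone curious what "the container format" involves, here is a minimal Python sketch of reading polygon records from a .shp file, based on the published ESRI Shapefile technical description. The 100-byte main header mixes big-endian fields (file code 9994, file length in 16-bit words) with little-endian ones (version 1000, shape type, bounding box), and each record has a big-endian header followed by little-endian content. The function names `parse_shp_header` and `iter_polygons` are hypothetical, not the parser described in this post, and the sketch only handles Polygon (type 5) records.

```python
import struct

# Hypothetical sketch: reads an ESRI Shapefile (.shp) held in memory as bytes.
SHAPE_POLYGON = 5


def parse_shp_header(buf: bytes) -> dict:
    """Parse the fixed 100-byte main file header."""
    if len(buf) < 100:
        raise ValueError("shapefile main header must be 100 bytes")
    # File code and file length are big-endian; everything after is little-endian.
    (file_code,) = struct.unpack(">i", buf[0:4])
    (file_length_words,) = struct.unpack(">i", buf[24:28])
    version, shape_type = struct.unpack("<ii", buf[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", buf[36:68])
    if file_code != 9994 or version != 1000:
        raise ValueError("not a shapefile")
    return {
        "file_length_bytes": file_length_words * 2,  # length is stored in 16-bit words
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
    }


def iter_polygons(buf: bytes):
    """Yield (record_number, rings), where each ring is a list of (x, y) points."""
    header = parse_shp_header(buf)
    end = min(header["file_length_bytes"], len(buf))
    pos = 100
    while pos + 8 <= end:
        # Record header: record number and content length (in 16-bit words), big-endian.
        rec_num, content_words = struct.unpack(">ii", buf[pos:pos + 8])
        content = buf[pos + 8:pos + 8 + content_words * 2]
        pos += 8 + content_words * 2
        (shape_type,) = struct.unpack("<i", content[0:4])
        if shape_type != SHAPE_POLYGON:
            continue  # null shapes and other shape types are skipped in this sketch
        # Bytes 4..35 hold this record's bounding box, which is skipped here.
        num_parts, num_points = struct.unpack("<ii", content[36:44])
        part_starts = list(struct.unpack(f"<{num_parts}i", content[44:44 + 4 * num_parts]))
        pts_off = 44 + 4 * num_parts
        coords = struct.unpack(f"<{2 * num_points}d", content[pts_off:pts_off + 16 * num_points])
        points = list(zip(coords[0::2], coords[1::2]))
        # Each part runs from its start index up to the next part's start (or the end).
        bounds = part_starts + [num_points]
        rings = [points[bounds[i]:bounds[i + 1]] for i in range(num_parts)]
        yield rec_num, rings
```

Usage would look like `for rec_num, rings in iter_polygons(open("districts.shp", "rb").read()): ...`, where the filename is a placeholder.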
Over the course of three days, at odd times in the evenings, I wrote some Python code to parse these files. The tight development loop (no compiling) and the handy abstractions and libraries made it pretty quick going. Once I had a Python parser for the new census map data files, I was pretty sure I understood them.
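The attribute table that accompanies each shapefile is the "dBase 3" (.dbf) file. Here is a minimal Python sketch of reading it, assuming a plain dBase III layout: a 32-byte header with the record count, header length and record length (little-endian), then 32-byte field descriptors ending in a 0x0D byte, then fixed-width records that each start with a deletion flag. The function `read_dbf` is hypothetical. It returns every value as a stripped string, and decoding as latin-1 is an assumption chosen because it accepts any byte.

```python
import struct


def read_dbf(buf: bytes):
    """Return (fields, records) from a dBase III (.dbf) file held in memory.

    fields is a list of (name, type, length); records is a list of dicts
    mapping field name to the stripped text value. Deleted rows are skipped.
    """
    # Bytes 4..11: record count (uint32), header length (uint16), record length (uint16).
    num_records, header_len, record_len = struct.unpack("<IHH", buf[4:12])
    fields = []
    pos = 32
    # Field descriptors are 32 bytes each; the list ends with a 0x0D terminator.
    while pos < header_len - 1 and buf[pos] != 0x0D:
        desc = buf[pos:pos + 32]
        name = desc[0:11].split(b"\x00", 1)[0].decode("ascii")  # null-padded name
        ftype = chr(desc[11])  # e.g. 'C' character, 'N' numeric
        length = desc[16]
        fields.append((name, ftype, length))
        pos += 32
    records = []
    for i in range(num_records):
        start = header_len + i * record_len
        rec = buf[start:start + record_len]
        if rec[0:1] == b"*":  # '*' marks a deleted record, ' ' a live one
            continue
        off = 1
        row = {}
        for name, _ftype, length in fields:
            row[name] = rec[off:off + length].decode("latin-1").strip()
            off += length
        records.append(row)
    return fields, records
```

Numeric fields come back as text here; converting them with `int` or `float` is left to the caller.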
Then, this weekend, I rewrote it all in Java to make it faster and more productionizable, and to add some real computation that the Python code, which was just a format parser, hadn't done. Now I have it: the new data is parsed and understood, I get all the data I need out of it, and I can switch over fully. That part of my code is ready for the March 2011 data dump from the Census.
New goal: one month after the data release, have congressional districts for 43 states (7 have only one representative) and maps for 99 state legislative chambers (Nebraska is unicameral). I may publish a "redistricting @ home" distributed client to make this happen, because at last check it takes about two months of continuous computer time to get just the 43 congressional maps. Another option is to buy compute resources from Amazon; I might be able to get all I need for $200-$300.
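The compute estimate above can be made concrete with a little arithmetic. Taking "about two months" as roughly 60 days of continuous time gives about 1,440 machine-hours for the congressional maps alone. The hypothetical helpers below compute how many machines would finish a given number of machine-hours by a deadline, assuming the work splits perfectly across machines, and the highest price per machine-hour a fixed budget allows. No actual Amazon pricing is assumed.

```python
import math


def machines_needed(machine_hours: float, deadline_days: float) -> int:
    """Smallest machine count that finishes the work by the deadline,
    assuming the work divides perfectly with no coordination overhead."""
    return math.ceil(machine_hours / (deadline_days * 24))


def max_hourly_rate(budget_dollars: float, machine_hours: float) -> float:
    """Highest price per machine-hour that keeps the whole job within budget."""
    return budget_dollars / machine_hours
```

With those inputs, `machines_needed(1440, 30)` is 2 and `machines_needed(1440, 14)` is 5, while spending $200 to $300 on 1,440 machine-hours works out to roughly $0.14 to $0.21 per machine-hour. The 99 legislative chambers would add work on top of that and push the affordable rate lower.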