bolson: (Default)
is not to store it at all.
A unique id for a US Census block winds up being 15 decimal digits, which fits handily into an 8 byte int.
Actually, there are fewer than 10,000,000 blocks in the US, so a block index could easily be a 32-bit number.
But if what I really want to store is a mapping from each block to its district number (easily a 1-byte number), the smallest way to store it is just a list of district numbers, using the Census data file as the canonical ordering of the blocks.
In CSV this becomes 15 decimal digits, a comma, one to three decimal digits, and a newline: up to 20 bytes per block vs. 1.
For the hundreds of thousands of blocks in Texas, the gzipped CSV is a 2372 KB file; the gzipped byte list is 32 KB.
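A minimal Python sketch of the byte-list idea, assuming block IDs are 15-character strings already in the Census file's order and that every district number fits in 0-255. The function names and the toy data are hypothetical; the sketch also builds the equivalent CSV so the two gzipped sizes can be compared.

```python
import gzip
from typing import Dict, List, Sequence


def pack_districts(block_ids: Sequence[str], district_of: Dict[str, int]) -> bytes:
    """One byte per block, in the canonical (Census file) block order."""
    out = bytearray()
    for block_id in block_ids:
        district = district_of[block_id]
        if not 0 <= district <= 255:
            raise ValueError(f"district {district} does not fit in one byte")
        out.append(district)
    return bytes(out)


def unpack_districts(block_ids: Sequence[str], packed: bytes) -> Dict[str, int]:
    """Recover the mapping; the shared block order supplies the keys."""
    if len(block_ids) != len(packed):
        raise ValueError("block list and district list differ in length")
    return dict(zip(block_ids, packed))


def as_csv(block_ids: Sequence[str], district_of: Dict[str, int]) -> bytes:
    """The same mapping as 'blockid,district' CSV lines, for comparison."""
    return "".join(f"{b},{district_of[b]}\n" for b in block_ids).encode("ascii")


# Toy data: 1,000 made-up 15-digit block IDs split across 10 districts.
blocks: List[str] = [f"48{i:013d}" for i in range(1000)]
districts = {b: i // 100 + 1 for i, b in enumerate(blocks)}
packed = pack_districts(blocks, districts)
print(len(gzip.compress(as_csv(blocks, districts))), len(gzip.compress(packed)))
```

The byte list carries no IDs at all, which is where the savings come from; the catch is that both ends must agree on exactly the same block ordering.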
Sadly, a CSV file in a .zip archive seems to be the common interchange format for these things.
At least I get to use my format between my client and my server.
bolson: (Default)
I should write about New Zealand, but the Census started releasing data!
Only 4 states have the detailed data needed for redistricting out so far, but that'll keep me busy for a few days parsing it and loading it into my solver.
bolson: (Default)
And now I'm thinking about designing a sign to carry. Since the whole system is mad and needs fixing, I could advocate for either of my favorite system reforms: impartial redistricting or ranked ballot voting.

Should I:
A) Make a sign with one side each
B) all voting
C) all redistricting
D) not make a sign, signs are dumb
bolson: (Default)
I wrote my own rasterizer: given some loops of points, it yields pixels. I probably didn't do it in the most efficient way, but on the plus side I got to run exactly the code I wanted over every pixel, which gives a somewhat unusual setup where every pixel is associated with a 64-bit key rather than a color (the key is used to look up the color later: rasterize once, color many times). On the down side, there are annoying corner cases that a robust, rigorous library would have taken care of, like what happens when a scan line exactly intersects a vertex. In my code this can lead to either under-counting or over-counting of that intersection. Ugh. I'll fix it eventually; for now I've worked around it by leaving that scan line of the offending polygon blank.
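One standard way to make the vertex case count consistently is the half-open rule used by many scanline polygon fillers: an edge covers scan lines from its lower endpoint up to, but not including, its upper endpoint. The Python sketch below is not the author's rasterizer; it assumes loops of (x, y) points in pixel coordinates, samples each pixel at its center, fills with the even-odd rule (so inner loops become holes), and yields an (x, y, key) triple per pixel in the rasterize-once, color-many style.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]


def rasterize(
    loops: Sequence[Sequence[Point]], key: int, width: int, height: int
) -> Iterator[Tuple[int, int, int]]:
    """Yield (x, y, key) for each pixel whose center is inside the loops (even-odd rule)."""
    edges: List[Tuple[float, float, float, float]] = []
    for loop in loops:
        for i in range(len(loop)):
            (x0, y0), (x1, y1) = loop[i], loop[(i + 1) % len(loop)]
            if y0 != y1:  # horizontal edges never cross a scan line
                edges.append((x0, y0, x1, y1))
    for y in range(height):
        sy = y + 0.5  # sample each row at its pixel centers
        xs = []
        for x0, y0, x1, y1 in edges:
            # Half-open rule: an edge covers lower_y <= sy < upper_y. A vertex the
            # boundary passes through is counted once; a peak or valley 0 or 2 times.
            if min(y0, y1) <= sy < max(y0, y1):
                xs.append(x0 + (sy - y0) * (x1 - x0) / (y1 - y0))
        xs.sort()
        # Crossings pair up into spans; fill pixels whose centers lie in [left, right).
        for left, right in zip(xs[0::2], xs[1::2]):
            for x in range(max(0, math.ceil(left - 0.5)), min(width, math.ceil(right - 0.5))):
                yield x, y, key
```

Because every crossing count per row stays even, no scan line has to be dropped when it hits a vertex exactly.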
bolson: (Default)
Lots of redistricting hacking lately, mostly in the surrounding glue scripts. I just finished writing the scripts that will allow easy running of a client that downloads precompiled data and runs it, in a fashion similar to SETI@Home or other @Home-style projects. All you need is Python and one binary. I just need to set up a server with all the precompiled data on it.

Also, someone else taking an interest in my code, trying a few things, and finding shortcomings turns out to be a great motivator. That, and a confluence of some other things that made me want to hack on this, have made for a very productive couple of weeks.

(cross-posted from
bolson: (Default)
Lawmakers need more software engineer friends. We know some things about designing large systems of complex rules for sanity and future maintainability. Yesterday I read two pieces of law (redistricting related; one passed, one on the ballot in November) that fail miserably in some ways that are obvious to me.

New Blog

Mar. 2nd, 2010 09:38 am
bolson: (Default)
I've been doing a lot of work on my redistricting program lately, and decided to found a blog specifically for that subject. I'll probably link to major developments, but if you're curious and want to stick it in your RSS reader:

Also, note the new domain. B Districting, rhymes with re-districting, but it's from me, B, get it?
bolson: (Default)
Last week I discovered that the Census Bureau has changed the format in which they release the geometry that goes with their data. I use these geometry files to draw my maps of congressional districts. The old format was an ASCII text format with fixed field widths; it was a pretty lame format, but at least it was well documented. The new format uses 'industry standard' binary formats that the Census Bureau didn't bother to specify: they assumed the container format and only documented the inner data. It wasn't too hard to look up specifications, and from two authoritative sources and one hackish one I found what I needed about the "ESRI Shapefile" and "dBase 3" formats.
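For reference, a Python sketch of reading the main-file header and Polygon records of a .shp file, following the published ESRI Shapefile technical description: the main header is 100 bytes, with the file code and file length big-endian and the rest little-endian, and each record starts with an 8-byte big-endian header giving its record number and content length in 16-bit words. This is not the author's parser; it only handles Polygon (type 5) records and skips everything else.

```python
import struct
from typing import Iterator, List, Tuple

POLYGON = 5  # shape type code for Polygon in the ESRI Shapefile spec
Ring = List[Tuple[float, float]]


def read_shp_header(data: bytes) -> dict:
    """Parse the fixed 100-byte main file header."""
    (file_code,) = struct.unpack(">i", data[0:4])  # big-endian, always 9994
    if file_code != 9994:
        raise ValueError("not a shapefile")
    (length_words,) = struct.unpack(">i", data[24:28])  # file length in 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])  # little-endian from here on
    bbox = struct.unpack("<4d", data[36:68])  # Xmin, Ymin, Xmax, Ymax
    return {"length_bytes": 2 * length_words, "version": version,
            "shape_type": shape_type, "bbox": bbox}


def iter_polygons(data: bytes) -> Iterator[Tuple[int, List[Ring]]]:
    """Yield (record_number, rings) for each Polygon record after the header."""
    pos = 100
    while pos + 8 <= len(data):
        # Record header: record number and content length (16-bit words), big-endian.
        rec_num, content_words = struct.unpack(">ii", data[pos:pos + 8])
        content = data[pos + 8:pos + 8 + 2 * content_words]
        pos += 8 + 2 * content_words
        (shape_type,) = struct.unpack("<i", content[0:4])
        if shape_type != POLYGON:
            continue  # null shapes and other types are skipped in this sketch
        # Bytes 4-35 hold the record's bounding box; skip it.
        num_parts, num_points = struct.unpack("<ii", content[36:44])
        parts = list(struct.unpack(f"<{num_parts}i", content[44:44 + 4 * num_parts]))
        start = 44 + 4 * num_parts
        flat = struct.unpack(f"<{2 * num_points}d", content[start:start + 16 * num_points])
        points = list(zip(flat[0::2], flat[1::2]))
        # parts[i] is the index of the first point of ring i.
        bounds = parts + [num_points]
        rings = [points[bounds[i]:bounds[i + 1]] for i in range(num_parts)]
        yield rec_num, rings
```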

Over the course of three days, at odd times in the evenings, I wrote some Python code to parse these files. The tight development loop (no compiling) and nice, handy abstractions and libraries made it pretty quick going. Once I had a Python parser for the new Census map data files, I was pretty sure I understood them.
Then, this weekend, I rewrote it all in Java to be faster and more productionizable, and to add some real computation that the Python code, being just a format parser, hadn't done. Now I have it: the new data is parsed and understood, I get all the data I need out of it, I can switch over fully, and that part of my code is ready for the March 2011 data dump from the Census.
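The attribute half of the data lives in the .dbf (dBase III) file: a 32-byte header holding the record count, header length, and record length as little-endian integers, then one 32-byte descriptor per field, a 0x0D terminator, and fixed-width text records that each start with a deletion flag. A minimal Python sketch, not the author's Java code; it assumes single-byte text and returns every value as a stripped string rather than converting numeric fields.

```python
import struct
from typing import Dict, Iterator, List, Tuple

Field = Tuple[str, str, int]  # (name, type letter, width in bytes)


def read_dbf(data: bytes) -> Tuple[List[Field], Iterator[Dict[str, str]]]:
    """Return the field list and a generator of non-deleted records."""
    # Bytes 4-11: record count (uint32), header length and record length (uint16 each).
    num_records, header_len, record_len = struct.unpack("<IHH", data[4:12])
    fields: List[Field] = []
    pos = 32
    while data[pos] != 0x0D:  # field descriptors end with a 0x0D byte
        desc = data[pos:pos + 32]
        name = desc[0:11].split(b"\0", 1)[0].decode("ascii")
        fields.append((name, chr(desc[11]), desc[16]))
        pos += 32

    def records() -> Iterator[Dict[str, str]]:
        for i in range(num_records):
            start = header_len + i * record_len
            rec = data[start:start + record_len]
            if rec[:1] == b"*":  # '*' marks a deleted record
                continue
            row, off = {}, 1  # byte 0 is the deletion flag
            for name, _type, width in fields:
                row[name] = rec[off:off + width].decode("latin-1").strip()
                off += width
            yield row

    return fields, records()
```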

New goal: 1 month after data release, have 43 states of congressional districts (7 have only one rep) and 99 state legislatures (Nebraska is unicameral). I may publish a "redistricting @ home" distributed client to make this happen, because at last check it takes about two months of continuous computer time to get just the 43 congressional maps. Another option is to buy compute resources from Amazon. I might be able to get all I need for $200-$300.
bolson: (Default)
I've updated my impartial automatic redistricting site to include the racial breakdown of all current congressional districts (sometimes interesting by itself) and that of the compactness based districts I have come up with. If you want you can jump directly to where XX is any US state abbreviation to see what's up for a state you're interested in.
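A district's racial breakdown comes from summing block-level counts over the blocks assigned to it. A small Python sketch, assuming per-block counts keyed by hypothetical group labels (not actual Census column names) that are mutually exclusive, so their sum is the district's population:

```python
from collections import defaultdict
from typing import Dict, Hashable, Mapping


def district_breakdown(
    block_counts: Mapping[Hashable, Mapping[str, int]],
    assignment: Mapping[Hashable, int],
) -> Dict[int, Dict[str, float]]:
    """Return, per district, each group's share of the district's population."""
    totals: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for block, counts in block_counts.items():
        district = assignment[block]
        for group, n in counts.items():
            totals[district][group] += n
    shares: Dict[int, Dict[str, float]] = {}
    for district, groups in totals.items():
        population = sum(groups.values())
        shares[district] = (
            {g: n / population for g, n in groups.items()} if population else {}
        )
    return shares
```

The same function works for current districts and for computed ones; only the block-to-district assignment changes.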
bolson: (Default)
From this morning's Democracy Now:
"... people who are incarcerated in Upstate communities are actually counted as residents of those districts, not as residents of the districts in New York City, where the majority of our state’s prisoners come from."
"... can’t vote, but count as residents ..."

So, a positive feedback cycle: powerful state legislators get prison projects built in their districts and get more people for the count, but have a relatively smaller voter base. The small voter base likes the money coming into the district and the power they have, and re-elects their legislator.
I wonder how pronounced the situation is. I should check the Census data to see if there is an "in prison" column to the data along with all the other attributes.
bolson: (Default)
I went to Ignite Boston 5, an O'Reilly-sponsored tech geek gathering, and it seemed to have a theme of data for social good and democratized data, so I presented a talk on my impartial automatic redistricting work.
It was great! (except that I hadn't rehearsed my speech well enough and it didn't quite sync with the timing I had programmed into the slides and so there were some awkward pauses and a couple moments of shuffling through notes to see if I'd covered everything and some filler and I think I did miss a couple small points, but in the end it was ok)
I wound up missing most of the talks after mine because I got absorbed in talking to people who came up to me about my presentation. There was, of course, the one guy off on his own wonkish tangent, too loudly and for longer than I was prepared to feign interest; politics does that to people sometimes. Mostly it was really good to talk to people who were interested in the subject. I got a couple of good comments along the lines of "hey, I had wondered if this could work, I'm glad you're doing it". I think the biggest thing my redistricting project needs if it's going to have any real effect is networking and mind share. Eventually it needs to be introduced to someone with the platform to make things happen or spread the word more effectively. I don't know who that will be, so I'll just keep talking these things up now and then and see if anything happens.
bolson: (Default)
In about 3 or 4 hours tonight, mostly coffee shop lurking at Diesel, I wrote code to make sure my redistricter actually produces contiguous districts. Yay! Now I just have to re-run it on all the states. Maybe I'll have results in a couple weeks.
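A common way to check contiguity is a breadth-first search over the block adjacency graph restricted to each district: a district is contiguous exactly when one search from any of its blocks reaches all of them. A Python sketch, assuming you already have the adjacency pairs (for example, blocks that share a boundary); this is not necessarily how the author's redistricter does it.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Tuple


def noncontiguous_districts(
    assignment: Dict[Hashable, int], adjacent: Iterable[Tuple[Hashable, Hashable]]
) -> List[int]:
    """Return the districts whose blocks do not form one connected piece."""
    neighbors: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for a, b in adjacent:
        if assignment[a] == assignment[b]:  # only links inside one district matter
            neighbors[a].append(b)
            neighbors[b].append(a)
    members: Dict[int, List[Hashable]] = defaultdict(list)
    for block, district in assignment.items():
        members[district].append(block)
    broken = []
    for district in sorted(members):
        blocks = members[district]
        seen = {blocks[0]}
        queue = deque([blocks[0]])
        while queue:  # breadth-first search from the district's first block
            for nb in neighbors[queue.popleft()]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(seen) != len(blocks):
            broken.append(district)
    return broken
```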

43 states

Dec. 9th, 2007 04:10 pm
bolson: (gd)
Big update on my redistricter. I made a much faster solver that still gets pretty good results. So now I have 43 states* up, each shown with its current districts and an example of what an impartial redistricting might look like.

* AK, DE, MT, ND, SD, VT, WY have only 1 district.
bolson: (Default)
O'Reilly and Google put on a good show tonight at "Ignite Boston". It was like open mic night for geeks. 20+ people got 5 minutes of show and tell time. I put on a little slide show about my redistricting work. It was cool. Some of the things people talked about were just plugs for something they were selling, but counted as new technology in some way or another so were ok. A couple were incomprehensible.

I got some good validation: people complimented me on presenting good stuff and asked questions about it. After every set of 4-5 little presentations they would open the floor to group questions, and I got 2 questions, while there was only 1 about any of the other talks in my block. Yay! People like my stuff! Now I just need to convince a few million other people and we'll make it law.
bolson: (Default)
I did finally wrestle the Census map data into something usable, and I wrote the code to rasterize it. This really does help:

before: after:

And a big png map of CA congressional districts

The big map shows that my rasterizing is still missing a few pixels here and there, and sometimes whole triangles, it looks like. Also, the map has lost the information about which regions are water, so the whole SF Bay has disappeared. Water and some major cities would be nice landmarks to add.

[Update Sun, 5:12 PT] Fixed the triangle rendering problem. It's even prettier now, but still has no water or cities.
bolson: (Default)
I'm trying to make my redistricting output prettier, but it isn't easy.
(rant about the Census map data's file format)
bolson: (Default)
Redistricting is pretty computationally intensive. Also the solver I've written has randomized search in it, and is vulnerable to getting stuck at locally optimal solutions. So, I have to run it many times and it takes about four hours to settle a mapping of California using a 3.4 GHz P4. Right now I have my machine at work and my iMac G5 at home spending their idle cycles searching for optimal redistricting maps.
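Running a randomized solver many times and keeping the best answer is random-restart local search. The Python sketch below uses a toy one-dimensional hill climber as a stand-in for the real solver; the landscape, the function names, and the "lower score is better" convention are all assumptions. Each run gets its own seeded generator, so a good result can be reproduced later.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

S = TypeVar("S")


def best_of_restarts(
    solve_once: Callable[[random.Random], Tuple[float, S]], runs: int, seed: int = 0
) -> Tuple[float, S]:
    """Run solve_once `runs` times with independent seeds; keep the lowest score."""
    master = random.Random(seed)
    best: Optional[Tuple[float, S]] = None
    for _ in range(runs):
        rng = random.Random(master.getrandbits(64))
        score, solution = solve_once(rng)
        if best is None or score < best[0]:
            best = (score, solution)
    if best is None:
        raise ValueError("runs must be at least 1")
    return best


def hill_climb(values: List[float], rng: random.Random) -> Tuple[float, int]:
    """Toy solver: start at a random index, step to the lower neighbor until stuck."""
    x = rng.randrange(len(values))
    while True:
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n < len(values)]
        step = min(neighbors, key=lambda n: values[n])
        if values[step] >= values[x]:
            return values[x], x  # a local minimum, not necessarily the global one
        x = step


# Every multiple of 10 is a local minimum; only x = 70 is the global minimum.
landscape = [float(x % 10 + (0 if x == 70 else 1)) for x in range(100)]
print(best_of_restarts(lambda rng: hill_climb(landscape, rng), runs=300))
```

Independent runs like these parallelize trivially, which is why spreading them across idle machines works well.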

The last time I got serious about running big compute jobs it was for election method simulation to determine which method had the highest expected social utility. Eventually that became an excuse to buy a new computer. Sigh. But I ought to wait right now because the rumor mill (and conventional wisdom) says that Apple is going to release some awesome new machines next month. Also it's time to replace my aging iBook G4 1GHz. It still feels odd to think that a laptop will be my fastest computer (possibly by a large amount) and would be a good machine to do heavy computation on.


Page generated Jul. 23rd, 2017 10:37 am
Powered by Dreamwidth Studios