bolson: (Default)
These are things that have actually been useful, or continue to be useful, over my 16-year career as a software professional.

Based on “Introduction to Algorithms” 3rd edition, (c) 2009
I had the 1st edition from 1990 when I used it in class around 2000, but all this is pretty foundational stuff that’s mostly stable over the last 20-30 years.

I Mathematical Foundations
3 Growth of Functions
This is important, everything else will refer to it

II Sorting
Sorting was a foundational bit of CS when it was taught to me, and it was taught using about a dozen different sort algorithms. But what I think is actually worth learning is:
6 Heapsort - and the section on “priority queues” (see the Go sketch at the end of this list)
8.4 Bucket Sort - when ‘close enough’ is really fast
And what I sadly don’t see in the table of contents is merge sort.

III Data Structures
10 Elementary Data Structures - all of these - stacks and queues, linked lists
11 Hash Tables - yes, all of javascript and ruby and python and perl are based on this
12 Binary Search Trees - another basic concept that I think is an important alternative to hashes

VI Graph Algorithms
22 Elementary Graph Algorithms - breadth-first-search and depth-first-search are important concepts

VII Selected Topics
27 Multithreaded Algorithms - sooner or later this will be important - thinking about concurrent programming is more and more important as our computer systems get more multi-core and more distributed.

VIII Appendix: Mathematical Background
Appendix B: Sets, Etc
More basic structures everything will refer to. A bit thick with notation, there ought to be simpler explanations of these things. If you want a reminder about what union and intersection of sets are, look here.
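
About chapter 6's priority queues: here is a minimal Go sketch using the standard container/heap package. The intHeap type is just my illustration, not anything from the book.

package main

import (
	"container/heap"
	"fmt"
)

// intHeap is a min-heap of ints; implementing these five methods is all
// container/heap asks for.
type intHeap []int

func (h intHeap) Len() int            { return len(h) }
func (h intHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h intHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *intHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *intHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

func main() {
	h := &intHeap{5, 2, 9}
	heap.Init(h)
	heap.Push(h, 1)
	for h.Len() > 0 {
		fmt.Println(heap.Pop(h)) // prints 1 2 5 9 - popping everything out is, in effect, heapsort
	}
}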
bolson: (Default)
Programming almost isn't about performance anymore. Performance has to be good enough, but most problems are small enough that getting there is pretty easy. Most of the programming problems at my current and previous jobs are about managing complexity. We have to write a big, complex piece of code that manages lots of attributes and has lots of rules and does lots of actions. We have to do that in a way that works in the short term of days or weeks for whatever feature we're adding this time, and in a way that lets us come back to it next month or next year and do something completely different.
bolson: (Default)

When rearchitecting the world, do it along with the rest of the team.

I recently had the misfortune of doing a 4 month long project to make major changes to the underlying architecture of a project 6 other developers were also actively working on. I did this on a development branch, outside the scope of development the rest of the team was doing. On at least three occasions I had to spend a day to a week merging in their semi-major changes (week to month long projects) and making sure everything still worked in my new world.

That sucked. I didn't understand what their changes were supposed to do, and reintegrating them didn't always work right away, and they had no visibility into how they should code to be more compatible with what I was doing.
I did too much of my reorganization as if no one else was working on the code base (which had been true just a few months earlier).
In one place, code from four different files was gathered into one new function. But because of the scope and flow control of this new code site, it wound up being a copy-and-paste job, and then when people made changes to the old code, those changes had to be manually merged into the new code. Uf.

I now think the answer should have been that I do all the reorganizing I could in the main development branch, making my changes visible to others, lifting code into functions that could be called from either the old framework or the new one.
I think this wasn't done because management insisted (and it seemed kinda reasonable) that experimental code shouldn't be committed to the main development branch where it could get in the way of other programmers or even moved out to production code (as a web service we could deploy every week if we wanted).
I think this was actually wrong.

Big changes should be made where they affect everyone, but they should be made in lots of small steps.
I tried to do this in one place early on, got pushback, and was kinda told not to do that again.
I lifted the middle of one function out into a new function, because that was the part I needed to call from new code.
It was functionally no change to existing code, and that was received as a kinda bad thing. Don't change what works; if it ain't broke, don't fix it.
And yet I was undertaking a large project to change a lot of things that needed changing.
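
For what it's worth, here's a hedged Go sketch of that "lift the middle into its own function" move, with entirely made-up names; the point is that the old code path and the new framework can both call the lifted function, with no behavior change for existing callers.

package refactor

import "errors"

type Order struct {
	Qty      int
	Price    float64
	Discount float64
}

// computeTotal used to be a few lines buried in the middle of handleOrder.
// Lifted out, it can be called from the old handler and from a new
// framework's code path alike.
func computeTotal(o Order) float64 {
	return float64(o.Qty)*o.Price - o.Discount
}

// handleOrder is the pre-existing entry point; it still does exactly what
// it did before, just by calling the new function.
func handleOrder(o Order, save func(Order, float64) error) error {
	if o.Qty <= 0 {
		return errors.New("bad quantity")
	}
	return save(o, computeTotal(o))
}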

So, there was a little mismanagement, and a little messy coding, and I'm not sure exactly what the moral of this cautionary tale is, except that next time I'm making major architectural changes to software, I want everyone to see my changes all the time so that we're not stepping on each other.

bolson: (Default)
I wish the Go language had subclassing or macros or templates or something. These are all ways of reusing code without copy and paste. The compiler can apply a macro or a template to stamp the same code pattern onto different data. Subclassing can make things written for the old data also apply to a new, expanded version of that data.
Right now this lack is making me kinda hate writing a container/collection class in Go. If I write an LRUCache class but then want to extend it to be an LRU cache which also expires cached elements by HTTP rules (a normal pattern of thinking for class design in Java or C++), in Go I'm going to wind up copying and pasting the entire implementation in order to make an extended version of it.
In Java (or very nearly C++) I'd write:
class LRUCacheEntry {
	Object it;
	int foo, bar; // some metadata to track
}

then
class LRUHttpCacheEntry extends LRUCacheEntry {
	long age; // more metadata to track
}

The subclass gets all of the members and functions of the superclass, and a little bit more. Go is missing this kind of composition or extension.
If the LRU cache behavior was written as a template or a macro, it might be applied to anything that had the right set of fields.
Instead, Go is advocating that classes implement interfaces of accessors. I'm annoyed at having to write lots of accessors, and I'm annoyed because I wanted to come to Go to write efficient and fast code which would compile and compete with the runtime speed of C or Java. Sure, requiring a virtualized function call to access a variable isn't the slowest thing ever, but do a few billion of them and I'll still be annoyed that other languages have good ways around requiring that.
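
For concreteness, here's roughly the accessor-interface shape I mean, as a sketch with made-up names (my illustration, not code from any Go library):

package lru

import "time"

// cacheEntry is the accessor interface the cache logic has to work
// against, since it can't see concrete struct fields.
type cacheEntry interface {
	Foo() int
	Bar() int
	SetBar(int)
}

// basicEntry is the plain LRU entry.
type basicEntry struct {
	it       interface{}
	foo, bar int
}

func (e *basicEntry) Foo() int     { return e.foo }
func (e *basicEntry) Bar() int     { return e.bar }
func (e *basicEntry) SetBar(b int) { e.bar = b }

// httpEntry adds HTTP expiry metadata and needs the same accessors all
// over again; either way, the cache reaches every field through a
// virtualized method call.
type httpEntry struct {
	it       interface{}
	foo, bar int
	age      time.Duration
}

func (e *httpEntry) Foo() int     { return e.foo }
func (e *httpEntry) Bar() int     { return e.bar }
func (e *httpEntry) SetBar(b int) { e.bar = b }

Every one of those Foo() and Bar() calls is the virtualized access I'm grumbling about.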
bolson: (Default)
Google AppEngine announced a new pricing model and the killer is that as soon as an 'app'* costs anything it costs $9 per month. The implication to me is clearly this: never do anything small. This probably won't be a barrier to people building a business on AppEngine, they're probably planning on moving many more dollars per month than $9. But I think it will be a barrier to tinkering and creativity.

( rant about Google culture and AppEngine )
bolson: (Default)
Once upon a time, 2001-2007, I worked in the 'embedded' world, working on software for computerized things that go into cars and printers and planes and whatnot. I got into this through hobby hacking a long time ago on a little 8 bit microcontroller, the Motorola 68HC11. I've been craving that hobby hacking again lately. A few years ago I bought a little ARM based board that seems to be a pretty hacker friendly device, with a built in boot loader so that I don't have to buy any expensive equipment to start using it (Atmel AT91SAM7S series). But, software support for it is kinda lacking. There needs to be an open source hacker community around these things these days to write all the drivers and complex piles of software to do things for USB and Ethernet and web serving and all the things we expect modern devices to do. My old ARM part didn't have that. It seems there is that kind of community around what is a superior device: Microchip's PIC32 line. If you've used the old classic PIC (8 bit) or PIC16, stop thinking about them. PIC32 is a MIPS M4K core at 80MHz with up to 128K of RAM and 512K of flash, and an awesome set of onboard peripherals (USB, Ethernet, 1MHz ADC, and bunches more). Yup, gotta get me one of those and start hacking. Also, and this is the real win that makes me want to use this part, Microchip is developing and has for download beta versions of a Mac (or Linux) toolchain!
bolson: (Default)
I may be going crazy, or it may be that 'avoiding obviously horribly wasteful designs' has become 'premature optimization' while I wasn't looking.
bolson: (Default)
is not to store it at all.
A unique id for a US Census block winds up being 15 decimal digits, which fits handily into an 8 byte int.
Actually there are less than 10,000,000 blocks in the US, so that could easily be a 32 bit number.
But if what I really want is to store a mapping from each block to its district number (easily a 1 byte number), the smallest way to store this is just a list of district numbers. Use the Census data file as a canonical ordering of the blocks.
CSV for this becomes 15 decimal digits, comma, one to three decimal digits, newline. 20 bytes vs 1.
For the hundreds of thousands of blocks in Texas, the gzipped CSV is a 2372 KB file; the gzipped byte list is 32 KB.
Sadly, a CSV file in a .zip archive seems to be the common interchange format for these things.
At least I get to use my format between my client and my server.
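
A sketch of the encoding side in Go (the function name and file layout are hypothetical, just illustrating the idea): the Census file's block order is the implicit key, and the output is nothing but one district byte per block, gzipped.

package blockmap

import (
	"compress/gzip"
	"os"
)

// writeDistricts writes one byte per block, in the same order as the
// canonical Census block list, gzipped. The block IDs themselves are
// never stored; a block's position in the file is its key.
func writeDistricts(path string, districtByBlockIndex []byte) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	zw := gzip.NewWriter(f)
	if _, err := zw.Write(districtByBlockIndex); err != nil {
		return err
	}
	return zw.Close()
}

The CSV direction would instead repeat the 15-digit block id on every line, which is where the 20-bytes-vs-1 difference comes from.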
bolson: (Default)
Oracle is having a chilling effect on Java. I don't think I could in good conscience start a project in Java right now given that Oracle might capriciously lock out parts of the Java world. I've been saying for a couple of years that Java is my favorite language these days. Nice to use and fast run time. Boo. Maybe I'll go look at the Go language some more.
bolson: (Default)

One, measure the size of your monitors to get the vertical spacing.
Two, drill holes big enough for metric "M4" screws on 100mm by 100mm square patterns at positions appropriate to your monitors.
Three, 3" nickel plated hinge into back board and foot.
Four, two strips of wood into 2" strap hinges. The first of those into the back board, the second so that it slides parallel and in contact with the first but with the hinge attached to the foot.
Five, tightly wind string around wood strips to bind them together and adjust tilt of whole setup.

Bonus! Tilt adjustable is nice anyway but it means I don't have to worry about getting the monitors exactly perpendicular to the base.

bolson: (Default)


Just got a new linux box put together last night. Took me from about 7:30pm to 11pm to physically assemble the thing and install Ubuntu 10.10 and the nvidia driver to drive its four screens (still have 2 year old iMac too).

`make -j8` is my new friend thanks to the four-core eight-thread i7-950 cpu. Also, its 750W power supply will warm my toes all winter.
bolson: (Default)
I wrote my own rasterizer. Given some loops of points, yield pixels. I probably didn't do it in the most efficient way, but on the plus side I got to run exactly the code I wanted over every pixel, yielding a somewhat unusual setup where every pixel is associated with a 64 bit key rather than a color (the key is used to look up the color later: rasterize once, color many). On the down side, there are annoying corner cases that a rigorous library would have taken care of, like what happens when the scan line exactly intersects one of the points. In my code this can lead to either under-counting or over-counting of that intersection. Uf. I'll fix it eventually; for now the workaround amounts to leaving that scan line blank for the offending polygon.
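
For reference, here's a bare-bones Go sketch of the general scanline approach (not the actual code from this project): collect edge crossings per scan line, sort them, and fill between pairs, emitting (x, y, key) for each covered pixel. Sampling at pixel centers (y + 0.5) is one common way to dodge the exact-vertex corner case mentioned above.

package raster

import (
	"math"
	"sort"
)

// Point is a polygon vertex in pixel coordinates.
type Point struct{ X, Y float64 }

// Rasterize visits every pixel whose center falls inside the polygon loop
// (even-odd rule) between scan lines minY and maxY, reporting each pixel
// with the caller's 64 bit key.
func Rasterize(loop []Point, minY, maxY int, key uint64, emit func(x, y int, key uint64)) {
	for y := minY; y <= maxY; y++ {
		sy := float64(y) + 0.5 // sample at the pixel-center scan line
		var xs []float64
		n := len(loop)
		for i := 0; i < n; i++ {
			a, b := loop[i], loop[(i+1)%n]
			if (a.Y <= sy) != (b.Y <= sy) { // edge crosses this scan line
				t := (sy - a.Y) / (b.Y - a.Y)
				xs = append(xs, a.X+t*(b.X-a.X))
			}
		}
		sort.Float64s(xs)
		for i := 0; i+1 < len(xs); i += 2 { // fill between pairs of crossings
			for x := int(math.Ceil(xs[i] - 0.5)); float64(x)+0.5 < xs[i+1]; x++ {
				emit(x, y, key)
			}
		}
	}
}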
