Friday, 27 June 2008

Compression, fundamental

To compress data you don't just go about fitting the all the numbers that you have in the smallest spacesyou can fit them. Compressing things is usually about understanding what information you have.

Data does not equal information.

consider the data:

int gMyArray[] = { 15,15, 0,0,0,0,0,0,0,0, 1205,30000, 4,4,4,4,4,4, 65000,65001 };

sizeof( gMyArray ) == 80; // (20 elements, 4 bytes each)

If you want to compress that sequence, you could store the sequence as a list of unsigned shorts. That would save you half the data space, 40bytes, but its no-where near as good as you could do. Naivley storing as pairs of ints that supply first the value, then the number of copies will give us a final size of 56bytes, which is not as good as the half data space compression, but once we change to using a short as teh data value and a char as the copy count, we drop down to 21 bytes. that's almost half the size of the first attempt at compressing.

Watch out for what information you have in your data. Sometimes you have a lot of data and very little information.

consider recording button presses for a game, if you store the array of button values at each time there is a change, you will use up quite a bit of memory quite quickly, but if you just store the time until a button changes state, with the button ID, then you can make a saving. The more keys the bigger the saving. Now consider a keyboard. No-one stores a keyboard map per key up or down.

No comments: