Monday, 9 December 2013

Looping back and forth, unsigned ranges.

I just saw this post about unsigned loop problems on Eli Bendersky's blog:

http://eli.thegreenplace.net/2004/07/18/cc-annoynace-unsigned-iteration/

In it he tells a tale of woe about unsigned values being awkward to test when writing loops that count down, but one trick I've used numerous times for checking ranges applies here too. Compare the loop conditions below.

Positive loop:
for (size_t i = 0; i < SIZE; ++i)
  ...

Negative loop:
for (size_t i = SIZE-1; i < SIZE; --i)  // decrementing past zero wraps to a huge value, which fails i < SIZE
  ...

What? Ah, unsigned types are unsigned, so -1 is really big. Cool. Go use this in your code. Ranges are handled the same way:

if ( (unsigned_value-RANGE_MIN) < (RANGE_MAX-RANGE_MIN) )
  ...
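
A minimal sketch of both tricks together (the constants and the printing are mine, and RANGE_MAX is treated as one past the last valid value):

#include <cstdio>
#include <cstddef>

#define SIZE 8u
#define RANGE_MIN 45u
#define RANGE_MAX 56u   /* one past the last valid value */

int main()
{
    /* counts 7,6,...,0; decrementing past zero wraps to a huge value, failing i < SIZE */
    for (size_t i = SIZE - 1; i < SIZE; --i)
        printf("%zu ", i);
    printf("\n");

    /* one-compare range check: values below RANGE_MIN wrap around to huge numbers */
    unsigned unsigned_value = 50u;
    if ((unsigned_value - RANGE_MIN) < (RANGE_MAX - RANGE_MIN))
        printf("%u is in range\n", unsigned_value);
    return 0;
}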


Wednesday, 4 December 2013

A Primer on Compressing Things.

Encoding into bits

If you want to store a number in a computer, it has to be stored as a sequence of bits. These bits can be used to represent large numbers, but the number space represented by whole number counts of bits is always a power of two.

  • 1 bit = 2 options, or a range of 2 values = 2^1
  • 2 bits = 4 options, or a range of 4 values = 2^2
  • 3 bits = 8 options, or a range of 8 values = 2^3
  • 10 bits = 1024 options, or a range of 1024 values = 2^10

As long as your value fits within the number of options, you can encode it in that many bits. You can encode numbers from zero up to 1023 in 10 bits, as they are all under 1024.

For example :

  • to store a 7 or a 4, you could use 3 bits (b111 and b100) as 7 and 4 are both under 8.
  • to store a 10, you could use 4 bits (b1010), as 10 is under 16 (2^4=16)
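
A small sketch of that rule (the helper name is mine): count how many bits a given number of options needs.

#include <cstdio>

// Smallest number of bits able to represent `options` distinct values.
static unsigned BitsFor(unsigned options)
{
    unsigned bits = 0;
    while (bits < 32 && (1u << bits) < options)
        ++bits;
    return bits;
}

int main()
{
    printf("values 0..7 need %u bits\n", BitsFor(8));        // 3
    printf("values 0..10 need %u bits\n", BitsFor(11));      // 4
    printf("values 0..1023 need %u bits\n", BitsFor(1024));  // 10
}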

Using your knowledge

You don't have to store your numbers starting at zero. As long as you know the minimum and maximum possible, you can shift the options into your needed range.

For example, I know that the number of legs on any table I own is always three or four. That's only two options, so I can store it in 1 bit. I store the number of legs as 3 + (encoded bit):

  • for 3 legs, encode a 0
  • for 4 legs, encode a 1
I know that the valid temperature on a piece of equipment ranges from 45 to 55 degrees, so I only need a range of 11 values (really 16 options, as that's the smallest power of two above 11, so 4 bits), and I encode the data as 45 + encoded value:
  • for 45 degrees, encode a 0 (b0000)
  • for 50 degrees, encode a 5 (b0101)
  • for 55 degrees, encode a 10 (b1010)
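
A minimal sketch of that offset trick, assuming the 45..55 range from the example (the names are mine):

#include <cassert>

const int kTempMin = 45;
const int kTempMax = 55;

// Shift the value down so it starts at zero; the result fits in 4 bits.
static unsigned EncodeTemp(int degrees)
{
    assert(degrees >= kTempMin && degrees <= kTempMax);
    return (unsigned)(degrees - kTempMin);
}

static int DecodeTemp(unsigned encoded)
{
    return kTempMin + (int)encoded;
}

int main()
{
    assert(EncodeTemp(45) == 0);   // b0000
    assert(EncodeTemp(50) == 5);   // b0101
    assert(EncodeTemp(55) == 10);  // b1010
    assert(DecodeTemp(10) == 55);
}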

Wasting space

What's noticeable from the temperature example is that the encoded values 11-15 are never used. We're wasting potentially usable bits, except they aren't whole bits, so there is no obvious way of using them. How many options can you store in half a bit?

Using different bases

The reason for the waste is the number space being used. If we restrict ourselves to the number space of the computer, we make encoding and decoding easy for the CPU, but we waste storage space. If we use a base other than 2, we can pack the data really tight. In theory, it's possible to pack your data so that you waste less than one bit in total.

A simple example is in order. Assume you have a list of booleans with an additional unknown state. These tribools come up frequently so this might be of practical use to some of you. Tribools require three options. How many can you store in a normal 8 bit byte? Using base 2, you have to use two bits per tribool, meaning that in 8 bits, you can store 4 tribools.

But 8 bits can store up to 256 different options, and if you slice it differently, you can pack more tribools in. Tribools (3) stored 5 times need fewer options than bools (2) stored 8 times:
  • 2^8 = 256
  • 3^5 = 243
To store the tribools, we merely need to generate the "option" number of that set of 5 tribools, and then encode that value instead. The trivial cases are where all tribools but the first are zero:

  • 0,0,0,0,0 => 0
  • 1,0,0,0,0 => 1
  • 2,0,0,0,0 => 2
What is interesting is what happens when we set the other elements:
  • 0,1,0,0,0 => 3 (1*0 + 3*1)
  • 0,0,1,0,0 => 9 (1*0 + 3*0 + 9*1)
  • 0,0,0,1,0 => 27 (1*0 + 3*0 + 9*0 +27*1)
  • 1,2,1,2,1 => 151 (1*1 + 3*2 + 9*1 + 27*2 + 81*1)
We multiply each element by its place value: element 0 * 3^0, element 1 * 3^1, element 2 * 3^2, element 3 * 3^3, and element 4 * 3^4. Now we can store 5 tribools in 8 bits because we are actually storing a number from 0 to 242 in 8 bits, but that number can be decoded into 5 separate 3-option values. Decoding is handled by taking the modulus and then dividing to remove that part of the encoded value:
  • 151 % 3 = 1
  • 151 / 3 = 50
  • 50 % 3 = 2
  • 50 / 3 = 16
  • 16 % 3 = 1
  • 16 / 3 = 5
  • 5 % 3 = 2
  • 5 / 3 = 1
  • 1 % 3 = 1
Decoding is just like figuring out the time of day from the number of seconds elapsed.
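
As a minimal sketch of the round trip (the function names are mine, and the tribools are plain 0/1/2 integers):

#include <cassert>
#include <cstdint>

// Pack five base-3 digits (each 0, 1 or 2) into one byte: 3^5 = 243 <= 256.
static uint8_t PackTribools(const int t[5])
{
    unsigned value = 0;
    unsigned place = 1;                 // 3^0, 3^1, 3^2, ...
    for (int i = 0; i < 5; ++i) {
        value += (unsigned)t[i] * place;
        place *= 3;
    }
    return (uint8_t)value;              // 0..242 fits in 8 bits
}

static void UnpackTribools(uint8_t packed, int t[5])
{
    unsigned value = packed;
    for (int i = 0; i < 5; ++i) {
        t[i] = (int)(value % 3);        // peel off the lowest base-3 digit
        value /= 3;
    }
}

int main()
{
    const int in[5] = { 1, 2, 1, 2, 1 };
    uint8_t packed = PackTribools(in);  // 151, as in the example above
    assert(packed == 151);

    int out[5];
    UnpackTribools(packed, out);
    for (int i = 0; i < 5; ++i)
        assert(out[i] == in[i]);
}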

Probability of a value

Apart from knowing the range, we can also reduce the storage used by using our knowledge of how likely a value is. This is known as probabilistic compression, where the chance of a value occurring affects how many bits it uses. The one you learn about first is Huffman coding, but an easier-to-swallow example might be useful.
Let's say we have a game about sowing seeds in a garden full of plots, and letting them grow. If I am trying to compress the type of seed planted in a plot, but most of the plots are empty, I can say that the most common type of plant is zero, the NULL plant. I can encode the plot data with only one bit for whether there is a plant, and then the actual plant type only if there is one. If we assume 8 types of plant, then:
  • no plant => (b0)
  • plant of type 4 => (b1100) (the first bit is set because there is a plant, then the next three bits hold the plant type, 4 = b100)
If the garden is empty, then the compressed sequence takes as many bits as there are plots. If all the plots are full, then the compressed sequence takes one bit per plot more than a naive solution (actually, with 8 types of plant plus the empty state there are 9 options, which need 4 bits anyway, as the lowest power of two greater than or equal to 9 is 16 = 2^4, so a full garden costs the same as the naive encoding). A sketch of the per-plot encoding follows.
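
A minimal sketch of that scheme, with a hypothetical bit-writing helper of my own; the non-empty plant types are assumed to fit in 3 bits:

#include <cassert>
#include <cstdint>
#include <vector>

// Very small bit writer used only for this sketch (least significant bit first).
struct BitWriter {
    std::vector<uint8_t> bytes;
    int bitCount = 0;
    void Put(unsigned value, int bits) {
        for (int i = 0; i < bits; ++i, ++bitCount) {
            if (bitCount % 8 == 0) bytes.push_back(0);
            if ((value >> i) & 1) bytes.back() |= (uint8_t)(1 << (bitCount % 8));
        }
    }
};

// Encode one plot: one bit for "is there a plant", then 3 bits of type if there is.
// Type 0 is the NULL plant (an empty plot).
static void EncodePlot(BitWriter& out, int plant)
{
    if (plant == 0) {
        out.Put(0, 1);                   // empty plot: a single 0 bit
    } else {
        out.Put(1, 1);                   // plant present
        out.Put((unsigned)plant, 3);     // plant type, e.g. 4 = b100
    }
}

int main()
{
    BitWriter out;
    const int garden[6] = { 0, 0, 4, 0, 0, 1 };
    for (int plant : garden)
        EncodePlot(out, plant);
    // Four empty plots cost 1 bit each; the two planted plots cost 4 bits each.
    assert(out.bitCount == 12);
}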

It's possible to encode like this automatically, generating the probability of each option, then generating a binary tree which defines how you encode and decode values. This is Huffman coding, and is very useful when you have no domain knowledge.

Compressing to less than 1 bit

If you have probability information, Huffman coding can be great, but it's not quite as efficient as it could be, for the same reason we found we could save storage by changing the base when compressing the tribools: Huffman must spend at least one whole bit even on the most common symbol. As an example of how we can do better, let's compress a garden with an additional bit of information: whether there is anything in a whole row of plots.
  • no plants in row => (b00)
  • no plant in plot => (b01)
  • plant of type 4 => (b1100)
What happens here is that an empty row takes up two bits for the whole row rather than one bit per plot. That might be a significant saving. You can probably think of many other ways to mark larger areas of empty data too: gridding the garden out at a lower resolution, then setting bits only for the cells that contain any data, for example. With domain knowledge comes much better compression. Huffman coding isn't readily adaptable to wildly biased data, but it can be if you give it more symbols to work with. Therein lies the tradeoff: if you have to send a large palette of symbols, does the palette cost more to send than the data? Run length encoding can get down to less than one bit per byte on some data, because a run of 128 bytes can be encoded in a single byte, or runs of millions in a 32 bit RLE encoder. A sketch of the row-flag cost follows.
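
A minimal, self-contained sketch of what that row flag costs (the function name is mine):

#include <cstdio>

// Bit cost of one row: b00 = no plants in the whole row, b01 = this plot is
// empty, b1 followed by 3 bits = plant type (type 0 means empty).
static int RowBitCost(const int* plots, int width)
{
    bool rowEmpty = true;
    for (int i = 0; i < width; ++i)
        if (plots[i] != 0) rowEmpty = false;

    if (rowEmpty)
        return 2;                        // two bits cover the whole row

    int bits = 0;
    for (int i = 0; i < width; ++i)
        bits += (plots[i] == 0) ? 2 : 4; // b01 per empty plot, b1xxx per plant
    return bits;
}

int main()
{
    const int emptyRow[8] = { 0 };
    const int mixedRow[8] = { 0, 0, 4, 0, 0, 0, 1, 0 };
    printf("empty row: %d bits\n", RowBitCost(emptyRow, 8));  // 2
    printf("mixed row: %d bits\n", RowBitCost(mixedRow, 8));  // 6*2 + 2*4 = 20
}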

Domain knowledge

Really powerful compression comes from understanding the probabilities and rules of the data being compressed. If you have a look at lossy compression algorithms, this becomes easier to understand. In lossy compression, you have domain knowledge of what humans can and can't perceive. If the computer is able to analyse the data, find the information that matters, and compress only that, then it can reduce a very large raw file down to something significantly smaller.

Let's compress our garden again. This time, we know that the garden cannot have any plant directly next to another plant.
  • no plant in plot => (b0)
  • plant of type 4 => (b1100)
  • plot where a plant cannot exist because of the rules and the already encoded data => (b)
Yes, zero bits. Zero bits to encode the fact that a cell is empty when it's not possible for a plant to be in it. It makes sense because the fact that the cell is empty was already implied when the neighbouring plot encoded a plant type. It's not really zero bits; it's more like a fifth of a bit, as the plot with a plant in it defines the state of five cells (itself, the one following, and the three on the next row).

There are often many places where rules give you information for free, you just need to look out for them.

Friday, 30 March 2012

Bezier Curve template

Sometimes you just want some simple code you can drop in to do bezier curves:

Templated Bezier function:

// These are the Catmull-Rom basis functions (CR = Catmull-Rom); the resulting
// curve passes through the middle two control points, b and c.
static inline float CR40( float t ) { return 0.5f*(2*t*t-t*t*t-t); }
static inline float CR41( float t ) { return 0.5f*(2.0f-5*t*t+3*t*t*t); }
static inline float CR42( float t ) { return 0.5f*(4*t*t+t-3*t*t*t); }
static inline float CR43( float t ) { return 0.5f*(t*t*t-t*t); }

// Works for any type T that supports scalar multiply and addition
// (floats, vectors, colours, ...). t should be in [0,1].
template<typename T>
T CRBez4( T a, T b, T c, T d, float t )
{
  return CR40(t) * a + CR41(t) * b + CR42(t) * c + CR43(t) * d;
}


Well, now you can. Here's some rather old code I found, and recently reused.
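
If it helps, a minimal usage sketch with float control values (the sample count is arbitrary); note the curve passes through b and c:

#include <cstdio>

int main()
{
    // Sample the segment between b = 1.0 and c = 2.0, with a = 0.0 and d = 3.0
    // acting as the neighbouring control points.
    for (int i = 0; i <= 4; ++i) {
        float t = i / 4.0f;
        printf("t=%.2f value=%f\n", t, CRBez4(0.0f, 1.0f, 2.0f, 3.0f, t));
    }
}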

Thursday, 24 November 2011

Existence checks

Let's say you have a large table of data, lots of rows with a few columns. You know that every row is unique because that's a basic feature of your table, no duplicates. There are only a few instances where non-unique elements are a good idea in tables, so we're going to ignore that corner case for now.

Let's look at what we have with an example:

Find out if Bob is the leader of team A
Name     Clan    Rank
alice    team A  leader
bob      team B  private
bob      team A  leader
charlie  team A  private

We can see that we can't just check any one column and return on success; much like comparing strings, we have to check every column before we're sure we have a match. You could maintain the table in alphabetical order, reducing the string comparisons to only the rows that match "bob" or "team A" (depending on which column you sorted by), but if you walk through the table, you still have to string match at least one column per row.

But, we know that the table is made up of unique elements. There will be at most one matching row, so that makes this table a set: the set of Name-Clan-Rank associations that are known to be true. There is a very large number of potentially true values in this set, the product of all potential names, all potential clans, and all potential ranks. Let's assume 4 billion names, 1 million clans, and 100 ranks. If we were to use a bit field to identify all the possible permutations, we'd never fit the array into our computers. 400 quadrillion (4x10^17) combinations would be, quite literally, more than we could handle. But sparse bit fields have better solutions than simple array lookups.

What we've done is reduced the problem to its essential components. We're not looking for the row that contains the data, we're looking to see if the data exists. We don't want to access the data, only find out if it's in there. For this task, there are many solutions, but I'll just mention one.

I like hashes. My solution is to store the hashes of all the rows in an array of small arrays. I index into those small arrays using a few bits of the hash: the smallest number of bits that spreads the hashes out enough that no bucket overflows the capacity of a cache-line-sized mini array.

H1[bits0-n] -> index of array it should be in.

With this, I do one cache line load to get at the array of hashes, and if the hash is in there, then I can return that the row does exist. If not, then I can assert that it doesn't. And that's quick.
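
A minimal sketch of the idea (the structure, names, and the FNV-1a stand-in hash are my choices, not from the post; a real version would pick the number of index bits from the data and handle bucket overflow):

#include <cassert>
#include <cstdint>
#include <string>

// Each bucket is one 64-byte cache line holding 8 row hashes.
struct Bucket { uint64_t hashes[8]; };

struct ExistenceSet {
    static const int kIndexBits = 8;             // 256 buckets; tune to the data set
    Bucket buckets[1 << kIndexBits] = {};

    static uint64_t Hash(const std::string& row) {
        uint64_t h = 14695981039346656037ull;    // FNV-1a, as a stand-in hash
        for (unsigned char c : row) { h ^= c; h *= 1099511628211ull; }
        return h ? h : 1;                        // reserve 0 as the "empty slot" marker
    }

    void Insert(const std::string& row) {
        uint64_t h = Hash(row);
        Bucket& b = buckets[h & ((1u << kIndexBits) - 1)];
        for (uint64_t& slot : b.hashes)
            if (slot == 0 || slot == h) { slot = h; return; }
        // A real version would widen kIndexBits when a bucket fills up.
    }

    bool Exists(const std::string& row) const {
        uint64_t h = Hash(row);
        const Bucket& b = buckets[h & ((1u << kIndexBits) - 1)];
        for (uint64_t slot : b.hashes)
            if (slot == h) return true;          // one cache line load, then compare
        return false;
    }
};

int main() {
    static ExistenceSet rows;                    // static: the bucket array is biggish
    rows.Insert("bob|team A|leader");
    assert(rows.Exists("bob|team A|leader"));
    assert(!rows.Exists("bob|team A|private"));
    return 0;
}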

Monday, 31 October 2011

static

The static C++ keyword has multiple meanings based on context.

When used inside a function,

void foo() {
  static int i;
  printf( "%i", i );
  i++;
}

it means that the declared variable has static storage duration: it lives for the lifetime of the program, even though it is only visible for reading and writing inside the function. This is commonly used to implement singletons.

When used inside a class,

class Bar {
public:
  static void Baz() { ++foo; }
  static int foo;
};

it means that the declared member variable or member function doesn't require a this pointer, and thus doesn't require an instance to operate. This can be useful for implementing class-specific memory pools, or other kinds of class-specific global state encapsulation.
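
For example, a tiny usage sketch building on the Bar class above (Bar::foo still needs exactly one definition in some translation unit):

int Bar::foo = 0;    // the single shared definition

void UseBar()
{
    Bar::Baz();      // called through the class, no instance required
    Bar b;
    b.Baz();         // also legal, but the object itself is not used
}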

When used in front of a variable or function,

static void Baz() { ++foo; }
static int foo;

it means that the global variable or free function has internal linkage and is not visible outside the current compilation unit, which reduces the number of symbols at link time and can lead to better build times.

There is a missing piece of functionality in that you cannot mark a class itself as static: you cannot specify that a class doesn't export its member functions. This is a shame, as it means the link time of object-oriented C++ code tends to get worse (the closest workaround is defining the class inside an unnamed namespace, which gives its members internal linkage).

Monday, 22 August 2011

float == float

Never use equality checking with floats, they are not accurate for checking with an equals operator.
For example, this loop would run forever:

for( float a = 0.0f; a != 1.0f; a += 0.1f )
  blah();

Not ten times. Try it.

This is because 0.1f is an approximation. The IEEE standard float cannot represent 0.1f, only a very near neighbour. In fact, once you've added all ten, the bit pattern is only off by 1 bit, so it's pretty accurate, but still not equal.

1.0f = 0x3F800000
the sum of ten 0.1f is 0x3F800001
quite remarkable that it's so close.
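
Here's a small sketch that reproduces those bit patterns (the helper is mine; it assumes IEEE 754 floats and reads the bits via memcpy):

#include <cstdint>
#include <cstdio>
#include <cstring>

static uint32_t Bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   // read the bit pattern without aliasing trouble
    return u;
}

int main()
{
    float sum = 0.0f;
    for (int i = 0; i < 10; ++i)
        sum += 0.1f;

    printf("1.0f      = 0x%08X\n", (unsigned)Bits(1.0f));  // 0x3F800000
    printf("sum of 10 = 0x%08X\n", (unsigned)Bits(sum));   // 0x3F800001 on IEEE 754 floats
    printf("equal? %s\n", sum == 1.0f ? "yes" : "no");
}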

So, you agree, floats are not safe for checking equality, right?


Wrong.
Floats are great for equality, but just don't use the wrong ones.
For example, 0.125f has an exact representation in floats, so adding together eight of them will, in fact, be perfect.
In fact, floats have exact representations for all the integers right up to 16,777,216.0f, after which the mantissa runs out of accuracy.
So, when you see code like this:

for( float a = 0.0f; a < 100.0f; a += 1.0f )
{
  float percent = a * 0.01f;
  /* do something with percent */
  if( a == 0.0f || a == 25.0f || a == 50.0f || a == 75.0f )
  { /* something to do with quadrants */ }
}

then you can trust it just fine. Seriously. Also, this is useful when you want to avoid the notorious load-hit-store. If you had done this instead:

for( int a = 0; a < 100; ++a )
{
  float percent = a * 0.01f;
  /* do something with percent */
  if( a == 0 || a == 25 || a == 50 || a == 75 )
  { /* something to do with quadrants */ }
}

you would be converting from int to float on every run through the loop, which is slow on any modern console.

Tuesday, 12 April 2011

Why use a quaternion?

No, Gimbal-lock* is not the answer.

Quaternions provide a different way of representing a transform. The representation is limited to rotations; you can optionally try to use it for mirrorings too, but it gets a little awkward, so we don't normally do that.

Matrices provide a complete solution for representing the orientation and position of an object, so they can be considered the de-facto solution for static scene transform representation. However, because they are a linear re-combination, they do suffer when used dynamically.

When you create your model matrix, the one for your windmill, you would normally generate a matrix (something like Matrix::MakeXRot( bladeAngle )) and then set it into the rendering transform hierarchy. All well and good. What this is doing is setting a static matrix every time you change the bladeAngle value. If you are rendering a robotic arm, you would have a number of different matrices at different levels in that hierarchy, each one concatenating with another to finally produce that finger transform.

If you're not provided with all the angles, and are instead told that you have a start and end value for each transform in the hierarchy, that still works fine for the examples so far: you interpolate the angles and produce new matrices that are in-between the start and end matrices. This works because we're still in the world of scalars; interpolating between 0 and 100 for the windmill just gives us a value that we use to generate a rotation matrix. So all is fine.

But when we are given the matrices, and not the rotations, suddenly we have a surplus of information and we're not sure what to do with it. If a character's end position is a 180 degree turn from its start position (a Y-axis turn of PI), then you cannot tell which way to turn to interpolate, nor can you tell whether the real intention was something more obscure, like a scaling to -1 in both the x and z axes.

Okay, so that last possible animation is unlikely to be a feature, more likely to be a bug, but that's what happens if you linearly interpolate a matrix. That's what normally happens when you linearly interpolate a linear transform.

This is where quaternions come in: they aren't a linear representation of orientation; in fact, not only are they non-linear, they're also not really rotations. The best explanation I've read / figured out for myself about what they really are is that they are a kind of 4D mirroring operation which, when applied twice, gives you the rotation you would expect. As far as anyone using them should be concerned, though, they are vectors that give you a space for doing interpolation of 3D rotations.

A quaternion represents a rotation, and it can be interpolated to produce a rotation that lies on the shortest path between the two given orientations. Quaternions can be interpolated linearly or spherically, or used in higher order functions such as beziers or TCB splines.

That is what you need to know. It allows you to interpolate properly.
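
As a minimal sketch of the linear flavour, here's a normalised lerp over a bare-bones quaternion struct of my own (not from the post); slerp would be the constant-angular-speed alternative:

#include <cmath>

struct Quat { float x, y, z, w; };

// Normalised linear interpolation between two unit quaternions. Cheap, and good
// enough when the keys are close together.
static Quat Nlerp(Quat a, Quat b, float t)
{
    // q and -q represent the same rotation; flip one so we take the short way round.
    float dot = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
    float sign = (dot < 0.0f) ? -1.0f : 1.0f;

    Quat r = { a.x + (sign*b.x - a.x) * t,
               a.y + (sign*b.y - a.y) * t,
               a.z + (sign*b.z - a.z) * t,
               a.w + (sign*b.w - a.w) * t };

    float len = std::sqrt(r.x*r.x + r.y*r.y + r.z*r.z + r.w*r.w);
    r.x /= len; r.y /= len; r.z /= len; r.w /= len;
    return r;
}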

Spherical linear interpolation, which leads to very smooth results, can often be relegated to trying too hard territory. Most animators and motion capture data already provide enough keys that doing the interpolation that carefully is not really necessary. Don't forget that artists don't generally see sub frame data, so getting it right is pointless in most cases.

* Gimbal-lock is when you can't interpolate from one rotation to another without doing some long winded route all round the houses. It manifests in Eulers due to the polar representation. Any time you get near a pole, you have to work overtime to get the transform in the right orientation.