Monday, 22 March 2010

Ronseal Rule

Now here is something you should think about every time you write a function. Does the function do what it says on the tin?

It's an important thing to think about. Some code is like this:

int CountThings()
{
    // iterate through things
    // while there, update their cache stuff
    // and if one needs deleting, do it now
    // return the count
}

That's pretty bad, but in a way, it's what a lot of systems do. Okay, so there are some optimisations to be made. Now, how about this one?

Thing *GetThing()
{
return mThingPtr;
}

That's not Ronseal either. And it's prevalent in many systems I've worked on (including my own, oops).
No, I mean it. It's not Ronseal.
No, really. Have a look at what it's doing. It's not actually returning a Thing, it's returning a pointer to a thing...
You think I'm being picky? Well, most of the time I'd say you're right, but, what if I told you I could refactor it into two different functions?

Thing& GetThing()
{
    return *mThingPtr;
}

bool ThingExists()
{
    return mThingPtr != NULL;
}

Did you have an "ah-ha" moment there?

Okay, now think about this from the point of view of all those function calls you make to get objects, then check for NULL on return... You've had to infer two pieces of information from one return value.
Now, go and fix your code.

Sunday, 21 March 2010

Defensive programming is offensive programming

Some people advocate defensive programming, thinking it's better that a system carries on working, logs the fault, and continues on merrily. This is okay for any programming where performance isn't of the utmost importance, and where you don't mind shipping your software riddled with bugs that have all been caged. What it's not good for is any software that needs to be really safe, or really fast.

Why does it slow stuff down? The first reason is that it's usually code that makes sure return values from get functions are not null, or that tries to handle illegal or irregular arguments.

A not-null check is bad because it is an inherent dependent load (to fetch the pointer), then a probably-predicted branch (the null test). Constantly getting pointers to things and checking them for null is just going to thrash your branch predictor and your memory to death. It's offensive to the cache, and offensive to in-order processors in general.

What to do instead? Use asserts. Assume things are not null and carry on regardless. Make your game break when things are actually going wrong. What is wrong with finding out it's all broken a year before you release rather than a day after?

Friday, 19 March 2010

How I do the washing up.

Here is how I do the washing up.

  • Take one dirty thing from the pile of dirty things.
  • If it looks as if it needs food scraping off:
    • I grab my scraping thing, walk to the bin, scrape off the crud, return to the sink, and put down the scraper.
  • If it still looks dirty:
    • I fill the bowl to the level needed to wash up the item, put on gloves, wash it up, put it on the drainer, empty the bowl, and take off the gloves.
  • If it is now wet:
    • I grab a drying towel, dry the item, and put the towel back down.
  • Then, as it must be clean by this point, I put it away where it belongs, and return to the sink ready to start all over again.
No, hang on, that's not how I wash up, that's how I code with virtuals. Doh. Silly me.

Thursday, 18 March 2010

A quote and a rethink

"Rule of Modularity: The only way to write complex software that won’t fall on its face is to build it out of simple modules connected by well-defined interfaces, so that most problems are local and you can have some hope of fixing or optimizing a part without breaking the whole." - The Art of Unix Programming

Now, looking at what's been going on with data-oriented development, I see that there are some words that, though pertinent at the time, now cause an inflexibility of interpretation. An inflexibility that will allow many to point and laugh at the data-oriented crowd. The problem is with the natural interpretation of modules and interfaces.

Modules and interfaces were just a way of saying "break down the complex stuff into smaller, easier to manage stuff", but the meaning has been lost as we have solidified what a module and an interface mean. Let's go back and think about this again. Breaking down a problem of many entities was solved a long time ago by database engineers: they invented many tools, and even created a generic but powerful interface to manipulate their data. This could be called modularising the idea of persistence. SQL was the interface.

We can do the same, ignore the words "modules and interfaces" instead concentrate on the idea, separated and distinct techniques for processing objects.
If we allow ourselves to define objects as streams or arrays of data, then all we need to do is write a lot of processes that operate on them. Each process does something, usually somewhere between simple and complex: we don't want to waste data throughput on tediously simple operations (which is what OO normally advocates), and we don't want too much data per row (as that will cause cache misses). This is why the data-oriented approach works really well with the old Unix programming quote, but only if you try to distil the essence of the quote, not just use the words blindly.

So, a refactored quote.

"Rule of Modularity: The only way to write complex software that won’t fall on its face is to build it out of simple data definitions connected by well-defined transforms, so that most problems are locally defined and you can have some hope of fixing or optimizing a single link without breaking the whole chain." - The Art of Unix Programming (revisited for modern hardware architecture - me)

Wednesday, 17 March 2010

Right Shift for the win

I have learned something new today.

Something that had been a "don't know, won't assume" has finally filtered into a fact. The right shift operator maintains the highest bit on signed types.
I didn't know if this was true, or if it was somehow costly, and I even remember it not being true somewhere. There is a difference between an arithmetic right shift (which copies the sign bit down) and a plain logical right shift (which fills with zeros): in C and C++, right shifting a negative signed value is technically implementation-defined, but every mainstream compiler gives you the arithmetic shift. Now the important thing here is, if you can smear that top bit across the whole word, you can provide a mask.

i.e., any negative number right shifted by 31 (as a signed 32-bit int) is -1: the all-inclusive mask. Any non-negative number right shifted by 31 is 0: the all-exclusive mask.

Now, apply that logic to branches and you get a good general branchless technique for value manipulation. Remember, most significant bit is maintained when you use signed types, don't go using unsigned ints.

Have fun!