Tuesday, 29 July 2008

Management is hard.

Can you manage more than ten people? They are all doing different tasks, each one reporting back to you once a day about their progress, including any difficulties. Those difficulties need to reach the appropriate ears so that problems can be swiftly dealt with. Add the job of managing resources between these people, requests for assets, requests for features, and you get a hell of a tangled web. Then add in illness, holiday, differing skill sets and rates of work, and you get an incredible amount of information. Can you still manage ten people?

If you think you can, then you're probably wrong.

So, how do large organisations work? Why don't they fail miserably? The answer normally lies in using a hierarchy, so that no one manages more than four or five people at once, but that means staff who might be better at other jobs are lost to management.

Is there an alternative? Your own comments first before I reveal my opinions.

Process

It's important to realise that you have a process, even if it includes no written rules. Everyone has their own way of doing things. What most people don't see is that process is one of the more important areas of self-improvement. If you have a constantly evolving, improving process that you follow, then you are less likely to introduce the same bugs twice and more able to remember what to do next.

Process includes how you code, what tools you use, how you approach tasks and how you manage your time. There are innumerable areas in which any individual can improve, and the same goes for companies too.

Every time you see something go wrong, think about what could have been put in place to stop it ever happening: that is the safeguard. Then think about what else could have gone wrong because that safeguard was missing. This lets you hunt down other bugs and errors faster than just waiting for them to happen or be found, and lets you move on to bigger things sooner.

Tuesday, 8 July 2008

Templates primer

Before C++, many coders thought that #define macros, even though they were dangerous, were valuable. They were right in that macros allowed code to be more "meta", but their price was high. With the advent of templates, however, you can get the same level of hard-coded efficiency without the same dangerous lack of safety mechanisms.

Templates are compile-time macros just like #define, but they are type safe and work better with C++ constructs because they allow for overloading. Using a template can also give you some compile-time assertions, such as making sure that two arguments have the same type (as when you want to do a swap). All this is done at compile time, and not with any hand waving either: the C++ compiler simply unrolls the code for you and then compiles it as usual, just like a #define macro. The real difference is that the #define macro mechanism doesn't know about C, C++, Pascal or Java, so it can cause arbitrary damage.
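
As a rough sketch of that swap example (the names here are invented for illustration), compare a macro with its templated equivalent; the template is instantiated at compile time, but refuses to compile if the two arguments have different types:

    #include <iostream>

    // The old way: no type checking, the arguments are substituted textually.
    #define SWAP_MACRO(a, b, tmp) { tmp = (a); (a) = (b); (b) = tmp; }

    // The template way: type safe, and both arguments must share one type T,
    // so mixing types fails at compile time instead of silently converting.
    template <typename T>
    void Swap(T& a, T& b)
    {
        T tmp = a;
        a = b;
        b = tmp;
    }

    int main()
    {
        int x = 1, y = 2;
        Swap(x, y);            // instantiates Swap<int> at compile time

        float f = 3.0f;
        // Swap(x, f);         // would not compile: no single T matches int and float

        std::cout << x << " " << y << "\n";   // prints "2 1"
        return 0;
    }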

So, if you're thinking that a piece of code might be better templated, but are worried about any overheads, don't.

Virtuals primer

Without fully understanding virtual tables, you can make some nasty gaffes. Without understanding the basics, you can get lost when errors appear.

Firstly, the tables have to exist somewhere. They are real things: actual tables of function pointers. They have to go somewhere in memory, so most C++ compilers/linkers put them where the first virtual function of the most base class is defined. In Geko we made the destructor that function; the simple Shutdown function was left inline.

Virtual tables are pointed to by the invisible first pointer in a class. You see it in the debugger, but not in the code. This pointer (the vptr) points to the array of function pointers that applies to the object. A virtual call is made by finding the function's index in that array (at compile time), looking up the entry at that index through the vptr (at runtime), and calling that function instead of a statically bound one.
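
A small sketch of the idea (class names invented for illustration); the hidden vptr shows up in the object's size, and the call through the base pointer is resolved via the table at runtime:

    #include <iostream>

    struct Plain { int x; };              // no virtuals, so no hidden vptr

    struct Animal
    {
        virtual ~Animal() { }             // a virtual destructor
        virtual void Speak() const { std::cout << "...\n"; }
    };

    struct Dog : public Animal
    {
        virtual void Speak() const { std::cout << "Woof\n"; }
    };

    int main()
    {
        // The hidden vptr usually makes a polymorphic class one pointer larger
        // (exact sizes depend on the compiler).
        std::cout << sizeof(Plain) << " vs " << sizeof(Animal) << "\n";

        Animal* a = new Dog;
        a->Speak();    // index found at compile time, function fetched via the vptr at runtime: "Woof"
        delete a;      // the virtual destructor makes sure Dog is destroyed properly
        return 0;
    }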

Classes that use multiple inheritance can have more than one vptr, which makes things even more complicated, and different compilers and linkers handle it in different ways.

Other things to remember:
There is no reason to inline virtual functions. They can't be inlined, so marking them inline just makes them look faster to humans. Virtual functions are slower than standard function calls, so use them for what they are good at, and promote them to non-virtual or inline whenever you see an opportunity.
You can write templated virtual code and you can write virtualised templates, but generally, you should get good at both before writing them together.

Unsorting.

Something I learnt from Ian Gledhill:

If you have a list that you want to shuffle, you can do it very efficiently by using the following technique.

First, define N as the number of items in the list.
n = N
r is a random number between zero (inclusive) and n (exclusive)
swap element r with element n-1
reduce n by one
if n is greater than 1, return to the step where r is chosen at random.

consider the following list:
1,2,3,4
n = 4, r = rand(0,4) = 2
1,2,4,3
n = 3, r = rand(0,3) = 0
4,2,1,3
n = 2, r = rand(0,2) = 1
4,2,1,3
n = 1 (finish)

This is about as optimal a shuffle as is possible, and better still, it's an in-place operation.
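
Here's a minimal sketch of those steps in C++ (this technique is commonly known as the Fisher-Yates shuffle; rand() % n is used for brevity even though it has a slight modulo bias):

    #include <algorithm>
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <vector>

    // In-place shuffle, following the steps above.
    void Shuffle(std::vector<int>& items)
    {
        for (size_t n = items.size(); n > 1; --n)
        {
            size_t r = std::rand() % n;          // random index in [0, n)
            std::swap(items[r], items[n - 1]);   // move a random element into the settled tail
        }
    }

    int main()
    {
        std::srand(static_cast<unsigned>(std::time(0)));

        std::vector<int> list;
        for (int i = 1; i <= 4; ++i)
            list.push_back(i);

        Shuffle(list);                           // e.g. 1,2,3,4 -> 4,2,1,3

        for (size_t i = 0; i < list.size(); ++i)
            std::cout << list[i] << " ";
        std::cout << "\n";
        return 0;
    }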

Tuesday, 1 July 2008

Hash functions.

Hashes reduce the amount of information in an object down to a small, fixed-size value. They take all the bytes that make up the original, and combine them in all sorts of ways to generate a hash value. A good hash function will change approximately half the bits of the hash value whenever one bit of the original byte stream changes. There are some very good hash functions out there, and rolling your own isn't that difficult to do well.
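
For example, here is a sketch of one well-known, easy-to-write hash, 32-bit FNV-1a (assuming a 32-bit unsigned int); it isn't the only choice, just a simple one with reasonable bit-spreading behaviour:

    #include <cstddef>
    #include <iostream>

    // 32-bit FNV-1a: each input byte is xor'ed in and then spread across the
    // whole value by the multiply.
    unsigned int Fnv1aHash(const void* data, std::size_t length)
    {
        const unsigned char* bytes = static_cast<const unsigned char*>(data);
        unsigned int hash = 2166136261u;          // FNV offset basis
        for (std::size_t i = 0; i < length; ++i)
        {
            hash ^= bytes[i];
            hash *= 16777619u;                    // FNV prime
        }
        return hash;
    }

    int main()
    {
        const char a[] = "hello";
        const char b[] = "hellp";                 // one byte different from "hello"
        std::cout << std::hex << Fnv1aHash(a, 5) << "\n"
                  << std::hex << Fnv1aHash(b, 5) << "\n";   // many output bits change
        return 0;
    }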

Using hash functions and values can be a little more demanding, though, as you always have to remember that a hash code can tell you, for certain, that two objects are different, but can never guarantee that they are the same. Also, because of the birthday problem (explained in a moment), the number of objects going into a hash that can come out with unique values is quite small compared to the number of possible hash values. Even a perfect hash function (one where changing a single input bit flips each output bit with 50% probability) cannot get around this: a hash is only ever a proof of difference, never of sameness. The reason is simple. A hash is a reduced amount of information.

The Birthday problem:
How likely do you think it is that someone at your workplace has the same birthday as you?
That is quite a simple calculation: take the number of people in the office, minus one for yourself, then calculate (364/365) to the power of the number of other workers to get the probability that you don't share a birthday with any of them.

20 workers = 95% chance that you won't share.
100 workers = 76% chance that you won't share.

But what about the probability that someone in the workplace shares a birthday with someone else?

Well, if you take yourself out of the picture, there's still the rest of the office to check against some random dude.

20 workers = 95% for you, 95.2% for the dude...

What if we consider everyone in the office together with everyone they could check against? That's 190 possible birthday comparisons for a room of 20 people...

20 workers = 59.4% chance of not sharing a birthday!
oh dear...
100 workers = 0.00012% chance of not sharing a birthday...

office count    combinations    probability of unshared birthday
      5               10              0.972938
     10               45              0.88386
     15              105              0.749712
     20              190              0.593771
     25              300              0.439092
     30              435              0.303184
     35              595              0.195465
     40              780              0.117664
     45              990              0.066135
     50             1225              0.034709
     55             1485              0.017008
     60             1770              0.007782
     65             2080              0.003324
     70             2415              0.001326


okay, so that's fairly mad. A better view of the combinatoric reasoning for the sudden growth can be found here: http://en.wikipedia.org/wiki/Birthday_paradox

so... for a hash code that has 4 billion possible values, we can expect to generate quite a few unique hashes, but we hit a wall at about 77,000 items: at that point the probability of having had a collision is just over 50%, and no matter how many times you adjust the seed, once you get to about 200,000 items it becomes almost impossible to find a non-colliding hash. That's right: 4 billion (4,000,000,000) values can't support 200,000 different hashes, just because of some silly factorial law.
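
As a quick sanity check on those numbers, here's a sketch using the standard birthday-bound approximation (not anything specific to this post):

    #include <cmath>
    #include <cstdio>

    // With H possible hash values, the chance of at least one collision passes
    // 50% at roughly n = sqrt(2 * H * ln 2) items.
    int main()
    {
        const double H = 4294967296.0;            // 2^32 possible 32-bit hash values

        double wall = std::sqrt(2.0 * H * std::log(2.0));
        std::printf("50%% collision point: about %.0f items\n", wall);   // ~77,000

        // Approximate chance of *no* collision among 200,000 hashed items.
        double n = 200000.0;
        double pNoCollision = std::exp(-n * (n - 1.0) / (2.0 * H));
        std::printf("chance of no collision at 200,000 items: %f\n", pNoCollision);   // ~0.01
        return 0;
    }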