poita.org

Holism

Posted: 13 May 2012 - Link

Here’s some SSE code to normalize a 3-vector:

; vector in xmm0
movaps  xmm2,   xmm0
mulps   xmm0,   xmm0
movaps  xmm1,   xmm0
shufps  xmm0,   xmm0,   _MM_SHUFFLE (2, 1, 0, 3)
addps   xmm1,   xmm0
movaps  xmm0,   xmm1
shufps  xmm1,   xmm1,   _MM_SHUFFLE (1, 0, 3, 2)
addps   xmm0,   xmm1
rsqrtps xmm0,   xmm0
mulps   xmm0,   xmm2

Here’s some SSE code to normalize four 3-vectors:

; x's in xmm0, y's in xmm1, z's in xmm2
movaps  xmm3,   xmm0
movaps  xmm4,   xmm1
movaps  xmm5,   xmm2
mulps   xmm0,   xmm0
mulps   xmm1,   xmm1
mulps   xmm2,   xmm2
addps   xmm0,   xmm1
addps   xmm0,   xmm2
rsqrtps xmm0,   xmm0
mulps   xmm3,   xmm0
mulps   xmm4,   xmm0
mulps   xmm5,   xmm0
; x's in xmm3, y's in xmm4, z's in xmm5

As you’ve no doubt noticed, normalizing four vectors in the right format is just about as fast as normalizing one vector.

This is nothing new. It’s fairly well known that using structure of arrays opens up more optimization opportunities than arrays of structures.

The interesting thing here is that there is no API with a function called normalize(vec3 v) that can take advantage of this. There is no way to break down the problem of normalizing four vectors in a way that recombining the parts is as efficient as the whole.

Holism is the idea that systems should be viewed as wholes rather than the sum of their parts. The opposite of Holism is Reductionism. A Reductionist, when presented with the problem of normalizing 100 vectors would first find a way to normalize one vector, then apply that 100 times. A Holist would see the problem as a whole, and realize that the normalizations can be done in parallel. The Reductionist would use the first code snippet. The Holist would use the second.

I like this quote from Stepanov on the subject:

It is essential to know what can be done effectively before you can start your design. Every programmer has been taught about the importance of top-down design. While it is possible that the original software engineering considerations behind it were sound, it came to signify something quite nonsensical: the idea that one can design abstract interfaces without a deep understanding of how the implementations are supposed to work. It is impossible to design an interface to a data structure without knowing both the details of its implementation and details of its use. The first task of good programmers is to know many specific algorithms and data structures. Only then they can attempt to design a coherent system. Start with useful pieces of code. After all, abstractions are just a tool for organizing concrete code.

If I were using top-down design to design an airplane, I would quickly decompose it into three significant parts: the lifting device, the landing device and the horizontal motion device. Then I would assign three different teams to work on these devices. I doubt that the device would ever fly. Fortunately, neither Orville nor Wilbur Wright attended college and, therefore, never took a course on software engineering. The point I am trying to make is that in order to be a good software designer you need to have a large set of different techniques at your fingertips. You need to know many different low-level things and understand how they interact.

I think the take-away point here is that you should consider your problem as a whole before you begin breaking it down into smaller parts. It’s tempting to start creating a separate class for every noun in your application domain, but more often than not there is a better design involving a more holistic approach.


Imagine C++ Without Function Overloading

Posted: 29 Apr 2012 - Link

I’ve spent quite a lot of time in the D programming language community, and one thing I’ve noticed is that people are very quick to suggest new features, but rarely do people suggest to remove a feature (unless it is outright broken). It’s easy to see the benefits of adding to a language, but the costs are often harder to see, or are simply ignored.

For a bit of fun, this post will explore the benefits of removing a useful feature from C++: function overloading.

Quick recap: function overloading is a feature of C++ that allows you to have two functions with the same name that vary only on their parameters. In C, you would to name your functions differently, e.g. with min and fmin.

int min(int x, int y) { return x < y ? x : y; }
float fmin(float x, float y) { return x < y ? x : y; }

int a = min(1, 2);
float b = fmin(1.0f, 2.0f);

In C++, you give them the same name, and the version is chosen based on parameters.

int min(int x, int y) { return x < y ? x : y; }
float min(float x, float y) { return x < y ? x : y; }

int a = min(1, 2);
float b = min(1.0f, 2.0f);

In C, you could use macros as a hacky form of overloading, and in C++ you could use templates, but I’ll ignore those for the moment.

One clear advantage to overloading is that you don’t need to name your functions differently if they do the same things, but on different types. It makes things easier to remember and arguably makes the intent of your code clearer. It also means that you can change the types of your arguments (e.g. changing from float to double) without having to go around changing all your function names. It can also help with generic programming.

I’m sure there are other benefits, but those are the main ones. Now let’s go through the costs.

The first cost is the cost of name mangling. In C, since there can only be one function for a given name, the symbol assigned to that function is just the name of the function. In C++, two functions can have the same name, but they need to have different symbols, so C++ compilers must mangle the names to include the parameter type information.

Using g++, the two C++ min functions above mangle to __Z3minii and __Z3minff. Those are quite simple examples. Things get more nasty when more intricate types are involved. For example the std::sort of an std::deque<std::pair<int,int>> mangles to __ZSt4sortISt15_Deque_iteratorISt4pairIffERS2_PS2_EEvT_S6_.

Why does this matter? It matters if you have to look at call stacks, or object dumps, or care about the size of your debug files. Many tools handle common C++ name mangling schemes, but this is extra work for the tool developers. It also doesn’t help that every compiler mangles names differently, which is why you can rarely link code generated by two different C++ compilers.

Without function overloading, there would be no need to mangle the parameter types into the symbol name.

Another cost of function overloading is the quality of compile time errors. If I try to call min(1, 2.0f) in C++, I will get an error because it cannot figure out whether I want to call the int version or the float version. The only way to express my intent is to cast one of the arguments min(1, (int)2.0f). Without funcition overloading, if I want to call the int version, I call min, and if I want to call the float version, I call fmin.

Having no direct way to specify what overload I want to use has other consequences. Suppose I want to get the address of one of the overloads because I want to use it as a parameter to a higher-order function.

int a[] = {3, 2, 5, 1, 4};
int amin = std::accumulate(a, a + 5, a[0], &min); // get the min

// ERROR!
// error: no matching function for call to 'accumulate(int [5], int*,
int, <unresolved overloaded function type>)'

How do you do it? &min could refer to either the int or float version.

Do you know how to do it?

You have to cast the function pointer to the resolved overload function pointer type, i.e. (int (*)(int, int))&min gives you a pointer to the int version. This would be easier if the functions were named separately.

In the above scenario, things would be easier if min were a template as you could just refer to them as &min<int> and &min<float>. Just be careful when you mix overloads and specialisation though, as they don’t play nicely with each other.

While we’re talking about things that don’t play nicely with each other. Have you ever tried to override a particular function overload in a derived class? That’s not simple either.

So, it would seem there are at least a few costs to overloading. I’m not sure if the costs are so great to warrant its removal from C++, but it’s worth thinking about these things every once in a while.


Small Object Overhead

Posted: 20 Apr 2012 - Link

One problem that seems to come up quite a lot is the issue of per-object overhead when you have lots of small objects. For example, consider a small class to represent a bullet in a side-scrolling shooter:

struct Bullet
{
    void update(float delta);

    Vec2 m_position;
    Vec2 m_velocity;
}

That’s how it starts at least. What happens when the bullet hits something? It needs to inform the scoring system that you’ve scored some points. You’ll also probably want to play some sound effects, perhaps control a sprite; maybe some bullets increase the health of the player when they hit things. Before long, your simple Bullet class looks more like this:

struct Bullet
{
    void update(float delta);

    ScoringSystem m_scoringSystem;
    AudioSystem m_audioSystem;
    SpriteSystem m_spriteSystem;
    Player m_player;
    Vec2 m_position;
    Vec2 m_velocity;
}

One solution is to bunch up all of those systems into a separate object.

struct Bullet
{
    void update(float delta);

    Systems m_systems;
    Vec2 m_position;
    Vec2 m_velocity;
}

This is certainly better for the size of your object (one word overhead vs. four), but it still smells a bit funky.

A better solution, in my opinion, is to take responsibility away from the Bullet.

class BulletSystem
{
    void update(float delta)
    {
        foreach (bullet; m_bullets)
        {
            bullet.m_position += bullet.m_velocity;
            ...
        }
    }

    ScoringSystem m_scoringSystem;
    AudioSystem m_audioSystem;
    SpriteSystem m_spriteSystem;
    Player m_player;
    Bullet[] m_bullets;
}

struct Bullet
{
    Vec2 m_position;
    Vec2 m_velocity;
}

Now, the Bullet is nothing more than the data associated with the bullet itself. All the logic and responsibility is pushed up into a less granular BulletSystem. This has a few advantages:

  1. No unnecessary overhead per bullet.
  2. Only one function call to update N bullets, rather than N function calls.
  3. Opens up opportunities to parallelize updates.
  4. Has much better data flow.

When information and logic are too granular, you increase overhead and lose control, which means losing opportunities to make high-level optimisations. In this case, the whole seems to be greater than the sum of the parts.


Data Structures

Posted: 27 Feb 2012 - Link

Something I’ve realised recently is that it’s usually a bad idea to use data structures to structure your data… yeah.

That probably doesn’t make much sense. What I mean is that you shouldn’t use data structures to organise high-level information in your programs. Data structures exist to make operations and algorithms faster, and should have nothing to do with how your “objects” are organised logically.

As an example, consider a game that has several worlds, and within each world there are several stages. Each stage has a bunch of stars you have to collect, and we want to know how many stars there are and how many you’ve collected.

How might you structure this in your program? Until recently, I would have used something like this:

class Game
{
    World[] worlds;
}

class World
{
    Stage[] stages;
}

class Stage
{
    Star[] stars;
}

class Star
{
    bool collected;
}

This seems reasonable. Let’s try to use it:

int totalStars(Game game)
{
    int count = 0;
    foreach (world; game.worlds)
        foreach (stage; world.stages)
            count += stage.stars.length;
    return count;
}

int starsCollected(Game game)
{
    int count = 0;
    foreach (world; game.worlds)
        foreach (stage; world.stages)
            foreach (star; stage.stars)
                if (star.collected)
                    ++count;
    return count;
}

I don’t know about you, but I’m not impressed. It’s not too difficult to write these functions, but I can’t help but thinking, “what am I gaining from structuring the code like this?”

If we step back and look at the problem that needs to be solved, surely the following would be a better way to structure the data?

class Game
{
    Star[] stars; // ALL the stars
}

class Star
{
    bool collected;
    int stageId;
}

This way, the queries are straightforward loops over all the stars. If we want to count only the stars in a particular stage then we can filter using the star’s stageId.

In the end, what matters is how you use the data, and what algorithms need to be implemented efficiently on the data. Any relationship between things in the application domain is of little relevance when choosing data structures.


Scoped Imports in D

Posted: 26 Jan 2012 - Link

One of the nice things about the D programming language is that it has very convenient syntax for conditional compilation. For example, suppose you want to print out some useful info in debug builds. You just use:

debug writeln("Value of x is ", x);

One problem that I constantly run into in cases like this is that I haven’t imported std.stdio, so I have to go right up to the top of the file, add debug import std.stdio;, back down again, and continue. Sigh.

Not so fast! D has scoped imports, so you can put the import std.stdio; right where you need it.

void main(string[] args)
{
  debug import std.stdio;
  debug writeln(args);
  // ...
}

It’s the little things like this that make D so pleasant to use.