Ed Page: Trust Issues

Where I work we write in C++ but our coding style has been heavily influenced by kernel development and older coding practices. This means we do not use exceptions, even in userspace.

Rather than using a global variable for errors or peppering our code with return value checks, we pass a reference to an error value into all of our functions. On an incoming error, the function is suppose to be a no-op.

The group I work in is responsible for shared code by many hardware product teams. One joys of this is when something a hardware group develops is wanted by others, we become the owners. Its also fun when they continue to contribute features to it and we remain weak in our knowledge of it. Oh and we are also clients of some groups that provide shared code for the whole company.

The group underneath us in the stack has had a bug with their tScopedLock code that it can cause a BSOD when an error is being passed in in an ISR.

The code looks something kind of like (changed for simplicity, especially to be free of internal code)

void tClass::operation(void* param, int& errno)
{
tScopedLock lock(memberLock, errno);

if(errno != 0) {
int tempError = 0;
cleanup(param, tempError);
    return;
}
...
}

Because we technically own the component they sent the bug report our way with a suggested fix that they tested.

void tClass::operation(void* param, int& errno)
{
if(errno != 0)
{

    return;
}

tScopedLock lock(memberLock, errno);

if(errno != 0) {

int tempError = 0;
cleanup(param, tempError);
    return;

}
...

}

This seemed fishy to me. One issue is the second error check only gets executed if the lock fails to acquire. Seems like a small scope to worry about cleanup and that should be left for later. The second is they probably had a reason to do that cleanup and do it under the lock. I double checked with the implementer and he confirmed it would leak memory. At least its better than a blue screen.

So my solution is found below.

void tClass::operation(void* param, int& errno)
{
int tempError = 0; tScopedLock lock(memberLock, tempError);

if(errno != 0) {

cleanup(param, tempError);     return;

}
else
{
errno = tempError;
}
...

}

All of the code is irrelevant. Who introduced the problem and who made the fix isn't relevant. To me the lesson here is the importance of code ownership (even if you don't know the code you own). Our clients are under pressure to meet their deadlines and don't feel the same level of responsibility for this code. It met their immediate needs and they moved on. We release with them and so have the same deadline but with a smaller bug backlog. Fair enough we do have a bit less stress. I still think our ownership of the code helped us to ensure it had the proper quality for the release looming over us.

Another great example is we have code repos for "shared code" that have no owner. Yes someone originally put it there but the ownership isn't recognized on the business side of things so when you have deadlines coming up it is hard to justify fixes or improvements. We end up viewing it as the code graveyard.

I'm not saying any of this is novel. I just think it was a nice clean example, one of those "huh, nice" type of things.

Ed Page

Thursday, July 08, 2010

Trust Issues

No comments:

Post a Comment