Valgrind is Not Optional


Disclaimer: I really don't mean to be picking on the Mesa guys. Libre graphics folk are already terribly overworked, and it's neither my place nor my intention to tell them to try harder. Nonetheless, given what I'm working on, this is the example I have.

I'm working on a rather large and complicated change to Luftballons, and I've managed to get myself into trouble with what looks like a rather nasty use-after-free bug.

No problem, I'll just fire up valgrind.

==31830== For counts of detected and suppressed errors, rerun with: -v
==31830== Use --track-origins=yes to see where uninitialised values come from
==31830== ERROR SUMMARY: 507326 errors from 743 contexts (suppressed: 2 from 2)

That's a lot of errors. And this is after I used --gen-suppressions against a known-working version to suppress whatever was already there.

They can't all be me, can they?

==31830== Invalid read of size 4   
==31830==    at 0x64311D0: gen7\_update\_texture\_surface (gen7\_wm\_surface\_state.c:325)
==31830==    by 0x64238F3: brw\_update\_texture\_surfaces (brw\_wm\_surface\_state.c:1497)
==31830==    by 0x63FAD91: brw\_upload\_state (brw\_state\_upload.c:501)
==31830==    by 0x63B844A: brw\_draw\_prims (brw\_draw.c:501)
==31830==    by 0x67C02E9: vbo\_handle\_primitive\_restart (vbo\_exec\_array.c:549)
==31830==    by 0x67C143C: vbo\_validated\_drawrangeelements (vbo\_exec\_array.c:968)
==31830==    by 0x67C195A: vbo\_exec\_DrawElementsBaseVertex (vbo\_exec\_array.c:1141)
==31830==    by 0x4C1FC25: mesh\_draw (mesh.c:232)
==31830==    by 0x4C23E17: draw\_op\_exec (draw\_op.c:275)
==31830==    by 0x4C274EA: draw\_proc\_run (draw\_proc.c:182)
==31830==    by 0x4C27528: draw\_proc\_run (draw\_proc.c:184)
==31830==    by 0x4C27528: draw\_proc\_run (draw\_proc.c:184)
==31830==  Address 0x7f16da073d20 is not stack'd, malloc'd or (recently) free'd

Nope. Mostly they look to be in Mesa. Thousands of them.

I've previously lost almost an entire day trying to clean this mess up. I almost thought I had the suppressions manicured enough to be worth something, and then another pile broke through.
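For reference, a hand-written suppression for the invalid read above might look something like the sketch below. The entry name is made up, and matching on only the two innermost frames is an assumption about how broad the rule should be. `Memcheck:Addr4` is the suppression kind for an invalid access of size 4.

```
# Hypothetical entry; the name on the first line inside the braces is arbitrary.
{
   mesa_gen7_texture_surface_read
   Memcheck:Addr4
   fun:gen7_update_texture_surface
   fun:brw_update_texture_surfaces
}
```

Saved as, say, mesa.supp (a placeholder filename), this would be passed to valgrind with --suppressions=mesa.supp. It only covers one call site, which is why manicuring these by hand gets tedious with 743 contexts to deal with.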

Some of this is valgrind's fault for not having the best suppression tools in the world, but there's also something a bit worrying about a library making this much noise during normal use. The Java fans have been screaming about the dangers of using "unsafe" languages like C for as long as their kind has existed. I've always disagreed, but that was partly on the grounds that tools like valgrind existed and, more importantly, that they were being used.

Again, I can't pick on the Mesa devs too much, but as a general rule: if you are writing a C project, valgrind is not optional. Released code should generate no valgrind errors, and where there is a legitimate exception or a valgrind bug, you should ship a suppression file. With the great power of a low-level language comes great responsibility, and this simple acceptance test finds many dangerous errors.
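To make the class of error concrete, here is a minimal sketch of a use-after-free that compiles cleanly and will often appear to work, next to the corrected version. The `node` type, the function names, and the `SHOW_BUG` macro are all invented for illustration and have nothing to do with Luftballons or Mesa. The buggy variant is only built when `SHOW_BUG` is defined.

```c
#include <stdlib.h>

/* Hypothetical list node, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Push a value onto the front of the list.
 * Returns the new head, or NULL if allocation fails
 * (the caller still owns the old head in that case). */
struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

#ifdef SHOW_BUG
/* Buggy: reads head->next after head has been freed. */
void list_free_buggy(struct node *head)
{
    while (head != NULL) {
        free(head);
        head = head->next; /* use-after-free */
    }
}
#endif

/* Correct: save the next pointer before freeing the node.
 * Returns the number of nodes freed. */
size_t list_free(struct node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Built with -DSHOW_BUG and run under memcheck, the buggy loop should be reported as an invalid read inside a block that was already freed. Without valgrind it will frequently run to completion as if nothing were wrong, which is the whole problem.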

Valgrind is about the fourth tool that has broken on me while trying to find this bug (expect a surge of GDB bug reports soon). Now I'm sitting here trying to determine whether hunting a few of these errors constitutes good citizenship or a dangerous slide into yak shaving. I guess it's ironic that it's my own memory mismanagement defect that's causing all the effort. No rest for the wicked, I suppose.
