In the search for a bug: Jordy Gennissen

 As a young technical researcher starting my CDT, I was working on detection of synthetic bugs. All went well, but when presenting this work to my peers at Royal Holloway, I received an unexpected question: What is a "[computer] bug"? Even though the work I presented had clear boundaries to what I was searching for (SEGFAULT crashes), it alluded to the more fundamental question. When is something considered a bug, or a vulnerability? I had hoped for a quick and insightful Duckduckgo search but even Google did not have a satisfying answer. But how can we really understand these concepts if nobody succeeded to properly define them? A few brilliant pieces have been written around that time, referring to weird machines. Weird machines are conceptually the state a computer program is in when a bug or vulnerability is triggered. This notion was introduced to try and reason about the abuse of bugs more generally. The idea here is that a weird machine might be easier defined or modelled compared to the generic concepts of "bug" or "vulnerability". I liked this idea of formalising, because it meant that we might be able to formally reason about it and maybe even (mathematically) prove the absence of such weird machines (and hence vulnerabilities). But once again, definitions so far do not align with the conceptual understanding. From a technical perspective, it is possible to compare two buffer overflows (e.g. Heartbleed) - something we know is a vulnerability. It can even be compared against e.g. a temporal memory issue like a double free. It however becomes a far more complex question (from a technical perspective) when broadening the notion to include command injection attacks (as for example Shellshock). The latter does not break any inner structures of the applications but can be at least as damaging. In my opinion, the biggest difference between the two mentioned above is the origin of the bug. When looking at the source code of a program that contains a buffer overflow (e.g. in C), we can conclude that this is explicitly undefined behaviour. Undefined behaviour is inherently bad when it occurs and would probably require a separate blog post. A program containing a command injection instead, has no such recognition point: the semantics of the source program ultimately define this behaviour. The difference aligned above can be looked at from a perspective of abstraction. Where the buffer overflow issue turns this undefined behaviour in source code into a problem in compiled code, a similar pattern can be seen in the Shellshock injection. Instead, it occurs on a different abstraction level. The intended program left undefined what needs to happen when the command injection occurs, and it has been the developers' decision for this to happen. The tricky thing here is that some conceptual intended program is not easily modelled formally, as lovely as it would be. It is informal by nature: formalising this (which is arguably being done in formal methods) would inherently lead to another level of abstraction where new bugs can occur. It might solve some issues – but without guarantees there's none left.

In fact, this argument leads to a somewhat depressing conclusion: computer programs can never be completely proven to be bug-free. It can be proven to be bug-free with respect to its formal specification, but not to its informal specification. The gap between these two remains a place where mathematics has no power. This also means we need a different set of tools if we are to define this origin of bugs. Does this solve the question of what is a bug? I don't think so. This question has stuck with me ever since, and needless to say, I don't believe the question has been answered. Yet.

Comments

Popular posts from this blog

Remote working and Cyber Security: Georgia Crossland and Amy Ertan

New Publication: Remote Working and (In)Security?: Amy Ertan

The Artificial Intelligence Monster: Nicola Bates