Cactus

The B Method for Programmers (Part 2)

22 February 2010 (programming language math correctness)

In my previous post, I introduced the B method and showed the steps of writing a simple program for finding the nth element of a sequence satisfying a given predicate p. While you may think the resulting program is correct, we can't just say so and be done with it. The whole point of the B method is that the resulting program can be formally proven correct.

The B software generates 69 so-called proof obligations for the code from the first part. These are assertions about the program actually behaving as specified. For example, let's look at PO69 which asserts that ll is correctly set. Recall first the relevant portion of the specification:

ll = bool(card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}) ≥ nn)

And the invariant of the implementation:

ll = bool(card({kk | kk ∈ aa..ii ∧ pp(kk) = TRUE}) = nn)

So what we have to prove is that given the preconditions, by the time the loop in the implementation terminates, the invariant makes sure ll is equal to its specified value. This is what's described (somewhat more verbosely) by the actual proof obligation below.

find_nth preconditions in this component
  aa ∈ INTEGER ∧ aa ≤ 2147483647 ∧ -2147483647 ≤ aa ∧
  bb ∈ INTEGER ∧ bb ≤ 2147483647 ∧ -2147483647 ≤ bb ∧
  aa ≤ bb ∧
  aa-1 ∈ INTEGER ∧ aa-1 ≤ 2147483647 ∧ -2147483647 ≤ aa-1 ∧
  pp ∈ INT ⇸ BOOL ∧ aa..bb ⊆ dom(pp) ∧
  nn ∈ INTEGER ∧ nn ≤ 2147483647 ∧ -2147483647 ≤ nn ∧
  1 ≤ nn ∧
  
Local hypotheses
  not(iiz$7777+1 ≤ bb ∧ not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn)) ∧
  iiz$7777 ∈ aa-1..bb ∧
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) ∈ 0..nn ∧
  not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) ⇒ card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE})+1 ≤ nn ∧
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE})+1 ≤ nn ⇒ not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) ∧
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn ⇒ iiz$7777∈ aa..bb ∧
pp(iiz$7777) = TRUE ∧
  card({kk | kk ∈ aa..iiz$7777-1 ∧ pp(kk) = TRUE})+1 ≤ nn

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}) ≥ nn)

This looks unwieldy at first, but it's not that hard to locate the parts that come from the precondition, the parts that come from typing constraints, the statements from the loop invariant, and the negation of the loop condition (since we're proving something's true after the loop has terminated). Everything above the character is a hypothesis that you can take for granted while writing the proof. Everything below it is what you actually need to prove. The generated variable iiz$7777 represents the value of ii after the loop has terminated.

Now let's write the actual proof! Proofs are written in a formal language much like a programming language. But proofs are not programs in the traditional sense since they don't yield any computation. Instead, a proof is a tree of statements where each statement is a consequence of the preceding steps. Once written by a human, the resulting proof can be verified by an automated checker. And that's the crucial step: a single computer program using a well-defined set of axioms is the only point in the verification chain that needs to be trusted. We may not be 100% sure that the proof we will write is correct – but if we trust the B proof checker, and our proof passes the verification, we can be sure.

So our goal is to decompose the proof into steps that are simple enough for the mechanical checker to verify. In practice, this is done in an interactive fashion by writing down a step, and then trying to verify this step by the checker. If it cannot verify it, we decompose even more.

We begin our proof by noticing that the statement is in the form A ⇒ B. The first step, then, is adding everything on the left side of the to the (initially empty) set of hypotheses (input is typeset in bold):

> dd

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

Although seemingly the job has become harder (we now have to unconditionally prove the statement), we now have a bunch of hypotheses created that the checker can use. For example, here are all the known properties of iiz$7777:

> sh(iiz$7771)

  card({kk | kk ∈ aa..iiz$7777-1 ∧ pp(kk) = TRUE})+1 ≤ nn ∧
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn ⇒ iiz$7777 ∈ aa..bb ∧ pp(iiz$7777) = TRUE ∧
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE})+1 ≤ nn ⇒ not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) ∧
  not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) ⇒ card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE})+1 ≤ nn ∧
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) ∈ 0..nn ∧
  iiz$7777 ∈ aa-1..bb ∧
  not(iiz$7777+1 ≤ bb ∧ not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn))

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

The last line of the output shows the same statement as before, which was to be expected, since sh only shows the hypotheses but doesn't take any proof step.

Now it's time to think about the substance of our proof-to-be. An obvious starting point is to consider the cases where both sides of the equation are true and when both are false separately. So let's suppose first that the left-hand side is true:

> dc(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn)

  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn ∧
  iiz$7777 ∈ aa..bb ∧
  pp(iiz$7777) = TRUE

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

> dd

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

Of course, if the predicate on the left-hand side is true, its bool value is TRUE (see how we're making a jump from the world of predicates to the world of expressions). So we can add the following hypothesis:

> ah(bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = TRUE)

This new hypothesis needs to be proven. Since it is a simple enough consequence of our hypotheses collected so far, we can simply tell the proof checker to try proving it on its own, by using term rewriting:

> pr

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = TRUE

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

> dd

Since we now explicitly know the value of the left-hand side, we can use this equality to re-write the equation via a simple substitution. We then simplify the result.

> eh(bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn))

  TRUE = bool(nn ≤ card({kk | kk: aa..bb ∧ pp(kk) = TRUE}))

> ss

  nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE})

Now we're definitely getting somewhere: we know from our initial supposition that |{k∈{a, ..., i} | p(k)}| = n, so we can substitute nn:

> eh(nn)

     card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE})

... which is true simply because the set on the left-hand side is a subset of the one on the right-hand side.

> ah(aa..iiz$7777 ⊆ aa..bb)

  aa..iiz$7777 ⊆ aa..bb

This seems like a trivial enough hypothesis that we can try running pr again.

> pr

  iiz$7777 ≤ bb

Oops! Instead of proving the hypothesis, all pr did was simplify it somewhat (pr stops after a certain number of steps, to avoid divergence). This means we're not yet finished with the proof. But we're close enough! Let's see what we know about iiz$7777 and bb:

> sh(iiz$7777 _and bb)

  iiz$7777 ∈ aa..bb
  not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) ∨ (iiz$7777 ∈ aa..bb ∧ pp(iiz$7777) = TRUE) ∧
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn ⇒ iiz$7777 ∈ aa..bb ∧ pp(iiz$7777) = TRUE ∧
  iiz$7777 ∈ aa-1..bb ∧
  not(iiz$7777+1 ≤ bb ∧ not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn))

  iiz$7777 ≤ bb

I've highlighted the most promising hypothesis above. In fact, the statement we need to prove is a direct consequence of it. We're confident enough that we can tell B to start using predicate resolution (note that checking this step can take the computer several minutes).

> pp

  aa..iiz$7777 ⊆ aa..bb

  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE})

> dd

  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE})

The next step is straightforward: now that we've proven {a, ..., i} ⊆ {a, .., b}, it's time to use this (and some kind of implicit set monotonicity) to prove {k ∈ {a, ..., i} | p(k)} ⊆ {k ∈ {a, ..., b} | p(k)}.

> ah({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE} ⊆ {kk | kk ∈ aa..bb ∧ pp(kk) = TRUE})

  {kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE} ⊆ {kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}

> pr

  {kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE} ⊆ {kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}

  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE})

And believe it or not, in two more steps we're done with the first case of our initial branch:

> dd

  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE})

> pr

  not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) ∧
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE})+1 ≤ nn ∧
  not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn)∧

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

The reply to pr above may seem totally out of the blue, but it is in fact the direct consequence of the negation of the case predicate from the dc statement at the beginning.

This looks like a good enough milestone to review our proof so far. Here's Atelier B's user interface showing the structure of our proof:

So the other branch is about the case when both sides of the equation are false. Let's make this knowledge explicit. I'm going to lump together a few simple steps, hopefully you can make out what we're doing here by consulting the detailed explanations so far.

> dd

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

> ah(bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = FALSE)

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = FALSE

> pr

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = FALSE

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

> dd

  bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn) = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

> eh(bool(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn))

  FALSE = bool(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

> ss

  not(nn ≤ card({kk | kk ∈ aa..bb ∧ pp(kk) = TRUE}))

What these simplifications show us is that what we're really going for is proving |{k ∈ {a, ..., b} | p(k)}| < n, in other words, that we were unable to find n occurances of p in the given interval. If this is the case, then surely we have exhausted the whole of the interval while searching:

> ah(iiz$7777 = bb)

  iiz$7777 = bb

How can we prove that our algorithm iterated until ii hit bb? Let's search for relationships between i and b, maybe we can find something promising.

> sh(iiz$7777 _and bb)

  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn ⇒ iiz$7777 ∈ aa..bb ∧ pp(iiz$7777) = TRUE ∧
  iiz$7777 ∈ aa-1..bb ∧
  not(iiz$7777+1 ≤ bb ∧ not(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}) = nn))

  iiz$7777 = bb

In case you're wondering, the highlighted hypothesis has been generated by the negation of the loop condition, since we're proving something about our program after the loop has terminated. It says that the loop couldn't have finished without either exhausting the interval or finding enough occurances of p(k). And we know, from the case we're in, that we haven't found enough occurances:

> sh(card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE}))

  ...
omitted for clarity
  card({kk | kk ∈ aa..iiz$7777 ∧ pp(kk) = TRUE})+1 ≤ nn

  iiz$7777 = bb

Thus, i < b cannot hold, and since i ∈ {a-1, ..., b}, i = b follows. This plan leads to the idea of a proof by contradiction, which is luckily a technique that B supports. The proof ct basically means "take the negation of whatever the target statement is as a new hypothesis, and I'll prove it leads to a contradiction".

> ct

  bfalse

Our new target is bfalse, which is the unsatisfiable predicate. The only way to prove bfalse is to show a contradiction. Which is exactly what we want to do. First of all, if i ≠ b, then i < b because of i's type:

> ah(iiz$7777 < bb)

  iiz$7777+1 ≤ bb

> pp(rt.1)

  iiz$7777+1 ≤ bb ⇒ bfalse

> dd

  bfalse

(pp(rt.1) is the same as pp, except it favours type hypotheses, and thus finishes before timing out)

But if i < b then that means we've exited the loop by finding n occurances:

> ah(card({kk | kk ∈ aa.iiz$7777 ∧ pp(kk) = TRUE}) = nn)

  card({kk | kk ∈ aa.iiz$7777 ∧ pp(kk) = TRUE}) = nn

> pr

  card({kk | kk ∈ aa.iiz$7777 ∧ pp(kk) = TRUE}) = nn

  bfalse

It's time to get confident:

> pr

  iiz$7777 = bb ⇒ not(nn ≤ card({kk | kk: aa..bb ∧ pp(kk) = TRUE}))

> pr

  
done

Which leaves us with the following complete proof:

So, one down and 68 more to go... which is disheartening to say the least. Luckily, most of the generated proof obligations are so simple that a single run of pr can solve them. In our example program, 53 can be proven automatically, leaving 16 proofs to be written manually (including PO69 which we've just finished). Those 15 remaining can be proven in the same technique presented above.

You can download the complete package with specification, implementation and proofs here. It also defines some useful lemmas which are then used by some proofs.

As a conclusion, let's summarize what we did here. Using a formal system, we devised a proof of correctness for our software. This proof can be verified by independent proof checkers, so you only have to trust them, and then you can finally trust your own code. Yes, this process is unwieldy, and no, it is certainly not something that should be done for all software. But when you absolutely positively need correct software, there is no easy way out: any arguing about the correctness of a program short of a formal method is, at the end of the day, just handwaving.

Of course, the Hoare-Dijkstra model of using least preconditions is but one of several formal systems for arguing about software correctness. I hope to soon write another introduction about proving properties of functional programs, using dependent types.


« Internet-celebritás lettem... 
All posts
 The B Method for Programmers (Part 1) »