In this post I shall discuss the proofs of two statements in real analysis, one of which is clearly deeper than the other. My aim is to shed some small light on what it is that we mean when we make that judgment. A related aim is to try to demonstrate that a computer is in principle capable of “having mathematical ideas”. To do these two things I shall attempt to explain how an automatic theorem prover might go about proving the two statements in real analysis: in one case this is quite easy and in the other quite hard but by no means impossible. In the hard case what interests me is the precise ways that it is hard, which I think say something about the notion of depth in mathematics.
The first statement is that if a function
is continuous, then for every
and every sequence
that converges to
we also have that
converges to
. Most experienced mathematicians would not regard this as a deep statement because proving it is “just an exercise” rather than a result that “needs an idea”. (Spotting that it is a useful statement is a different matter, but it’s not what I’m talking about here.)
I shall now give a proof of the statement in such a way that every step of the proof is the obvious thing to do, where by “obvious” I mean in some sense “algorithmic”. I could go into more detail about how these obvious steps could be carried out by an actual program, but then what I wrote would be much less readable. So I’m compromising by giving a slightly higher-level discussion. But if anyone thinks I’m using that to sneak in a trick that only a human could spot then I’ll be happy to elaborate on parts of the proof later. So here goes.
We want to show that if
is continuous, then it commutes with taking limits. Suppose then that
converges to
. We would like to prove that
converges to
. Therefore we must let
and find some
such that
whenever
. (That is just translating the definition, which is clearly an automatic process.)
How can we find such an
? Well, what could possibly imply that
? To answer this question we write out the information we have and simply look at it to see if anything has a statement resembling
as a conclusion. And we notice that the definition of continuity of
ends with the assertion
. (For safety’s sake, we give all our dummy variables different names.) So we focus on the whole definition:
.
So the obvious thing to try is taking
to be
,
to be
and
to be
. We’re free to do the first two as the above assertion starts with “
“, but what about the third? Well, we also have a “
” but it comes with the condition that
, or rather
, should be at most
, which depends somehow on
and
. So we can conclude the following: if
then
.
Now let’s recall precisely what we want to prove. We would like to show that there exists
such that if
then
. From what we have just shown, it will be sufficient to prove that
. And now we see that just such a conclusion appears at the end of the definition of the convergence of
to
, and it is easy to see that the premise is exactly what we want too as it gives us our
.
Incidentally, the converse of this statement can also be proved in a fully justified doable-in-principle-by-computer way too. (In fact, I discussed both directions many years ago on my web page.) However, I only really need one sample of “non-deep” mathematics to illustrate my point, so I won’t discuss the converse here. Instead, let’s move to the second statement, which is a beautiful problem that is often set to Cambridge undergraduates. It can serve either as a hard problem for those who have done a first course in analysis (one that perhaps one or two people per year are capable of solving) or a hardish exercise in applying the Baire category theorem.
It’s quite interesting to discuss how a computer could solve the problem if it had the hint that the Baire category theorem should be applied, but then it becomes more like the first example: it can be done by “pattern matching”. But then one would feel that the computer had cheated and been told the idea. So it is even more interesting to see how a computer could solve the problem from scratch, succeeding where all but a handful of students fail.
The statement in question is this. Let
be a continuous function and suppose that for every
the sequence
tends to 0. Prove that
tends to 0 as
. If you haven’t seen this before and want to get the most out of this post then you should (of course) make a serious attempt to solve this beautiful problem before reading on.
On then to the proof. The aim, as with the previous result, is to present the proof in such a way that every step is “the obvious” thing to do, or at least sufficiently obvious that it would be one of the first things that a well-programmed computer would try.
Let us begin with steps that really are automatic, such as translating the problem into more formal language and converting “for all”s into “let”s (by which I mean that if you want to prove a statement that begins “for every x in A” you start by writing “let x be in A”, and so on). In this case we are trying to prove that
tends to 0, so we let
be positive. We would now like to find
such that
whenever
. But simple pattern-matching lets us down: the obvious (and only) statement available to us that resembles
is the statement
that comes at the end of a formal definition of the hypothesis of the problem. And that is true only if
is at least as big as some
that depends on
in a way that we know nothing about.
This is a point where many human mathematicians will feel stuck. But one move that sometimes helps is to do something else that can be easily automated and aim for a proof by contradiction, so let’s try it.
If we cannot achieve our goal then for every
there exists
such that
. And now, if we want a contradiction, we must prove that there exists
such that
does not tend to 0. Writing out this last statement in full gives us
.
Now we make three observations. First, it is fairly clear that we shall need to use the fact that
is continuous. (I have thought quite hard about how a computer might come to this realization, or at least construct, when asked, an example of a discontinuous function
that does not tend to zero despite the fact that
always does. I’ll save my conclusions about that for another post. For now let us be satisfied with the idea that the continuity of
was given as a hypothesis and the computer will naturally tend to see what comes of using the hypotheses.) Second, there is a promising resemblance between
and
. Third, we are trying to construct a real number that satisfies an infinite set of conditions, one for each
.
Let us think about the last observation first. How does one construct a real number with infinitely many properties? A standard answer, and one that is closely bound up in the very idea of a real number, is to construct it as the limit of a sequence. But what will make that limit satisfy all the properties? To make this question slightly more concrete, let us call the properties
and let the sequence be
, converging to
. We don’t have much to play with here: all we can say about each
and each
is whether or not
has property
. And our main information about
is that the
get close to it.
As we build our sequence
, how can we make sure that its limit
at least has property
? A natural answer is to insist that every
belongs to some closed set
, all of whose elements satisfy property
. Here I am using “closed” in the sense of “closed under taking limits”. The most basic examples of closed sets are closed intervals, and this thought leads us to one of the basic theorems of real analysis: that a nested intersection of closed bounded intervals is non-empty. But I prefer to think of this statement as yet another version of the completeness axiom, and it leads us to the following basic real-number-constructing principle, which I would imagine a computer as having been taught rather than as having invented: if you want to construct a real number
that has properties
, then see if you can construct a nested sequence of closed bounded intervals
(of non-zero length) such that every
has property
. If you can, then any
in the intersection will have all the properties simultaneously.
Now let us return to the problem at hand. Property
is the property that there exists
such that
. So let us suppose that we have already constructed the closed bounded interval
and see if we can find a closed subinterval
all of whose elements satisfy
.
It would be nice to avoid the language of intervals, so let
be the interval
. Our task is then to find real numbers
and
such that
(that makes
a subinterval) and such that for every
with
there exists
such that
(that makes every element of
satisfy property
).
Before we attack this (by now not terribly hard) problem it feels as though we need to choose our
. From string-matching we more or less know that
will depend on
(since the only lower bounds we have for values of
are that there are arbitrarily large
with
). And here there is a very useful method that human mathematicians use all the time: just guess the simplest possible dependence and make adjustments if it doesn’t work. The simplest non-trivial dependence would be
, which turns out to work later if we “adjust” it to
. Another method, again used by human mathematicians, is “let
be a positive real number to be chosen later”. Here one sort of pretends to have chosen
and as the proof proceeds one finds that
must satisfy certain conditions for the argument to work. One then shows that these conditions can be satisfied. Let us adopt the latter approach here.
At this stage a computer may well not see its way to the end of the proof, but another thing it can certainly do is look about for places to apply the continuity hypothesis. And since there is only one thing we know about the values taken by
(that they have modulus
for arbitrarily large
), there is only one place we can apply this hypothesis.
To elaborate, we know that
, and we also know that
. To apply the second statement we must choose values for
and
. The only real number we have around is
(which depends on
) so let us choose
to be
. That gives us the statement 
Rearranging this to bring the quantifiers as far as possible to the left, we obtain the equivalent statement 
Now an applying-the-triangle-inequality module will leap into action and observe that from the final two inequalities it follows that
. The end of the conclusion we are trying to obtain is
for some
, so string-matching tells us that we would like
to be positive, and an elementary-inequality module will tell us that the simplest
that is both positive and less than
is
. So let us make this choice of
, take
to be
, and see what we have when we put all the quantifiers back. We have
. (Incidentally, there should, strictly speaking, have been “
“s in some of the earlier statements, but it is quite common, if sloppy, to regard those as sort of implied by the symbol “
“.)
Now we observe that we are in a stronger position than before. Although
is smaller than
, all we knew about
was that it was positive, and that’s all we know about
. So from the perspective of proving the result, there is absolutely no loss in passing from
to
. (This is itself a useful principle that it would be quite good to make a bit more formal.) On the other hand, the range of values where
is at least
has increased from a single arbitrarily large
to an interval of arbitrarily large
s.
Suppose we have some
and
such that
implies that
. String-matching tells us that we will be done if we can choose
and
and an integer
such that
and such that
whenever
. Elementary manipulation of inequalities tells us that for this we need
and
. Let us pretend that we have chosen
. Then the Archimedean principle tells us that we can find
with
and a standard reflex tells us that if there exists such an
then there exists a minimal one. So let
be minimal such that
. Can we now find
with
such that
? Obviously we can if and only if
. Now we have not yet chosen
, so let us try to choose
and
in such a way that
and
.
Let us focus first on choosing
. If we have chosen
then we will be able to find our
if and only if
and
. Let us choose the minimal
that satisfies the second inequality. Then
, so
, so we are done if
, for which we need
to be at least
. We have the lower bound
, and
can be made arbitrarily large (because we may as well impose the bound
if it helps, which it does). Therefore, we can have our lower bound on
.
That argument can of course be very significantly tidied up. The basic idea is that every sufficiently large real number
is a multiple of an element of
, since the intervals
start to overlap when
gets large. So if
is sufficiently large then we can choose
and
with
and
. (But the computer needs an extra idea to try for the stronger statement
rather than the weaker statement
.) Having chosen such a pair
we can easily choose
in such a way that
and
.
If you agree that the above account does demonstrate that the proof can be generated in a fairly automatic way from a rather small and simple set of precise problem-solving techniques in real analysis, then there are two possible reactions. One would be to say that the proof is less deep than it looks, and only appears deep in its usual presentation because the Baire category theorem is used as a piece of magic (which we might refer to as the “insight” of certain mathematicians from around a century ago). Another would be to search for differences between the way the second proof is generated and the way the first one is. And a definite difference is that the second involves a process of construction: of the nested closed intervals
. The fact that we just went ahead and did the construction in a fairly automatic way does place some upper bound on the depth, and it is perhaps the automatic nature of this stage of the proof that encourages one to abstract out what has been done and formulate the Baire category theorem.
Let me be more precise about the difference. At times during both proofs we needed to find real numbers with certain properties. However, only in the second proof did we need to find a real number with an infinite sequence of properties, which resulted from the fact that the string of quantifiers in the statement we wished to prove had “
” after the “
“. In order to find the real number with infinitely many properties, we converted the problem into one about constructing a sequence, which then meant that we were back in the world of finding real numbers with just finitely many properties.
I’ll end with a brief remark that I hope to elaborate on in a future post. It seems that a significant challenge for an automatic theorem prover is to deal with statements that do not have uniquely obvious proofs. (This of course relates to my earlier post on when two proofs are the same.) And this non-uniqueness often arises when a construction is involved. If one is asked to construct a mathematical object, one can sometimes create an artificial uniqueness by imposing extra properties on what one is constructing, or restrictions on how one goes about constructing it (such as always trying the simplest thing first, whatever that may mean in the given context). But it is not always easy.
To illustrate this, here is a simple problem where the non-uniqueness seems to create difficulties for a computer: find an injection from
to
. In a later post I shall discuss this example, but for now let me just try to explain why it is difficult. The reason is that if you want to approach it systematically, then you need to choose values of the function for each pair of integers
in turn. But what does “in turn” mean? It seems to require you to put the pairs in order, and that is rather close to what you were asked to do in the first place. (However, “rather close” turns out not to mean “identical”.) I mention this problem here just to suggest in a tentative way that non-uniqueness of this kind may also be a major contributor to our perceptions of depth in mathematics.
