WHEN MUST ONE STRENGTHEN ONE'S INDUCTION HYPOTHESIS? ──────────────────────────────────────────────────── Anders Lundstedt (joint work with Eric Johannesson) Extended version of a manuscript for a talk at the Stockholm logic seminar on Nov 29, 2017 I should start by mentioning very recent research (published this month) which is very much related to what I will present: Hetzl and Wong (2017): "Some observations on the logical foundations of inductive theorem proving". Eric and I have independently from Hetzl and Wong arrived at considerations similar to the considerations in their paper. Hetzl and Wong look at the problem from an automated theorem proving perspective. Our perspective is more philosophical, in the sense that we are mainly interested in how well we can make precise sense of the imprecise notion of "proof by a strengthening of the induction hypothesis" and of the imprecise thoughts (entertained in mathematical practice) that this or that fact "must be proved, or is most easily proved, by strengthening the induction hypothesis". Thus I do not think that Hetzl and Wong's paper made our research superfluous. At least we do have some results not present in their paper. ∗∗ Let us start by looking at two proofs that one could intuitively call "proofs by a strengthening of the induction hypothesis". ∗∗ We have 1+1/2²+1/3²+⋅⋅⋅+1/(n+1)² < 2 for all natural numbers n, or, avoiding ellipsis notation, the following. ∗∗ DEFINITION. f₁ : ℕ → ℚ is recursively defined by f₁(0) := 1, f₁(n+1) := f₁(n)+1/(n+2)². ∗∗ FACT 1. f₁(n) < 2 for all natural numbers n. ∗∗ PROOF ATTEMPT. "Straightforward" induction: – Base case: f₁(0) < 2 ≡ 1 < 2 (by definition). – Induction step: f₁(n+1) = f₁(n)+1/(n+2)² (by definition) < 2+1/(n+2)² (by induction hypothesis). But 2+1/(n+2)² ≮ 2 for any natural number n so how do we proceed from here? ∗∗ Let us abort this proof attempt and instead prove Fact 1 by proving a "stronger" fact, like follows. ∗∗ ACTUAL PROOF. It clearly suffices to prove the "stronger" fact f₁(n) ≤ 2-1/(n+1) for all natural numbers n. (In what sense is this fact stronger than Fact 1? It is at least stronger in the sense that for all f : ℕ → ℚ, f(n) ≤ 2-1/(n+1) for all n implies f(n) < 2 for all n, while the converse implication does not hold for all f : ℕ → ℚ.) We prove this fact by induction. – Base case: f₁(0) ≤ 2-1/(0+1) ≡ 1 ≤ 1 (by definitions). – Induction step: f₁(n+1) = f₁(n)+1/(n+2)² (by definition) ≤ 2-1/(n+1)+1/(n+2)² (by induction hypothesis) = 2-1/(n+2)-1/(n+1)(n+2)² (by lots of elementary arithmetic) ≤ 2-1/(n+2) (by more or less obvious facts). □ ∗∗ It seems that the induction hypothesis was "not strong enough" when we tried to prove the induction step in the aborted proof attempt. In the actual proof, we proved, by induction, a fact which was "stronger", in some intuitive sense, than the fact we wanted to prove. In particular, in the actual proof, the induction hypothesis turned out to be "strong enough" in the proof of the induction step. Thus, the common practice of calling this proof for something like "a proof by a strengthening of the induction hypothesis" seems well motivated. ∗∗ Let us look at another example of "a proof by a strengthening of the induction hypothesis". We have that the sum of any initial segment of the odd numbers is a perfect square, that is, for all natural numbers n: 1+3+5+⋅⋅⋅+(2n-1) = k² for some natural number k, or, avoiding ellipsis notation, the following. ∗∗ DEFINITION. f₂ : ℕ → ℕ is recursively defined by f₂(0) := 0, f₂(n+1) := f₂(n)+2n+1. ∗∗ FACT 2. For all natural numbers n: f₂(n) = k² for some natural number k. ∗∗ PROOF ATTEMPT. "Straightforward induction": – Base case: f₂(0) = 0 = 0² (by definitions) so f₂(0) = k² for k = 0. – Induction step: f₂(n+1) = f₂(n)+2n+1 (by definition) = k²+2n+1 (for some k, by induction hypothesis). But k²+2n+1 is not a perfect square for all natural numbers k and n so how do we proceed from here? ∗∗ Let us abort our proof. We instead prove Fact 2 by again proving a fact that is in some sense "stronger" than the fact we want to prove. ∗∗ ACTUAL PROOF. It suffices to prove the following "stronger" fact, of which Fact 2 is a logical consequence. f₂(n) = n² for all natural numbers n. (This fact is not a logical consequence of Fact 2, so it is stronger than Fact 2 in at least that sense.) We prove this fact by induction. – Base case: f₂(0) = 0² ≡ 0 = 0 (by definitions). – Induction step: f₂(n+1) = f₂(n)+2n+1 (by definition) = n²+2n+1 (by induction hypothesis) = (n+1)² (by elementary arithmetic). □ ∗∗ Again, it seems well motivated to call this proof for something like "a proof by a strengthening of the induction hypothesis". ∗∗ For each of Facts 1 and 2, one might wonder if the aborted proof attempt "could have worked", or if it was in fact necessary to "strengthen the induction hypothesis", that is: (Q1) Must Fact 1 be proved by strengthening the induction hypothesis? (Q2) Must Fact 2 be proved by strengthening the induction hypothesis? These questions are not mathematically precise. To make precise sense of them we have come up with a precise, but perhaps not fully satisfactory, answer to the question which is the title of this talk: (Q) When must one strengthen one's induction hypothesis? Allowing myself to suppress details, our precise answer to (Q) is a definition of a ternary relation, parameterized by first-order languages L "of arithmetic", between L-theories T and L-formulas φ(x) and ψ(x): "ψ(x) witnesses that T proves ∀x.φ(x) with and only with strengthened induction hypothesis". (It should be noted that our definition, though independently arrived at, is more or less the same definition as the one provided by Hetzl and Wong.) Using this relation, we get a precise version of (Q1) for each first-order theory in a language of arithmetic "suitable" for expressing Fact 1 and its proof. Similarly we get multiple precise versions of (Q2). We have some preliminary results toward answers to some precise versions of (Q1). We have positive precise answers to some versions of (Q2). We think one of these precise positive answers to (Q2) should be interpreted as showing that, under certain (quite natural) circumstances, Fact 2 must indeed be proved using a strengthened induction hypothesis. ∗∗ Before going into further details, I want to pause to give some reasons for why we think it is interesting to make precise sense of (Q) and for why we think it is interesting to answer and make precise sense of (Q1) and (Q2): – Making precise sense of (Q) allows us to make precise sense of (Q1), (Q2) and similar imprecise questions. We claim that mathematicians do entertain such imprecise questions, even sometimes answering them. (These answers cannot then be based on proof; more likely such answers are based on intuition and experience.) Some examples in support of this: – Textbooks teach the proof technique of "strengthening one's induction hypothesis", presumably assuming something like "some (natural) facts must be proved, or are more easily proved, by strengthening the induction hypothesis". – When teaching, mathematicans will recognize an assignment problem as more difficult when it seems to "require, or be more easily proved by, a proof by a strengthening of the induction hypothesis". – Work in automated inductive theorem proving routinely talks about "proof by strengthened induction hypothesis" and about the problem of "finding the right induction hypothesis". Furthermore, some of these works routinely claim that this or that fact must be proved "using a strengthened induction hypothesis". Thus we think that making precise sense of (Q) is interesting, since it is a case of making precise sense of imprecise things commonly said or thought in mathematical practice, which we think is of general interest. – Questions (Q1) and (Q2) are of the imprecise form (F) Must one prove a fact from the class of facts Y in order to prove fact X? Here are two more, quite natural, examples of imprecise questions of the form (F): (F1) Must one prove that terms of first-order logic have equally many left and right parentheses in order to prove that formulas of first-order logic have equally many left and right parentheses? (Intuitively: Yes.) (F2) To show Γ ⊬ φ, must one exhibit a countermodel M ⊭ φ? (Intuitively: No. Consider for example Gentzen's proof of PA ⊬ ⊥.) Here is a slight variation on (F), which we we also find interesting: (G) Must one use method M to prove fact X? Some imprecise questions of the form (G) have received considerable attention at MathOverflow (see for example https://mathoverflow.net/q/158823 and https://mathoverflow.net/q/46970): (G1) Must Cantor's theorem be proved using a diagonal argument? (Our intuition: None.) (G2) Must the uncountability of the reals be proved using a diagonal argument? (Our intuition: None.) (G3) Must Gödel's incompleteness theorems be proved using a diagonal argument? (Our intuition: None.) We do not know of any general method for making precise sense of questions of the form (F) nor do we know of a general method for making precise sense of questions of the form (G). We do not even have a clue how to make precise sense of the specific instances (F2) and (G1)–(G3). (For answering and making precise sense of (F1) we have a simple idea, but I will not discuss that here.) These looks to us like very hard problems. Thus we find it interesting to make precise sense of (Q1) and (Q2) and to precisely answer them, since that would at least show that questions of the form (F) are not always impossible to make precise sense of nor are they always impossible to answer precisely. – One could also give some practical motivations. As mentioned, work in automated theorem proving routinely and imprecisely talks about things like "proofs by a strengthening of the induction hypothesis", even sometimes imprecisely claiming that this or that fact "must be proved by strengthening the induction hypothesis". Thus we have the following motivations for why it is interesting to make sense of (Q): – Verify that some imprecise talk in research on automated theorem proving makes sense. – Check, or at least make it possible (in principle) to check, whether some of the imprecise claims in research on automated theorem proving are true (once made precise). – Provide new insight into problems in automated theoring proving. We think that if one is primarily guided by these more practical motivations then the work of Hetzl and Wong should be more relevant than our work. ∗∗ Our goal then is to make precise sense of when one must strengthen one's induction hypothesis. The "when" quantifies over situations the working mathematician can be in when trying to prove some fact. With regard to our question, what is relevant about such a situation is which self-evident or previously established facts the mathematician has at its disposal and what fact the mathematician is trying to prove. We can make this precise by formulating these facts in some logical langauge, so we will have something like this: some logical language L, T = set of L-sentences corresponding to self-evident or previously established facts, φ = L-sentence corresponding to the fact to prove. Given this, the question now is: Must a proof of φ from T be a "proof by strengthened induction hypothesis"? To make this precise we would now want a precise notion of "proof". Here the situation is not as simple. For most choices of L we will indeed have several deductive systems to choose from. But many of these are not accurate models of proof, but rather models of (restricted) provability; for example a cut-free first-order deductive system does definitely not provide an accurate model of the proofs found in mathematical practice. ∗∗ Some variants of natural deduction are perhaps the most accurate models of proof that we have. But even these system have obvious shortcomings: – Natural deduction for some variant of first-order arithmetic cannot model higher-order reasoning. It is true that some (but not all) higher-order reasoning is achievable via encodings, but mathematical proofs do not use such encodings. – I believe that a natural deduction system for dependent type theory might be the most accurate model we have of actual proofs. Still, there are proof methods not natively available in such systems, for example proof using diagrams or proofs using ellipsis terms. ∗∗ Empirically, proofs by "straightforward induction" are of the form ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ φ(0) ∀x,φ(x)→φ(x+1) ─────────────────────── ∀x.φ(x) . ∗∗ Empirically, proofs by "induction with strengthened induction hypothesis" are of the form ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ψ(0) ∀x,ψ(x)→ψ(x+1) ─────────────────────── ∀x.ψ(x) ⋅ ⋅ ⋅ ∀x.φ(x) . ∗∗ If we do not impose any additional constraints on the forms of different types of proofs by induction, then any proof of ∀x.φ(x), in particular proofs of the "proof by strengthened induction" form, can be used to build a proof of ∀x.φ(x) of the "proof by straightforward induction" form: ⋅ ⋅ ⋅ ⋅ ∀x.φ(x) ⋅ ───────── ⋅ ⋅ ∀x.φ(x) ⋅ ───────── ⋅ φ(0) ∀x,φ(x)→φ(x+1) ─────────────────────────── ∀x.φ(x) . ∗∗ We have so far not investigated further from the point of view of syntactic forms of proofs. ∗∗ DEFINITION. The ∗full (first-order) language of arithmetic∗, notation ℒ[full], is the first-order language that for each natural number n has – a constant symbol n, – a function symbol f of arity n+1 for each function f : ℕⁿ⁺¹ → ℕ, – a relation symbol P of arity n for each relation P ⊆ Nⁿ. ∗∗ DEFINITION. The ∗minimal (first-order) language of arithmetic∗, notation ℒ[min], is the ℒ[full]-reduct with signature ⟨0, 1, +, ⋅, <⟩. ∗∗ DEFINITION. A first-order language L is a ∗(first-order) language of arithmetic∗ if and only if L is an ℒ[min]-expansion and an ℒ[full]-reduct. ∗∗ Thus a first-order language of arithmetic has as symbols natural numbers and functions and relations on natural numbers. This is just a convenient "trick" which allows us to easily define the standard model. ∗∗ DEFINITION. Let L be a language of arithmetic. – The ∗standard L-model∗ has domain ℕ and each symbol interpreted as itself. – The L-theory ∗true L-arithmetic∗ is the theory of the standard L-model. – Any subset of true L-arithmetic is a ∗theory of L-arithmetic∗. ∗∗ DEFINITION. Let L be a language of arithmetic and let φ(x) be an L-formula with at most one free variable x. The ∗induction instance∗ for φ(x) is the L-sentence IND(φ) :≡ φ(0)∧∀x(φ(x)→φ(x+1))→∀x.φ(x). ∗∗ Why do we consider such a large class of languages? In some sense it would probably suffice to consider only the minimal language of arithmetic, since most proofs we want to study can be stated in one way or another in that language. But in stating a proof with ℒ[min], we would have to introduce codings and representations for any functions and relations not in the language. This would not only be inconvenient, but it could also allow for the situation where it is no longer very clear if the proof (formalized using ℒ[min]) we study is the most accurate characterization of the informal mathematical proof we want to study. The working mathematician freely introduces new symbols via definitions, as for example we did when stating and proving Facts 1 and 2. Thus we will allow very rich langauges, in order to make it more straightforward to model informal proofs and to make it easier to convince oneself whether a formalization is an accurate characterization of an informal proof. ∗∗ We are now ready for our characterization of "proofs by strengthened induction hypothesis". Again, it should be remarked that the following definition is more or less the same as the one given by Hetzl and Wong. ∗∗ DEFINITION. Let L be a language of arithmetic and let T be an L-theory. Let φ(x) and ψ(x) be L-formulas both with at most one free variable x. Say that ∗ψ witnesses that T proves ∀x.φ(x) with and only with strengthened induction hypothesis∗ if and only if (1) T,IND(φ) ⊬ ∀x.φ(x), (2) T ⊢ φ(0), (3) T ⊢ ψ(0), (4) T ⊢ ∀x,ψ(x)→ψ(x+1), (5) T ⊢ ∀x.ψ(x)→∀x.φ(x). ∗∗ Let me try to motivate this definition. – We are interested in proofs in arithmetic. Thus we consider proofs in a language L of arithmetic. – A proof is allowed to use previously proved results and self-evident facts. We model these previously proved results and self-evident facts as an L-theory T. When modeling an actual proof we would of course have that T is a theory of arithmetic, but we can just as well make this definition more general by not requiring it. – As illustrated before, when proving ∀x.φ(x) by "strengthened induction hypothesis" we first prove a "stronger" statement ∀x.ψ(x), from which we then derive ∀x.φ(x). To model this we let φ(x) and ψ(x) be L-formulas with at most one free variable x. – (1) together with (2) capture the failure of being able to prove, using previously established results and self-evident facts, the inductive step corresponding to the "weaker" statement. – (2) says that we are being able to prove the base case for the "weaker" statement, so that (1) does correspond to a failure of being able to prove the inductive step for the "weaker" statement. – (3) simply says that we can prove the base case for the "stronger" statement. – (4) says that for the "stronger" statement, we can indeed prove the induction step. – (5) says that we can indeed prove our "weaker" statement by proving our "stronger" statement. In particular, (3)–(5) implies T,IND(ψ) ⊢ ∀x.φ(x). which may be seen as saying that a proof by a strengthened induction hypothesis does work to prove our "weaker" statement. ∗∗ Some possible shortcomings of the definition: – We do not capture higher-order reasoning, since we talk about first-order provability. This could be fixed by moving to a higher-order logical system. – We restrict ourselves to formulas in one variable. This could easily be fixed. – We cannot account for "nested" induction proofs. This could possibly be fixed by generalizing the definition to allow for sets of induction instances. To keep things simple, we have so far accepted these shortcomings. ∗∗ DEFINITION. The ℒ[min]-theory PA⁻ is axiomatized by the following. – 0 and 1 are distinct: 0≠1. – Associativity of addition and multiplication: (x+y)+z=x+(y+z), (x⋅y)⋅z=x⋅(y⋅z). – Commutativity of addition and multiplication: x+y=y+x, x⋅y=y⋅x. – Distributivity of addition over multiplication: x⋅(y+z)=x⋅y+x⋅z. – 0 is an additive identity and a multiplicative zero: x⋅0=0. – The order is irreflexive: x≮x. – The order is transitive: x 0 if and only if zₙ > 0, p > q if and only if p-q > 0. ∗∗ DEFINITION. ℤ[X]⁺ = ⟨ℤ[X]⁺, 0, 1, +, ⋅, <⟩ is the non-negative part of ℤ[X]⁺, that is, a polynomial p from ℤ[X] is in ℤ[X]⁺ if and only if p ≥ 0. ∗∗ Elements of ℤ[X]⁺ are polynomials z₀+z₁X+z₂X²+⋅⋅⋅+zₙXⁿ with z₀,z₁,...,zₙ in ℤ and zₙ ≥ 0 and if n ≠ 0 then zₙ ≠ 0. The constant polynomials are n (n in ℕ) and the non-constant polynomials are z+pX (p in ℤ[X], z in ℤ, p > 0). ∗∗ FACT. ℤ[X]⁺ ⊨ PA⁻. ∗∗ DEFINITION. The language of arithmetic L₂ is ℒ[min] expanded with the function symbol f₂. ∗∗ DEFINITION. The theory of arithmetic T₂ is defined by T₂ := PA⁻∪{f₂(0)=0, ∀x.f₂(x+1)=f₂(x)+2x+1}. ∗∗ DEFINITION. The L-formulas φ₂(x) and ψ₂(x) are defined by φ₂(x) :≡ ∃y.f₂(x)=y⋅y, ψ₂(x) :≡ f₂(x)=x⋅x. ∗∗ FACT. ψ₂ witnesses that T₂ proves ∀x.φ₂(x) with and only with strengthened induction hypothesis. PROOF. We need to verify conditions (1)–(5). (2) Trivial. (3) Trivial. (4) The proof of Fact 2 given earlier can be straightforwardly formalized in PA⁻. (5) Trivial. (1) We exhibit an L₂-model satisfying T₂ and IND(φ₂) but not ∀x.φ₂(x). Let M be the L₂-expansion of ℤ[X]⁺ with f₂ interpreted as follows. f₂ᴹ(0) := 0, f₂ᴹ(n+1) := f₂ᴹ(n)+2n+1, f₂ᴹ(pX-1) := pX² f₂ᴹ(pX-1+(n+1)) := f₂ᴹ(pX-1+n)+2(pX-1+n)+1, f₂ᴹ(pX-1-(n+1)) := f₂ᴹ(pX-1-n)-2(pX-1-n)+1. The right hand side of the last equation does indeed define a polynomial of ℤ[X]⁺ since by construction the degree of f₂(pX) is larger than the degree of pX for all p in ℤ[X]⁺. (We get the last equation by solving f₂(pX-1-n) = f₂(pX-1-(n+1)+1) = f₂(pX-1-(n+1))+2(pX-1-(n+1))+1 for f₂(pX-1-(n+1)).) We have f₂ᴹ(X-1) = X² which is a perfect square in M and we have f₂ᴹ(X) = f₂ᴹ(X-1)+2(X-1)+1 = X²+2X-1 which is not a perfect square in M. Thus: – M ⊭ ∀x.φ₂(x) since M ⊭ φ₂(X). – M ⊨ IND(φ₂) since M ⊭ ∀x,φ₂(x)→φ₂(x+1) since M ⊨ φ₂(X) but M ⊭ φ₂(X+1). By construction we also have M ⊨ T₂ so we are done. □ ∗∗ DEFINITION. g : ℕ → ℕ and h : ℕ → ℕ are recursively defined by g(0) := 1, g(n+1) := (n+2)²g(n)+h(n), h(0) := 1, h(n+1) := (n+2)²h(n). ∗∗ We have f₁(n) = g(n)/h(n) so the inequality f₁(n) < 2 can be rewritten as g(n) < 2h(n). Similarly the inequality f₁(n) ≤ 2-1/(n+1) becomes (n+1)⋅g(n) ≤ (2⋅n+1)⋅h(n). ∗∗ DEFINITION. The language of arithmetic L₁ is ℒ[min] expanded with the function symbols g and h. ∗∗ DEFINITION. The L₁-sentences DEF(g) and DEF(h) and the L₁-formulas φ₁(x) and ψ₁(x) are defined by DEF(g) :≡ g(0)=1∧∀x.g(x+1)=(x+2)²g(x)+h(x) DEF(h) :≡ h(0)=1∧∀x.h(x+1)=(x+2)²h(x), φ₁(x) :≡ g(x)<2h(x), ψ₁(x) :≡ (x+1)g(x)+h(x)≤2(x+1)h(x). ∗∗ CONJECTURE. There is an L₁-theory of arithmetic T such that (1) T ⊢ DEF(g), (2) T ⊢ DEF(h), (3) ψ₁ witnesses that T proves ∀x.φ₁(x) with and only with strengthened induction hypothesis. ∗∗ For future work, we would of course like to settle this conjecture. Adapting the previous proof——cleverly interpreting function symbols on ℤ[X]⁺——will not work to settle this conjecture, as the unique L₁-expansion M of ℤ[X]⁺ that satisfies DEF(g) and DEF(h) has gᴹ(p) = hᴹ(p) = 0 for all non-constant polynomials p. In fact, if M ⊨ DEF(h) is the non-negative part of a discretely ordered commutative polynomial ring R[X] then hᴹ(p) = 0 for all non-constant polynomials——for if hᴹ(p) ≠ 0 for some non-constant polynomial p then the degrees of hᴹ(p), hᴹ(p-1), hᴹ(p-2), ... would form an infinitely descending chain of natural numbers. It does not help if M instead is the non-negative part of a discretely ordered commutative ring R[X,X⁻¹] of Laurent polynomials——in that case if hᴹ(p) ≠ 0 for some non-constant polynomial p then there is a natural number n such that the degree of hᴹ(p-n) is less than its order. ∗∗ We expanded ℤ[X]⁺ to an L₂-model M in order to show that T₂ proves ∀x.φ₂(x) by necessarily non-analytic induction. For each sentence σ of true L₂-arithmetic false in M it is natural to ask whether adding σ to T₂ lets us prove ∀x.φ₂(x) with an analytic induction proof. One simple sentence of true L₂-arithmetic that is false in M is "all numbers are odd or even", that is σ₂ :≡ ∀x∃y: x = y+y ∨ x = y+y+1. ∗∗ CONJECTURE. ψ₂ witnesses that T₂∪{σ₂} proves ∀x.φ₂(x) with and only with strengthened induction hypothesis. ∗∗ To more systematically settle conjectures like the one above one could attempt to establish more general results——instead of hand-crafting countermodels for each particular case. We hope that the literature on first-order arithmetic already contains lots of applicable results.