Diagonalisation & Paradox




Diagonalisation is a method of indirect proof (via reductio) developed by Georg Cantor. It is difficult to overestimate the importance of the method. It is the basis for transfinite number theory, set theory, the set theoretic paradoxes, the semantic paradoxes, Gödel’s incompleteness theorem, Tarski’s indefinability theorem, the Church-Turing thesis, decidability results,  etc.


The method is called ‘diagonalisation’ after the intuitive geometrical presentation of Cantor’s results on the cardinality (size) of sets, and real numbers in particular. Informally, the method consists in the demonstration that the assumption that a given ‘realm of objects’(let us keep to sets of numbers, although the procedure applies also to formulae; see appendix I) satisfying some condition is denumerable - in 1:1 correspondence with the natural numbers - leads to contradiction, i.e., one can construct a set that satisfies the condition but which can’t be on one’s ‘list’ on pain of contradiction. The set is formed by a kind of accidental self-reference. By the enumeration, each set has a number, a numerical name, as it were. Therefore, one can define a set in terms of a condition pertaining to the membership relation between sets and their ‘names’. In itself, this is trivial and does not necessarily lead to paradox or contradiction. It was Cantor’s genius to realise that certain conditions can be specified which entail that their satisfying sets can’t be ‘named’.


Formally, the typical technique is a construction of a set numbers defined by an anti-diag formula:


(D) ("x)[x ε D(E) ↔ x Ï Sx]


D(E) is here the defined set formed from the enumeration E; it is defined such that it consists of all those numbers x that are not members of the set S which they ‘name’ in E. For example, 5 is in D(E) just if 5 is not in the fifth set of E.



Cantor’s Theorem: The power set of set S (the set of subsets of S) is greater than S


Hypothesis: P(S) £ S

(The power set of S is less than or equal to S.)


Assume P(S) < S.

(The power set of S is less than S.)


But, ("x)($y)[x ε S ↔ y f S & x ε y], i.e., y = the unit set {x}.

(Anything is a member of S just if it is a member of a subset of S.)


Therefore, P(S) Û S.

(The power set of S is not less than S.)


Assume P(S) = S.

(The power set of S is equal to S.)


Therefore, ("x)($y)[x ε S → y f S & f(x) = y];

(Anything is a member of S only if there is a subset of S to which it corresponds. Let ‘f(x) = y’ mean ‘the set to which x corresponds is  y.)


It follows that x ε f(x) v x Ï f(x).

(x is either a member of the set it corresponds to or isn’t.)


Take the set: {x: x Ï f(x)}.

(The set of members of S which are not members of the sets to which they correspond.)


So, ($z)[f(z) = {x: x Ï f(x)}]

(There is some member of S which corresponds to the set of members of S which are not members of the sets to which they correspond.)


Assume f(a) = {x: x Ï f(x)}.

(Let a be the member of S that corresponds to the set of members of S which are not members of the sets to which they correspond.)


Assume a ε {x: x  Ï f(x)}.

(Let a be a member of the set of members of S which are not members of the sets to which they correspond.)


Contradiction: a corresponds to and is a member of the set of objects which are not members of the sets to which they correspond.


Assume a Ï {x: x Ï f(x)}.

(Let a not be a member of the set of members of S which are not members of the sets to which they correspond.)


Contradiction: a  corresponds to and is not a member of the set of objects which are not members of the sets to which they correspond.


Therefore, ¬($z)[f(z) = {x: x Ï f(x)}]


Therefore, ¬("x)($y)[x ε S → y f S & f(x) = y].


Therefore, ¬(P(S) = S)


Therefore, P(S) > S


Russell’s Paradox


Sainsbury’s Reading


Sainsbury seeks to support Russell’s reasoning via an appeal to Cantor’s Theorem; that is, the generation of paradox in Russell’s case is legitimate, for the same diagonal reasoning is employed in Cantor’s Theorem.  This is a total misreading of the situation.


For Russell, Cantor’s ‘Theorem’ was a paradox. Consider:


There is a greatest of all infinite numbers, which is the number of all things altogether, of every sort and kind. It is obvious that there cannot be a greater number than this, because if everything has been taken, there is nothing left to add. Cantor has a proof that there is no greatest number… But in this point, the master has been guilty of a very subtle fallacy, which I hope to explain in some future work.

¾ Russell, ‘Mathematics and Metaphysicians’, 1901




In other words, whereas one can ‘diagonalise out’ new sets, certainly up to and beyond ±0, one can’t diagonalise the putative set of everything. In a letter to Frege (June 1902), Russell said he had an argument for the class of all things being of the same cardinality as the class of all classes. (Presumably, Russell’s argument traded on the thought that classes are ‘things’; if there were more classes than things, then classes wouldn’t be things.)  Cantor, on the other hand, was utterly sanguine about an open-ended sequence of alephs.


Russell was “led to [the Russell paradox] in the endeavour to reconcile Cantor’s proof that there can be no greatest cardinal number with the very plausible supposition that the class of all terms has necessarily the greatest possible number of members” (Principles, §100, 1903).


By the time of Principles, Russell had softened: there was nothing wrong with Cantor’s reasoning, but it leads to contradiction when confronted with “large classes”. Should we dump such classes? “[I]t is undesirable to adopt so desperate a measure as long as hope remains of some less heroic solution” (ibid. §344). 


Russell’s Paradox for Cantor, 1903


(1) Let S be Russell’s class of everything, i.e. the set of all sets.


(2) Let f: U µ P(U) such that f(x) = x, if x is a set; f(x) = {x} otherwise.


(3) Applying Cantor’s anti-diag construction to S gives us R, i.e., the class of all classes that are not members of themselves.


(4) Is there an x such that f(x) = R?


(5) By Cantor’s reasoning there can be no such x, i.e., the lack of such an x shows that there are more subsets than members - R is ‘diagonalised out’.


(6) But, transparently, by (2), f(R) = R.


(7) So, where x = R, if x ε f(x), then R ε R.


(8) If R is a member of R, then it is a member of itself, but R consists of all and only those sets which are not members of themselves. So R can’t be a member of R.


(9) But, if x Ï f(x), then R Ï R.


(10) If R is not a member of R, then it is not a member of itself, but  R consists of all and only those sets which are not members of themselves. So R must be a member of R.


(11) Therefore, there is no R.


(12) Therefore, there is no diagonalisation from S.


Russell only later realised that the paradox applies not just to Cantor’s reasoning in particular, but to his own assumptions about sets. Hence we have the paradox in its familiar form (expressed in a letter to Frege (June 1902) and published in Appendix A of Principles).


Frege’s ‘Comprehension Axiom’ (Axiom V of The Basic Laws): ("F)($y)[Fxx ε y]




x ε {x: R Ï R}↔ x ε R,


This gives us the contradiction:


R ε R &  R Ï R


Therefore, ¬("F)($y)[Fxx ε y]






Zermelo-Fraenkel Set Theory


Russell’s paradox is not a refutation of Cantor, for Cantor simply didn’t accept a set of everything. Russell requires an independent argument for the coherence of such a notion.


In ZF, Russell’s paradox is avoided by two axioms.


Axiom of Foundation

("x)($y) ¬($z)[x ¹ {x: x ¹ x}→ z ε y & y ε x &  z ε x]

(Every non-empty set x has a minimal member y such that there is no member z of y that is a member of x.)


Axiom of Replacement (schema)

f is a function → ("z)($y)("x)[x ε y ↔ ($w)[w ε z & f(w) = x]]

(For every set z, there is a set y such that, for all x, x is a member of y just if there is a member w of z and x is a value of a function on w. In other words: every set can be specified in terms of a function defined over the subsets of a set.)


These axioms suffice to block Russell’s Paradox and, with the rest of ZF+C, are not known to be inconsistent.


Sainsbury suggests that we need a philosophical resolution to the paradoxes, not ad hoc stipulations (axioms?), and Russell, as opposed to Cantor, presumably, is to be credited with seeking such a resolution. This is spurious on three grounds.


(1) The ‘paradoxes’ are formal in the first place insofar as we are concerned with set theory. The only problem is: if sets cannot be defined as intelligible aggregates, i.e., if Cantor/Frege naïve set theory is inconsistent, then what should be our conception of a set? The air of paradox is removed by so little as saying that the Russell paradox was a reductio of naïve set theory.


(2) AF and AR are neither ad hoc, nor merely stipulative. They express the idea that a set is a cumulative object, i.e., the members of a set are formed/specifiable independently of the set itself.


(3) Cantor was well aware that his own diagonalisation method could be applied to his sets. He corresponded with Hilbert about the ‘Burali-Forti paradox’ in 1896 - a year before Burali-Forti’s publication. Besides, it is well appreciated in the literature that Cantor presumed well-founded sets insofar as he took all sets to be reducible to ur-elements. The same can’t be said for Frege.






The Liar




(i) Gaps

Strengthened liar


(ii) Hierarchy

Artificial. No evidence for the hierarchy. The cure is worse than the disease.


(iii) Syntax

(a) Too inclusive

(b) Syntactic self-reference is a legitimate technique in logic/maths

(c) The liar doesn’t depend on syntactic self-reference anyhow.



Comments on Hierarchy and Tarski


Sainsbury reads Tarski as arguing that “our actual concept of truth is incoherent”; it “needs to be replaced by a series of concepts of truth, hierarchically arranged, and each expressed in a language different from any natural language” - “This is the famous “Tarski hierarchy””. All of this is nonsense.


(1) Tarski didn’t think that colloquial truth is incoherent, or that natural languages are inconsistent.




At first blush it would seem that [“everyday”] language satisfies [the conditions for the generation of paradoxes], that that therefore it must be inconsistent. But actually the case is not so simple. Our everyday language is certainly not one with a specified structure. We do not know precisely which expressions are sentences, and we know to an even smaller degree which sentences are to be taken  as assertible [Tarski means ‘provable’]. Thus the problem of consistency has no exact meaning with respect to this language. We may at best only risk the guess that a language whose structure has been exactly specified and which resembles our everyday language as closely as possible would be inconsistent.

¾ Tarski, The Semantic Conception of Truth, 1944.


(2) Tarski had no interest in offering the correct ‘theory’ of truth. Consider:


I do not have the slightest intention to contribute in any way to those endless, often violent discussions on the subject ‘What is the right conception of truth?’ I must confess that I do not understand what is at stake in such disputes; for the problem itself is so vague that no definite solution is possible. In fact, it seems to me that the sense in which the phrase ‘the right conception’ is used has never been made clear. In most cases one gets the impression that the phrase is used in an almost mystical sense based upon the belief that every word has one ‘real’ meaning (a kind of Platonic or Aristotelian idea), and that all the competing conceptions really attempt to catch hold of this one meaning; since, however, they contradict each other, only one attempt can be successful, and hence only one conception is the ‘right’ one.

(op. Cit.)


(3) There is no Tarski hierarchy.


A Tarski definition of a truth predicate (for objectlanguage L in metalanguage L+) is, in effect, a relative consistency proof for L. That is, the definition shows that L is consistent relative to L+. Contrapositively: if we couldn’t consistently introduce a truth predicate into L+, then L would be inconsistent.


Tarski wasn’t the least bit interested in hierarchies. He was concerned to offer rational reconstructions of semantic notions (truth, definition, consequence, etc.) in terms of which we can consistently theorise the properties of specific formal systems (especially, for Tarski, arithmetic, algebra, and set theory). Thus, the import of a definition of a truth predicate is exhausted by the specifiable consequences for the objectlanguage. The lack of a general definition of truth is no lack at all for Tarski.



Mind the Gap


(1) A ‘gap’ account faces perhaps a graver problem than the strengthened liar. A gap account is necessarily one which says that it’s not the case that the liar is true or it’s negation is true:


(G) ¬[Tr(k) v Trk)].


But G must fall into any gap into which the liar falls. Hence, a gap theory can’t cover the putative truth of its own thesis.

     Kripke recognised this deficiency of gaps and conceded that our intuitive truth concept is bivalent: “[W]e still cannot avoid the need for a metalanguage … the goal of a universal language seems elusive… [a Tarskian predicate] express[es] the genuine concept of truth [for L]” (‘Outline of a Theory of Truth’, 1975)


(2) Sainsbury is quite wrong to treat the truth teller - This sentence is true - as on a par with the liar. The truth teller falls into no gap; rather, it’s truth value is arbitrary (in Kripke’s language, it can be in the extension of TRUE or FALSE at any fixed point). The liar tells us that the union of truth and our logic is potentially inconsistent. The truth teller tells us that the union of truth and our logic is incomplete.




A Solution


Whereas gap accounts say that the liar is neither true nor false, it might be more felicitous to say that the liar is true and false. This just means that we accept that any ‘global’ truth concept admits diagonalisation. The price we pay is that  ‘true’ has no fixable extension.



Appendix I: Godel’s Diagonalisation


Gödel’s ‘diagonal argument’ for the incompleteness of arithmetic proceeds along a different course. Gödel proves a ‘diagonal lemma’ that effects ‘self-reference’ within the theory T (a first-order axiomatisation of PA).


Gödel first constructs a diagonalisation of an expression v:


(D-v) ($x)[x = évù & v],


where, if v is a formula of arithmetic with only x free, then (D-v) says that v is true of its gödel-number, i.e., (D-v) is true iff v is true of its gödel-number.


Gödel further constructs a function diag(x) such that if n is the gödel-number of v, then diag(n) is the gödel-number of the diagonalisation of v.


Gödel proves that, for a theory T in which diag(x) is representable, for all formulae F(y), with just y free,  there is a sentence G such that


T G ↔ F( é G ù )


In other words, for every open formula of arithmetic F(y), one can construct a sentence, via the substitution of a numeral for y, that is provably equivalent to the sentence whose gödel-number has the numeral for which y was substituted.


In particular, one can construct a formula - ¬($x)[B(x, y)] - which says that there is no x such that x is a proof of y. The above results show that, where the numeral for the gödel-number g of the formula is substituted for ‘y’, there will be a provably equivalent sentence G such that, if G were provable, then there would be a proof with gödel-number k, hence B(g, k), and ‘($x)[B(x, k)]’ would be provable, contradicting the consistency of T. If ¬G were provable, then either B(0, k) or B(1, k) or…, in which case ‘($x)[B(x, k)]’ would be provable, contradicting the consistency of T. Therefore, there is a formula of T such that, on the assumption that T is consistent, neither it nor its negation is provable.


Appendix II: Löb’s Paradox


How to ‘prove’ that God exists by just using propositional logic with identity.


(D) If D is true, then God exists.


1 (1) D is true                                                         A

1 (2) ‘If D is true, then God exists’ is true             1 substitutivity =

1 (3) If D is true, then God exists                           2 T↔

1 (4) God exists                                                      1,3 MPP

   (5) If D is true, then God exists                           1,4 CP

   (6) ‘If D is true, then God exists’ is true              5 T↔

   (7) D is true                                                          6 substitutivity =

   (8) God exists                                                      5,7 MPP


It would appear that we can prove anything we like! But…


If we read the conditional materially (i.e. false just if the antecedent is true and the consequent is false), then:


(a) If God exists, then D is true, for the consequent of D is true, which suffices for the truth of D.


(b) If God doesn’t exist, then D is paradoxical.

(i) Assume D is true. Since D’s consequent is false, it follows that its antecedent must also be false if D is to be true. But the antecedent says that D is true. Therefore, D can be true only if D is false.


(ii) Assume D is false. Since D’s consequent is false, it follows that its antecedent must be true if D is to be false. But the antecedent says that D is true. Therefore, D can be false only if D is true.