This is a mathematical post rather than a Tricki post, but the title comes from the fact that the issue I want to discuss has arisen because I have made a statement in a Tricki article and can’t quite justify it to my satisfaction.

The article in question is about a useful piece of advice, which is that if one is having trouble proving a theorem, it can help a lot to try to prove rigorously that one’s approach cannot work. Sometimes one realizes as a result that it *can* work, but sometimes this exercise helps one to discover a feature of the problem that one has been wrongly overlooking. I have a few examples written up, but one of them is giving me second thoughts.

The example in question is this. Let be a subset of for some prime . Define a *gap* in *of length* to be a set of integers mod that contains no elements of . A nice (and well-known) question one can ask is the following. Let be of size at least . How small can one make the largest gap if one dilates ? That is, if we write for the size of the largest gap in a set , what is the largest possible value of over all sets of size at least ? (Here stands for the set .)

If you take a random set of size , then with high probability the largest gap in any dilate of will have size at most , where depends on only. After a bit of experimenting, it becomes natural to conjecture that this is the best possible bound. But is it?

This looks like a combinatorial question until one observes the following. Let be a prime of the form and let be the set of all non-zero quadratic residues mod . Then every dilate of is either the set of all non-zero quadratic residues or the set of all non-zero quadratic non-residues. Suppose it is the case that all of are quadratic residues mod . Then, since is a quadratic non-residue, all of are quadratic non-residues. Therefore, every dilate of has a gap of length . So the conjecture above will be false if there are infinitely many primes congruent to mod such that the first numbers are quadratic residues for some that tends to infinity faster than .

Suddenly we have a number-theoretic question to solve, and it turns out to be a well-known unsolved problem. Nobody has even ruled out that there is some such that for infinitely many primes all the numbers from to are quadratic residues mod . Even with the help of the Generalized Riemann Hypothesis this has been improved to rather than .

I want to claim that this example demonstrates that the original problem is “not really” a combinatorial problem. But to do that, I have to argue against someone whose reaction to it is, “Ah, this shows that an apparently number-theoretic problem is in fact a combinatorial one.” This is hard, because there *are* some very nice examples of number-theoretic statements that have been hugely generalized and then solved combinatorially. So what is it about this problem that makes it appear that purely combinatorial arguments are doomed to fail?

One relevant factor is that many combinatorial tools apply to large systems of sets. Initially, it looked as though the set of dilates of was a large set system, but the example of quadratic residues shows that it can in fact be very small. This fact does seem to lie behind the general feeling (or at least, I think it is a general feeling) that the problem cannot be solved combinatorially, but I would like a more general answer to the following question. Suppose you are trying to prove a theorem of the form “For every , .” Then suppose you come across a particular example of an for which you cannot prove . When is it clear that you will not be able to tackle the general theorem without first having a thorough understanding of why it is true for ?

Clearly the answer is not “always”. For example, might just be an example that is sufficiently complicated not to have any special features. It may then be slightly easier (for notational reasons, say) to think about , but thinking about isn’t interestingly different from thinking about the general case in such situations. One relevant feature of the example I have just discussed is that the appropriate tools for tackling the question about quadratic residues seem to be number-theoretic, which makes the special case *different* from the general case. Another sign of that is that one can easily imagine a solution to the question about quadratic residues not helping all that much to solve the general problem. But why isn’t that an argument for aiming at the general problem and ignoring the particular one?

Another feeling I have about this question, but can’t do proper justice to, relates to the discussion in the previous post. If the conjecture is true, then you can’t find a prime such that the first integers are quadratic residues for a very large . However, we can create something like a “probably inconsistent model” in which such a prime does exist, and then for any combinatorial argument about a general set , we can interpret it as referring to our fictitious that consists of quadratic residues modulo our strange prime . And in some sense our combinatorial arguments “never realize” that the model is inconsistent. Are there any reverse mathematicians out there who can make sense of this? Is it conceivable that there is some weak axiom system that is sufficient for combinatorial arguments, in which one might possibly not be able to prove that such a prime cannot exist?

January 20, 2009 at 3:07 pm |

It is known that there are infinitely many primes for which the smallest quadratic non-residue is at least c log p, for some positive c, and probably this can be ensured with the condition that p is 3 modulo 4. (This goes back to Chowla and Turan, and can be guessed by considering the values of the Legendre symbol (x/p) for small x, and checking that, essentially, they behave like independent Bernoulli random variables for x up to a small constant times log p). Moreover, on GRH, Montgomery has improved this to c(log p)( log log p), so the natural probabilistic guess is (probably) not true for this problem. (And this tends to confirm that the problem is not purely combinatorial, if we believe that the Riemann Hypothesis is not combinatorial).

January 20, 2009 at 3:11 pm |

Thanks for that — I rather suspected that my knowledge about this problem was incomplete. So perhaps my question should be slightly modified: why does it seem hopeless to find a combinatorial proof even of the weaker assertion that there is a dilate such that the gap has size at most ?

January 20, 2009 at 5:44 pm |

Well, from the point of view of classical analytic number theory, the $p^{o(1)}$ bound for the least quadratic non-residue (which is known in that area as a conjecture of Vinogradov) is considered out of reach at the moment because all “standard” attack methods depend on results which more or less directly involve properties of the L-functions which are considered extremely deep. (If not the Riemann Hypothesis, it can be the Lindelöf Hypothesis, or estimates for very short exponential sums).

Actually, that doesn’t mean there couldn’t be a clever combinatorial re-formulation that would lead to a proof of this conjecture. For instance, typically, the standard approaches lead to much more precise results than a bound just for the least quad. non-residue: they lead to equidistribution of residues and non-residues in very small sets; conversely, assuming conjectures on the size of this least quad. non-residue does not imply much about L-functions (as far as I know). So maybe it’s not “hopeless” to go around the L-functions, but at least this special case shows that the problem lies quite deep.

(Maybe an analogy could be drawn with the proof of RH for varieties over finite fields: Grothendieck’s “programme” for that was highly geometric, and maybe it was felt at some point that it was hopeless to look for an arithmetic proof, but in the end Deligne’s work did involve a lot a arithmetic — and a lot of geometry, but it didn’t involve proving first the intermediate results Grothendieck had in mind).

January 20, 2009 at 6:13 pm |

I know that when the sum-product phenomenon was discovered, and Jean Bourgain used it to show that thin sets of with multiplicative structure (e.g. multiplicative subgroups) expanded rapidly under addition to fill the whole group, there was a bit of excitement as to whether one could show the reverse direction (i.e. whether thin sets of with additive structure, such as an interval, expanded rapidly under multiplication to fill the whole multiplicative group), since this would in particular say something about the least quadratic non-residue (in particular, the analogue of Bourgain’s results would give a bound of ). But it turns out that the techniques are not symmetric (Bourgain crucially uses at a key point that multiplication distributes over addition, whereas addition does not distribute over multiplication), and besides, the ultimate ingredients used in the sum-product analysis are too “elementary”, and do not seem to point out any obvious inconsistency (or at least improbability) in an alternate universe in which there was a large p with a big interval of quadratic residues.

It seems to me that any proof of the least quadratic nonresidue problem must, at some point, introduce a new (or at least underused) fact about quadratic residues that was not fully exploited in the past, as it seems that having a large least quadratic nonresidue is perfectly compatible with the “standard” facts we know about residues, though there is no real formalisation of this that I know of. (For instance, my theorem with Ben that the primes contained arbitrarily long arithmetic progressions used extremely little number theory, but it still needed to know something moderately non-trivial about the primes, namely that (roughly speaking) they were a dense subset of a pseudorandom set of almost primes – a fact which had been implicitly used a couple times in the past, e.g. in papers of Ramare and of Green, but perhaps not as systematically as it should have been.)

January 20, 2009 at 7:26 pm |

Actually, I just read (I didn’t know it before…) that Graham and Ringrose have proved unconditionally that the least quad. non-residue is at least c(log p)(log log log p) for infinitely many primes, so the log p probabilistic estimate is indeed false.

January 20, 2009 at 10:58 pm |

For the general question—when it is clear that you need to understand a specific X before you can prove it for all X—I am tempted to say that the answer is, “Never.” There are certainly cases where it *seems* that you need to understand the particular case before you can hope to prove the general case, and the quadratic nonresidue problem may be an example. But is this *clearly* true? I’m skeptical. For comparison, note that many people felt that the implication “every elliptic curve is modular => Fermat’s Last Theorem” showed that it was hopeless to prove that every elliptic curve was modular until we understood enough number theory to prove Fermat’s Last Theorem. Strong intuitions can be wrong and different people can have different intuitions about these kinds of things.

The one exception to my “Never” would be a situation where you can easily deduce the general case from the special case.

January 21, 2009 at 7:56 am |

Certainly generalisation to a more abstract setting that “forgets” some of the original structure can be a powerful technique – though in order for such an argument to not be easily transferable back to the original concrete setting, the argument must somehow exploit a freedom in the abstract setting that is not easily available in the concrete special case. For example, one could imagine working on the gap problem by some sort of iterative procedure that starts with an arbitrary set and performs some operation to increase its min-max gap until it is extremised, at which point some useful structural information on that set is obtained. This is not an approach one would think of if one insisted only on working on the set of quadratic nonresidues. (Though it may end up that the quadratic nonresidues are already somehow the “worst” case, being so self-similar; one may even hope to formalise this with some sort of equivalence between the gap problem and the least quadratic nonresidue problem, though I don’t see how to achieve this. Curiously, Ben Green has recently had some thoughts along these lines, specifically that quadratic residues may somehow be the only obstruction to improvements to the large sieve inequality.)

It is of course quite difficult, as Timothy points out, to definitively rule out the utility of tackling a general case first before looking at the special case, because there may be ways to exploit the generality that one simply has not thought of yet. The one exception I can see is if the general case has an important “near-counterexample” that forces any proof of that general case to be very delicate, whilst the special case, being somehow “very far” from this counterexample, looks more amenable to cruder and simpler techniques. [Of course, if the general case has a

genuinecounterexample, then it becomes very easy to decide whether to spend any time trying to prove the general case. 🙂 ]January 21, 2009 at 5:04 pm |

Two small remarks. One is that it may well be that the conjecture suggested by the random case is also something like : although you can’t do better than if you want the gaps to be small with high probability, there is quite a bit of independence around, so it is conceivable that probabilistic methods could show that the bound occurs with positive (but small) probability. I don’t know how the GRH proof goes for the quadratic-residue problem, but my guess is that it is philosophically similar and does in fact show (i) that quadratic residues mod behave like random sets as varies and (ii) that for that very reason you expect from time to time.

As for the general case, I suspect that, as has been suggested, it may be hard to give a convincing reason for its being impossible to prove the theorem by purely combinatorial means. However, for any given purely combinatorial attack, it may be possible to use the quadratic-residues example to show that it does not work.

I am trying, but so far failing, to think of a problem where it definitely does happen that a single example kills off all hope of solving the general problem by general means. A first question then might be: if you can’t prove it by general means, then how

canyou prove it? A possible answer to that is that you first do some sort of classification and then deal with each class separately. So there might be examples in group theory, say, where it is hopeless to prove a theorem without using CSFG. But that would still leave the question of whether for any given problem there are clear signs that one has to use a classification. There are examples in group theory of classification-free proofs of theorems that had been thought to need classification, so perhaps the answer to this question is no.In the Tricki article, I wanted to find circumstances where it is

probablybetter to concentrate on the special case, so I’d settle for a reasonably convincing and generalizable argument that that is the case here, even if the argument didn’t establish it beyond all possible doubt, and even if one could think of exceptions to the general rule.January 21, 2009 at 8:59 pm |

Here’s an extreme example of interaction between particular and general case, but which may be useful: if there existed a general algorithmic way to decide whether any given diophantine equation has an integral solution, we know from the existence of a universal equation (a particular case!) that we would get a contradiction (i.e., this is the argument solving the Hilbert 10th Problem).

In the opposite direction, but still in this area, Robinson had shown that the general problem would be unsolvable, provided a single particular “exponential” relation could be encoded in polynomial form, and this had apparently been taken to mean that her approach was doomed to fail…(see the famous review MR0133227 (24 #A3061)).

January 21, 2009 at 9:00 pm |

P.S. I should have written “Davis, Putnam and Robinson”, not simply “Robinson”.

January 22, 2009 at 7:30 pm |

By the way, this may seem like a purely theoretical question, but the problem of how big the gaps can be is very closely related to the analysis of a very practical data structure in computer science, hash tables with linear probing. If one views the “gaps” as parts of the table that are filled in by hash table entries, and the dilation mod p as being a hash function, the gap length tells you what the longest access time can be.

See, e.g., [A. Pagh, R. Pagh, and M. Ruzic. Linear probing with constant independence. In Proc. 39th STOC, pages 318–327, 2007], which shows that hash functions with limited independence achieve good hashing behavior. Here “limited independence” is still rather more independence than you would get by dilating by a single randomly-chosen t, though.

January 24, 2009 at 12:55 am |

Dear Tim and others,

A couple of points. Firstly, there is something on lower bounds for this gap problem in the book of Konyagin and Shparlinski, but I forget exactly what they prove.

Secondly, the p^{o(1)} conjecture, even when delta = 1/2, implies the Kakeya conjecture! I wrote some notes on this a long time ago, and here they are:

http://www.dpmms.cam.ac.uk/~bjg23/restriction_kakeya_phenomena/notes10.ps

The connection is basically due to Bourgain from about 1994.

Tim’s “probably inconsistent model” is very close to the idea of doing analytic number theory in which one admits the possibility of a zero of some L-function (or, worse, a Siegel zero). Occasionally one can work through arguments keeping track of the contribution of the hypothetical zero and it can even help (cf. Heath-Brown’s paper proving twin primes on the assumption there is a very strong type of Siegel zero)

Best

Ben

January 28, 2009 at 2:09 am |

Minor issue: I think you may want p to be 3 mod 4 instead of 1 mod 4 in order to make this work.

[Thanks — I’ve changed it, and also changed Emmanuel Kowalski’s first comment.]February 20, 2009 at 4:44 am |

If we examine the random model in the case of quadratic non-residues, then the expected upper bound should be logp loglogp, the same as the lower bound proven by Montgomery conditional on GRH. The model goes something like the following. The values of the quadratic character is completely multiplicative, and hence determined by its values at the primes. If we assume that the character can be modeled by a random variable which takes the values 1 and -1 both with probability 1/2, then the upper bound x should satisfy 2^pi(x) is the same size as p, where pi(x) is the number of primes up to x. Assume pi(x) = x/log x, and solve for x. (Everything depends on x and p being “large”.)

November 20, 2009 at 11:23 am |

[…] be to start by finding a counterexample to the Littlewood conjecture … (This is an issue that I have discussed before on this blog. Sometimes combinatorial generalizations seem to be no easier to tackle than the […]

September 4, 2021 at 12:57 pm |

Here is what I would do – I think that the answer is what you could call ‘half-way between’ P=NP and P\=NP. Because I actually think that there are 3 levels of infinities in the continuum hypothesis, not that I would know how to prove it.

Calculate the observable volume of the Universe in the Hoyle-Narliker steady state model i.e. integrate up to the particle horizon.

Same calculation for your prime-studying computer algorithm. Erm, sorry can’t help (at the moment).

Here is a copy-paste of some unedited SPAM I sent to Peter Scholze in January – but I haven’t thought about this much since January, so will have to reconstruct my ideas – for the moment, here is the incomprehensible SPAM version:

===========BEGIN SPAM================

“Maybe a simpler way to explain – a graph of y=e^x from minus infinity to plus infinity which describes the ‘expansion of the number line. Then a graph of

– the integers are generated from the sums and differences of the whole numbers.The definition of the whole numbers 1,2,3…. is where it gets its direction from (and this direction can be reversed using a kroneker-delta functin i.e.delta_ij as addition versus subtraction).

– The t axis is not commensurate with the x,y or z axis – the dimensionality of the xyz axis is related to that of the t axis by the speed of light, as x=ct, where c=2.998….x10^8ms^(-1)

– All the rational numbers can be made from a/b where both a and b are on the t-axis.

– That the rational numbers can also be made from a/b where a and b are take from commensurate spaces (e.g. t1/t2 or x1/x2 or y1/y2 or z1/z2).

– The complex numbers come from the relationship of the t-axis with the xy plane. By convention we choose (I think…) the y-axis as the imaginary axis when we use the complex plane. The z-axis has a different relationship to the t axis than the x-axis and the y-axis if we define the z-axis to be the one we are moving along in space-time. This is related to why things are left-right reversed in a mirror but not up-down reversed in a mirror (more on that later).

– The definition of the vector product is where we get the sign of theta from (where Z= a cos(theta) + i sin (theta) – I think that the vector product is fundamental to our construction of txyz space, but I have read stuff on the internet criticizing its use (e.g. ‘bad physics’).

– The irrationals come from e.g. the lengths of straight lines in e.g. xy, yz or xz. I think they can come from other places too, but this is still a work in progress…

Regarding the relationship of all this to the continuum hypothesis: The transfinite numbers (I think….. I’m not an expert and this is something new I’m basically just ‘making up’) come from for example lengths of lines tx, ty or tz where the 2 dimensions involved are not commensurate in terms of dimensionality – so are not rational or even irrational by (depends on your definition of irrational – but I think that there is a difference between irrational numbers in 2 commensurate dimensions e.g. xz and 2 non-commensurate dimensions i.e. t, z). This can give you numbers which are not included in the current definition of the irrationals (which admittedly I need to check up on), thus violating the continnuum hypothesis and giving rise to ‘Aleph-one’, which is intermediate between Aleph-0 and Aleph^(2N). There might be more Alephs too related to the symmetry related dimensions above 4 – I think so, but this is still a work in progress. HOWEVER ALL OF THE ABOVE MAY WELL BE WRONG – if the different dimensions have constant scale factors between them, then clearly there is a mapping – and everything I labelled Aleph-1 can in fact be mapped to Aleph-0 or Aleph^(2N).

But……. We can’t assume that xyz are exactly the same if there is an observer moving along e.g. x, y or z (I believe observer vs observed to be important (from quantization) within the txyz dimensional universe travelling along a space-time trajectory e.g. tx. And the subject/object distinction is important when we’re considering whether to use the ‘centre of the space-time point’ (if it is a point being objectively observed by ourselves, or from the point of view of another observer we are pretending to be) or the ‘boundary of the space-time point) (where these space-time points are extremely tiny but finite in size) in our calculations. Then of course we might change direction and start moving along e.g. ty – thus changing the constant of proportionality between the xyz dimensions, at least as observed by us. So it seems hard to make a 1-1 mapping between e.g. tx and xy when always time-space trajectories keep changing direction (e.g. first we travel along tx then we travel along xz – this destroys any one-to-one mapping with xy. I think we are within a universe where the observed dimensionality of x, y or z depends on whether it is parallel or perpendicular to our direction of motion c.f. special relativity. Additionally we are not the only observer moving about the Universe.

So I’m not sure if the above is rigorous. So in quantum theories we have ‘observer versus observed’ and spacetime points are likely quantised. I think that it is important whether space time points count as a ‘subjective observer’ in which case they can only observe (infinity minus 1) other points or whether they count as ‘objective reality of the point in question’ (observed by an outside observer) in which case they are one of (infinity) points rather than (infinity minus 1) points – and you can’t map (infinity) to (infinity minus 1) as they differ by one – and a difference of the integer of one is not something that can be mapped by any kind of ‘multiplicative mapping’ of one kind of infinity to another. And if there is a set which is something to do with something which contains two subjective observers, then (infinity minus 2) is the relevant quantity that needs to be mapped to. I think that a dimensional pairing of subjective-subjective (e.g. t-t from the t-axis (aleph-0)) is not the same as the mapping between subjective observer/objective reality (aleph-1) is not the same as the mapping between objective reality-objective reality (aleph(2^N)). Because the observer ‘takes up a quantum of space’ and only can observe ‘outside of the quantum of space he occupies’ – cannot ‘observe himself’. i.e. 3 levels of the continuum hypothesis not 2, and I think possibly this observation could help resolve some paradoxes, but I am not sure of all the paradoxes yet – still a work in progress…….

I am still very confused about what you do when there are 3 or more subjective observers involved in your set – this maps to infinity minus 3. But with 3 subjective observers maybe (infinity minus 3) somehow decomposes/maps 1-1 to (infinity minus 2) + (infinity minus 1) therefore not adding an extra level to the types of sets you can have. Not that I know really how to add the infinities and I don’t really know the proper notation. So we do need (infinity minus 0), (infinity minus 1) and (infinity minus 2), but we don’t need to add (infinity minus n) for n>2 – we just need 0, 1, 2. Whilst 2 can be made from 1+1, this isn’t a mapping as it involves the same set twice?????????????? (not sure) but 3=2+1 which is (can be?) a 1-1 mapping. I think?

Currently maths is based on Zermelo-Fraenkel (ZF) set theory (which I think from the general things I have read about how it is used to be correct but incomplete). But the version used most of the time is ZFC set theory – which is ZF plus the axiom of choice. The axiom of choice was introduced to explain various paradoxes, and it seems to – but I believe it to be incorrect in detail, at least in its most general version (obviously it works most of the time, else people wouldn’t be using it). In terms of what fixes the issues it tries to fix, I think it is most likely what I am saying about the dimensionality of the txyz+extra dimensions of symmetry, quite likely also in combination with observer versus observed. (i.e. subjective versus objective reality). With the extra dimensions of symmetry derived from mappings to the physical laws of nature, it should be possibly to extend ZF set theory so that the axiom of choice is unneccesary to explain these paradoxes it was introduced to explain. I quote from wikipedia “The continuum hypothesis………….The answer to this problem is independent of ZFC, so that either the continuum hypothesis or its negation can be added as an axiom to ZFC set theory, with the resulting theory being consistent if and only if ZFC is consistent. This independence was proved in 1963 by Paul Cohen, complementing earlier work by Kurt Gödel in 1940.” So I think that ZFC is inconsistent and that the continuum hypothesis is false, and that the axiom of choice is false – all of which is consistent with the work of Paul Cohen – albeit that it seems a little that his work was used (by some) to advocate that axiom of choice was correct, not that I really know that much about maths history.”

============END SPAM===============

September 4, 2021 at 1:02 pm |

Just noticed an important typo in the SPAM

The answer to this problem is independent of ZFC, so that either the continuum hypothesis or its negation can be added as an axiom to ZFC [EDIT – THIS SHOULD OF COURSE READ ZF NOT ZFC] set theory, with the resulting theory being consistent if and only if ZFC is consistent. This independence was proved in 1963 by Paul Cohen, complementing earlier work by Kurt Gödel in 1940.

September 4, 2021 at 1:29 pm |

So what I think maybe (but can’t prove) is that P=NP – but that the degree of the polynomial is likely twice (or some small multiplier, maybe 3/2? Just guessing) that of the degree of the of the polynomial for ‘the time the computation would take’ divided by ‘the cosmological time the computation would take’. i.e. P=NP, but there might be no algorithm that can solve it in the age oft the Universe. Recall that a quartic can (is it always or often?) be factorised into quadratics, but a quintic does not have an algorithmic solution. (I learned about this in the book ‘The equation that could not be solved’ by Mario Livio.

Also, recall that

I did some googling and this paper came up as possibly relevant, but I wouldn’t understand it: However, I do recall having a silly idea one time about the imaginary axis going from (minus infinity +integer) to (plus infinity – integer) in a circle, once we got rid of Zero. However, I was redirected to the Riemann sphere upon putting forwards this very vague theory.. However, maybe the following paper by Nag and Sullivan puts this concept into formal maths:

“Teichmuller Theory and the Universal Period-Mapping via Quantum calculus and the H^(1/2) Mapping on the circle”

I think that this article (https://www.quantamagazine.org/with-a-new-shape-mathematicians-link-geometry-and-numbers-20210719/) might be very relevant to what I was saying earlier about how i think that the continuum hypothesis has 3 levels not 3,, specificaly in that it is a one-way correspondance:

“Specifically, they came up with two different kinds: Coherent sheaves correspond to representations of p-adic groups, and étale sheaves to representations of Galois groups. In their new paper, Fargues and Scholze prove that there’s always a way to match a coherent sheaf with an étale sheaf, and as a result there’s always a way to match a representation of a p-adic group with a representation of a Galois group.

In this way, they finally proved one direction of the local Langlands correspondence. But the other direction remains an open question.

“It gives you one direction, how to go from a representation of a p-adic group to a representation of a Galois group, but doesn’t tell you how to go back,” said Scholze.”