Professor John Barrow
What we are going to do today is to look at a particular mathematical series which has an unusual property. First, we will have a little look at the mathematical background to that series, and then we will look at some of the unexpected applications of the properties of that series to a collection of relatively everyday things, such as records, recruiting people for jobs, planning expeditions over long distances and other things like that.
But let us start off by having a look at a few equations. We are going to look at two sorts of series. A series is a sum of different terms; each term is generated by doing something to the previous term. The first idea to get straight is that some series, when you add them up in a never-ending sequence of terms - so an infinite number of terms - the sum you get is a finite number, such as three or five. There are other series where the sum is not finite, and if you keep on adding terms, you can make the sum of the terms as big as you please. The first sort of series, we call a convergent series, because the sum converges to take a particular value; and the second sort, we call a divergent series, because it does the opposite and it just gets larger and larger, or we say that its sum is infinite.
This series here, which, if you think back to your school days, or if you like to work out your compound interest on your savings account, looks a bit like this:
S(n) = a + ar + ar2 + ar3 + ... + arn-1
So we have a first term, which we'll call a, and then the next term, we just multiply that by some quantity, the ratio r, and then we multiply that term by r again, and then by r again, and so on forever. This is typically called a geometric series. You can see how it relates to the compounding interest in your savings account. You start off with a certain amount. After one year of interest at rate r, the amount you have got is a + ar, and so on.
How do we work out what the sum of this term would be if we have some particular number of terms - if there are 'n' terms here? The formulaic answer to this looks something like this:
rS(n) = 0 + ar + ar2 + ar3 + ... + arn-1 + arn
This means that you simply multiply the series by that ratio, and you notice that what happens is that you get an ar, an ar2 - each term just shifts along one. So you can see that this formula is the same as the one see looked at just before, except that you are missing the first term, call it zero, but you have got a new term on the end, arn-1 x r which is arn. All these terms will cancel out, leaving the first term, ar, minus the last one, arn. So, if we want to know the sum, we can just divide by one minus r and take out the a:
(1-r)S(n) = a - arn
S(n) = a(1 - rn)/(1-r)
S(n ∞) = a / (1-r)
So the sum is a rather simple manipulation that you met at school, and you can see, because this ratio, if we specify that it has to be less than one but bigger than minus one, so it is a quantity like a half, say, if you raise it to a larger and larger and larger power, ½n goes to zero. So if you had an infinite number of terms, the sum of this series will just approach the first term divided by one minus r. Which, in this instance of both n and r equalling ½, would make the result 1:
If a = ½ and r = ½
S(n → ∞) = ½ / (1 - ½)
S(n → ∞) = 1
So it is a very simple formula that tells you what is the sum of any number of these terms will add up to, and this is the same sum even if it is an infinite number of those terms?
To give a simple illustration of this, suppose that the first term of your sequence is ½, and the ratio is ½. This would make the series you are concerned with look like this:
1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ... = 1
Having put it into our formula we can see that ½ over one minus ½ is a ½ over ½, which is one. So that's a simple example of a convergent series. But you can intuitively see the correctness of this result just by drawing a picture. Suppose that we have a piece of square paper - it is one unit along the sides, and one unit along the top and bottom, so the area is one times one, which is one square unit. But now, suppose we cut that paper up: first we cut it in half, and then we cut those halves in half again, making the original paper into quarters, and so on, making the paper into eights and sixteenths and so on forever. Now, you know that the total area is one times one. Also, if you add together the area of all your pieces - 1/2 plus a 1/4 plus an 1/8 plus 1/16 plus 1/32 and so on forever - you cannot have any more than the whole sheet of one by one paper, and you won't have any less. Therefore, we know that the sum of all that halving will be a piece of paper of area one unit area. This can be seen very easily with a picture:
This is something of a pictorial proof of what we just saw; that the sum of that series is equal to one. It is the demonstration of the equation which says the total area is equal to the sum of the areas of all the pieces that you get by halving the large sheet.
There is a nice example of these sorts of series in VAT (Value Added Tax). If you have to do look after VAT accounts, you will know that it is always a rather awkward number, such as 17.5%. But it is not quite as awkward as it looks, because it enables you to work out VAT in your head. Suppose there is something which costs £80, how do you work out 17.5%? Well, you notice that 17.5 is 10 plus 5 plus 2.5. So your 10% of 80 is 8, halve it and you have got 4, which is the 5%, halve it again and you have got 2, which is the 2.5%. Now that you have the value for 10%, 5% and 2.5%, you can just add those three numbers together to make it up to the 17.5% you were looking for. So you have got your 8 + 4 + 2, which is 14 - that is the total VAT. So, despite initial appearances, the choice of 17/5% is rather a nice choice.
A while ago, I wrote a little article which shows up again in my 100 Essential Things You Didn't Know You Didn't Know book. You could try and predict what will the VAT rate be in the infinite future. Well, there were two possibilities of the next possible steps in the VAT if you want to maintain this simple pattern which is easy to work it out:
Present rate: 17.5% = 10% + 5% + 2.5%
either, 18.75 = 10% + 5% + 2.5% + 1.25%
or, 15% = 10% + 5%
But what will happen ultimately? You can see that, if you kept on adding terms, the pattern is going to be of the same type as the one we have already met? So the VAT series is one where the first term will be 10% multiplied by one, 10% multiplied by ½, 10% multiplied by ¼ and so on:
10% x (1 + 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ...)
So, in the infinite future, you would expect the VAT rate to be the sum of this infinite series times 10%:
10% x (1 + 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ...)
We can see that the infinite series within the brackets is the same one as we met before (whose sum was one) with an extra one at the start, therefore, the sum is one plus one. So the sum in the brackets is 2. Therefore, you would expect the VAT rate in the infinite future to be 20%, if it followed this nice, computationally attractive pattern:
10% x (1 + 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ...) = 20%
i.e. 10% x 1/(1-½) = 20%
But let us look at a different sort of series now. Not all series converge and have these nice properties, and it is a series like this that we are going to focus upon now. This is the harmonic series:
H = 1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + ...
This series was studied very closely back in the 1300s by people like Nicolas Oresme, who was the counsellor of Charles V in France, a great polymath, one of the pioneers of musicology, the first person to use graphs and calculational diagrams in doing mechanics, and he was also an expert on astronomy and many other areas of mathematics. He was much interested in infinite series, series like this, and also ones where the signs alternated, plus and minus.
Well, on first sight this harmonic series looks a bit like the last one. It has got terms that go on forever, and each term gets smaller than the previous one, so you tend to think that the same type of thing is going to happen: if you keep on adding them up indefinitely, eventually it is all going to peter out and we are going to get closer and closer to some number. But in fact, that is not the case. If you keep on adding terms to this series, you can make the sum as big as you like. So if you specify any large number, eventually you will be able to add enough terms to this series to surpass that number. So that is what we mean by saying it is divergent: its sum is bigger than any number you care to choose beforehand. This series is a particular case of a situation, which is there is many series: if we were to look at a series where, instead of having a 2, a 3 and a 4 as the denominator, but something else - let us represent it with 2p and 3p and so on forever - then if p is smaller than one or equal to one, then that series is going to diverge; but when p is bigger than one, like it was for the last series we looked at, then there is a finite value for the sum:
1/1p + 1/2p + 1/3p + ... → ∞ if p ≤ 1
But is finite if p > 1
So 'p is one' is a critical dividing line between series which are going to diverge and those which are going to converge, and, for a mathematician, this is closely related to the properties of these integrals. If you integrate one over x up to infinity (which is log x) and it diverges to infinity, but if you integrate 'one over xp', where p is bigger than one, the answer is finite.
But, how do you get that proof? It is a beautiful and very simple proof, and everyone should know it.
If we look at our series again, we can imagine that there are a lot of terms:
H = 1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + 1/9 + 1/10 + 1/11 ...
The proof is simply to notice that, if we group the terms in the right way, its properties become obvious and transparent. So it is a good example of how, in mathematics, just looking at a problem in the right way can avoid an awful lot of time and trouble and work. So, what we can do is bracket the first two terms, and then the next four terms, and then the next eight terms, then the next sixteen terms and so on in that pattern:
H = 1 + (1/2 + 1/3) + (1/4 + 1/5 + 1/6 + 1/7) + (1/8 + 1/9 + 1/10 + 1/11... 1/15) + ...
It should now be clear that, in the first bracket, you have got ½ and ¼, so if we replace them by ¼ and a ¼, we would always have a smaller sum, because ¼ is less than ½ and it is less than 1/3. Similarly, for the next one of four, we can replace every term by 1/8, and for the next one we can replace every term by 1/16, and so on.
H = 1 + (1/4 + 1/4) + (1/8 + 1/8 + 1/8 + 1/8) + (1/16 + 1/16 + 1/16 + 1/16... 1/16) + ...
So we are replacing it by what is the next term in the series. So the result is that we have got something which is smaller than our original sum, because we have made sure that every bracket is smaller. Now, if you look closely at the brackets, they have a very simple property: each of them is equal to ½. The first bracket is adds up to 2/4, the next bracket adds up to 4/8, the third to 8/16, the fourth to 16/32, and so on. So, every bracket has been arranged so that it is equal to ½. The number of them is infinite, there is a never-ending number of those halves, and so the sum of this series is bigger than an infinite number of halves, which is obviously infinite.
H > ½ + ½ + ½ + ½ + ... ∞
So the technique is: show that the series you are interested in is bigger than another series which is easier to handle; show that that other series diverges, so your first series must diverge as well, because it is bigger.
So this is an interesting manipulation. There was a great prejudice against dealing with divergent series historically in mathematics. Niels Abel really captures this with this statement of his:
Divergent series are the invention of the devil, and it is a shame to base on them any demonstration whatsoever.
The idea that there are 'an invention of the devil' meant that people were always worried that it was possible to make some illegal move by manipulating these infinite quantities. That is a serious worry. A real proof starts with this idea, but does something much more rigorous on paper.
The thing about 'H' is that it may be infinite, it may diverge in the way we have proved, but the key point about it is that the sum takes an awfully long time to get to infinity, so it grows very slowly.
To look at some examples, suppose we label it in this way:
H(n) = 1 + 1/2 + 1/3 + 1/4 + 1/5 + ... + 1/n
So after we have got n terms, we will call it H(n). So after two terms, its value is 1.5; after four, it has got to a bit over two; by the time you have got to ten, it is only at 2.9; after 100 terms, it is only 5.19. So it is growing very slowly. After 1000 terms, it is only 7.5; after a million terms, it is only 14.
H(1) = 1
H(2) = 1.5
H(3) = 1.833
H(4) = 2.083
H(10) = 2.93
H(100) = 5.19
H(1000) = 7.49
H(1,000,000) = 14.39
So you can see this might be of some practical significance. Although the series is going off to infinity, it does so very slowly, and so parts of this sequence might be very practically useful.
In fact, as Euler first showed, there is a very good approximation as to what this sum is over a long period of runs. It looks like a particular constant that we usually call Euler's Constant, which begins 0.577. It is one of these mysterious numbers of mathematics, like pi and e, and I think it is especially mysterious. I think there is not even a proof that this number is irrational or transcendental as yet. But the other part of the answer is the natural logarithm of n.
H(n) = 0.577 + logen
So this quantity grows just as the logarithm of the number of terms.
Let us now have a look at what that means in practice, to bring these bare numbers to life a little. If you are a philosopher, you might be a bit shocked by this series. For instance, suppose you thought that human progress was always going to be diminishing, so in each year or century in the future of human existence, the amount that we would learn would always be smaller than what went before. The average person thinks, therefore, that it must all die out and tend to a constant value, and there will eventually be no progress. But from this series you see that this is not really the case. If the amount that we added to our knowledge every century of human existence followed this pattern, even though the increase was forever diminishing, there would still be no limit to the sum of knowledge after a very large amount of time.
Perhaps a more practical application of this series is in records, and in particular, let us focus on rainfall records. So if you are interested in any problem where you are focusing on 'Is this year a record performance? Is it a record rainfall? Is a record temperature?', where you believe the process to be a random one, that is there is no systematic process by which this year's rainfall is determined by last year's rainfall, then that may be something you want to test. So one of the things this would not apply to would be sporting records, because they are not random. People train systematically, they improve in a systematic way, so the performance of 100 metre sprinter is targeted on being better than the existing record. In contrast, the rainfall high in each year is just a random event.
So let us see how this works. Suppose, in year one - the first year we keep our rainfall record - we measure the rainfall and, obviously, in the year one, we have only got one rainfall datum, and so it must be a record. So the number of record years after the first year of record keeping is one.
In the second year, you have got a second year of rainfall information. We are assuming it is independent of the first year. So there is a 50/50 change it could, with 50% probability, beat year one, so it would be a new record. But it could, with 50% probability, be less than year one and so not be a record. So the expected number of record years after two years is 1 + 1/2.
If you go to year three, there are six ways in which the rainfall values could come out: there are six ways to write down the numbers 1, 2, and 3 in different orderings.
123, 132, 321, 213, 312, 231
I have highlighted, in bold, the only two cases where the third year rainfall is the highest. This is shown in the sequence because they are listed in the order of volume: the lowest rainfall, to the middle rainfall level, to the highest. So there are just two cases, 123 and 213, where year three comes out at the top. So if they are random and independent, if there were six ways for things to turn out, only two of them produced a record in year three, so there is a 2/6, or 1/3, chance of a record in year three. So the expected number of record years after three years is 1 + 1/2 + 1/3.
You should hopefully be able to see the way this is going: we can carry out the same argument and write down four numbers in different orders and we will see that there is going to be a 1 in 4 chance of a record year after four years, and the number of record years expected after four years is going to be 1 + 1/2 + 1/3 + 1/4. So we see that a harmonic series is gradually emerging here.
If you carry on this argument, after n years of record keeping, assuming the rainfall levels are independent in each year, the expected number of record years for the rainfall is given by our harmonic series:
H(n) = 1 + 1/2 + 1/3 + 1/4 + 1/5 + ... + 1/n
Let us try this out. If you look at the Kew Gardens records, which have been kept from 1748, which makes over about 256 years of UK rainfall data. So what is the value of H(256)? How many records should you find over that period? Well, that value of H is around 6.124, which means that you should only find about six record rainfall years. To find even eight record rainfall years, you are really looking at more than 1,000 years you have got to go to before you are going to find that number of records. So if the rainfall is really random, the values are independent year-on-year, and there is not some systematic process like global warming creating a new record every year because of external, non-independent factors, records really are rare.
So, to look at some pictures of long time runs, we can look at Australia and its run of records from about 1900 up to the present:
If we only look at the data until the year 2000, making it a period of 100 years, you would expect there to be five or six record years. If we look at the graph, we can see that this is confirmed from what was found - there are six record rainfall years.
To look at an Indian plot, where the values are plotted around the mean, we can see the same thing.
So if you just look at the maxima, here you go from 1870 or so up to the present, so you would expect about six. If we look at the data we can see that there are indeed six record years.
Here is the so-called Central England Temperature Record, so this is the standard way to calibrate temperatures in the UK because it is taken from some place in the heart of the country, far from the sea:
Again, you can look at how many record high temperatures there have been over this period of about 350 years. You would expect there to be around six and a bit. It is quite good because there are six, and then recently, all sorts of things have been happening. There is a significant departure from these statistical trends in rather recent years, with almost every year becoming a record - which is rather alarming.
Of course, I have talked about records, and I have talked about highest temperatures, highest rainfalls; but everything I have said applies to record lows as well. If you turn any of these pictures upside down, you can apply the same reasoning to the lows in the temperature or the lows in the rainfall and find the same types of predictions confirmed in the data.
Another application of records is bunched traffic. It does not sound as though it has got anything to do with the harmonic series, and nothing to do with rainfall records, but bunched traffic is very like it once you look into it. So if you go on a long car journey you get rather used to the phenomena of bunches of cars forming on motorways and other dual carriageways, where some driver will be going rather more slowly than people behind, and you will create this bunch that is created by the slowest driver. Some distance in front of that slow driver, there will be another bunch behind another slow driver. So what is creating the bunch, or the bunching effect, are a series of record drivers - they are the record slowest drivers on the road. So you can now apply the same reasoning to asking, if you have a certain number of cars on the road how many bunches of traffic do you expect there to be forming in this way? The answer is exactly the same as applied to the rainfall records. You are asking how many record low speeds are you going to encounter as your run through the traffic. Each time you get a record low speed driver, there will be a bunch behind that driver. The answer we have seen, is exactly the harmonic series:
H(n) = 1 + 1/2 + 1/3 + 1/4 + 1/5 + ... + 1/n
So if your n cars of traffic is 1,000, then you would expect there to be about 7.5 of these bunches, with gaps between them.
You will see this phenomena if you go through a long tunnel. You notice how the spacing tends to alter. You enter the tunnel with a certain amount of space between cars, and then when you go out at the other end, the spacing has changed so the faster cars are going in smaller, more widely separated bunches than the cars that are near the beginning of the tunnel.
Once you start thinking in this way of simple examples of records, you can soon imagine other simple applications. Another one we may thin of which, again, at first sight seems to have nothing to do with harmonic series, is testing components. Suppose you have a business and you have thousands of components that in some sense are rather precious, that is you do not want to smash them all up. Maybe they are glass tubes, or pieces of metal which are going to bear some load. What you want to know is what the tolerance is, what the strength is, of these components. So you want to get an idea of what the weakest link is, as it were; what is the minimal tolerance. Well, the bad way to do it is to test every single one of them, and odds on, you will break them all except one, maybe the last one, so you do not want to do that. You want to figure out how it is possible to get a very good idea of the breaking strength by having a particular strategy of testing only some of them.
Again, we are interested in a certain sort of record. We are interested in the record weakest component. So let us call the strength of one of these bars Br, and what we are going to do is pick the first one, and we will add a stress to it until it breaks. So we know the breaking strength of the first one, and we will call that B1. Now we will pick up the second one and we will apply a stress to it equal to that breaking stress B1. If it does not break, we will know that the strength of number two is greater than number one. If it breaks first, we will note B1 because it is the weaker, so it is giving you a guide to the weakest beam. Let us now move on to the third component and stress it to a stress level equal to the smaller of B1 and B2. If it breaks, then it is weaker and we will make a note of it; otherwise, move on to the fourth one. Stress that to a level equal to the minimum of one and two and three. If it breaks, make a note of B4; otherwise, move on. So keep on going, we are attacking this record for the weakest component, and after we have taken n of these bars, the expected number of ones that we will have broken will be the harmonic series. Since we can imagine that we have 1,000 of these components, you are only going to expect to break about 7.5 of them to get your estimate of what the breaking strength is.
There is another curious application, for those of a suitable age, who used to collect things as a child. I think this is more a little boys' hobby ground, but when I was a child, everybody collected things - football cards, cigarette cards, team cards, models, things out of breakfast cereals, etc. Everything you bought had an object in it that you could collect to accumulate a set, so your mother ended up with fifty packets of Cornflakes in the kitchen at home so that you could collect all the cards that you had to get to complete the set.
The challenge here is, suppose you know there's a set of fifty of these collecting cards. How many should you expect to have to buy in order to collect the whole set?
On the other hand, suppose you are a businessman and you are making these objects, you want to be able to work out how many people are likely to have to buy of your cards, your bubble gum or whatever, in order to get their set. So you can manipulate the situation so that they buy a nice, large number before they get the set. (Of course, if you are unscrupulous, you simply do not make so many of the Bobby Charltons - that was the one that nobody could ever get! - But let us ignore this possibility, so that it really is random what cards you get).
So, what happens when you are trying to collect sets of things? Well, it is a bit like the last two problems. For the first card you buy, you will certainly need it for your collection. You have not got any cards, so you always need that first one. If you then buy a second packet of bubble gum, what is the chance that you have not got the card in there? Well, if we suppose there is a set of fifty, there is a 49/50 chance that you have not already got that card. If the cards are independent, equal numbers, if you buy the third packet, there is going to be a 48/50 chance that you have not got that card already, and a 2/50 chance that you have, and so on. So you can see how the pattern is going to go.
To jump on, suppose you have got forty cards, for the next one you buy, there is going to be a 10/50 chance that the next one is one you have not got, and you are going to have to buy on average another 50/10, or five cards, to have a better than even chance of getting one that you still need. So you can see the pattern is rather straightforward: the number of cards you are going to need in order to get the whole set, with a 50% probability, looks like this series:
50/50 + 50/49 + 50/48 + 50/47 + ... + 50/3 + 50/2 + 50/1
So for the first card, there is a chance of one, for the second card it is the inverse of that probability, inverse for the third card, and so on, and the last card, you will have a 1 in 50 chance of completing the set. So this series is the answer you want for the number of cards you will need to buy to complete the fifty-card set. You will notice that every term has got a fifty as the numerator, so we can take out the factor of fifty, and what we are left with then is fifty multiplied by 1 + 1/2 + 1/3 and so forth, plus 1/50:
50 x (1 + 1/2 + 1/3 + 1/4 + 1/5 + ... + 1/50) = 50 H(50) ≈ 225
I have reversed the order - 1 + 1/2 + 1/3, and so on, up to 1/50. So there is our friend, the harmonic series, all over again, and the number of cards that you are going to have to buy is 50 times the sum of that series, to 50 terms, and that number is about equal to 225. So that is how many you should expect to buy.
You can see, if you change the number of cards in the series, this number is going to get bigger, and roughly, you see what the pattern is: the sum, the number of cards you need if there are n cards in the series, is just ntimes the harmonic series to n terms. So, same argument, if there are ncards in the series, there is the pattern:
N/N + N/(N-1) + N/(N-2) + N/(N-3) + ... N(½N + 1)
≈ N × (ln(N) + 0.58 - ln(N/2) -0.58) = Nln(2) = 0.7N
So you can always work out pretty exactly how much you are going to have to spend to accumulate your set.
One of the things you will notice about this is that there is a peculiar asymmetry. You do not need to buy that many cards to accumulate the first half of the set. All the effort starts to come later on, because you have got more of the cards. Each time you buy another one, it is harder and harder to get those last few cards. So, suppose we just wanted to accumulate half of the set, so we ended this series at n/2, then the total number of cards you would have to buy is just 0.7 times the total, so it is really not that many. For a set of fifty cards, you only need to buy 35 cards to get half the set, but, more than likely, you will have to buy another 190 cards to get the rest of the set. So this is one of the magnetic appeals of the collector: in the early days, you think things are going swimmingly and your little sticker book is filling up, but it is an enticement to keep on buying more and it becomes harder and harder to complete the set. The standard deviation, the variance, as it were, on this answer, N(log(N)), is about 1.3N, so there is quite a big fluctuation around the average.
Another thing that you can do, if you collect things, is to try and cheat the manufacturers, in a sense, by swapping. So the calculations I have just talked about tell you how many cards you are going to have to buy if there is just you buying cards, but if you and your friend are both buying cards, then you can swap the duplicates and considerably reduce the amount of time and money that you will have to invest in getting the whole set. This is because, in effect, if you have got, say, some number of friends, F, then the number of cards you are trying to accumulate between you is the total number of cards in the collection multiplied by the number of people there are collecting them, NF. Therefore, you are not necessarily wanting to accumulate all NF for yourself, but just the first N. As we have seen, that is much easier to do than to accumulate the whole lot. In fact, you just apply the same logic, again, to what we have looked at:
Without swapping: (F+1)N(ln(N) + 0.58)
With swopping: N x (ln(N) + F ln(lnN) +0.58)
So, if you have got F friends, you are looking at a number like F+1 multiplied by N log N to be how many cards you would have to buy to complete your set and your F friends' sets if you did not swap; but if you do swap, the number is much smaller. So swapping, as is obvious, saves you from having to buy a great many unneeded cards, but this formula effectively tells you how much you economise by.
The next example of this that I want to look at is again related even though it is not obviously so. Traditionally, it is known as the 'Secretary Problem'. It is really the recruitment problem, but it is called the 'Secretary Problem' because the first time it was set out in this way. It was about the recruitment of a secretary from an enormous number of job applicants. So if you just have five applicants for a job, then it is fairly straightforward to see that you will be able to interview each of them, make a judgement, and compare each with all the others to see which is the best one. But if you have an awful lot of applicants, it will not be so easy; it is not so obvious what you should do. Once the number of applicants becomes too large, it is not cost effective or practical to interview everybody.
You might take the view you are not going to interview anybody and you are just going to pick the person at random for the job. Suppose there are N applicants and that there is a best candidate out there. What is your chance of getting the best applicant if you pick at random? Well, it is just 1/N, and when N is big - suppose N is 1,000 or 100 - that gives pretty slim odds that you will get the best person. This means that you are pretty much giving up any hope of getting the best person. So that is almost the worst strategy.
The best strategy is spending months and months interviewing absolutely everybody, but that is expensive, on time and money, and goodness knows what else. Is there an in-between strategy, a sort of Goldilocks strategy - one that is not too long and not too random? What is that strategy?
Well, the idea is to again think of this as one of these record problems. You are interested in the best person. You keep a check of what the best is for a while, test the next person, then the next person. Is there a strategy which does this in the following way? You decide we are not going to see all the candidates; we are going to see, say, the first C of them. Suppose there are 100 candidates, you might see the first fifty or the first twenty, and as you see that first number of candidates, you keep a check on who is the best candidate that you have seen. Then, you see the next candidate, and if that candidate is better than any of the first C that you have seen, you hire them and do not interview anyone else. If they are not better, then you interview the next person, and so on until you find one that is better than that first group. The big question is: is there a way of choosing the size of C here so that you have the best possible chance of getting the best candidate?
You can see the approximation relation of this to the problem of testing the beams. Testing the beams, you did not break every beam, you did not test every beam individually, and likewise, here, you are not going to interview every single candidate one by one.
First of all, just consider the situation where we have got three candidates, so that we can get grip on everything that is going on. Let us suppose that the actual state of affairs is that Three is actually the best candidate, Two is the second best, and number One is the third best, and let us see how different strategies would or would not arrive at the best candidate.
So if we picked at random, we would have a 1 in 3 chance of hitting on candidate number three, who is really the best. So that is what would happen if we picked at random.
Suppose we saw these candidates in various different orders when we interviewed them, and these are all the possible orders of those numbers:
123, 132, 213, 231, 312, 321
We could apply a strategy where we suppose that we say we are always going to take the first candidate we see. So we will just pick the first candidate that we see. In the first instance in our list of possible orders of candidates, we would be picking number one, it would be the same in the second ordering, then it would be Two for the next two and then Three in the last two orderings. So in two of the six cases, we would actually end up with the best candidate, by simply having a rule that says we will pick the first candidate we see. So there is a two in six, or one in three, chance of getting the best candidate if we do that.
A slightly more sophisticated version is where we in effect have C as two. So we will always let the first candidate go, but we will note how good they are, and we will pick the next one who has a higher rating. So if we take it that Two is better One, and Three is better than Two, we can see than in the second ordering in our list (132), we would let One go and then we would pick Three and we would make the right decision. In the first ordering (123), we would let One go and we would pick Two which would not be the right decision. With the order 213, we would let Two go, we would let One go, and then we would pick Three. So these three cases, 132, 132 and 213, are the three cases where we would get the best candidate, so our chance would be 3 in 6, 1 in 2, of getting. So we have gone up to a half in our chance of getting the best candidate.
If you move on, we can imagine that we have got some very large number of candidates that we are going to call N, and rather than use the specific numbers like we just did, we are going to imagine that the best candidate is going to be sitting in the R+1 position, so our C, as it were, is going to be R. We are going to skip the first R candidates and take the next best one that we see. If the best candidates in the R+2 position, we will pick them with this chance, 1/N (that is the random chance) multiplied by R over (R + 1), which is the chance that we let them go in the R + 1stposition:
1/N x (r/(r+1))
We can also work out what the probabilities are of their coming in any of the positions after R+1 by this same method. Therefore, our probability of getting the best candidate out of N, if the real candidate is in the R + 1st position, is just this series of terms. So each term gives us the chance that we will pick out the best candidate after letting one larger number of candidates go. So working out the sum of this is not too difficult:
P(N, r) = 1/N x(1 + r/(r+1) + r/(r+2) + r/(r+3) + r/(r+4) + ... + r/(N-1))
It is one over the number of interview candidates, and R is this critical place where we are going to stop seeing people. It is a finite series of terms, it looks a bit messy, but we can tidy it up because what is inside the brackets looks very much like the logarithm for a series, the log of N-1 over R times R:
P(N, r) ≈ 1/N(1 + r ln((N-1)/r))
The interesting thing about this quantity is that, if you were to plot its graph, it goes up to a maximum and then it comes down. That maximum value occurs at a place where this logarithm, N-(1/R), is equal to E.
This picture is upside down because it is where 1 over E is the maximum value of xlogx. So there is a maximum value for this probability when we choose this logarithm term here to be equal to E. What that means is that this is when E is equal to (N-1)/R. But when N is big, (N-1)/R looks like N/R because N-1 is relatively very close to N in the series. So, when N is large, this thing has this rather simple structure - its maximum value occurs when N/R is roughly E, or when R is N/E. What does that mean? R is the number of people that you should let go by; N is the total number; E is 2.7, it is just a number. So what we see is that the maximum value in our problem will occur when this quantity N/R is equal to E. We plug that back in - the actual value of the maximum is about 0.37. So what we should do in this problem is reject the first N/E, that is N over 2.7, candidates, so the first 0.37 x N candidates, the first 37% of them, let them go by, take a note of who is the best, and then you take the next one that comes along that is better. If you do that, your chance of getting the best candidate will be 37%. So it is a rather remarkable result. So if you have 100 candidates: you pick at random, you would have a 1 in 100 chance of getting the best candidate; here, if see 37 of them, pick the next one who is better than any of those, and see nobody else, you will have a 37% chance of getting the best candidate.
This has a sort of application of a quasi-sociological and sort of philosophical nature. In all sorts of areas of life we make choices about things based on seeing lots of examples. It might be choosing a house or it might be choosing a husband or a wife. In practice, if we think about this, we tend not to see many examples of these very important things before we choose one, and this example highlights this; that we have a tendency to stop searching too soon. In terms of our workings, this means that we choose our value of C to be very small. So we look at half a dozen houses, we get fed with looking, and we choose the next one that we see that we like more than those, but we would not dream of looking at 1,000 houses. Maybe it is the same with husbands and wives and jobs and venues for meetings and things like that. We tend not to apply the logic of this example. We never allow N to get big enough. But that is just a philosophical interlude about that.
The last of these problems I am going to show you are sometimes called the Exploration Problem, and it is about using vehicles to search over a large distance. It is a problem that has a long history, because it was much discussed in different forms during the Second World War, and it was finally solved, in print at least, in 1947, but I think lots of people solved it in different ways for strategic purposes during World War II.
The idea is that you want to go from A to B, where B is as far away as possible. So you might be an explorer, and you want to cross a desert, and what you have got at the outset are lots of vehicles and lots of fuel, and you have to figure out how to make use of them in such a way that you can get as far across the desert as you can. So there is no point just taking one jeep with loads of fuel on it - there is a limit to how much fuel you can carry on one jeep - so in one jeep, you will eventually run out of fuel and you will not go any further. So what should you do?
We will measure the distances according to how far on one full tank of petrol, so that one unit of distance is the distance that a jeep goes with one tank full of petrol. So, let us suppose that we try and use two jeeps. So one jeep can go distance one unit, but what about two jeeps? The subtlety here is to be able to swap fuel from one jeep to another. What we do is that you go one third of a unit with the two jeeps, so they have both used one third of a tank; then you switch one third from jeep two to jeep one, so that tops up the first jeep's fuel to two-thirds, and drops the first jeep down to one third which is enough for it to go back home, but it allows the first jeep to go on the distance two-thirds. So what is happening, if we do that little trade between two jeeps, you have found a way to get the first jeep out to a distance of 1 + 1/3 distance units.
So let us now start with three jeeps now, and instead of switching a one-third of a tank between the jeeps, let us have three jeeps stop after they have gone a distance one-fifth, and then jeep Three puts a fifth of a tank of fuel into jeeps One and Two, and jeep Three stays there. So jeeps One and Two are now just in the situation they were in our first example, and they go on as before. Then, after another third, they switch a third from jeep Two to jeep One, and jeep Two then goes back to join jeep Three. By the time it reaches jeep Three it is empty because it has used its third of a tank to get back there - but jeep Three has got enough fuel left to get them both back to base. So by this recipe, jeep One is now able to get a distance 1 + 1/3 + 1/5 from base. And you should be able to see how this is going to go now from the pattern that is emerging.
You have four jeeps next, and they stop after using a seventh of their fuel, and they then switch one-seventh of the fuel into the other jeeps, allowing jeep One to go a distance 1 + 1/3 + 1/5 + 1/7, and three of the other jeeps to return to the other one, refuel and go back to base.
So by following this recipe, you can see, if you start with N jeeps, you can get them out to a distance that looks like the sum of 1/(2N-1) and all the other jeeps get back to base to refuel, so there is no limit to how far you are able to go.
Here is the N jeep situation:
1 + 1/3 + 1/5 + 1/7 + ... + 1/(2N-1)
So if you have some number N jeeps that you are coordinating in this way, then this is the total distance that you can get one of your jeeps with all the others returning to base. You can see that it is the harmonic series with the even terms missing. If you put the even terms in, it is equal to the harmonic series, but as it is, it is just half the harmonic series. So if you think about it, what happens to this series when it gets very large is that it just looks like ½log(N). So you have got an unlimited supply chain if you want it. You have a way of making one of the jeeps go as far as you want in this exploratory purpose, with all the others returning to base, using a minimum amount of fuel.
Well, that's a good place to end. I hope that I have given you some sort of introduction to the harmonic series, and to show how it pops up in all sorts of places. There are many other problems I have not had time to talk about: indefinitely piling up books or playing cards - you can make them get as far overhanging the table as you wish by using the same idea. So divergent series are not necessarily an abomination of the devil, but can be extremely useful in all sorts of practical purposes.
©Professor John Barrow, Gresham College, 11 December 2008