25 April 2008
History of Optimization
Professor William Shaw
First of all, I want to thank both Gresham College and the Society for inviting me to give this presentation. The theory of optimisation is a truly massive subject, so I'm going to tread a certain path through it, but just to give you the scope of what the subject is about and what I'm going to talk about, optimisation can refer to many, many things. We can talk about profit and risk, and that will be the main theme of this discussion, but there are all kinds of problems. One of the classic ones that is often thought about is the travelling salesman problem, which these days finds its expression in the analysis of complex vehicle routing problems. In relation to Mark's talk, you can think about using utility maximisation philosophies to go into the pricing of options. I will focus on something that I've been involved with, with the investment banking community, which is largely portfolio theory - what assets to buy or sell, and how much of each to have. So this is very much a history coloured by my own experience with the investment banking community over the last 25 years, supplemented by some interesting cultural themes, and also, perhaps, coloured by my interest in computational finance. So I'm going to leave several things out, but hopefully give you a pretty good idea of a certain part of the theory.
Now, being an applied mathematician, I'm going to run a common trick, which is to work on two different timescales. The first timescale is really going to be about the last 50 years, and I'm going to start with something that a good number of you may even have studied at school, and that's the basic idea of linear programming. It's something you find these days in many high school syllabuses.
I'll start with a very simple example. You've got a farmer who can think about keeping x hundred sheep or y hundred goats. The farmer will make so much money out of keeping each type of animal, but there are various costs and constraints associated with keeping these animals and feeding them, and there are fields to be grazed on, and one tries to maximise one's profit subject to certain constraints. Now, when we look at this at school, we draw some very simple things: some graphs to show what the feasible region is. So in this idea, the farmer can, say, keep up to 400 sheep, or 300 goats, and these are real animals, okay - you can't go short goats, you can't graze negative goats on your field! So, in the early history of this, all of these were thought of as real quantities which are non-negative, and the classic theory of optimisation starts off with real objects. I noted from Penny's talk that we got into this idea of positive and negative quantities. The early history of optimisation looks at real objects which are actual non-negative real variables and, in many cases, integers. For various reasons, the farmer has some constraints. He can keep up to 400 sheep - you can't have negative sheep - 300 goats - not negative goats - and then maybe, because of the size of the field, there's a constraint on the total number of animals.
So we can build a feasible region which is this polygon here, bounded by the zeros, the three, the four, and a constraint, say, from the size of the field. That's a very simple idea. What happens next?
Well, it turns out that sheep are twice as profitable as goats, so the profit structure for this is essentially 2x + y, twice sheep plus goats, and so you get a profit structure which is a multiple of 2x + y. We can look at the lines of constant profit, and you have an increasing function going up in this direction. Every configuration on here, on one of these lines, is equal profit.
Now, how do you maximise your profit? Well, you combine those two structures, the profit structure and the constraints structure, and you work along the profit lines looking at where you finally hit the edge of the region, and it's at this point here, where you can't increase the profit any further by going outside the so-called feasible region.
Now, a very simple idea here: we're looking at maximising a linear function, linear functions never have zero gradient, so there are never stationary values, so maximisation is all about edge conditions, boundary conditions, where do you get to the edge. You can see here that, in this case, we've actually hit a vertex of the bounding polygon. That's an absolutely typical situation for linear programming.
Now, that simple idea gave rise to a whole industry in the period starting around the end of the Second World War. Linear functions never have stationary values, so the extreme values in the linear case are always on a boundary and typically on a vertex, as in that example. That leads you to the idea of searching amongst the vertices of the problem, and an idea for an algorithm is to find one that works and then work your way around the bounding polygon increasing your profit at each move from vertex to vertex. That idea was finally crystallised into a very important algorithm by George Dantzig just after the Second World War, the late 1940s, and it's the so-called simplex algorithm, listed by one computational journal as one of the top 10 algorithms of the 20th Century.
The basic idea of course is very simple, as in those graphs. Those of you that have never looked at this, when I teach this, I refer my students to the book by Bill Press and colleagues, Numerical Recipes, which gives a wonderfully practical discussion of it. Another early contributor to this was John von Neumann, who also introduced the concept of duality into these problems, so finding a complementary problem that you could also think of solving with the same solution.
Now, what happened next? Several people in the post-War period have worked on either improving the simplex method or doing it differently. There are a great many contributors - I want to just highlight a few.
At the beginning of the 1970s, Klee and Minty showed that Dantzig's method, if you construct a suitably annoying feasible set, can be forced to visit all of the vertices of a big n-dimensional polygon. So that leads to certain efficiency questions. Having to search through all the vertices, the computation becomes very big as the problem size increases. Dantzig started off with, I think, figuring out how to optimise 70 people doing 70 jobs, something like that. As the problem gets much, much bigger, you get into potentially quite large computation times. There was great interest in finding methods that brought down the growth of the work as the number of variables increases, and there was important work by Khachiyan in 1979. In 1984, I remember I was a junior research fellow in Cambridge, and in the applicable maths community in Cambridge there was great excitement because Karmarkar had introduced a whole new way of thinking about these problems which involves moving through the interior of the feasible region, trying to find this optimal boundary point, and that led to a competing algorithm with better growth as the problem size increases. I looked at this again in preparing for this lecture, and it's not clear to me that the competition between the classic simplex method and its evolutions and the Karmarkar method has really been settled - they get pretty similar results on a large class of problems. You can find problems that are better for one and not the other, and so on.
Now, these days, the basic idea of the simplex method is built into many computational tools. So I can, for example, give that problem to this little computer here, and change the profit structure, fiddle around with it and change the optimal points, and you can just get off-the-shelf tools that will do that.
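As an illustration of those off-the-shelf tools, here is a minimal sketch of the farmer's problem in Python; the exact field-size constraint x + y <= 5 is my own illustrative assumption, since the lecture only refers to "a constraint, say, from the size of the field".

```python
# The farmer's linear programme: x = hundreds of sheep, y = hundreds of goats,
# profit = 2x + y, with 0 <= x <= 4, 0 <= y <= 3, and an assumed field-size
# constraint x + y <= 5.
from scipy.optimize import linprog

c = [-2.0, -1.0]            # linprog minimises, so negate the profit 2x + y
A_ub = [[1.0, 1.0]]         # field size: x + y <= 5 (my assumption)
b_ub = [5.0]
bounds = [(0, 4), (0, 3)]   # you can't have negative sheep or goats

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x, -res.fun)      # optimum at the vertex (4, 1): profit 9
```

As the lecture notes, the optimum lands on a vertex of the feasible polygon: here the corner where the sheep bound and the field constraint meet.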
Now, using that, I want to show you an interesting feature. I'm going to draw a little picture here, and this is actually very important for the financial applications. You could think about changing the relative profitability of sheep and goats, and this you could think about changing the expected return structure in a financial portfolio optimisation problem. As we do this, I'm going to change the relative profitability of the sheep and the goats. Now, at the moment, these profit lines are wandering off increasing in this direction, and the boundary of my polygon is up here. Now, as I change my relative profitability, what happens when we get to here is that the solution of the optimisation problem jumps discontinuously to another vertex. In fact, at this level point, the optimum could be anywhere along that bounding line. So we can see right away that there's an instability in the optimal configuration as a function of the problem parameters. As we change the profit structure again, the optimal point jumps once more, and if I were to go to an extreme, it would jump to the bottom. So we have this situation that these optimal configurations are, in some sense, unstable functions of the parameters. Now, that is a recurring theme in this subject, and it's continuing to exercise people to this day in the context of more complicated problems, but it even occurs in the linear case.
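The discontinuous jump described above can be seen numerically by sweeping the relative profitability of sheep, here a parameter p in the profit p*x + y; again the field-size constraint x + y <= 5 is my illustrative assumption.

```python
# Sweep the relative profitability p of sheep (profit = p*x + y) and watch
# the optimal vertex jump discontinuously as p crosses 1.
from scipy.optimize import linprog

def optimum(p):
    res = linprog([-p, -1.0], A_ub=[[1.0, 1.0]], b_ub=[5.0],
                  bounds=[(0, 4), (0, 3)])
    return tuple(round(v, 6) for v in res.x)

for p in (0.5, 0.9, 1.1, 2.0):
    print(p, optimum(p))
# For p < 1 the optimum sits at the vertex (2, 3); for p > 1 it jumps to
# (4, 1); at p = 1 any point on the edge between the two vertices is optimal.
```

A small change in the profit parameter thus produces a large, discontinuous change in the optimal configuration - the instability the lecture keeps returning to.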
Now, this is a purely linear problem. There's no risk here. There's no idea of the sheep getting sick or anything like that. It's a pure profit maximisation problem. If we want to think about adding some risk, we then go into a much deeper problem, and indeed, to ideas that actually go back much further in history. In mathematical finance, the idea of maximising a profit is, of course, very important, but it's modulated by the need to simultaneously manage the risk of loss, and of course this is something we're all acutely aware of in the present market conditions. This all boils down to a sophisticated version of the very everyday idea of not putting all your eggs in one basket.
Now, I had a look just to see how many different places that idea comes up, and there's an astonishing number of cultural references in history to it. In Don Quixote, written in the 17th Century, it is said: "It is the part of a wise man to keep himself today for tomorrow and not venture all his eggs in one basket." One of my favourite quotes on this actually introduces an altogether more sophisticated idea. William Shakespeare, in The Merchant of Venice, cunningly introduced the idea of having temporal diversification in your asset structure, when Antonio, in a conversation with his friends, told them he wasn't spending sleepless nights worrying over his commodity investments, and he says: "My ventures are not in one bottom trusted, nor to one place, nor is my whole estate upon the fortune of this present year, therefore my merchandise makes me not sad." Hidden in there is this idea of having a balance between the near future and the rather more distant future in terms of the return on investments.
So there it goes back a few hundred years. In fact, as many of you will know, it goes back rather further than that, but let's just formulate the maths a little bit. If you're going to spread your assets around into many different baskets, the question is which baskets are you going to use, and how much are you going to put into each one - that's the critical question. Mathematically, we think of this as a weight determination problem. We have a collection of weights. Initially, again, they're all thought of as positive. In the old days, you couldn't go short; you held positive numbers of shares, and we would have a collection of weights, all non-negative, adding up to unity. So that's represented by this equation here, and the question is how you pick these weights w_i, the fractions of your investment that you put in each asset. People were worried about this for a very long time. Now, until recently, as I say, these weights were assumed to be non-negative. These days, we explicitly allow for short selling.
The solution to this is well-known. I first heard of it as a result of a Newton Institute workshop talk a while back. The oldest reference I'm aware of is from the 4th Century, by one Rabbi Isaac bar Aha. You have to be able to read Aramaic if you want to see the original, but this is, for me at least - and I'd appreciate correction if there is any - the earliest known written example of an asset allocation strategy, and it's the simple equal allocation strategy: what you should do with your wealth is hold a third in land, a third in merchandise, and a third, as he put it, "at hand", which means in cash. That is the so-called one over n strategy. If you have n assets, you put an equal amount into each one. That is actually widely studied, and is a very robust approach because, if you keep doing that, you're not necessarily turning over your portfolio very frequently when you move from one configuration to the next. So it has certain attractive features, and you can find quite a lot of recent research looking at that.
Now, in preparing for this talk, I wondered what the Ancient Chinese had been up to, and, not being an expert, I asked Xunyu Zhou, one of the world's leading optimisation theorists, and Chinese, as to whether the Ancient Chinese had had any thoughts on this matter, and he said something very, very interesting to me a few days ago. He said that, as far as he could tell, optimisation was largely absent from Ancient Chinese thought and that it was, in the main, a concept introduced to China from the West. His view was that this might have something to do with Confucian culture, where you stay in the middle and never venture off to extremes, so that the idea of maximisation was somewhat alien to the culture. I asked Xunyu this question, having rummaged around looking at various texts on the internet, and Confucius did not actually say anything along the lines of our Rabbi, but he did find one other thing which I thought was quite good fun, especially in current market conditions, and it translates very well: "He who will not economise will have to agonise," which is something that weighs heavily on my mind, just having had to remortgage.
I'm slightly surprised at Xunyu's response because the idea of, say, balancing what one does across several areas, a risk minimisation idea, seems to me to be consistent with that view of staying in the middle. On the other hand, the optimisation process seems to be fighting with the principle, so I...will look into that.
Now, the joke goes that for the next one-and-a-half thousand years, people were busy with writing grant proposals and not getting very much real research done, and it had to wait for Harry Markowitz and co-workers after the Second World War to develop this idea further. Harry Markowitz, at the end of World War Two, went to work for the Rand Corporation and met George Dantzig and got interested in optimisation. This led to a great deal of fascinating work, which I will now drastically oversimplify.
The idea was that one should, in one version of the problem, minimise the combination of risk minus profit, and the relative weighting of those two is a parameter of the problem, lambda, that measures your trade-off between risk and return. So, for example, in pension investments, certainly as one gets towards later life, one thinks about a very small value of lambda - you want to keep the risk down - whereas, perhaps when you're younger or you have a more balanced collection of investments, one thinks of a large value of lambda. So this is the new idea.
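That "risk minus lambda times return" objective can be sketched directly; the expected returns and covariance matrix below are invented numbers for illustration, not data from the lecture.

```python
# Markowitz-style objective: minimise w'Sigma w - lam * mu'w, with weights
# non-negative and summing to one. mu and Sigma are invented for illustration.
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.04, 0.08, 0.12])           # assumed expected returns
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])      # assumed covariance matrix

def solve(lam):
    obj = lambda w: w @ Sigma @ w - lam * (mu @ w)
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    res = minimize(obj, x0=np.ones(3) / 3, bounds=[(0, 1)] * 3,
                   constraints=cons, method='SLSQP')
    return res.x

w_cautious = solve(0.01)  # small lambda: close to the minimum-variance portfolio
w_greedy = solve(1.0)     # large lambda: tilted towards higher-return assets
print(w_cautious, w_greedy)
```

The larger lambda buys more expected return at the cost of more variance, which is exactly the pensioner-versus-young-investor trade-off described above.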
Now, this conceptual idea and the mathematics of it was something that was introduced in Harry Markowitz's PhD thesis. I was very amused by Mark's comment about the reaction to an earlier PhD thesis because, during Markowitz's thesis defence, the eminent economist Milton Friedman thought that these ideas for portfolio management were not economics, which is quite amusing because this work led to the so-called Nobel Prize for Economics or, I guess, as Mark raised the issue, the Nobel Memorial Prize for Economics, but let's not go down that route, let's not have that argument! Anyway, as with all great innovators, there was initially some considerable scepticism on the matter.
Now, there are various related formulations of this problem. You can think about maximising return subject to a risk bound. You can - let me just blow this up a bit in fact - think about minimising risk subject to a return goal.
Here's something that fund managers are very fond of. They like to work relative to an index. So fund managers have this annoying habit of not working in terms of absolute returns but in relative returns, and this allows them, for example, to make comments that - the tech fund crisis was a good example of this. So you would get a dreary economic statement saying that your technology ISA had lost 85% of its value, but compared to the Pacific stock technology index, we outperformed by 8%...! So a lot of the investment theory is formulated in terms of returns relative to a background index, which is very convenient for the fund managers.
Either way, whether in terms of absolute or relative weights, the classic choice of optimisation function is a generalisation of the linear structure to some kind of quadratic structure. Now, where does that quadratic structure come from? It comes from the notion that the risk is characterised essentially by the variance of the configuration. That's an important idea, and it's a rather important limitation. Markowitz was aware of its limitations in the early version of the theory, but it was only really with this particular structure that it was possible to make theoretical progress in those early studies. So, for several decades, the work focused on this type of risk structure. You will note that, in trying to minimise this, you are simultaneously minimising upside deviation and downside deviation - it's total risk. The asymmetry between being happy about profit and being worried about loss isn't there in this form of the theory, but Markowitz was aware of that.
Now, in reality, there are various constraints, and in the early days, the weights were assumed to be non-negative, and you can inject into this all kinds of matrix equalities and inequalities, and these could be to do with sector exposure: you might not want to have more than 10% exposure to the oil sector, 5% to banks, and so on. So these problems come with a fairly complicated constraint set, but we get the classical theory of quadratic programming out of this.
Now, in linear programming, the optimal solution is always at a boundary point, as we saw earlier. In the quadratic case, it might be in the interior, or it might be on the boundary. So we now get a more interesting theory, where it's not just to do with boundaries, and not just to do with differentiation to find stationary values; it's a hybrid of the two.
Let me just show you how that works a little bit because it's interesting. What I'm just going to introduce here is a parameterised covariance matrix, three assets, and I'm actually going to work out some weights, and in the middle of the problem, if all the weights are non-zero, we can get a formula as a function of our correlation parameter, but if you look at the boundary structure, you see there is actually a more complicated interaction. So, here's a classic QP, quadratic programming, solution for three assets, so I've got three assets - one, two, three. For large negative values of the correlation, we're entirely in the first two assets and not in the third, and then as we change the correlation structure, what happens is that for some points, we're invested in all three assets, and then the contribution of one of them drops to zero and we remain in the other two. We can visualise that a little bit - let me show you just what's going on here.
What I've got here is a diagram in which I've plotted the risk as a function of two of the three weights. My first weight along here is 0-1, the second weight along here 0-1, and the third weight is obtained by solving the constraint, and the final constraint that the third weight is between 0 and one gives us this boundary line here, and this surface is the risk of the configuration.
Now, what I want to show you here is the geometry of this. As we change the correlation structure, what happens initially is that the optimal point, that's the minimum of this surface subject to the constraints, rolls along the boundary - so it's a certain type of problem - and then it starts to migrate into the middle and finds the minimum of the surface. Then, as we increase the correlation some more, it migrates to the outer edge and then just sticks there. So you've got this interesting interplay between differentiation and boundary conditions. Again, you can get quite big changes in whether or not, say, in this case, you don't hold Asset 2 at all, and in this case, you do hold Asset 2, so again, you have this idea of quite big changes in your optimal configuration coming out of the problem.
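The same interior-versus-boundary behaviour can be reproduced numerically. The three-asset covariance parameterisation below is my own illustrative choice, not the one used in the lecture demonstration.

```python
# Minimum-variance weights for three assets as a correlation parameter rho
# varies. Assets 1 and 2 have correlation rho; both have correlation 0.3
# with asset 3 (an invented parameterisation for illustration).
import numpy as np
from scipy.optimize import minimize

def min_var_weights(rho):
    Sigma = np.array([[1.0, rho, 0.3],
                      [rho, 1.0, 0.3],
                      [0.3, 0.3, 1.0]])
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    res = minimize(lambda w: w @ Sigma @ w, x0=np.ones(3) / 3,
                   bounds=[(0, 1)] * 3, constraints=cons, method='SLSQP')
    return res.x

for rho in (-0.5, 0.0, 0.5, 0.9):
    print(rho, np.round(min_var_weights(rho), 3))
# At rho = -0.5, assets 1 and 2 hedge each other so well that the optimum
# sits on the boundary with no asset 3 at all; as rho rises, the optimum
# migrates into the interior and asset 3 enters the portfolio.
```

This is the interplay described above: the constrained minimum sticks to an edge of the feasible region for some parameter values and sits at a stationary point in the interior for others.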
Much has happened since then, and I'm not going to pretend to survey it all. I want to draw people's attention to a very nice contribution by Professor Michael Powell of DAMTP in Cambridge, one of the foremost numerical analysts of his generation. Mike did something very interesting. You see, in the background to all this is the notion that the risk of the structure is just characterised by the variance. We know in reality that financial risk is not just the normal distribution - the mean and the variance aren't everything. You can have skewness, you can have kurtosis, you can have all kinds of structure in there, or you can try to work not just with the moments but with the whole distribution. So one question is what you do if you want to manage this risk/return balance problem in a situation where you're not just looking at quadratic functions.
Mike Powell, in 1989, wrote a Cambridge DAMTP report and a computer programme which actually solved the problem of optimising non-linear functions, not necessarily quadratic, subject to linear constraint sets. In other words, what he did, though it was buried in a big pile of FORTRAN, was to allow a certain class of risk minimisation problems to be solved without the assumption of a quadratic structure. Now, that work was done for the computer vendor selling the IMSL numerical libraries, and they really didn't do very much with it. Mike was rather irritated, and said very publicly that they didn't do very much with it; part of the reason was that you had to be able to differentiate the function and do certain things with it. But it's, in my view, an important contribution because it takes us outside of this quadratic framework, so you can think about non-quadratic risk structures.
Another important practical point about this is that Powell was giving away source code, and in comparison with some of the commercial optimisation vendors, who wanted to charge you many thousands of pounds or dollars for black boxes, Mike Powell actually put his source in the public domain, and the numerical analysis report was there as well, so you can get at it, see how it works, and write code which talks to it. His algorithm allows for lower bounds on weights, upper bounds on weights, budget constraints, sector exposure and so on. Anyway, as the time is running short, I'm not going to implement that in detail here, but if anyone is interested in getting that or seeing it working, get in touch with me at King's. Anyway, there's some free optimiser software you can get.
Okay, let's go back to the theoretical developments. Use of this full covariance structure is not the only way to go, and there's a thread of developments in mathematical finance and economics which links to what it is one's trying to optimise.
So, if we go back to the quadratic notion, there is a collection of ideas based on the capital asset pricing model, and the idea here - and this is viewed with my optimisation glasses on - is not to use the full covariance matrix to characterise the risk, but to try to think about what the key drivers of risk are. One model of the key drivers of risk is to say that assets are correlated through their link to the main market index. The two key equations in this framework - it was developed by Bill Sharpe, Treynor, Lintner, Mossin, and many others - say that asset returns, first of all, have a component beta times the index return, plus an asset-specific component, and then there is some noise in the structure. Now, with this model and these being independent, you end up with a covariance structure where all of the correlation components are actually determined by these betas. So if you take the variance of the index, all the correlation in the system is governed by a structure like this, and then there is a further diagonal term which is asset-specific volatility. That's a one-factor model at one extreme; the full covariance is at the other; and there are many things in between, where you can say, let's suppose that the returns and the risk are dependent on two or more factors, and those factors could be purely abstract mathematical objects, defined by an eigenvalue analysis of the covariance structure, or they could be based on real, observable economic factors, like foreign exchange rates and oil prices, as well as index values. So you get a family of models based on this, where the covariance structure is resolved into a kind of vector of betas, which gives you the multi-factor component, plus again some residual asset-specific risk.
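The one-factor covariance structure just described is easy to write down explicitly; the betas, index variance and asset-specific variances below are invented numbers for illustration.

```python
# One-factor (CAPM-style) covariance: off-diagonal entries are
# beta_i * beta_j * var_m, and the diagonal picks up asset-specific variance.
import numpy as np

betas = np.array([0.8, 1.0, 1.3])     # sensitivities to the market index (assumed)
var_m = 0.04                          # variance of the index (assumed)
spec = np.array([0.02, 0.03, 0.05])   # asset-specific variances (assumed)

Sigma = var_m * np.outer(betas, betas) + np.diag(spec)

print(Sigma)
print(np.linalg.eigvalsh(Sigma).min())  # strictly positive: a valid covariance
```

Note that all the correlation in this model really is carried by the betas: every off-diagonal entry is just beta_i times beta_j times the index variance.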
Now, one issue with all of this is that the full covariance approach, the CAPM approach, and this multi-factor approach, which goes by the name of APT, all give you different answers for the weights, so it's a little bit of an issue to decide what to do.
Let's just talk briefly about time dependence. There are many types of generalisation of this idea where you fold in some time dependence. One is to analyse the idea of rebalancing your portfolio periodically. So, every three months, you want to create a new optimal configuration, and there, people think, for example, about market impact studies. So if I've got one collection of weights and I need to go to another which is different, how do I get from A to B at minimal cost? There's work by various people on that - Almgren & Chriss, in a 1999 paper, did quite a lot on it. Then there's the whole idea of doing the whole [?] problem in continuous time, and that's been thought about by Xunyu Zhou and his collaborators in the last few years. So there are a host of time-dependent variations.
I want to talk briefly about so-called post-modern portfolio optimisation theory. This largely revolves around the idea that variance is not everything. There's a bit more to it than that, but variance is not all of risk in the realistic non-normal world, so there are various types of more realistic objective function under consideration. There's semi-variance, which was actually one of Harry Markowitz's original ideas - to think purely about the downside risk. You can think about modulating risk by having not just variance but skewness and kurtosis, and you can avoid moments completely by having measures that are functions of the entire distribution. You can find ideas - the Sortino Ratio, Keating's Omega Parameter, and so on - all of which give more interesting classes of objective function, which allow you to combine the non-normality of the system with the one-sided thing you do, which is worrying about a loss but not a profit.
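To make the downside-only idea concrete, here is one common convention for semi-variance and the Sortino ratio (conventions vary in detail); the return series and target are made-up numbers for illustration.

```python
# Semi-variance penalises only returns below a target; the Sortino ratio
# divides mean excess return by that downside deviation.
import math

returns = [0.10, -0.05, 0.08, -0.02, 0.04]   # illustrative period returns
target = 0.0                                 # minimum acceptable return

mean_excess = sum(r - target for r in returns) / len(returns)
semi_var = sum(min(r - target, 0.0) ** 2 for r in returns) / len(returns)
downside_dev = math.sqrt(semi_var)

sortino = mean_excess / downside_dev
print(round(sortino, 4))
```

Unlike the variance-based theory, the positive returns of 0.10 and 0.08 contribute nothing to the risk term here: only the two losses are penalised, which captures the asymmetry the lecture describes.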
Robustness? Okay, this is very important. There's been a lot of work really in the last few years on this. You've seen, even in my little sheep and goats example, that the solutions to these problems are unstable. Instability is going to come back again later, I suspect. But here we have a situation where, with the sheep and goats, if you change the profitability, the optimal configuration changes. We know, for example, that as we vary from a full covariance model of risk back down to CAPM or back up through APT, we can get different answers for the weights. Why is that a problem? It's just a fact about the system.
It's a problem for two reasons - at least two reasons. Firstly, it costs money to change from one configuration to the next, so if your optimal configurations are jumping from one state to another, you're transacting for instability reasons. The second thing is that we don't actually know the parameters of these functions we're trying to optimise that well. We don't know the future returns precisely - we just have estimates. The covariance structure is an estimate based on historical data, and there might not be enough historical data to get a covariance matrix that actually has the right properties.
Now, Penny cheered me up wonderfully by referring to Girolamo Cardano towards the end of her talk. Cardano was the person who completed the sequence of thinking about different types of numbers: you had whole numbers, you had fractions, then you had reals and you had negative numbers, and when it came to Cardano, he finally came up with imaginary and complex numbers. Now, one of my dubious claims to fame is that I was the first person in the history of Nomura International, in the 1990s, to come up with complex numbers for option prices! Now, how did I manage that? It's actually related to these covariance structure degeneracies, because if you have a covariance structure and you don't have enough data to define it properly, it can be degenerate - it can have zero eigenvalues in it - and that means that there can be a whole valley in the risk structure and you can move along there without changing your risk. Not only can it be degenerate that way: if a trader has done a "what if" calculation and fiddled with a correlation, you can actually go to a covariance structure that's got negative eigenvalues. When you apply this to multi-asset option pricing and diagonalise it, you end up with negative squared pseudo-volatilities and imaginary volatilities, and then you end up with imaginary numbers in your Black-Scholes model, and of course that's ridiculous. Fortunately, I was able to point the finger of blame at the trader who'd changed his covariance structure, so it was all alright, but it's a reflection of the fact that these covariance structures are really hard to deal with carefully, and in particular, you don't know them that well. So near degeneracy in covariance can cause absolute chaos, the covariance is not that well-known anyway, and the question is what you do about that uncertainty.
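The broken-covariance situation just described can be demonstrated in a few lines; the "tweaked" correlation matrix below is an invented example of three pairwise correlations that cannot coexist, and the clipped-spectrum repair is one crude fix, not the lecture's own method.

```python
# A "what if" tweak can produce a correlation matrix with a negative
# eigenvalue - no longer a valid covariance. One crude repair: clip the
# spectrum at a small positive floor and reassemble.
import numpy as np

C = np.array([[1.0,  0.9,  0.9],
              [0.9,  1.0, -0.9],    # these three correlations
              [0.9, -0.9,  1.0]])   # cannot coexist

eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)                      # one eigenvalue is negative

C_fixed = eigvecs @ np.diag(np.clip(eigvals, 1e-8, None)) @ eigvecs.T
print(np.linalg.eigvalsh(C_fixed).min())  # now non-negative
```

Note this simple repair slightly disturbs the unit diagonal; a fuller treatment would rescale back to unit correlations, and finding the genuinely nearest valid correlation matrix is a well-studied problem in its own right.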
So in the last few years, there's been work at Imperial College on managing it with various scenarios. In the United States, Goldfarb and Iyengar have said that these covariance numbers are not precise things - we'll have little balls, and they can live within those balls. Some people have rectangular uncertainty domains. And there's work by some former colleagues of mine in Oxford - Rafael Hauser, Dennis Zuev, Ray Tutuncu, who I think works for Goldman's now - all looking at this idea of robustifying the solution, and what they do is to introduce the idea of finding the best worst case. So you think about all the bad things that could happen, and then you improve on those as much as possible. One thing to note is that most of the approaches I've seen on this keep the classic quadratic structure. One thing that really needs to be done is to combine, in detail, this robust approach with a more realistic skewed, one-sided optimisation structure.
Okay, so I think we're probably all getting close to gagging for lunch. I'll skip this last thing. One thing that everybody's worrying about in the fund management business these days is the so-called 130/30 philosophy of investing, which involves, or not, various degrees of optimisation, but I will skip that and go on to my summary.
Optimisation is a topic that's under active investigation at present, and there are all kinds of different ways of looking at it. Mathematicians are interested in the mathematical ideas - how should you characterise the risk structure? Numerical analysts and computer scientists are interested in finding efficient algorithms. Fund managers are interested in their bonus. Most of us here will have pensions that we would like managed well and on a rational basis, so it's important to everybody that this gets done right, and there's a lot of work on it.
And then beyond the financial thing, you know, we would like to see food distributed around the country with minimal fuel costs, minimal environmental impact. One of my old school colleagues runs an optimisation company solely devoted to vehicle routing on that basis. Very important work.
Tasks need to be allocated efficiently amongst workers, and that was Dantzig's original problem; chips designed efficiently; if you're interested in home cinema, you want shortest possible video pathways in your AV amps, and so on and so on and so on.
So there are all kinds of wonderful problems linked to this, and this subject is really only 50 years old, and there's going to be a lot more to come.
©Professor William Shaw, 2008