As you might have noticed (for example here or here), I have an interest in the prices of our nice little drink: tea.
I usually focus on old prices (sometimes going back centuries), and I might still do that soon, but another interesting thing is to look at future prices. Don’t worry, I will not drown you in millions of complex mathematical formulas or barbaric letters like α or β; I would rather take you into the minds and working methods of the people doing this.
If you search the Internet on this topic, you will, I am sure, find a lot of different things. I did, but only in English. There are surely other interesting findings to be made in other languages such as Chinese or Japanese, but since I speak neither, I couldn’t look in that direction. If you can and do, I would be interested to hear about it.
But let’s get back to our main topic.
After some research and cross-referencing, I found several articles about the price of tea and how to “predict” it mathematically. However, guess what? The numbers and formulas weren’t the same. I can hear you saying “how typical” and things like that.
But let’s move past these clichés and look a bit more closely at what the different researchers and students worked on and how they did it.
First, some were working on Indian teas (further subdivided into Northern and Southern ones) and others on Ceylon teas. Obviously, if you don’t work on the same geographical area, you don’t start from the same past tea prices, and therefore you don’t get the same analysis.
Second, two different methods were used, and both are valid. Let me explain. To get models that match the past (and can therefore predict the future, provided the overall conditions don’t change too much, since mathematical models only work “all other things being equal”, a standard phrase in this field), you must keep out of the system the outliers (small mistakes, or data that for one reason or another don’t fit the general picture) and anything else that disturbs the general trend. One of these things can be the seasonal effect, which obviously applies to commodities. For example, the Christmas season is usually an important event, but it is not representative of sales over the whole year, and since it comes back every year, it can be discarded. On the other hand, one can take the view that discarding this data wastes useful information (or simply not believe in the seasonality of certain things) and therefore keep all the data.
With these two simple explanations of the first steps taken when facing a lot of data, it is easy to understand why results differ from one study to another.
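For the curious, here is a minimal sketch of the two cleaning choices, assuming a regular series of monthly prices. The function names, the simple additive seasonal adjustment (subtract how far each calendar month sits from the overall average) and the “modified z-score” outlier filter with its usual 3.5 cut-off are my own illustrative choices, not what the articles I read actually used.

```python
from statistics import mean, median

def seasonal_adjust(prices, period=12):
    """Remove an additive seasonal effect from a regular series.

    For each position in the cycle (e.g. each calendar month), subtract how far
    that position's average sits from the overall average.
    Assumes the series covers at least one full cycle.
    """
    overall = mean(prices)
    offsets = [mean(prices[m::period]) - overall for m in range(period)]
    return [p - offsets[i % period] for i, p in enumerate(prices)]

def drop_outliers(prices, threshold=3.5):
    """Return the indices kept after a median-absolute-deviation (MAD) filter."""
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        # Almost all values are identical: nothing can be called an outlier this way.
        return list(range(len(prices)))
    # 0.6745 scales the MAD so the score is comparable to a z-score for normal data.
    return [i for i, p in enumerate(prices)
            if abs(0.6745 * (p - med) / mad) <= threshold]
```

The “keep everything” camp would simply skip `seasonal_adjust` and work on the raw list, which is exactly why two studies on the same prices can end up with different formulas.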
But how does it truly work?
OK. Let’s say that you’ve got your data (enough of it) and you’ve cleaned it the way you want. There are several types of mathematical models that can be applied to a data set to fit its past evolution and then to “predict” future values. This means either having the intuition or the knowledge that a given model will do the job (more or less), or having to test several before finding the one that suits your data. This used to be done with pen and paper, but today computers can do it far more quickly and efficiently than we can, which eases the task.
Random data points and their linear regression (image by Sewaqu)
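To make “testing several models” a bit more concrete, here is a small sketch, assuming only two very simple candidates: a flat average and a straight line fitted by ordinary least squares (the linear regression pictured above). The names `fit_flat`, `fit_linear`, `holdout_error` and `pick_model` are hypothetical; real studies use richer time-series models, but the principle is the same: fit on the older data, check on the most recent points, keep the model that does best.

```python
from statistics import mean

def fit_linear(ys):
    """Ordinary least-squares line y = a + b*t, with t = 0, 1, 2, ...
    Needs at least two points."""
    ts = range(len(ys))
    t_bar, y_bar = mean(ts), mean(ys)
    b = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
         / sum((t - t_bar) ** 2 for t in ts))
    a = y_bar - b * t_bar
    return lambda t: a + b * t

def fit_flat(ys):
    """Simplest possible model: the price stays at its historical average."""
    level = mean(ys)
    return lambda t: level

def holdout_error(fit, ys, n_test):
    """Fit on all but the last n_test points, then return the mean squared
    error of the predictions on those held-out points."""
    train = ys[:-n_test]
    model = fit(train)
    return mean((model(t) - ys[t]) ** 2 for t in range(len(train), len(ys)))

def pick_model(ys, candidates, n_test=3):
    """Return the name of the candidate with the smallest hold-out error."""
    return min(candidates,
               key=lambda name: holdout_error(candidates[name], ys, n_test))
```

For example, `pick_model(prices, {"flat": fit_flat, "linear": fit_linear})` names whichever of the two predicted the last three prices better; extending the forecast with the winner is only meaningful “all other things being equal”.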
Once a model suits your data and you have all its parameters, you must check whether the outliers you left out make any sense. In other words, is there a reason for the price dropping or rising by 50% in a particular month? For example, tea prices could go up because production didn’t meet what the market needed, because there was a flood, or a lack of rain, or…. If you manage to find a plausible explanation for every out-of-place data point, then your model is likely adequate and you can use it to predict the future price of tea leaves.
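The mechanical part of this check can be sketched as follows, assuming you already have the model’s predicted price for each month. The 50% threshold just mirrors the example in the text, and `flag_for_explanation` is a hypothetical helper: it only lists the suspicious months, while finding the flood or the drought behind them remains human detective work.

```python
def flag_for_explanation(prices, predicted, max_rel_gap=0.5):
    """List (index, observed, predicted) for every month where the observed
    price is more than max_rel_gap (50% by default) above or below the
    model's prediction."""
    flagged = []
    for i, (observed, expected) in enumerate(zip(prices, predicted)):
        # Skip zero predictions to avoid dividing by zero.
        if expected != 0 and abs(observed - expected) / abs(expected) > max_rel_gap:
            flagged.append((i, observed, expected))
    return flagged
```

With prices of 100, 101, 160, 99 and 45 against a flat prediction of 100, the third and fifth months would be flagged as needing an explanation.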
But obviously, all of this assumes “all other things being equal”, which is always the tricky part and can lead to new results and a complete rework of what has been done.
And remember, don’t worry. As long as you hit that wire with the connecting hook at precisely eighty-eight miles per hour the instant the lightning strikes the tower… everything will be fine.