Randomness is Not What it Seems

Harrison Gu
Aug 1, 2021

In life, we often chalk up inexplicable events to randomness, and then label that randomness “good luck” or “bad luck”. If someone gets into an accident through no fault of their own, we might say “that was just unlucky”. On the surface this looks like a random event, but I believe that events like this are actually predictable, at least in theory.

Everything we encounter in our daily lives has a cause (or causes) and an effect. The effect is the event that we observe; the cause precedes it and is the underlying reason why the event occurred. A very basic example can be seen in physics. If I throw a ball in a controlled setting, I can calculate where the ball will land. In this example, the causes, or predictors, are the mass of the ball and the force and angle with which I throw it; the effect is where the ball lands. In an uncontrolled setting, I can still calculate where the ball will land, but I would need to know all the relevant predictors, such as the speed and direction of the wind and perhaps the humidity of the air, to name a few.
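To make the controlled-setting case concrete, here is a minimal sketch of the textbook calculation. It rests on assumptions that are mine rather than part of the argument above: the ball leaves my hand at ground level, the ground is flat, there is no air resistance, and the release speed stands in for the combined effect of the ball's mass and the force of the throw. The function name `landing_distance` and the numbers are purely illustrative.

```python
import math

def landing_distance(speed, angle_deg, g=9.81):
    """Horizontal distance travelled by an ideal projectile launched from ground level.

    Assumes no air resistance and flat ground: range = v^2 * sin(2 * theta) / g.
    """
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * theta) / g

# Hypothetical throw: released at 10 m/s, 45 degrees above the horizontal.
print(f"{landing_distance(10, 45):.2f} m")
```

With those inputs the ball lands roughly 10.19 m away. Wind, drag, or humidity would each add another predictor to the function, which is exactly why the uncontrolled case is harder.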

The same idea can be applied to many of the “random” events we see day to day. We label these events as random because we are ignorant of the relevant predictors and of how they interact with the event we observed. Using data science, we can attempt to explain some of these events, albeit rarely, if ever, with perfect accuracy. First, we need to obtain the data. It is very important to include as many relevant predictors as we can think of. The more data points we have, the more accurately we can approximate reality. We then try several different models until we end up with one that predicts the event at an accuracy we are satisfied with. The statistical models used in data science generally rely on linear algebra and calculus to reverse engineer a coefficient for each predictor. Each coefficient represents how its predictor interacts with the observed event. We can then use these coefficients to explain the event and to predict if and when it will occur again.
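As a rough illustration of what “reverse engineering coefficients” can look like, the sketch below fits an ordinary least-squares linear model, one of the simplest such models, by solving the normal equations. Everything in it is assumed for the example: the data is synthetic, the “true” rule y = 2 + 3·x1 − 1.5·x2 is made up, and `fit_linear` is a hypothetical helper written in plain Python, not a function from any particular library.

```python
import random

def fit_linear(rows, targets):
    """Least-squares coefficients via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives an intercept term
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Synthetic data: the made-up "true" rule is y = 2 + 3*x1 - 1.5*x2 plus small noise.
random.seed(0)
rows = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]
targets = [2 + 3 * x1 - 1.5 * x2 + random.gauss(0, 0.1) for x1, x2 in rows]

intercept, coef_x1, coef_x2 = fit_linear(rows, targets)
print(f"intercept={intercept:.2f}, x1={coef_x1:.2f}, x2={coef_x2:.2f}")
```

The recovered coefficients come out close to, but not exactly equal to, the values used to generate the data. The leftover gap comes from the noise term, which plays the role of the predictors we did not measure.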

A real-life example of the power of data science can be seen in the health insurance industry. There are reported instances of insurance companies predicting certain health conditions in new applicants before doctors are able to detect them.

While data science is a very useful tool for predicting seemingly random events, its predictive ability is still limited. To create a perfect predictive model, we would need to include all of the relevant predictors, some of which are not obviously related to the event. Furthermore, to ensure that our model is 100% accurate, we would need an infinite number of data points, which is impossible. Even if we could satisfy those two requirements, we would still be bounded by the computational power of current technology and by the time available, since processing that much data would take a very long time. Because of these limitations, data scientists usually aim to improve the accuracy of the model only until the marginal return on computational power and time becomes too small.
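One simple way to see the diminishing returns is the standard error of an average, which shrinks with the square root of the number of independent observations. The sketch below is only an analogy for model accuracy in general, and the spread `sigma = 1.0` is an assumed value chosen for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean from n independent observations."""
    return sigma / math.sqrt(n)

sigma = 1.0  # assumed spread of individual measurements
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: typical error ~ {standard_error(sigma, n):.4f}")
```

Each hundredfold increase in data cuts the typical error by only a factor of ten, so the error approaches zero but never reaches it with a finite data set.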

In conclusion, while daily events in our lives have causes, they seem random to us because we are not aware of all the factors at play. In theory, data science models can explain these events. In practice, because of real-world limitations, they are often still inaccurate.
