Predicting the Stock Market

Harrison Gu
3 min read · Oct 22, 2021

One of my biggest inspirations for learning data science was my fascination with the stock market. I have been interested in trading stocks ever since high school. I started off just playing the simulated stock market game on marketwatch.com. Eventually, as I learned more about fundamental analysis and investing, I began to buy stocks with real money, with the goal of long-term investing. It wasn’t until the summer after my sophomore year of college, when I interned at a hedge fund, that I learned about technical analysis. This trading strategy cares less about economic events and instead analyzes trends in price action, volume, and other metrics to determine the most probable next movement in price.

As I tried to work technical analysis into my trading strategy, I quickly learned that it was much harder than I had anticipated. There were so many patterns to look out for, and cutting losses or taking profits regardless of my emotions was hard to learn. I remember thinking, “only a robot could do this effectively.”

Fast forward to March 2020. The stock market experienced some of its biggest and most violent losses in history due to the COVID lockdown. Immediately, I wondered how well a good technical trader would do in such volatile times. I started researching better ways to implement technical analysis, one thing led to another, and I finally found out about data science. Even though my original expectations for data science were very different from what I know it to be today, my main goal in learning it was to predict stock price action for day trading.

The fourth project I did in Flatiron’s data science bootcamp gave me an opportunity to work on time series modeling. My target was to predict the median home price for various zip codes in order to identify the best real estate investment opportunities. The model took monthly median home prices from the past 30 years to predict future prices, and all the data came from zillow.com.

For our time series modeling, we first ran ARIMA models for each of the zip codes. Before training an ARIMA model, we have to make sure the data is stationary, which among other things means it has no trend: in the long run the series should not be steadily increasing or decreasing. One way to check is to run a Dickey-Fuller test. Its null hypothesis is that the series is non-stationary, so a small (significant) p-value suggests the data is stationary, while a large p-value suggests a trend is still present. There are a few transformations that help here, namely log transforming, differencing, and scaling. For my models, I used log transforming, differencing by 1 and 2 periods, and a combination of log transforming and differencing. After de-trending the data, I looked at ACF and PACF plots to find candidate q and p values (the moving-average and autoregressive orders of an ARIMA model). Then I ran auto ARIMA, the time series counterpart of a grid search, to tune the parameters. I also ran FB Prophet, another type of time series model.
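To make the de-trending step more concrete, here is a minimal sketch in plain Python. It is not the code from the project: the price series is made up, the helper names `log_difference` and `dickey_fuller_stat` are hypothetical, and the test is the simple (non-augmented) Dickey-Fuller regression compared against the approximate large-sample 5% critical value of -2.86, rather than a full p-value. In practice a library routine such as `adfuller` from statsmodels would be the more usual choice.

```python
import math
import random


def log_difference(prices, periods=1):
    """Log-transform a price series, then difference it by `periods` steps."""
    logs = [math.log(p) for p in prices]
    return [logs[i] - logs[i - periods] for i in range(periods, len(logs))]


def dickey_fuller_stat(series):
    """t-statistic of b in the regression  dy[t] = a + b * y[t-1] + error.

    Simple (non-augmented) Dickey-Fuller statistic with a constant term.
    A strongly negative value is evidence against a unit root.
    """
    x = series[:-1]  # lagged levels y[t-1]
    dy = [series[i] - series[i - 1] for i in range(1, len(series))]
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    b = sxy / sxx                      # OLS slope
    a = mean_dy - b * mean_x           # OLS intercept
    sse = sum((di - a - b * xi) ** 2 for xi, di in zip(x, dy))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return b / std_err


# Approximate large-sample 5% critical value (constant, no trend).
CRITICAL_5PCT = -2.86

# Made-up data: 30 years of monthly prices growing about 0.5% a month plus noise.
rng = random.Random(0)
prices = [math.exp(11.5 + 0.005 * t + rng.gauss(0, 0.01)) for t in range(360)]

raw_stat = dickey_fuller_stat(prices)
detrended_stat = dickey_fuller_stat(log_difference(prices))

print(f"raw prices:       {raw_stat:.2f}  looks stationary? {raw_stat < CRITICAL_5PCT}")
print(f"log + difference: {detrended_stat:.2f}  looks stationary? {detrended_stat < CRITICAL_5PCT}")
```

The statistic for the raw prices stays above the critical value, so the unit-root hypothesis is not rejected, while the log-differenced series falls far below it. Differencing by 2 periods, as mentioned above, would be `log_difference(prices, periods=2)` in this sketch.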

Moving forward, I would like to learn how to use the SARIMAX model, which factors in exogenous variables such as economic news and investor sentiment in order to make predictions. This would be similar to combining fundamental and technical analysis for trading, all while having a robot (computer) do the work.
