Example: Download Yahoo Stock Data in R
GNU R is one of the most useful scripting languages for analyzing and backtesting trading strategies. It has packages for downloading stock data from Yahoo as well as from other free sources. I’ll assume you have a basic knowledge of R in this short article; if not, I highly recommend the interactive FREE course from DataCamp, where you learn by working through examples.
First, you’ll need to download R for your operating system, either from the R project website or by running “sudo apt-get install r-base r-base-dev” at the command prompt (without quotes) if you’re running Ubuntu or another Debian-based Linux. Next, download RStudio, which is a much better working environment and IDE than the default one included with R.
Open RStudio and run the following commands to install 3 key packages:
    install.packages("quantmod")
    install.packages("tseries")
    install.packages("timeDate")
Installing a package doesn’t load it. Once the packages are installed (now or in an earlier session), load them into R:
    library(quantmod)
    library(tseries)
    library(timeDate)
Play around with downloading stock data using get.hist.quote from the tseries package. For example, to download the OHLC (open, high, low, close) data for Tesla:
    get.hist.quote(instrument = "TSLA", retclass = "zoo", quiet = TRUE)
Or if you want only the Adjusted Close data for Apple:
    get.hist.quote(instrument = "AAPL", quote = "AdjClose", retclass = "zoo", quiet = TRUE)
Next, designate a special folder on your hard drive for the data files you’ll be working with in R. Create one if you don’t already have one set aside. Then use the Session > Set Working Directory > Choose Directory dialog in RStudio to select that folder and register it as your working directory.
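If you’d rather do this from the console, here’s a minimal sketch using base R’s dir.create and setwd. The helper name use_data_dir and the example path are my own placeholders, not anything RStudio or R defines.

```r
# Hypothetical helper: create the folder if it doesn't exist yet,
# then make it the working directory for this R session.
use_data_dir <- function(path) {
  dir.create(path, showWarnings = FALSE, recursive = TRUE)
  setwd(path)
  invisible(getwd())  # hand back the directory now in effect
}

# Example call; "~/stock-data" is a placeholder, point it at your own folder.
# use_data_dir("~/stock-data")
```

Running getwd() afterwards is a quick way to confirm that R is looking at the folder you expect.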
Using Yahoo Stock Data for a basket of securities
Once your working directory is set, download the following 2 files which contain a list of tickers for the S&P500 and Dow Jones Industrial Average respectively:
Remember to save them to the same working directory you set earlier!
(Note: these tickers reflect the indices’ composition as of December 2015.)
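The loading code in this article reads each file with header = F and takes the first column, so it presumably expects one ticker per line with no header row. Assuming that layout, here’s a small sketch that writes a throwaway three-ticker file and reads it back the same way. The file name tickers-demo.csv and its contents are made up for illustration.

```r
# Write a tiny demo file in the assumed layout: one ticker per line, no header.
demo_file <- file.path(tempdir(), "tickers-demo.csv")  # placeholder file
writeLines(c("AAPL", "MSFT", "TSLA"), demo_file)

# Read it the same way the download loops in this article do.
symbols <- read.csv(demo_file, header = FALSE, stringsAsFactors = FALSE)
tickers <- symbols[, 1]  # plain character vector of ticker symbols

length(tickers)  # number of tickers that will be downloaded
```

If length(tickers) doesn’t match the number of lines in your file, check for a stray header row or blank lines before starting a long download.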
Then run the following code to download adjusted closing prices on every S&P500 stock since 2010 (be patient, it’s gonna take a while):
    # Read the ticker list (one symbol per row, no header line)
    symbols <- read.csv("sp500.csv", header = FALSE, stringsAsFactors = FALSE)
    nrStocks <- length(symbols[, 1])
    dateStart <- "2010-01-01"

    z <- zoo()
    for (i in 1:nrStocks) {
      cat("Downloading ", i, " out of ", nrStocks, "\n")
      x <- get.hist.quote(instrument = symbols[i, ], start = dateStart,
                          quote = "AdjClose", retclass = "zoo", quiet = TRUE)
      z <- merge(z, x)  # align on date and append as a new column
    }
(Note, this is a variation of the code originally posted on the QuantTrader blog but the post is no longer available. Here’s an archived snapshot of the original post.)
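One practical wrinkle: the loop above stops with an error as soon as a single ticker fails to download. Here’s a rough sketch of one way around that, wrapping each download in tryCatch and merging whatever succeeded at the end. The function download_basket and its fetch argument are hypothetical names of mine; fetch stands for any function that takes a ticker and returns a zoo series.

```r
library(zoo)

# Hypothetical helper: try every ticker, skip the ones that fail,
# and merge the successful series by date.
download_basket <- function(tickers, fetch) {
  series <- list()
  for (ticker in tickers) {
    x <- tryCatch(fetch(ticker), error = function(e) {
      message("Skipping ", ticker, ": ", conditionMessage(e))
      NULL
    })
    if (!is.null(x)) series[[ticker]] <- x
  }
  if (length(series) == 0) return(zoo())  # nothing downloaded
  do.call(merge, series)  # column names depend on what fetch returns
}
```

With tseries loaded, you could pass something like function(t) get.hist.quote(instrument = t, start = "2010-01-01", quote = "AdjClose", retclass = "zoo", quiet = TRUE) as fetch. Keeping the downloader as an argument also lets you try the function out on fake data before hitting Yahoo.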
If you can’t wait to download all 500+ stocks, then try downloading the 30 stocks of the Dow instead:
    # Same loop as above, but reading the 30 Dow tickers
    symbols <- read.csv("dow.csv", header = FALSE, stringsAsFactors = FALSE)
    nrStocks <- length(symbols[, 1])
    dateStart <- "2010-01-01"

    z <- zoo()
    for (i in 1:nrStocks) {
      cat("Downloading ", i, " out of ", nrStocks, "\n")
      x <- get.hist.quote(instrument = symbols[i, ], start = dateStart,
                          quote = "AdjClose", retclass = "zoo", quiet = TRUE)
      z <- merge(z, x)  # align on date and append as a new column
    }
The variable z will contain historical price data on all the stocks in your basket, merged together with one column per stock. One of the coolest features of the merge function is that it aligns the series by date even when some of them have missing days, which is a godsend if you’re dealing with the numerous bugs in Yahoo stock data. Note that missing values will show up as “NA”, so you may need to run your own post-processing routine to make your final data usable.
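What that post-processing looks like depends on your strategy, but here’s a minimal sketch of two common options on made-up data: carrying the last known price forward with zoo’s na.locf, or dropping every date on which any series is missing. The tickers aaa and bbb and their prices are synthetic, purely for illustration.

```r
library(zoo)

# Synthetic prices for two made-up tickers, with gaps
dates <- as.Date("2020-01-01") + 0:4
aaa <- zoo(c(10, 11, NA, 13, 14), dates)
bbb <- zoo(c(20, NA, NA, 23, 24), dates)
z_demo <- merge(aaa, bbb)

# Option 1: carry the last known price forward (leading NAs are kept)
z_filled <- na.locf(z_demo, na.rm = FALSE)

# Option 2: keep only the dates on which every series has a value
keep <- complete.cases(coredata(z_demo))
z_complete <- z_demo[keep, ]
```

Forward-filling keeps every date but invents prices for days that had none, while dropping incomplete dates keeps only real observations at the cost of a shorter history. Which trade-off is acceptable is a judgment call for your own backtest.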
In the future, I’ll post more articles on tips and tricks I’ve found with working with stock data in R, including using the more comprehensive Quandl data service. Let me know in the comments which platform you prefer to use in your backtesting.