Plotting live charts with Yahoo Finance data and ggplot2 in R

Active Analytics Ltd: posted 17 May 2013 12:14 by Chibisi Chima-Okereke [ updated 22 May 2013 05:34 ]

Google stock price with 'live' data from Yahoo Finance

News

There are a few announcements that need to be made before I launch into this week's blog.

Active Analytics Ltd : New R Course: R for Reserving Actuaries

This is a two-day course for people who have already attended the three-day Introduction to R course. The program is as follows:

  1. Data structures for reserving data in R
  2. Plotting for reserving data
  3. Writing your own reserving methods and working with data in R
  4. Chain Ladder
  5. Cape Cod
  6. Curve Fitting methods
  7. Generalized linear models
  8. Introduction to the ChainLadder package
  9. Introduction to state space methods for reserving in R
  10. Simulation methods for reserving in R

The R In Insurance Conference

I would like to mention the R in Insurance Conference. There will be talks from all major fields of actuarial practice: reserving, pricing, capital modelling and catastrophe modelling, along with many other R-related tools and techniques that are relevant to this field. R is a very important statistical, data mining, data manipulation and analysis tool in many sectors and is beginning to make inroads in the insurance market. Its adoption is really a no-brainer when you consider its tool-set and versatility.

Introduction

I have been pretty busy with work and travelling this week and the nature of this blog entry required an internet connection to ensure that the code works properly, so it has taken a little longer to get it together.

The principal aim is simple: you would like to pull some stock price data from Yahoo Finance 'live' (in this case it is delayed) and plot it in R. Of course you can choose to analyse the data live as well as plotting it, but today we will keep it simple and focus on pulling the data from the Yahoo Finance CSV API and plotting it with ggplot2.

You could also choose to plot using Google charts, and if you are working with R, there is a great package googleVis by Markus Gesmann. His googleVis blog is located here. Incidentally, he is one of the organizers of the R In Insurance Conference.

We require just two packages here: ggplot2 and gridExtra. As well as adding functionality to the grid package, gridExtra allows us to arrange ggplot2 charts together.

options(stringsAsFactors = FALSE)
require(ggplot2)
require(gridExtra)

Downloading stock quote data from the Yahoo Finance API

I put 'live' in quotes because Yahoo makes it clear that the stock data is delayed. We will be loading data using the CSV API, which really only involves creating the appropriate URL. A simple tutorial on creating this URL is given here and the property list of the items is given here.

The first thing I do is create an environment to store the data. This is not essential, but with processes like this it can be safer not to have the source data frame hanging around in the global environment, where it could be changed unexpectedly. Keeping it in its own environment forces us to get() and assign() values deliberately.

# Creating the environment
assign("envPrevData", new.env(), envir = .GlobalEnv)

We then use the paste() function to generate our URL query and download it using the read.table() function. The time given for the stock quote is to the nearest minute, but we can download more frequently than this. I have chosen to use the system time Sys.time() on the machine, which will be the time just after I made the query. In addition, the time I am using is local (here, BST), but obviously "GOOG" is traded on NASDAQ, which runs on EDT.
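
As a minimal sketch of the time-zone point, the same instant can be printed in both zones with base R's format(). The zone names "Europe/London" and "America/New_York" are my assumption for BST and NASDAQ's EDT, and the fixed timestamp is made up for illustration.

```r
# A fixed instant, chosen only for illustration (not a real quote time)
tQuery <- as.POSIXct("2013-05-17 15:30:00", tz = "Europe/London")

# The same instant formatted in London time and in New York time
sLondon <- format(tQuery, tz = "Europe/London", usetz = TRUE)
sNewYork <- format(tQuery, tz = "America/New_York", usetz = TRUE)
print(sLondon)
print(sNewYork)
```

If the chart should show exchange time rather than local time, one option would be to apply the same tz argument when formatting the Time column.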

Each call to this function will obtain a row of data ready to be appended to the main data frame.

# Function to get the data
GetLiveData <- function(sSymbol = "GOOG")
{
    sAddress <- paste("http://download.finance.yahoo.com/d/quotes.csv?s=", 
        sSymbol, "&f=nsb2b3v0&e=.csv", sep = "")
    cat("Downloading data from ", sAddress, "\n")
    dfYahooData <- read.table(sAddress, sep = ",", header = FALSE)
    # I chose to create my own time 
    Time <- Sys.time()
    # Appending the time to the data frame
    dfYahooData <- cbind(dfYahooData, "Time" = Time)
    # Adding the column names
    names(dfYahooData) <- c("Name", "Symbol", "Ask", "Bid", "Volume", "Time")
    dfYahooData
}
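
Because the function above needs a live connection, the parsing and labelling steps can be tried offline on a hand-written line of text. This is only a sketch: the sample line and its values are invented, and ParseQuoteLine is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: parse one CSV line holding name, symbol, ask, bid, volume
ParseQuoteLine <- function(sLine, tTime = Sys.time())
{
    dfQuote <- read.table(text = sLine, sep = ",", header = FALSE,
        stringsAsFactors = FALSE)
    names(dfQuote) <- c("Name", "Symbol", "Ask", "Bid", "Volume")
    # Attach the query time as the last column
    dfQuote$Time <- tTime
    dfQuote
}

# Invented sample line; the numbers are placeholders, not real quotes
sSample <- "\"Google Inc.\",\"GOOG\",909.50,909.10,1234567"
dfRow <- ParseQuoteLine(sSample)
str(dfRow)
```

Separating the parsing from the download like this also makes it easier to test the column naming without depending on the remote service.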

The stock update function

Once we have downloaded the data, we need a function to ensure that it is appended to the main data frame. The other nice thing about working with environments is that looking up a name that has not been assigned yet with $ returns NULL rather than an error, and rbind.data.frame() simply ignores a NULL argument. By contrast, if dfStockData does not exist in the global environment, you cannot append data to it. For example, this

# Here dfCurr exists but dfStockData does not
rbind.data.frame(dfStockData, dfCurr)

would give an error since dfStockData does not exist. But if we create an environment envPrevData, we can just do this

# Here dfCurr exists but dfStockData does not
rbind.data.frame(envPrevData$dfStockData, dfCurr)

and carry on spontaneously appending or assigning things as we like. I take advantage of this to append the data to a data frame in the environment without worrying about initializing it first.

# Function to update the current data
UpdateStockData <- function(sSymbol = "GOOG")
{
    envPrevData <- get("envPrevData", envir = .GlobalEnv, mode = "environment")
    dfCurr <- GetLiveData(sSymbol = sSymbol)
    print(dfCurr)
    try(envPrevData$dfStockData <- rbind.data.frame(envPrevData$dfStockData, dfCurr))
    assign("envPrevData", envPrevData, envir = .GlobalEnv)

    invisible()
}
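
The NULL behaviour that this function relies on can be checked in isolation. The sketch below uses a throwaway environment called envDemo and placeholder values, both of which are my own and not from the original code.

```r
envDemo <- new.env()

# A name that was never assigned comes back as NULL rather than an error
print(is.null(envDemo$dfStockData))

# rbind.data.frame() ignores the NULL, so the first append just yields the new row
dfCurr <- data.frame(Symbol = "GOOG", Bid = 1, Ask = 2)  # placeholder values
envDemo$dfStockData <- rbind.data.frame(envDemo$dfStockData, dfCurr)
envDemo$dfStockData <- rbind.data.frame(envDemo$dfStockData, dfCurr)
print(nrow(envDemo$dfStockData))
```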

The plotting function

The plotting function is a bit of a beast, it has to be admitted. Just consider that there are three parts.

  1. Some small data preparation to get the bid and ask movements. The midpoint is also calculated, though neither it nor the bid movements affect what is drawn.
  2. The first plot creates the bid-ask line range chart on the top.
  3. The second plot creates the volume bar chart below the bid-ask chart.

# The plot function
plotChart <- function(){
    # Here we get the data
    envPrevData <- get("envPrevData", envir = .GlobalEnv, mode = "environment")
    dfStockData <- envPrevData$dfStockData
    # We only want to plot if we have five or more points
    if(!is.null(dfStockData) && nrow(dfStockData) > 4){
        # 1. We prepare the data
        AskMovement <- factor(sign(c(0, diff(dfStockData$Ask))), levels = c(-1, 0, 1),
            labels = c("Down", "No Change", "Up"))
        BidMovement <- factor(sign(c(0, diff(dfStockData$Bid))), levels = c(-1, 0, 1),
            labels = c("Down", "No Change", "Up"))
        VolumeChange <- c(0, diff(dfStockData$Volume))
        dfStockData <- data.frame(dfStockData, AskMovement, BidMovement, VolumeChange)

        dfStockData$Mid <- with(dfStockData, .5*(Bid + Ask))

        # Named colours keep each movement tied to the same colour,
        # even when one of the levels is absent from the data
        vColours <- c("Down" = "red", "No Change" = "blue", "Up" = "green")

        # 2. This is the first plot for the Bid-Ask
        bAPlot <- ggplot(dfStockData, aes(Time, Mid,
            ymin = Bid, ymax = Ask, colour = AskMovement))
        bAPlot <- bAPlot + geom_linerange(lwd = 1.5) + xlab("") + ylab("Price\n")
        bAPlot <- bAPlot + theme(legend.position = "top",
            plot.margin = unit(c(0, .5, -1.5, 0), "lines"),
            axis.text.y = element_text(angle = 90), axis.text.x = element_blank()) +
            labs(colour = "Ask Movement") +
            xlim(range(dfStockData$Time)) +
            scale_colour_manual(values = vColours)

        # 3. This is the Volume change plot; the bars use the fill aesthetic,
        # so the manual scale has to be a fill scale
        VolPlot <- ggplot(dfStockData, aes(Time, VolumeChange, fill = AskMovement)) +
            geom_bar(stat = "identity")
        VolPlot <- VolPlot + theme(legend.position = "none",
            plot.margin = unit(c(0, .5, 0, 0), "lines"),
            axis.text.y = element_text(angle = 90)) +
            xlab("\nTime") + ylab("Volume Change\n") +
            xlim(range(dfStockData$Time)) +
            scale_fill_manual(values = vColours)

        # We use grid arrange to arrange the plots
        grid.arrange(bAPlot, VolPlot, nrow = 2, heights = c(1.5, 1))
    }
}
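
The data preparation in step 1 is easier to follow on a small vector. The sketch below applies the same sign(diff()) construction to a handful of invented ask prices; vAsk and its values are made up for illustration.

```r
# Invented ask prices, for illustration only
vAsk <- c(10.0, 10.2, 10.2, 10.1, 10.4)

# diff() gives the change between consecutive quotes; the leading 0 marks the
# first quote as "No Change" because it has no predecessor
AskMovement <- factor(sign(c(0, diff(vAsk))), levels = c(-1, 0, 1),
    labels = c("Down", "No Change", "Up"))
print(as.character(AskMovement))
```

Fixing the levels up front means the factor always carries all three categories, whichever movements happen to occur in the data.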
      

Executing the process

We finally bring everything together and run the code.

# Running the process
CurrTime <- Sys.time()
while(Sys.time() < CurrTime + 60*60){
    UpdateStockData("GOOG")
    plotChart()
    Sys.sleep(30)
}
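
In the loop above, a failed download would stop the whole run, because try() in UpdateStockData only wraps the append step. One possible variant, sketched here under my own assumptions, wraps each update in tryCatch() so the loop carries on. RunPolling and the stand-in fFake are hypothetical names; in real use the function passed in would call UpdateStockData() and plotChart().

```r
# Hypothetical wrapper: call fUpdate() a fixed number of times and keep going
# if one call fails; returns the number of successful calls
RunPolling <- function(fUpdate, nIterations = 3, nSleep = 0)
{
    nOk <- 0
    for(i in seq_len(nIterations)){
        bOk <- tryCatch({
            fUpdate()
            TRUE
        }, error = function(e){
            message("Update failed: ", conditionMessage(e))
            FALSE
        })
        if(bOk) nOk <- nOk + 1
        Sys.sleep(nSleep)
    }
    nOk
}

# Stand-in for the real update step that fails on its second call
nCalls <- 0
fFake <- function(){
    nCalls <<- nCalls + 1
    if(nCalls == 2) stop("simulated download error")
    invisible(NULL)
}
nSucceeded <- RunPolling(fFake, nIterations = 3)
print(nSucceeded)
```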
      

Conclusion

This is clearly a quick and dirty example, but it shows how you can start constructing your own analysis tool based on 'live' quotes from Yahoo. The Yahoo Finance CSV API is very easy to use, but there are some unfortunate problems. The bid and ask volumes can come down with commas separating the thousands, which is a great shame for a CSV formatted object. I have no idea why commas should be in numbers in the first place, unless you take them as decimal points, in which case a very different separator should be used.

To get interactive graphics, the Google charts API is good and the googleVis package is convenient for R programmers.

The complete code for this blog is located here
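
As a rough sketch of one way to cope with the thousands separators, the commas can be stripped before converting to numbers. This assumes each value has already been read in as a single character string, which the code in this post does not do; the values below are invented.

```r
# Invented values shaped like the problem described above
vRaw <- c("1,234,567", "890", "12,000")

# Strip the thousands separators, then convert to numeric
vVolume <- as.numeric(gsub(",", "", vRaw, fixed = TRUE))
print(vVolume)
```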

Data Science Consulting & Software Training

Active Analytics Ltd. is a data science consultancy and Open Source Statistical Software Training company. Please contact us for more details or to comment on the blog.

Dr. Chibisi Chima-Okereke, R Training, Statistics and Data Analysis.