Use rvest to download traffic data from the Caltrans Performance Measurement System

Recently I helped a friend of mine download some traffic time-series data from the Caltrans Performance Measurement System (PeMS). We needed the traffic data from all the major traffic census stations on the I-405 freeway, covering a span of a couple of months. After searching online for a couple of days and asking a few questions on Stack Overflow (1, 2, 3), I finally assembled a piece of R code that does the job: it logs in to PeMS once and then reuses the logged-in session to download the hourly traffic-flow table for each station.

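rm(list=ls())
library(rvest)  # scraping functions; also provides the %>% pipe
library(httr)   # content() parses the HTTP response into an HTML document
# Note: rvest 1.0.0 renamed html_session(), set_values(), submit_form() and
# jump_to() to session(), html_form_set(), session_submit() and
# session_jump_to(); the old names still work but are deprecated.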
 
getTable <- function(resp){
  # Extract every table with class "inlayTable" from a session's response;
  # html_table() returns them as a list of data frames
  pg <- content(resp$response)
  tab <- html_nodes(pg, 'table.inlayTable') %>% html_table()
  return(tab)
}
 
generateURL <- function(siteID){
  # Build the report URL for one station ID. The date range
  # (05/21/2013 to 06/20/2013) is hard-coded in urlPart2.
  urlPart1 = "http://pems.dot.ca.gov/?report_form=1&dnode=tmgs&content=tmg_volumes&tab=tmg_vol_ts&export=&tmg_station_id="
  urlPart2 = "&s_time_id=1369094400&s_time_id_f=05%2F21%2F2013&e_time_id=1371772740&e_time_id_f=06%2F20%2F2013&tod=all&tod_from=0&tod_to=0&dow_5=on&dow_6=on&tmg_sub_id=all&q=obs_flow&gn=hour&html.x=34&html.y=8"
  url = paste0(urlPart1, toString(siteID), urlPart2)
  return(url)
}
 
siteIDList = c(74250, 75020, 74020)
mainURL = "http://pems.dot.ca.gov/"
pgsession <- html_session(mainURL)
pgform <- html_form(pgsession)[[1]]
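# Read your PeMS login from environment variables instead of hard-coding it
# in the script (PEMS_USERNAME / PEMS_PASSWORD are placeholder names; set
# them yourself, e.g. in ~/.Renviron)
filled_form <- set_values(pgform,
                          'username' = Sys.getenv("PEMS_USERNAME"),
                          'password' = Sys.getenv("PEMS_PASSWORD"))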
 
# slog is the logged-in session that can be reused
slog <- submit_form(pgsession, filled_form) 
 
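# Loop through siteIDList and scrape the table(s) for each site
vectorOfTables <- vector(mode = 'list', length = length(siteIDList))
names(vectorOfTables) <- siteIDList
for (i in seq_along(siteIDList)) {
  siteID <- siteIDList[i]
  message("Working on site: ", siteID)
  newsession <- jump_to(slog, generateURL(siteID))
  # getTable() returns a list of tables, so store it with [[ ]];
  # single-bracket assignment would silently keep only the first one
  vectorOfTables[[i]] <- getTable(newsession)
}

# Show the table(s) scraped for the first site
vectorOfTables[[1]]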
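The date range is baked into the query string twice: as Unix timestamps (`s_time_id`, `e_time_id`) and as MM/DD/YYYY strings (`s_time_id_f`, `e_time_id_f`). To cover a different span, the URL can be rebuilt from two dates. This is a minimal sketch, assuming `s_time_id` is midnight UTC of the start day and `e_time_id` is 23:59 UTC of the end day (which is what the values in the original URL work out to); `buildPemsURL()` is a hypothetical helper, and every other query parameter is copied unchanged from `generateURL()`.

```r
# Hypothetical helper: rebuild the PeMS report URL for any date range.
# Assumption: s_time_id / e_time_id are Unix timestamps (UTC) for 00:00 of
# the start day and 23:59 of the end day, matching the original URL.
buildPemsURL <- function(siteID, startDate, endDate) {
  startTime <- as.numeric(as.POSIXct(paste(startDate, "00:00:00"), tz = "UTC"))
  endTime <- as.numeric(as.POSIXct(paste(endDate, "23:59:00"), tz = "UTC"))
  # The *_f parameters are MM/DD/YYYY with "/" percent-encoded as %2F
  fmt <- function(d) URLencode(format(as.Date(d), "%m/%d/%Y"), reserved = TRUE)
  query <- c(
    report_form = "1", dnode = "tmgs", content = "tmg_volumes",
    tab = "tmg_vol_ts", export = "",
    tmg_station_id = format(siteID, scientific = FALSE),
    s_time_id = sprintf("%.0f", startTime), s_time_id_f = fmt(startDate),
    e_time_id = sprintf("%.0f", endTime), e_time_id_f = fmt(endDate),
    tod = "all", tod_from = "0", tod_to = "0", dow_5 = "on", dow_6 = "on",
    tmg_sub_id = "all", q = "obs_flow", gn = "hour", html.x = "34", html.y = "8"
  )
  paste0("http://pems.dot.ca.gov/?",
         paste(names(query), query, sep = "=", collapse = "&"))
}
```

For the original window, `buildPemsURL(74250, "2013-05-21", "2013-06-20")` reproduces the URL that `generateURL(74250)` builds. To fetch a different period, swap `generateURL(siteID)` in the loop for something like `buildPemsURL(siteID, "2013-07-01", "2013-08-31")`.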

And remember to always use caution when scraping!
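One concrete way to be cautious is to space out requests and to keep going when a single station fails, rather than losing the whole run. This is a minimal sketch: `politeLapply()` is a hypothetical helper, and the two-second default delay is an arbitrary choice, not a limit documented by PeMS.

```r
# Hypothetical helper: call fetch() on each ID in turn, pausing between
# requests. A failed request becomes NULL plus a warning instead of
# aborting the loop.
politeLapply <- function(ids, fetch, delaySeconds = 2) {
  results <- vector(mode = "list", length = length(ids))
  names(results) <- as.character(ids)
  for (i in seq_along(ids)) {
    # Use results[i] <- list(...) rather than results[[i]] <- ...,
    # because assigning NULL with [[ ]] would delete the element
    results[i] <- list(tryCatch(
      fetch(ids[[i]]),
      error = function(e) {
        warning("Site ", ids[[i]], " failed: ", conditionMessage(e), call. = FALSE)
        NULL
      }
    ))
    # Pause before the next request (no pause after the last one)
    if (i < length(ids)) Sys.sleep(delaySeconds)
  }
  results
}

# Example use with the functions from the script above (not run here):
# vectorOfTables <- politeLapply(siteIDList,
#   function(id) getTable(jump_to(slog, generateURL(id))), delaySeconds = 5)
```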
