Use Rvest to download traffic data from Caltrans Performance Measurement System

Recently I helped a friend download some traffic time-series data from the Caltrans Performance Measurement System. We needed to download the traffic data from all the major traffic census stations on the I-405 freeway, and the time span needed to cover a couple of months. After searching online for a couple of days and asking a few questions on Stack Overflow (1,2,3), I finally assembled a piece of R code to accomplish what we needed. rm(list=ls())…

How to Download Your Fitbit Second-Level Data Without Coding

If you are a Fitbit user who wants to save a copy of your Fitbit data on your computer but doesn’t have advanced programming skills, this tutorial is right for you! You don’t need to do any coding at all to save your second-level data. I struggled with getting all the so-called ‘intraday data’ for quite a while. I have found many useful resources online, for example, Paul’s tutorial, Collin Chaffin’s PowerShell module, and the Fitbit-Python API, but they are…

How to Get 0.99+ Accuracy in Kaggle Digit Recognizer Competition

Recently I spent a lot of time working on the Kaggle digit recognizer competition and finally reached an accuracy higher than 0.99. I am quite happy with it and would like to share with everyone how I did it. Basically I used TensorFlow to build a neural network with these ‘highlights’: three hidden layers, with some dropout between each layer, but no convolution in them; a 25 times larger training data set, generated by nudging original training images to up,…
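
The "nudging" idea in the excerpt above can be sketched as follows. This is only an illustration, not the code from the post: it uses plain Python lists instead of TensorFlow, and the function names `shift_image` and `augment` and the list of offsets are assumptions made for the example. The excerpt does not say which shifts produce the 25 times larger set, so the offsets here are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel intensities, e.g. 28 x 28 for a digit


def shift_image(image: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Return a copy of `image` moved dx pixels right and dy pixels down.

    Pixels pushed past the edge are dropped; uncovered pixels get `fill`.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image: Image, offsets: Sequence[Tuple[int, int]]) -> List[Image]:
    """Return one shifted copy of `image` per (dx, dy) offset."""
    return [shift_image(image, dx, dy) for dx, dy in offsets]


# Hypothetical offsets: the original plus one-pixel nudges up, down, left, right.
OFFSETS = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]

# A tiny 3 x 3 stand-in for a training image.
tiny = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
copies = augment(tiny, OFFSETS)
print(len(copies))  # 5
print(copies[1])    # nudged up: [[0, 9, 0], [0, 0, 0], [0, 0, 0]]
```

Each shifted copy keeps the label of the original image, so the training set grows by a factor equal to the number of offsets.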

TensorFlow on Windows 10 Using Docker Installation Method

I am taking an online course on Deep Learning now and it requires me to use TensorFlow. I spent a lot of time searching around, testing different things, and finally managed to run TensorFlow on my Windows 10 laptop. So I thought I should write a post to remind myself, just in case I need to do it again in the future. I hope this post can save someone else’s time too. The overview section of Download and…

My Experience with Udacity Data Analyst Nano-degree

After spending most of my spare time on it over the past 8 months, I finally graduated from the Udacity Data Analyst Nano-Degree program! Before I started this program, I spent many hours searching online for reviews and discussions about it. Now I would like to share my whole experience with the internet and hope it is helpful to someone like me. Since I have also taken other courses at coursera.org and edx.org, I can make some direct comparisons which should be helpful…

Have Flight Delays Decreased Over Time?

I doubt I am the only one who assumes that, as technology advances, airlines are able to reduce arrival delays over time. Is that really the case? I looked into the historical flight data and found something surprising and interesting. First I downloaded all the historical flight data from stat-computing.org. These 22 years (1987-2008) of data add up to more than 10 GB. So I wrote a piece of R code which can read each…

Hello world!

This is my first blog entry! I plan to use this blog to post my data science code, exercises, projects, and ideas from time to time. The languages I use include Python, R, and MATLAB. It is still a work in progress, and I hope these personal notes can benefit other data scientists.