Hello World!

Hello World!

This is my first blog entry! I plan to use this blog to post my data science codes, exercises, projects, and ideas from time to time. The languages I use include Python, R, and MATLAB. It is still a work in progress and I hope these personal notes can benefit other data scientists somehow.

Use Rvest to download traffic data from Caltrans Performance Measurement System

Use Rvest to download traffic data from Caltrans Performance Measurement System

Recently I helped a friend of mine to download some traffic time-series data from the Caltrans Performance Measurement System. Basically we need to download the traffic data from all the major traffic census stations on the I-405 freeway, and the time span needs to cover a couple of months. After searching online for a couple of days and asking a few questions on stackoverflow (1,2,3) I finally assembled a piece of R code to accomplish what we need to do. rm(list=ls())…

Read More Read More

How to Download Your Fitbit Second-Level Data Without Coding

How to Download Your Fitbit Second-Level Data Without Coding

If you are a Fitbit user who wants to save a copy of Fitbit data on your computer but doesn’t have advanced programming skills , this tutorial is right for you! You don’t need to do any coding at all to save your second-level data. I have been struggling with getting all the so-called ‘intraday data’ for quite a while. I have found many useful resources online, for example, Paul’s tutorial,  Collin Chaffin’s Powershell module, and the Fitbit-Python API, but they are…

Read More Read More

How to Get 0.99+ Accuracy in Kaggle Digit Recognizer Competition

How to Get 0.99+ Accuracy in Kaggle Digit Recognizer Competition

Recently I have spent a lot of time working on the Kaggle digit recognizer competition and finally reached an accuracy higher than 0.99. I am quite happy with it and would like to share with everyone how I did it. Basically I used TensorFlow to build a neural network with these ‘highlights’: three hidden layers, with some dropout between each layer, but no convolution in them an 25 times larger training data set – generated by nudging original training images to up,…

Read More Read More

TensorFlow on Windows 10 Using Docker Installation Method

TensorFlow on Windows 10 Using Docker Installation Method

I am taking an online course of Deep Learning now and it requires me to use TensorFlow. I spent a lot of time searching around, testing different things, and finally managed to run TensorFlow on my windows 10 laptop. So I think maybe I should write a post to remind myself, just in case I need to do it again in the future. And I hope this post can save someone else’s time too. The overview section of Download and…

Read More Read More

My Experience with Udacity Data Analyst Nano-degree

My Experience with Udacity Data Analyst Nano-degree

After spending most of my spare time in the past 8 months, I finally graduated from the Udacity Data Analyst Nano-Degree program! Before I started this program, I have spent many hours searching online for reviews and discussions about it. Now I would like to share my whole experience with the internet and hope it is helpful to someone like me. Since I have also taken other courses at coursera.org and edx.org, I can make some direct comparisons which should be helpful…

Read More Read More

Have Flights Delays decreased Over Time?

Have Flights Delays decreased Over Time?

I think it is not just me who would think that, as technology advances, the flight carriers are able to reduce the arrival delay over time. Is that really the case? I looked into the historic flights data and found something surprising and interesting. First I downloaded all the historic flights data from stat-computing.org.  These 22 years (1987 -2008) of data add up to be more than 10 GB. So I wrote a piece of R code which can read each…

Read More Read More