Browsed by
Month: May 2016

TensorFlow on Windows 10 Using Docker Installation Method

I am currently taking an online Deep Learning course, and it requires TensorFlow. I spent a lot of time searching around and testing different things, and I finally managed to run TensorFlow on my Windows 10 laptop. So I decided to write a post to remind myself, in case I need to do it again in the future. I hope this post can save someone else’s time too.

The overview section of the Download and Setup page says there are four different ways to install TensorFlow:

  1. Pip install
  2. Virtualenv install
  3. Anaconda install
  4. Docker install

I have heard about Docker for a while but never got a chance to use it, so I thought this was a great opportunity to learn it. That is why I chose the Docker install method. It looks pretty simple, only three steps.

However, these three steps took me a whole morning…

First, I went to the Install Docker for Windows page and followed the instructions. I had no idea whether virtualization was enabled on my laptop, and my Task Manager looked different from the image shown on the instruction page.

I struggled with this and the BIOS for a while, and finally found out from System Information (by running the msinfo32 command) that virtualization IS enabled.

Next, I installed Docker Toolbox, since I am pretty sure I am using 64-bit Windows. This process was easy and straightforward. After installing Docker Toolbox, three more icons showed up on my desktop.

I launched the Docker Toolbox terminal by double-clicking the Quickstart Terminal icon and ran my very first Docker command: “docker run hello-world”. So far so good.

So now I had finished the first step: “Install Docker on your machine”. But I had no idea how to do the second step: “Create a Docker group to allow launching containers without sudo”. The big lesson I learned here is that this second step is NOT necessary, at least in my case. I skipped it and went ahead to the third step: “Launch a Docker container with the TensorFlow image”.

I first tried “$ docker run -it gcr.io/tensorflow/tensorflow”, and everything looked good in the terminal, which said “The Jupyter Notebook is running at http://[all ip addresses on your system]:8888/”. Wait, what are “all ip addresses”? I typed “localhost:8888” into my Chrome address bar, but the Jupyter Notebook did not load…

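Once again, a post on Stack Overflow was my lifesaver. I followed the answer and everything worked out. First I ran “$ docker-machine ip default” and found that the IP address was 192.168.99.100. Then I started the TensorFlow Docker container again with “$ docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow”. Now the Jupyter Notebook is working at 192.168.99.100:8888. The reason is that Docker Toolbox runs Docker inside a VirtualBox virtual machine (named “default”), so the notebook is served on that VM’s address rather than on localhost, and the -p 8888:8888 option publishes the container’s port 8888 on port 8888 of that VM.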
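For anyone who prefers to script this step, here is a minimal Python sketch wrapping the same two commands. It assumes Docker Toolbox is installed, `docker-machine` is on the PATH, and the VM keeps Toolbox’s default name `default`; the helper names (`docker_run_command`, `notebook_url`, `machine_ip`) are hypothetical, chosen just for this example.

```python
import subprocess
from typing import List


def docker_run_command(image: str, host_port: int, container_port: int) -> List[str]:
    """Build the `docker run` arguments with a -p HOST:CONTAINER port mapping."""
    return ["docker", "run", "-it", "-p", f"{host_port}:{container_port}", image]


def notebook_url(machine_ip: str, host_port: int) -> str:
    """With Docker Toolbox, published ports are on the VM's IP, not on localhost."""
    return f"http://{machine_ip.strip()}:{host_port}/"


def machine_ip(name: str = "default") -> str:
    """Ask docker-machine for the VM's IP address (needs Docker Toolbox installed)."""
    result = subprocess.run(["docker-machine", "ip", name],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `notebook_url(machine_ip(), 8888)` should give an address like the 192.168.99.100:8888 one above, and `subprocess.run(docker_run_command("gcr.io/tensorflow/tensorflow", 8888, 8888))` starts the container. The pure helpers are kept apart from the subprocess call so they can be checked without Docker installed.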

I opened the first notebook and made a test run on the first cell. It worked!

This is how I installed TensorFlow on my laptop via Docker. I hope it is useful to you. Feel free to leave your comments or questions below!

My Experience with Udacity Data Analyst Nano-degree

After spending most of my spare time over the past 8 months, I finally graduated from the Udacity Data Analyst Nano-Degree program! Before I started, I spent many hours searching online for reviews and discussions about it. Now I would like to share my experience and hope it is helpful to someone like me. Since I have also taken courses on coursera.org and edx.org, I can make some direct comparisons, which should be helpful too.

First, I would say the degree requires a lot of time. I spent roughly 15+ hours each week on this program over the past 8 months. That may not sound like a lot, but it is, especially if you also have a full-time job. So don’t jump into it if you can’t afford the time. As for tuition, I paid $1,600 for the program, but Udacity will refund half of it because I finished within 12 months and paid all of my tuition out of my own pocket. I haven’t received the refund yet because Udacity told me it takes 4 – 8 weeks to process. Just don’t forget to submit a refund request: Udacity will not refund it automatically.

The 8 projects covered a wide range of topics in the data science field, including statistics, Python programming, R programming, machine learning, and D3.js data visualization. The Python and R programming focused on data manipulation, wrangling, and visualization. The machine learning course is very condensed and, compared to other machine learning courses, does not go deep into algorithms and theory. Overall, this nano-degree focuses on analysis skills such as processing data and finding interesting stories in it. If you want to be a data scientist rather than a data analyst, this nano-degree is probably not the best choice for you.

There are many things I really liked about this program. First, Udacity has an amazing ‘customer support’ team. The coaches provide 1-on-1 help sessions. These sessions need to be reserved in advance, which is fairly easy to do. Each session is scheduled to be 20 minutes long, but one coach once chatted with me for more than an hour, until the problem was really solved. I only used online text chat, but the coaches seem open to other communication methods such as video chat or phone calls as well. In addition, the discussion forum is a good resource that helped me finish all the projects. The coaches reply to questions VERY quickly, usually in 30 minutes or less, and they are always very patient! The coaches also review project submissions in great detail, give constructive feedback, and encourage students all the time. I think this coaching team is what makes this nano-degree program stand out compared to other MOOC courses or specializations.

However, I believe this program still has some room for improvement. My biggest frustration came from the course videos. Maybe it is because Udacity considers the videos only supporting material, or maybe it is because the courses are taught by mentors from industry, but the videos felt nothing like a real class. For a substantial portion, the videos are just two or more mentors talking, and they did not help me much in finishing my projects. I like the course videos on coursera.org much better because they are better organized and the content is taught systematically. That is not the case with Udacity courses, at least for the Data Analyst Nano-degree.

Another question people care about is whether this program really helps students find a job. Well, I can’t tell, because I am just one sample, and there is not even a control sample. But at least the program gave me something to talk about regarding data analysis during my interviews, so I would say, yes, it is useful.

Please feel free to comment below if you would like to take the program, are in the middle of the program, or have graduated. I’d be happy to answer any questions about this Udacity Data Analyst Nano-degree.

Have Flight Delays Decreased Over Time?

I doubt I am the only one who assumes that, as technology advances, carriers are able to reduce arrival delays over time. Is that really the case? I looked into historical flight data and found something surprising and interesting.

First, I downloaded all the historical flight data from stat-computing.org. These 22 years (1987-2008) of data add up to more than 10 GB, so I wrote a piece of R code that reads each CSV file and extracts the information I need for each carrier in each year. I tried many different ways to aggregate the data, for example the yearly average, the 75% quantile, the 99% quantile, and the yearly maximum. Since there are so many flight records for each carrier each year, only the yearly maximum arrival delay showed a clear trend over time.
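The original R code isn’t shown here, so as a stand-in, here is a minimal Python sketch of the same idea: stream each yearly CSV row by row, so the 10 GB never has to fit in memory, and keep a running count, sum, and maximum of arrival delay per (carrier, year). It assumes the files are already decompressed and use the column names `Year`, `UniqueCarrier`, and `ArrDelay`, with `NA` for missing delays; check these against the actual headers. The function name `aggregate` is illustrative.

```python
import csv
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int]  # (carrier, year)


def aggregate(files: Iterable[Iterable[str]]) -> Dict[Key, dict]:
    """Stream CSV lines and return count, mean and max ArrDelay per (carrier, year).

    Each element of `files` may be an open text file (opened with newline="")
    or any iterable of CSV lines whose first line is the header.
    Column names are assumptions about the data set's headers.
    """
    totals = defaultdict(lambda: {"n": 0, "sum": 0.0, "max": -math.inf})
    for lines in files:
        for row in csv.DictReader(lines):
            delay = row.get("ArrDelay")
            if delay in (None, "", "NA"):
                continue  # skip rows with no recorded arrival delay
            key = (row["UniqueCarrier"], int(row["Year"]))
            value = float(delay)
            t = totals[key]
            t["n"] += 1
            t["sum"] += value
            t["max"] = max(t["max"], value)
    # Only groups with at least one delay exist, so the division is safe.
    return {
        key: {"n": t["n"], "mean": t["sum"] / t["n"], "max": t["max"]}
        for key, t in totals.items()
    }
```

Passing `open(path, newline="")` for each yearly file, or `bz2.open(path, "rt", newline="")` for compressed copies, should work because both yield lines. Quantiles such as the 75% and 99% levels cannot be computed from running totals like this; they would need the full values per group or an approximate quantile algorithm.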

Surprisingly, this exploratory data analysis suggested that the yearly maximum arrival delay increased rather than decreased over these 22 years. This is somewhat counterintuitive to me, because I thought information technology had advanced so much that it should have helped reduce flight delays. Anyway, I used D3.js to create an interactive scatter-line plot showing these trends. Below is a thumbnail of the plot, which links to the HTML page where the plot is hosted. The legend is clickable, so you can select which carriers’ data you would like to see.

[Figure: time series plot of yearly maximum arrival delays by carrier]
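The plot shows the trends visually. If you want a single number per carrier, one simple option (not part of the original analysis) is the ordinary least-squares slope of yearly maximum delay against year: a positive slope means the maximum grew, in minutes per year if delays are recorded in minutes. A minimal Python sketch, assuming each carrier’s yearly maxima are already available as a `{year: max_delay}` dictionary; `ols_slope` and `trend_by_carrier` are illustrative names.

```python
from typing import Dict, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_by_carrier(yearly_max: Dict[str, Dict[int, float]]) -> Dict[str, float]:
    """Map each carrier to the slope of its yearly maximum delay over time."""
    trends = {}
    for carrier, by_year in yearly_max.items():
        years = sorted(by_year)
        if len(years) >= 2:  # a slope needs at least two years of data
            trends[carrier] = ols_slope(years, [by_year[y] for y in years])
    return trends
```

A slope alone says nothing about statistical significance, and one extreme year can pull it strongly, so it complements the plot rather than replacing it.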

After making this plot and looking back at the data set, I realized it is reasonable that the yearly maximum arrival delays increased. The main reasons I can think of include:

  1. More and more people travel by air, so each carrier has many more flights to manage.
  2. The number of long-distance flights increased, and with it the chance of a long arrival delay.
  3. The yearly maximum delay is probably caused by extreme weather or natural disasters, which seem to happen more frequently in recent years.
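Reason 1 also has a purely statistical side: even if the distribution of individual delays stayed exactly the same, the maximum over more flights tends to be larger. For exponentially distributed delays, for instance, the expected maximum of n flights is the mean delay times the harmonic number H_n, which grows roughly like ln n. The Python sketch below simulates this; the exponential distribution and the 15-minute mean are arbitrary illustrative assumptions, not fitted to the real data, and `mean_yearly_max` is a hypothetical helper.

```python
import random
import statistics


def mean_yearly_max(n_flights: int, n_years: int, rng: random.Random,
                    mean_delay: float = 15.0) -> float:
    """Average, over simulated years, of the largest delay among n_flights flights.

    Every flight's delay comes from the same exponential distribution
    (an illustrative assumption), so only the number of flights changes.
    """
    maxima = [
        max(rng.expovariate(1.0 / mean_delay) for _ in range(n_flights))
        for _ in range(n_years)
    ]
    return statistics.mean(maxima)


rng = random.Random(2016)
few = mean_yearly_max(100, 100, rng)    # a carrier with 100 flights a year
many = mean_yearly_max(5000, 100, rng)  # the same delay distribution, 50x the flights
print(f"average yearly max: {few:.0f} min with 100 flights, {many:.0f} min with 5000")
```

The second carrier ends up with a noticeably larger typical yearly maximum even though no single flight is more likely to be late, which is one more reason the yearly maximum can rise while typical delays do not.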

If you can think of other possible reasons, please leave them in the comments!