The question suddenly came to me after I got 10 assignments across the 5 subjects I am taking this semester. Classes only started last week, but I am already facing a lot of work, all of it due next week! I'm a bit worried about surviving this semester, and I wonder whether other students feel the same. Are they happy with this situation? Or do they feel negatively about it? Are they angry with the professors? I was curious enough to try analyzing their feelings through their most recent tweets on Twitter.
Since I had no prior experience in Python programming, I tried to get familiar with this new environment by following the Python documentation (Chapters 3-7) and the NLTK book (Chapters 1-3). Following documentation without a particular goal is rather tedious, so I decided to jump straight to the Twitter data. Before I started, my classmate Sam suggested that I use IPython Notebook, a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media in a single document. There are several ways to install it; I chose to install IPython and its dependencies manually from the command line. When I ran IPython, it crashed! I tried hard to find the problem but couldn't. I reinstalled IPython using Anaconda, but it crashed again! After some observation, I learned that this was apparently due to a socket issue. One magical command line saved my life:
# ipython notebook --ip=127.0.0.1
I then started by collecting users' timelines from Twitter. I used python-twitter, a Python wrapper library for the Twitter API. Twitter now requires API keys and access tokens in order to use its API; previously this wasn't required. I simply created an application on the Twitter developer site and got the access tokens to put in my Python script. I successfully grabbed my own timeline and tweets from friends as well. It looked like this: (sorry, my tweets are mostly written in Bahasa)
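For readers who want to try the collection step, here is a minimal sketch of how it could look with python-twitter. It is not my original script: the function names `fetch_timeline` and `extract_texts` and the `credentials` dictionary are made up for illustration, the keys are placeholders you would replace with the ones from your own Twitter application, and Twitter's access rules may have changed since I wrote this.

```python
def fetch_timeline(screen_name, credentials, count=200):
    """Fetch recent tweets for one user via python-twitter (needs network access)."""
    # Imported here so the helper below can be used without the library installed.
    import twitter  # pip install python-twitter

    api = twitter.Api(
        consumer_key=credentials["consumer_key"],
        consumer_secret=credentials["consumer_secret"],
        access_token_key=credentials["access_token_key"],
        access_token_secret=credentials["access_token_secret"],
    )
    # Returns a list of Status objects for the given account.
    return api.GetUserTimeline(screen_name=screen_name, count=count)


def extract_texts(statuses):
    """Pull the plain text out of each status object, skipping empty ones."""
    return [s.text for s in statuses if getattr(s, "text", None)]
```

Calling `extract_texts(fetch_timeline("some_account", credentials))` would then give a plain list of tweet texts to analyze, where `some_account` is a placeholder screen name.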
Finally, I defined an emotion dictionary, that is, a set of words for each emotion, to categorize the words in the tweets. The word analysis uses the NLTK library. I am now able to tell which tweets include happiness or sadness and which don't. The results are plotted as a simple visualization using matplotlib, a Python plotting library. I wish I could use D3 for a more beautiful visualization, but for now let me just use this simple one. The visualization of my tweets looks like this:
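To give an idea of the dictionary approach, here is a simplified sketch rather than my actual script. The word lists are tiny, made-up examples (the real dictionary has hundreds of words), a plain regular expression stands in for the NLTK word processing, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists used only for illustration.
EMOTION_WORDS = {
    "happy": {"happy", "glad", "great", "love", "excited"},
    "sad": {"sad", "tired", "worried", "cry", "lonely"},
}


def tokenize(text):
    """Lowercase a tweet and split it into words (a stand-in for NLTK)."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text):
    """Return the set of emotion labels whose word list overlaps the tweet."""
    words = set(tokenize(text))
    return {label for label, vocab in EMOTION_WORDS.items() if words & vocab}


def count_emotions(tweets):
    """Count tweets per label; tweets with no match are counted as 'neutral'."""
    counts = Counter()
    for tweet in tweets:
        counts.update(classify(tweet) or {"neutral"})
    return counts


def plot_counts(counts):
    """Draw a simple bar chart of the counts (requires matplotlib)."""
    import matplotlib.pyplot as plt

    labels = sorted(counts)
    plt.bar(labels, [counts[label] for label in labels])
    plt.ylabel("Number of tweets")
    plt.title("Emotion words in tweets")
    plt.show()
```

A tweet that contains words from both lists gets both labels, so the bars can add up to more than the number of tweets. Matching whole words against a fixed list is crude: it misses negation ("not happy") and anything outside the list.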
My sister said that she tweets when she is sad, and the script proves it! Here is the result of my sister’s tweets:
I put hundreds of English words into the dictionary, so I doubt it analyzes non-English tweets, like my Bahasa ones, accurately. I therefore tried analyzing tweets written mostly in English. Here is what the tweets of Danco (my professor) look like:
Apparently the students in my lab don't tweet actively, and most of them don't even have a Twitter account, so I was not able to analyze their tweets. In my home country, Twitter is very popular. Jakarta is the most active Twitter city in the world, and my hometown Bandung is in 6th position. It is mentioned here in a Forbes article.
This activity reminds me of three key points worth mentioning:
- In this modern IT world, there are useful libraries for almost anything! We don't need to build everything from scratch anymore.
- In an IT research project, there are usually several ways to reach the goal. Try to find the approach that saves the most time and resources.
- Google has a lot of resources to help; don't hesitate to search when you are stuck in the middle.
Please find the script I made for this project here.