Which countries use Instagram as an online store the most?

Instagram is an online photo-sharing, video-sharing and social networking service that grew from a small startup into a company that Facebook acquired for approximately $1 billion in 2012. Most users share photos on Instagram, often after applying filters to make themselves look better. Some simply love to showcase their travels and adventures. But did you know that some users run their Instagram accounts as online stores, posting to them daily? This is an unintended use of the service, but it is common in Indonesia. After playing around with the Instagram API for a while, I learned that it also happens in other countries.

I started my experiment by writing a Python script that calls the API. This time I did not use a Python wrapper; instead, I dug through the JSON responses from Instagram directly. I searched for tags related to “onlineshop”. The search returned 32,314,170 posts at the time I ran the script (Feb 7, 2015). That was far more onlineshop posts than I expected! Some of the tags related to onlineshop are listed below:

  • onlineshop (17,538,001 posts)
  • onlineshopping (3,545,428 posts)
  • onlineshopindo (1,865,753 posts)
  • onlineshopindonesia (1,106,945 posts)
  • onlineshopmalaysia (516,325 posts)
  • onlineshopphilippines (321,518 posts)
  • onlineshopbali (69,795 posts)
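
For illustration, a tag lookup like this could be sketched as follows. This is not the script from this project: the endpoint path (`/v1/tags/search`), the `access_token` parameter and the `name`/`media_count` fields are assumptions based on the legacy Instagram API, and the sample payload is hand-written from two of the counts listed above rather than a real response.

```python
import json
from urllib.parse import urlencode

# Assumed root of the legacy Instagram API (not confirmed by this post).
API_ROOT = "https://api.instagram.com/v1"


def tag_search_url(query, access_token):
    """Build the URL for a tag search request (assumed endpoint shape)."""
    params = urlencode({"q": query, "access_token": access_token})
    return f"{API_ROOT}/tags/search?{params}"


def parse_tag_counts(raw_json):
    """Map each tag name in a tag-search response to its post count."""
    payload = json.loads(raw_json)
    return {item["name"]: item["media_count"] for item in payload.get("data", [])}


# Hand-written sample payload in the assumed response shape.
sample = (
    '{"data": ['
    '{"name": "onlineshop", "media_count": 17538001}, '
    '{"name": "onlineshopbali", "media_count": 69795}]}'
)
counts = parse_tag_counts(sample)
print(tag_search_url("onlineshop", "TOKEN"))
print(counts)
```

Working with the raw JSON this way keeps the script independent of any wrapper library, at the cost of having to track the response structure by hand.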

To answer my research question, I wanted to look at each post’s location, but I found that location coordinates and place names are missing from most posts. Apparently, Instagram released the geotagging feature quite recently. When I looked at my own account (I joined Instagram in 2011), most of my photos had no geotag either. So I decided to categorize posts by location by analyzing the text that follows “onlineshop” in each tag and summing the post counts. Here is the result of the grouping:

  • Unknown: 21,735,045 onlineshop posts
  • Indonesia: 6,839,205 onlineshop posts
  • Philippines: 2,828,913 onlineshop posts
  • Malaysia: 835,354 onlineshop posts
  • Hong Kong: 75,653 onlineshop posts
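
The grouping step can be sketched as below, assuming each tag is assigned to a country by the text that follows “onlineshop”. The suffix table is hypothetical and covers only the seven tags listed earlier, so the grouped totals it produces are smaller than the full result above. The `shares` helper is then applied to the reported totals from the list above.

```python
PREFIX = "onlineshop"

# Hypothetical suffix-to-country table; a real run would need many more entries.
SUFFIX_TO_COUNTRY = {
    "indo": "Indonesia",
    "indonesia": "Indonesia",
    "bali": "Indonesia",
    "malaysia": "Malaysia",
    "philippines": "Philippines",
}


def group_by_country(tag_counts):
    """Sum post counts per country, based on the text after 'onlineshop'."""
    totals = {}
    for tag, count in tag_counts.items():
        suffix = tag[len(PREFIX):] if tag.startswith(PREFIX) else ""
        country = SUFFIX_TO_COUNTRY.get(suffix, "Unknown")
        totals[country] = totals.get(country, 0) + count
    return totals


def shares(totals):
    """Return each group's share of all posts, in percent."""
    grand_total = sum(totals.values())
    return {name: 100.0 * count / grand_total for name, count in totals.items()}


# The seven tags and counts listed earlier in this post.
listed_tags = {
    "onlineshop": 17538001,
    "onlineshopping": 3545428,
    "onlineshopindo": 1865753,
    "onlineshopindonesia": 1106945,
    "onlineshopmalaysia": 516325,
    "onlineshopphilippines": 321518,
    "onlineshopbali": 69795,
}
grouped = group_by_country(listed_tags)

# The grouped totals reported in the list above.
reported = {
    "Unknown": 21735045,
    "Indonesia": 6839205,
    "Philippines": 2828913,
    "Malaysia": 835354,
    "Hong Kong": 75653,
}
reported_shares = shares(reported)
print(grouped)
print(round(reported_shares["Indonesia"], 1))
```

Tags with no recognizable suffix, such as “onlineshop” and “onlineshopping”, fall into the Unknown group.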

As expected, Indonesia dominates onlineshop posts on Instagram: more than 21% of the 32,314,170 onlineshop posts come from Indonesia. The Philippines and Malaysia come second and third, respectively. However, there are more than 21 million posts whose origin I cannot determine, because they carry only the general tags “onlineshop” and “onlineshopping”. I was intrigued enough to dig into the posts’ captions, so I wrote another Python script to crawl the data. I was overly ambitious in trying to crawl 21 million posts, and I finally terminated the script after 30 minutes, by which point it had crawled 60,209 posts. I am not familiar with data mining techniques, which I suspect is why my crawling script ran so slowly. The API request limit was also an obstacle. Still, by looking at the crawled posts, I was able to find onlineshop posts from other countries, although Indonesia dominates the crawled posts as well.
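
A crawl of this kind can be sketched as a paginated loop with a post cap and an optional pause between requests. This is a minimal sketch, not the script used here: the `data` and `pagination.next_url` fields are assumptions about the response shape, `fetch_page` is a hypothetical callable that returns one decoded JSON page for a URL, and the fake pages below stand in for real API responses.

```python
import time


def crawl(first_url, fetch_page, max_posts=1000, pause_seconds=0.0):
    """Follow next_url links until the cap is reached or the pages run out."""
    posts = []
    url = first_url
    while url and len(posts) < max_posts:
        page = fetch_page(url)
        posts.extend(page.get("data", []))
        # Assumed pagination shape: a missing next_url means the last page.
        url = page.get("pagination", {}).get("next_url")
        if url and pause_seconds:
            time.sleep(pause_seconds)  # stay under the request limit
    return posts[:max_posts]


# Fake two-page response set, used instead of network calls.
FAKE_PAGES = {
    "page1": {"data": [{"id": "1"}, {"id": "2"}], "pagination": {"next_url": "page2"}},
    "page2": {"data": [{"id": "3"}], "pagination": {}},
}
posts = crawl("page1", FAKE_PAGES.__getitem__)
print([post["id"] for post in posts])
```

Capping the crawl up front avoids having to kill the script by hand, and the pause makes it easier to respect a request limit.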

This is an onlineshop post (raw) from Thailand:

🎉ฉลอง!! 17k follwer ต้อนรับเดือนแห่งความรัก😘😘😘 พร้อมส่งทุกใบ สั่งปุ้บ โอนปั้บ ส่งทันทีจ้า  กระเป๋า 890฿ ฟรี EMS  ย้ำ!! ทุกใบ 🎁🎁 ตลอดเดือนกุมภาพันธ์ 2558 line : peanutbn Tel : 0844667304 ✅รับประกันราคาถูกจริง 💯% ✅รับประกันงาน มีตำหนิเปลี่ยนคืนได้💯% ✅จัดส่งตามรูปที่ลงจริง 💯% ✅งานเกรดดี ไม่มั่วเกรดจร้า 💯% ✅ส่งจริง รีวิวเพียบ ไม่โกง💯% 👜กระเป๋ามาพร้อมอุปกรณ์ที่ครบเซ็ตทั้ง การ์ดการันตีงานและถุงผ้าทุกใบจร้าา 👜 #แคปเจ้อรูปภาพมาบอกแม่ค้าด้วยน้า# สนใจรีบติดต่อแม่ค้าเรยจร้าา  ราคาสบายกระเป๋าแต่คุณภาพสุดคุ้มราคาแบบนี้นานๆมาทีนะค้าา 🙏อย่ารอช้า โปรดีๆๆ มีที่นี่ที่เดียวนะค่ะ 🙏 งานคุณภาพนะค่ะ เกรดพรีเมี่ยม Topพรีเมี่ยม มิลเลอร์ hiend นำมาขายราคาส่งเรยจร้าไม่หลอกลวงมั่วเกรดแน่นอน ส่งของเป็นร้อยๆกล่องต่ออาทิต แม่ค้าไม่เอาชื่อมาเสียจร้า#lyn #chanel #prada #coach #tr15 #thailand #dior #lineid #onlineshop #adidas #ตามหา #ของถูก #กระเป๋า #bagbrandname #louisvuitton#longchamp#dior#zr1200#tr17#promotefree#casiothailand#newbalance#fitflop

And this one is apparently from the US:

Authentic Jansport BACK PACK for 1000 pesos only. ✖️✖️✖️✖️✖️✖️✖️✖️✖✖✖️✖️✖️✖️ ✈️Shipping is FREE nationwide 💻 Check our page on Facebook for more designs: Sole Goddess Online Shop 📱 Sms or vibe for orders and inquiries at 09261671908  HAPPY SHOPPING! ❤️ ✖️✖️✖️✖️✖️✖️✖️✖️✖️✖️✖️✖️✖️✖️✖️✖️ #jansportph #jansportcdo #jansportseller #directsupplier #authenticjansport #lookingfor #olshop #olshopph #bagsph #superbreak #onlineshop #jansportph #cdoonlineshop #cdobased #lookingforcdo #bagscdo #onlineshopcdo #freeshipping #cdobased #kathniel #kathrynbernardo #lizquen #authentic #jansportzebra #jansportgalaxy #janpsortcheetah #jansportduffelbag #duffelbag #duffelbagph #jansportoverexposed #overexposed #gymbag

From this, I became interested in what kinds of products are frequently sold on Instagram. The obvious place to start is the posts from Indonesia. However, those posts are mostly written in Bahasa, which makes further analysis harder because of library limitations. I therefore decided to collect posts from Bali, a tourist island in Indonesia, hoping to find many English posts there. I passed the “onlineshopbali” tag to the script and started crawling. At the end of the crawl, 29,574 posts had been retrieved, the oldest from January 2013. This number differs from the tag search at the beginning of this experiment (onlineshopbali: 69,795 posts). Here are some possible explanations, as my classmate Danny mentioned in his Cornell Confession’s Facebook project:

  • Some posts might be hidden or protected and require the member’s permission to view.
  • Some posts might have been deleted by the admin for specific reasons.

After crawling the posts (collected with username, timestamp and ID), I cleaned the data with the following process:

  • Convert Unix timestamp retrieved from Instagram to readable date format
  • Remove hash symbol, unrelated links and emojis
  • Remove numbers
  • Convert to lowercase letters
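
The cleaning steps above can be sketched with the standard library alone. This is one possible implementation under stated assumptions, not necessarily what the project scripts do: timestamps are rendered in UTC, every link is dropped (the original only removed unrelated ones), and “emoji” is approximated as any character in the Unicode symbol categories So and Sk, plus the variation selector and zero-width joiner that often accompany emoji.

```python
import re
import unicodedata
from datetime import datetime, timezone

URL_RE = re.compile(r"https?://\S+")
EMOJI_EXTRAS = {"\ufe0f", "\u200d"}  # variation selector, zero-width joiner


def to_date(unix_ts):
    """Convert a Unix timestamp (int or numeric string) to a UTC date string."""
    moment = datetime.fromtimestamp(int(unix_ts), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def is_emoji_like(ch):
    """Approximate emoji detection via Unicode symbol categories."""
    return unicodedata.category(ch) in {"So", "Sk"} or ch in EMOJI_EXTRAS


def clean_caption(text):
    """Strip links, hash symbols, emoji and numbers, then lowercase."""
    text = URL_RE.sub(" ", text)
    text = text.replace("#", " ")
    text = "".join(" " if is_emoji_like(ch) else ch for ch in text)
    text = re.sub(r"\d+", " ", text)
    return " ".join(text.lower().split())


raw = "Authentic Jansport BACK PACK for 1000 pesos only. ✈️Shipping is FREE #onlineshop http://example.com/x"
print(to_date(1423267200))
print(clean_caption(raw))
```

Replacing removed characters with a space rather than deleting them keeps neighbouring words from being glued together; the final `split`/`join` collapses the extra whitespace.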

Finally, the cleaned data are stored in a text file for further analysis. Given the limited time (as Danny also complained), I was not able to analyze the products or visualize the data. My next steps for this project will be:

  1. Decide whether to ignore non-English words or build a dictionary for Bahasa. (I couldn’t find one on the Internet. The only thing I found is an NLP service for Bahasa, http://nolimitid.com/, which is actually my friend’s startup in Indonesia.)
  2. Keep looking for the answer to my question: what kinds of products are frequently sold on Instagram?

Please find the scripts I made for this project here: https://github.com/girikuncoro/shopinstagram
You can also view the notebook on nbviewer: http://nbviewer.ipython.org/github/girikuncoro/shopinstagram/blob/master/shopInstagram.ipynb


