DLRL Hadoop cluster

20-node Hadoop cluster running Cloudera Hadoop 5.6.0

1. Hadoop Service

2. Tweet Collections and Services

| Project | Collection name | Total # of tweets | Started at | Collection tool | Analysis service |
|---------|------------------|-------------------|------------|-----------------|------------------|
| IDEAL | Archive DB | 1,423,695,089 | 2012 | yTK [1] | Analysis using Hadoop |
| IDEAL | Collect DB | 3,230,499 | Daily | yTK [1] | N/A |
| IDEAL | 1% sampling | 72,462,020 | 2015 | DMI-TCAT [2] | Analysis |
| IDEAL | User following | 9,500,316 | 2015 | DMI-TCAT [2] | Analysis |
| IDEAL | Keyword tracking | 20,518,664 | 2015 | DMI-TCAT [2] | Analysis |
| GETAR | Collection | 100,296,496 | 2015 | yTK [1] | Analysis using Hadoop |
| GETAR | Collection | 121,581,828 | 2016.9 | SFM [3] | Analysis |
| NIH | Keyword tracking | 550,578 | 2015 | DMI-TCAT [2] | Analysis |
| Total | | 1,751,835,490 | | | |
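
As a quick consistency check on the figures above, the per-collection counts can be summed and compared with the stated total. The following is a minimal sketch: the counts are copied from the table, while the variable and function names (`COLLECTIONS`, `total_tweets`, `tweets_by_tool`) are illustrative and not part of any cluster tooling.

```python
from collections import defaultdict

# (project, collection name, collection tool, tweet count), copied from the table.
COLLECTIONS = [
    ("IDEAL", "Archive DB", "yTK", 1_423_695_089),
    ("IDEAL", "Collect DB", "yTK", 3_230_499),
    ("IDEAL", "1% sampling", "DMI-TCAT", 72_462_020),
    ("IDEAL", "User following", "DMI-TCAT", 9_500_316),
    ("IDEAL", "Keyword tracking", "DMI-TCAT", 20_518_664),
    ("GETAR", "Collection", "yTK", 100_296_496),
    ("GETAR", "Collection", "SFM", 121_581_828),
    ("NIH", "Keyword tracking", "DMI-TCAT", 550_578),
]


def total_tweets(rows):
    """Sum the tweet counts over all rows."""
    return sum(count for _, _, _, count in rows)


def tweets_by_tool(rows):
    """Group tweet counts by the tool used to collect them."""
    totals = defaultdict(int)
    for _, _, tool, count in rows:
        totals[tool] += count
    return dict(totals)


if __name__ == "__main__":
    print(f"Total: {total_tweets(COLLECTIONS):,}")  # Total: 1,751,835,490
    for tool, count in sorted(tweets_by_tool(COLLECTIONS).items()):
        print(f"{tool}: {count:,}")
```

The computed sum agrees with the Total row of the table.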

Open-source tools for collecting tweets

  1. yourTwapperKeeper (yTK)
  2. DMI-TCAT (Digital Methods Initiative Twitter Capture and Analysis Toolset)
  3. Social Feed Manager (SFM)

3. Web Collections

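| Project | Collection name | Hosted by | Service | Location |
|---------|------------------|-----------|---------|----------|
| IDEAL | IA webpage collection | Internet Archive | Archive-It | IA link |
| IDEAL | IA webpage collection | Virginia Tech | Hadoop | /data/IACollections on the head node |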
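
To see what is stored under the Virginia Tech copy, the directory can be listed from the head node. The sketch below assumes `/data/IACollections` is an ordinary local directory on the head node; the page does not say whether the path is local or in HDFS, and if it is in HDFS the equivalent would be `hdfs dfs -ls /data/IACollections`. The function name `list_collections` is hypothetical.

```python
import os


def list_collections(root="/data/IACollections"):
    """Return the sorted names of entries directly under root.

    Assumes root is a local directory; returns an empty list if it
    does not exist on this machine.
    """
    if not os.path.isdir(root):
        return []
    return sorted(entry.name for entry in os.scandir(root))


if __name__ == "__main__":
    for name in list_collections():
        print(name)
```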