Social media networks account for a large percentage of the content available on the Internet, and users
now spend most of their online time interacting with them. The seemingly small pieces of information
added by billions of people together form an enormous, rapidly changing dataset. Searching,
correlating, and understanding billions of individual posts is a significant technical problem; even the
data from a single site such as Twitter can be difficult to manage. In this paper, we present Coalmine, a
social network data-mining system. We describe the overall architecture of Coalmine, including the
capture, storage, and search components. We also describe our experience pulling 150-350 GB of
Twitter data per day through Twitter's REST API. Specifically, we discuss our experience with the
evolution of the Twitter data APIs from 2011 to 2012 and present strategies for maximizing the amount
of data collected. Finally, we describe our experiences looking for evidence of botnet command and
control channels and examining patterns of spam in the Twitter dataset.
© 2012 Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
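To put the stated collection volume of 150-350 GB of Twitter data per day in perspective, the Python sketch below converts it into an average sustained transfer rate and computes an even spacing between API calls under a request quota. It is a back-of-envelope illustration, not Coalmine's implementation. It assumes decimal gigabytes (1 GB = 10^9 bytes), and the quota of 180 requests per 15-minute window is a hypothetical parameter, not a figure taken from the paper.

```python
# Back-of-envelope sketch. Only the 150-350 GB/day range comes from the abstract;
# the decimal-GB convention and the request quota below are assumptions.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate_bytes_per_sec(gb_per_day: float) -> float:
    """Average throughput (bytes/s) needed to move gb_per_day decimal gigabytes in one day."""
    return gb_per_day * 1e9 / SECONDS_PER_DAY


def request_interval_sec(requests_per_window: int, window_sec: float) -> float:
    """Even spacing between calls so that a fixed per-window quota is never exceeded."""
    return window_sec / requests_per_window


if __name__ == "__main__":
    for gb in (150, 350):
        mb_per_sec = sustained_rate_bytes_per_sec(gb) / 1e6
        print(f"{gb} GB/day ~ {mb_per_sec:.2f} MB/s sustained")
    # Hypothetical quota: 180 requests per 15-minute (900 s) window.
    print(f"one request every {request_interval_sec(180, 900):.1f} s")
```

At these volumes a collector would need to sustain roughly 1.7 to 4.1 MB/s around the clock, so pacing requests evenly to stay inside a quota, rather than bursting and then idling until the window resets, is one plausible way to keep throughput steady.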
Citation
Joshua S. White, Jeanna N. Matthews, and John L. Stacy
"Coalmine: an experience in building a system for social media analytics", Proc. SPIE 8408, Cyber Sensing 2012, 84080A (May 1, 2012); doi:10.1117/12.918933; http://dx.doi.org/10.1117/12.918933