A Topic Analysis Approach To Revealing Themes In The Australian Twittersphere

Brenda Moon


This paper investigates techniques for identifying the topics discussed in one week of tweets from the Australian Twittersphere. Tweets were extracted from the Tracking Infrastructure for Social Media Analysis (TrISMA) (Bruns, Burgess & Banks et al., 2016), a comprehensive dataset that captures all tweets by 2.8 million Australian accounts. Bruns & Moe (2014) suggest that most Twitter research to date has focussed on “the macro layer of Twitter communication” (pp. 23-24), partly because it is methodologically difficult to move beyond this layer. TrISMA enables the selection of a dataset based on a date range, rather than being limited to keywords or hashtags. As a result, the extracted one-week dataset of 5.5 million tweets is not focussed on a particular topic, and contains tweets from all three layers of Twitter communication defined by Bruns & Moe (2014), not just predominantly from the macro layer of hashtag conversations. This study seeks to identify the themes present in this dataset using Latent Dirichlet Allocation (LDA) (Blei, Ng & Jordan, 2003).

The results of the topic analysis are triangulated with the themes found by other types of analysis, carried out as part of a wider methodological study that determines other metrics for the same week. The ability to identify the themes present in a dataset has many applications, including identifying changes in themes over time, extracting subsets of the corpus for further study, and understanding the diversity of themes present.
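
As a rough illustration of what fitting LDA to short texts involves, the following is a minimal sketch in plain Python. It is not the pipeline used in this study: the abstract does not specify an implementation, and the sketch uses collapsed Gibbs sampling rather than the variational inference described by Blei, Ng & Jordan (2003). The function name lda_gibbs, the toy tweets, and the hyperparameter values (alpha, beta, the number of topics and iterations) are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns the top words per topic and the topic mixture per document.
    All default hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Summarise: most frequent words per topic, smoothed topic share per document.
    top_words = [
        [w for w, count in topic_word[t].most_common(3) if count > 0]
        for t in range(num_topics)
    ]
    doc_mix = [
        [(doc_topic[d][t] + alpha) / (len(doc) + alpha * num_topics)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return top_words, doc_mix


# Invented, pre-tokenised toy tweets (not drawn from TrISMA).
toy_tweets = [
    ["footy", "final", "score", "footy"],
    ["score", "team", "final"],
    ["election", "vote", "policy"],
    ["vote", "policy", "election", "debate"],
]
top_words, doc_mix = lda_gibbs(toy_tweets, num_topics=2)
```

Here top_words gives a short list of characteristic words for each topic, and doc_mix gives each tweet's estimated share of each topic. A per-tweet mixture of this kind is what would make it possible to extract subsets of a corpus by theme or to compare themes across time periods.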


Twitter, social media, big data, topic analysis, Latent Dirichlet Allocation
