Twitter Topic Modeling using LDA

The aim of this project was to gain a better understanding of the Data Science discussions taking place on Twitter. I first scraped tweets matching the search term 'Data Science' over a period of one day. As basic EDA, I analyzed how the volume of tweets for this search term varied over that day. After cleaning the text, I identified the terms used most frequently in discussing 'Data Science' and generated a word cloud. I then used NLP techniques to find the 10 most frequent words in the discussions by count. Finally, I fit a Latent Dirichlet Allocation (LDA) model to extract the main topics discussed about the field.
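The cleaning and word-count steps could look roughly like the sketch below. It is an illustration only, not the project's actual code: the function names, the tiny stop-word list, and the sample tweets are all assumptions made up for the example, and it relies on the Python standard library alone.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for this sketch);
# a real analysis would use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "for",
              "on", "with", "rt", "amp"}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions and non-letters, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only ('#', digits, punctuation go)
    # Keep tokens longer than two characters that are not stop words.
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 2]


def top_words(tweets, n=10):
    """Return the n most frequent cleaned tokens across all tweets, by count."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tweet(tweet))
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample tweets, used only to show the call pattern.
    sample_tweets = [
        "RT @user: Learning #DataScience with Python! https://t.co/abc",
        "Data science and machine learning jobs are growing",
        "Great data science tutorial on topic modeling",
    ]
    for word, count in top_words(sample_tweets, n=10):
        print(word, count)
```

The token lists returned by clean_tweet are also the kind of input a topic model such as LDA expects, so the same cleaning step can feed both the word counts and the topic extraction.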

The project files can be found here.