Movie Review Classification using Naive Bayes

This project was part of my IMT 574 course, 'Data Science III: Machine Learning & Econometrics'. The aim was to build a model that classifies a movie review as either 'Fresh' (good) or 'Rotten' (bad). First, we cleaned the dataset and converted the text into a bag-of-words representation. We then calculated the necessary log probabilities to build a Naive Bayes classifier from scratch. Finally, we used k-fold cross-validation to estimate the model's accuracy, which came out at about 63% on the held-out folds.
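
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes a multinomial Naive Bayes variant with Laplace (add-one) smoothing, a simple regex tokenizer, and interleaved folds; the function names `tokenize`, `train`, `predict`, and `k_fold_accuracy` and the `alpha` parameter are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (the bag of words)."""
    return re.findall(r"[a-z']+", text.lower())


def train(reviews, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods for each class.

    alpha is the Laplace smoothing constant (an assumption of this sketch).
    """
    vocab = {word for review in reviews for word in tokenize(review)}
    word_counts = {label: Counter() for label in set(labels)}
    for review, label in zip(reviews, labels):
        word_counts[label].update(tokenize(review))

    class_counts = Counter(labels)
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for label, counts in word_counts.items():
        # log P(class)
        model["log_prior"][label] = math.log(class_counts[label] / len(labels))
        # log P(word | class), smoothed so unseen words never get probability 0
        total = sum(counts.values()) + alpha * len(vocab)
        model["log_lik"][label] = {
            word: math.log((counts[word] + alpha) / total) for word in vocab
        }
    return model


def predict(model, review):
    """Return the class with the highest summed log probability."""
    # Words that never appeared in training are ignored.
    words = [w for w in tokenize(review) if w in model["vocab"]]
    scores = {
        label: model["log_prior"][label]
        + sum(model["log_lik"][label][w] for w in words)
        for label in model["log_prior"]
    }
    return max(scores, key=scores.get)


def k_fold_accuracy(reviews, labels, k=5):
    """Average held-out accuracy over k interleaved folds (requires k <= len(reviews))."""
    n = len(reviews)
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [reviews[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        model = train(train_x, train_y)
        correct = sum(predict(model, reviews[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

Working with summed log probabilities instead of multiplied raw probabilities avoids numerical underflow on long reviews. With a real dataset, the reviews would normally be shuffled before being split into folds so that each fold has a similar mix of 'Fresh' and 'Rotten' labels.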

The project files can be found here.