YouTube Data Analysis using Linear Regression and Neural Networks

Data Visualization using Python

Supervisor: Prof. Mark Vogelsberger (Massachusetts Institute of Technology)

Jul 2021 - Sep 2021

Publisher: IEEE


The popularity of YouTube offers an effective channel for spreading epidemic prevention knowledge, provided the video preferences of viewers in different locations can be analyzed. However, analyzing these preferences is challenging because YouTube viewers are geographically dispersed and video categories and subcategories are hard to distinguish. This paper combines linear regression and neural networks to address both the geographical and the categorical difficulties and to improve the accuracy of the resulting models. First, the YouTube dataset is preprocessed and variables are extracted, including the category, subcategory, country, number of subscribers, and view count of each YouTuber. Then, linear regression and neural network models are trained to classify the data and find correlations among these variables. Finally, Matplotlib, Google Charts, and Tableau are used to visualize the results by video category and geographical location. The accuracy of the linear regression and neural network models is evaluated using the coefficient of determination (R-squared). Both models identify the trending types of videos and show a positive correlation between the number of viewers and the number of subscribers. The experimental results reveal a marked tendency among users to watch films and listen to music, as well as a concentration of YouTube users in India and the U.S.; based on these two characteristics, the paper proposes targeted COVID-19 prevention campaigns.
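To illustrate the regression and R-squared steps described above, here is a minimal sketch, not the paper's code: it fits an ordinary least-squares line relating subscriber count to view count with NumPy and scores it with R-squared. The function names `fit_linear_regression` and `r_squared` are hypothetical, and the subscriber and view numbers are made-up toy values, not data or results from the study.

```python
import numpy as np


def fit_linear_regression(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: one column for x, one column of ones for the intercept term
    X = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)
    return slope, intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values (in millions); NOT taken from the paper's dataset
subscribers = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
views = np.array([110.0, 205.0, 320.0, 390.0, 510.0])

slope, intercept = fit_linear_regression(subscribers, views)
predicted = slope * subscribers + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"R^2={r_squared(views, predicted):.3f}")
```

A positive slope corresponds to the positive subscriber/view correlation the abstract reports, and an R-squared close to 1 means the fitted line explains most of the variance in view counts. A neural network model could be scored with the same `r_squared` helper by passing its predictions in place of `predicted`.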