Abstract of “Auto Detection of Offensive Language in Social Media like Facebook and Twitter etc.” final year project
Spamming and cyberbullying are becoming more common as the use of social media platforms such as Facebook and Twitter grows.
This creates a need for automated identification and analysis models that help detect offensive content on these platforms.
The purpose of our Auto Detection of Offensive Language in Social Media project is to identify and categorize offensive language on social media. The task is divided into two subtasks:
subtask A detects whether a post is offensive, and subtask B categorizes whether the offense is targeted or
not. For this purpose, we used OLID (Offensive Language Identification Dataset), a dataset of tweets released by SemEval in 2019.
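The split into two subtasks can be sketched as a small loading step. This is a minimal illustration, not the project's actual preprocessing: it assumes a tab-separated file with columns named `id`, `tweet`, `subtask_a` (values `OFF`/`NOT`) and `subtask_b` (values `TIN`/`UNT`, or `NULL` for non-offensive tweets). The sample rows and the helper `load_subtasks` are hypothetical.

```python
import csv
import io

# Hypothetical sample in the assumed tab-separated layout; these rows are
# made up for illustration and are not taken from OLID.
SAMPLE_TSV = (
    "id\ttweet\tsubtask_a\tsubtask_b\n"
    "1\t@USER you are a disgrace\tOFF\tTIN\n"
    "2\twhat a lovely morning\tNOT\tNULL\n"
    "3\tthis is complete garbage\tOFF\tUNT\n"
)

def load_subtasks(tsv_text):
    """Return (texts, labels) pairs for subtask A and subtask B."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = list(reader)
    # Subtask A: every tweet, offensive (1) vs. not offensive (0).
    task_a = (
        [r["tweet"] for r in rows],
        [1 if r["subtask_a"] == "OFF" else 0 for r in rows],
    )
    # Subtask B: offensive tweets only, targeted (1) vs. untargeted (0).
    offensive = [r for r in rows if r["subtask_a"] == "OFF"]
    task_b = (
        [r["tweet"] for r in offensive],
        [1 if r["subtask_b"] == "TIN" else 0 for r in offensive],
    )
    return task_a, task_b

task_a, task_b = load_subtasks(SAMPLE_TSV)
```

Subtask B is built only from tweets labelled offensive in subtask A, since the targeted/untargeted question does not apply to non-offensive tweets.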
As a baseline, we trained four machine learning classifiers:
- Random Forest
- Naïve Bayes
- SVM
- Logistic Regression
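To make the baseline idea concrete, here is a minimal sketch of one of the four classifiers, a multinomial Naïve Bayes over word counts, written from scratch. The abstract does not say which features or library the project used, so the whitespace tokenization, the Laplace smoothing, the toy training sentences, and the function names `train_naive_bayes` and `predict` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(texts, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log-posterior for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # skip words never seen in training
            # Laplace (add-one) smoothed word likelihood.
            likelihood = (word_counts[label][word] + 1) / (
                total_words + len(vocabulary)
            )
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, made up for illustration (not from OLID).
train_texts = [
    "you are an idiot",
    "shut up you fool",
    "have a nice day",
    "thank you for the help",
]
train_labels = ["OFF", "OFF", "NOT", "NOT"]

model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "you fool"))         # OFF
print(predict(model, "what a nice day"))  # NOT
```

In practice the same comparison is usually run with a library implementation of each classifier on a shared feature representation, so that only the model differs between runs.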
Building on these machine
learning models, we trained deep learning models, including the state-of-the-art BERT
model, the newly introduced ELMo embeddings with SVM and Logistic Regression, a Convolutional Neural Network (CNN), LSTM, and BiLSTM with Word2Vec and GloVe
embeddings. Results showed that the BERT model and ELMo embeddings with SVM performed
well compared to the other models, each giving an F1-score of 0.84 in subtask A, whereas ELMo
with Logistic Regression performed the best, giving an F1-score of 0.921.
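
For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall, F1 = 2 · P · R / (P + R). The sketch below computes it for a single class and as an unweighted (macro) average over classes. The abstract does not state which averaging the reported scores use, so the macro variant is shown only as one common choice; the labels and predictions are made up.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_score(y_true, y_pred, label) for label in labels) / len(labels)

# Made-up gold labels and predictions for illustration.
y_true = ["OFF", "OFF", "NOT", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "NOT", "NOT", "OFF"]

off_f1 = f1_score(y_true, y_pred, "OFF")
overall = macro_f1(y_true, y_pred)
print(f"F1 (OFF) = {off_f1:.3f}, macro F1 = {overall:.3f}")
```

Macro averaging weights both classes equally, which matters when offensive posts are a minority of the data.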