A review of the available Arabic dialects datasets for Sentiment Analysis
Keywords:
Sentiment Analysis, Arabic Language, Arabic Dialects, Arabic Dialect Datasets, Public Arabic Datasets
Abstract
In recent decades, the resources available for Arabic natural language processing have grown and matured significantly. This includes work on Arabic sentiment analysis of utterances in both Modern Standard Arabic (MSA) and the various Arabic dialects (DA). With the prevalence of internet usage among Arab people, communication in dialect has become common, which poses a challenge for sentiment analysis because the dialects differ across Arab countries. MSA is notable for its rich, publicly available corpus of written resources such as news articles, books, and academic papers, whereas Arabic dialects lack such publicly available resources. Because the majority of Arabic exchanges on social media are written in local dialects, researchers have focused their investigations on DA rather than MSA. The objective of this study is to examine recent research that has made its datasets publicly available and to identify the most frequently used sources and domains in sentiment analysis for Arabic dialects. The findings reveal that Twitter is the source researchers most commonly use to obtain their datasets, while politics, sports, and movies are the most frequent domains for these datasets.