Arabic Text Genre Classification

Alaa M. EL-Halees


A text genre is a type of written text. Arabic text genre classification predicts the genre of a text document written in Arabic, independent of its topic. In this paper, we propose an approach that takes an Arabic document and classifies it into one of four genres: advertisements, news, subjective documents, and scientific documents. Since the word-frequency approach performs poorly on genre classification, we attempt to generate attributes based on the style of the text. The approach was evaluated on a corpus collected for this purpose. Using four machine learning methods, we compared our approach with the word-frequency approach and found that it outperforms this mainstream approach. We also found that the subjective and scientific genres are predicted more accurately than advertisements and news.


Text genre, text genre classification, Arabic language processing, text mining, machine learning methods.
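
To make the idea of style-based attributes concrete, the following is a minimal sketch of how a document might be mapped to a few stylistic measurements rather than to word counts. The function name `style_features` and the specific attributes (average word length, average sentence length, digit ratio, punctuation ratio, exclamation count) are assumptions chosen for illustration; they are not the attribute set used in this paper.

```python
import re
from typing import Dict

# Hypothetical illustration: these attributes are assumed, not taken from the paper.
ARABIC_PUNCT = "،؛؟"  # Arabic comma, semicolon, and question mark
PUNCT = ".,!?:;" + ARABIC_PUNCT
SENTENCE_END = re.compile(r"[.!?؟]+")


def style_features(text: str) -> Dict[str, float]:
    """Map a document to topic-independent stylistic measurements."""
    # Whitespace tokenization; punctuation stays attached to the words.
    words = text.split()
    # Split on runs of sentence-final punctuation and drop empty pieces.
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]

    # Guard against division by zero for empty input.
    n_words = max(len(words), 1)
    n_sentences = max(len(sentences), 1)
    n_chars = max(len(text), 1)

    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / n_sentences,
        "digit_ratio": sum(ch.isdigit() for ch in text) / n_chars,
        "punct_ratio": sum(ch in PUNCT for ch in text) / n_chars,
        "exclamation_count": float(text.count("!")),
    }


if __name__ == "__main__":
    print(style_features("هل هذا خبر؟ نعم!"))
```

The resulting dictionary can be turned into a fixed-length vector and passed to any standard classifier, which is how such attributes would be compared against a word-frequency representation.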

