Arabic Cyber Text Corpus

Full Official Name: Arabic In-Domain Cyber Text Corpus
Submission date: March 9, 2020, 4:56 p.m.

This is an Arabic text corpus covering the specific domain of cybercrime.

Language: Arabic
Size: 22.9 MB
Documents: 1,273
Words: 2,009,110

The corpus was used to explore Automatic Short Answer Grading (ASAG) for the Arabic language. It was built automatically from texts extracted from a collection of URLs selected according to a list of key terms. Key terms are combined and submitted as queries to a search engine, which returns a list of potentially relevant URLs. The URLs are then inspected and validated. Relevant web pages are retrieved and automatically cleaned of HTML tags; the extracted text is then added to the corpus.
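The collection steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tooling actually used to build the corpus: the function names (`build_queries`, `strip_html`, `build_corpus`), the choice of pairing key terms two at a time, and the use of Python's standard-library HTML parser are all hypothetical. The search-engine query and the URL validation step are left out, since the description does not say which engine or criteria were used; the sketch assumes the validated pages have already been downloaded.

```python
from html.parser import HTMLParser
from itertools import combinations


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def build_queries(key_terms, size=2):
    """Combine key terms into query strings (assumption: `size` terms each)."""
    return [" ".join(combo) for combo in combinations(key_terms, size)]


def strip_html(html):
    """Return the text of an HTML page with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_corpus(pages):
    """Build a list of documents from a mapping of validated URL -> raw HTML."""
    corpus = []
    for _url, html in pages.items():
        text = strip_html(html)
        if text:  # drop pages that contain no text after cleaning
            corpus.append(text)
    return corpus
```

For instance, `build_queries(["a", "b", "c"])` yields one query per pair of terms, and `build_corpus` would be called on the pages kept after manual or automatic validation of the returned URLs.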

Creator(s)
Distributor(s)
Right Holder(s)