Resource: Arabic Cyber Text Corpus

Reference Arabic In-Domain Cyber Text Corpus
Date of Submission March 9, 2020, 4:56 p.m.
Status accepted
ISLRN 798-080-268-332-8
Resource Type Primary Text
Media Type Text
Source
Language Arabic
Format/MIME Type Text
Size 22.9 MB
Description

This is an Arabic text corpus specific to the cybercrime domain.
Language: Arabic
Size: 22.9 MB
Documents: 1,273
Words: 2,009,110
The corpus was used to explore Automatic Short Answer Grading (ASAG) for the Arabic language.
The corpus was built automatically from text extracted from a collection of URLs gathered according to a list of key terms.
Key terms are combined and submitted as queries to a search engine, which returns a list of potentially relevant URLs.
The URLs are then inspected and validated.
Relevant web pages are retrieved and automatically cleaned of HTML tags; the extracted text is then added to the corpus.
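
The construction steps described above can be illustrated with a minimal sketch. It is not the creators' actual tooling: the function names, the choice of pairing key terms, and the relevance rule are all assumptions made for illustration, and the search-engine query and page download steps are left out because the record does not say which services were used. Only the Python standard library is used.

```python
from html.parser import HTMLParser
from itertools import combinations


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def build_queries(key_terms, size=2):
    """Combine key terms into search queries (pairs by default; an assumption)."""
    return [" ".join(combo) for combo in combinations(key_terms, size)]


def html_to_text(html):
    """Strip HTML tags and return the visible text as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def is_relevant(text, key_terms, min_hits=1):
    """Hypothetical validation rule: keep a page if enough key terms occur in it."""
    return sum(term in text for term in key_terms) >= min_hits
```

In this sketch, each string returned by `build_queries` would be sent to a search engine, each returned page would be passed through `html_to_text`, and only pages accepted by `is_relevant` (or by manual inspection) would be added to the corpus.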

Version 1
Creator Ouahrani Leila - Blida 1 University, Djamal Bennouar - Bouira University
Distributor Ouahrani Leila - Blida 1 University, Djamal Bennouar - Bouira University
Rights Holder Ouahrani Leila - Blida 1 University, Djamal Bennouar - Bouira University