WORLDCOMP'07 Typing Instructions for Preparation of Final Camera-ready Papers

The Koran Database

 

Mahmoud Elsayess

Read~Verse Company

16182 Keats Circle, 92683

Westminster, California, USA

MahmoudElsayess@lessondesigner.com

Arnold Silverman

Read~Verse Company

16182 Keats Circle, 92683

Westminster, California, USA

ArnieSilverman@lessondesigner.com

 


Abstract

The Read~Verse website has unique and powerful search engines, developed by Read~Verse specialists, that permit quick and easy searching of the entire Koran (Quran) in Arabic script by one word, two words, topic, or sura and verse. Users can also search in English by topic or by sura and verse. Visitors can access over 17,000 Arabic words and their Arabic word roots. Read~Verse Internet-based applications display the Koran text in Arabic script along with 4 different English translations by 4 different authors.

 Please visit:  http://www.readverse.com/

1. Credit:

Dr. Misbah Eldereiny, Islamic Scholar: Dr. Eldereiny, who advises Read~Verse, deserves recognition and our continuing appreciation. The founder of the Straight Way School (1912 West Merced Avenue, West Covina, California, 91790, USA), he invested over 10 years collecting, editing, and proofreading the entire Koran in its current digital format. His work is an impressive lifetime achievement. Without his efforts, the system described herein (in development for over 15 years) would not have been possible.

2. Introduction:

Read~Verse has several search engines that help instructors and students find correct information about the Koran in real time. Instructors can assign students the task of critiquing the translations of 4 different authors against the Arabic text. There are up to 6236 unique critiquing tests for students. This teaching tool, with its huge database, was designed to provide Arabic instructors with a very large volume of material for testing the competence of Arabic-learning students at any level. The database is based on what is considered the ultimate testing document in Arabic, the Koran. Students who can proficiently translate a verse from the Koran into English can be said to have command of the Arabic language.

3. Koran text in digital format:

The transformation of the Koran from printed media into a digital format created multiple challenges. One of these was to maintain the complete accuracy of the digital text: each word and sentence had to be error-free. Dr. Eldereiny personally typed, edited, and verified the vowel marks (Tashkeel) of every word of the Koran in its new digital format. He then stored each verse in as many rows as the verse has words (with Tashkeel), one row per word. Thus, if a verse states “God is peace”, the text “God is peace” is inserted in 3 rows. An inquiry under either “God”, “is”, or “peace” then yields the verse “God is peace”. Below is an illustration of the basic structure of words in a table in one of the Koran databases.
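The one-row-per-word layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `build_word_rows` and `build_index` and the dictionary keys are hypothetical, and the paper's own English example stands in for an Arabic verse.

```python
from collections import defaultdict


def build_word_rows(verse_text):
    """Return one row per word; every row repeats the full verse text."""
    return [
        {"word": word, "full_ayah": verse_text}
        for word in verse_text.split()
    ]


def build_index(rows):
    """Map each word to the list of verses stored in its rows."""
    index = defaultdict(list)
    for row in rows:
        index[row["word"]].append(row["full_ayah"])
    return index


# The paper's example: a three-word verse produces three rows.
rows = build_word_rows("God is peace")
index = build_index(rows)

for word in ("God", "is", "peace"):
    print(word, "->", index[word])
```

Because every row carries the whole verse, a lookup on any single word returns the verse directly, at the cost of storing the verse text once per word.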

Dr. Eldereiny installed a special Arabic font (Scheherazade, http://scripts.sil.org/cms/scripts/page.php) for typing classical Koran text. He typed the verses of the Koran into digital format on his PC and stored them in a Microsoft Access database.

4. Migration of Koran database to Internet Server:

In order to make the Koran accessible globally, the Read~Verse staff converted the Access database into a MySQL database on an Internet server. After several trials and errors, we selected the text data type for all Arabic sentences. However, the text data type does not permit efficient sorting: since a text column can accommodate up to 65,535 characters, it would be extremely difficult to perform a reasonable sort on data of that length. We also discovered that the most suitable collation is utf8 (Unicode Transformation Format), which works very effectively with Arabic word contents. We had to declare that the system would use CHARSET=cp1256 (Microsoft Code Page 1256) for storing the Arabic contents. This is an example of a table structure:

CREATE TABLE `Koran_1` (
  `word_01` varchar(255) default NULL,
  `recurrence_02` double default NULL,
  `recurrences_03` double default NULL,
  `ayah_04` text,
  `full_ayah_05` text,
  `ayah_number_06` double default NULL,
  `sura_07` varchar(255) default NULL,
  `sura_name_08` varchar(255) default NULL,
  `sura_number_09` double default NULL,
  `attachment_10` varchar(255) default NULL,
  `word_without_attachment_11` varchar(255) default NULL,
  `root_12` varchar(255) default NULL,
  `word_id_13` double NOT NULL default '0',
  PRIMARY KEY (`word_id_13`),
  FULLTEXT KEY `root_12` (`root_12`),
  FULLTEXT KEY `word_without_attachment_11` (`word_without_attachment_11`)
) ENGINE=MyISAM DEFAULT CHARSET=cp1256;
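To make the practical effect of the cp1256 declaration concrete, the short Python sketch below encodes one Arabic word in both Code Page 1256 and UTF-8 and compares the byte counts. It uses Python's built-in codecs rather than MySQL, and the sample word is chosen only for illustration.

```python
# The four Arabic letters seen, lam, alef, meem, written as escapes
# so the source stays readable in left-to-right editors.
word = "\u0633\u0644\u0627\u0645"

cp1256_bytes = word.encode("cp1256")  # Microsoft Code Page 1256
utf8_bytes = word.encode("utf-8")

print(len(word), "letters")
print(len(cp1256_bytes), "bytes in cp1256")  # one byte per Arabic letter
print(len(utf8_bytes), "bytes in UTF-8")     # two bytes per Arabic letter

# Both encodings round-trip the word without loss.
print(cp1256_bytes.decode("cp1256") == word)
print(utf8_bytes.decode("utf-8") == word)
```

Since cp1256 is a single-byte encoding, every Arabic letter occupies exactly one byte, which keeps the byte-level view of a stored word simple.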

This is a row from a table that has the Koran data. Notice that the row stores the verse, the first word, the prefix, and the suffix as separate values (the Arabic sample values are not reproduced here).

Splitting each word into three parts (word, prefix, and suffix) makes it possible to match a visitor's search request against the suffix value of each row in the table and to fetch all the matching rows.
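The idea of matching a request against one part of a split word can be sketched with an in-memory SQLite table. Everything here is an assumption made for illustration: the table `koran_words`, its simplified column names, the helper `find_verses`, and the English stand-in rows are hypothetical and do not come from the Read~Verse database.

```python
import sqlite3

# Hypothetical mini-table: each row stores the three parts of one word
# plus the verse it came from (placeholder text, not real verses).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE koran_words (prefix TEXT, word TEXT, suffix TEXT, ayah TEXT)"
)
conn.executemany(
    "INSERT INTO koran_words VALUES (?, ?, ?, ?)",
    [
        ("", "peace", "", "placeholder verse 1"),
        ("", "peace", "ful", "placeholder verse 2"),
        ("un", "just", "", "placeholder verse 3"),
    ],
)

SEARCHABLE_COLUMNS = {"prefix", "word", "suffix"}


def find_verses(column, term):
    """Return the verses of all rows whose chosen part equals the term."""
    if column not in SEARCHABLE_COLUMNS:
        raise ValueError(f"cannot search on column {column!r}")
    cursor = conn.execute(
        f"SELECT ayah FROM koran_words WHERE {column} = ? ORDER BY rowid",
        (term,),
    )
    return [row[0] for row in cursor]


print(find_verses("word", "peace"))
print(find_verses("suffix", "ful"))
```

Storing the parts in separate columns lets an exact-match query find a word regardless of what is attached to it, without pattern matching over the full text.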

5. Cursor movement:

For Latin-script languages (the Romance languages, English, and others), the cursor moves from left to right, and typing appends letters at the right end of the typed text. Arabic text, however, runs from right to left, so letters must be appended at the left end of the typed text.

Our staff, therefore, had to develop software to accommodate the movement of Arabic text from right to left. Additionally, the software had to recognize the correct shape of a letter based on its position in the word (some Arabic letters can take up to 4 different shapes depending on their position in the word). For an example, please visit http://www.readverse.com/ali/en/search_engines/100_quran_word_full.html

(You may need to copy this URL into the address of the browser)
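The two properties discussed in this section, writing direction and position-dependent letter shapes, are both recorded in the Unicode character database. The sketch below uses Python's standard `unicodedata` module to show them for the letter beh; it illustrates the general concepts only and is not the software developed by the Read~Verse staff.

```python
import unicodedata

BEH = "\u0628"  # the base Arabic letter beh

# Unicode presentation forms: the four positional shapes of beh.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

for position, glyph in BEH_FORMS.items():
    # Compatibility normalization folds each shape back to the base letter.
    base = unicodedata.normalize("NFKC", glyph)
    print(position, "|", unicodedata.name(glyph), "|", base == BEH)

# Bidirectional class: 'L' is left-to-right, 'AL' is a right-to-left Arabic letter.
print(unicodedata.bidirectional("a"))
print(unicodedata.bidirectional(BEH))
```

All four shapes reduce to the same stored letter, which is why a word can be stored once and still be drawn with the right shape for each position.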

6. Sorting resume

MySQL sorts on the internal hexadecimal value of each column. The sorting routine compares the internal contents of the column in two rows at a time. Since MySQL carries out the comparison from left to right, the sorting of Arabic words became unpredictable.

  

Please notice that the MySQL sorted result is logically incorrect. The reason is that MySQL compared the Arabic words from left to right, as it would Latin text. The solution was to store each word that needs to be sorted in two columns: one column holds the displayable format and the other holds the hexadecimal value of that word. This solution required us to design and build new software that displays Arabic words in the correct format and produces the hexadecimal value of what is written without reversing the order of the hexadecimal values.

http://www.readverse.com//Arabic_Abacus_hex/Displaying_js_Arabic_cp1256/3000_cp1256_00_word_portal.html

(To view this URL, you may need to paste this link into your browser)

Now, as is evident in this example, sorting the hexadecimal values of these words produced the correct sorting.
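The two-column approach can be sketched as follows: each word is kept in displayable form alongside a hexadecimal key, and sorting is done on the key. This is a minimal sketch that assumes the key is simply the hexadecimal form of the word's cp1256 bytes in typing order; the helper `hex_key` and the sample words are hypothetical and the authors' actual key format may differ.

```python
def hex_key(word):
    """Hexadecimal form of the word's cp1256 bytes, in typing order."""
    return word.encode("cp1256").hex()


# Sample words written as escapes (transliterations in the comments).
words = [
    "\u0633\u0644\u0627\u0645",  # salam
    "\u0628\u0627\u0628",        # bab
    "\u0643\u062A\u0627\u0628",  # kitab
]

# Two "columns" per row: the displayable word and its hexadecimal key.
rows = [(word, hex_key(word)) for word in words]
rows.sort(key=lambda row: row[1])
sorted_words = [word for word, _ in rows]

for word, key in rows:
    print(key, word)
```

Because cp1256 uses one byte per letter and the key keeps the letters in typing order, comparing the keys as plain strings orders the words by their first letter, then their second, and so on.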

7. Arabic Abacus

Read~Verse also solves a language issue that is prevalent in the Arabic world. Since there are 16 different Arabic character sets, what a visitor types may not match what the Koran database holds, and the search engines may then fail to deliver the expected result. To make sure that the contents of the Koran database and what the visitor types are synchronized, our Read~Verse staff developed an Arabic abacus that visitors can use to type Arabic words without installing any additional software on their machines. What a visitor types will be in the same format as what the MySQL database has.

Notice in the abacus below that all letters are typed without any vowels (Tashkeel). Thus, a Saudi businessman and his counterpart in the Emirates can search the Koran for a specific word, and the result delivered to both of them will be identical.

http://www.readverse.com/ali/en/search_engines/100_quran_word_full.html?x=22&y=13

(To view this URL, you may need to paste this link into your browser)
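Searching without vowels implies that vowelled and unvowelled spellings of a word must compare as equal. One common way to achieve this, shown here only as a sketch and not as the method used by the Arabic abacus, is to strip the Tashkeel marks before comparing. The function name `strip_tashkeel` and the chosen set of marks are assumptions.

```python
# Arabic vowel and related marks: fathatan through sukun (U+064B to U+0652)
# plus the superscript alef (U+0670). This set is an assumption for the sketch.
TASHKEEL = {chr(code) for code in range(0x064B, 0x0653)} | {"\u0670"}


def strip_tashkeel(text):
    """Remove Tashkeel marks so that only the base letters remain."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


# "salam" typed with vowel marks and without them.
with_vowels = "\u0633\u064E\u0644\u064E\u0627\u0645\u064C"
without_vowels = "\u0633\u0644\u0627\u0645"

print(strip_tashkeel(with_vowels) == without_vowels)
```

With both the query and the stored words reduced to base letters, two visitors who type the same word with different vowel marks get the same match.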

8. Topic search

Searching the Koran to find a specific word can take a few minutes for experienced scholars; it can take a very long time for Muslims in general. Our staff designed and built a powerful topic search engine that anyone can quickly learn and use to find a specific topic in the Koran, e.g., peace, justice, or good deeds.

http://www.readverse.com/topics_of_quran_tables/category/en_select_category.php

(To view this URL, you may need to paste this link into your browser)