Arabic Language Analysis Toolkit.
KTH, School of Computer Science and Communication (CSC).
2011 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In this project report I investigate the possibility of creating a language analysis toolkit for Arabic by integrating existing tools and resources. The investigation shows that few tools and resources are both available for public use and fit for this purpose. Integrating the "best" tools was not straightforward: many problems exist in this area, and they complicated the integration process. These problems stemmed mainly from differences between the tools in input/output formats, in the granularity of their tag sets, and in tokenisation. Despite these complications, I present an integrated solution consisting of a part-of-speech tagger and a morphological analyser. The part-of-speech tagger was trained on classical Arabic and tested on a sample text of modern standard Arabic. The results of the experiments are modest; however, the report outlines the difficulties of integrating such tools today, and in the end the project achieved its main objective.
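The tag-set granularity problem described in the abstract can be made concrete with a minimal sketch in Python. It assumes two tools that tag the same, already aligned tokens, one with fine-grained tags and one with a coarse tag set. All tag names, the mapping table and the helper functions are hypothetical and are not the tag sets or code used in the thesis.

```python
# Hypothetical mapping from an analyser's fine-grained tags to a tagger's
# coarse tags. These labels are invented for illustration only.
FINE_TO_COARSE = {
    "NOUN": "NOUN",
    "NOUN_PROP": "NOUN",
    "NOUN_NUM": "NOUN",
    "VERB_PERF": "VERB",
    "VERB_IMPERF": "VERB",
    "PREP": "PART",
    "CONJ": "PART",
}


def to_coarse(fine_tags: list[str], unknown: str = "X") -> list[str]:
    """Collapse fine-grained tags to the coarse set; unmapped tags become `unknown`."""
    return [FINE_TO_COARSE.get(tag, unknown) for tag in fine_tags]


def agreement(a: list[str], b: list[str]) -> float:
    """Fraction of token positions where two coarse tag sequences agree.

    Both sequences must cover the same tokens, so differing tokenisations
    have to be aligned before this comparison is meaningful.
    """
    if len(a) != len(b):
        raise ValueError("sequences differ in length; align tokenisation first")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    analyser_tags = ["CONJ", "VERB_PERF", "NOUN_PROP", "PREP", "NOUN"]
    tagger_tags = ["PART", "VERB", "NOUN", "PART", "ADJ"]
    coarse = to_coarse(analyser_tags)
    print(coarse)                            # ['PART', 'VERB', 'NOUN', 'PART', 'NOUN']
    print(agreement(coarse, tagger_tags))    # 0.8
```

Mapping everything down to the coarser tag set is only one possible design choice; it loses the analyser's extra detail but makes the two outputs comparable. The length check reflects the separate tokenisation problem: if the tools split words differently, their tag sequences cannot be compared position by position.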

Abstract [sv] (translated from Swedish)

In this degree project I investigate the possibility of creating a language analysis toolkit for Arabic by integrating existing tools and resources based on open source. The tools and resources used in the investigation all had different input/output formats, different levels of tag detail and different tokenisation methods. The investigation describes how integrating the "best" tools was not straightforward because of these differences between the resources. Despite these difficulties, I present a language analysis toolkit consisting of a part-of-speech (PoS) tagger and a morphological analyser (MA). The PoS tagger was trained on classical Arabic and used for tagging experiments on texts belonging to the category of modern standard Arabic. The results of the experiments are not impressive, but the project is nevertheless considered to have achieved its main goal.

Place, publisher, year, edition, pages
Trita-CSC-E, ISSN 1653-5715 ; 2011:133
National Category
Computer Science
URN: urn:nbn:se:kth:diva-130767
OAI: diva2:654214
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2013-10-07 Created: 2013-10-07

Open Access in DiVA: no full text
