Arabic Language Analysis Toolkit.
KTH, School of Computer Science and Communication (CSC).
2011 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In this project report I investigate the possibility of creating a language analysis toolkit for Arabic by integrating existing tools and resources. The investigation shows that few tools and resources are both publicly available and fit for this purpose. Integrating the "best" tools was not straightforward: several problems in the area complicated the integration process. These were mainly due to the differing input/output formats of the tools, the granularity of their tag sets, and their tokenisation. Despite these complications, I present an integrated solution consisting of a part-of-speech tagger and a morphological analyser. The toolkit was trained on classical Arabic and tested on a sample text of modern standard Arabic. The results of the experiments are not impressive; however, the report outlines the difficulties of integrating tools today, and in the end the project achieved its main objective.

Abstract [sv]

In this degree project I investigate the possibility of creating a language analysis toolkit for Arabic by integrating existing open-source tools and resources. The tools and resources used in the investigation all had different input/output formats, tag granularity, and tokenisation methods. The investigation describes how integrating the "best" tools was not straightforward because of these differences between the resources. Despite these difficulties, I present a language analysis toolkit consisting of a part-of-speech (PoS) tagger and a morphological analyser (MA). The PoS tagger was trained on classical Arabic and used for tagging experiments on texts in the category of modern standard Arabic. The results of the experiments are not impressive, but the project is nevertheless considered to have achieved its main goal.

Place, publisher, year, edition, pages
2011.
Series
Trita-CSC-E, ISSN 1653-5715 ; 2011:133
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-130767
OAI: oai:DiVA.org:kth-130767
DiVA: diva2:654214
Educational program
Master of Science in Engineering - Computer Science and Technology
Uppsok
Technology
Supervisors
Examiners
Available from: 2013-10-07 Created: 2013-10-07

Open Access in DiVA

No full text

Other links

http://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2011/rapporter11/shouhani_rabiee_hajder_11133.pdf
