Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Algoritm för automatiserad generering av metadata
KTH, School of Technology and Health (STH).
KTH, School of Technology and Health (STH), Medical Engineering, Computer and Electronic Engineering.
2015 (Swedish)Independent thesis Basic level (university diploma), 10 credits / 15 HE creditsStudent thesis
Abstract [en]

Sveriges Radio stores their data in large archives which makes it hard to retrieve specific information. The sheer size of the archives makes retrieving information about a specific event difficult and causes a big problem. To solve this problem a more consistent use of metadata is needed. This resulted in an investigation about metadata and keyword genera-tion.The appointed task was to automatically generate keywords from transcribed radio shows. This included an investigation of which systems and algorithms that can be used to generate keywords, based on previous works. An application was also developed which suggests keywords based on a text to a user. This application was tested and compared to other al-ready existing software, as well as different methods/techniques based on both linguistic and statistic algorithms. The resulting analysis displayed that the developed application generated many accurate keywords, but also a large amount of keywords in general. The comparison also showed that the recall for the developed algorithm got better results than the already existing software, which in turn produced a better precision in their keywords.

Abstract [sv]

Sveriges Radio sparar sin data i stora arkiv vilket gör det svårt att hitta specifik information. På grund av denna storlek blir uppgiften att hitta specifik information om händelser ett stort problem. För att lösa problemet krävs en mer konsekvent användning av metadata, därför har en undersökning om metadata och nyckelordsgenerering gjorts.Arbetet gick ut på att utveckla en algoritm som automatisk kan generera nyckelord från transkriberade radioprogram. Det ingick också i arbetet att göra en undersökning av tidigare arbeten för att se vilka system och algoritmer som kan användas för att generera nyckelord. Dessutom utvecklades en applikation som generar färdiga nyckelord som förslag till en användare. Denna applikation jämfördes och utvärderades med redan existerande program. Metoderna som använts bygger på både lingvistiska och statistiska algoritmer. En analys av resultaten gjordes och visade att den utvecklade applikationen genererade många precisa nyckelord, men även till antalet stora mängder nyckelord. Jämförelsen med ett redan existe-rande program visade att täckningen var bättre för den utvecklade applikationen, samtidigt som precisionen var bättre för det redan existerande programmet.

Place, publisher, year, edition, pages
2015. , 55 p.
Series
TRITA-STH, 023
Keyword [sv]
Metadata, nyckelord, textutvinning, naturliga språk, algoritmer.
National Category
Other Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-168915OAI: oai:DiVA.org:kth-168915DiVA: diva2:818785
External cooperation
Sveriges Radio
Subject / course
Computer Technology, Program- and System Development
Educational program
Bachelor of Science in Engineering - Computer Engineering
Supervisors
Examiners
Available from: 2015-06-10 Created: 2015-06-09 Last updated: 2015-06-10Bibliographically approved

Open Access in DiVA

fulltext(725 kB)94 downloads
File information
File name FULLTEXT01.pdfFile size 725 kBChecksum SHA-512
e595473571274affe6f40bb4f1ec68d95c78fed6b5f1b7259f24eb42ac2299e7fa350865af837cd544dd5ffaf4914758b68ac88129bb2c9595733ec16611d4c1
Type fulltextMimetype application/pdf

By organisation
School of Technology and Health (STH)Computer and Electronic Engineering
Other Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 94 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 202 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf