Evaluation of webscraping tools for creating an embedded webwrapper
KTH, School of Computer Science and Communication (CSC).
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Alternative title: Utvärdering av verktyg ämnade för webscraping vid skapandet av en webwrapper (Swedish; in English: "Evaluation of tools intended for web scraping when creating a web wrapper")
Abstract [en]

This report aims to evaluate three different tools for web data extraction in Java for the company Top of Europe. The tools evaluated were jArvest, Jaunt and Selenium WebDriver. Through a case implementation that wrapped parts of a specific web application, web document data was to be automatically identified, extracted and structured. The tools were then compared and evaluated based on the results of the case implementation. The results showed jArvest to be non-functional, while the other two alternatives provided similar performance but offered somewhat different strengths: Jaunt provides a good interface to the HTTP protocol and greater possibilities for wrapping DOM elements, while Selenium WebDriver supports JavaScript, AJAX and some aspects of the graphical interface.

Place, publisher, year, edition, pages
2016. 44 p.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-186099
OAI: oai:DiVA.org:kth-186099
DiVA: diva2:925430
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Industrial Engineering and Management
Presentation
2015-10-01, Fantum, Lindstedtsvägen 24, Stockholm, 15:15 (English)
Available from: 2016-05-11
Created: 2016-05-02
Last updated: 2016-05-11
Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 1367 kB)
SHA-512 checksum: 9c207256a138ff24fee02caf8a9e1a358903476b3db8ab027fb3c2be5a1c9c8bef4d3e620e3e9c1c5cb4aa484274d3568f616bd4777968c7cbba26316465d32b
