Convolutional Neural Networks for Named Entity Recognition in Images of Documents
KTH, School of Computer Science and Communication (CSC).
2016 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

This work researches named entity recognition (NER) with respect to images of documents with a domain-specific layout, by means of Convolutional Neural Networks (CNNs). Examples of such documents are receipts, invoices, forms and scientific papers, the latter of which are used in this work. An NER task is first performed in a static setting, where a fixed number of entity classes is extracted per document. Networks based on the deep VGG-16 network are used for this task. Here, experimental evaluation shows that framing the task as a classification task, where the network classifies each bounding box coordinate separately, leads to the best network performance. Also, a multi-headed architecture is introduced, where the network has an independent fully-connected classification head per entity. VGG-16 achieves better performance with the multi-headed architecture than with its default, single-headed architecture. Additionally, it is shown that transfer learning does not improve the performance of these networks. Analysis suggests that the networks trained for the static NER task learn to recognize document templates, rather than the entities themselves, and therefore do not generalize well to new, unseen templates.
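The idea of classifying each bounding box coordinate separately, with one head per entity, can be illustrated with a small sketch. This is not the thesis implementation: the uniform binning scheme, the bin count, the page size and all function names below are assumptions made for illustration only. Each coordinate is mapped to a class index, and one box per entity is decoded by taking the highest-scoring class of each of that entity's four coordinate outputs.

```python
from typing import Dict, List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical box convention.
Box = Tuple[float, float, float, float]


def coord_to_class(value: float, extent: float, num_bins: int) -> int:
    """Map a pixel coordinate in [0, extent] to a class index in [0, num_bins - 1]."""
    if extent <= 0 or num_bins <= 0:
        raise ValueError("extent and num_bins must be positive")
    clamped = min(max(value, 0.0), extent)
    # The far edge (value == extent) is assigned to the last bin.
    return min(int(clamped * num_bins / extent), num_bins - 1)


def class_to_coord(index: int, extent: float, num_bins: int) -> float:
    """Map a class index back to the pixel at the centre of its bin."""
    return (index + 0.5) * extent / num_bins


def encode_box(box: Box, width: float, height: float, num_bins: int) -> List[int]:
    """Turn one box into four independent classification targets."""
    x_min, y_min, x_max, y_max = box
    return [
        coord_to_class(x_min, width, num_bins),
        coord_to_class(y_min, height, num_bins),
        coord_to_class(x_max, width, num_bins),
        coord_to_class(y_max, height, num_bins),
    ]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode_heads(
    head_scores: Dict[str, List[Sequence[float]]], width: float, height: float
) -> Dict[str, Box]:
    """Decode one box per entity head from four per-coordinate score vectors."""
    extents = (width, height, width, height)
    boxes: Dict[str, Box] = {}
    for entity, coord_scores in head_scores.items():
        boxes[entity] = tuple(
            class_to_coord(argmax(scores), extent, len(scores))
            for scores, extent in zip(coord_scores, extents)
        )
    return boxes
```

In this sketch, the decoded coordinate is the centre of the chosen bin, so with a sufficiently fine binning the decoding error is at most half a bin width per coordinate. Keeping a separate entry per entity in `head_scores` mirrors the independent head per entity described in the abstract.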

For a dynamic NER task, where the type and number of entity classes vary per document, experimental evaluation shows that, on large entities in the document, the Faster R-CNN object detection framework achieves comparable performance to the networks trained on the static task. Analysis suggests that Faster R-CNN generalizes better to new templates than the networks trained for the static task, as Faster R-CNN is trained on local features rather than the full document template. Finally, analysis shows that Faster R-CNN performs poorly on small entities in the image and suggestions are made to improve its performance.

Place, publisher, year, edition, pages
2016. 43 p.
Keyword [en]
Convolutional Neural Networks, Faster R-CNN, Named Entity Recognition, Images, Documents
National Category
Engineering and Technology
URN: urn:nbn:se:kth:diva-191213
OAI: diva2:955487
External cooperation
Dooer AB
Educational program
Master of Science - Software Engineering of Distributed Systems
Available from: 2016-08-26
Created: 2016-08-25
Last updated: 2016-08-26
Bibliographically approved

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 6286 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
