What is the Required Level of Data Cleaning?: A Research Evaluation Case
KTH, School of Industrial Engineering and Management (ITM), Industrial Economics and Management (Dept.), Sustainability and Industrial Dynamics.
ORCID iD: 0000-0003-1292-8239
2016 (English)
In: Journal of Scientometric Research, ISSN 2320-0057, Vol. 5, no 1, p. 7-12
Article in journal (Refereed) Published
Abstract [en]

Bibliometric methods depend heavily on the quality of data, and cleaning and disambiguating data are very time-consuming. Considerable effort is therefore devoted to developing better and faster tools for disambiguating the data (e.g., Gurney et al. 2012). Parallel to this, one may ask to what extent data cleaning is needed, given the intended use of the data. To what extent is there a trade-off between the type of questions asked and the level of cleaning and disambiguating required? When evaluating individuals, a very high level of data cleaning is required, but for other types of research questions, one may accept certain levels of error, as long as these errors do not correlate with the variables under study. In this paper, we revisit an earlier case study in which the data were handled rather crudely, on the expectation that the unavoidable errors would even out. We then perform a sophisticated cleaning and disambiguation of the same dataset and repeat the earlier analysis. We compare the results and discuss what they imply about the required level of data cleaning.

Place, publisher, year, edition, pages
PHCOG NET, 2016. Vol. 5, no 1, p. 7-12
Keywords [en]
Coupling data sets, Data cleaning disambiguation, Data error
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-215865
DOI: 10.5530/jscires.5.1.3
ISI: 000411805400003
OAI: oai:DiVA.org:kth-215865
DiVA, id: diva2:1149450
Note

QC 20171016

Available from: 2017-10-16 Created: 2017-10-16 Last updated: 2018-01-13
Bibliographically approved

Authority records BETA

Sandström, Ulf
