What is the Required Level of Data Cleaning? A Research Evaluation Case
KTH, School of Industrial Engineering and Management (ITM), Industrial Economics and Management (Dept.), Sustainability and Industrial Dynamics. ORCID iD: 0000-0003-1292-8239
Vrije Universiteit Amsterdam.
2016 (English) In: Journal of Scientometric Research, ISSN 2321-6654, Vol. 5, no 1, 7-12 p. Article in journal (Refereed) Published
Abstract [en]

Bibliometric methods depend heavily on the quality of data, and cleaning and disambiguating data are very time-consuming. Therefore, considerable effort is devoted to developing better and faster tools for disambiguating the data (e.g., Gurney et al. 2012). Parallel to this, one may ask to what extent data cleaning is needed, given the intended use of the data. To what extent is there a trade-off between the type of questions asked and the level of cleaning and disambiguating required? When evaluating individuals, a very high level of data cleaning is required, but for other types of research questions, one may accept certain levels of error, as long as these errors do not correlate with the variables under study. In this paper, we revisit an earlier case study that handled the data in a rather crude way, on the expectation that the unavoidable error would even out. We then carry out a sophisticated cleaning and disambiguation of the same dataset and repeat the original analysis. We compare the results and discuss conclusions about the required level of data cleaning.

Place, publisher, year, edition, pages
Wolters Kluwer Health and Medknow Publications, 2016. Vol. 5, no 1, 7-12 p.
Keyword [en]
Coupling data sets, Data cleaning disambiguation, Data error
National Category
Other Social Sciences not elsewhere specified
Research subject
Industrial Engineering and Management
URN: urn:nbn:se:kth:diva-191463 DOI: 10.5530/jscires.5.1.3 OAI: diva2:956590
Riksbankens Jubileumsfond

QC 20160907

Available from: 2016-08-30 Created: 2016-08-30 Last updated: 2016-09-07 Bibliographically approved

Search in DiVA

By author/editor
Sandström, Ulf