Inventor disambiguation is an increasingly important issue for users of patent data. We propose and test a number of refinements to the original Massacrator algorithm, originally proposed by Lissoni et al. (The keins database on academic inventors: methodology and contents, 2006) and now applied to APE-INV, a free access database funded by the European Science Foundation. Following Raffo and Lhuillery (Res Policy 38:1617–1627, 2009) we describe disambiguation as a three step process: cleaning&parsing, matching, and filtering. By means of sensitivity analysis, based on MonteCarlo simulations, we show how various filtering criteria can be manipulated in order to obtain optimal combinations of precision and recall (type I and type II errors). We also show how these different combinations generate different results for applications to studies on inventors’ productivity, mobility, and networking; and discuss quality issues related to linguistic issues. The filtering criteria based upon information on inventors’ addresses are sensitive to data quality, while those based upon information on co-inventorship networks are always effective. Details on data access and data quality improvement via feedback collection are also discussed.

How to kill inventors: testing the Massacrator© algorithm for inventor disambiguation

Pezzoni, Michele;Lissoni, Francesco;Tarasconi, Gianluca
2014

Abstract

Inventor disambiguation is an increasingly important issue for users of patent data. We propose and test a number of refinements to the original Massacrator algorithm, originally proposed by Lissoni et al. (The keins database on academic inventors: methodology and contents, 2006) and now applied to APE-INV, a free access database funded by the European Science Foundation. Following Raffo and Lhuillery (Res Policy 38:1617–1627, 2009) we describe disambiguation as a three step process: cleaning&parsing, matching, and filtering. By means of sensitivity analysis, based on MonteCarlo simulations, we show how various filtering criteria can be manipulated in order to obtain optimal combinations of precision and recall (type I and type II errors). We also show how these different combinations generate different results for applications to studies on inventors’ productivity, mobility, and networking; and discuss quality issues related to linguistic issues. The filtering criteria based upon information on inventors’ addresses are sensitive to data quality, while those based upon information on co-inventorship networks are always effective. Details on data access and data quality improvement via feedback collection are also discussed.
2014
2014
Pezzoni, Michele; Lissoni, Francesco; Tarasconi, Gianluca
File in questo prodotto:
File Dimensione Formato  
Pezzoni_Scientometrics.pdf

non disponibili

Descrizione: Articolo
Tipologia: Pdf editoriale (Publisher's layout)
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 1.14 MB
Formato Adobe PDF
1.14 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11565/3976340
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 30
  • ???jsp.display-item.citation.isi??? 28
social impact