Observatorio de I+D+i UPM

Research reports
Conference papers:
Unraveling Certainty in Bio-Scholarly Statements
Research areas
  • Bioinformatics
The volume of scholarly articles published every year has doubled in the last two decades; in the biomedical domain it rose from about 300,000 articles in 1996 to more than 800,000 in 2016. As a consequence, researchers cannot read all articles, even in their own domain. There is also the imperative to maintain (the appearance of) "comparative productivity", and there is reviewer fatigue. Thus, increasingly, scientists' personal biases are being reflected in scholarly publications (Fanelli 2010) (Sarewitz 2016), going undetected and uncorrected. Scholars acknowledge one another through citations, yet this raises a similar set of problems: citations have been shown to "drift" in their intensity or their meaning compared to the assertion in the referenced material (De Waard and Maat 2012), likely reflecting the biases of the writer. This phenomenon may be amplified within citation chains, sometimes resulting in near-factual statements which, at the origin, were far more speculative, all in the absence of any additional evidence. Finally, the concepts within this volume of literature are increasingly being captured via text-mining, where there is no capacity to quality-control for these kinds of phenomena, which masks the problem further. Here, we apply questionnaires to measure the ability of researchers to discern the various levels of certainty expressed in the scientific literature, determine their level of agreement, formally define categories of certainty, and then create automated classifiers for scholarly statements. Three Web-based questionnaires were e-mailed to researchers spanning both medical and plant/agricultural biotechnology research, asking them to evaluate scholarly assertions for certainty. Agreement between participants was assessed with Cicchetti's weighted kappa (Cicchetti, Lord, Koenig, Klin, & Volkmar, 2008). Completed surveys were returned by 270 researchers (Q1: 75, Q2: 150, Q3: 45).
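As an illustration of the agreement measure mentioned above, the following is a minimal sketch of a weighted kappa between two raters. It assumes linear agreement weights, w(i, j) = 1 - |i - j| / (k - 1), and an ordered list of certainty categories; the function name `weighted_kappa`, the category encoding, and the weighting scheme are assumptions made for this example, not details taken from the study.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories):
    """Weighted kappa for two raters over ordered categories.

    Assumption: linear agreement weights; the study may have used a
    different weighting scheme.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint counts of (category chosen by A, category chosen by B).
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))

    # Marginal proportions for each rater.
    row = [sum(joint[(i, j)] for j in range(k)) / n for i in range(k)]
    col = [sum(joint[(i, j)] for i in range(k)) / n for j in range(k)]

    def weight(i, j):
        # 1 for exact agreement, decreasing linearly with category distance.
        return 1.0 - abs(i - j) / (k - 1)

    observed = sum(weight(i, j) * joint[(i, j)] / n
                   for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))

    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A call such as `weighted_kappa(labels_a, labels_b, [1, 2, 3])` returns a value of at most 1, where 1 indicates perfect agreement and 0 indicates agreement no better than chance.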
Classifiers were created through Machine Learning approaches, using a Neural Network algorithm to automatically assign certainty levels to new statements. Our results showed an accuracy above 80% on the most basic assignment task, a binary classification problem. Such algorithms can now be used in tandem with text-mining tools to capture the degree of certainty expressed in the original text-mined information. We also discuss how these tools can be used to detect the certainty "drift" problem, as well as to pinpoint "certainty inflection points" along a citation chain, which may be associated with data or a dataset that can then be explicitly linked to the increase in certainty of a scholarly assertion.
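The idea of a "certainty inflection point" can be illustrated with a short sketch. It assumes that each statement along a citation chain has already been assigned an ordinal certainty level (higher means more certain), for instance by a classifier; the integer encoding and the function names are hypothetical and chosen only for this example.

```python
def certainty_inflection_points(chain):
    """Return the positions in a citation chain where certainty rises.

    `chain` is a list of ordinal certainty levels, ordered from the
    original statement to the most recent citing statement.
    Assumption: levels are comparable integers; this encoding is
    illustrative and not taken from the study.
    """
    return [i for i in range(1, len(chain)) if chain[i] > chain[i - 1]]


def certainty_drift(chain):
    """Net change in certainty between the first and last statement."""
    if not chain:
        raise ValueError("chain must contain at least one statement")
    return chain[-1] - chain[0]
```

Each returned position marks a citing statement that expresses more certainty than the statement it cites, which is where one would look for supporting data or a dataset.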
Conference name
Type of participation
Conference venue
Conference start date
Conference end date
From page
To page
Proceedings title
This activity belongs to research reports
  • Author: Mario Prieto Godoy (UPM)
  • Author: Mark Denis Wilkinson (UPM)
Related research groups, departments, centers, and R&D&I institutes
  • Creator: Department: Biotecnología - Biología Vegetal