Research reports
Conference papers:
Unraveling Certainty in Bio-Scholarly Statements
Year: 2018

Research areas
  • Bioinformatics

Data
Description
The volume of scholarly articles published every year has doubled in the last two decades, rising in the biomedical domain from ~300,000 articles in 1996 to more than 800,000 in 2016. As a consequence, researchers cannot read all articles, even in their own domain; there is an imperative to maintain (the appearance of) "comparative productivity"; and there is reviewer fatigue. Increasingly, therefore, scientists' personal biases are being reflected in scholarly publications (Fanelli 2010; Sarewitz 2016) and are going undetected and uncorrected. Scholars acknowledge one another through citations, yet this raises a similar set of problems: citations have been shown to "drift" in their intensity or their meaning compared to the assertion in the referenced material (De Waard and Maat 2012), likely reflecting the biases of the writer. This phenomenon may be amplified within citation chains, sometimes resulting in near-factual statements which, at the origin, were far more speculative, all in the absence of any additional evidence. Finally, the concepts within this volume of literature are increasingly being captured via text-mining, where there is no capacity to quality-control for these kinds of phenomena, thus masking the problem further. Here, we apply questionnaires to measure the ability of researchers to discern the various levels of certainty expressed in the scientific literature, determine their level of agreement, formally define categories of certainty, and then create automated classifiers for scholarly statements. Three Web-based questionnaires were e-mailed to researchers spanning both medical and plant/agricultural biotechnology research, asking them to evaluate scholarly assertions for certainty. Agreement between participants was assessed using Cicchetti's weighted kappa (Cicchetti, Lord, Koenig, Klin, & Volkmar, 2008). Completed surveys were returned by 270 researchers (Q1: 75, Q2: 150, Q3: 45).
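To illustrate the kind of agreement measure mentioned above, the following is a minimal sketch of a weighted kappa for two raters. It assumes linear agreement weights (the scheme usually attributed to Cicchetti and Allison) and certainty categories coded as integers 0..k-1; the abstract does not state the weighting scheme, the number of categories, or how more than two participants were compared, so the function name `weighted_kappa` and all of these choices are illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Weighted kappa for two raters on an ordinal scale coded 0..n_categories-1.

    Assumption: linear agreement weights w(i, j) = 1 - |i - j| / (k - 1),
    so near-misses on the certainty scale count as partial agreement.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    if n_categories < 2:
        raise ValueError("at least two categories are required")

    n = len(ratings_a)
    k = n_categories

    def weight(i, j):
        return 1.0 - abs(i - j) / (k - 1)

    # Observed weighted agreement across the rated statements.
    observed = sum(weight(a, b) for a, b in zip(ratings_a, ratings_b)) / n

    # Weighted agreement expected by chance, from each rater's marginal counts.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(
        weight(i, j) * count_a[i] * count_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of six statements on a four-level certainty scale.
rater_1 = [0, 1, 2, 3, 2, 1]
rater_2 = [0, 1, 3, 3, 2, 0]
kappa = weighted_kappa(rater_1, rater_2, n_categories=4)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.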
Classifiers were created through machine learning approaches, using a neural network algorithm to automatically assign certainty levels to new statements. Our results showed an accuracy of >80% in the most basic assignment, a binary classification problem. Such algorithms can now be used in tandem with text-mining tools to capture the degree of certainty expressed in the original text-mined information. We also discuss how these tools can be used to detect the certainty "drift" problem, as well as to pinpoint "certainty inflection points" along a citation chain, which may be associated with data or a dataset that can then be explicitly linked to the increase in certainty of a scholarly assertion.
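The idea of certainty "drift" and "inflection points" along a citation chain can be sketched as follows. This is only an illustration under stated assumptions: it supposes that a classifier has already assigned each statement in the chain an ordinal certainty level (here a hypothetical scale where higher integers mean more certain), and the function names and the `min_jump` threshold are invented for this example, not taken from the work described.

```python
def certainty_drift(chain_levels):
    """Net change in certainty from the original statement to the last citation.

    chain_levels: certainty levels ordered from the source statement
    to the most recent citing statement (hypothetical ordinal scale).
    """
    if not chain_levels:
        raise ValueError("the citation chain must contain at least one statement")
    return chain_levels[-1] - chain_levels[0]


def certainty_inflection_points(chain_levels, min_jump=1):
    """Indices in the chain where certainty rises by at least min_jump
    relative to the immediately preceding statement."""
    return [
        i
        for i in range(1, len(chain_levels))
        if chain_levels[i] - chain_levels[i - 1] >= min_jump
    ]


# Hypothetical chain: a speculative claim that hardens as it is re-cited.
levels = [1, 1, 2, 2, 3]
drift = certainty_drift(levels)                 # net increase over the chain
jumps = certainty_inflection_points(levels)     # positions where certainty steps up
```

Each index returned by `certainty_inflection_points` marks a citing statement that could then be checked for accompanying data or a dataset that would justify the increase.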
International
No
Conference name
XIV SYMPOSIUM ON BIOINFORMATICS
Type of participation
970
Conference venue
Granada
Reviewers
Yes
ISBN or ISSN
0000-0000
DOI
Conference start date
14/11/2018
Conference end date
16/11/2018
From page
112
To page
112
Proceedings title
0000-0000

This activity belongs to research reports

Participants

Related research groups, departments, centres and R&D&I institutes
  • Creator: Department: Biotechnology - Plant Biology