Description
|
|
---|---|
The volume of scholarly articles published every year has doubled in the last two decades; in the biomedical domain it rose from ~300,000 articles in 1996 to more than 800,000 in 2016. As a consequence, researchers cannot read all articles, even in their own domain; there is an imperative to maintain (the appearance of) "comparative productivity"; and there is reviewer fatigue. Thus, increasingly, scientists' personal biases are being reflected in scholarly publications (Fanelli 2010) (Sarewitz 2016), going undetected and uncorrected. Scholars acknowledge one another through citations, yet this raises a similar set of problems: citations have been shown to "drift" in their intensity or their meaning compared to the assertion in the referenced material (De Waard and Maat 2012), likely reflecting the biases of the writer. This phenomenon may be amplified within citation chains, sometimes resulting in near-factual statements which, at the origin, were far more speculative, all in the absence of any additional evidence. Finally, the concepts within this volume of literature are increasingly being captured via text-mining, where there is no capacity to quality-control for these kinds of phenomena, thus masking the problem further. Here, we apply questionnaires to measure the ability of researchers to discern the various levels of certainty expressed in the scientific literature, determine their level of agreement, formally define categories of certainty, and then create automated classifiers for scholarly statements. Three Web-based questionnaires were e-mailed to researchers spanning both medical and plant/agricultural biotechnology research, asking them to evaluate scholarly assertions for certainty. Agreement between participants was assessed with Cicchetti's weighted kappa (Cicchetti, Lord, Koenig, Klin, & Volkmar, 2008). Completed surveys were returned by 270 researchers (Q1: 75, Q2: 150, Q3: 45).
Classifiers were created through machine learning approaches, using a neural network algorithm to automatically assign certainty levels to new statements. Our results showed an accuracy of more than 80% on the most basic task, a binary classification problem. Such algorithms can now be used in tandem with text-mining tools to capture the degree of certainty expressed in the original text-mined information. We also discuss how these tools can be used to detect the certainty "drift" problem, as well as to pinpoint "certainty inflection points" along a citation chain, which may be associated with data or a dataset that can then be explicitly linked to the increase in certainty of a scholarly assertion. | |
International
|
No |
Conference name
|
XIV SYMPOSIUM ON BIOINFORMATICS |
Type of participation
|
970 |
Conference location
|
Granada |
Reviewers
|
Yes |
ISBN or ISSN
|
0000-0000 |
DOI
|
|
Conference start date
|
14/11/2018 |
Conference end date
|
16/11/2018 |
From page
|
112 |
To page
|
112 |
Proceedings title
|
0000-0000 |