Statistical Errors in Software Engineering Experiments: A Preliminary Literature Review (CORE A*)

Background: Statistical concepts and techniques are often applied incorrectly, even in mature disciplines such as medicine or psychology. Surprisingly, very few works study statistical problems in software engineering (SE).

Aim: Assess whether statistical errors occur in SE experiments.

Method: Compile the most common statistical errors in experimental disciplines, then survey experiments published in ICSE to assess whether these errors occur in high-quality SE publications.

Results: The same errors identified in other disciplines were found in ICSE experiments, with rather high prevalence: several error types appeared in over 30% of the reviewed papers, including a) missing statistical hypotheses, b) missing sample size calculations, c) failure to assess the assumptions of statistical tests, and d) uncorrected multiple testing. When the experiment is confined to the validation section of a larger research paper, the prevalence of errors increases. The origin of the errors can be traced back to a) researchers' inadequate statistical training and b) the abundance of exploratory research.

Conclusions: This paper provides preliminary evidence that SE research suffers from the same statistical problems as other experimental disciplines. However, the SE community does not seem to be aware of these shortcomings in its experiments, whereas other disciplines work hard to avoid them. Further research is needed to find the underlying causes and define corrective measures, but some actions could be effective from the outset: a) improving the statistical training of SE researchers, and b) enforcing quality assessment and reporting guidelines in SE publications.
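Uncorrected multiple testing is one of the errors listed above. The following minimal Python sketch illustrates why it matters. With m independent tests, each run at level alpha while every null hypothesis is true, the probability of at least one false positive is 1 - (1 - alpha)^m. The sketch also shows the Holm-Bonferroni step-down procedure, one standard correction. The choice of Holm, the function names, and the p-values are illustrative assumptions, not the method or data of the reviewed papers.

```python
"""Illustrates why uncorrected multiple testing is a statistical error."""


def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests, each run at significance level alpha, when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** m


def holm_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm's step-down procedure: returns, for each p-value in its original
    position, whether its null hypothesis is rejected while controlling the
    family-wise error rate at alpha."""
    m = len(p_values)
    # Visit p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained.
            break
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values from five tests within a single experiment.
    p = [0.010, 0.040, 0.030, 0.005, 0.200]
    print(f"FWER with 5 uncorrected tests: {familywise_error_rate(0.05, 5):.3f}")
    print("Uncorrected rejections:", [pv <= 0.05 for pv in p])
    print("Holm rejections:       ", holm_bonferroni(p))
```

With five uncorrected tests at 0.05, the chance of at least one spurious "significant" result rises to about 0.226. For these hypothetical p-values, two of the four uncorrected rejections do not survive the Holm correction. Holm controls the same family-wise error rate as the plain Bonferroni correction and is never less powerful.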
