Responsible Data Science

Science is said to be undergoing a “credibility crisis”, with various projects and surveys showing that many scientific experiments are not reproducible. At the same time, the rise of data science means that very large-scale datasets are being collected and analysed by machines; could these same machines be used to improve reproducibility in data science?

The credibility crisis is not only due to a lack of reproducibility in science. We have seen several recent controversies about the use of large-scale data, e.g. from Facebook, without consent. Such ethical concerns may make it hard to accept the findings of research studies, or may even affect whether such research can be conducted in the future.

A further impediment to credibility is whether research results are sustainable as well as scientifically reproducible: whether the tools, data and methods that might enable one to reproduce a finding remain available, especially after a research project has ended.

We believe that it is timely to consider all of these issues given the growing interest in data science and large-scale data analysis. Data are being collected so fast that we might not be able to consider all of the issues that concern their collection, and at the same time tools are being developed faster than the research culture that uses them can adapt. This SICSA short theme explores all these facets by bringing them together under a single theme of responsible data science.

This theme will hold three workshops, on the topics of ethical and legal data science, reproducible data science, and sustainable data science. Details about these events will be made available on the SICSA events calendar.

The Research Theme Leaders for Responsible Data Science are Professor Ian Gent (Ian.Gent@st-andrews.ac.uk), Dr Tristan Henderson (tnhh@st-andrews.ac.uk), and Dr Alexander Konovalov (alexander.konovalov@st-andrews.ac.uk).