Reproducible Research Publications at AGILE 2018

Workshop at the 21st AGILE International Conference on Geographic Information Science, Lund, Sweden on Tuesday June 12 at 14:30 CEST.

The workshop introduces interested scientists and developers to the concepts of reproducible research and give hands-on guidance on how to increase the degree of reproducibility for their own work. Participants explore the practical principles of reproducible papers by reproducing a provided real-world scientific publication.

Workshop report

The workshop “Reproducible Research Publications @ AGILE 2018” is the second pre-conference workshop in a series of workshops raising awareness about reproducible research, computational reproducibility, and Open Science in the AGILE community. Eleven participants from six countries came together for a half day. To start off, Daniel from the organising team introduced the basics of reproducibility in science, going well beyond the planned time and providing a lot of reading material for the audience to go through when they’re home (check the slides below). The audience engaged quite well and showed a high interest and motivation for the topic, but with an eye on the clock discussions had to be deferred so the next item on the agenda could be started: Rusne presented an evaluation of the reproducibility of AGILE conference papers (accepted for publication, see pre-print). She gave an overview of the methodology and the applied criteria and described the findings, namely a low availability of data, software, and methods and subsequently a low reproducibility!

These results were a clear motivation to learn how to improve your own work’s reproducibility, so the participants were pointed to a public repository to attempt a reproduction themselves, more specifically the computations from the paper presented by Rusne. The attendees should follow the available instructions and were only aided by the four organising team members when really stuck. Depending on the previous knowledge of the used language (R) and tools (RStudio), which ranged from expert to complete novice, the progress made was mixed. The quickest participant managed to recreate the computations in about half an hour! She was an experienced R user and admittedly started with a well-kept R installation. Others took longer, but mostly because they had to install all software or needed some support understanding RStudio.

Several successful reproductions of our analysis using the data and code from https://t.co/YjNc9Q8NMH

But @314a was the fastest - congratulations! 🎉
— Daniel Nüst (@nordholmen) 12. Juni 2018

In the end, all participants were able to at least partially reproduce the paper, some merely waiting for a software installation to finish for the final steps. This was a great success, because it showed that a well-prepared research compendium can be reproduced by readers even without prior knowledge of a programming language or tool set under two hours - a reasonable time frame for deeply evaluating publications very relevant to one’s own research. Especially because the easiest “one click” approach, an online reproduction using Binder, was not even allowed to be used.

The biggest obstacle faced by the reproducing scientists was missing dependencies. This is an important lesson for the authors of the manuscript: if you want to enable non-experts to reproduce your analysis, do not assume that they know what is simple for you to do. Also, the original output of the analysis process was a PDF, because that produced figures which could be taken over to the article authoring software. However, installing the tooling for PDF creation was the largest challenge, so an HTML output would have been easier to reproduce for readers. The HTML output as a default would deviate from the original workflow, but at the benefit of easier reproduction.

As an extra challenge, the organisers suggested to apply the analysis to a new dataset. None of the participants could get that far, so Daniel as the original author turned to that task after the workshop, and here is the word cloud he came up with for all submissions to AGILE 2018:

What are all the accepted papers and posters about? Here is the answer i none graphic: #data #spatial #information #analysis #model #map

More at https://t.co/NSrOkwgZBY including a word stem-based #wordcloud #agileconf2018 #funwithwords #rstats pic.twitter.com/kxz9HaVVk9
— Daniel Nüst (@nordholmen) June 13 2018

The overall topics prevail: data, analysis and models are still at the top of used terms, but reproducibility seems to be low judging from the presentations in several conference sessions. Therefore the organisers are already set to continue the series and prepare a further workshop on reproducible geoscience publications for next year’s AGILE conference. If you’re interested in attending, let us know what you would like to learn or discuss (see contact emails below).

Besides your input, the final item on the agenda, a group discussion lead by Carlos and Frank, already provided good information on what to do at a next workshop. The discussion showed that the detailed manual reproduction is an important learning experience when starting with reproducible research. Understanding the challenges and pitfalls a reader might have is an important prerequisite and motivation for changing one’s own habits. But it also became clear that a room full of people with different operating systems, knowledge, and software versions will always find a problem unknowable to the author. There are limits to the level of detail a scholarly article can provide regarding technical instructions, so not “everyone” may be able to reproduce, but somebody must be able to - at least the author and even after a few years. Naturally the limitations (big data, privacy concerns) for reproducibility were brought up and noted that levels of reproduction exists instead of a binary black-and-white classification. It was also stated that the objectives of a reproduction (reproduce a chart, apply method to different data, conduct a review) differ and thus might require different reproduction workflows. The different workflows could cater for anonymity (blind peer review) or formal limitations (e.g. of a submission system). One participants made the interesting suggestion of documenting the workflow visually, in a figure or diagram, to help the reader. The existing practice of double-blind reviews was also seen as posing yet another challenge towards suitable documentation and easily reproducible workflows.

All of these numerous challenges and limitations lead the discussers towards the need for more education (e.g. courses or updated syllabi) and material (e.g. domain-specific guidelines) for authors, and also a broader discussion on the topic of reproducible research. However there was common agreement that there are no “simple” guidelines and “typical” research projects.

Post-workshop survey

A short participant survey has been conducted after the workshop by our team member Rusne.

An online form of the survey has been sent around to all registered participants via e-mail as soon as one week after the AGILE conference took place. However summer (and thus holiday) time made it challenging to attract participants’ attention. Still, four very honest and explicit responses have allowed us to gather some very valuable feedback.

It was great to see that the participants were generally satisfied with the workshop as all of them expressed interest to attend it again during the next year’s AGILE. All of the participants would like to stay informed about the further progress of the team’s effort to encourage reproducibility in GIScience. Moreover, most of the participants would even like to contribute to the effort through revision of materials (e.g. guidelines, teaching materials) or active participation in their preparation.

Hands-on experience on reproducing a published paper has been mentioned as the most useful part of the workshop, although it was not without flaws. The participants would have preferred to reproduce a paper in the programming language they were already familiar with. Due to the lack of experience with R, some of the participants found the example of the reproducible paper too complicated.

Another aspect of the workshop that the participants appreciated was the discussion about raising awareness and making reproducibility a default way of publishing research. One of the survey respondents has suggested that “in discussions, to encourage and motivate more researchers, one should probably not start from the idealistic point of view and criticise shortcomings, but start with the current state and recognise efforts undertaken towards reproducibility”.

Next year the survey participants would like to get more information on possible concepts of reproducibility (e.g. data vs. methods, open vs. non-open, repeatability vs. reproducibility, processing vs. interpretation) and be introduced with less complicated examples of reproducible papers. The idea from the discussion at the workshop to develop guidelines to achieve different levels of reproducibility has been restated. Finally, a review and discussion on possible tools and their pros and cons (open/free vs. proprietary/commercial, community-driven vs. company-driven, local/institutional vs. global providers, etc.) has been mentioned as a suggestion for the upcoming workshops.

The research team would like to thank Jeremy Azzopardi, Lars Harrie, Francisco Ramos and Stefan Wiemann for their time in providing us with very valuable feedback!

Agenda

Introduction to reproducible research (slides)
Reproducibility at AGILE today (slides)
Hands-on reproducible research
Reproducibility at AGILE tomorrow

Resources

We take notes at https://epad.ifgi.de/p/reproducible-agile-2018
The material for reproducing the paper is at https://github.com/nuest/reproducible-research-and-giscience

Original call for participation

Agenda

Introduction to reproducible research: relevant literature, challenges, and solutions (lecture) [15 min.]
Reproducibility at AGILE today (presentation) [15 min.]
Take a look at this recent pre-print if you want to learn more:
Hands-on reproducible research [120 mins. including breaks, two break-out sessions]
- Goal: Reproducing a paper with either R or Python code
  - Participants make first-hand experiences in trying to reproduce a prepared computational analysis from a real paper
  - Transform a “typical” publication into a reproducible document
  - Publish the analysis in a research data repository
- Skills: Tools for creating reproducible documents (literate programming wit R Markdown or Jupyter Notebook), scripted geospatial workflows for GIS
Reproducibility at AGILE tomorrow (discussion, presentation) & Feedback [30 min.]

Registration

Participants are required to on the AGILE conference website.

In addition participants must submit some information to the workshop repository on GitHub by creating a new issue until May 27 2018.

The issue must include

preferred hands-on session (R or Python),
a short description of experience in R and/or Python
a summary of computational work, if available with references to published papers, data or code, and
plans for future computer-based research.

Basic skills in the selected programming language are required to participate in the workshop (e.g. practical experiences as part of a research project) - please get in touch if you are unsure!

In case of a high number of publications, participants may be selected based on the submitted material. Participants must bring their own computers and be prepared to install software before and at the workshop.

The number and topics of hands-on sessions depend on room availability and participants’ interests. They will be announced on May 15 2018.

Code of Conduct

To ensure a welcoming environment for all, we require everyone participating in the workshop to conform to the Code of Conduct given below. This code applies to all spaces related to the workshop including, but not limited to, workshop venue, emails, shared documents, and code repositories. We strongly recommend that anyone running workshops or classes of any kind choose and publish a similar code so that everyone will know what is expected of them and what to do when those expectations are not met.

You can report Code of Conduct violations anonymously via this form or in person to Rusne Sileryte or Carlos Granell (who have access to the anonymous reports besides Daniel Nüst).

We are dedicated to providing a welcoming and supportive environment for all people, regardless of background or identity. However, we recognize that some groups in our community are subject to historical and ongoing discrimination, and may be vulnerable or disadvantaged. Membership in such a specific group can be on the basis of characteristics such as such as gender, sexual orientation, disability, physical appearance, body size, race, nationality, sex, color, ethnic or social origin, pregnancy, citizenship, familial status, veteran status, genetic information, religion or belief, political or any other opinion, membership of a national minority, property, birth, age, or choice of text editor. We do not tolerate harassment of participants on the basis of these categories, or for any other reason.

Harassment is any form of behavior intended to exclude, intimidate, or cause discomfort. Because we are a diverse community, we may have different ways of communicating and of understanding the intent behind actions. Therefore we have chosen to prohibit certain forms of behavior in our community, regardless of intent. Prohibited harassing behavior includes but is not limited to:

written or verbal comments which have the effect of excluding people on the basis of membership of a specific group listed above;
causing someone to fear for their safety, such as through stalking, following, or intimidation;
the display of sexual or violent images;
unwelcome sexual attention;
non-consensual or unwelcome physical contact;
sustained disruption of talks, events or communications;
incitement to violence, suicide, or self-harm;
continuing to initiate interaction (including photography or recording) with someone after being asked to stop; and
publication of private communication without consent.

Behavior not explicitly mentioned above may still constitute harassment. The list above should not be taken as exhaustive but rather as a guide to make it easier to enrich all of us and the communities in which we participate. All interactions should be professional regardless of location: harassment is prohibited whether it occurs on or offline, and the same standards apply to both.

Enforcement of the Code of Conduct will be respectful and not include any harassing behaviors.

Thank you for helping make this a welcoming, friendly community for all.

[This code of conduct is based on a CoC for “Hot to teach programming (and Other Things) by Greg Wilson, which in turn is based on that used by PyCon, which in turn is forked from a template written by the Ada Initiative and hosted on the Geek Feminism Wiki.]

Organizing Committee

Daniel Nüst* (ifgi), daniel.nuest@uni-muenster.de (main contact), @nuest
Frank Ostermann* (ITC), f.o.ostermann@utwente.nl, @foost
Markus Konkol (ifgi), m.konkol@uni-muenster.de, @MarkusKonk
Barbara Hofer (Z_GIS)
Carlos Granell* (Jaume I)
Valentina Cerutti (ITC)
Rusne Sileryte* (TU Delft)

* on-site teaching team member

Except where otherwise noted content is licensed under a Creative Commons Attribution 4.0 International License.

Privacy policy / Disclaimer This site (https://reproducible-agile.github.io) does not collect or store any personal data nor uses it cookies. Embedded Twitter widgets on this site do not collect information used for personalization. Although we check the content carefully, we cannot accept responsibility for the content of external links. The linked sites’ carriers are responsible for their sites’ content.