Week 6: Understanding reproducible research and choosing an open source project to contribute to

In this article, the first part is about my understanding of the reproducible research, and the second part is about evaluating an open source project before contributing to it.

Reproducible Research

In science, computing, and engineering, a black box is a device, system or object which can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings.
(Source: Wikipedia)

These days, it is very common to encounter scientific papers that follow the black box model intentionally or unintentionally. They describe their methodology and present the results, but they rarely mention how they gathered the data or which specific versions of tools they have used. Sometimes, they even only provide pseudocode without mentioning the actual programming language. As a result, it becomes almost impossible to reproduce the research mentioned in the paper. This is not good because the science is known for being “self-correcting” and following the black box approach prevents this from happening. In science, other people need to be able to repeat someone else’s experiments and get the same results. If that is not happening, clearly, there is a problem. Therefore, understanding how to conduct research that is reproducible is crucial, especially whenever computers are involved in the research. Computers would give the same if they are configured correctly and in the same way, but if no one provides the exact configuration, it becomes difficult to get the same results. This week’s presentation by Vicky Steeves was also emphasizing this and many other aspects of reproducible research.

Vicky started by explaining the projects she has been working on. ReproZip is one of these projects that aim to automatically pack someone’s research along with all necessary data files, libraries, environment variables and options into a self-contained bundle. Even more, ReproZip can use that bundle to automatically set up the same original environment so anybody can reproduce the research on a different machine, without tracking down and installing the dependencies, or even having to run the same operating system! (Source: ReproZip description). I wasn’t aware of such a tool before this presentation, and I am glad that I learned about it because this helps researchers to mitigate the black box model and have fully transparent research. She also emphasized the importance of archiving and the challenges related to it. Tools like ReproZip makes it easier to archive research and source code. Still, there are unanswered questions like “What happens when the computer architecture changes?”, “What if the software stops being supported?”, and “What is the source code is not available at all?”. These are part of the challenges she is trying to tackle. Her talk made me think about the aspects of archiving that I have never thought about, and it was surprising to realize these.

Project Evaluations

Contributing to an open source project is always exciting in the beginning but can quickly turn into a horror story if you don’t choose the correct project. So, project evolutions are important to determine if a project is right for you to contribute to. To be honest, project evaluations are not fun to complete, but they help you to understand different aspects of an open source project and the community. Last week, I completed two project evaluations: one for SugarLabs and one for The Lounge.

SugarLabs

The desktop version of Sugar and the image shows the initial welcome screen for the users
(Source: SugarLabs)

SugarLabs is an award-winning learning platform for children that promotes collaborative learning. The code can be run in various ways, and the code creates the user interface and interactive games. The platform is mainly targeted for children. It seemed like an interesting project in the beginning, but as I spent more and more time, I realized that it is a boring project. I think the project is serving a good cause, but it is not a relevant project in the age of iPads and high-speed 5G internet connection. The Sugar platform requires users to install the Sugar desktop environment. So, users need to either dedicate a separate computer in their house or be tech-savvy enough to run Sugar in a virtual machine or perform a dual boot. There is a very steep learning curve for an average user to even install the project. They tried creating an online version of the Sugar desktop environment but didn’t seem successful so far. In my opinion, this project is dying, as the contributor graph shows below. I am definitely not interested in contributing to SugarLabs.

The number of SugarLabs contributors over the years
(Source: SugarLabs GitHub repository)

The Lounge

Modern and easy to use interface of The Lounge IRC client (Source: The Lounge)

The Lounge is an IRC client that I encountered while trying to find a way to receive IRC messages even if my laptop is turned off. All of the other clients I found require a GUI based operating system to work, and this requires me to leave my laptop turned on all the time. However, The Lounge runs as a server, and it doesn’t require any GUI based operating system. All you need to is opening a web browser in any device and just type the IP of the server, and that’s it! I have installed it to a Digital Ocean droplet, and I am just connecting to my droplet’s IP address to interact with the IRC client. I can do it from my laptop, mobile phone, or tablet. It is a very convenient and active project. I am interested in contributing to it. Currently, there are some missing features that I want to have in the project, and I can propose & implement these features.

Written before or on March 7, 2020