Week 6 -- Evaluating Open Source Projects
This week we had the opportunity to attend another talk by Vicky Steeves. This time Vicky talked about the use of open source in the reproducibility of research. She also touched on the importance of preserving data, especially data derived from research. What was really interesting to me was that she asked the class to raise their hands if they had their data backed up externally, and only Professor Joanna raised her hand. This was surprising because if GitHub or another cloud storage service were to go down today, almost everyone except those who are extremely careful with their data would lose a lot of their files. That means all of the research data we gathered would be lost. That’s actually a big problem we face today, as much of the data that was gathered in the past is lost and virtually irretrievable due to bad practices. Thus, Vicky stressed the importance of having a good open source platform to store data and methods of research.
Vicky also talked about two other important aspects of open source: licenses and reproducibility. Not all open source licenses are created equal, so Vicky pointed us to a useful website for finding an appropriate license for a project. You essentially fill out a questionnaire and it suggests an appropriate license based on the context. The second aspect was reproducibility. She mentioned that one of the most important contributions to open source projects is documentation. Documentation is very useful because it gives future contributors an opportunity to get on track and stay on track when reading code. Good documentation also improves reproducibility, because without it, people who wish to reproduce results in the future would have difficulty doing so. In fact, Vicky mentioned that only 30% of psychology studies would yield the same results if repeated, and the number is even lower in genetics. So clearly this is also an ongoing problem today.
In addition to the talk, we were also given the opportunity to evaluate two open source projects. One was assigned to us, and the other was one of our choice. I was assigned OpenFoodFacts for my first evaluation, which is an open source database of food products. It includes lots of information such as calories, ingredients, and more. It’s a huge project with over a million lines of code. The second project I evaluated was Tuxemon, a spinoff of Pokemon written in Python using Pygame. This project isn’t nearly as big, but it is still actively maintained by a few developers. I might be interested in contributing to this project since Python is my best language and I’m interested in video games. Otherwise, I might go with something like FreeCodeCamp or Jupyter.