Week 4
Reflections on Deena and Vicky’s Presentations
I thought that both of their presentations were very informative and a good primer on things to consider when working with open-source data. I believe I was already familiar with many of the tools Deena talked about, but I still learned new information. For example, spreadsheets are not a good format for storing data because of limits on how much can be stored, and because it is difficult to see the calculations behind each cell, especially with very large datasets. I liked her story of rows accidentally being left out of a dataset that was being used to study the effects of spending on the economy. It shows the importance of picking the right tools to store and manipulate data, especially when your study will have a great impact on others, including being used as fact by policymakers.
I also found Vicky’s presentation very helpful, especially the parts about good places to find open-source datasets and about licensing when dealing with these datasets. I knew that there were different licenses for code repositories, and that GitHub lets you choose one when creating a new repo and even explains the permissions of each type, but I was not familiar with licenses for open-source data. If I work with open-source datasets in the future, I’ll be sure to pay attention to the licenses. She even gave helpful tips on what to do if there’s no license, namely contacting the owner to secure permission.