Week 4: Guest Speakers

We had the opportunity to hear from Professor Deena Engel and librarian Vicky Steeves about their experience working with data.

Both speakers gave us a brief overview of the data tools they use in their projects. I appreciated that they kept to the spirit of the course and presented mostly open-source tools. They were clearly eager and passionate about the topic: I overheard Ms. Steeves remark happily to Professor Engel about how great it was that there were so many students in the open-source class.

What I Appreciated

Although most of the tools were familiar to me, I also learned of some interesting ones, such as OpenRefine, along with a couple of places to look for open-source datasets. I was glad that both guests were keen to add nuance to their arguments and were champions of good practices. For example, in their discussion of OpenRefine they were happy to share that it is a tool capable of cleaning datasets, but they were quick to add that it is geared towards people with less programming experience. They were not necessarily promoting this tool for us to adopt; rather, they were making us aware of tools that we could point others towards to make something like data science a bit more accessible, which I appreciated. I also seriously enjoyed their emphasis on pushing us away from Excel/Sheets in our data workflow, citing examples such as errors in gene studies caused by Excel use. Overall, I was very grateful that they genuinely seemed to want to introduce us to good practices and useful tools.

Stuff Completely New to Me

I also learned that XML comes in multiple flavors, which can sometimes make the workflow difficult. Moreover, Professor Engel explained that different fields of study tend to prefer different flavors, making standardization almost impossible. It led me to skim through this article, which opened my eyes to the fact that markup languages are an important consideration for a certain subgroup in the humanities.

Ms. Steeves also introduced us to NYC OpenData Week, which I was not aware of (I also didn’t know .nyc was a TLD)! As a result, I got a chance to look at some of the upcoming events and came across one called “Squirrels, Parks & the City: A (Very Serious) Data-Gathering Expedition.” Interestingly, this past semester, while I was working on a project involving the NYC Taxi Data for a class back in Abu Dhabi, I actually came across the Squirrel dataset and had quite a time telling everyone around me about (the absurdity of) it. Maybe it’s destiny.

I enjoyed the guest speakers and was grateful for their enthusiasm, although it was a shame that we did not get to spend much time on the questions we had prepared for them.

Written before or on February 23, 2020