Week 4: Open Data and more

Besides code, what else is open-sourced?

Past Experience with Open-Source Data

In Wednesday’s lecture, Deena and Vicky gave an overview of open source data from different perspectives, which I found very helpful. Since I have not done any research before, my knowledge of data and databases is limited. However, I have worked at a small investment company, where I managed the website by connecting its database to the SEC.gov database. This is a huge database where US companies file their financial reports. I know how powerful these filings are, as they contain detailed financial information about the companies in the market. The database is also open, so anyone with an interest in finance has exactly the same access to the data as a professional analyst.

While managing the website, I did encounter some difficulties processing the data: one of the searches on the website was extremely slow. When I looked into the issue, I found that the search took so long because it was not an exact match, so the query had to scan the entire column to find every row that matched the pattern. I fixed it by first normalizing the format of the column values. After that, I could perform an exact match, and the search did speed up.
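To make the fix above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the company's actual code: the table name filings, the columns company and company_key, the sample rows, and the normalize helper are all hypothetical, and adding an index on the normalized column is my own assumption about what makes the exact match fast.

```python
import sqlite3


def normalize(name):
    # Hypothetical formatting rule: lowercase, drop periods, collapse whitespace.
    return " ".join(name.lower().replace(".", "").split())


def build_demo_db():
    # In-memory database with made-up sample rows (not real SEC data).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE filings (id INTEGER PRIMARY KEY, company TEXT)")
    rows = [("  Apple Inc. ",), ("APPLE INC.",), ("Microsoft Corp",), ("Alphabet Inc.",)]
    conn.executemany("INSERT INTO filings (company) VALUES (?)", rows)

    # Store a cleaned-up copy of the name once, so searches do not have to
    # pattern-match against the raw text every time.
    conn.execute("ALTER TABLE filings ADD COLUMN company_key TEXT")
    for row_id, company in conn.execute("SELECT id, company FROM filings").fetchall():
        conn.execute(
            "UPDATE filings SET company_key = ? WHERE id = ?",
            (normalize(company), row_id),
        )
    # Assumption: an index lets the exact match avoid scanning every row.
    conn.execute("CREATE INDEX idx_company_key ON filings (company_key)")
    return conn


def search_by_pattern(conn, text):
    # Original style of search: a leading wildcard forces a scan of the column.
    cur = conn.execute(
        "SELECT id FROM filings WHERE company LIKE ?", ("%" + text + "%",)
    )
    return sorted(row[0] for row in cur)


def search_exact(conn, text):
    # Fixed style of search: normalize the input, then compare for equality.
    cur = conn.execute(
        "SELECT id FROM filings WHERE company_key = ?", (normalize(text),)
    )
    return sorted(row[0] for row in cur)


conn = build_demo_db()
pattern_ids = search_by_pattern(conn, "apple inc")
exact_ids = search_exact(conn, "Apple Inc.")
```

On this tiny table both searches find the same rows. The difference only shows up at scale, where the pattern search has to read every value in the column while the exact match can go straight to the matching keys.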

Data processing

Professor Deena first introduced different forms of data, most of which I had seen before. However, the tools introduced seemed very useful, and I would like to try them out sometime. I found the different approaches to cleaning a data set interesting, including how to detect errors and how to handle missing values. Visualizing the values should be very helpful as well, since it makes it obvious where mistakes occur and shows the general trends in the data.
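As a small illustration of these ideas, the sketch below finds missing values, fills them with the median, and flags suspicious values. It uses only Python's standard library rather than the tools shown in the lecture. The sample readings, the function names, and the cutoff k = 5 are assumptions chosen for the example, and the median-based rule is just one simple way to spot likely errors.

```python
from statistics import median

# Made-up sample column: None marks a missing value, 250.0 is a likely typo.
readings = [12.0, None, 11.5, 250.0, 12.4, None, 11.9]


def find_missing(values):
    # Positions where no value was recorded.
    return [i for i, v in enumerate(values) if v is None]


def fill_missing(values):
    # Replace each missing value with the median of the recorded values.
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def find_outliers(values, k=5.0):
    # Flag values that lie more than k median absolute deviations (MAD)
    # away from the median. The cutoff k is an assumption, not a standard.
    present = [v for v in values if v is not None]
    center = median(present)
    mad = median(abs(v - center) for v in present)
    return [
        (i, v)
        for i, v in enumerate(values)
        if v is not None and abs(v - center) > k * mad
    ]


missing_positions = find_missing(readings)
filled = fill_missing(readings)
suspicious = find_outliers(readings)
```

The median is used instead of the mean because a single bad value such as 250.0 would pull the mean far away from the typical readings.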

Open Sources of Data

Besides SEC.gov, I did not know of any other open sources of data, so the information Vicky provided is very helpful. It will let me find all kinds of data for future research, much of which I did not even know existed. As Vicky emphasized, it is important to find suitable data in terms of the subject, the location, the population, and the longitudinal coverage. Finding the right data is the key to successful research. There are also some things to pay attention to when using open data, including the license of the data source and citing it correctly.

Written before or on February 24, 2020