Web scraping

Finding the most cited presenters

The 3rd International Conference on Econometrics and Statistics (EcoSta 2019) took place at the National Chung Hsing University (NCHU), Taichung, Taiwan, on 25-27 June 2019. The conference consisted of 10 parallel session blocks, each with 14-17 sessions of 3-5 speakers running at the same time. The full programme is available here. Naturally, picking which sessions to attend was quite the optimization problem. For blocks where several sessions appeared interesting and relevant to my research, my final choice became rather arbitrary.

Hockey web scraping: Data aggregation

Continuing from my previous post, I now focus on detailed match statistics rather than the available aggregate data. By scraping very detailed data from each match of the 2018/2019 Norwegian hockey season, my goal is to present aggregate data that are not available on the source webpage. The data are scraped from Hockey live. I started by simply downloading the main HTML file manually from the web browser.

Web scraping with R: Visualizing hockey statistics

I wanted to visualize the personal statistics of the Stavanger Oilers hockey players for the 2018/2019 season. The data are scraped from both Elite Prospects and Hockey live (regular season and playoffs), using the R package rvest, as described in this blog post. Scraping the data from Elite Prospects was straightforward, as it is stored as an HTML table. When you want to scrape a table with rvest, you only need to specify the integer index of that table on the page.
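
As a rough illustration of picking a table by its index, here is a minimal sketch. It assumes rvest 1.0 or later (for html_elements) and uses a small made-up HTML snippet instead of the real Elite Prospects page, so the table contents, column names and the index 2 are all hypothetical.

```r
library(rvest)

# Hypothetical inline page with two tables; a real page would be
# loaded with read_html("<url>") instead.
page <- read_html('
  <html><body>
    <table>
      <tr><th>Team</th><th>Points</th></tr>
      <tr><td>Team A</td><td>10</td></tr>
    </table>
    <table>
      <tr><th>Player</th><th>Goals</th></tr>
      <tr><td>Player X</td><td>21</td></tr>
      <tr><td>Player Y</td><td>17</td></tr>
    </table>
  </body></html>
')

# Parse every <table> on the page into a list of data frames
tables <- html_table(html_elements(page, "table"))

# Select the table of interest by its position on the page
player_stats <- tables[[2]]
print(player_stats)
```

On a real page, the right index is usually found by inspecting length(tables) and looking at the first few rows of each element.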