Data Hacking: Hacking Hacker News
Last year, Google’s Felipe Hoffa uploaded complete data about Hacker News’ posts to Google’s BigQuery data engine. BigQuery is a giant clustered SQL engine in the cloud that can query enormous amounts of data very quickly. Felipe did some great initial analysis of the dataset, writing SQL by hand and graphing the results in MATLAB.
Others picked up the thread and added their own analyses on Hacker News (meta, right?). I loved the idea of exploring this public dataset and wanted to surface some new insights, and to make the data accessible to anyone who wanted to play with it, whether or not they knew SQL.
So I wrote up a quick model in LookML (Looker’s YAML-based modeling language) to describe how the BigQuery tables relate to each other. (It’s all of four files and fewer than 300 lines of code; you can view it here.)
Below, I’ll walk through the process of building out the