21st June 2017

First, Josh Baer of Spotify opened the conference with a great talk on the how and the why of getting Spotify up and running on GCP.

Nicolas Belmonte from Uber then wowed the crowd with some ridiculously beautiful in-browser visualizations built on deck.gl.

While I don’t want to toot Criteo’s horn too much, we did close out the morning session with François Jehl, Pawel Szostek and Neil Thombre’s work on HLL (HyperLogLog), which shows huge promise for distinct counts on OLAP workloads in Vertica.

Lunchtime on our rooftop

In the afternoon we had something of a data-production track, with inspiring stuff on the data developer’s work cycle at Spotify from Rafal Wojdyla, and then a little bit of data workflow development history (and future!) from Guillaume Bort and myself, in which we introduced our new open source scheduler, Cuttle. We closed the track with the final talk on the subject from Marc Bux of Humboldt University of Berlin, who presented his approach to scheduling scientific workflows on YARN.

Rounding out the talks, we had everything from BigGraphite (Graphite on Cassandra) by Corentin Chary, to how we build our billion-node, billion-edge user graph by Bruno Roggeri, to a discussion of the best-named project ever, DataDisco (Criteo’s HDFS data schema/discovery framework), by Francois Visconte and Mathieu Chataigner.

Presentations (Videos)

Last year we had lots of requests to put presentations online, and I am very happy to say that not only have we done so, but we took the extra step of filming all of the talks as well.  You can relive this year’s experience via the videos below:

NABDConf Intro

Moving to the Cloud: A Story from the Trenches – Josh Baer, Spotify

Visualizing Data with deck.gl – Nicolas Garcia Belmonte, Uber

HLL performance characteristics in large-scale aggregations over structured data – François Jehl, Pawel Szostek, Neil Thombre, Criteo

Building a billion node / billion edge graph – Bruno Roggeri, Criteo

BigGraphite – Graphite meets Cassandra to Scale Monitoring at Criteo – Corentin Chary, Criteo

Data pipeline at Spotify – from the inception to the production – Rafal Wojdyla, Spotify

One schema to rule them all and kill your data legacy – Francois Visconte, Mathieu Chataigner, Criteo

Time-series workflow scheduling with Scala in Langoustine – Guillaume Bort, Justin Coffey, Criteo

Hi-WAY: Execution of Scientific Workflows on Hadoop YARN – Marc Bux, Humboldt University of Berlin

Photos from the event

Curious to see how things went down at this year’s event? Follow this link 

A special thank you to our speakers and sponsors (Vertica and Criteo) and of course to our wonderful attendees.  Without all of you this conference wouldn’t exist!

See you in 2018 for yet another Not Another Big Data Conference.

Justin Coffey