18
Speakers
12
Hours of dev talks & workshops
160
Engineers attending

19th

November 2019

Not Another Big Data Conference

There are lots of conferences these days and almost all of them promise new and magical insights that will surely revolutionize the way you work. This is not one of them. NABD’s main conference is first and foremost about engineers solving problems and sharing those resolutions with others–and we encourage our speakers to share the bruises they’ve accumulated along the way, because even the best of us have seen some pretty spectacular failures.

So, come join us for a day of sharing, learning, fun and yes, perhaps some group therapy!

Steven Leroux

Principal Engineer

Quentin Adam

CEO

Fallon Chen

Senior Software Engineer

Lynn Root

Staff Engineer

Justin Coffey

Director of Engineering

Guillaume Bort

Senior Staff Software Engineer

Gerben Stavenga

Software Engineer

Julien Tournay

Data Engineer

Dmitri Pavlichin

Idrees Khan

Senior Data Infrastructure Engineer

Joao Carreira

Ph.D. student

Lucie Bailly

Data Engineer Team Lead

Youen Chene

CTO

Adrien Blind

DataOps Evangelist

Gadi Miller

Senior Software Engineer

Kevin Jacquemin

Staff Software Engineer Lead

William Montaz

Staff SRE

Xavier Noelle

Staff Software Engineer

19th

November 2019

Agenda

9:00 am
10:00 am
Registration and breakfast
10:00 am
10:30 am
Mixing Time Series and Machine Learning, at scale and for real Steven Leroux & Quentin Adam - OVH & Clever Cloud

Stories from Warp10 use cases at OVHcloud and Clever Cloud.
At OVHcloud and Clever Cloud we make extensive use of time series. Our usage has grown over the years, from monitoring to machine learning, and now extends to billing and IoT.
We will demonstrate why we chose Warp10, how it can be your best friend, and how it saves lives!

10:45 am
11:15 am
Audio Processing Infrastructure at Spotify Fallon Chen & Lynn Root - Spotify

What we've been building to support audio processing research.

11:15 am
11:45 am
Coffee break
11:45 am
12:15 pm
Fixing the Big Data Development Cycle with SQL Guillaume Bort & Justin Coffey - Criteo
12:15 pm
12:30 pm
Nicolas Helleringer - Criteo
12:30 pm
2:00 pm
Lunch break Théâtre de Paris
2:00 pm
2:30 pm
WS1: Performance optimizations in protobuf Gerben Stavenga - Google

I discuss the consequences of data dependencies and control flow on protos.

2:00 pm
2:30 pm
WS2: Scio 0.8 and beyond. How we make data-engineering easy at Spotify Julien Tournay - Spotify

Two years ago, Spotify introduced Scio, an open-source Scala framework to develop data pipelines and deploy them on Google Dataflow. In this talk, we will discuss the evolution of Scio and share the highlights of running Scio in production for two years. We will showcase several interesting data processing workflows run at Spotify, what we learned from running them in production, and how we leveraged that knowledge to make Scio faster, safer, and easier to use.

2:45 pm
3:15 pm
WS1: A compression toolbox for exploring and exploiting redundancies in tabular data Dmitri Pavlichin - Stanford University
2:45 pm
3:15 pm
WS2: Expectations vs Reality Idrees Khan - Spotify

Oftentimes what we expect from a dataset doesn't match what's actually there. If you don't know the accuracy of the data, it's difficult to trust any metrics, insights, or models downstream. I work on a team at Spotify that aims to solve this problem, and in this talk we will cover the libraries, infrastructure, and organizational processes we've implemented to address it.

3:30 pm
4:00 pm
WS1: Cirrus: Serverless Machine Learning Joao Carreira - UC Berkeley

In this talk, we will present Cirrus, a new system in the RISELab (UC Berkeley) that aims to facilitate the development of ML workflows on serverless platforms. During the presentation, we will discuss the challenges of building large-scale systems on existing serverless platforms and propose ways to address those challenges.

Machine learning (ML) workflows are complex. The typical workflow consists of distinct stages of user interaction, such as preprocessing, training, and tuning, that are repeatedly executed by users but have heterogeneous computational requirements. Serverless computing is a compelling model to address the resource management problem in general, but there are numerous challenges in adopting it for existing ML frameworks due to significant restrictions on local resources.

In this talk, we will present the Cirrus system design and API and discuss the mechanisms it uses to efficiently preprocess data, train models, and tune model parameters at scale. At the end, we will propose a new serverless architecture that better supports data-intensive workloads.

3:30 pm
4:00 pm
WS2: How to evaluate your Data Platform's maturity? Lucie Bailly - Criteo

What does a state-of-the-art data platform look like? What services does my data platform provide, or should it provide, to my users? Where should I focus the effort?
Together we will go through 6 topics: Platform, Operations, Discovery, Monitoring, Lineage and Business value, and define criteria to evaluate the platform on a scale of 1 (you can do better) to 5 (it rocks!). This method has been applied to Criteo's use cases and challenges: hundreds of individual contributors and 200 TB of new data arriving every day. We use it to evaluate our services and build our roadmap for the coming years.

4:00 pm
4:30 pm
Coffee break
4:30 pm
5:00 pm
WS1: Discover & Embrace DataOps to accelerate your data journey Youen Chene & Adrien Blind - Saagie

By 2019, you may have seen the word DataOps already, and may think it's just another buzzword. But it's much more than DevOps for Data! Let's discover the DataOps concept together and see how you can organize, automate & secure the path to production for your data workloads. You know the ones: the data workloads stuck in your data lab, on your laptop, or in a long go-to-production procedure.
We'll cover the basics, avoid some DataOps washing, and give you the first keys to bringing your workloads to production faster (and safer, by the way) without being a data hero!

5:15 pm
5:45 pm
WS1: Budget Pacing at Scale with Flink Gadi Miller & Kevin Jacquemin - Criteo
5:15 pm
5:45 pm
WS2: Everything you wanted to know about your Hadoop jobs, but had no tool to ask William Montaz & Xavier Noelle - Criteo
6:00 pm
8:00 pm
Cocktail

18th

November 2019

Functional Programming with Scala by Guillaume Bort - Criteo

A full day of workshops dedicated to understanding the basics and the challenges of big data, by the Criteo Data Reliability Engineering team. What is data engineering all about? How can we leverage Scala, Python and SQL to do data transformations? Which language is right in which circumstances? What about ad-hoc data exploration and the data engineering development cycle? How can we manage truly interactive reporting on top of terabytes of data? And finally, what are the best practices for monitoring and organizing all of a company’s datasets?

These are core questions a budding data engineer should ask herself, and once armed with answers she’ll be better prepared not only to be a more efficient data engineer, but also to confront the FUD and general misinformation surrounding data engineering and big data systems in general.

Register now: the first 40 registrants are eligible to attend “Functional Programming with Scala”.

18th

November 2019

Agenda

9:30 am
10:30 am
Scala Type System

Make sure we all know the basics of the Scala type system.

10:30 am
11:30 am
Laziness

Non-strict evaluation is also a thing.

11:30 am
12:30 pm
Sequencing Computations

Demystifying Functor, Monad, map, and flatMap.

12:30 pm
2:00 pm
Lunch break
2:00 pm
3:00 pm
Mastering implicits

Understanding the different uses of implicits.

3:00 pm
4:00 pm
Type classes

Ad hoc polymorphism in functional programming.

4:00 pm
5:00 pm
Discovering Cats

Learn what Cats can bring to your projects.

Organized by
Donation to Source Vive, an association founded in 1989 to support families of children with cancer and leukemia.

Event Location

32 Rue Blanche
75009 Paris