15
Speakers
12
Hours of dev talks & workshops
160
Engineers attending

18th

November 2019

Functional Programming with Scala by Guillaume Bort - Criteo

A full day of workshops dedicated to understanding the basics and challenges of big data, presented by the Criteo Data Reliability Engineering team. What is data engineering all about? How can we leverage Scala, Python and SQL to do data transformations? Which language is right in which circumstances? What about ad-hoc data exploration and the data engineering development cycle? How can we manage truly interactive reporting on top of terabytes of data? And finally, what are the best practices for monitoring and organizing all of a company’s datasets?

These are core questions a budding data engineer should ask herself. Armed with answers, she’ll be better prepared not only to be a more efficient data engineer, but also to confront the FUD and general misinformation surrounding data engineering and big data systems.

Register now: the first 40 registrants are eligible to attend “Functional Programming with Scala”.

18th

November 2019

Agenda

9:30 am
10:30 am
Scala Type System

Making sure we all know the basics of the Scala type system.

10:30 am
11:30 am
Laziness

Non-Strict Evaluation is also a thing

11:30 am
12:30 pm
Sequencing Computations

Demystify Functor, Monad, map, flatMap.

12:30 pm
2:00 pm
Lunch break
2:00 pm
3:00 pm
Mastering implicits

Understanding the different uses of implicits.

3:00 pm
4:00 pm
Type classes

Ad hoc polymorphism in functional programming.

4:00 pm
5:00 pm
Discovering Cats

Learn what Cats can bring to your projects.

19th

November 2019

Not Another Big Data Conference

There are lots of conferences these days and almost all of them promise new and magical insights that will surely revolutionize the way you work. This is not one of them. NABD’s main conference is first and foremost about engineers solving problems and sharing those solutions with others, and we encourage our speakers to share the bruises they’ve accumulated along the way, because even the best of us have seen some pretty spectacular failures.

So, come join us for a day of sharing, learning, fun and yes, perhaps some group therapy!

Steven Leroux

Principal Engineer

Quentin Adam

CEO

Fallon Chen

Senior Software Engineer

Lynn Root

Staff Engineer

Justin Coffey

Director of Engineering

Guillaume Bort

Senior Staff Software Engineer

Gerben Stavenga

Software Engineer

Julien Tournay

Data Engineer

Dmitri Pavlichin

Idrees Khan

Senior Data Infrastructure Engineer

Joao Carreira

Ph.D. student

Lucie Bailly

Data Engineer Team Lead

Yoav Zimmerman

Senior Applied ML Engineer

Youen Chene

CTO

Adrien Blind

DataOps Evangelist

19th

November 2019

Agenda

9:00 am
10:00 am
Registration and breakfast
10:00 am
10:30 am
Mixing Time Series and Machine Learning, at scale and for real Steven Leroux & Quentin Adam

Stories of Warp10 use cases from OVHcloud and CleverCloud.
At OVHcloud and CleverCloud we make extensive use of time series. From monitoring to machine learning, our usage has grown over the years and now extends to billing and IoT.
We will demonstrate why we chose Warp10, how it can be your best friend, and how it saves lives!

10:45 am
11:15 am
Audio Processing Infrastructure at Spotify Fallon Chen & Lynn Root

What we've been building to support audio processing research.

11:15 am
11:45 am
Coffee break
11:45 am
12:15 pm
Fixing the Big Data Development Cycle with SQL Guillaume Bort & Justin Coffey
12:15 pm
12:30 pm
Diarmuid Gill
12:30 pm
2:00 pm
Lunch break Théâtre de Paris
2:00 pm
2:30 pm
WS1: Gerben Stavenga
2:00 pm
2:30 pm
WS2: Scio 0.8 and beyond. How we make data-engineering easy at Spotify Julien Tournay

Two years ago, Spotify introduced Scio, an open-source Scala framework to develop data pipelines and deploy them on Google Dataflow. In this talk, we will discuss the evolution of Scio and share the highlights of running Scio in production for two years. We will showcase several interesting data processing workflows run at Spotify, what we learned from running them in production, and how we leveraged that knowledge to make Scio faster, safer, and easier to use.

2:45 pm
3:15 pm
WS1: A compression toolbox for exploring and exploiting redundancies in tabular data Dmitri Pavlichin
2:45 pm
3:15 pm
WS2: Expectations vs Reality Idrees Khan

Often, what we expect from a dataset doesn't match what's actually there. If you don't know the accuracy of the data, it's difficult to trust any metrics, insights, or models downstream. I work on a team at Spotify that aims to solve this problem, and in this talk we will cover the libraries, infrastructure, and organizational processes we've implemented to address it.

3:30 pm
4:00 pm
WS1: Cirrus: Serverless Machine Learning Joao Carreira

In this talk, we will present Cirrus, a new system in the RISELab (UC Berkeley) that aims to facilitate the development of ML workflows on serverless platforms. During the presentation, we will discuss the challenges of building large-scale systems on existing serverless platforms and propose ways to address those challenges.

Machine learning (ML) workflows are complex. The typical workflow consists of distinct stages of user interaction, such as preprocessing, training, and tuning, that are repeatedly executed by users but have heterogeneous computational requirements. Serverless computing is, in general, a compelling model for addressing the resource management problem, but adopting it for existing ML frameworks poses numerous challenges due to significant restrictions on local resources.

In this talk, we will present the Cirrus system design and API and discuss the mechanisms it uses to efficiently preprocess data, train models, and tune model parameters at scale. At the end, we will propose a new serverless architecture that better supports data-intensive workloads.

3:30 pm
4:00 pm
WS2: How to evaluate your Data Platform's maturity? Lucie Bailly

What does a state-of-the-art data platform look like? What services does my data platform provide, or should it provide, to my users? Where should I focus my effort?
We will go through six topics together: Platform, Operations, Discovery, Monitoring, Lineage and Business value, and define criteria to evaluate the platform on a scale of 1 (you can do better) to 5 (it rocks!). This method has been applied to Criteo use cases and challenges, with hundreds of individual contributors and 200 TB of new data arriving every day. We use it to evaluate our services and build our roadmap for the coming years.

4:00 pm
4:30 pm
Coffee break
4:30 pm
5:00 pm
WS1: Stop doing iterative model development Yoav Zimmerman

ML engineers can spend many cycles on iterative model development: manual, ad-hoc experimentation to improve a model's performance over an established baseline. We've seen them labor for weeks, months, and sometimes years to improve model performance—in many cases, this is an engineer's entire job. In this talk, we introduce the paradigm of search-driven model development by developing search spaces instead of developing models. In practice, we have applied this paradigm to reproduce the results from months of iterative work in 24 hours.

4:30 pm
5:00 pm
WS2: Mixing Time Series and Machine Learning, at scale and for real Youen Chene & Adrien Blind
5:15 pm
5:45 pm
WS1:
5:15 pm
5:45 pm
WS2:
6:00 pm
8:00 pm
Cocktail
Organized by

Event Location

32 Rue Blanche
75009 Paris
