12 international speakers
8 hours of dev talks & workshops
90 engineers attending

Not Another Big Data Conference

Not only do we have our usual top-notch speakers sharing first-hand experience in real-world deep learning, data and data systems engineering, and building a scalable engineering culture, but all proceeds also go to an outstanding charity, TechFugees. This one-day event will bring new perspectives to these three critical areas of modern software engineering while helping refugees gain access to the knowledge economy.

Our Speakers

Hakan Jakobsson

Senior Staff Dev Lead

Piyush Narang

Staff Software Engineer

David Chaiken

Chief Architect

Doug Loyer

Director of Engineering

Julien Le Dem

Principal Engineer

Mohsin Hussain

VP Engineering

Yoav Zimmerman

ML Engineer

Amr Awadallah

Founder, CTO

Justin Coffey

Director of Engineering

Gerben Stavenga

Software Engineer

Ted Dunning

Chief Application Architect

Ran Lei

Software Engineer

Jie Li

Research Scientist

Agenda

9:00 AM - 10:00 AM
Breakfast - 3rd floor

9:50 AM - 10:00 AM
Opening

10:00 AM - 10:30 AM
The Sixth Wave of Automation: Automation of Decisions
Amr Awadallah - Cloudera

We are witnessing a new revolution in data: the age of automation of decisions. In this presentation, Cloudera co-founder and CTO Amr Awadallah will explain the historic importance of this wave and the common patterns in which it manifests in organizations today, and then conclude by discussing the foundational capabilities required to enable it.

10:30 AM - 11:00 AM
5 Tips for Improving Metrics Quality
David Chaiken - Pinterest

This talk describes the process of improving the quality of business metrics reporting at Pinterest. This process consisted of specifying core metrics, understanding the end-to-end architecture, executing a cross-functional improvement program, and creating a novel reporting tool. From this process, the talk extracts five tips for successfully improving metrics quality: know your stakeholders; define core metrics; prioritize quality; fund test implementation; and measure progress. The talk focuses on the innovation that led to a new kind of metrics quality measurement report, which Pinterest has been using to track its progress throughout the year.

11:00 AM - 11:30 AM
Technical Overview of Challenges and Trade-offs in the Design and Use of Protocol Buffers within Google
Gerben Stavenga - Google

This talk will discuss code size and CPU efficiency, and how different languages and platforms lead to different designs. I will touch on using arenas, new upcoming features we are working to release, and how Protocol Buffers compares with competitors such as Cap'n Proto, FlatBuffers, and Thrift.

11:30 AM - 12:00 PM
Deep Learning: From Theory to Practice
Yoav Zimmerman - Determined AI

Despite enormous excitement about the potential of deep learning, building practical applications powered by deep learning remains a major challenge: the necessary expertise is scarce, the hardware requirements can be prohibitive, and current software tools are immature and limited in scope. In this talk, we will first describe how existing software tooling supports deep learning workflows. We will then describe several promising opportunities to drastically improve these workflows via novel algorithmic and software solutions, including reproducible workflow management and efficient utilization of deep learning cluster resources. This talk draws on our experience at Determined AI, a startup that builds software to make deep learning engineers dramatically more productive.

12:00 PM - 1:00 PM
Lunch - 3rd floor

1:00 PM - 1:30 PM
From Flat Files to Deconstructed Database: The Evolution and Future of the Big Data Ecosystem
Julien Le Dem - WeWork

1:30 PM - 2:00 PM
Fixing the Big Data Development Cycle with SQL
Justin Coffey - Criteo

We all know how hard Big Data stacks can be to build, use, and maintain. Gartner estimates that 85% of big data projects are killed before production release. In this talk, engineering leaders from Criteo's Data Reliability Engineering team will show how widespread use of SQL addressed the two biggest issues in data engineering: systems efficiency and developer productivity.

2:00 PM - 2:30 PM
Deep Semantic Analysis without Deep Learning
Ted Dunning - MapR

Deep learning on text and for recommendations has had some amazing successes in building very usable semantic models of words or behavior.

It is a little-known fact that many of these results can also be achieved with vastly simpler techniques based on finding words or actions that appear together. Recently developed algorithms allow large-scale cooccurrence analysis of this sort to be updated accurately and safely in hard real time, which is particularly difficult to do with deep models. Models generated from cooccurrence analysis also retain sparsity, so they can often be deployed using standard software such as text search engines like Elasticsearch or Solr.

This talk will be approachable, will not require any advanced mathematics, and should interest a wide audience, but it won't be dumbed down, either. I will show example applications of these algorithms, walk through the key algorithms at a high level, and describe some open source implementations.

2:30 PM - 2:45 PM
Engineering Culture at Criteo
Mohsin Hussain - Criteo

This talk delves into a few important questions:
1. Why does culture matter?
2. What makes a good engineering culture?
3. How does Criteo evolve its engineering culture?
We will walk through an in-depth example and review other cultural elements that have worked well for Criteo engineering.

2:45 PM - 3:15 PM
Work Break

3:15 PM - 3:45 PM
Polar Opposites in ML Engineering Towards Autonomous Driving
Jie Li - Toyota Research

Polar opposites abound in ML systems: big data vs. limited labeling capacity, offline vs. online learning, Newton vs. Hinton, and so on. These polar opposites bring about conflicts, trade-offs, and difficult decisions. In this talk, we will discuss examples of polar opposites in our ML efforts towards autonomous driving, and how we handle them at TRI.

3:45 PM - 4:15 PM
The BOSS DB Project at Criteo
Hakan Jakobsson and Piyush Narang - Criteo

This talk covers a project to evaluate alternative database technologies as a partial replacement for Hive. The motivation was that Hive is slow and inefficient, and we believed that a technology with better response times could improve our analysts' productivity while also saving money on hardware. We describe the evaluation process and the technology we picked, Presto. We also describe some of the practical work done to deploy Presto on a 200-node production cluster, including frameworks for monitoring, testing, upgrading, failover, and end-user training.

4:15 PM - 4:45 PM
Providing Streaming Joins as a Service at Facebook
Ran Lei - Facebook

4:45 PM - 5:15 PM
Debugging Predictive Machine Learning Systems in Production: A Practical Look at Production Predictive Systems and the Problems They Have
Doug Loyer - Criteo

Machine learning-based predictive systems are used in numerous business applications, such as advertising, transaction prediction, and customer churn prediction. Good predictive systems can result in better business outcomes. This talk addresses the practical problems of debugging and improving a predictive system, approaches to solving them, and how they differ from other machine learning tasks.

5:15 PM - 7:00 PM
Reception - 2nd floor

Event Location

Criteo Palo Alto
325 Lytton Street, Suite 200
Palo Alto, CA 94301