MLconf SF Speakers

Andy Feng


Distinguished Architect, Yahoo

Abstract: Scalable Machine Learning at Yahoo
Yahoo scientists have developed a variety of machine learning libraries (supervised learning, unsupervised learning, deep learning) for online search, advertising and personalization. Emerging business needs require us to address two problems:

  • Can we apply these libraries against massive datasets (billions of training examples, and millions of features) using commodity hardware clusters?
  • Can we reduce the learning time from days to minutes or seconds?

We have thus examined system architecture options (including Hadoop, Spark and Storm), and developed a fault-tolerant MPI solution that allows hundreds of machines to jointly build a model. We are collaborating with the open source community on a better system architecture for next-gen machine learning applications. Yahoo ML libraries are being revised for much better scalability and latency. In the talk, we will share the system architecture of our ML platform and its use cases.

Andy Feng is a Distinguished Architect at Yahoo leading the architecture and design of next-gen Big Data platforms as well as machine learning initiatives. He is a PPMC member and committer of the Apache Storm project and a contributor to the Apache Spark project. He served as a track chair and program committee member at Hadoop Summit and Spark Summit in both 2013 and 2014. At Yahoo, he has also architected major platforms for personalization, ads serving, NoSQL, serving containers and messaging infrastructure. Prior to Yahoo, Andy served as Chief Architect at Netscape/AOL and Principal Scientist at Xerox.

Lise Getoor


Professor, Computer Science, UC Santa Cruz

One of the challenges in big data analytics lies in being able to reason collectively about extremely large, heterogeneous, incomplete, noisy interlinked data. We need data science techniques which can represent and reason effectively with this form of rich and multi-relational graph data. In this presentation, I will describe some common collective inference patterns needed for graph data including: collective classification (predicting missing labels for nodes in a network), link prediction (predicting potential edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe three key capabilities required: relational feature construction, collective inference, and scaling. Finally, I will briefly describe some of the cutting-edge analytic tools being developed within the machine learning, AI, and database communities to address these challenges.

Lise Getoor is a professor in the Computer Science Department at UC Santa Cruz. Her research areas include machine learning and reasoning under uncertainty; in addition, she works in data management, visual analytics and social network analysis. She has over 200 publications and extensive experience with machine learning and probabilistic modeling methods for graph and network data. She is a Fellow of the Association for the Advancement of Artificial Intelligence, an elected board member of the International Machine Learning Society, has served as Machine Learning Journal Action Editor, Associate Editor for the ACM Transactions on Knowledge Discovery from Data, JAIR Associate Editor, and she has served on the AAAI Council. She was co-chair for ICML 2011, and has served on the PC of many conferences including the senior PC of AAAI, ICML, KDD, UAI, WSDM and the PC of SIGMOD, VLDB, and WWW. She is a recipient of an NSF Career Award and eight best paper and best student paper awards. She was recently recognized as one of the top ten emerging research leaders in data mining and data science based on citation and impact, according to KDnuggets. She received her PhD from Stanford University in 2001, her MS from UC Berkeley, and her BS from UC Santa Barbara, and was a professor at the University of Maryland, College Park from 2001 to 2013.

Oscar Celma


Director of Research at Pandora

Òscar Celma is currently Director of Research at Pandora, where he leads a team of scientists to provide the best personalized radio experience. From 2011 until 2014, Òscar was Senior Research Scientist at Gracenote. His work focused on music and video recommendation and discovery.

Before that, he was co-founder and Chief Innovation Officer at Barcelona Music and Audio Technologies (BMAT). Òscar published a book named “Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space” (Springer, 2010). In 2008, Òscar obtained his Ph.D. in Computer Science and Digital Communication at Pompeu Fabra University (Barcelona, Spain). He holds a few patents from his work on music discovery as well as on Vocaloid, a singing-voice synthesizer bought by Yamaha in 2004.

Xavier Amatriain


Director of Algorithms Engineering, Netflix

There are many good textbooks and courses where you can be introduced to machine learning and maybe even learn some of the most intricate details about a particular approach or algorithm. While understanding that theory is a very important base and starting point, there are many other practical issues related to building real-life ML systems that you don’t usually hear about. In this talk I will share some of the most important lessons learned in years of building the large-scale ML solutions that power the Netflix product and scale to millions of users across many countries. I will discuss issues such as model and feature complexity, sampling, regularization, distributing/parallelizing algorithms, or how to think about offline vs. online computation.


Xavier Amatriain (PhD) is Director of Algorithms Engineering at Netflix. He leads a team of researchers and engineers designing the next wave of machine learning approaches to power the Netflix product. Previously, he was a researcher in Recommender Systems and neighboring areas such as Data Mining, Machine Learning, Information Retrieval, and Multimedia. He has authored more than 50 papers, including book chapters, journal articles, and papers in international conferences. He has also lectured at several universities, including the University of California Santa Barbara and UPF in Barcelona, Spain.

Steffen Rendle


Research Scientist, Google

Title: Factorization Machines

Developing accurate recommender systems for a specific problem setting seems to be a complicated and time-consuming task: models have to be defined, learning algorithms derived and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to new problems through feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD and MCMC inference, including automatic hyperparameter selection. I will show on several tasks, including the Netflix prize and KDDCup 2012, that FMs are flexible and generate highly competitive accuracy. With FMs these results can be achieved by simple data preprocessing and without any tuning of regularization parameters or learning rates.
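As a rough illustration of the model the abstract refers to, the sketch below scores one feature vector with a second-order factorization machine. The abstract does not spell out the equation, so this assumes the commonly published form: a global bias, a linear term, and pairwise interactions weighted by dot products of per-feature latent vectors. The function names and the toy parameters (`w0`, `w`, `V`) are hypothetical, and no learning algorithm (SGD, ALS/CD, MCMC) is shown.

```python
from itertools import combinations


def fm_predict_naive(x, w0, w, V):
    """Second-order FM score via an explicit sum over feature pairs, O(k * n^2)."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i, j in combinations(range(n), 2):
        # Interaction weight is the dot product of the two latent vectors.
        dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
        score += dot * x[i] * x[j]
    return score


def fm_predict_fast(x, w0, w, V):
    """Same score in O(k * n), using
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n = len(x)
    k = len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Hypothetical toy parameters: 4 features, 2 latent dimensions.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]

naive_score = fm_predict_naive(x, w0, w, V)
fast_score = fm_predict_fast(x, w0, w, V)
```

Both functions compute the same value; the second avoids the quadratic loop over feature pairs. Adapting the model "through feature engineering" then amounts to choosing what goes into `x`, for example one-hot encodings of user and item IDs plus context features.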

Steffen Rendle is a research scientist at Google. Previously, he was an assistant professor at the University of Konstanz, Germany. Steffen’s research interest is in large-scale machine learning using factorization models. His research received the best paper award at WWW 2010 and a best student paper award at WSDM 2010. Steffen has applied his research in various machine learning competitions, receiving awards at the ECML Discovery Challenges 2009 & 2013, both tasks in KDDCup 2012 and other contests.

Arno Candel


Physicist & Hacker, 0xdata

Title: Distributed Deep Learning for Classification and Regression problems using H2O

Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.

Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.

Johann Schleier-Smith


Co-Founder and CTO, if(we)

Abstract: Agile Machine Learning for Recommender Systems

What can data scientists and machine learning engineers learn from software developers? When it comes to process and tools, and managing complexity, the answer is: quite a bit. When we first started to deploy machine learning at if(we), it felt like we hit a speed bump in the middle of the highway. Accustomed to shipping software to millions of members multiple times a day, to constantly iterating toward better products, we were stunned at how long it took us to try new ideas using available machine learning tools. I will share what we’ve learned from applying agile software development principles to building recommender systems, describing the tools and platforms that allow us to go from new ideas to proven product improvements in just a few days.

Johann Schleier-Smith is Co-Founder and CTO at if(we), the social network for meeting new people. Under Johann’s leadership, if(we) has produced highly scalable web and mobile products with its platform supporting 300 million users in over 200 countries. With an interest in machine learning, data science, analytics and software development and a passion for recommender systems, he works closely with teams to solve hard science problems, while meeting the trends of 21st century social life, adapting cutting-edge academic work to internet-size and internet-speed applications. Johann holds an A.B. in Physics and Mathematics from Harvard University and pursued a Ph.D. in Physics at Stanford for several years, before leaving to fully focus on if(we).

Quoc Le


Software Engineer, Google

Quoc Le is a software engineer at Google and will become an assistant professor at Carnegie Mellon University in Fall 2014. At Google, Quoc works on large scale brain simulation using unsupervised feature learning and deep learning. His work focuses on object recognition, speech recognition and language understanding. Quoc obtained his PhD at Stanford and his undergraduate degree, with First Class Honours and as a Distinguished Scholar, at the Australian National University, and was a researcher at National ICT Australia, Microsoft Research and the Max Planck Institute for Biological Cybernetics. Quoc won the best paper award at ECML 2007.

Lorien Pratt


Cofounder/Chief Scientist, Quantellia

Decision Intelligence is an emerging discipline that brings machine learning, complex systems, predictive analytics, causal reasoning, optimization, and more into a unified framework, overcoming limitations of the current data stack that organizations worldwide face. Just as the Unified Modeling Language (UML), along with associated tool companies like Rational, brought the discipline of design to software development, Decision Intelligence is a methodology, supported by software, that overcomes a number of barriers that have limited the practical use of the analytics and data stack. In particular, Decision Intelligence brings engineering practices to decision making, treating the “decision” as an engineered artifact. This means that best practices from design, agile development, and more can now be used to evolve decisions over time, creating a continuous “organizational learning” framework in diverse settings such as the US government and transnational corporations.

Pratt is co-founder and chief scientist of Mountain View-based Quantellia, which offers data, analytics, and decision intelligence software and services worldwide. Pratt previously served as global director of telecommunications research for Stratecast (a division of Frost & Sullivan) and also worked at Bellcore and IBM. A graduate of Dartmouth College and Rutgers University, she holds three degrees in computer science, and served on the computer science faculty at the Colorado School of Mines. A recipient of the CAREER award from the National Science Foundation, and the author of dozens of technical papers and articles, Pratt is also a well-known speaker, author, and co-editor (with Sebastian Thrun) of the book Learning to Learn.

Pinar Donmez


Chief Data Scientist, Kabbage

Pinar is a data scientist with a keen focus on finding meaning in data and turning the insights into data-driven businesses. As the Chief Data Scientist at Kabbage, she is passionately transforming raw data into valuable knowledge to improve underwriting for SMBs and help them succeed. She is leading a world-class team of data scientists in turning unusually rich data sources on how a business operates into predictive systems that can determine the risk, capacity, and character of the business. Kabbage uses this data-driven technology to lend over $1 million per day. Prior to Kabbage, she applied her machine learning skills to attrition prediction and sales propensity estimation on Salesforce’s Data and Analytics team, and to user intention understanding through her work at Yahoo! Labs, which led to patents and numerous publications. Pinar holds a Ph.D. in Computer Science from CMU, where her main interest lay in machine learning and its applications to active and unsupervised learning. She has published numerous articles in top-tier peer-reviewed ML journals and conferences such as JMLR, ICML, and KDD, to name a few.

Anthony Bak


Principal Data Scientist and Mathematician, Ayasdi

Abstract: Topological Learning with Ayasdi
Ayasdi has a unique approach to machine learning and data analysis using topology. This framework represents a revolutionary way to look at and understand data that is orthogonal but complementary to traditional machine learning and statistical tools. In this presentation I will show you what is meant by this statement: How does topology help with data analysis? Why would you use topology? I will illustrate with both synthetic examples and problems we’ve solved for our clients.

Bio: Anthony Bak is a principal research scientist at Ayasdi where he designs machine learning and analytic solutions. Prior to Ayasdi he was a postdoc with Ayasdi co-founder Gunnar Carlsson in the Stanford University Mathematics Department. His PhD is on connections between algebraic geometry and string theory.

Scott Clark


Software Engineer, Yelp

Abstract: Introducing the Metric Optimization Engine (MOE), an open source, black box, Bayesian Global Optimization engine for optimal experimental design.

In this talk we will introduce MOE, the Metric Optimization Engine. MOE is an efficient way to optimize a system’s parameters, when evaluating parameters is time-consuming or expensive. It can be used to help tackle a myriad of problems including optimizing a system’s click-through or conversion rate via A/B testing, tuning parameters of a machine learning prediction method or expensive batch job, designing an engineering system or finding the optimal parameters of a real-world experiment.

MOE is ideal for problems in which the optimization problem’s objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum, rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximizes (or minimizes) the objective function, while evaluating the objective function as few times as possible. This is done internally using Bayesian Global Optimization on a Gaussian Process model of the underlying system and finding the points of highest Expected Improvement to sample next. MOE provides easy-to-use Python, C++, CUDA and REST interfaces to accomplish these goals and is fully open source. We will present the motivation and background, discuss the implementation and give real-world examples.
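To make the "highest Expected Improvement" step concrete, here is a minimal sketch of the closed-form Expected Improvement for a single candidate point whose predicted objective value is Gaussian, in the maximization setting. It does not use MOE's actual interfaces: the function names are hypothetical, and the posterior means and standard deviations in the example are made-up numbers standing in for what a fitted Gaussian Process model would supply.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """Expected Improvement (maximization) at a point whose predicted
    objective value is distributed as N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best_so_far) * cdf + sigma * pdf


def pick_next(candidates, best_so_far):
    """candidates: list of (params, mu, sigma) tuples.
    Returns the params of the candidate with the highest EI."""
    best = max(
        candidates,
        key=lambda c: expected_improvement(c[1], c[2], best_so_far),
    )
    return best[0]


# Hypothetical posterior summaries for three parameter settings.
candidates = [
    ("a", 0.50, 0.01),  # matches the current best, almost no uncertainty
    ("b", 0.45, 0.20),  # lower predicted mean, but highly uncertain
    ("c", 0.52, 0.00),  # small improvement, assumed fully certain
]
next_point = pick_next(candidates, best_so_far=0.50)
```

With these numbers the uncertain candidate `"b"` is selected even though its predicted mean is below the current best, which shows how Expected Improvement trades off exploring uncertain regions against exploiting known good ones.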

Since finishing my PhD in Applied Mathematics at Cornell University in 2012, I have been working on the Ad Targeting team at Yelp Inc. I’ve been applying a variety of machine learning and optimization techniques, from multi-armed bandits to Bayesian Global Optimization and beyond, to Yelp’s vast dataset and problems. I have also been trying to lead the charge on academic research and outreach within Yelp by leading projects like the Yelp Dataset Challenge and open sourcing MOE.

Ted Willke


Senior Principal Engineer & GM, Datacenter Group, Intel

Ted Willke leads the Graph Analytics Operation within Intel’s Datacenter Group, which designs, develops, and deploys enterprise software for distributed parallel machine learning and data mining. He developed his expertise in datacenter systems over his 16 years with Intel. He has researched cluster computing technologies in Intel Labs and developed server technologies and standards within Intel’s product organizations. His work covers high-performance I/O, virtualization, next-gen microservers, Hadoop optimization tools, and open source libraries for distributed parallel computing. Ted holds a Doctorate in electrical engineering from Columbia University, where he graduated with Distinction. He has authored over 25 papers in book chapters, journals, and conferences, and he holds 10 patents. He won the MASCOTS Best Paper Award in 2013 for his work on Hadoop MapReduce performance modeling and an Intel Achievement Award this year for his work on graph processing systems.

Tamara Kolda


Distinguished Member of Technical Staff, Sandia National Laboratories

Tamara Kolda is a Distinguished Member of Technical Staff at Sandia National Laboratories in Livermore, California, where she works on a broad range of problems including network modeling and analysis, multilinear algebra and tensor decompositions, data mining, and cybersecurity. She has also worked in optimization, nonlinear solvers, parallel computing, and the design of scientific software. She has authored numerous software packages, including the well-known Tensor Toolbox for MATLAB. Before joining Sandia, Kolda held the Householder Postdoctoral Fellowship in Scientific Computing at Oak Ridge National Laboratory. She has received several awards including a 2003 Presidential Early Career Award for Scientists and Engineers (PECASE), two best paper awards (ICDM’08 and SDM’13), and Distinguished Member of the Association for Computing Machinery (ACM). She is an elected member of the Society for Industrial and Applied Mathematics (SIAM) Board of Trustees, Section Editor for the Software and High Performance Computing section of the SIAM Journal on Scientific Computing, and Associate Editor for SIAM Journal on Matrix Analysis. She received her Ph.D. in applied mathematics from the University of Maryland at College Park in 1997.

Ameet Talwalkar


Assistant Professor of Computer Science, UCLA

Ameet Talwalkar is an assistant professor of Computer Science at UCLA and a technical advisor for Databricks. His research addresses scalability and ease-of-use issues in the field of statistical machine learning, with applications in computational genomics. He started the MLlib project in Apache Spark and is a co-author of the graduate-level textbook ‘Foundations of Machine Learning’ (2012, MIT Press). Prior to UCLA, he was an NSF post-doctoral fellow in the AMPLab at UC Berkeley. He obtained a B.S. from Yale University and a Ph.D. from the Courant Institute at NYU.

Ted Dunning


Chief Application Architect, MapR

Abstract: Near real-time Updates for Cooccurrence-based Recommenders

Most recommendation algorithms are inherently batch oriented and require all relevant history to be processed. In some contexts such as music, this does not cause significant problems because waiting a day or three before recommendations are available for new items doesn’t significantly change their impact. In other contexts, the value of items drops precipitously with time so that recommending day-old items has little value to users.

In this talk, I will describe how a large-scale multi-modal cooccurrence recommender can be extended to include near real-time updates. In addition, I will show how these real-time updates are compatible with delivery of recommendations via search.
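The abstract does not describe the update mechanism itself, so the following is only a toy sketch of the general idea of keeping item-item cooccurrence counts current as each interaction arrives, rather than recomputing them in a batch job. The class and method names are hypothetical, raw counts stand in for whatever cooccurrence statistic a production system would use, and nothing here covers the multi-modal or search-based delivery aspects of the talk.

```python
from collections import defaultdict


class CooccurrenceCounter:
    """Toy item-item cooccurrence counts, updated one event at a time."""

    def __init__(self):
        # user -> set of items that user has interacted with
        self.history = defaultdict(set)
        # item -> other item -> number of users who interacted with both
        self.cooc = defaultdict(lambda: defaultdict(int))

    def add_event(self, user, item):
        """Fold a single (user, item) interaction into the counts."""
        seen = self.history[user]
        if item in seen:
            return  # repeat interactions are ignored in this sketch
        for other in seen:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        seen.add(item)

    def recommend(self, user, top_n=3):
        """Rank unseen items by summed cooccurrence with the user's history."""
        seen = self.history[user]
        scores = defaultdict(int)
        for item in seen:
            for other, count in self.cooc[item].items():
                if other not in seen:
                    scores[other] += count
        # Sort by descending score, breaking ties by item name.
        return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


rec = CooccurrenceCounter()
for user, item in [("u1", "a"), ("u1", "b"), ("u2", "a"),
                   ("u2", "b"), ("u2", "c"), ("u3", "a")]:
    rec.add_event(user, item)
before = rec.recommend("u3")

# New interactions arrive; recommendations reflect them immediately.
for user, item in [("u4", "a"), ("u4", "c"), ("u5", "a"), ("u5", "c")]:
    rec.add_event(user, item)
after = rec.recommend("u3")
```

In this toy run the ranking for user `u3` changes as soon as the new events are added, with no batch recomputation over the full history.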

Ted Dunning is Chief Application Architect at MapR and has held Chief Scientist positions at Veoh Networks, ID Analytics and MusicMatch (now Yahoo Music). Ted is responsible for building the world’s most advanced identity theft detection system, as well as one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 24 issued and numerous pending patents and contributes to Apache Mahout, Zookeeper and Drill™. He is also a mentor for Apache Spark, Storm, DataFu and Stratosphere.