New beginnings @NVIDIA

I am very happy to share the news that I am joining NVIDIA as Director of Machine Learning Research. I will be based in the Santa Clara HQ and will be hiring ML researchers and engineers at all levels, along with graduate interns.

I will be continuing my role as Bren professor at Caltech and will be dividing my time between northern and southern California. I look forward to building strong intellectual relationships between NVIDIA and Caltech. There are many synergies with initiatives at Caltech such as the Center for Autonomous Systems (CAST) and AI4science.

I found NVIDIA to be a natural fit and it stood out among other opportunities. I chose NVIDIA because of its track record, its pivotal role in the deep-learning revolution, and the people I have interacted with. I will be reporting to Bill Dally, the chief scientist of NVIDIA. In addition to Bill, there is a rich history of academic researchers at NVIDIA such as Jan Kautz, Steve Keckler, Joel Emer, and recent hires Dieter Fox and Sanja Fidler. They have created a nourishing environment that blends research with strong engineering. I am looking forward to working with CEO Jensen Huang, whose vision for research I find inspiring.

The deep-learning revolution would not have happened without NVIDIA’s GPUs. The latest Volta GPUs pack an impressive 125 teraFLOPS and have fueled developments in diverse areas. The recently released NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator and NVIDIA GeForce represents the biggest leap in performance for graphics rendering since it is the world’s first real-time ray tracing GPU.

As many of you know, NVIDIA is much more than a hardware company. The development of CUDA libraries at NVIDIA has been a critical component for scaling up deep learning. The CUDA primitives are also relevant to my research on tensors. I worked with NVIDIA researcher Cris Cecka to build extended BLAS kernels for tensor contraction operations a few years ago. I look forward to building more support for tensor algebraic operations in CUDA which can lead to more efficient tensorized neural network architectures.

I admire recent ML research that has come out of NVIDIA. This includes state-of-art generative models for images and video, image denoising etc. The convergence of ML research with state-of-art hardware is happening at rapid pace at NVIDIA. In addition, I am also thrilled about developments in design and visualization, self-driving, IoT/autonomous systems and data center solutions at NVIDIA.

I hope to continue building bridges between academia and industry, and between theory and practice in my new role.

AI4Science @Caltech

AI4science is a new initiative launched at Caltech that aims to broaden the impact of AI and ML across all areas of sciences. The inaugural workshop was held on Aug. 1st. My student Jeremy Bernstein wrote a detailed article on the workshop. The slides of the talks are also available there.

A short blurb of the article: Across science—from astrophysics to molecular biology to economics—a common problem persists: scientists are overwhelmed by the sheer amount of data they are collecting. But this problem might be better viewed as an opportunity, since with appropriate computing resources and algorithmic tools, scientists might hope to unlock insights from these swathes of data to carry their field forward. AI4science is a new initiative at Caltech aiming to bring together computer scientists with experts in other disciplines. While somewhat of a suitcase term, AI or artificial intelligence here means the combination of machine learning algorithms with large compute resources.

Professor Yisong Yue of Caltech’s Computing & Mathematical Sciences department (CMS) gave the first talk, where he gave a general overview of machine learning algorithms and their relevance across science and engineering. Professor Andrew Stuart, also in CMS, gave the talk following Professor Yue. Stuart discussed his interest in fusing data science techniques with known physical law. Frederick Eberhardt, professor of philosophy, spoke next. He discussed his work on causal inference. Professor Anima Anandkumar of the CMS department was the last computer scientist to speak. Anandkumar gave an overview of a successful machine learning technique known as artificial neural networks, which have dramatically improved the ability of computers to understand images and natural language. Anandkumar also spoke about tensor methods in machine learning. The remainder of the day was devoted to talks from scientists who have had success applying machine learning techniques in their respective fields. For more, check out the AI4science website. 


Adieu to AWS

I recently exited out of my role as principal scientist at Amazon Web Services (AWS). In this blog post, I want to recollect the rich learning experiences I had and the amazing things we accomplished over the last two years.

This was my first “industry” job out of academia. I chose AWS for several reasons. I saw huge potential to democratize AI, since AWS is the most comprehensive and broadly adopted cloud platform. Two years ago, cloud AI was still an uncharted territory, and this made it an exciting adventure. I was also attracted by the mandate to bring applied AI research to AWS.

My early days were exciting and busy. We were a new team in bay area with a startup-like environment, while still being connected with the Seattle team and the larger Amazon ecosystem. There was a steep learning curve to understand all the AWS services, software engineering practices, product management etc, and I loved it. We were busy growing the team, designing new AI services, and thinking about research directions, all at the same time.

I am proud of what we accomplished over the last two years. We launched a vast array of AI services at all levels of the stack. At the bottom are the compute instances — the latest GPU instances are the powerful NVIDIA Tesla V100s. The middle layer consists of SageMaker, a fully managed service with high-performance dockerized ML algorithms, and Deeplens, the first deep learning camera with seamless AWS integration. The top layer includes services for computer vision, natural language processing, speech recognition and so on. We made large strides in customer adoption and today AWS has the largest number of customer references for cloud ML services. In addition, the AWS ML lab provides advanced solutions for custom use cases. I got to interact with many customers and it was eye-opening to learn about real-world AI deployment in diverse domains.

I was most closely involved in the design, development and launch of SageMaker. Its broad adoption led to AWS increasing its ML user base by more than 250 percent over the last year. SageMaker removes heavy lifting, complexity, and guesswork from each step of productionizing ML. It was personally fulfilling to build topic modeling on SageMaker (and AWS comprehend) based on my academic research, which uses tensor decompositions. SageMaker topic-modeling automatically categorizes documents at scale and is several times faster than any other (open-source) framework. Check out my talk at ReInvent for more details. Taking the tensor algorithm from its theoretical roots to an AWS production service was a big highlight for me.

It was exciting to grow applied research at AWS. I looked for problems that posed the biggest obstacles in the real world. The “data problem” is the proverbial “elephant in the room”. While researchers work on established benchmark datasets, most of the time and effort in the real world is in data collection and clean up. We developed and tested efficient deep active learning, crowdsourcing and semi-supervised learning methods in a number of domains. We found that deep networks can be trained with significantly less data (~25%). For an overview, check out tutorial slides at UAI. I was also happy to connect my earlier research on tensors with deep learning to obtain a new class of deep networks that naturally encode the data dimensions and higher-order correlations. They are more compact and generalize better in many domains. Tensorly is Keras-like frontend to easily use tensor algebraic operations in deep learning with any backend. Moreover, I realized that in practice, simple methods work even if theory cannot explain it. We tried to close this gap by looking for conditions under which simple methods provably succeed, and then experimentally verifying these conditions. For instance, we showed that 1-bit gradient quantization has almost no accuracy loss but has reduced communication requirements for distributed ML, both in theory and in practice. All these projects were executed with an excellent cohort of interns and AWS scientists.

Being at AWS gave me a platform for community outreach to democratize AI. I worked to build partnerships with universities and non-profit organizations. The Caltech-Amazon partnership funded graduate fellowships and cloud credits which is is transforming fundamental scientific research at Caltech. This partnership also resulted in a new AWS office in Pasadena with Stefano Soatto and Pietro Perona at the helm.

I am happy that I got to represent AWS at many prominent avenues, including Deep Learning Indaba 2017, the first pan-African deep learning summit, Mulan forum for Chinese women entrepreneurs, Geekpark forum for startups in China and Shaastra 2018 at IIT Madras, a student run techfest, where we held the largest deep learning workshop in India.

I had the privilege to work with and learn from so many amazing individuals. It was enlightening to hear about the early days of AWS from veteran AWS engineer and team VP Swami Sivasubramanian. I tried to develop new skills from the very best: product management from Joseph Spisak, team management from Craig Wiley, software engineering from Leo Dirac, clear exposition from Zachary Lipton, ML practice from Sunil Mallya, to name a few. Attending MARS and interacting with Jeff Bezos and many other superstars was a big highlight.

I learnt good management principles and business practices at Amazon. Leadership principles is a succinct list of desirable leadership qualities. But some principles are at odds with others, which meant there was a need for balance. For instance, the “dive deep” principle requires time and effort, while “bias for action” calls for expediency. I also found that working backwards from customer needs and having “two pizza” teams resulted in focused discussions with great outcomes. Another effective strategy I learnt was the Press Release (PR)-FAQ. Presentations are banned in Amazon and in order to pitch any new idea one had to start by writing its press release. This entailed having clarity on the product goals and its target customers, right from the beginning. I could see the effectiveness of all these principles and their role in making Amazon the huge success it is today.

To summarize, I am very thankful for the learning experience I had at AWS. In the next post, I will talk about my upcoming plans. Stay tuned!

Below is a slideshow of some of my favorite memories..

At MARS 2017
With Jeff Bezos, Swami Sivasubramanian, Ralf Herbrich, Hassan Sawaf.
At MARS 2017
Operating the giant robot
At ReInvent 2017
Talk video
At ReInvent 2017
With Zachary Lipton
At Roborace, ReInvent 2017
WIth Werner Vogels, Swami Sivasubramanian, Tom Soderstrom, Sunil Mallya
In Poland
Amazon conference
Visiting India
Largest DL workshop in India
Recruiting for AWS
With Larissa and Dilpreet
AWS-Caltech partnership
Sunil Mallya, AWS senior solution architect visiting Caltech
Geek Park, Chengdu
With interns
Kayaking weekend
At MARS 2018
With Adam Savage
At MARS 2018
Blue origin capsule
At MARS 2018
Aerial Aerobatics
Cute animals
Apt ending for any slide show!
previous arrow
next arrow

My first blog post

Over the years, I have followed many good academic blogs, e.g. Scott Aaronson, Moritz HardtZachary LiptonLior Pachter etc. Until now I didn’t think I had time for blogging. May be one of the luxuries of tenure is to make room for things I always wanted to do  and this is one of them 🙂

Besides writing has never been my strongest suite. My verbal skills developed much later than my mathematical skills. At the age of 3, my mom tells me that I could barely speak but could solve lots of puzzles. Apparently my grandma had written up the calendar for the next 10 years that I had memorized and could name the day for any date. At school, I struggled with essays. Fast forward, in grad school I was frustrated that papers had to be written and that just deriving math equations and proofs was not enough. Math was my native language and everything else felt foreign.

This didn’t change until a few years ago when I learnt the importance of writing and communication the hard way. I interviewed at most of the top schools and amidst a hasty schedule, I did not polish my presentations or gather thoughts on how to communicate my research to people outside my field. I did not get any offers, even though there was strong initial interest. This forced me to think how I should communicate complex research to general audience. Richard Feynman was an inspiration during my teenage years and I went back to looking at how he structured his lectures. I also had Zachary Lipton work with me. He wrote beautiful prose and I learnt a lot from him.

I am now thankful that I spent time to improve my verbal and presentation skills because it has opened up so many doors. At Amazon, I have interacted with product managers, marketing people, engineers with no ML background and customers. This requires providing customizing the explanations and listening to their concerns.  I realize how critical these skills are for me to be effective. These days I insist that all my students take courses on communication and be able to write well. I will give them editorial comments, but I will not write on their behalf. I keep emphasizing the importance of clear communication.

This blog is an attempt to get my thoughts out to the world. You will hear more in the coming days.