Job change | Anima on AI

I am very happy to share the news that I am joining NVIDIA as Director of Machine Learning Research. I will be based at the Santa Clara HQ and will be hiring ML researchers and engineers at all levels, along with graduate interns.

I will be continuing my role as Bren professor at Caltech and will be dividing my time between northern and southern California. I look forward to building strong intellectual relationships between NVIDIA and Caltech. There are many synergies with initiatives at Caltech such as the Center for Autonomous Systems (CAST) and AI4science.

I found NVIDIA to be a natural fit and it stood out among other opportunities. I chose NVIDIA because of its track record, its pivotal role in the deep-learning revolution, and the people I have interacted with. I will be reporting to Bill Dally, the chief scientist of NVIDIA. In addition to Bill, there is a rich history of academic researchers at NVIDIA such as Jan Kautz, Steve Keckler, Joel Emer, and recent hires Dieter Fox and Sanja Fidler. They have created a nourishing environment that blends research with strong engineering. I am looking forward to working with CEO Jensen Huang, whose vision for research I find inspiring.

The deep-learning revolution would not have happened without NVIDIA’s GPUs. The latest Volta GPUs pack an impressive 125 teraFLOPS and have fueled developments in diverse areas. The recently released NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator, and NVIDIA GeForce represents the biggest leap in graphics rendering performance, as it is the world’s first real-time ray-tracing GPU.

As many of you know, NVIDIA is much more than a hardware company. The development of CUDA libraries at NVIDIA has been a critical component for scaling up deep learning. The CUDA primitives are also relevant to my research on tensors. A few years ago, I worked with NVIDIA researcher Cris Cecka to build extended BLAS kernels for tensor contraction operations. I look forward to building more support for tensor algebraic operations in CUDA, which can lead to more efficient tensorized neural network architectures.
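To make the link between tensor contractions and BLAS concrete, here is a minimal NumPy sketch. It is only an illustration, not the kernels from that work: the function names and the two index patterns are chosen here as assumptions. The first contraction reduces to a single matrix multiply (GEMM) after a reshape, and the second maps to a batch of GEMMs, one per shared index.

```python
import numpy as np


def contract_via_gemm(A, B):
    """C[m, n, p] = sum_k A[m, k] * B[k, n, p], computed as one flattened GEMM.

    Hypothetical helper for illustration: B is viewed as a (k, n*p) matrix,
    so the whole contraction is a single matrix multiply.
    """
    m, k = A.shape
    k2, n, p = B.shape
    if k != k2:
        raise ValueError("contracted dimensions must match")
    return (A @ B.reshape(k, n * p)).reshape(m, n, p)


def contract_via_batched_gemm(A, B):
    """C[m, n, p] = sum_k A[m, k, p] * B[k, n, p], one GEMM per batch index p.

    Hypothetical helper for illustration: p appears in both inputs and the
    output, so it cannot be flattened away and is treated as a batch index.
    """
    if A.shape[1] != B.shape[0] or A.shape[2] != B.shape[2]:
        raise ValueError("contracted and batch dimensions must match")
    # Move the batch axis to the front: (p, m, k) @ (p, k, n) -> (p, m, n).
    C = np.matmul(A.transpose(2, 0, 1), B.transpose(2, 0, 1))
    # Restore the (m, n, p) layout.
    return C.transpose(1, 2, 0)
```

Both helpers can be checked against `np.einsum` with the subscripts `"mk,knp->mnp"` and `"mkp,knp->mnp"` respectively. The NumPy version copies data when it transposes; a strided batched kernel on the GPU would aim to avoid those copies.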

I admire recent ML research that has come out of NVIDIA. This includes state-of-the-art generative models for images and video, image denoising, and more. The convergence of ML research with state-of-the-art hardware is happening at a rapid pace at NVIDIA. In addition, I am thrilled about developments in design and visualization, self-driving, IoT/autonomous systems, and data center solutions at NVIDIA.

I hope to continue building bridges between academia and industry, and between theory and practice in my new role.

I recently left my role as principal scientist at Amazon Web Services (AWS). In this blog post, I want to look back on the rich learning experiences I had and the amazing things we accomplished over the last two years.

This was my first “industry” job out of academia. I chose AWS for several reasons. I saw huge potential to democratize AI, since AWS is the most comprehensive and broadly adopted cloud platform. Two years ago, cloud AI was still uncharted territory, and this made it an exciting adventure. I was also attracted by the mandate to bring applied AI research to AWS.

My early days were exciting and busy. We were a new team in the Bay Area with a startup-like environment, while still being connected with the Seattle team and the larger Amazon ecosystem. There was a steep learning curve to understand all the AWS services, software engineering practices, product management, and so on, and I loved it. We were busy growing the team, designing new AI services, and thinking about research directions, all at the same time.

I am proud of what we accomplished over the last two years. We launched a vast array of AI services at all levels of the stack. At the bottom are the compute instances; the latest GPU instances use the powerful NVIDIA Tesla V100s. The middle layer consists of SageMaker, a fully managed service with high-performance dockerized ML algorithms, and DeepLens, the first deep learning camera with seamless AWS integration. The top layer includes services for computer vision, natural language processing, speech recognition, and so on. We made large strides in customer adoption, and today AWS has the largest number of customer references for cloud ML services. In addition, the AWS ML lab provides advanced solutions for custom use cases. I got to interact with many customers, and it was eye-opening to learn about real-world AI deployment in diverse domains.

I was most closely involved in the design, development, and launch of SageMaker. Its broad adoption led to AWS increasing its ML user base by more than 250 percent over the last year. SageMaker removes heavy lifting, complexity, and guesswork from each step of productionizing ML. It was personally fulfilling to build topic modeling on SageMaker (and AWS Comprehend) based on my academic research, which uses tensor decompositions. SageMaker topic modeling automatically categorizes documents at scale and is several times faster than any other (open-source) framework. Check out my talk at re:Invent for more details. Taking the tensor algorithm from its theoretical roots to an AWS production service was a big highlight for me.

It was exciting to grow applied research at AWS. I looked for problems that posed the biggest obstacles in the real world. The “data problem” is the proverbial “elephant in the room”. While researchers work on established benchmark datasets, most of the time and effort in the real world goes into data collection and cleanup. We developed and tested efficient deep active learning, crowdsourcing, and semi-supervised learning methods in a number of domains. We found that deep networks can be trained with significantly less data (~25%). For an overview, check out the tutorial slides at UAI. I was also happy to connect my earlier research on tensors with deep learning to obtain a new class of deep networks that naturally encode the data dimensions and higher-order correlations. They are more compact and generalize better in many domains. TensorLy is a Keras-like frontend that makes it easy to use tensor algebraic operations in deep learning with any backend. Moreover, I realized that in practice, simple methods work even when theory cannot explain them. We tried to close this gap by looking for conditions under which simple methods provably succeed, and then experimentally verifying these conditions. For instance, we showed that 1-bit gradient quantization has almost no accuracy loss while reducing communication requirements for distributed ML, both in theory and in practice. All these projects were executed with an excellent cohort of interns and AWS scientists.
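To give a feel for the 1-bit gradient quantization mentioned above, here is a minimal NumPy sketch. It rests on assumptions rather than reproducing our implementation: each worker sends only the sign of every gradient coordinate, and the signs are combined by a majority vote. The function names, the tie-breaking rule, and the learning rate are all hypothetical choices made for this illustration.

```python
import numpy as np


def quantize_1bit(grad):
    """Keep only the sign of each coordinate (+1 or -1), i.e. one bit each.

    Assumption for this sketch: zeros are mapped to +1.
    """
    return np.where(grad >= 0, 1, -1).astype(np.int8)


def majority_vote(sign_grads):
    """Combine per-worker sign vectors coordinate-wise by majority vote.

    Assumption for this sketch: ties are broken toward +1.
    """
    total = np.sum(np.stack(sign_grads), axis=0, dtype=np.int64)
    return np.where(total >= 0, 1, -1).astype(np.int8)


def sign_step(params, worker_grads, lr=0.01):
    """One update using only 1-bit messages from each worker."""
    signs = [quantize_1bit(g) for g in worker_grads]
    return params - lr * majority_vote(signs)
```

For example, with three workers and gradients `[0.5, -2.0, 0.1]`, `[0.2, -0.1, -3.0]`, and `[1.0, -1.0, -0.2]`, the voted direction is `[1, -1, -1]`. Each worker communicates one bit per coordinate instead of a 32-bit float, which is where the communication savings come from.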

Being at AWS gave me a platform for community outreach to democratize AI. I worked to build partnerships with universities and non-profit organizations. The Caltech-Amazon partnership funded graduate fellowships and cloud credits, which are transforming fundamental scientific research at Caltech. This partnership also resulted in a new AWS office in Pasadena with Stefano Soatto and Pietro Perona at the helm.

I am happy that I got to represent AWS at many prominent venues, including Deep Learning Indaba 2017, the first pan-African deep learning summit; the Mulan forum for Chinese women entrepreneurs; the Geekpark forum for startups in China; and Shaastra 2018 at IIT Madras, a student-run techfest where we held the largest deep learning workshop in India.

I had the privilege to work with and learn from so many amazing individuals. It was enlightening to hear about the early days of AWS from veteran AWS engineer and team VP Swami Sivasubramanian. I tried to develop new skills from the very best: product management from Joseph Spisak, team management from Craig Wiley, software engineering from Leo Dirac, clear exposition from Zachary Lipton, ML practice from Sunil Mallya, to name a few. Attending MARS and interacting with Jeff Bezos and many other superstars was a big highlight.

I learned good management principles and business practices at Amazon. The Leadership Principles are a succinct list of desirable leadership qualities. But some principles are at odds with others, which means they have to be balanced. For instance, the “dive deep” principle requires time and effort, while “bias for action” calls for expediency. I also found that working backwards from customer needs and having “two pizza” teams resulted in focused discussions with great outcomes. Another effective strategy I learned was the Press Release (PR)-FAQ. Presentations are banned at Amazon, and to pitch any new idea one has to start by writing its press release. This forces clarity on the product goals and target customers right from the beginning. I could see the effectiveness of all these principles and their role in making Amazon the huge success it is today.

To summarize, I am very thankful for the learning experience I had at AWS. In the next post, I will talk about my upcoming plans. Stay tuned!

Below is a slideshow of some of my favorite memories.