Author: Tensorial Professor: Anima on AI

Anima Anandkumar is a Bren Professor at Caltech and senior director of AI Research at NVIDIA. Her work developing novel artificial intelligence algorithms enables and accelerates scientific applications of AI, including scientific simulations, weather forecasting, autonomous drone flights, and drug design. Anandkumar is a fellow of the IEEE and the Alfred P. Sloan Foundation. She has received best paper awards at venues such as NeurIPS and the ACM Gordon Bell Special Prize for HPC-Based COVID-19 Research. She is part of the World Economic Forum's Expert Network. Anandkumar holds degrees from the Indian Institute of Technology Madras and Cornell University and conducted postdoctoral research at MIT.

My departure from Twitter

Many of you are very concerned about why my Twitter account is no longer active. I have voluntarily decided to de-activate my account in the interest of my safety and to reduce anxiety for my loved ones. I want to focus on my research and my team where my attention and energy are badly needed.

I want to emphasize that this decision is solely mine. My employers NVIDIA and Caltech are fully supportive of me and my mission. They support employees expressing diverse personal views.

I am proud of the work we have done to promote diversity and inclusion. I encourage you to continue doing that. We are all bright creative minds with an endless potential to innovate. We will find new and safer ways to stay connected and build a better future.

10 things that happened last decade

Got a faculty position at UC Irvine during the recession. Attended N(eur)IPS conference for the first time.
Discovered the power of tensors for machine learning and it became my core research focus.
Received many awards such as Sloan fellowship, Microsoft faculty fellowship and NSF Career Award.
Moved to AWS as principal scientist to build some of the first cloud AI applications. Got to deploy tensor algorithms in production.
Became the youngest named chair professor at Caltech, the highest honor bestowed on individual faculty.
Started the ML research group at NVIDIA while continuing my work at Caltech.
Founded AI4science with Yisong Yue to accelerate interdisciplinary AI research at Caltech.
Shared my #meToo experiences and pushed for NeurIPS name change to help improve the climate for women and minorities in AI.
Having grandkids: academically speaking. Many of my students and mentees became faculty members and formed thriving research groups.
Continue to learn and grow. Lucky to have had amazing opportunities and experiences.

Research Highlights of 2019

2019 was an interesting year in so many ways. I was able to build and solidify research programs, both at NVIDIA and at Caltech. I was able to continue working towards diversity and inclusion in AI, and saw a lot of visible improvements (the recent incident I wrote about in my previous blog post is a notable exception). Overall, there is a lot of positivity, and it is a great way to end an eventful decade!

Before I list research highlights that I was personally involved in, here’s an overall summary for the AI field from my viewpoint. This was published on KDnuggets.

In 2019, researchers aimed to develop a better understanding of deep learning, its generalization properties, and its failure cases. Reducing dependence on labeled data was a key focus, and methods like self-training gained ground. Simulations became more relevant for AI training and more realistic in visual domains such as autonomous driving and robot learning, including on NVIDIA platforms such as DriveSIM and Isaac. Language models went big, e.g. NVIDIA’s 8-billion-parameter Megatron model trained on 512 GPUs, and started producing coherent paragraphs. However, researchers showed spurious correlations and undesirable societal biases in these models. AI regulation went mainstream, with many prominent politicians voicing their support for a ban on face recognition by government agencies. AI conferences started enforcing a code of conduct and increased their efforts to improve diversity and inclusion, starting with the NeurIPS name change last year. In the coming year, I predict that there will be new algorithmic developments and not just superficial application of deep learning. This will especially impact “AI for science” in many areas such as physics, chemistry, material sciences and biology.

New book on Spectral Learning on Matrices and Tensors

Builds up spectral methods from first principles. Applications to learning latent variable models and deep neural networks. Order your copy here.

Better Optimization Methods

Fixing training in GANs through competitive gradient descent

(Excellent blog post by Florian here). In contrast to standard simultaneous gradient updates, CGD guarantees convergence and is efficient. NeurIPS poster below:

[Image: NeurIPS poster on competitive gradient descent]

Application of CGD to GAN training and demonstrating its implicit competitive regularization (NeurIPS workshop):

[Image: Implicit Competitive Regularization poster]
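To make the contrast with simultaneous gradient updates concrete, here is a minimal sketch on the scalar bilinear zero-sum game f(x, y) = x * y, where x minimizes f and y maximizes it. The game, the step size, and the function names are assumptions chosen for illustration, and the CGD step is the closed form for this scalar case only, not a general implementation.

```python
import math


def gda_step(x, y, lr):
    """Simultaneous gradient descent-ascent: each player follows its own gradient."""
    grad_x = y  # df/dx for f(x, y) = x * y
    grad_y = x  # df/dy
    return x - lr * grad_x, y + lr * grad_y


def cgd_step(x, y, lr):
    """Competitive gradient descent step, specialized to f(x, y) = x * y.

    Each player also accounts for the other's anticipated move through the
    mixed derivative d2f/dxdy, which equals 1 for this game.
    """
    grad_x = y
    grad_y = x
    mixed = 1.0  # d2f/dxdy
    scale = 1.0 / (1.0 + lr * lr * mixed * mixed)
    new_x = x - lr * scale * (grad_x + lr * mixed * grad_y)
    new_y = y + lr * scale * (grad_y - lr * mixed * grad_x)
    return new_x, new_y


def distance_after(step, x0=1.0, y0=1.0, lr=0.2, steps=100):
    """Run the given update rule and return the distance to the equilibrium (0, 0)."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


gda_distance = distance_after(gda_step)  # spirals away from (0, 0)
cgd_distance = distance_after(cgd_step)  # spirals in towards (0, 0)
```

On this toy game the simultaneous update multiplies the squared distance to the equilibrium by (1 + lr^2) at every step, while the CGD step divides it by the same factor, so one diverges and the other converges.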

Guaranteed convergence for SignSGD

SignSGD compresses each gradient coordinate to a single bit (its sign) but has no significant loss in accuracy in practice. Theoretically, there are convergence guarantees. Paper. Main theorem:

[Image: main theorem from the SignSGD paper]
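As a rough illustration of the one-bit idea, here is a minimal sketch of sign-based updates with a majority vote across workers, applied to a toy quadratic objective. The objective, noise model, step size, and all function names are assumptions made for this example, not details taken from the paper.

```python
import random


def sign(value):
    """Return -1, 0 or +1: the single bit (plus a zero case) a worker would send."""
    return (value > 0) - (value < 0)


def noisy_gradient(x, target, rng, noise_std):
    """Gradient of sum((x_i - t_i)^2) with Gaussian noise, mimicking a minibatch."""
    return [2.0 * (xi - ti) + rng.gauss(0.0, noise_std) for xi, ti in zip(x, target)]


def majority_vote(sign_vectors):
    """Aggregate the workers' sign vectors coordinate by coordinate."""
    return [sign(sum(column)) for column in zip(*sign_vectors)]


def sign_sgd(x0, target, lr=0.01, steps=2000, workers=5, noise_std=1.0, seed=0):
    rng = random.Random(seed)  # seeded so the run is repeatable
    x = list(x0)
    for _ in range(steps):
        votes = [
            [sign(g) for g in noisy_gradient(x, target, rng, noise_std)]
            for _ in range(workers)
        ]
        direction = majority_vote(votes)
        # Every coordinate moves by exactly lr, whatever the gradient magnitude.
        x = [xi - lr * d for xi, d in zip(x, direction)]
    return x


target = [1.0, 2.0, -1.0]
solution = sign_sgd([5.0, -3.0, 0.5], target)
```

Only the sign of each coordinate travels between workers and the aggregator, which is where the communication saving comes from; the fixed step size means the iterate hovers near the minimizer rather than settling exactly on it.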

Generative Models

Exciting collaboration between ML and neuroscience with Doris Tsao at Caltech. Adding a generative feedback model to convolutional neural networks significantly improves robustness in tasks such as unsupervised denoising. Short paper here.


Robust Learning in Control Systems

Neural Lander

First work to successfully demonstrate use of deep learning to land drones with stability guarantees. Collaboration under CAST at Caltech. Paper at ICRA 2019.

Robust regression for safe exploration

We address the following: How to extrapolate robustly from your training data in real-world control tasks and achieve end-to-end stability guarantees in safe exploration? Paper.

Multi-modal learning for UAV navigation

Multi-modal fusion of vision and IMU improves robustness in navigation and landing. Paper.

Generalization in ML

Detecting hard examples through an angular measure


Watch my GTC-DC talk. Angular alignment is a robust measure for hardness: easier examples align more with the target class. We found that the correspondence between the angular measure and human selection frequency was statistically significant. It improves self-training in domain adaptation. Paper.

Regularized learning for domain adaptation

New domain adaptation algorithm to correct for label shifts. Paper
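To give a feel for what correcting a label shift involves, here is a minimal sketch of the generic confusion-matrix approach to estimating class importance weights, for two classes. It is an unregularized textbook-style estimator with made-up numbers, not the algorithm from the paper; the function name and all values are hypothetical.

```python
def estimate_label_shift_weights(joint_confusion, target_pred_dist):
    """Solve C w = q for two classes.

    joint_confusion[i][j] is P(predicted = i, true = j) measured on labeled
    source data, and target_pred_dist[i] is P(predicted = i) measured on
    unlabeled target data. The solution w[j] estimates p_target(j) / p_source(j).
    """
    (a, b), (c, d) = joint_confusion
    q0, q1 = target_pred_dist
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("confusion matrix is singular; weights are not identifiable")
    # Cramer's rule for the 2x2 linear system.
    w0 = (q0 * d - b * q1) / det
    w1 = (a * q1 - c * q0) / det
    return [w0, w1]


# Illustrative numbers: the source is balanced (0.5 / 0.5), the classifier is
# right 90% of the time on class 0 and 80% of the time on class 1.
joint_confusion = [[0.45, 0.10],
                   [0.05, 0.40]]
# Predicted-label frequencies on a target set whose true class mix is 0.2 / 0.8.
target_pred_dist = [0.34, 0.66]

weights = estimate_label_shift_weights(joint_confusion, target_pred_dist)
```

The recovered weights can then be used to reweight the source training loss per class. With few samples the linear system becomes noisy, which is the kind of situation where a regularized estimator is useful.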

Our ability to fix bias in deep-learning algorithms

Twitter thread here


Neural Programming

Recursive neural networks with external memory

Recursive networks have compositionality and can extrapolate to unseen instances.

Extrapolation to harder instances (higher tree depth) is challenging.

We show that augmenting with external memory stacks significantly improves extrapolation. Paper

Open Vocabulary Learning on Source Code with a Graph–Structured Cache

Use of syntax trees in program code to handle unbounded vocabulary. Paper

Reinforcement Learning and Bandits

Robust off-policy evaluation

Robust methods to handle covariate shift in off-policy evaluation. Paper

Stochastic Linear Bandits with Hidden Low Rank Structure

Low regret algorithms for discovering hidden low rank structures in bandits. Paper

Competitive Differentiation for Lagrangian Problems

Many RL problems have constraints leading to a Lagrangian formulation. Paper

Context-based Meta RL with Structured Latent Space

AI4science and Other Applications

Neural ODEs for Turbulence Forecasting

Turbulence modeling is a notoriously hard problem. Exciting initial results presented at NeurIPS workshop.

Tackling Trolling through Dynamic Keyword Selection Methods

Can we find social media trolls and create a more trustworthy environment online? We presented our ongoing study of the #meToo movement, in collaboration with Michael Alvarez from social sciences at Caltech.

Thank you all!

The Thanksgiving weekend has been a much needed staycation. Given how much I travel the rest of the year, I feel lucky to avoid all the weather-related chaos this weekend. I even got to see the rare views of snow-capped mountains juxtaposed with palm trees as I worked from my office at Caltech.


I got the time to reflect on so many things that I am thankful for. I am lucky to be part of two amazing communities at Caltech and NVIDIA. This year I have deepened my professional relationships at both places, made many friends, and found wonderful supportive mentors. I am thankful for amazing colleagues from whom I have learnt so much. I have an amazing team of researchers at both places. Their curiosity and passion continue to inspire me.

I am thankful for all the support and encouragement that I have received when I have spoken about the need for better diversity and inclusion in our tech communities. Having allies and raising awareness has been incredibly fulfilling. I am also especially thankful to those who had the humility and honesty to tell me that they were wrong or unaware and that I had changed their mind. I hope you can pass it along. Together we can build a healthy community where everyone can thrive.

I am so thankful to my family. They are my rock and have been so supportive through everything I went through this year. I am thankful for having a new addition to my family: my sister-in-law Prashanthi, who displays wisdom and maturity beyond her years.

I should add that it hasn’t always been easy for me to show appreciation and communicate how much I am thankful to everyone. I strive to make things better in all walks of my life, both personal and professional. This means my mind is attuned to finding gaps, calling them out, and trying to fix things. I believe that it is possible to strike a balance: being thankful for the present while striving for a better tomorrow.

Have a great Thanksgiving weekend!

Call for accountability of deployed AI services

Edit: I want to clarify that during my tenure at AWS, I did not work on face recognition, nor was I involved in any of the decisions to sell it to law enforcement. I added this after a journalist asked me about it.

About two weeks ago, Timnit Gebru and Margaret (Meg) Mitchell approached me with a request to sign a letter outlining scientific arguments to counter the claims made by Amazon representatives regarding their face recognition service and calling on them to stop selling it to law enforcement.

I am one of 26 signatories. This includes many veteran leaders in the community including Yoshua Bengio, one of the Turing award winners this year. So I am in good company 😉 In addition, there are numerous other groups which have called for Amazon to stop selling it to the police.

Joy Buolamwini and Inioluwa Deborah Raji have done amazingly in-depth research on this topic and you can check it out at the gendershades website. So all the credit goes to them for laying this strong foundation.

When I read the letter I was happy to see careful factual arguments being made that are grounded in science. My hope is that the letter opens up a public dialogue on how we can evaluate face recognition (and other AI services), both in terms of metrics, but also the social context in which it is being deployed.

I am a former member of the AWS AI group and I want to clarify that I have the utmost admiration for how AWS has transformed the developer ecosystem. AWS services have removed a lot of “heavy lifting” in DevOps and democratized software development. I am hoping that this letter leads to productive dialogue and we can collectively work towards enhancing the beneficial uses of AI.

Govt. regulation can only come about once we have laid out technical frameworks to evaluate these systems. The gendershades paper shows how our current evaluation metrics are broken, and it starts with imbalanced training data. So we need a variety of different ways to evaluate the system and we need accountability from currently deployed AI services. In short, regulation is only part of the answer but is badly needed.

Update: AWS released a FAQ outlining guidelines of how face recognition should be used. Unfortunately, this does not solve anything. https://aws.amazon.com/rekognition/the-facts-on-facial-recognition-with-artificial-intelligence/

Reproducibility Debate: Should Code Release be Compulsory for Conference Publications?

Update: Added discussions in the end based on Twitter conversations.

Yesterday, I was on the debate team at the DALI conference in gorgeous George in South Africa. The topic was:

“DALI believes it is justified for industry researchers not to release code for reproducibility because of proprietary code and dependencies.”

I was opposing the motion, and this matched my personal beliefs. I am happy to talk about my own stance but I cannot disclose the arguments of others, since it was off the record (and their arguments were not necessarily their own personal opinions).

Edit: Uri Shalit and I formed the team opposing the motion. I checked with him to see if he is fine with me mentioning it. We collaboratively came up with the points below.

This topic is timely since ICML 2019 has added reproducibility as one of the factors to be considered by the reviewers. When it first came up, it seemed natural to set standards for reproducibility: the same way we set standards for a publication at our top-tier conferences. However, I was disheartened to see vocal opposition, especially from many “big-name” industry researchers. So with that background, DALI decided to focus the reproducibility debate on industry researchers.

My main reasons for opposing the motion:

Pseudo-code is just not enough: Anyone who has tried to implement an algorithm from another paper knows how terribly frustrating and time consuming it can be. With complex DL algorithms, every tiny detail matters: from hyperparameters to the randomness of the machine. It is another matter that this brittleness of DL is a huge cause of concern. See excellent talk by Joelle Pineau on reproducibility issues in reinforcement learning. In the current peer-review environment, it is nearly impossible to get a paper accepted unless all comparisons are made. I have personally had papers rejected even after we clearly stated that we could not reproduce the results of another paper.
Unfair to academic researchers: The cards are already stacked against academic researchers: they do not have access to vast compute and engineering resources. This is exacerbated by the lack of reproducibility. It is grossly unfair to expect a graduate student to reproduce the results of a 100-person engineering team. It is critical to keep academia competitive: we are training the next generation and much of basic research still happens only in academia.
Accountability and fostering healthy environment: As AI gets deployed in the real world, we need to be responsible and accountable. We would not allow new medical drugs into the market without careful trials. The same standards should apply to AI, especially in safety-critical applications. It first starts with setting rigorous standards for our research publications. Having accessible code allows the research community to extensively test the claims of the paper. Only then can it be called legitimate science.
No incentives for voluntary release of code: Jessica Forde gave me some depressing statistics: currently only one third of the papers voluntarily release code. Many argue that making it compulsory is Draconian. I will take Draconian any day if it ensures a fair environment that promotes honest progress. There is also the broader issue that the current review system is broken: fair credit assignment is not ensured and false hype is unfairly rewarded. I am proud of how the AI field, industry in particular, has embraced the culture of open sourcing. This is arguably the single most important factor for rapid progress. There is incentive for industries to open source since it allows them to capture a user base. These incentives have a smaller effect on release of individual papers. It is therefore necessary to enforce standards.
To increase synergistic impacts of the field: Counter-intuitively, code release will move the field away from leaderboard chasing. When code is readily available, barriers to entry for incremental research are lowered. Researchers are incentivized to do “deeper” investigation of the algorithms. Without this, we are surely headed for the next AI winter.
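On the point that every detail matters, from hyperparameters to randomness, here is a minimal sketch of the kind of bookkeeping that makes a run repeatable: an explicitly seeded random generator plus a manifest that records the configuration and a hash of the outcome. The toy experiment, the field names, and the manifest layout are all hypothetical.

```python
import hashlib
import json
import random


def run_experiment(config):
    """Toy stand-in for a training run: everything random flows from config['seed']."""
    rng = random.Random(config["seed"])  # private generator, no hidden global state
    samples = [rng.gauss(0.0, 1.0) for _ in range(config["num_samples"])]
    return sum(samples) / len(samples)


def make_manifest(config, result):
    """Record what was run and what came out, with a hash that is easy to compare."""
    payload = json.dumps({"config": config, "result": result}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"config": config, "result": result, "sha256": digest}


config = {"seed": 1234, "num_samples": 1000}
first_run = make_manifest(config, run_experiment(config))
second_run = make_manifest(config, run_experiment(config))

other_config = {"seed": 4321, "num_samples": 1000}
other_run = make_manifest(other_config, run_experiment(other_config))
```

Two runs with the same configuration produce the same hash, while changing only the seed changes the result. Real deep learning pipelines have many more sources of nondeterminism (library versions, hardware, parallelism), which is part of why released code matters.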

Countering the arguments that support the motion:

Cannot separate code from internal infrastructure: There exist (admittedly imperfect) solutions such as containerization. But this is a technical problem, and we are good at coming up with solutions for such well-defined problems.
Will drive away industry researchers and will slow down progress of AI: First of all, progress of AI is not just dependent on industry researchers. Let us not have an “us vs. them” mentality. We need both industry and academia to make AI progress. I am personally happy if we can drive away researchers who are not ready to provide evidence for their claims. This will create a much healthier environment and will speed up progress.
Reproducibility is not enough: Certainly! But it is a great first step. As next steps, we need to ensure usable and modular code. We need abstractions that allow for easy repurposing of parts of the code. These are great technical challenges: ones our community is very well equipped to tackle.

Update from Twitter conversations

There was enthusiastic participation on Twitter. A summary below:

Useful tools for reproducibility:

[Screenshots of tweets]

Lessons from other communities:

[Screenshots of tweets]

It is not just about code, but data, replication etc:

[Screenshots of tweets]

Disagreements:

[Screenshots of tweets]

I assume that the Tweet above does not represent the official position of DeepMind, but I am not surprised.

I do not agree with the premise that it is a worthwhile exercise for others to reinvent the wheel, only to find out it is just vaporware. It is unfair to academia and unfair to graduate students whose careers depend on this.

I also find it ironic that the comment states that if an algorithm is so brittle to hyperparameters we should not trust these results. YES! That is the majority of deep RL results that are hyped up (and we know who the main culprit is).

What happens behind the doors: Even though there is overwhelming public support, I know that such efforts get thwarted in committee meetings of popular conferences like ICML and NeurIPS. We need to apply more pressure to have better accountability.

It is time to burst the bubble on hyped up AI vaporware with no supporting evidence. Let the true science begin!

2018 in Review

This post reviews my experiences in 2018. I welcomed the year on the gorgeous beaches of Goa and am now ending it in the wilderness of South Africa. My highlights of 2018 are the following:

Joining NVIDIA: I joined NVIDIA in September and started a new research group on core AI/ML. I am hiring at full pace and have started many new projects. I am also excited about many new launches from NVIDIA over the last few months:

RAPIDS: Apache open-source multi-GPU ML library.
Clara: Platform for medical imaging.
PhysX: Open source 3D simulation framework.

Honor of being the youngest named chair professor at Caltech: I was one of the six faculty members that Caltech recognized during the 2017-18 academic year. This is the Institute’s most distinguished award for individual faculty.

Launching AWS Ground Truth: Before leaving AWS, I was working on the Ground Truth service, which was launched at the ReInvent conference in November. Data is a big barrier to adoption of AI. The availability of a private workforce, and not just the public crowd on Mturk, will be a game changer in many applications. My team did the prototyping and many research projects on active learning, crowdsourcing and building intelligence into the data collection process.

Exciting research directions:

Autonomous Systems: CAST at Caltech was launched in October 2017 to develop foundations for autonomy. This has been an exciting new area of research for me. We got a DARPA Physics of AI project funded that infuses physics into AI algorithms. The first paper to come out of this project has been the neural lander that uses neural networks to improve landing of drones while guaranteeing stability. Check out its videos here.
AI4Science at Caltech: Along with Yuxin Chen and Yisong Yue, I launched the AI4Science initiative at Caltech. The goal is to do truly integrated research that brings about new advances in many scientific domains. Some great use cases are high energy physics, earthquake detection, spinal cord therapy etc.
Core ML research: We have pushed for a holistic view of AI as data + algorithms + systems.
- Active learning and crowdsourcing for intelligent data gathering that significantly reduces data requirements.
- Neural rendering model combines generation and prediction in a single model for semi-supervised learning of images.
- SignSGD yields drastic gradient compression with almost no loss in accuracy.
- Symbols + Numbers: Instead of indulging in pointless Twitter debates over which is better, can we just unify both? We combine symbolic expressions and numerical data in a common framework for neural programming.
- Principled approaches in reinforcement learning: We develop efficient Bayesian DQN that improves exploration in high dimensions. We derive new trust-region policy optimization for partially observable models with guaranteed monotonic improvement. We show negative results for combining model-based and model-free RL frameworks.
- Domain adaptation: We derive generalization bounds when there are shifts in label distribution between source and target. This is applicable for AI cloud services where training distribution can have different proportions of categories from the serving distribution.
- Tensorly: The open-source framework that allows you to write tensor algorithms in Python and choose any of the backends: PyTorch, TensorFlow, NumPy or MxNet. It has many new features and is now part of the PyTorch ecosystem.

On academic job market: My graduating student Kamyar Azizzadenesheli has done ground-breaking work in reinforcement learning (some of which I outlined above). Hire him!

Having grandkids: academically speaking 😉 It is great to see my former student Furong Huang and my former postdoc Rose Yu thrive in their faculty careers.

Outreach and Democratization of AI: It has been very fulfilling to educate the public about AI around the world. I gave my first TEDx talk. I shared the stage with so many luminaries such as His Holiness the Dalai Lama. It was special to speak to a large crowd of Chinese women entrepreneurs at the Mulan event.

2018 NYTimes GoodTech award: for raising awareness about diversity and inclusion. 2018 has been a defining year for me and for many #womeninTech. A large part of my energy went into fighting vicious sexism in our research communities. It is impossible to distill this into a few sentences. I have had to fend off numerous pushbacks, trolls and threats. But the positive part has been truly uplifting: countless women have hugged me and said that I am speaking on their behalf. I have found numerous male allies who have pledged to fight sexism and racism.

I want to end the year in a positive light. I hope for a great 2019! I know it is not going to be easy, but I won’t give up. Stay strong and fight for what you truly believe in!

New beginnings @NVIDIA

I am very happy to share the news that I am joining NVIDIA as Director of Machine Learning Research. I will be based in the Santa Clara HQ and will be hiring ML researchers and engineers at all levels, along with graduate interns.

I will be continuing my role as Bren professor at Caltech and will be dividing my time between northern and southern California. I look forward to building strong intellectual relationships between NVIDIA and Caltech. There are many synergies with initiatives at Caltech such as the Center for Autonomous Systems (CAST) and AI4science.

I found NVIDIA to be a natural fit and it stood out among other opportunities. I chose NVIDIA because of its track record, its pivotal role in the deep-learning revolution, and the people I have interacted with. I will be reporting to Bill Dally, the chief scientist of NVIDIA. In addition to Bill, there is a rich history of academic researchers at NVIDIA such as Jan Kautz, Steve Keckler, Joel Emer, and recent hires Dieter Fox and Sanja Fidler. They have created a nourishing environment that blends research with strong engineering. I am looking forward to working with CEO Jensen Huang, whose vision for research I find inspiring.

The deep-learning revolution would not have happened without NVIDIA’s GPUs. The latest Volta GPUs pack an impressive 125 teraFLOPS and have fueled developments in diverse areas. The recently released NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator and NVIDIA GeForce represents the biggest leap in performance for graphics rendering since it is the world’s first real-time ray tracing GPU.

As many of you know, NVIDIA is much more than a hardware company. The development of CUDA libraries at NVIDIA has been a critical component for scaling up deep learning. The CUDA primitives are also relevant to my research on tensors. I worked with NVIDIA researcher Cris Cecka to build extended BLAS kernels for tensor contraction operations a few years ago. I look forward to building more support for tensor algebraic operations in CUDA which can lead to more efficient tensorized neural network architectures.

I admire recent ML research that has come out of NVIDIA. This includes state-of-the-art generative models for images and video, image denoising etc. The convergence of ML research with state-of-the-art hardware is happening at a rapid pace at NVIDIA. In addition, I am also thrilled about developments in design and visualization, self-driving, IoT/autonomous systems and data center solutions at NVIDIA.

I hope to continue building bridges between academia and industry, and between theory and practice in my new role.

AI4Science @Caltech

AI4science is a new initiative launched at Caltech that aims to broaden the impact of AI and ML across all areas of sciences. The inaugural workshop was held on Aug. 1st. My student Jeremy Bernstein wrote a detailed article on the workshop. The slides of the talks are also available there.

https://sites.google.com/view/ai-for-science-workshop/about-ai4science

A short blurb of the article: Across science—from astrophysics to molecular biology to economics—a common problem persists: scientists are overwhelmed by the sheer amount of data they are collecting. But this problem might be better viewed as an opportunity, since with appropriate computing resources and algorithmic tools, scientists might hope to unlock insights from these swathes of data to carry their field forward. AI4science is a new initiative at Caltech aiming to bring together computer scientists with experts in other disciplines. While somewhat of a suitcase term, AI or artificial intelligence here means the combination of machine learning algorithms with large compute resources.

Professor Yisong Yue of Caltech’s Computing & Mathematical Sciences department (CMS) gave the first talk, where he gave a general overview of machine learning algorithms and their relevance across science and engineering. Professor Andrew Stuart, also in CMS, gave the talk following Professor Yue. Stuart discussed his interest in fusing data science techniques with known physical law. Frederick Eberhardt, professor of philosophy, spoke next. He discussed his work on causal inference. Professor Anima Anandkumar of the CMS department was the last computer scientist to speak. Anandkumar gave an overview of a successful machine learning technique known as artificial neural networks, which have dramatically improved the ability of computers to understand images and natural language. Anandkumar also spoke about tensor methods in machine learning. The remainder of the day was devoted to talks from scientists who have had success applying machine learning techniques in their respective fields. For more, check out the AI4science website.

Adieu to AWS

I recently left my role as principal scientist at Amazon Web Services (AWS). In this blog post, I want to recollect the rich learning experiences I had and the amazing things we accomplished over the last two years.

This was my first “industry” job out of academia. I chose AWS for several reasons. I saw huge potential to democratize AI, since AWS is the most comprehensive and broadly adopted cloud platform. Two years ago, cloud AI was still an uncharted territory, and this made it an exciting adventure. I was also attracted by the mandate to bring applied AI research to AWS.

My early days were exciting and busy. We were a new team in the Bay Area with a startup-like environment, while still being connected with the Seattle team and the larger Amazon ecosystem. There was a steep learning curve to understand all the AWS services, software engineering practices, product management etc, and I loved it. We were busy growing the team, designing new AI services, and thinking about research directions, all at the same time.

I am proud of what we accomplished over the last two years. We launched a vast array of AI services at all levels of the stack. At the bottom are the compute instances — the latest GPU instances are the powerful NVIDIA Tesla V100s. The middle layer consists of SageMaker, a fully managed service with high-performance dockerized ML algorithms, and Deeplens, the first deep learning camera with seamless AWS integration. The top layer includes services for computer vision, natural language processing, speech recognition and so on. We made large strides in customer adoption and today AWS has the largest number of customer references for cloud ML services. In addition, the AWS ML lab provides advanced solutions for custom use cases. I got to interact with many customers and it was eye-opening to learn about real-world AI deployment in diverse domains.

I was most closely involved in the design, development and launch of SageMaker. Its broad adoption led to AWS increasing its ML user base by more than 250 percent over the last year. SageMaker removes heavy lifting, complexity, and guesswork from each step of productionizing ML. It was personally fulfilling to build topic modeling on SageMaker (and AWS comprehend) based on my academic research, which uses tensor decompositions. SageMaker topic-modeling automatically categorizes documents at scale and is several times faster than any other (open-source) framework. Check out my talk at ReInvent for more details. Taking the tensor algorithm from its theoretical roots to an AWS production service was a big highlight for me.

It was exciting to grow applied research at AWS. I looked for problems that posed the biggest obstacles in the real world. The “data problem” is the proverbial “elephant in the room”. While researchers work on established benchmark datasets, most of the time and effort in the real world is in data collection and clean up. We developed and tested efficient deep active learning, crowdsourcing and semi-supervised learning methods in a number of domains. We found that deep networks can be trained with significantly less data (~25%). For an overview, check out tutorial slides at UAI. I was also happy to connect my earlier research on tensors with deep learning to obtain a new class of deep networks that naturally encode the data dimensions and higher-order correlations. They are more compact and generalize better in many domains. Tensorly is a Keras-like frontend to easily use tensor algebraic operations in deep learning with any backend. Moreover, I realized that in practice, simple methods work even if theory cannot explain it. We tried to close this gap by looking for conditions under which simple methods provably succeed, and then experimentally verifying these conditions. For instance, we showed that 1-bit gradient quantization has almost no accuracy loss but has reduced communication requirements for distributed ML, both in theory and in practice. All these projects were executed with an excellent cohort of interns and AWS scientists.
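The active learning idea mentioned above can be illustrated with a minimal sketch of uncertainty sampling, one common strategy: ask annotators to label the examples the current model is least sure about. This is a generic illustration, not the method used in the projects described here; the pool of predicted probabilities and the function names are made up.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher means less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(pool_probs, budget):
    """Return the indices of the `budget` most uncertain unlabeled examples."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda index: entropy(pool_probs[index]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model predictions on four unlabeled examples (two classes).
pool_probs = [
    [0.90, 0.10],
    [0.50, 0.50],
    [0.60, 0.40],
    [0.99, 0.01],
]
chosen = select_for_labeling(pool_probs, budget=2)
```

In a full loop, the chosen examples would be sent for labeling, added to the training set, and the model retrained before the next round of selection.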

Being at AWS gave me a platform for community outreach to democratize AI. I worked to build partnerships with universities and non-profit organizations. The Caltech-Amazon partnership funded graduate fellowships and cloud credits, which are transforming fundamental scientific research at Caltech. This partnership also resulted in a new AWS office in Pasadena with Stefano Soatto and Pietro Perona at the helm.

I am happy that I got to represent AWS at many prominent venues, including Deep Learning Indaba 2017, the first pan-African deep learning summit; the Mulan forum for Chinese women entrepreneurs; the Geekpark forum for startups in China; and Shaastra 2018 at IIT Madras, a student-run techfest, where we held the largest deep learning workshop in India.

I had the privilege to work with and learn from so many amazing individuals. It was enlightening to hear about the early days of AWS from veteran AWS engineer and team VP Swami Sivasubramanian. I tried to develop new skills from the very best: product management from Joseph Spisak, team management from Craig Wiley, software engineering from Leo Dirac, clear exposition from Zachary Lipton, ML practice from Sunil Mallya, to name a few. Attending MARS and interacting with Jeff Bezos and many other superstars was a big highlight.

I learnt good management principles and business practices at Amazon. The Leadership Principles are a succinct list of desirable leadership qualities. But some principles are at odds with others, which meant there was a need for balance. For instance, the “dive deep” principle requires time and effort, while “bias for action” calls for expediency. I also found that working backwards from customer needs and having “two pizza” teams resulted in focused discussions with great outcomes. Another effective strategy I learnt was the Press Release (PR)-FAQ. Presentations are banned at Amazon, and in order to pitch any new idea one had to start by writing its press release. This entailed having clarity on the product goals and its target customers, right from the beginning. I could see the effectiveness of all these principles and their role in making Amazon the huge success it is today.

To summarize, I am very thankful for the learning experience I had at AWS. In the next post, I will talk about my upcoming plans. Stay tuned!

Below is a slideshow of some of my favorite memories.