2020 has been an exciting time for DL frameworks and the AI stack. We have seen more consolidation of frameworks into domain-specific platforms such as NVIDIA Omniverse and NVIDIA Clara. We have also seen better abstractions in the AI stack, such as PyTorch Lightning, that help democratize AI and enable rapid prototyping and testing.
Below are some frameworks that my team at NVIDIA has been involved in building.
TensorLy-Torch is a PyTorch-only library that builds on top of TensorLy and provides out-of-the-box tensor layers to replace matrix layers in any neural network. Link
Tensorize all layers of a neural network: this includes factorized convolutions, fully-connected layers, and more!
Initialization: initializing tensor decompositions can be tricky, since the default initialization used for matrix layers is not optimal for them. We provide good defaults in our tltorch.init module. Alternatively, you can initialize the factors to fit a pretrained matrix layer.
Tensor hooks: you can easily augment your architectures with our built-in hooks. Robustify your network with Tensor Dropout, and automatically select the rank end-to-end with L1 regularization.
Methods and model zoo: we are always adding more methods and models to make it easy to compare the performance of various deep tensor-based methods!
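To make the idea of replacing a matrix layer with a tensor layer concrete, here is a minimal sketch of the simplest case: a fully-connected layer whose weight matrix is stored as a rank-R factorization W = U V^T (a CP decomposition of a 2-way tensor). It is plain Python for illustration, not the TensorLy-Torch API; the class name and all numbers are hypothetical.

```python
# Hypothetical sketch of a rank-R factorized fully-connected layer (not the TensorLy-Torch API).

def matvec(M, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

class LowRankLinear:
    """y = U (V^T x): stores U (out x R) and V (in x R) instead of a full out x in matrix."""

    def __init__(self, U, V):
        assert len(U[0]) == len(V[0]), "U and V must share the rank dimension"
        self.U, self.V = U, V

    def num_params(self):
        return len(self.U) * len(self.U[0]) + len(self.V) * len(self.V[0])

    def dense_weight(self):
        # Reconstruct W = U V^T only for checking; the forward pass never builds it.
        R = len(self.U[0])
        return [[sum(u[r] * v[r] for r in range(R)) for v in self.V] for u in self.U]

    def __call__(self, x):
        # Project the input onto the rank-R core, then expand to the output dimension.
        R = len(self.V[0])
        z = [sum(self.V[j][r] * x[j] for j in range(len(x))) for r in range(R)]
        return matvec(self.U, z)

U = [[1.0, 0.5], [0.0, 2.0], [-1.0, 1.0], [0.5, 0.0]]                             # 4 x 2
V = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]]     # 6 x 2
layer = LowRankLinear(U, V)
x = [1.0, 2.0, 0.0, -1.0, 0.5, 1.0]
y = layer(x)
```

The factorized layer stores 20 numbers instead of the 24 of the dense 4 x 6 matrix; in general it needs R(m + n) parameters instead of mn, so the savings grow for large layers with small R.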
Minkowski Engine is an auto-differentiation library for sparse tensors. It supports all standard neural network layers, such as convolution, pooling, and broadcasting operations, for sparse tensors. Popular applications include 3D and higher-dimensional vision problems such as semantic segmentation, reconstruction, and detection. Link
Unlimited high-dimensional sparse tensor support
All standard neural network layers (Convolution, Pooling, Broadcast, etc.)
Dynamic computation graph
Custom kernel shapes
Multi-threaded kernel map
Highly-optimized GPU kernels
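The kernel map mentioned above is the core data structure of a sparse convolution: for each kernel offset, the list of (input, output) pairs of occupied coordinates that the offset connects. The sketch below is a minimal pure-Python version, assuming scalar features and outputs placed at the input coordinates; the function names are ours, not MinkowskiEngine's API, and real implementations build and apply the map with optimized multi-threaded and GPU kernels.

```python
from itertools import product

def kernel_offsets(dim, radius=1):
    """All integer offsets in a (2*radius + 1)^dim cube, e.g. 9 offsets for a 3x3 kernel in 2-D."""
    return list(product(range(-radius, radius + 1), repeat=dim))

def build_kernel_map(coords, offsets):
    """For each kernel offset k, list the (input_index, output_index) pairs with
    input_coord = output_coord + offsets[k]. Outputs sit at the input coordinates."""
    index = {c: i for i, c in enumerate(coords)}  # hash map: coordinate -> row index
    kernel_map = {}
    for k, off in enumerate(offsets):
        pairs = []
        for out_i, c in enumerate(coords):
            in_i = index.get(tuple(ci + oi for ci, oi in zip(c, off)))
            if in_i is not None:
                pairs.append((in_i, out_i))
        kernel_map[k] = pairs
    return kernel_map

def sparse_conv(coords, feats, weights, radius=1):
    """Scalar-feature sparse convolution: out[o] = sum over k and pairs (i, o) of weights[k] * feats[i].
    Work scales with (number of occupied points) x (kernel volume), not with the grid size."""
    offsets = kernel_offsets(len(coords[0]), radius)
    kernel_map = build_kernel_map(coords, offsets)
    out = [0.0] * len(coords)
    for k, pairs in kernel_map.items():
        for in_i, out_i in pairs:
            out[out_i] += weights[k] * feats[in_i]
    return out

# Three occupied pixels in an otherwise empty 2-D grid.
coords = [(0, 0), (0, 1), (5, 5)]
feats = [1.0, 2.0, 3.0]
box = sparse_conv(coords, feats, [1.0] * 9)   # sum over each occupied 3x3 neighborhood
center = [0.0] * 9
center[4] = 1.0                               # offsets[4] == (0, 0)
identity = sparse_conv(coords, feats, center)
```

The isolated point at (5, 5) only sees itself, while the two adjacent points see each other; empty grid cells are never stored or visited.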
End-to-end Reinforcement Learning on GPUs with NVIDIA Isaac Gym
We are excited about the preview release of Isaac Gym, NVIDIA’s physics simulation environment for reinforcement learning research, which dramatically speeds up training. These environments are physically valid, allowing efficient sim-to-real transfer. They include a robotic arm, legged robots, deformable objects, and humanoids. Blog
Stay tuned for more in 2021! Here’s looking forward to exciting developments in AI in the new year.
Embodied AI is the union of “mind” (AI) and “body” (robotics). To achieve this, we need robust learning methods that can be embedded into control systems with safety and stability guarantees. Many of our recent works are advancing these goals on both theoretical and practical fronts.
My journey into this area of learning and control started with the neural lander. We used deep learning to learn the aerodynamic ground effects in drones. This led to improved landing speed without sacrificing stability requirements. In a subsequent work, we aimed to automate the collection of drone data while staying safe.
We employed robust regression methods with guaranteed uncertainty bounds that guarantee safety even outside the training domain. This allows the drone to progressively land faster while maintaining safety (i.e. not crashing). Our method trains a density-ratio estimator that accurately predicts the ability to maintain safety at higher speeds. This is based on the principle of adversarial risk minimization, which has also shown gains in sim-to-real generalization in computer vision (post 2).
We evaluated our method on a simulator built with data collected from real drones. Our method outperforms the popular Gaussian process (GP) method for uncertainty quantification and leads to faster exploration while maintaining safety. This is because GPs are brittle in high dimensions, where kernels and priors are difficult to choose well.
The ability to explore safely can now be combined with downstream trajectory-planning methods in control. We propagate the uncertainty bounds from robust regression and pose them as chance constraints for the planner. Thus, we can compute a pool of safe and information-rich trajectories.
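A chance constraint requires a safety condition to hold with high probability rather than surely. Under the simplifying assumption that each step's predicted quantity is Gaussian with mean m and standard deviation s, the constraint P(value <= bound) >= 1 - delta becomes the deterministic check m + z_(1-delta) * s <= bound. The sketch below applies that check per time step to filter a pool of candidate trajectories. The Gaussian form, the per-step (rather than joint) treatment, the numbers, and the function names are our assumptions; the robust-regression bounds used in the actual work may take a different form.

```python
from statistics import NormalDist

def chance_constraint_ok(mean, std, bound, delta):
    """True if P(value <= bound) >= 1 - delta when value ~ N(mean, std^2).
    Deterministic equivalent: mean + z_{1-delta} * std <= bound."""
    z = NormalDist().inv_cdf(1.0 - delta)
    return mean + z * std <= bound

def filter_safe(candidates, bound, delta):
    """Keep candidate trajectories whose every step satisfies the tightened constraint.
    Each trajectory is a list of (predicted_mean, predicted_std) pairs, one per time step."""
    return [traj for traj in candidates
            if all(chance_constraint_ok(m, s, bound, delta) for m, s in traj)]

# Hypothetical predictions from an uncertainty-aware model for two candidate trajectories.
traj_a = [(0.2, 0.1), (0.5, 0.2)]
traj_b = [(0.2, 0.1), (0.8, 0.2)]
safe = filter_safe([traj_a, traj_b], bound=1.0, delta=0.05)
```

The second trajectory is rejected because 0.8 + 1.645 x 0.2 is about 1.13, which exceeds the bound of 1.0. Among the surviving trajectories, a planner could then prefer the most informative ones.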
We applied the episodic learning framework to a robotic spacecraft model to explore the state space and learn the friction under collision constraints. Using robust regression models, we show a significant reduction in the variance of the learned model’s predictions and in the number of collisions.
Reinforcement learning in control systems
Analyzing RL in control systems is challenging for three reasons: (1) the state and action spaces are continuous, (2) there are safety and stability requirements, and (3) the system is only partially observable.
A canonical setting is the linear quadratic Gaussian (LQG) problem, which has linear dynamics, a quadratic cost, and observations obtained by a linear transformation of the hidden state plus Gaussian noise. LQG appears deceptively simple but is notoriously challenging to analyze.
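For reference, a common way to write the LQG model is shown below; the notation is ours and may differ from the papers. Here x_t is the hidden state, u_t the action, and y_t the observation, and A, B, C are unknown system matrices. The agent sees only y_t and chooses u_t to keep the average cost low, where Q is a positive semidefinite and R a positive definite cost matrix.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + B\,u_t + w_t, & w_t &\sim \mathcal{N}(0, W),\\
y_t &= C\,x_t + z_t, & z_t &\sim \mathcal{N}(0, Z),\\
c_t &= y_t^\top Q\,y_t + u_t^\top R\,u_t .
\end{aligned}
```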
Previous methods focused on open-loop control, which uses random excitation (i.e. random actions) to collect measurements for model estimation. However, this yields a regret of T^(2/3), where T is the number of time steps, which is not optimal. Paper
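To see why open-loop excitation makes estimation easy, consider a fully observed scalar toy system x_(t+1) = a x_t + b u_t + w_t, which is much simpler than LQG because it has no hidden state. Because the inputs u_t are drawn independently of everything else, ordinary least squares on (x_t, u_t) recovers a and b. The system parameters, noise level, and horizon below are hypothetical.

```python
import random

def simulate_open_loop(a, b, steps, noise_std=0.1, seed=0):
    """Scalar system x_{t+1} = a x_t + b u_t + w_t driven by random excitation u_t ~ N(0, 1)."""
    rng = random.Random(seed)
    xs, us = [0.0], []
    for _ in range(steps):
        u = rng.gauss(0.0, 1.0)
        xs.append(a * xs[-1] + b * u + rng.gauss(0.0, noise_std))
        us.append(u)
    return xs, us

def least_squares_ab(xs, us):
    """Estimate (a, b) by solving the 2x2 normal equations of x_{t+1} ~ a x_t + b u_t."""
    sxx = sum(x * x for x in xs[:-1])
    suu = sum(u * u for u in us)
    sxu = sum(x * u for x, u in zip(xs[:-1], us))
    sxy = sum(x * y for x, y in zip(xs[:-1], xs[1:]))
    suy = sum(u * y for u, y in zip(us, xs[1:]))
    det = sxx * suu - sxu * sxu
    a_hat = (suu * sxy - sxu * suy) / det   # Cramer's rule for the 2x2 system
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat

xs, us = simulate_open_loop(a=0.8, b=0.5, steps=5000)
a_hat, b_hat = least_squares_ab(xs, us)
```

In closed loop, u_t is computed from past observations, so the regressors become correlated with the noise history and this simple recipe no longer applies directly; that is the difficulty closed-loop methods must address.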
Our method is the first closed-loop RL method with guaranteed regret bounds. In closed-loop control, the past measurements are all correlated with the control actions, which makes it challenging to estimate the model parameters. We use tools from classical control theory (the predictor form) to guarantee consistent estimation of the model parameters. This yields an improved regret bound of T^0.5. Paper
Surprisingly, we can do even better in terms of the regret bound. We showed that combining online learning with episodic updates leads to logarithmic regret. Intuitively, we decouple the adaptive learning of model parameters (episodic updates) from the online learning of the control policy. This combination allows us to achieve fast learning (with low regret) in closed-loop control. Paper
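To make the three rates concrete, regret is usually defined as the cumulative excess cost over the optimal controller's average cost J_star (our notation; the papers' exact definitions, constants, and logarithmic factors may differ, and the logarithmic rate is written here as polylog T). For T = 10^6 time steps, T^(2/3) = 10^4 and T^(1/2) = 10^3, while ln T is about 13.8, ignoring constants.

```latex
\mathrm{Regret}(T) = \sum_{t=1}^{T} \bigl( c_t - J_\star \bigr), \qquad
\underbrace{\tilde{\mathcal{O}}\bigl(T^{2/3}\bigr)}_{\text{open loop}}
\;\to\;
\underbrace{\tilde{\mathcal{O}}\bigl(T^{1/2}\bigr)}_{\text{closed loop}}
\;\to\;
\underbrace{\mathcal{O}\bigl(\mathrm{polylog}\,T\bigr)}_{\text{episodic + online}}
```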
Generative models have advanced greatly over the last few years. We can now generate images and text that pass the Turing test: at first glance, they look remarkably realistic.
A major unsolved challenge is the ability to control the generative process. We would like to specify attributes or style codes for image generation, and to shape the narrative of text generation. We have made progress on both goals and describe them in this post.
Large pretrained language models like GPT can generate long paragraphs of text. However, these models lack controllability and make mistakes such as common-sense errors, repetition, and inconsistency. In a recent paper, we add the ability to dynamically control text generation using keywords, and our model also incorporates an external knowledge base. Our framework consists of a keyword generator, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. Results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity. Paper, Slides
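The data flow through the four components can be sketched as below. Every component is a deliberately trivial, hypothetical stand-in (longest words as keywords, substring retrieval, word-overlap ranking, and string assembly instead of a language model); the point is only how keywords steer retrieval and how the ranked knowledge conditions generation.

```python
def tokens(text):
    """Lowercased words with trailing punctuation removed."""
    return [w.strip(".,").lower() for w in text.split()]

def keyword_generator(context, k=2):
    """Hypothetical stand-in: the k longest distinct words (ties broken alphabetically)."""
    return sorted(set(tokens(context)), key=lambda w: (-len(w), w))[:k]

def knowledge_retriever(keywords, knowledge_base):
    """Return knowledge sentences that mention at least one keyword."""
    return [fact for fact in knowledge_base if any(kw in fact.lower() for kw in keywords)]

def contextual_ranker(facts, context, top_n=1):
    """Rank retrieved facts by word overlap with the context (a learned ranker in practice)."""
    ctx = set(tokens(context))
    return sorted(facts, key=lambda f: len(ctx & set(tokens(f))), reverse=True)[:top_n]

def conditional_generator(context, keywords, facts):
    """Stand-in for a conditional language model: here it only assembles its conditioning input."""
    return f"{context} [keywords: {', '.join(keywords)}] [knowledge: {' '.join(facts)}]"

def controllable_generate(context, knowledge_base):
    keywords = keyword_generator(context)
    facts = contextual_ranker(knowledge_retriever(keywords, knowledge_base), context)
    return conditional_generator(context, keywords, facts)

context = "The knight rode toward the castle."
knowledge_base = ["A castle is a fortified building.", "Dragons breathe fire.",
                  "A knight is a mounted soldier."]
out = controllable_generate(context, knowledge_base)
```

Changing the keywords changes which facts are retrieved and therefore what the generator is conditioned on, which is the control knob the paragraph describes.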
Disentanglement Learning in StyleGAN
Disentanglement learning is crucial for obtaining disentangled representations and controllable generation. Current disentanglement methods face several inherent limitations: difficulty with high-resolution images, a primary focus on learning disentangled representations rather than on controllable generation, and non-identifiability due to the unsupervised setting. To alleviate these limitations, we design new architectures and loss functions based on StyleGAN (Karras et al., 2019) for semi-supervised high-resolution disentanglement learning.
We create two complex high-resolution synthetic datasets for systematic testing. We investigate the impact of limited supervision and find that labeling only 0.25% to 2.5% of the data is sufficient for good disentanglement on both synthetic and real datasets.
We propose new metrics to quantify generator controllability and observe that there may be a crucial trade-off between disentangled representation learning and controllable generation. We also consider semantic fine-grained image editing to achieve better generalization to unseen images. Project page
The Fourier neural operator solves complex PDEs such as turbulent fluid flows, and Orbnet performs quantum chemistry calculations; both show 1000x speedups over traditional solvers while maintaining fidelity.
One of the exciting breakthroughs of 2020 is the Fourier neural operator. Neural operators learn the solution operator: a mapping between infinite-dimensional function spaces from the problem specification (e.g. initial and boundary conditions of a PDE) to the solution. As a result, the model does not depend on the resolution or the grid of sample points. This allows the neural operator to perform zero-shot super-resolution, i.e. to be evaluated at a higher resolution and at arbitrary points compared with the training data. None of the previous approaches using deep learning to solve PDEs have this capability.
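The resolution independence comes from parameterizing the layer in Fourier space. The sketch below is a single-channel Fourier layer in plain Python: it keeps the lowest few Fourier modes of the input, multiplies each by a complex weight, and transforms back. A full Fourier neural operator also uses many channels, a pointwise linear term, nonlinearities, and fast FFTs; those are omitted here, and the weights are hypothetical rather than learned.

```python
import cmath
import math

def spectral_conv_1d(u, weights):
    """Single-channel Fourier layer on grid values u sampled at x_m = m / n on [0, 1).
    Keeps modes k = 0..K-1 (K = len(weights)), scales mode k by weights[k], and returns
    real grid values. Requires n > 2 * (K - 1) so the retained modes are resolvable."""
    n, K = len(u), len(weights)
    if n <= 2 * (K - 1):
        raise ValueError("grid too coarse for the number of retained modes")
    # Normalized DFT coefficients of the retained modes.
    coeffs = [sum(u[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)) / n
              for k in range(K)]
    out = []
    for m in range(n):
        # Real input: mode 0 counted once, modes k >= 1 twice (conjugate symmetry).
        v = (weights[0] * coeffs[0]).real
        for k in range(1, K):
            v += 2.0 * (weights[k] * coeffs[k] * cmath.exp(2j * math.pi * k * m / n)).real
        out.append(v)
    return out

def sample(n):
    """A band-limited test input (Fourier modes 1 and 3) on an n-point grid."""
    return [math.sin(2 * math.pi * m / n) + 0.5 * math.cos(6 * math.pi * m / n) for m in range(n)]

weights = [1.0, 0.5 + 0.5j, 1.0, 2.0]          # hypothetical weights for modes 0..3
coarse = spectral_conv_1d(sample(16), weights)
fine = spectral_conv_1d(sample(32), weights)   # same layer, twice the resolution
```

Because the weights live on Fourier modes rather than grid points, the same layer runs unchanged on a 16-point or a 32-point grid, and for this band-limited input the two outputs agree at the shared grid points.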
We show that our method can solve the Navier-Stokes equations in the turbulent regime, the first such result for a deep learning system. Blog
In a related paper, we proposed an alternative framework for solving large-scale fluid flow problems. MeshfreeFlowNet performs physically valid super-resolution of fluid dynamics at scale (experiments were run on the Cori cluster with 128 V100 GPUs). Project page
Quantum chemistry is the study of chemical properties and processes at the quantum scale. It has been pivotal for research and discovery in modern chemistry. However, as powerful as quantum chemistry has shown itself to be, it also has a big drawback: accurate calculations are resource-intensive and time-consuming, with routine chemical studies involving computations that take days or longer.
We developed Orbnet, a deep-learning-based calculator of quantum properties that preserves the fidelity of traditional solvers while obtaining 1000x speedups. Orbnet combines domain-specific knowledge (molecular orbitals) with the flexibility of deep learning (graph neural networks). This hybrid model transfers to molecules more than 10x larger than those used for training Orbnet. We also show that Orbnet provides powerful representations of molecules that can be used directly for predicting their properties. News article
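Transfer to larger molecules is natural for graph neural networks because the same learned functions are applied at every node and edge, whatever the graph size. The sketch below shows generic scalar message passing with a sum readout. It is not Orbnet's architecture: the node features, edge weights, and parameters a and b are hypothetical, whereas in Orbnet the inputs are derived from molecular orbitals.

```python
import math

def message_passing_readout(node_feats, edges, a=0.8, b=0.3, rounds=2):
    """Generic scalar message passing with shared parameters a and b (hypothetical values).
    edges maps (i, j) -> edge weight and is treated as undirected. The same a and b are used
    at every node, so the function accepts graphs of any size."""
    neighbors = {i: [] for i in range(len(node_feats))}
    for (i, j), weight in edges.items():
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))
    h = list(node_feats)
    for _ in range(rounds):
        # Each node combines its own state with a weighted sum of its neighbors' states.
        h = [math.tanh(a * h[i] + b * sum(wt * h[j] for j, wt in neighbors[i]))
             for i in range(len(h))]
    return sum(h)  # sum readout: invariant to the node ordering

small = message_passing_readout([1.0, 0.5, -0.2], {(0, 1): 1.0, (1, 2): 0.5})
# The same graph with nodes renumbered 0 -> 1, 1 -> 2, 2 -> 0.
relabeled = message_passing_readout([-0.2, 1.0, 0.5], {(1, 2): 1.0, (2, 0): 0.5})
# A 10x larger chain graph runs through the same function with the same parameters.
large = message_passing_readout([0.1] * 30, {(i, i + 1): 1.0 for i in range(29)})
```

The readout does not depend on how the atoms are numbered, and nothing in the code depends on the number of nodes.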
In this post, I will focus on the new optimization methods we proposed in 2020. Simple gradient-based methods such as SGD and Adam remain the “workhorses” for training standard neural networks. However, we find many instances where more sophisticated and principled approaches beat these baselines and show promising results.
Employing standard optimization techniques such as SGD and Adam for training in low precision systems leads to severe degradation as the bit width is reduced. Instead, we propose a co-design framework where we jointly design the bit representation and optimization algorithm.
We draw inspiration from how our own brains represent information: there is strong evidence that the brain uses a logarithmic number system. This system can efficiently handle a large dynamic range even with a low bitwidth. We propose a new optimization method, Madam, for directly optimizing in the logarithmic number system. It is a multiplicative-weight-update version of the popular Adam method. We show that it obtains state-of-the-art performance in low-bitwidth training, often without any learning-rate tuning. Thus, Madam can directly train compressed neural networks whose weights are efficiently represented in a logarithmic number system. Paper
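The key property of a multiplicative update is that it changes log|w| additively, which matches a logarithmic number system where a weight is stored as a sign and an exponent. The sketch below follows that idea for a single scalar weight: it normalizes the gradient by a bias-corrected running RMS, as Adam does, and scales w by exp(-lr * sign(w) * normalized gradient). It is our simplified reading, not the reference Madam implementation; the hyperparameters, the magnitude clamp, and the lns_quantize helper are assumptions.

```python
import math

def madam_step(w, g, state, lr=0.01, beta=0.999, eps=1e-12, max_abs=10.0):
    """One multiplicative update on a scalar weight (a sketch in the spirit of Madam).
    The gradient is normalized by a bias-corrected running RMS, and the step is applied
    to log|w|, so the sign of w never changes."""
    state["t"] += 1
    state["v"] = beta * state["v"] + (1.0 - beta) * g * g
    v_hat = state["v"] / (1.0 - beta ** state["t"])
    g_normalized = g / math.sqrt(v_hat + eps)
    sign = 1.0 if w >= 0 else -1.0
    w = w * math.exp(-lr * sign * g_normalized)  # log|w| <- log|w| - lr * sign(w) * g_normalized
    return max(-max_abs, min(max_abs, w))        # hypothetical bound on the dynamic range

def lns_quantize(w, frac_bits=3):
    """Round w to the nearest point sign(w) * 2**(e / 2**frac_bits) with integer e
    (a logarithmic number system with frac_bits fractional exponent bits)."""
    if w == 0.0:
        return 0.0
    scale = 2 ** frac_bits
    e = round(math.log2(abs(w)) * scale)
    return math.copysign(2.0 ** (e / scale), w)

# Toy usage: minimize (w - 3)^2 from w = 1. The weight must start nonzero, since 0 stays 0.
w, state = 1.0, {"t": 0, "v": 0.0}
for _ in range(2000):
    w = madam_step(w, 2.0 * (w - 3.0), state)
w_lns = lns_quantize(w)
```

Each step multiplies w by a factor close to exp(+/-lr), so relative (not absolute) changes are controlled, which is what keeps the update meaningful across a large dynamic range.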
Competitive optimization extends single-agent optimization to multiple agents, each with its own objective function. It has applications ranging from constrained optimization to generative adversarial networks (GANs) and multi-agent reinforcement learning (MARL).
We introduce competitive gradient descent (CGD) as a natural generalization of gradient descent (GD) to multiple players. In GD, each agent updates based on its own gradient, and there is no interaction among the players. In contrast, CGD incorporates interactions among the players by computing the update as the Nash equilibrium of a local bilinear approximation of their objectives. This amounts to a preconditioner based on the mixed Hessian, which can be implemented efficiently using conjugate gradient (CG) updates. We see that CGD successfully converges in all instances of games where GD is unstable and exhibits oscillatory behavior. Blog
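For two scalar players in a zero-sum game (x minimizes f, y minimizes -f), the CGD update has a closed form: solve a 2x2 linear system whose off-diagonal terms come from the mixed second derivative, which is where the players' interaction enters. The sketch below implements that case and compares it with simultaneous GD on the bilinear game f(x, y) = x * y; in high dimensions the linear solve is done with conjugate gradient instead. The step size and starting point are arbitrary choices.

```python
import math

def cgd_step(x, y, grad_x, grad_y, mixed, eta):
    """Competitive gradient descent for scalar players: x minimizes f(x, y), y minimizes -f(x, y).
    grad_x, grad_y return partial derivatives of f; mixed returns d^2 f / dx dy.
    Solves [[1, eta*d], [-eta*d, 1]] [a, b] = [f_x, -f_y] in closed form."""
    fx, fy, d = grad_x(x, y), grad_y(x, y), mixed(x, y)
    det = 1.0 + (eta * d) ** 2
    a = (fx + eta * d * fy) / det
    b = (eta * d * fx - fy) / det
    return x - eta * a, y - eta * b

def gd_step(x, y, grad_x, grad_y, eta):
    """Simultaneous gradient descent: each player follows only its own gradient."""
    return x - eta * grad_x(x, y), y + eta * grad_y(x, y)

# Bilinear game f(x, y) = x * y, the simplest case where simultaneous GD spirals outward.
df_dx = lambda x, y: y
df_dy = lambda x, y: x
d2f_dxdy = lambda x, y: 1.0

cgd_xy, gd_xy = (1.0, 1.0), (1.0, 1.0)
for _ in range(100):
    cgd_xy = cgd_step(*cgd_xy, df_dx, df_dy, d2f_dxdy, eta=0.2)
    gd_xy = gd_step(*gd_xy, df_dx, df_dy, eta=0.2)
cgd_norm, gd_norm = math.hypot(*cgd_xy), math.hypot(*gd_xy)
```

On this game, each GD step multiplies the squared distance from the equilibrium (0, 0) by 1 + eta^2, so GD spirals outward, while each CGD step divides it by 1 + eta^2, so CGD converges.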
We further used the game-theoretic intuitions behind CGD to study the dynamics of GAN training. There is a delicate balance between generator and discriminator capabilities for obtaining the best performance. If the discriminator becomes too powerful on the training data, it will reject all samples outside the training set, leading to pathological solutions. We show that this pathology is prevented by the simultaneous training of both agents, and we term this implicit competitive regularization (ICR). We observe that CGD strengthens ICR and prevents oscillatory behavior, and thus improves GAN training. Blog
Distributional shifts are common in real-world problems; e.g. simulation data is often used for training in data-limited domains. Standard neural networks cannot handle such large domain shifts. They also lack uncertainty quantification: they tend to be overconfident when they make errors.
A common approach for unsupervised domain adaptation is self-training. Here, the model trained on the source domain is fine-tuned on target samples using self-generated labels (hence the name). Accurate uncertainty quantification (UQ) is critical here: we should select only high-confidence target labels for self-training. Otherwise, errors in the self-generated labels reinforce themselves and can lead to catastrophic failure.
We propose a distributionally robust learning (DRL) framework for accurate UQ. It is an adversarial risk minimization framework that leads to joint training with an additional neural network, a density-ratio estimator. The estimator is obtained through a discriminative network that classifies the source and target domains. The density-ratio estimator prevents the model from being overconfident on target inputs far away from the source domain. We see significantly better calibration and improved domain adaptation on VisDA-17. Paper
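A discriminative classifier yields a density-ratio estimate through Bayes' rule: if p(source | x) is the classifier's probability, then p_source(x) / p_target(x) = (n_target / n_source) * p(source | x) / p(target | x). The sketch below uses 1-D logistic regression on two hypothetical Gaussian domains instead of the jointly trained neural network in DRL; for Gaussians with equal variance the true log-ratio is linear in x, so logistic regression is well specified here.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_domain_classifier(source, target, lr=1.0, iters=300):
    """Logistic regression for p(source | x) = sigmoid(w * x + b), full-batch gradient descent."""
    data = [(x, 1.0) for x in source] + [(x, 0.0) for x in target]
    w = b = 0.0
    for _ in range(iters):
        grad_w = grad_b = 0.0
        for x, label in data:
            err = sigmoid(w * x + b) - label  # derivative of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

def density_ratio(x, w, b, n_source, n_target):
    """p_source(x) / p_target(x) = (n_target / n_source) * p(source | x) / p(target | x)."""
    return (n_target / n_source) * math.exp(w * x + b)

rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical source domain: N(0, 1)
target = [rng.gauss(2.0, 1.0) for _ in range(2000)]  # hypothetical shifted target: N(2, 1)
w, b = fit_domain_classifier(source, target)
ratio = lambda x: density_ratio(x, w, b, len(source), len(target))
```

For these two Gaussians the true ratio is exp(2 - 2x), so the estimate should be near 1 at x = 1 and very small for target-like inputs such as x = 3, which is exactly where a density-ratio-aware model should lower its confidence.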
Saliency maps for our model (DRST) compared with self-training baselines, and the density ratio of source to target. A lower density ratio indicates lower confidence.
In a previous project, we proposed another simple measure of sample hardness, termed angular visual hardness (AVH). This score does not need any additional computation or model capacity. We saw improved self-training performance compared with the baseline of using the softmax score as confidence. Project
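AVH, as we understand its definition, is the angle between a sample's feature vector and the weight vector of its class, divided by the sum of the angles to all class weight vectors; it needs nothing beyond the features and classifier weights the model already has. The sketch below computes it and uses it to select pseudo-labels for self-training. The threshold, the use of the predicted class as the label, the toy vectors, and all function names are our assumptions.

```python
import math

def angle(u, v):
    """Angle in radians between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norms)))  # clamp guards against rounding

def avh(feature, class_weights, label):
    """Angular visual hardness: angle to the given class's weight vector divided by the sum
    of angles to all class weight vectors. Lower values mean easier, more confident samples."""
    angles = [angle(feature, w) for w in class_weights]
    return angles[label] / sum(angles)

def select_pseudo_labels(features, class_weights, max_avh):
    """Self-training helper: label each target sample with its closest class (by angle)
    and keep it only if its AVH for that class is at most max_avh."""
    selected = []
    for idx, f in enumerate(features):
        pred = min(range(len(class_weights)), key=lambda c: angle(f, class_weights[c]))
        if avh(f, class_weights, pred) <= max_avh:
            selected.append((idx, pred))
    return selected

class_weights = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical last-layer weights, 2 classes
target_feats = [[1.0, 0.1], [0.7, 0.7], [0.1, 1.0]]
chosen = select_pseudo_labels(target_feats, class_weights, max_avh=0.2)
```

The ambiguous sample [0.7, 0.7] sits at equal angles to both classes (AVH = 0.5) and is left out of self-training, while the two clearly aligned samples receive pseudo-labels.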
Another key ingredient for improved synthetic-to-real generalization involves domain distillation and automated layer-wise learning rates. We propose an Automated Synthetic-to-real Generalization (ASG) framework that formulates the problem as lifelong learning with a model pre-trained on real images (e.g. ImageNet). Since it does not require any extra training loop beyond synthetic training, it can be conveniently used as a drop-in module in many applications involving synthetic training. Project
Combining ASG with the density-ratio estimator yields state-of-the-art results on unsupervised domain adaptation. Paper
Fair ML: Handling Imbalanced Datasets
It is common to have imbalanced training datasets where certain attributes (e.g. darker skin tones) are not well represented. We consider a general framework that can handle any arbitrary shift in label proportions between training and test data. Simple approaches to handling label shift involve class-balanced sampling or importance weighting. However, we show that neither is optimal. We propose a new method that optimally combines the two, balancing the bias introduced by class-balanced sampling against the variance due to importance weighting. Paper
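The importance-weighting baseline is easy to state: each training example of class y is multiplied by w_y = q(y) / p(y), the ratio of target to training label proportions. This removes the bias, but when the weights are very unequal it shrinks the effective sample size, which raises variance. The sketch below computes the weights and the Kish effective sample size for a hypothetical 90/10 training split and a balanced target; it illustrates only the variance side of the trade-off and does not implement the combined method from the paper.

```python
def importance_weights(train_counts, target_props):
    """w_y = q(y) / p(y): target label proportion divided by training label proportion."""
    n = sum(train_counts.values())
    return {y: target_props[y] / (count / n) for y, count in train_counts.items()}

def effective_sample_size(train_counts, weights):
    """Kish effective sample size (sum of weights)^2 / (sum of squared weights) over all
    training examples; it equals n when all weights are equal and shrinks as they spread."""
    s1 = sum(count * weights[y] for y, count in train_counts.items())
    s2 = sum(count * weights[y] ** 2 for y, count in train_counts.items())
    return s1 * s1 / s2

train_counts = {"majority": 900, "minority": 100}  # hypothetical imbalanced training set
balanced = {"majority": 0.5, "minority": 0.5}      # hypothetical target label proportions
w = importance_weights(train_counts, balanced)
ess = effective_sample_size(train_counts, w)
```

Reweighting removes the 90/10 imbalance in expectation (both classes now carry a total weight of 500), but the 1,000 examples carry only as much information as about 360 equally weighted ones.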
In the next post, I will be highlighting some important contributions to optimization methods.
2020 has been an unprecedented year. There has been too much suffering around the world. I salute the brave frontline workers who have risked their lives to tackle this raging pandemic. Amidst all the negativity and toxicity in online social media, it is easy to miss many positive outcomes of 2020.
Personally, 2020 has been a year of many exciting research breakthroughs for me and my collaborators at NVIDIA and Caltech. We are grateful to have this opportunity to focus on our research. Here are some important highlights.
Concept learning and compositionality: we developed a new benchmark, Bongard-LOGO, for human-level concept learning and reasoning. Our benchmark captures three core properties of human perception: 1) context-dependent perception, in which the same object has disparate interpretations in different contexts; 2) analogy-making perception, in which some meaningful concepts are traded off for other meaningful concepts; and 3) perception with a few samples but an infinite vocabulary.
Our evaluations show that state-of-the-art deep learning methods perform substantially worse than human subjects, implying that they fail to capture core properties of human cognition. Notably, the neuro-symbolic method performs best across all the tests, suggesting that symbolic reasoning is needed for efficient concept learning. Project
Conscious AI: Adding Feedback to Feedforward Neural Networks. It is hypothesized that the human brain derives consciousness from a top-down feedback mechanism that incorporates a generative model of the world. Inspired by this, we design a principled approach to adding coupled generative recurrent feedback to feedforward neural networks. This vastly improves adversarial robustness, even without explicit adversarial training. Paper. Blog.
Adaptive learning: generalizable AI requires the ability to quickly adapt to changing environments. We designed a practical hierarchical reinforcement learning (RL) method for legged robots that can adapt to new environments and tasks not seen during training. Training is carried out in the NVIDIA Flex simulation environment, which is physically valid and GPU-accelerated. In our hierarchical RL framework, a high-level controller learns to choose from a set of primitives in response to changes in the environment, and a low-level controller uses an established control method to robustly execute the primitives.
Thanks to this hierarchical design and a curriculum of tasks during training, the model transfers easily to a real robot without sophisticated randomization or adaptation schemes. The designed controller is up to 85% more energy-efficient and more robust than baseline methods. Blog
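The division of labor between the two levels can be sketched as follows. The primitive library, the observation-to-primitive lookup that stands in for the learned high-level policy, the PD gains, and the one-joint double-integrator dynamics are all hypothetical; the sketch only shows the structure of a high-level choice followed by a classical low-level controller that executes it.

```python
def pd_torque(pos, vel, target, kp=20.0, kd=4.0):
    """Low-level controller: PD law driving one joint toward a target angle (hypothetical gains)."""
    return kp * (target - pos) - kd * vel

# Hypothetical primitive library: each primitive is just a target joint angle here.
PRIMITIVES = {"walk": 0.3, "climb": 0.8, "crouch": -0.2}

def high_level_policy(observation):
    """Stand-in for the learned high-level policy: map an observation to a primitive."""
    return {"flat": "walk", "stairs": "climb", "low_ceiling": "crouch"}[observation]

def run(observations, steps_per_primitive=300, dt=0.01):
    """For each observation, pick a primitive, then let the PD loop execute it.
    The joint is a unit-inertia double integrator, integrated with semi-implicit Euler."""
    pos, vel, log = 0.0, 0.0, []
    for obs in observations:
        target = PRIMITIVES[high_level_policy(obs)]
        for _ in range(steps_per_primitive):
            vel += pd_torque(pos, vel, target) * dt
            pos += vel * dt
        log.append((obs, target, pos))
    return log

log = run(["flat", "stairs", "low_ceiling"])
```

Because the high-level policy only chooses among primitives, it can be retrained or swapped without touching the low-level controller, and vice versa.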
Real-world tasks often have a compositional structure that contains a sequence of simpler sub-tasks. We proposed a multi-task RL framework, OCEAN, to perform online task inference for compositional tasks. Here, the current task composition is estimated from the agent’s past experiences via probabilistic inference. We model global and local context variables in a joint latent space, where the global variables represent a mixture of sub-tasks that constitute the given task, while the local variables capture the transitions between the sub-tasks. Our framework supports flexible latent distributions based on prior knowledge of the task structure and can be trained in an unsupervised manner. Experimental results show that the proposed framework provides more effective task inference with sequential context adaptation and thus leads to a performance boost on complex, multi-stage tasks. Project
Causal learning: being able to identify cause and effect is at the core of human cognition. It allows us to extrapolate to entirely new, unseen scenarios and reason about them. We proposed the first framework that can learn causal structural dependencies directly from videos without any supervision on the ground-truth graph structure. The model combines unsupervised keypoint-based representations with causal graph discovery and graph-based dynamics learning. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions under out-of-distribution interventions and counterfactuals. Project
I am very happy to share the news that I am joining NVIDIA as Director of Machine Learning Research. I will be based in the Santa Clara HQ and will be hiring ML researchers and engineers at all levels, along with graduate interns.
I will continue my role as Bren Professor at Caltech and will divide my time between Northern and Southern California. I look forward to building strong intellectual relationships between NVIDIA and Caltech. There are many synergies with initiatives at Caltech such as the Center for Autonomous Systems and Technologies (CAST) and AI4science.
I found NVIDIA to be a natural fit and it stood out among other opportunities. I chose NVIDIA because of its track record, its pivotal role in the deep-learning revolution, and the people I have interacted with. I will be reporting to Bill Dally, the chief scientist of NVIDIA. In addition to Bill, there is a rich history of academic researchers at NVIDIA such as Jan Kautz, Steve Keckler, Joel Emer, and recent hires Dieter Fox and Sanja Fidler. They have created a nourishing environment that blends research with strong engineering. I am looking forward to working with CEO Jensen Huang, whose vision for research I find inspiring.
The deep-learning revolution would not have happened without NVIDIA’s GPUs. The latest Volta GPUs pack an impressive 125 teraFLOPS and have fueled developments in diverse areas. The recently released NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator, and NVIDIA GeForce RTX represents the biggest leap in graphics-rendering performance, as it is the world’s first real-time ray-tracing GPU.
As many of you know, NVIDIA is much more than a hardware company. The development of CUDA libraries at NVIDIA has been a critical component for scaling up deep learning. The CUDA primitives are also relevant to my research on tensors. A few years ago, I worked with NVIDIA researcher Cris Cecka to build extended BLAS kernels for tensor contraction operations. I look forward to building more support for tensor algebraic operations in CUDA, which can lead to more efficient tensorized neural network architectures.
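One way to see how tensor contractions map onto BLAS: a contraction such as C[m][n][p] = sum over k of A[m][k][p] * B[k][n] is a batch of ordinary matrix products, one for each p. The sketch below checks that the batched form matches the direct loops. When the tensors are stored in column-major order (first index fastest), each slice A[:, :, p] is an ordinary M x K matrix with leading dimension M and consecutive slices are M*K elements apart, so a strided batched GEMM (for example cuBLAS's cublas<t>gemmStridedBatched) can evaluate the whole contraction in one call without copying or transposing A. This is our illustration of the idea, not code from that project.

```python
def matmul(A, B):
    """Plain GEMM on nested lists: (M x K) @ (K x N)."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def contract_direct(A, B):
    """C[m][n][p] = sum_k A[m][k][p] * B[k][n], written as explicit loops."""
    M, K, P, N = len(A), len(A[0]), len(A[0][0]), len(B[0])
    return [[[sum(A[m][k][p] * B[k][n] for k in range(K)) for p in range(P)]
             for n in range(N)] for m in range(M)]

def contract_batched(A, B):
    """The same contraction as P independent GEMMs: slice A[:, :, p] (M x K) times B (K x N)."""
    M, K, P, N = len(A), len(A[0]), len(A[0][0]), len(B[0])
    C = [[[0.0] * P for _ in range(N)] for _ in range(M)]
    for p in range(P):
        slice_p = [[A[m][k][p] for k in range(K)] for m in range(M)]
        prod = matmul(slice_p, B)
        for m in range(M):
            for n in range(N):
                C[m][n][p] = prod[m][n]
    return C

A = [[[m + 2 * k - p for p in range(2)] for k in range(3)] for m in range(2)]  # M=2, K=3, P=2
B = [[k - n for n in range(4)] for k in range(3)]                              # K=3, N=4
C_direct = contract_direct(A, B)
C_batched = contract_batched(A, B)
```

The batched form turns one unusual tensor operation into many calls to the most heavily optimized routine in BLAS, which is why exposing it as a single kernel pays off.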
I recently left my role as principal scientist at Amazon Web Services (AWS). In this blog post, I want to recollect the rich learning experiences I had and the amazing things we accomplished over the last two years.
This was my first “industry” job out of academia. I chose AWS for several reasons. I saw huge potential to democratize AI, since AWS is the most comprehensive and broadly adopted cloud platform. Two years ago, cloud AI was still uncharted territory, and this made it an exciting adventure. I was also attracted by the mandate to bring applied AI research to AWS.
My early days were exciting and busy. We were a new team in the Bay Area with a startup-like environment, while still being connected with the Seattle team and the larger Amazon ecosystem. There was a steep learning curve to understand all the AWS services, software engineering practices, product management, etc., and I loved it. We were busy growing the team, designing new AI services, and thinking about research directions, all at the same time.
I am proud of what we accomplished over the last two years. We launched a vast array of AI services at all levels of the stack. At the bottom are the compute instances; the latest GPU instances use the powerful NVIDIA Tesla V100. The middle layer consists of SageMaker, a fully managed service with high-performance dockerized ML algorithms, and DeepLens, the first deep learning camera with seamless AWS integration. The top layer includes services for computer vision, natural language processing, speech recognition, and so on. We made large strides in customer adoption, and today AWS has the largest number of customer references for cloud ML services. In addition, the AWS ML lab provides advanced solutions for custom use cases. I got to interact with many customers, and it was eye-opening to learn about real-world AI deployment in diverse domains.
I was most closely involved in the design, development, and launch of SageMaker. Its broad adoption led to AWS increasing its ML user base by more than 250 percent over the last year. SageMaker removes heavy lifting, complexity, and guesswork from each step of productionizing ML. It was personally fulfilling to build topic modeling on SageMaker (and Amazon Comprehend) based on my academic research, which uses tensor decompositions. SageMaker topic modeling automatically categorizes documents at scale and is several times faster than any other (open-source) framework. Check out my talk at ReInvent for more details. Taking the tensor algorithm from its theoretical roots to an AWS production service was a big highlight for me.
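The core computation behind tensor-based topic models (the method-of-moments approach) is finding the components of a symmetric third-order tensor. In the population (infinite-data) limit, the whitened third-order word co-occurrence moment has the orthogonal form T = sum_i lam_i v_i (x) v_i (x) v_i, and each (lam_i, v_i) corresponds to a topic. The sketch below runs the tensor power iteration u <- T(I, u, u) / ||T(I, u, u)|| on a small synthetic tensor of that form. Whitening, deflation to recover the remaining components, and anything specific to the SageMaker implementation are omitted, and the tensor is hypothetical.

```python
import math

def build_orthogonal_tensor(lams, vecs):
    """T[a][b][c] = sum_i lams[i] * v_i[a] * v_i[b] * v_i[c] for orthonormal vectors v_i."""
    d = len(vecs[0])
    return [[[sum(l * v[a] * v[b] * v[c] for l, v in zip(lams, vecs))
              for c in range(d)] for b in range(d)] for a in range(d)]

def tensor_apply(T, u):
    """T(I, u, u): contract the last two modes of T with u."""
    d = len(u)
    return [sum(T[a][b][c] * u[b] * u[c] for b in range(d) for c in range(d)) for a in range(d)]

def power_iteration(T, u0, iters=30):
    """Tensor power method; returns an (eigenvector, eigenvalue) pair of T."""
    u = u0
    for _ in range(iters):
        w = tensor_apply(T, u)
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(u, tensor_apply(T, u)))  # T(u, u, u)
    return u, lam

s = 1.0 / math.sqrt(2.0)
vecs = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]  # orthonormal "topic" directions
lams = [3.0, 2.0, 1.0]
T = build_orthogonal_tensor(lams, vecs)
u, lam = power_iteration(T, [0.9, 0.3, 0.2])
```

Unlike the matrix power method, each step squares the coefficients along the components, so convergence is quadratic; here the iteration recovers the first component with eigenvalue 3 in a handful of steps.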
It was exciting to grow applied research at AWS. I looked for problems that posed the biggest obstacles in the real world. The “data problem” is the proverbial “elephant in the room”: while researchers work on established benchmark datasets, most of the time and effort in the real world goes into data collection and cleanup. We developed and tested efficient deep active learning, crowdsourcing, and semi-supervised learning methods in a number of domains. We found that deep networks can be trained with significantly less data (~25%). For an overview, check out the tutorial slides at UAI. I was also happy to connect my earlier research on tensors with deep learning to obtain a new class of deep networks that naturally encode the data dimensions and higher-order correlations. They are more compact and generalize better in many domains. TensorLy is a Keras-like frontend for easily using tensor algebraic operations in deep learning with any backend. Moreover, I realized that in practice, simple methods work even when theory cannot explain why. We tried to close this gap by looking for conditions under which simple methods provably succeed, and then experimentally verifying these conditions. For instance, we showed, both in theory and in practice, that 1-bit gradient quantization incurs almost no accuracy loss while reducing communication requirements for distributed ML. All these projects were executed with an excellent cohort of interns and AWS scientists.
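A well-known instance of 1-bit gradient compression is sign-based SGD with majority vote (signSGD): each worker sends only the sign of each gradient coordinate, and the server broadcasts the sign of the summed votes. The toy sketch below runs it on a hypothetical quadratic with noisy worker gradients; the objective, noise level, step size, and worker count are arbitrary choices for illustration.

```python
import random

def sign(x):
    return 1 if x > 0 else (-1 if x < 0 else 0)

def majority_vote_sgd(target, num_workers=3, steps=1000, lr=0.01, noise_std=0.1, seed=0):
    """Distributed sign-based SGD on f(x) = sum_j (x_j - target_j)^2.
    Each worker sends only the sign of its noisy gradient (1 bit per coordinate);
    the server takes a per-coordinate majority vote and every worker steps by lr * vote."""
    rng = random.Random(seed)
    x = [0.0] * len(target)
    for _ in range(steps):
        votes = [0] * len(x)
        for _ in range(num_workers):
            for j, (xj, tj) in enumerate(zip(x, target)):
                noisy_grad = 2.0 * (xj - tj) + rng.gauss(0.0, noise_std)
                votes[j] += sign(noisy_grad)  # the 1-bit message from this worker
        x = [xj - lr * sign(v) for xj, v in zip(x, votes)]  # majority vote, then sign step
    return x

x_final = majority_vote_sgd([1.0, -2.0, 0.5])
```

Both directions of communication carry one bit per coordinate instead of a 32-bit float, and with an odd number of workers the vote is never tied.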
Being at AWS gave me a platform for community outreach to democratize AI. I worked to build partnerships with universities and non-profit organizations. The Caltech-Amazon partnership funded graduate fellowships and cloud credits, which are transforming fundamental scientific research at Caltech. This partnership also resulted in a new AWS office in Pasadena with Stefano Soatto and Pietro Perona at the helm.
I had the privilege to work with and learn from so many amazing individuals. It was enlightening to hear about the early days of AWS from veteran AWS engineer and team VP Swami Sivasubramanian. I tried to develop new skills from the very best: product management from Joseph Spisak, team management from Craig Wiley, software engineering from Leo Dirac, clear exposition from Zachary Lipton, ML practice from Sunil Mallya, to name a few. Attending MARS and interacting with Jeff Bezos and many other superstars was a big highlight.
I learnt good management principles and business practices at Amazon. The Leadership Principles are a succinct list of desirable leadership qualities. But some principles are at odds with others, which meant there was a need for balance. For instance, the “dive deep” principle requires time and effort, while “bias for action” calls for expediency. I also found that working backwards from customer needs and having “two pizza” teams resulted in focused discussions with great outcomes. Another effective strategy I learnt was the Press Release (PR)-FAQ. Presentations are banned at Amazon, and to pitch any new idea one had to start by writing its press release. This entailed having clarity on the product goals and target customers right from the beginning. I could see the effectiveness of all these principles and their role in making Amazon the huge success it is today.
To summarize, I am very thankful for the learning experience I had at AWS. In the next post, I will talk about my upcoming plans. Stay tuned!
Below is a slideshow of some of my favorite memories.
At MARS 2017
With Jeff Bezos, Swami Sivasubramanian, Ralf Herbrich, Hassan Sawaf.
Operating the giant robot
At ReInvent 2017
With Zachary Lipton
At Roborace, ReInvent 2017
With Werner Vogels, Swami Sivasubramanian, Tom Soderstrom, Sunil Mallya