Top-10 Things in 2022

1. Neural Operators Accelerating Science: 2022 was an excellent year for neural operators, on the heels of being featured as a highlight in Math and Computer Science by Quanta Magazine in 2021. Neural operators learn mappings between function spaces, which makes them discretization-invariant: they can work on any discretization of the inputs and converge to a limit upon mesh refinement. Once trained, neural operators can be evaluated at any resolution without re-training. In contrast, the performance of standard neural networks degrades severely when the resolution of the test data differs from that of the training data. In addition, we have extended the framework to a physics-informed neural operator that exploits both data and physical laws and constraints to reduce modeling errors and enable effective generalization.

We developed and deployed neural operators for a range of scientific domains. Neural operators have now accelerated carbon capture and storage (CCS) modeling and weather forecasting: 4-5 orders of magnitude faster while maintaining the same fidelity as numerical methods. These are immensely important for dealing with climate change. This speed enables the creation of ensembles with thousands of members, leading to improved uncertainty quantification.

We also employed neural operators for inverse problems, such as mask optimization in lithography. Inverse problems with traditional solvers require many iterations of the forward model, while neural operators require only one evaluation due to their differentiability. We extended neural operators to incorporate arbitrary geometries and used them for the inverse design of airfoils with ~100,000x speedup. We also used the neural operator to learn the long-term behavior of chaotic systems. Chaotic systems are inherently unpredictable since small changes in initial conditions lead to diverging trajectories. However, in many popular systems, such as fluid flows, reproducible statistical properties, such as energy spectrum and auto-correlation, exist over long-term trajectories. We used the neural operator to learn the transition kernel of chaotic systems and used the dissipative loss to stabilize the evolution while being able to recover the invariant statistical measure successfully.

2. Digital Biology: In 2021, our AI algorithms enabled an unprecedented understanding of the coronavirus. We used AI methods to model aerosolized coronavirus and deployed an unprecedented billion-atom system, one of the largest biochemical systems ever modeled at the atomic level. Our speedup of quantum-mechanical calculations using AI methods enabled this large-scale biological simulation for the first time. We also used neural operators to model the replication dynamics of the coronavirus as it invades the cells in a host. Both these papers were recognized as finalists for the Association for Computing Machinery (ACM) Gordon Bell Special Prize for Covid Research in 2021.

Continuing the streak, this year we trained the largest biological foundation model for genome-scale modeling of the coronavirus, with the ability to predict new variants of concern. We won the 2022 ACM Gordon Bell Special Prize for HPC-Based COVID-19 Research. Previously, variants of concern had to be identified by individually going through every protein and mapping each mutation to see if any mutations were of interest, which is labor- and time-intensive. Our model makes this much cheaper and faster, allowing us to be more agile in dealing with current and future pandemics. As seen in the figure below, the diffusion-based hierarchical model we developed has better generation quality. (Fig. A) Comparison of statistics measured on generated sequences and on real data. (Fig. B) Generated sequences (light blue) from the model overlaid on the phylogenetic tree.

We developed large-scale foundation models for molecular understanding. Our retrieval-based molecular generative model adapts to new design criteria flexibly. It uses a small set of exemplar molecules that may only partially meet the design objectives and fuses this information with pre-trained molecular generative models to generalize to new tasks in a zero-shot manner. We optimize the binding affinity of eight existing, weakly-binding drugs for COVID-19 treatment under multiple design criteria. We also developed dynamic protein-ligand structure prediction with multiscale diffusion models and applied it to systems where the presence of ligands significantly alters protein-folding landscapes. We also developed multi-modal models by jointly learning molecules’ chemical structures and textual descriptions via a contrastive learning strategy. This enables challenging zero-shot tasks based on textual instructions such as structure-text retrieval and molecule editing.

3. Closing the Decision-Making Loop with Generalist AI Agents: 2022 was a breakthrough year for generalist AI, in which flexible transformer architectures can solve diverse tasks. We focused on using pre-trained foundation models for open-ended decision-making. Traditionally, reinforcement learning has focused on a single objective function and relied on billions of iterations, which makes it infeasible to adapt quickly to new open-ended tasks. We built the MineDojo framework in a Minecraft environment. It takes text instructions for open-ended tasks and uses internet-scale knowledge from videos, wikis, and text to train multi-modal models that aid fast reinforcement learning. It was recognized as an outstanding paper at NeurIPS 2022. We also built the VIMA benchmark for diverse robotic tasks.

4. Generative AI Unleashing Creativity: 2022 was the year of diffusion models enabling large-scale text-to-image generation. However, one downside of diffusion models is that they are still very slow to sample from. For instance, when I used the Lensa AI app to generate personalized images, it took hours. On the other hand, many practical applications like robotics or autonomous driving would need faster than real-time generation. Diffusion models are slow because they need to simulate trajectories of the diffusion process, starting from an easy-to-sample distribution, such as a Gaussian, and gradually transforming it into the distribution of interest, such as natural images. These trajectories can be described by ordinary differential equations (ODEs). Using neural network surrogates such as neural operators to replace ODE solvers can significantly speed up sampling from diffusion models, as seen in our recent work. We show that just one function evaluation is sufficient to achieve the best image generation quality. In contrast, standard diffusion models require hundreds to thousands of evaluations to generate high-quality images.
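To make the cost difference concrete, here is a minimal sketch on a toy problem, not the method from our paper. It assumes one-dimensional Gaussian data and a noise schedule sigma(t) = t, for which the score and the solution of the sampling ODE are known in closed form. The closed-form map stands in for a learned surrogate such as a neural operator; all function names and parameter values are hypothetical.

```python
import numpy as np

def score(x, t, mean, std):
    # Score of the noised marginal N(mean, std^2 + t^2) for Gaussian data.
    return -(x - mean) / (std**2 + t**2)

def sample_with_euler(x_T, T, n_steps, mean, std):
    # Integrate the sampling ODE dx/dt = -t * score(x, t) from t = T down to 0.
    # Each step costs one score evaluation.
    x = np.array(x_T, dtype=float)
    ts = np.linspace(T, 0.0, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        drift = -t_cur * score(x, t_cur, mean, std)
        x = x + (t_next - t_cur) * drift
    return x

def solution_operator(x_T, T, mean, std):
    # Exact map from noise to sample for this toy problem (one evaluation).
    # A trained surrogate would approximate a map of this kind.
    return mean + (np.asarray(x_T, dtype=float) - mean) * std / np.sqrt(std**2 + T**2)

mean, std, T = 1.0, 0.5, 10.0
x_T = np.array([-12.0, 3.0, 15.0])  # noise-level starting points (arbitrary)

exact = solution_operator(x_T, T, mean, std)
err_10 = np.max(np.abs(sample_with_euler(x_T, T, 10, mean, std) - exact))
err_1000 = np.max(np.abs(sample_with_euler(x_T, T, 1000, mean, std) - exact))
print("max error with 10 steps:", err_10)
print("max error with 1000 steps:", err_1000)
```

In this toy setting the step-by-step solver needs many score evaluations to approach the answer that the solution map returns in a single call.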

5. Robust Vision and Video Understanding: We showed that generative models improve the performance of downstream tasks, e.g., by enhancing robustness through denoising. We also developed a robust vision transformer architecture, fully attentional networks (FAN), which uses channel-based attention for robustness. We won the Semantic Segmentation Track of the Robust Vision Challenge at ECCV by adopting the SegFormer head on a pre-trained FAN model. We created the first large-scale radiance field dataset for perception, PeRFception, which consists of object-centric and scene-centric scans for classification and segmentation. It achieves a significant memory compression rate relative to the original dataset while containing both 2D and 3D information in a unified form. We also developed a minimalist video segmentation architecture that outperforms prior methods on challenging occluded-object benchmarks. It uses only frame-based training and does not rely on any specialized video architectures or training procedures.

6. Optimization: We developed the first fully deterministic weight initialization procedure for training any neural network. Neural networks are usually initialized with random weights, with an initial variance chosen to ensure stable signal propagation during training. However, selecting the appropriate variance becomes challenging, especially as the number of layers grows. Our scheme, ZerO, initializes networks’ weights with only zeros and ones based on identity and Hadamard transforms. ZerO achieves state-of-the-art accuracy, can train ultra-deep networks without batch normalization, has low-rank learning trajectories that result in low-rank and sparse solutions, and improves training reproducibility.
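As a rough illustration of what a deterministic identity-and-Hadamard initialization can look like, here is a minimal NumPy sketch. The rules for handling layers that change width and the scaling constant c are assumptions made for this example and may differ from the published algorithm; the function names are hypothetical.

```python
import numpy as np

def hadamard(m):
    # Sylvester construction of the 2^m x 2^m Hadamard matrix (entries +1/-1).
    H = np.array([[1.0]])
    for _ in range(m):
        H = np.block([[H, H], [H, -H]])
    return H

def zero_like_init(out_dim, in_dim):
    # Assumed rule: use a (partial) identity when the layer keeps or shrinks
    # the width, and a scaled Hadamard transform when it expands the width.
    if out_dim <= in_dim:
        return np.eye(out_dim, in_dim)
    m = int(np.ceil(np.log2(out_dim)))
    c = 2.0 ** (-(m - 1) / 2)  # assumed scaling constant
    return c * np.eye(out_dim, 2**m) @ hadamard(m) @ np.eye(2**m, in_dim)

W_same = zero_like_init(4, 4)    # identity
W_shrink = zero_like_init(3, 5)  # partial identity
W_expand = zero_like_init(6, 4)  # scaled block of a Hadamard matrix
```

Because nothing is sampled, two calls with the same shapes return identical matrices, which is the reproducibility property described above.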

7. Trustworthy AI: At NVIDIA, we enabled and extended model cards to improve transparency and accountability. We augmented the original model card with model-specific information concerning bias, explainability, privacy, safety, and security. We also explored the limits of detoxifying large language models using self-generated inputs filtered for toxicity. We also developed efficient active and transfer learning methods for pre-trained language models without fine-tuning. Labeling for custom dimensions of toxicity and social bias is challenging and labor-intensive. We proposed an Active Transfer Few-shot Instructions (ATF) approach, which leverages the internal linguistic knowledge of pre-trained language models to facilitate the transfer of information from existing pre-labeled datasets with minimal labeling effort on unlabeled target data.

8. Surgical AI: This involves assessing the skill of surgeons, predicting patient outcomes, and discovering novel surgeon biomarkers based on multi-modal data and deep learning algorithms. The modalities are diverse and can include video and audio of live surgical cases, as well as virtual reality and the kinematics from a surgical robot. We developed surgical gestures as a building block of surgery, and AI methods based on gesture sequences to predict 12-month functional outcomes after surgery. We also showed how real-time feedback significantly enhances surgical performance.

9. Team Accomplishments and Partnerships: I am incredibly blessed to be surrounded by amazing people at Caltech and NVIDIA, as well as collaborators in many other institutions. Zhuoran Qiao successfully defended in December 2022 and has been a leading force in AI for chemistry. Zongyi Li won the NVIDIA fellowship, as well as the PIMCO fellowship, along with Pan Xu. At Caltech, we initiated research partnerships with TII and Activision. My former student, Furong Huang, was recognized by MIT Technology Review as an Under 35 Asia Pacific 2022 Honoree.

10. Personal Triumphs: I was honored to be hosted by Prof. Terence Tao at UCLA as a distinguished speaker. I was also so happy to be featured on the Quanta (pic below) and AAAS websites. Our documentary 10kCasts was a Webby Award Honoree.

And perhaps the most important of all, I was so happy to celebrate my wedding to my soulmate Benedikt Jenik at the Caltech Athenaeum, surrounded by friends and family and officiated by my role model Frances Arnold.

Anima + Benedikt Wedding

Top-10 AI Research Highlights of 2021

AI4Science in the limelight: We developed AI methods for simulating complex multi-scale phenomena such as weather and materials with orders-of-magnitude speed-ups. Our core technique, the Fourier Neural Operator (FNO), was recently featured as a highlight of math and computer science advances in 2021 by Quanta Magazine. It was also featured in the GTC Fall keynote and the IamAI video, released by NVIDIA. In addition, we advanced drug discovery through Orbnet, which predicts quantum-mechanical properties with speedups of thousands of times. Both these techniques were part of publications that were finalists for the Gordon Bell Special Prize for Covid-19 research.

Quantum ML research taking root: Our foray into quantum research was aided by our research on tensor methods for deep learning, since tensor networks represent quantum systems efficiently. We developed a new algorithm for quantum optimization that halved the quantum resources needed to solve classical optimization problems such as maxcut, which implies an exponential reduction in the cost of simulating them on GPUs. We partnered with the NVIDIA cuQuantum team, established a world record for large-scale simulation of a successful, nonlocal quantum optimization algorithm, and open-sourced Tensorly-Quantum.

Trustworthy AI no longer just a wish list: Forbes predicts that trustworthy AI will be operationalized in the next year. Trustworthy AI has many facets: improving uncertainty calibration, auditing AI models and improving robustness. We improved the robustness of AI models through various approaches: certifying robustness, balancing diversity and hardness of data augmentations, and enhancing robustness of 3D vision. We demonstrated that language models can detect different kinds of social biases without any re-training, when supplied with a small number of labeled examples.

Metaverse calling: We continue to develop large-scale reinforcement and interactive learning algorithms using efficient GPU-accelerated physically-valid simulations. We developed rapidly adaptive robot manipulation methods in NVIDIA Isaac Gym that combined physics and learning effectively. We developed efficient skill chaining methods, visual reinforcement learning, morphology co-design, uncertainty quantification and stability-aware RL methods.

No more supervision: 99% of computer vision teams have had an ML project canceled due to insufficient training data, according to a survey. We have made strides in weak and self-supervised learning (Discobox) that are competitive with supervised learning methods. We have also developed efficient controllable generation methods that can do zero-shot composition of attributes. Gartner predicts that by 2024, synthetic data will account for 60% of all data used in AI development. We also developed efficient methods for automatic camera calibration using neural radiance field (NeRF) models.

Transformers transforming vision and language: The Cambrian explosion of transformer architectures continued this year, with a focus on harder tasks and multimodal domains. We developed Segformer for semantic and panoptic segmentation with SOTA performance and efficiency, which is being used by multiple teams across the company. We enabled linear efficiency in self-attention layers using long-short decomposition and adaptive Fourier neural operator.

Bridging the gap with biological intelligence: Humans are capable of zero-shot generalization, and can handle long-tailed distributions. We developed efficient controllable generation methods that can do zero-shot composition of attributes. We showed that simple memory-recall strategies enable efficient long-tailed object detection. Such capabilities have been framed as formative AI intellect by Gartner.

Hardware efficiency: We co-designed both a quantization scheme and an AI training method to obtain energy efficiency and accuracy. The logarithmic number system (LNS) provides a high dynamic range, but the non-linear quantization gaps make it challenging to train using standard methods such as SGD. Instead, we employed multiplicative updates that are able to train AI models directly in LNS using just 8 bits, without requiring any full-precision copies. This resulted in no accuracy loss and 90% reduction in energy.

Hub of collaborations: Grateful to be supported by an amazing network of collaborators across multiple institutions in a wide array of domains. We are excited about the announcement of Earth-2 that will enable NVIDIA to partner with researchers in climate science globally.

Personal touch: 2021 has been a greatly fulfilling year on both personal and professional fronts. I spent the beginning of the year in Hawaii, a dream playground, where I got to swim every evening into the sunset after my meetings. I started weight training and was surprised at being able to lift my own body weight! Focusing on my physical and spiritual health has greatly enhanced my creativity and productivity. During the latter half of the year, I got to attend some events in person. A highlight was a trip to CERN where I got to tour the particle accelerator and the antimatter factory; my interview, Stealing theorist’s lunch, was published in the CERN Courier magazine. I got to participate in an unusual documentary that featured our fishing trip at Jackson Hole, where we collected snapshots of fly-fishing casts and trained AI to predict good casts. I also participated in the latenightIT show, hosted by the Emmy-nominated Baratunde Thurston. Here’s looking forward to new adventures in 2022!

2020 AI Research Highlights: Learning Frameworks (part 7)

2020 has been an exciting time for DL frameworks and AI stacks. We have seen more consolidation of frameworks into domain-specific platforms such as NVIDIA Omniverse and NVIDIA Clara. We have also seen better abstractions in the AI stack that help democratize AI and enable rapid prototyping and testing, such as PyTorch Lightning.

Below are some frameworks that my team at NVIDIA has been involved in building.

This is part of the blog series on 2020 research highlights. You can read other posts for research highlights on generalizable AI (part 1), handling distributional shifts (part 2), optimization for deep learning (part 3), AI4science (part 4), controllable generation (part 5), learning and control (part 6).

Announcing Tensorly-Torch

TensorLy-Torch is a PyTorch-only library that builds on top of TensorLy and provides out-of-the-box tensor layers to replace matrix layers in any neural network. Link

Tensorize all layers of a neural network: This includes factorized convolutions, fully-connected layers, and more!
Initialization: initializing tensor decompositions can be tricky since default parameters for matrix layers are not optimal. We provide good defaults to initialize using our tltorch.init module. Alternatively, you can initialize to fit the pretrained matrix layer.
Tensor hooks: you can easily augment your architectures with our built-in hooks. Robustify your network with Tensor Dropout. Automatically select the rank end-to-end with L1 Regularization.
Methods and model zoo: we are always adding more methods and models to make it easy to compare the performance of various deep tensor-based methods!

Minkowski Engine

Minkowski Engine is an auto-differentiation library for sparse tensors. It supports all standard neural network layers such as convolution, pooling, and broadcasting operations for sparse tensors. Popular applications include 3D and higher-order vision problems such as semantic segmentation, reconstruction, and detection. Link

Unlimited high-dimensional sparse tensor support
All standard neural network layers (Convolution, Pooling, Broadcast, etc.)
Dynamic computation graph
Custom kernel shapes
Multi-GPU training
Multi-threaded kernel map
Multi-threaded compilation
Highly-optimized GPU kernels

End-to-end Reinforcement Learning on GPUs with NVIDIA Isaac Gym

We are excited about the preview release of Isaac Gym, NVIDIA’s physics simulation environment for reinforcement learning research that dramatically speeds up training. The environments are physically valid, allowing for efficient sim-to-real transfer. They include a robotic arm, legged robots, deformable objects, and humanoids. Blog

Stay tuned for more in 2021! Here’s looking forward to exciting developments in AI in the new year.

2020 AI Research Highlights: Learning and Control (part 6)

Embodied AI is the union of “mind” (AI) and “body” (robotics). To achieve this, we need robust learning methods that can be embedded into control systems with safety and stability guarantees. Many of our recent works are advancing these goals on both theoretical and practical fronts.

Safe Exploration and Planning

My journey into this area of learning and control started with the neural lander. We used deep learning to learn the aerodynamic ground effects in drones. This led to improved landing speed without sacrificing stability requirements. In a subsequent work, we aimed to automate the collection of drone data while staying safe.

Safe landing

Aggressive landing

We employed robust regression methods with guaranteed uncertainty bounds, which ensure safety even outside of the training domain. This allows the drone to progressively land faster while maintaining safety (i.e., not crashing). Our method trains a density-ratio estimator that accurately predicts the ability to maintain safety at higher speeds. This is based on the principle of adversarial risk minimization, which has also shown gains in sim-to-real generalization in computer vision (post 2).

We employed our method on a simulator built with data collected from real drones. Our method is superior to the popular Gaussian process (GP) method for uncertainty quantification and leads to faster exploration while maintaining safety. This is because GPs are brittle in high dimensions due to poor choices of kernels and priors.

The ability to explore safely can now be combined with downstream trajectory planning methods in control. We propagate the uncertainty bounds from robust regression and pose them as chance constraints for the planning methods. Thus, we can compute a pool of safe and information-rich trajectories.

Learning methods with accurate uncertainty bounds enable safe trajectory planning

The episodic learning framework is applied to the robotic spacecraft model to explore the state space and learn the friction under collision constraints. We show a significant reduction in variance of the learned model predictions and the number of collisions using robust regression models.

Reinforcement learning in control systems

Analyzing RL in control systems is challenging for the following reasons: (1) state and action spaces are continuous, (2) there are safety and stability requirements, and (3) the state is only partially observable.

A canonical setting is the linear quadratic Gaussian (LQG) that involves linear dynamics evolution and linear transformation of the hidden state to yield observations with Gaussian noise. LQG appears deceptively simple, but is notoriously challenging to analyze.
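For reference, the standard LQG model can be written as follows. The symbols (A, B, C, W, Z, Q, R) are notation chosen for this summary and are not taken from the papers linked in this section.

```latex
\begin{aligned}
x_{t+1} &= A x_t + B u_t + w_t, & w_t &\sim \mathcal{N}(0, W), \\
y_t &= C x_t + z_t, & z_t &\sim \mathcal{N}(0, Z), \\
c_t &= y_t^\top Q\, y_t + u_t^\top R\, u_t, &
\mathrm{Regret}(T) &= \sum_{t=1}^{T} \left( c_t - J_\star \right),
\end{aligned}
```

Here x_t is the hidden state, u_t the control action, y_t the observation, and J_\star the optimal average cost attainable when the model parameters are known. The learner sees only y_t and must estimate the model from it while controlling the system.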

Previous methods focused on open-loop control, which uses random excitation (i.e., random actions) to collect measurements for model estimation. However, this yields a suboptimal regret of T^0.66, where T is the number of time steps. Paper

Our method is the first closed-loop RL method with guaranteed regret bounds. In closed-loop control, the past measurements are all correlated with the control actions which makes it challenging to estimate the model parameters. We utilize tools from classical control theory (predictive form) to guarantee consistent estimation of the model parameters. This yields an improved regret bound of T^0.5. Paper

Surprisingly, we can do better in terms of the regret bound. We showed that combining online learning with episodic updates can lead to logarithmic regret. Intuitively, we decouple adaptive learning of model parameters (episodic updates) with online learning of control policy. This combination allows us to achieve fast learning (with low regret) in closed-loop control. Paper

2020 AI Research Highlights: Controllable Generation (part 5)

Generative models have greatly advanced over the last few years. We can now generate images and text that pass the Turing test: at first glance, they look considerably realistic.

A major unsolved challenge is the ability to control the generative process. We would like to specify attributes or style codes for image generation; we would like to shape the narrative of text generation. We have made progress on both these goals and will describe them in this post.

You can read previous posts for other research highlights: generalizable AI (part 1), handling distributional shifts (part 2), optimization for deep learning (part 3), AI4science (part 4), learning and control (part 6), learning framework (part 7).

Controllable Text Generation: Megatron-CTRL

Large pretrained language models like GPT can generate long paragraphs of text. However, these models are uncontrollable and make mistakes such as common-sense errors, repetition, and inconsistency. In a recent paper, we add the ability to dynamically control text generation using keywords and incorporate an external knowledge base. Our framework consists of a keyword generator, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. Results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity. Paper Slides

Disentanglement Learning in StyleGAN

Disentanglement learning is crucial for obtaining disentangled representations and controllable generation. Current disentanglement methods face several inherent limitations: difficulty with high-resolution images, a primary focus on learning disentangled representations, and non-identifiability due to the unsupervised setting. To alleviate these limitations, we design new architectures and loss functions based on StyleGAN (Karras et al., 2019) for semi-supervised high-resolution disentanglement learning.

We create two complex high-resolution synthetic datasets for systematic testing. We investigate the impact of limited supervision and find that using only 0.25%~2.5% of labeled data is sufficient for good disentanglement on both synthetic and real datasets.

We propose new metrics to quantify generator controllability, and observe there may exist a crucial trade-off between disentangled representation learning and controllable generation. We also consider semantic fine-grained image editing to achieve better generalization to unseen images. Project page

There is still more to come!

2020 AI Research Highlights: AI4Science (part 4)

2020 has been a landmark year for AI4science. I have had the privilege to work with some of the world’s best experts in a number of challenging scientific domains.

You can read previous posts for other research highlights: generalizable AI (part 1), handling distributional shifts (part 2), optimization for deep learning (part 3), controllable generation (part 5), learning and control (part 6), learning framework (part 7).

Fourier neural operator solves complex PDEs such as turbulent fluid flows, and Orbnet solves quantum chemistry calculations showing 1000x speedups over traditional solvers while maintaining fidelity.

AI4PDE

One of the exciting breakthroughs of 2020 is the Fourier neural operator. Neural operators learn mappings from the problem specification (e.g., initial and boundary conditions of a PDE) to the solution in infinite-dimensional function spaces. This means that there is no dependence on the resolution or grid of sample points. This allows neural operators to do zero-shot super-resolution, i.e., to evaluate at a higher resolution than the training data and at arbitrary points. None of the previous approaches using deep learning for solving PDEs have this capability.

We show that our method can solve the Navier-Stokes PDE in the turbulent regime: the first such result for a deep learning system. Blog

In a related paper, we proposed an alternative framework for solving large-scale fluid flow problems. Meshfree flownet performs physically-valid super-resolution of fluid dynamics at scale (experiments were run on the CORI cluster with 128 V100 GPUs). Project page
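The resolution independence can be illustrated with a minimal NumPy sketch of a single Fourier-space linear map, the core ingredient of a Fourier layer. This is a simplified illustration, not our released implementation: it uses one channel, no nonlinearity, and random weights in place of trained ones, and the function names are hypothetical.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    # Multiply the lowest Fourier modes of a periodic signal by fixed complex
    # weights. The weights live in Fourier space, so they do not depend on
    # how many grid points were used to sample u.
    n = u.shape[-1]
    modes = weights.shape[0]
    # norm="forward" scales the forward transform by 1/n, so the coefficients
    # of a band-limited signal are the same at every resolution.
    u_hat = np.fft.rfft(u, norm="forward")
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = weights * u_hat[:modes]
    return np.fft.irfft(out_hat, n=n, norm="forward")

def sample_input(n):
    # The same band-limited function sampled on an n-point periodic grid.
    x = np.arange(n) / n
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)

rng = np.random.default_rng(0)
weights = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # stand-in for trained weights

coarse = spectral_conv_1d(sample_input(64), weights)
fine = spectral_conv_1d(sample_input(128), weights)

# Every second point of the fine grid coincides with a coarse grid point.
max_gap = np.max(np.abs(fine[::2] - coarse))
print("largest mismatch at shared grid points:", max_gap)
```

The same weights applied at 64 and 128 grid points agree at the shared points up to floating-point error, which is the property that lets a model trained at one resolution be queried at another.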

AI4QuantumChemistry

Quantum chemistry is the study of chemical properties and processes at the quantum scale. It has been pivotal for research and discovery in modern chemistry. However, as powerful as quantum chemistry has shown itself to be, it also has a big drawback: Accurate calculations are resource-intensive and time consuming, with routine chemical studies involving computations that take days or longer.

We developed Orbnet: a deep-learning based calculator of quantum properties that preserves the fidelity of traditional solvers while obtaining 1000x speed ups. Orbnet combines domain-specific knowledge (molecular orbitals) with the flexibility of deep learning (graph neural networks). This hybrid model allows for transferability to much larger molecules (more than 10x) compared to molecules used for training Orbnet. We also show that Orbnet provides powerful representations for molecular properties and can be directly used for predicting them. News article

Molecular Orbitals

Stay tuned for more!

2020 AI Research Highlights: Optimization for Deep Learning (part 3)

In this post, I will focus on the new optimization methods we proposed in 2020. Simple gradient-based methods such as SGD and Adam remain the “workhorses” for training standard neural networks. However, we find many instances where more sophisticated and principled approaches beat these baselines and show promising results.

You can read previous posts for other research highlights: generalizable AI (part 1), handling distributional shifts (part 2), AI4science (part 4), controllable generation (part 5), learning and control (part 6), learning framework (part 7).

Low-Precision Training

Employing standard optimization techniques such as SGD and Adam for training in low precision systems leads to severe degradation as the bit width is reduced. Instead, we propose a co-design framework where we jointly design the bit representation and optimization algorithm.

We draw inspiration from how our own brains represent information: there is strong evidence that the brain uses a logarithmic number system. This system can efficiently handle a large dynamic range even with low bitwidth. We propose a new optimization method, Madam, for directly optimizing in the logarithmic number system. It is a multiplicative weight update version of the popular Adam method. We show that it obtains state-of-the-art performance in low-bitwidth training, often without any learning rate tuning. Thus, Madam can directly train compressed neural networks whose weights are efficiently represented in a logarithmic number system. Paper
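The reason multiplicative updates pair well with a logarithmic number system is that multiplying a weight by a factor is just an addition on its stored exponent. The following is a minimal sketch in that spirit, not the Madam implementation itself: the normalization, clipping, learning rate, and the number of fractional exponent bits are all assumptions chosen for the example.

```python
import numpy as np

def log_domain_step(sign, log2_mag, grad, v, step, lr=0.01, beta=0.9, frac_bits=8):
    # Normalize the gradient by a running root-mean-square, as Adam does.
    v = beta * v + (1.0 - beta) * grad**2
    v_hat = v / (1.0 - beta**step)
    g_hat = np.clip(grad / (np.sqrt(v_hat) + 1e-12), -1.0, 1.0)
    # Multiplicative update w <- w * exp(-lr * sign(w) * g_hat),
    # which is an additive update on the base-2 exponent.
    log2_mag = log2_mag - lr * sign * g_hat / np.log(2.0)
    # Keep the exponent on a fixed-point grid (assumed precision).
    scale = 2.0**frac_bits
    log2_mag = np.round(log2_mag * scale) / scale
    return log2_mag, v

def decode(sign, log2_mag):
    return sign * 2.0**log2_mag

# Toy problem: minimize 0.5 * ||w - target||^2. Multiplicative updates
# cannot flip signs, so the signs are fixed at initialization.
target = np.array([2.0, -0.5, 0.25])
sign = np.sign(target)
log2_mag = np.zeros(3)  # start at |w| = 1
v = np.zeros(3)

initial_loss = 0.5 * np.sum((decode(sign, log2_mag) - target) ** 2)
for step in range(1, 1001):
    grad = decode(sign, log2_mag) - target
    log2_mag, v = log_domain_step(sign, log2_mag, grad, v, step)
final_loss = 0.5 * np.sum((decode(sign, log2_mag) - target) ** 2)
print(initial_loss, final_loss)
```

Only the sign and the quantized exponent are ever stored; no full-precision copy of the weights is kept.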

Madam preserves performance even under low bitwidth

Competitive optimization

This extends single-agent optimization to multiple agents with their own objective functions. It has applications ranging from constrained optimization to generative adversarial networks (GANs) and multi-agent reinforcement learning (MARL).

We introduce competitive gradient descent (CGD) as a natural generalization of gradient descent (GD). In GD, each agent updates based on their own gradients, and there is no interaction among the players. In contrast, CGD incorporates interactions among the players by posing it as the Nash equilibrium of a local bilinear approximation of their objectives. This reduces to having a preconditioner based on the mixed Hessian function. This is efficient to implement using conjugate gradient (CG) updates. We see that CGD successfully converges in all instances of games where GD is unstable and exhibits oscillatory behavior. Blog
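The contrast between the two update rules can be seen on the simplest zero-sum game, f(x, y) = x^T A y, where x minimizes f and y maximizes it. The sketch below is a minimal illustration under those assumptions. It solves the small linear systems directly with NumPy rather than with conjugate gradient, and the step size and iteration count are arbitrary choices.

```python
import numpy as np

def gd_step(x, y, A, lr):
    # Simultaneous gradient descent/ascent: each player ignores the other.
    return x - lr * (A @ y), y + lr * (A.T @ x)

def cgd_step(x, y, A, lr):
    # Competitive gradient descent for the zero-sum game f(x, y) = x^T A y.
    # The mixed second derivatives are D_xy f = A and D_yx f = A^T.
    grad_x, grad_y = A @ y, A.T @ x
    dx = -lr * np.linalg.solve(np.eye(len(x)) + lr**2 * A @ A.T,
                               grad_x + lr * A @ grad_y)
    dy = lr * np.linalg.solve(np.eye(len(y)) + lr**2 * A.T @ A,
                              grad_y - lr * A.T @ grad_x)
    return x + dx, y + dy

A = np.eye(2)
lr = 0.2
x_gd, y_gd = np.ones(2), np.ones(2)
x_cgd, y_cgd = np.ones(2), np.ones(2)
for _ in range(200):
    x_gd, y_gd = gd_step(x_gd, y_gd, A, lr)
    x_cgd, y_cgd = cgd_step(x_cgd, y_cgd, A, lr)

gd_norm = np.linalg.norm(np.concatenate([x_gd, y_gd]))
cgd_norm = np.linalg.norm(np.concatenate([x_cgd, y_cgd]))
print("distance from equilibrium, GD:", gd_norm)
print("distance from equilibrium, CGD:", cgd_norm)
```

On this game the plain gradient iterates spiral away from the equilibrium at the origin, while the competitive update contracts toward it.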

We further used the game-theoretic intuitions behind CGD to study the dynamics of GAN training. There is a delicate balance between generator and discriminator capabilities to obtain the best performance. If the discriminator becomes too powerful on the training data, it will reject all samples outside of the training set, leading to pathological solutions. We show that this pathology is prevented by the simultaneous training of both agents, and we term this implicit competitive regularization (ICR). We observe that CGD strengthens ICR and prevents oscillatory behavior, and thus improves GAN training. Blog

A pathological discriminator that overfits to training data.

CGD obtains best FID score without the need for explicit regularization

Stay tuned for more!

2020 AI Research Highlights: Handling distributional shifts (part 2)

Distributional shifts are common in real-world problems, e.g. often simulation data is used to train in data-limited domains. Standard neural networks cannot handle such large domain shifts. They also lack uncertainty quantification: they tend to be overconfident when they make errors.

You can read previous posts for other research highlights: generalizable AI (part 1), optimization (part 3), AI4science (part 4), controllable generation (part 5), learning and control (part 6), learning framework (part 7).

A common approach for unsupervised domain adaptation is self-training. Here, the model trained on the source domain is fine-tuned on the target samples using self-generated labels (hence the name). Accurate uncertainty quantification (UQ) is critical here: we should only select target labels with high confidence for self-training. Otherwise, errors in the self-generated labels are reinforced, which can lead to catastrophic failure.
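The selection step in self-training can be sketched as follows. This is a minimal illustration, not the pipeline used in our work: the function name, the fixed threshold of 0.9, and the use of the maximum predicted probability as the confidence score are all assumptions.

```python
def select_pseudo_labels(probs, threshold=0.9):
    """Keep only target samples whose top predicted probability is high.

    probs: list of per-sample class-probability lists from the source model.
    Returns a list of (sample_index, pseudo_label) pairs.
    """
    selected = []
    for index, sample_probs in enumerate(probs):
        confidence = max(sample_probs)
        if confidence >= threshold:
            # The pseudo label is the most likely class under the model.
            selected.append((index, sample_probs.index(confidence)))
    return selected


if __name__ == "__main__":
    target_probs = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
    print(select_pseudo_labels(target_probs))
```

The quality of this filter depends entirely on how well calibrated the confidence score is, which is why overconfident predictions on the target domain are so damaging.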

We propose a distributionally robust learning (DRL) framework for accurate UQ. It is an adversarial risk minimization framework that leads to joint training with an additional neural network, a density-ratio estimator. This is obtained through a discriminative network that classifies the source and target domains. The density-ratio estimator prevents the model from being overconfident on target inputs far away from the source domain. We see significantly better calibration and improvement in domain adaptation on VisDA-17. Paper
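The link between a domain classifier and a density ratio follows from Bayes' rule, and can be sketched as below. This is the standard conversion rather than our exact estimator: the function name, the clipping constant, and the assumption that the classifier outputs a calibrated probability of "source" are all illustrative.

```python
def density_ratio(p_source, n_source, n_target, eps=1e-6):
    """Estimate p_source(x) / p_target(x) from a domain classifier output.

    p_source: classifier probability that x comes from the source domain.
    n_source, n_target: number of source and target samples used to train
    the classifier (this corrects for an unbalanced domain prior).
    """
    # Clip to avoid division by zero for saturated classifier outputs.
    p = min(max(p_source, eps), 1.0 - eps)
    return (p / (1.0 - p)) * (n_target / n_source)


if __name__ == "__main__":
    # An input that looks clearly like target data gets a small ratio.
    print(density_ratio(0.1, n_source=1000, n_target=1000))
    # An input the classifier cannot tell apart gets a ratio near 1.
    print(density_ratio(0.5, n_source=1000, n_target=1000))
```

A small ratio marks an input that is far from the source distribution, so predictions there should be treated with low confidence.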

Saliency map for model (DRST) compared to baselines for self-training.
Density ratio of source to target. A lower density ratio indicates a lower confidence.

In a previous project, we proposed another simple measure of sample hardness, termed angular visual hardness (AVH). This score does not need any additional computation or model capacity. We saw improved self-training performance compared to using the baseline softmax score for confidence. Project

Another key ingredient for improved synthetic-to-real generalization involves domain distillation and automated layer-wise learning rates. We propose an Automated Synthetic-to-real Generalization (ASG) framework by formulating it as a lifelong learning problem with a model pre-trained on real images (e.g. ImageNet). Since it does not require any extra training loop other than synthetic training, it can be conveniently used as a drop-in module in many applications involving synthetic training. Project

Learning without forgetting for synthetic to real generalization

Combining ASG with the density-ratio estimator yields state-of-the-art results on unsupervised domain adaptation. Paper

Fair ML: Handling Imbalanced Datasets

It is common to have imbalanced training datasets where certain attributes (e.g. darker skin tone) are not well represented. We develop a general framework that can handle any arbitrary distributional shift in the label proportions between training and testing data. Simple approaches to handle label shift involve class-balanced sampling or incorporating importance weights. However, we show that neither is optimal. We propose a new method that optimally combines these methods and balances the bias introduced by class-balanced sampling and the variance due to importance weighting. Paper
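For reference, the importance-weighting baseline mentioned above can be sketched as follows. This shows only the plain baseline, not our combined method, and it assumes the target label proportions are known; the function names are hypothetical.

```python
from collections import Counter


def label_proportions(labels):
    """Empirical proportion of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def importance_weights(train_labels, target_props):
    """Per-class weight w(y) = q(y) / p(y) for label shift.

    p is the training label distribution and q is the (assumed known)
    target label distribution.
    """
    train_props = label_proportions(train_labels)
    return {label: target_props.get(label, 0.0) / prop
            for label, prop in train_props.items()}


def weighted_mean_loss(losses, labels, weights):
    """Average per-sample loss, reweighted by the class of each sample."""
    total = sum(weights[label] * loss for loss, label in zip(losses, labels))
    return total / len(losses)


if __name__ == "__main__":
    train_labels = [0] * 8 + [1] * 2        # class 1 is under-represented
    weights = importance_weights(train_labels, {0: 0.5, 1: 0.5})
    print(weights)
```

Rare classes receive large weights, which is the source of the variance that the paragraph above refers to.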

Example of label shift in binary classification (stars and dots). Our method is optimal since it combines subsampling with importance weighting for bias-variance tradeoff.

In the next post, I will be highlighting some important contributions to optimization methods.

2020 AI Research Highlights: Generalizable AI (part 1)

2020 has been an unprecedented year. There has been too much suffering around the world. I salute the brave frontline workers who have risked their lives to tackle this raging pandemic. Amidst all the negativity and toxicity in online social media, it is easy to miss many positive outcomes of 2020.

Personally, 2020 has been a year of many exciting research breakthroughs for me and my collaborators at NVIDIA and Caltech. We are grateful to have this opportunity to focus on our research. Here are some important highlights.

In the first part of this blog series, I will focus on generalizable AI algorithms while the subsequent posts will highlight ML methods, optimization, domain-specific AI algorithms, and DL frameworks. Check out the other posts here: handling distributional shifts (part 2), optimization for deep learning (part 3), AI4science (part 4), controllable generation (part 5), learning and control (part 6), learning frameworks (part 7).

Generalizable AI Highlights:

Concept learning and compositionality: We developed a new benchmark Bongard-LOGO for human-level concept learning and reasoning. Our benchmark captures three core properties of human perception: 1) context-dependent perception, in which the same object has disparate interpretations, given different contexts; 2) analogy-making perception, in which some meaningful concepts are traded off for other meaningful concepts; and 3) perception with a few samples but infinite vocabulary.

Our evaluations show that state-of-the-art deep learning methods perform substantially worse than human subjects, implying that they fail to capture core human cognition properties. Significantly, the neuro-symbolic method has the best performance across all the tests, implying the need for symbolic reasoning for efficient concept learning. Project

Conscious AI: Adding Feedback to Feedforward Neural Networks. It is hypothesized that the human brain derives consciousness from a top-down feedback mechanism that incorporates a generative model of the world. Inspired by this, we design a principled approach to adding a coupled generative recurrent feedback into feedforward neural networks. This vastly improves adversarial robustness, even when there is no explicit adversarial training. Paper. Blog.

Adaptive learning: Generalizable AI requires the ability to quickly adapt to changing environments. We designed practical hierarchical reinforcement learning (RL) for legged robots that can adapt to new environments and tasks not available during training. The training is carried out in the NVIDIA Flex simulation environment, which is physically valid and GPU-accelerated. We adopted a hierarchical RL framework in which a high-level controller learns to choose from a set of primitives in response to changes in the environment, and a low-level controller uses an established control method to robustly execute the primitives.

The model can easily transfer to a real-life robot without sophisticated randomization or adaptation schemes, thanks to this hierarchical design and a curriculum of tasks during training. The designed controller is up to 85% more energy-efficient and is more robust compared to baseline methods. Blog

Real-world tasks often have a compositional structure that contains a sequence of simpler sub-tasks. We proposed a multi-task RL framework OCEAN to perform online task inference for compositional tasks. Here, the current task composition is estimated from the agent’s past experiences with probabilistic inference. We model global and local context variables in a joint latent space, where the global variables represent a mixture of sub-tasks that constitute the given task, while the local variables capture the transitions between the sub-tasks. Our framework supports flexible latent distributions based on prior knowledge of the task structure and can be trained in an unsupervised manner. Experimental results show that the proposed framework provides more effective task inference with sequential context adaptation and thus leads to a performance boost on complex, multi-stage tasks. Project

OCEAN is able to adapt to new goals in different stages

Causal learning: Being able to identify cause and effect is at the core of human cognition. This allows us to extrapolate to entirely new unseen scenarios and reason about them. We proposed the first framework that is able to learn causal structural dependencies directly from videos without any supervision on the ground-truth graph structure. This model combines unsupervised keypoint-based representation with causal graph discovery and graph-based dynamics learning. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions on out-of-distribution interventions and counterfactuals. Project

Our method is able to predict on new cloth configurations due to causal learning.

Stay tuned for more!

My heartfelt apology

I want to wholeheartedly apologize to everyone hurt by my words. I want to assure you that I bear no animosity. I want to be part of an inclusive community where all voices are heard.

I am sorry if my actions/words have ever created a threatening environment. My intention was only to change hearts and minds, and to raise awareness to the struggles that women and minorities face both online and in the real world. I will find better ways to achieve that goal.

I am by no means perfect. I am here to learn from you. I am here to address your concerns. I hope you will join me in my quest to create a healthy and thriving community.