AI4Science in the limelight: We developed AI methods for simulating complex multi-scale phenomena such as weather and materials with orders-of-magnitude speed-ups. Our core technique, the Fourier Neural Operator (FNO), was featured by Quanta Magazine as a highlight of math and computer science advances in 2021. It was also featured in the GTC Fall keynote and the IamAI video released by NVIDIA. In addition, we advanced drug discovery through OrbNet, which predicts quantum-mechanical properties with speed-ups of thousands of times. Both techniques were part of publications that were finalists for the Gordon Bell Special Prize for COVID-19 research.
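For readers new to FNO, the operation at its core is a convolution carried out in Fourier space: transform the input, keep only the lowest frequency modes, scale each by a weight, and transform back. The sketch below shows that single step for one real-valued 1D channel in plain Python. It is a minimal illustration, not our implementation: the function names are made up for this example, the weights are fixed numbers rather than learned parameters, and the pointwise linear path and nonlinearity that complete a full layer are left out.

```python
import cmath
import math


def rdft(signal):
    """Naive DFT of a real signal, keeping only the non-negative frequencies."""
    n = len(signal)
    return [
        sum(signal[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n // 2 + 1)
    ]


def irdft(coeffs, n):
    """Inverse of rdft: rebuild a real signal of length n."""
    signal = []
    for j in range(n):
        total = coeffs[0].real
        for k in range(1, len(coeffs)):
            term = (coeffs[k] * cmath.exp(2j * math.pi * j * k / n)).real
            # Every frequency except the Nyquist one has a mirrored partner.
            is_nyquist = n % 2 == 0 and k == n // 2
            total += term if is_nyquist else 2.0 * term
        signal.append(total / n)
    return signal


def spectral_conv_1d(signal, weights):
    """Scale the lowest len(weights) modes by weights and drop the rest."""
    coeffs = rdft(signal)
    if len(weights) > len(coeffs):
        raise ValueError("more weights than available frequency modes")
    filtered = [
        coeffs[k] * weights[k] if k < len(weights) else 0j
        for k in range(len(coeffs))
    ]
    return irdft(filtered, len(signal))


# Illustrative inputs: one low-frequency and one high-frequency cosine.
n = 8
low = [math.cos(2 * math.pi * j / n) for j in range(n)]
high = [math.cos(2 * math.pi * 3 * j / n) for j in range(n)]
kept = spectral_conv_1d(low, [1.0, 1.0])      # mode 1 is retained
dropped = spectral_conv_1d(high, [1.0, 1.0])  # mode 3 is truncated away
```

Because the weights act on frequency modes rather than on grid points, the same weights can be applied to inputs sampled at a different resolution, which is one reason this formulation suits multi-scale simulation.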
Quantum ML research taking root: Our foray into quantum research was aided by our research on tensor methods for deep learning, since tensor networks represent quantum systems efficiently. We developed a new algorithm for quantum optimization that halved the quantum resources needed to solve classical optimization problems such as MaxCut, which implies an exponential reduction in the cost of simulating them on GPUs. We partnered with the NVIDIA cuQuantum team, established a world record for large-scale simulation of a successful, nonlocal quantum optimization algorithm, and open-sourced TensorLy-Quantum.
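As a reminder of what the classical problem looks like, MaxCut asks for a split of a graph's nodes into two groups that maximizes the number of edges running between the groups. The sketch below solves it by exhaustive search on a tiny graph. It is only meant to make the objective concrete: it is not the quantum algorithm described above, and the function names are invented for this example.

```python
from itertools import product


def cut_size(edges, assignment):
    """Count edges whose endpoints fall in different groups."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def brute_force_maxcut(num_nodes, edges):
    """Try all 2**num_nodes assignments and keep the best one."""
    best_assignment, best_cut = None, -1
    for bits in product((0, 1), repeat=num_nodes):
        size = cut_size(edges, bits)
        if size > best_cut:
            best_assignment, best_cut = bits, size
    return best_assignment, best_cut


# Illustrative graphs: a 4-cycle and a triangle.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [(0, 1), (1, 2), (2, 0)]
square_assignment, square_cut = brute_force_maxcut(4, square)
triangle_assignment, triangle_cut = brute_force_maxcut(3, triangle)
```

The search space doubles with every added node, which is why exhaustive search stops being practical quickly and why better optimization algorithms, classical or quantum, are of interest.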
Trustworthy AI no longer just a wish list: Forbes predicts that trustworthy AI will be operationalized in the next year. Trustworthy AI has many facets: improving uncertainty calibration, auditing AI models and improving robustness. We improved the robustness of AI models through various approaches: certifying robustness, balancing diversity and hardness of data augmentations, and enhancing robustness of 3D vision. We demonstrated that language models can detect different kinds of social biases without any re-training, when supplied with a small number of labeled examples.
No more supervision: 99% of computer vision teams have had an ML project canceled due to insufficient training data, according to a survey. We have made strides in weak and self-supervised learning (DiscoBox) that are competitive with supervised learning methods. We have also developed efficient controllable generation methods that can do zero-shot composition of attributes. Gartner predicts that by 2024, synthetic data will account for 60% of all data used in AI development. We also developed efficient methods for automatic camera calibration using neural radiance field (NeRF) models.
Transformers transforming vision and language: The Cambrian explosion of transformer architectures continued this year, with a focus on harder tasks and multimodal domains. We developed SegFormer for semantic and panoptic segmentation with SOTA performance and efficiency, which is being used by multiple teams across the company. We enabled linear complexity in self-attention layers using long-short decomposition and the adaptive Fourier neural operator.
Bridging the gap with biological intelligence: Humans are capable of zero-shot generalization, and can handle long-tailed distributions. We developed efficient controllable generation methods that can do zero-shot composition of attributes. We showed that simple memory-recall strategies enable efficient long-tailed object detection. Such capabilities have been framed as formative AI intellect by Gartner.
Hardware efficiency: We co-designed both a quantization scheme and an AI training method to obtain energy efficiency and accuracy. The logarithmic number system (LNS) provides a high dynamic range, but the non-linear quantization gaps make it challenging to train using standard methods such as SGD. Instead, we employed multiplicative updates that are able to train AI models directly in LNS using just 8 bits, without requiring any full-precision copies. This resulted in no accuracy loss and a 90% reduction in energy.
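To see why multiplicative updates pair naturally with LNS, note that a number stored as a sign and an integer exponent code is multiplied by adding exponents, so scaling a weight up or down is just an integer step on its code. The sketch below is a toy illustration under loud assumptions: the bit layout (1 sign bit, 7 exponent bits, 3 of them fractional), the names, and the sign-based update rule are all chosen for this example and are not the scheme or optimizer we actually used.

```python
from dataclasses import dataclass

FRAC_BITS = 3               # assumed: exponent steps of 1/8 in log2 space
EXP_MIN, EXP_MAX = -64, 63  # assumed: 7-bit exponent code + 1 sign bit = 8 bits


@dataclass(frozen=True)
class LnsValue:
    sign: int      # +1 or -1
    exponent: int  # integer code; magnitude is 2 ** (exponent / 2**FRAC_BITS)

    def to_float(self):
        return self.sign * 2.0 ** (self.exponent / (1 << FRAC_BITS))


def clamp_exponent(exponent):
    """Keep the exponent code inside the assumed 7-bit range."""
    return max(EXP_MIN, min(EXP_MAX, exponent))


def lns_multiply(a, b):
    """Multiplication in LNS is integer addition of exponent codes."""
    return LnsValue(a.sign * b.sign, clamp_exponent(a.exponent + b.exponent))


def multiplicative_update(weight, grad, step=1):
    """Scale the weight by 2 ** (+/- step / 2**FRAC_BITS), moving against grad."""
    if grad == 0:
        return weight
    grad_sign = 1 if grad > 0 else -1
    new_exponent = weight.exponent - step * weight.sign * grad_sign
    return LnsValue(weight.sign, clamp_exponent(new_exponent))


# Toy problem: minimize (w - 1)**2 starting from w = 2, staying in LNS throughout.
w = LnsValue(sign=1, exponent=8)  # 2 ** (8 / 8) = 2.0
for _ in range(8):
    grad = 2.0 * (w.to_float() - 1.0)
    w = multiplicative_update(w, grad)
```

The weight never leaves its low-precision representation: each update lands exactly on a representable value, so no full-precision copy is needed to accumulate small additive steps. In this toy version a weight cannot change sign or reach exactly zero, which is a limitation of the sketch.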
Hub of collaborations: Grateful to be supported by an amazing network of collaborators across multiple institutions in a wide array of domains. We are excited about the announcement of Earth-2 that will enable NVIDIA to partner with researchers in climate science globally.
Personal touch: 2021 has been a greatly fulfilling year on both personal and professional fronts. I spent the beginning of the year in Hawaii, a dream playground, where I got to swim every evening into the sunset after my meetings. I started weight training and was surprised at being able to lift my own body weight! Focusing on my physical and spiritual health has greatly enhanced my creativity and productivity. During the latter half of the year, I got to attend some events in person. A highlight was a trip to CERN where I got to tour the particle accelerator and the antimatter factory; my interview, Stealing theorist’s lunch, was published in the CERN Courier magazine. I got to participate in an unusual documentary that featured our fishing trip at Jackson Hole, where we collected snapshots of fly-fishing casts and trained AI to predict good casts. I also participated in the latenightIT show, hosted by the Emmy-nominated Baratunde Thurston. Here’s looking forward to new adventures in 2022!
2020 has been an unprecedented year. There has been too much suffering around the world. I salute the brave frontline workers who have risked their lives to tackle this raging pandemic. Amidst all the negativity and toxicity in online social media, it is easy to miss many positive outcomes of 2020.
Personally, 2020 has been a year of many exciting research breakthroughs for me and my collaborators at NVIDIA and Caltech. We are grateful to have this opportunity to focus on our research. Here are some important highlights.
Concept learning and compositionality: We developed a new benchmark, Bongard-LOGO, for human-level concept learning and reasoning. Our benchmark captures three core properties of human perception: 1) context-dependent perception, in which the same object has disparate interpretations, given different contexts; 2) analogy-making perception, in which some meaningful concepts are traded off for other meaningful concepts; and 3) perception with a few samples but infinite vocabulary.
Our evaluations show that state-of-the-art deep learning methods perform substantially worse than human subjects, implying that they fail to capture core human cognition properties. Significantly, the neuro-symbolic method has the best performance across all the tests, implying the need for symbolic reasoning for efficient concept learning. Project
Conscious AI: Adding Feedback to Feedforward Neural Networks. It is hypothesized that the human brain derives consciousness with a top-down feedback mechanism that incorporates a generative model of the world. Inspired by this, we designed a principled approach to adding a coupled generative recurrent feedback into feedforward neural networks. This vastly improves adversarial robustness, even when there is no explicit adversarial training. Paper. Blog.
Adaptive learning: Generalizable AI requires the ability to quickly adapt to changing environments. We designed practical hierarchical reinforcement learning (RL) for legged robots that can adapt to new environments and tasks that were not available during training. The training is carried out in the NVIDIA Flex simulation environment, which is physically valid and GPU accelerated. We adopted a hierarchical RL framework in which a high-level controller learns to choose from a set of primitives in response to changes in the environment, and a low-level controller uses an established control method to robustly execute the primitives.
The model can easily transfer to a real-life robot without sophisticated randomization or adaptation schemes, thanks to this hierarchical design and a curriculum of tasks during training. The designed controller is up to 85% more energy-efficient and more robust than baseline methods. Blog
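The division of labor between the two controllers can be sketched in a few lines. Everything here is a hypothetical stand-in chosen for illustration: the primitive names and stride targets, the tabular value estimates used by the high-level controller, and the proportional controller standing in for the "established control method" at the low level. None of it is taken from our actual system.

```python
from dataclasses import dataclass, field

# Hypothetical primitives: each maps to a target stride length in metres.
PRIMITIVES = {"trot": 0.30, "walk": 0.15, "stand": 0.0}


@dataclass
class HighLevelController:
    """Learns which primitive works best in each observed environment."""
    learning_rate: float = 0.5
    values: dict = field(default_factory=dict)  # (terrain, primitive) -> value

    def choose(self, terrain):
        # Greedy choice over the current value estimates.
        return max(PRIMITIVES, key=lambda p: self.values.get((terrain, p), 0.0))

    def update(self, terrain, primitive, reward):
        # Move the estimate a fraction of the way toward the observed reward.
        old = self.values.get((terrain, primitive), 0.0)
        self.values[(terrain, primitive)] = old + self.learning_rate * (reward - old)


def low_level_step(stride, primitive, gain=0.5):
    """Fixed, non-learned controller: move the stride toward the target."""
    target = PRIMITIVES[primitive]
    return stride + gain * (target - stride)


# Illustrative run: trotting on ice is penalized, walking is rewarded.
controller = HighLevelController()
controller.update("ice", "trot", -1.0)
controller.update("ice", "walk", 1.0)
chosen = controller.choose("ice")
stride = low_level_step(0.0, chosen)
```

Only the high-level choice is learned; the low-level execution stays fixed, so a policy trained in simulation has far less to relearn when it meets a real robot.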
Real-world tasks often have a compositional structure that contains a sequence of simpler sub-tasks. We proposed a multi-task RL framework, OCEAN, to perform online task inference for compositional tasks. Here, the current task composition is estimated from the agent’s past experiences with probabilistic inference. We model global and local context variables in a joint latent space, where the global variables represent a mixture of sub-tasks that constitute the given task, while the local variables capture the transitions between the sub-tasks. Our framework supports flexible latent distributions based on prior knowledge of the task structure and can be trained in an unsupervised manner. Experimental results show that the proposed framework provides more effective task inference with sequential context adaptation and thus leads to a performance boost on complex, multi-stage tasks. Project
Causal learning: Being able to identify cause and effect is at the core of human cognition. This allows us to extrapolate to entirely new, unseen scenarios and reason about them. We proposed the first framework that is able to learn causal structural dependencies directly from videos without any supervision on the ground-truth graph structure. This model combines unsupervised keypoint-based representation with causal graph discovery and graph-based dynamics learning. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions on out-of-distribution interventions and counterfactuals. Project