My heartfelt apology

I want to wholeheartedly apologize to everyone hurt by my words. I want to assure you that I bear no animosity. I want to be part of an inclusive community where all voices are heard.

I am sorry if my actions/words have ever created a threatening environment. My intention was only to change hearts and minds, and to raise awareness of the struggles that women and minorities face both online and in the real world. I will find better ways to achieve that goal.

I am by no means perfect. I am here to learn from you. I am here to address your concerns. I hope you will join me in my quest to create a healthy and thriving community.

My departure from Twitter

Many of you are very concerned about why my Twitter account is no longer active. I have voluntarily decided to de-activate my account in the interest of my safety and to reduce anxiety for my loved ones. I want to focus on my research and my team where my attention and energy are badly needed.

I want to emphasize that this decision is solely mine. My employers NVIDIA and Caltech are fully supportive of me and my mission. They support employees expressing diverse personal views.

I am proud of the work we have done to promote diversity and inclusion. I encourage you to continue doing that. We are all bright creative minds with an endless potential to innovate. We will find new and safer ways to stay connected and build a better future.

10 things that happened last decade

Got a faculty position at UC Irvine during the recession. Attended N(eur)IPS conference for the first time.
Discovered the power of tensors for machine learning and it became my core research focus.
Received many awards such as Sloan fellowship, Microsoft faculty fellowship and NSF Career Award.
Moved to AWS as principal scientist to build some of the first cloud AI applications. Got to deploy tensor algorithms in production.
Became the youngest named chair professor at Caltech, the highest honor bestowed on individual faculty.
Started the ML research group at NVIDIA while continuing my work at Caltech.
Founded AI4science with Yisong Yue to accelerate interdisciplinary AI research at Caltech.
Shared my #meToo experiences and pushed for NeurIPS name change to help improve the climate for women and minorities in AI.
Having grandkids: academically speaking. Many of my students and mentees became faculty members and formed thriving research groups.
Continue to learn and grow. Lucky to have had amazing opportunities and experiences.

Research Highlights of 2019

2019 was an interesting year in so many ways. I was able to build and solidify research programs, both at NVIDIA and at Caltech. I was able to continue working towards diversity and inclusion in AI, and saw a lot of visible improvements (the recent incident I wrote about in my previous blog post is a notable exception). Overall, there was a lot of positivity, and it was a great way to end an eventful decade!

Before I list research highlights that I was personally involved in, here’s an overall summary for the AI field from my viewpoint. This was published on KDnuggets.

In 2019, researchers aimed to develop a better understanding of deep learning, its generalization properties, and its failure cases. Reducing dependence on labeled data was a key focus, and methods like self-training gained ground. Simulations became more relevant for AI training and more realistic in visual domains such as autonomous driving and robot learning, including on NVIDIA platforms such as DriveSIM and Isaac. Language models went big, e.g. NVIDIA’s 8-billion-parameter Megatron model trained on 512 GPUs, and started producing coherent paragraphs. However, researchers showed spurious correlations and undesirable societal biases in these models. AI regulation went mainstream, with many prominent politicians voicing their support for a ban on face recognition by government agencies. AI conferences started enforcing a code of conduct and increased their efforts to improve diversity and inclusion, starting with the NeurIPS name change last year. In the coming year, I predict that there will be new algorithmic developments and not just superficial application of deep learning. This will especially impact “AI for science” in many areas such as physics, chemistry, materials science and biology.

New book on Spectral Learning on Matrices and Tensors

Builds up spectral methods from first principles. Applications to learning latent variable models and deep neural networks. Order your copy here.

Better Optimization Methods

Fixing training in GANs through competitive gradient descent

(Excellent blog post by Florian here). In contrast to standard simultaneous gradient updates, CGD guarantees convergence and is efficient. NeurIPS poster below:

[Image: NeurIPS poster on competitive gradient descent]
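
To make the contrast with simultaneous gradient updates concrete, here is a minimal sketch on the scalar bilinear game f(x, y) = x * y, where x minimizes f and y maximizes it. The function names, starting point, and step size are illustrative assumptions, and the CGD update shown is the zero-sum update specialized to this one-dimensional case (where the mixed second derivative is 1), not a general implementation. On this game, simultaneous updates spiral away from the equilibrium at the origin, while the CGD update contracts toward it.

```python
import math

def simultaneous_step(x, y, eta):
    # Simultaneous gradient descent-ascent on f(x, y) = x * y:
    # df/dx = y (x descends), df/dy = x (y ascends).
    return x - eta * y, y + eta * x

def cgd_step(x, y, eta):
    # CGD-style update for the zero-sum game f(x, y) = x * y, where the
    # mixed derivative d2f/dxdy equals 1. Each player accounts for the
    # other's anticipated move, which gives the (1 + eta^2) factor.
    denom = 1.0 + eta ** 2
    dx = -eta * (y + eta * x) / denom
    dy = eta * (x - eta * y) / denom
    return x + dx, y + dy

def run(step, x=1.0, y=1.0, eta=0.5, iters=200):
    for _ in range(iters):
        x, y = step(x, y, eta)
    return math.hypot(x, y)  # distance from the equilibrium (0, 0)

gda_dist = run(simultaneous_step)  # grows with every iteration
cgd_dist = run(cgd_step)           # shrinks with every iteration
```

For this game the squared distance to the origin is multiplied by (1 + eta^2) per simultaneous step and divided by (1 + eta^2) per CGD step, which is why the two runs behave so differently.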

Applying CGD to GAN training and demonstrating its implicit competitive regularization (NeurIPS workshop):

[Image: poster on implicit competitive regularization]

Guaranteed convergence for SignSGD

SignSGD compresses each gradient component to a single bit (its sign) with no significant loss in accuracy in practice. Theoretically, there are convergence guarantees. Paper. Main theorem:

[Image: statement of the main theorem]
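
A minimal sketch of the sign-compression idea in plain Python, not the implementation from the paper: every worker sends only the sign of each gradient component, and the server takes a majority vote per component. The toy quadratic objective, the per-worker offsets, the fixed step size, and all function names are assumptions made for illustration.

```python
def sign(v):
    # Returns +1, -1, or 0: the single "bit" kept per component.
    return (v > 0) - (v < 0)

def majority_vote(worker_grads):
    # Each worker contributes only the signs of its gradient; the server
    # takes the sign of the summed votes for every component.
    votes = [sum(sign(g[i]) for g in worker_grads)
             for i in range(len(worker_grads[0]))]
    return [sign(v) for v in votes]

def signsgd_step(params, worker_grads, lr):
    direction = majority_vote(worker_grads)
    return [p - lr * d for p, d in zip(params, direction)]

# Toy problem (assumed): minimize 0.5 * ||x - target||^2, where each
# worker sees the true gradient plus a fixed, hypothetical offset.
target = [1.0, -2.0, 0.5]
offsets = [0.0, 0.3, -0.3]
x = [0.0, 0.0, 0.0]
lr = 0.01
for _ in range(1000):
    grads = [[xi - ti + o for xi, ti in zip(x, target)] for o in offsets]
    x = signsgd_step(x, grads, lr)
```

With a fixed step size the iterate ends up hovering within one step of the target; a decaying step size would be the usual way to tighten that.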

Generative Models

Exciting collaboration between ML and neuroscience with Doris Tsao at Caltech. Adding a generative feedback model to convolutional neural networks significantly improves robustness in tasks such as unsupervised denoising. Short paper here.

Robust Learning in Control Systems

Neural Lander

First work to successfully demonstrate use of deep learning to land drones with stability guarantees. Collaboration under CAST at Caltech. Paper at ICRA 2019.

Robust regression for safe exploration

We address the following: How can we extrapolate robustly from training data in real-world control tasks and achieve end-to-end stability guarantees in safe exploration? Paper.

Multi-modal learning for UAV navigation

Multi-modal fusion of vision and IMU improves robustness in navigation and landing. Paper.

Generalization in ML

Detecting hard examples through an angular measure

Watch my GTC-DC talk. Angular alignment is a robust measure for hardness: easier examples align more with the target class. We found that the correspondence between the angular measure and human selection frequency was statistically significant. It also improves self-training in domain adaptation. Paper.
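
One way to turn this into a score, sketched here under stated assumptions: take the angle between an example's feature embedding and the classifier weight vector of its target class, and normalize it by the sum of its angles to all class weight vectors. The normalization, the function names, and the toy weights are illustrative choices; the paper has the precise definition.

```python
import math

def angle(u, v):
    # Angle in radians between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding
    return math.acos(cos)

def angular_hardness(feature, class_weights, target):
    # Small value: the feature points toward the target class weight.
    # Large value: the feature is poorly aligned with the target class.
    angles = [angle(feature, w) for w in class_weights]
    return angles[target] / sum(angles)

# Hypothetical 3-class classifier whose weight vectors are the axes.
class_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
easy = angular_hardness([1.0, 0.1, 0.1], class_weights, target=0)
hard = angular_hardness([0.5, 0.6, 0.1], class_weights, target=0)
```

The second example has the same label but points almost as much toward class 1 as toward class 0, so it gets the higher (harder) score.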

Regularized learning for domain adaptation

New domain adaptation algorithm to correct for label shifts. Paper
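
To illustrate what correcting for label shift involves, here is a minimal NumPy sketch. It assumes the usual label-shift setting, where class proportions change between source and target but the appearance of each class does not. In that setting the importance weights w (target class frequency divided by source class frequency) satisfy q = C w, where C[i, j] is the source probability of predicting class i when the true class is j, and q is the distribution of predictions on the target. The ridge-style penalty, the function name, and the numbers below are assumptions for illustration and not necessarily the algorithm in the paper.

```python
import numpy as np

def estimate_label_shift_weights(confusion, target_pred_dist, lam=0.0):
    # Solve for theta = w - 1 in a regularized least-squares sense:
    #   minimize ||C theta - b||^2 + lam * ||theta||^2,
    # where b = q - C 1. A larger lam pulls the weights toward 1.
    k = confusion.shape[0]
    b = target_pred_dist - confusion @ np.ones(k)
    theta = np.linalg.solve(confusion.T @ confusion + lam * np.eye(k),
                            confusion.T @ b)
    return 1.0 + theta

# Hypothetical two-class example: balanced source, 80/20 target.
source_prior = np.array([0.5, 0.5])
target_prior = np.array([0.8, 0.2])
pred_given_label = np.array([[0.9, 0.2],
                             [0.1, 0.8]])   # P(pred = i | label = j)
confusion = pred_given_label * source_prior  # joint P(pred = i, label = j)
target_pred_dist = pred_given_label @ target_prior

w_exact = estimate_label_shift_weights(confusion, target_pred_dist)
w_reg = estimate_label_shift_weights(confusion, target_pred_dist, lam=1.0)
```

With exact statistics and no regularization this recovers the true ratio of target to source class frequencies; with finite samples the confusion matrix is noisy, which is where some form of regularization becomes useful.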

Our ability to fix bias in deep-learning algorithms

Twitter thread here

Neural Programming

Recursive neural networks with external memory

Recursive networks have compositionality and can extrapolate to unseen instances.

Extrapolation to harder instances (higher tree depth) is challenging.

We show that augmenting with external memory stacks significantly improves extrapolation. Paper

Open Vocabulary Learning on Source Code with a Graph–Structured Cache

Use of syntax trees in program code to handle unbounded vocabulary. Paper

Reinforcement Learning and Bandits

Robust off-policy evaluation

Robust methods to handle covariate shift in off-policy evaluation. Paper

Stochastic Linear Bandits with Hidden Low Rank Structure

Low regret algorithms for discovering hidden low rank structures in bandits. Paper

Competitive Differentiation for Lagrangian Problems

Many RL problems have constraints, leading to a Lagrangian formulation. Paper

Context-based Meta RL with Structured Latent Space

AI4science and Other Applications

Neural ODEs for Turbulence Forecasting

Turbulence modeling is a notoriously hard problem. Exciting initial results presented at NeurIPS workshop.

Tackling Trolling through Dynamic Keyword Selection Methods

Can we find social media trolls and create a more trustworthy environment online? We presented our ongoing study of the #meToo movement, in collaboration with Michael Alvarez from social sciences at Caltech.

Thank you all!

The Thanksgiving weekend has been a much needed staycation. Given how much I travel the rest of the year, I feel lucky to avoid all the weather-related chaos this weekend. I even got to see the rare views of snow-capped mountains juxtaposed with palm trees as I worked from my office at Caltech.

I got the time to reflect on so many things that I am thankful for. I am lucky to be part of two amazing communities at Caltech and NVIDIA. This year I have deepened my professional relationships at both places and also made many friends and found wonderful supportive mentors. I am thankful for amazing colleagues from whom I have learnt so much. I have an amazing team of researchers at both places. Their curiosity and passion continue to inspire me.

I am thankful for all the support and encouragement that I have received when I have spoken about the need for better diversity and inclusion in our tech communities. Having allies and raising awareness has been incredibly fulfilling. I am also especially thankful to those who had the humility and honesty to tell me that they were wrong or unaware and that I had changed their mind. I hope you can pass it along. Together we can build a healthy community where everyone can thrive.

I am so thankful to my family. They are my rock and have been so supportive through everything I went through this year. I am thankful for a new addition to my family: my sister-in-law Prashanthi, who displays wisdom and maturity beyond her years.

I should add that it hasn’t always been easy for me to show appreciation and communicate how much I am thankful to everyone. I strive to make things better in all walks of my life, both personal and professional. This means my mind is attuned to finding gaps, calling them out, and trying to fix things. I believe that it is possible to strike a balance: being thankful for the present while striving for a better tomorrow.

Have a great Thanksgiving weekend!

Call for accountability of deployed AI services

Edit: I want to clarify that during my tenure at AWS, I did not work on face recognition, nor was I involved in any of the decisions to sell it to law enforcement. I added this after a journalist asked me about it.

About two weeks ago, Timnit Gebru and Margaret (Meg) Mitchell approached me with a request to sign a letter outlining scientific arguments to counter the claims made by Amazon representatives regarding their face recognition service and calling on them to stop selling it to law enforcement.

I am one of 26 signatories. This includes many veteran leaders in the community including Yoshua Bengio, one of the Turing award winners this year. So I am in good company 😉 In addition, there are numerous other groups which have called for Amazon to stop selling it to the police.

Joy Buolamwini and Inioluwa Deborah Raji have done amazingly in-depth research on this topic and you can check it out at the gendershades website. So all the credit goes to them for laying this strong foundation.

When I read the letter I was happy to see careful factual arguments being made that are grounded in science. My hope is that the letter opens up a public dialogue on how we can evaluate face recognition (and other AI services), both in terms of metrics and in terms of the social context in which they are deployed.

I am a former member of the AWS AI group and I want to clarify that I have the utmost admiration for how AWS has transformed the developer ecosystem. AWS services have removed a lot of “heavy lifting” in DevOps and democratized software development. I am hoping that this letter leads to productive dialogue and we can collectively work towards enhancing the beneficial uses of AI.

Govt. regulation can only come about once we have laid out technical frameworks to evaluate these systems. The gendershades paper shows how our current evaluation metrics are broken, and it starts with imbalanced training data. So we need a variety of different ways to evaluate the system and we need accountability from currently deployed AI services. In short, regulation is only part of the answer but is badly needed.

Update: AWS released a FAQ outlining guidelines of how face recognition should be used. Unfortunately, this does not solve anything. https://aws.amazon.com/rekognition/the-facts-on-facial-recognition-with-artificial-intelligence/

Reproducibility Debate: Should Code Release be Compulsory for Conference Publications?

Update: Added discussions at the end based on Twitter conversations.

Yesterday, I was on the debate team at DALI conference in gorgeous George in South Africa. The topic was:

“DALI believes it is justified for industry researchers not to release code for reproducibility because of proprietary code and dependencies.”

I was opposing the motion, and this matched my personal beliefs. I am happy to talk about my own stance but I cannot disclose the arguments of others, since it was off the record (and their arguments were not necessarily their own personal opinions).

Edit: Uri Shalit and I formed the team opposing the motion. I checked with him to see if he is fine with me mentioning it. We collaboratively came up with the points below.

This topic is timely since ICML 2019 has added reproducibility as one of the factors to be considered by the reviewers. When it first came up, it seemed natural to set standards for reproducibility: the same way we set standards for a publication at our top-tier conferences. However, I was disheartened to see vocal opposition, especially from many “big-name” industry researchers. So with that background, DALI decided to focus the reproducibility debate on industry researchers.

My main reasons for opposing the motion:

Pseudo-code is just not enough: Anyone who has tried to implement an algorithm from another paper knows how terribly frustrating and time-consuming it can be. With complex DL algorithms, every tiny detail matters: from hyperparameters to the randomness of the machine. It is another matter that this brittleness of DL is a huge cause of concern. See the excellent talk by Joelle Pineau on reproducibility issues in reinforcement learning. In the current peer-review environment, it is nearly impossible to get a paper accepted unless all comparisons are made. I have personally had papers rejected even after we clearly stated that we could not reproduce the results of another paper.
Unfair to academic researchers: The cards are already stacked against academic researchers: they do not have access to vast compute and engineering resources. This is exacerbated by the lack of reproducibility. It is grossly unfair to expect a graduate student to reproduce the results of a 100-person engineering team. It is critical to keep academia competitive: we are training the next generation and much of basic research still happens only in academia.
Accountability and fostering a healthy environment: As AI gets deployed in the real world, we need to be responsible and accountable. We would not allow new medical drugs into the market without careful trials. The same standards should apply to AI, especially in safety-critical applications. It first starts with setting rigorous standards for our research publications. Having accessible code allows the research community to extensively test the claims of the paper. Only then can it be called legitimate science.
No incentives for voluntary release of code: Jessica Forde gave me some depressing statistics: currently only one third of the papers voluntarily release code. Many argue that making it compulsory is Draconian. I will take Draconian any day if it ensures a fair environment that promotes honest progress. There is also the broader issue that the current review system is broken: fair credit assignment is not ensured and false hype is unfairly rewarded. I am proud of how the AI field, industry in particular, has embraced the culture of open sourcing. This is arguably the single most important factor for rapid progress. There is an incentive for industries to open source since it allows them to capture a user base. These incentives have a smaller effect on the release of code for individual papers. It is therefore necessary to enforce standards.
To increase synergistic impacts of the field: Counter-intuitively, code release will move the field away from leaderboard chasing. When code is readily available, barriers of entry for incremental research are lowered. Researchers are incentivized to do “deeper” investigation of the algorithms. Without this, we are surely headed for the next AI winter.
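
On the first point above (every detail matters, from hyperparameters to randomness), even a small habit helps: keep the seed inside the configuration and save the configuration next to the result. A minimal sketch, where the toy experiment and all names are hypothetical placeholders for a real training run:

```python
import json
import random

def run_experiment(config):
    # Toy stand-in for a training run. It draws from a local, seeded
    # generator, so the result depends only on the config.
    rng = random.Random(config["seed"])
    samples = [rng.gauss(0.0, config["noise_std"])
               for _ in range(config["num_samples"])]
    return sum(samples) / len(samples)

def run_and_record(config):
    # Store the exact configuration alongside the result so that
    # someone else can rerun the experiment and compare.
    record = {"config": config, "result": run_experiment(config)}
    return json.dumps(record, sort_keys=True)

config = {"seed": 0, "noise_std": 1.0, "num_samples": 100}
first = run_and_record(config)
second = run_and_record(config)            # same config, same record
other = run_and_record({**config, "seed": 1})
```

A real deep learning run has more sources of randomness than this (data ordering, library and hardware nondeterminism), so a recorded seed is a starting point and not a guarantee.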

Countering the arguments that support the motion:

Cannot separate code from internal infrastructure: There exist (admittedly imperfect) solutions such as containerization. But this is a technical problem, and we are good at coming up with solutions for such well-defined problems.
Will drive away industry researchers and will slow down progress of AI: First of all, progress of AI is not just dependent on industry researchers. Let us not have an “us vs. them” mentality. We need both industry and academia to make AI progress. I am personally happy if we can drive away researchers who are not ready to provide evidence for their claims. This will create a much healthier environment and will speed up progress.
Reproducibility is not enough: Certainly! But it is a great first step. As next steps, we need to ensure usable and modular code. We need abstractions that allow for easy repurposing of parts of the code. These are great technical challenges: ones our community is very well equipped to tackle.

Update from Twitter conversations

There was enthusiastic participation on Twitter. A summary below:

Useful tools for reproducibility:

[Screenshots of tweets]

Lessons from other communities:

[Screenshots of tweets]

It is not just about code, but also data, replication, etc.:

[Screenshots of tweets]

Disagreements:

[Screenshots of tweets]

I assume that the Tweet above does not represent the official position of DeepMind, but I am not surprised.

I do not agree with the premise that it is a worthwhile exercise for others to reinvent the wheel, only to find out it is just vaporware. It is unfair to academia and unfair to graduate students whose careers depend on this.

I also find it ironic that the comment states that if an algorithm is so brittle to hyperparameters we should not trust these results. YES! That is the majority of deep RL results that are hyped up (and we know who the main culprit is).

What happens behind the doors: Even though there is overwhelming public support, I know that such efforts get thwarted in committee meetings of popular conferences like ICML and NeurIPS. We need to apply more pressure to have better accountability.

It is time to burst the bubble on hyped up AI vaporware with no supporting evidence. Let the true science begin!

2018 in Review

This post reviews my experiences in 2018. I welcomed the year on the gorgeous beaches of Goa and am now ending it in the wilderness of South Africa. My highlights of 2018 are the following:

Joining NVIDIA: I joined NVIDIA in September and started a new research group on core AI/ML. I am hiring at full pace and have started many new projects. I am also excited about many new launches from NVIDIA over the last few months:

Rapids: Apache open-source multi-GPU ML library.
Clara: Platform for medical imaging.
PhysX: Open-source 3D simulation framework.

Honor of being the youngest named chair professor at Caltech: I was one of the six faculty members that Caltech recognized during the 2017-18 academic year. This is the Institute’s most distinguished award for individual faculty.

Launching AWS Ground Truth: Before leaving AWS, I was working on the Ground Truth service, which was launched during the re:Invent conference in November. Data is a big barrier to adoption of AI. The availability of a private workforce, and not just the public crowd on MTurk, will be a game changer in many applications. My team did the prototyping and many research projects on active learning, crowdsourcing and building intelligence into the data collection process.

Exciting research directions:

Autonomous Systems: CAST at Caltech was launched in October 2017 to develop foundations for autonomy. This has been an exciting new area of research for me. We got a DARPA Physics of AI project funded that infuses physics into AI algorithms. The first paper to come out of this project has been the neural lander that uses neural networks to improve landing of drones while guaranteeing stability. Check out its videos here.
AI4Science at Caltech: Along with Yuxin Chen and Yisong Yue, I launched the AI4Science initiative at Caltech. The goal is to do truly integrated research that brings about new advances in many scientific domains. Some great use cases are high energy physics, earthquake detection, spinal cord therapy, etc.
Core ML research: We have pushed for a holistic view of AI as data + algorithms + systems.
- Active learning and crowdsourcing for intelligent data gathering that significantly reduces data requirements.
- Neural rendering model combines generation and prediction in a single model for semi-supervised learning of images.
- SignSGD yields drastic gradient compression with almost no loss in accuracy.
- Symbols + Numbers: Instead of indulging in pointless Twitter debates over which is better, can we just unify both? We combine symbolic expressions and numerical data in a common framework for neural programming.
- Principled approaches in reinforcement learning: We develop efficient Bayesian DQN that improves exploration in high dimensions. We derive new trust-region policy optimization for partially observable models with guaranteed monotonic improvement. We show negative results for combining model-based and model-free RL frameworks.
- Domain adaptation: We derive generalization bounds when there are shifts in label distribution between source and target. This is applicable for AI cloud services where training distribution can have different proportions of categories from the serving distribution.
- TensorLy: The open-source framework that allows you to write tensor algorithms in Python and choose any of the backends: PyTorch, TensorFlow, NumPy or MXNet. It has many new features and is now part of the PyTorch ecosystem.

On the academic job market: My graduating student Kamyar Azizzadenesheli has done ground-breaking work in reinforcement learning (some of which I outlined above). Hire him!

Having grandkids: academically speaking 😉 It is great to see my former student Furong Huang and my former postdoc Rose Yu thrive in their faculty careers.

Outreach and Democratization of AI: It has been very fulfilling to educate the public about AI around the world. I gave my first TEDx talk. I shared the stage with so many luminaries such as His Holiness the Dalai Lama. It was special to speak to a large crowd of Chinese women entrepreneurs at the Mulan event.

2018 NYTimes GoodTech award: for raising awareness about diversity and inclusion. 2018 has been a defining year for me and for many #womeninTech. A large part of my energy went into fighting vicious sexism in our research communities. It is impossible to distill this into a few sentences. I have had to fend off numerous pushbacks, trolls and threats. But the positive part has been truly uplifting: countless women have hugged me and said that I am speaking on their behalf. I have found numerous male allies who have pledged to fight sexism and racism.

I want to end the year in a positive light. I hope for a great 2019! I know it is not going to be easy, but I won’t give up. Stay strong and fight for what you truly believe in!

AI4Science @Caltech

AI4science is a new initiative launched at Caltech that aims to broaden the impact of AI and ML across all areas of sciences. The inaugural workshop was held on Aug. 1st. My student Jeremy Bernstein wrote a detailed article on the workshop. The slides of the talks are also available there.

https://sites.google.com/view/ai-for-science-workshop/about-ai4science

A short blurb of the article: Across science—from astrophysics to molecular biology to economics—a common problem persists: scientists are overwhelmed by the sheer amount of data they are collecting. But this problem might be better viewed as an opportunity, since with appropriate computing resources and algorithmic tools, scientists might hope to unlock insights from these swathes of data to carry their field forward. AI4science is a new initiative at Caltech aiming to bring together computer scientists with experts in other disciplines. While somewhat of a suitcase term, AI or artificial intelligence here means the combination of machine learning algorithms with large compute resources.

Professor Yisong Yue of Caltech’s Computing & Mathematical Sciences department (CMS) gave the first talk, where he gave a general overview of machine learning algorithms and their relevance across science and engineering. Professor Andrew Stuart, also in CMS, gave the talk following Professor Yue. Stuart discussed his interest in fusing data science techniques with known physical law. Frederick Eberhardt, professor of philosophy, spoke next. He discussed his work on causal inference. Professor Anima Anandkumar of the CMS department was the last computer scientist to speak. Anandkumar gave an overview of a successful machine learning technique known as artificial neural networks, which have dramatically improved the ability of computers to understand images and natural language. Anandkumar also spoke about tensor methods in machine learning. The remainder of the day was devoted to talks from scientists who have had success applying machine learning techniques in their respective fields. For more, check out the AI4science website.

My first blog post

Over the years, I have followed many good academic blogs, e.g. Scott Aaronson, Moritz Hardt, Zachary Lipton, Lior Pachter etc. Until now I didn’t think I had time for blogging. Maybe one of the luxuries of tenure is to make room for things I always wanted to do and this is one of them 🙂

Besides, writing has never been my strong suit. My verbal skills developed much later than my mathematical skills. My mom tells me that at the age of 3 I could barely speak but could solve lots of puzzles. Apparently my grandma had written up the calendar for the next 10 years, which I had memorized, and I could name the day for any date. At school, I struggled with essays. Fast forward to grad school: I was frustrated that papers had to be written and that just deriving math equations and proofs was not enough. Math was my native language and everything else felt foreign.

This didn’t change until a few years ago when I learnt the importance of writing and communication the hard way. I interviewed at most of the top schools and amidst a hasty schedule, I did not polish my presentations or gather thoughts on how to communicate my research to people outside my field. I did not get any offers, even though there was strong initial interest. This forced me to think about how I should communicate complex research to a general audience. Richard Feynman was an inspiration during my teenage years and I went back to looking at how he structured his lectures. I also had Zachary Lipton work with me. He wrote beautiful prose and I learnt a lot from him.

I am now thankful that I spent time improving my verbal and presentation skills because it has opened up so many doors. At Amazon, I have interacted with product managers, marketing people, engineers with no ML background and customers. This requires customizing the explanations and listening to their concerns. I realize how critical these skills are for me to be effective. These days I insist that all my students take courses on communication and be able to write well. I will give them editorial comments, but I will not write on their behalf. I keep emphasizing the importance of clear communication.

This blog is an attempt to get my thoughts out to the world. You will hear more in the coming days.