The views expressed here are solely my own and do not in any way reflect those of my employers.
This blog post is meant to clarify my position on the recent OpenAI controversy. A few days ago, I engaged on Twitter with Jack Clark, who manages communications and policy for OpenAI. It is hard to have a nuanced discussion on Twitter, so I am writing this blog post to summarize my thoughts more fully. For a longer and more thorough discussion of this topic, see the excellent blog posts by Rob Munro and Zack Lipton.
The controversy: OpenAI unveiled their language model a few days ago with a huge media blitz, while withholding the full model over concerns about misuse.
My Twitter comments:
When OpenAI was started a few years ago with much fanfare, its core mission was to foster openness in AI. As a non-profit, it was meant to freely collaborate with other institutions and researchers by making its patents and research open to the public. I find this goal highly admirable and important.
I also have a deep admiration for Jack Clark. His newsletter has been a great resource for the community to keep up with the latest updates in machine learning. In the past, he has pushed for more openness from the ML community. When the NeurIPS conference banned journalists from attending the workshops, he protested on Twitter and I supported his stance.
On the other hand, OpenAI seems to be making a conscious effort to move away from this open model and from its core founding principles. A few months ago, Jack Clark wrote this on Twitter:
So why do I feel so strongly about this? Because I think that OpenAI is using its clout to make ML research more closed and inaccessible. I have always been a strong proponent of open source and of increasing reproducibility and accountability in our ML community. I am pushing to make code release compulsory at our machine-learning conferences. See my recent blog post on this topic.
I am certainly not dismissive of AI risks. It is important to have a conversation about them, and it is important to involve experts working on this topic. But for several reasons, I believe that OpenAI is squandering an opportunity to have a real conversation and is distorting the public's view of AI. Some of the reasons are:
- OpenAI is greatly overstating the risks of releasing a language model. This is an active area of research with numerous groups working on very similar ideas. Even if OpenAI kept the whole thing locked up in a vault, another team would certainly release a similar model.
- In this whole equation, it is academia that loses out the most. I have previously spoken about the severe disadvantage that academic researchers face due to lack of reproducibility and open source code. They do not have the luxury of large amounts of compute and engineering resources for replication.
- This kind of fear-mongering about AI risks distorts the public's understanding of science. OpenAI followed a planned media strategy: it provided limited access to its model to a few journalists and fed them a story about AI risks without any concrete evidence. This is not science and does not serve humanity well.
A better approach would be to:
- Go back to the founding mission and foster openness and collaboration. Engage with researchers, especially academic researchers: collaborate with them, provide them with resources and engage in the peer-review process. This is the time-tested way to advance science.
- Engage with experts on risk management to study the impacts of AI. Engage with economists to design the right incentive mechanisms for deploying AI. Publish those studies in peer-reviewed venues.
Numerous other scientists have expressed a similar opinion. I hope OpenAI takes this feedback and acts on it.
Update: Added a discussion at the end based on Twitter conversations.
Yesterday, I was on a debate team at the DALI conference in gorgeous George, South Africa. The topic was:
“DALI believes it is justified for industry researchers not to release code for reproducibility because of proprietary code and dependencies.”
I was opposing the motion, and this matched my personal beliefs. I am happy to talk about my own stance, but I cannot disclose the arguments of others, since the debate was off the record (and their arguments were not necessarily their own personal opinions).
Edit: Uri Shalit and I formed the team opposing the motion. I checked with him to confirm he is fine with me mentioning it. We collaboratively came up with the points below.
This topic is timely since ICML 2019 has added reproducibility as one of the factors for reviewers to consider. When it first came up, it seemed natural to set standards for reproducibility, in the same way we set standards for publication at our top-tier conferences. However, I was disheartened to see vocal opposition, especially from many “big-name” industry researchers. With that background, DALI decided to focus the reproducibility debate on industry researchers.
My main reasons for opposing the motion:
- Pseudo-code is just not enough: Anyone who has tried to implement an algorithm from another paper knows how terribly frustrating and time-consuming it can be. With complex DL algorithms, every tiny detail matters: from hyperparameters to the randomness of the machine. This brittleness of DL is itself a huge cause for concern, but that is a separate matter. See the excellent talk by Joelle Pineau on reproducibility issues in reinforcement learning. In the current peer-review environment, it is nearly impossible to get a paper accepted unless all comparisons are made. I have personally had papers rejected even after we clearly stated that we could not reproduce the results of another paper.
- Unfair to academic researchers: The cards are already stacked against academic researchers: they do not have access to vast compute and engineering resources. This is exacerbated by the lack of reproducibility. It is grossly unfair to expect a graduate student to reproduce the results of a 100-person engineering team. It is critical to keep academia competitive: we are training the next generation, and much of basic research still happens only in academia.
- Accountability and fostering a healthy environment: As AI gets deployed in the real world, we need to be responsible and accountable. We would not allow new medical drugs into the market without careful trials. The same standards should apply to AI, especially in safety-critical applications. It starts with setting rigorous standards for our research publications. Having accessible code allows the research community to extensively test the claims of a paper. Only then can it be called legitimate science.
- No incentives for voluntary release of code: Jessica Forde gave me some depressing statistics: currently only one third of papers voluntarily release code. Many argue that making it compulsory is draconian. I will take draconian any day if it ensures a fair environment that promotes honest progress. There is also the broader issue that the current review system is broken: fair credit assignment is not ensured and false hype is unfairly rewarded. I am proud of how the AI field, industry in particular, has embraced the culture of open sourcing. This is arguably the single most important factor behind its rapid progress. Companies have an incentive to open source their software, since it allows them to capture a user base. These incentives have a much smaller effect on code release for individual papers. It is therefore necessary to enforce standards.
- To increase the synergistic impact of the field: Counter-intuitively, code release will move the field away from leaderboard chasing. When code is readily available, barriers to entry for incremental research are lowered. Researchers are then incentivized to do “deeper” investigation of the algorithms. Without this, we are surely headed for the next AI winter.
Countering the arguments that support the motion:
- Cannot separate code from internal infrastructure: There exist (admittedly imperfect) solutions such as containerization. But this is a technical problem, and we are good at coming up with solutions for such well-defined problems.
- Will drive away industry researchers and will slow down progress of AI: First of all, progress of AI is not just dependent on industry researchers. Let us not have an “us vs. them” mentality. We need both industry and academia to make AI progress. I am personally happy if we can drive away researchers who are not ready to provide evidence for their claims. This will create a much healthier environment and will speed up progress.
- Reproducibility is not enough: Certainly! But it is a great first step. As next steps, we need to ensure usable and modular code. We need abstractions that allow for easy repurposing of parts of the code. These are great technical challenges: ones our community is very well equipped to tackle.
Update from Twitter conversations
There was enthusiastic participation on Twitter. A summary is below:
Useful tools for reproducibility:
Lessons from other communities:
It is not just about code, but data, replication etc:
I assume that the tweet above does not represent the official position of DeepMind, but I am not surprised.
I do not agree with the premise that it is a worthwhile exercise for others to reinvent the wheel, only to find out it is just vaporware. It is unfair to academia and unfair to graduate students whose careers depend on this.
I also find it ironic that the comment states that if an algorithm is so brittle to hyperparameters, we should not trust its results. YES! That describes the majority of hyped-up deep RL results (and we know who the main culprit is).
What happens behind closed doors: Even though there is overwhelming public support, I know that such efforts get thwarted in committee meetings of popular conferences like ICML and NeurIPS. We need to apply more pressure to have better accountability.
It is time to burst the bubble on hyped up AI vaporware with no supporting evidence. Let the true science begin!
This post reviews my experiences in 2018. I welcomed the year on the gorgeous beaches of Goa and am now ending it in the wilderness of South Africa. My highlights of 2018 are the following:
Joining NVIDIA: I joined NVIDIA in September and started a new research group on core AI/ML. I am hiring at full pace and have started many new projects. I am also excited about many new launches from NVIDIA over the last few months:
- Rapids: Open-source (Apache 2.0 licensed) multi-GPU ML library.
- Clara: Platform for medical imaging.
- PhysX: Open-source 3D physics simulation framework.
Honor of being the youngest named chair professor at Caltech: I was one of the six faculty members that Caltech recognized during the 2017-18 academic year. This is the Institute’s most distinguished award for individual faculty.
Launching AWS Ground Truth: Before leaving AWS, I was working on the Ground Truth service, which launched at the re:Invent conference in November. Data is a big barrier to the adoption of AI. The availability of a private workforce, and not just the public crowd on MTurk, will be a game changer in many applications. My team did the prototyping and many research projects on active learning, crowdsourcing and building intelligence into the data collection process.
Exciting research directions:
- Autonomous Systems: CAST at Caltech was launched in October 2017 to develop foundations for autonomy. This has been an exciting new area of research for me. We received DARPA funding for a Physics of AI project that infuses physics into AI algorithms. The first paper to come out of this project is the neural lander, which uses neural networks to improve the landing of drones while guaranteeing stability. Check out its videos here.
- AI4Science at Caltech: Along with Yuxin Chen and Yisong Yue, I launched the AI4Science initiative at Caltech. The goal is to do truly integrated research that brings about new advances in many scientific domains. Some great use cases are high energy physics, earthquake detection, spinal cord therapy etc.
- Core ML research: We have pushed for a holistic view of AI as data + algorithms + systems.
- Active learning and crowdsourcing for intelligent data gathering that significantly reduces data requirements.
- Our neural rendering model combines generation and prediction in a single model for semi-supervised learning of images.
- SignSGD yields drastic gradient compression with almost no loss in accuracy.
- Symbols + Numbers: Instead of indulging in pointless Twitter debates over which is better, can we just unify both? We combine symbolic expressions and numerical data in a common framework for neural programming.
- Principled approaches in reinforcement learning: We develop an efficient Bayesian DQN that improves exploration in high dimensions. We derive a new trust-region policy optimization method for partially observable models with guaranteed monotonic improvement. We show negative results for combining model-based and model-free RL frameworks.
- Domain adaptation: We derive generalization bounds when the label distribution shifts between source and target. This applies to AI cloud services, where the training distribution can have different proportions of categories from the serving distribution.
- Tensorly: The open-source framework that lets you write tensor algorithms in Python and choose any of the backends: PyTorch, TensorFlow, NumPy or MXNet. It has many new features and is now part of the PyTorch ecosystem.
On the academic job market: My graduating student Kamyar Azizzadenesheli has done ground-breaking work in reinforcement learning (some of which I outlined above). Hire him!
Outreach and Democratization of AI: It has been very fulfilling to educate the public about AI around the world. I gave my first TEDx talk. I shared the stage with many luminaries, such as His Holiness the Dalai Lama. It was special to speak to a large crowd of Chinese women entrepreneurs at the Mulan event.
2018 NYTimes GoodTech award: for raising awareness about diversity and inclusion. 2018 has been a defining year for me and for many #womeninTech. A large part of my energy went into fighting vicious sexism in our research communities. It is impossible to distill this into a few sentences. I have had to fend off numerous pushbacks, trolls and threats. But the positive part has been truly uplifting: countless women have hugged me and said that I am speaking on their behalf. I have found numerous male allies who have pledged to fight sexism and racism.
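To make the SignSGD idea above a little more concrete, here is a minimal Python sketch, written for illustration and not taken from our code. Each worker sends only the sign of every gradient coordinate (1 bit instead of a 32-bit float), and a server combines the workers' signs by majority vote. The function names, the learning rate, and the choice to map a zero gradient to +1 are my own assumptions for this toy example.

```python
from typing import List


def sign(x: float) -> int:
    """1-bit quantizer: +1 for non-negative values, -1 otherwise (zero maps to +1 by assumption)."""
    return 1 if x >= 0 else -1


def compress(grad: List[float]) -> List[int]:
    """Worker side: transmit only the sign of each gradient coordinate."""
    return [sign(g) for g in grad]


def majority_vote(worker_signs: List[List[int]]) -> List[int]:
    """Server side: add up the workers' signs per coordinate and keep the sign of the total."""
    totals = [sum(column) for column in zip(*worker_signs)]
    return [sign(t) for t in totals]


def signsgd_step(params: List[float], worker_grads: List[List[float]], lr: float = 0.01) -> List[float]:
    """One hypothetical signSGD-with-majority-vote update on a flat list of parameters."""
    vote = majority_vote([compress(g) for g in worker_grads])
    return [p - lr * v for p, v in zip(params, vote)]


# Toy example: three workers, three parameters.
grads = [
    [0.3, -1.2, 0.01],
    [0.1, -0.4, -0.02],
    [-0.2, -0.9, 0.05],
]
print(majority_vote([compress(g) for g in grads]))  # → [1, -1, 1]
print(signsgd_step([1.0, -2.0, 0.5], grads, lr=0.1))
```

With an odd number of workers, the vote on a coordinate can never tie, since a sum of an odd number of ±1 values is never zero.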
I want to end the year in a positive light. I hope for a great 2019! I know it is not going to be easy, but I won’t give up. Stay strong and fight for what you truly believe in!
I am very happy to share the news that I am joining NVIDIA as Director of Machine Learning Research. I will be based in the Santa Clara HQ and will be hiring ML researchers and engineers at all levels, along with graduate interns.
I will be continuing my role as Bren professor at Caltech and will be dividing my time between northern and southern California. I look forward to building strong intellectual relationships between NVIDIA and Caltech. There are many synergies with initiatives at Caltech such as the Center for Autonomous Systems (CAST) and AI4science.
I found NVIDIA to be a natural fit and it stood out among other opportunities. I chose NVIDIA because of its track record, its pivotal role in the deep-learning revolution, and the people I have interacted with. I will be reporting to Bill Dally, the chief scientist of NVIDIA. In addition to Bill, there is a rich history of academic researchers at NVIDIA such as Jan Kautz, Steve Keckler, Joel Emer, and recent hires Dieter Fox and Sanja Fidler. They have created a nourishing environment that blends research with strong engineering. I am looking forward to working with CEO Jensen Huang, whose vision for research I find inspiring.
The deep-learning revolution would not have happened without NVIDIA’s GPUs. The latest Volta GPUs pack an impressive 125 teraFLOPS and have fueled developments in diverse areas. The recently released NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator, and NVIDIA GeForce RTX represents the biggest leap in performance for graphics rendering, since it is the world’s first real-time ray-tracing GPU.
As many of you know, NVIDIA is much more than a hardware company. The development of CUDA libraries at NVIDIA has been a critical component for scaling up deep learning. The CUDA primitives are also relevant to my research on tensors. A few years ago, I worked with NVIDIA researcher Cris Cecka to build extended BLAS kernels for tensor contraction operations. I look forward to building more support for tensor algebraic operations in CUDA, which can lead to more efficient tensorized neural network architectures.
I admire recent ML research that has come out of NVIDIA. This includes state-of-the-art generative models for images and video, image denoising etc. The convergence of ML research with state-of-the-art hardware is happening at a rapid pace at NVIDIA. In addition, I am also thrilled about developments in design and visualization, self-driving, IoT/autonomous systems and data center solutions at NVIDIA.
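The extended BLAS idea can be illustrated with a small sketch. Many tensor contractions can be rewritten as a batch of ordinary matrix multiplications (GEMMs) over one of the tensor modes, so they can run on highly tuned matrix-multiply kernels instead of first permuting and copying the tensor into one large matrix. The NumPy code below is only a CPU stand-in for that idea, not the CUDA kernels we built, and the particular contraction C[m, n, p] = sum over k of A[m, k, p] B[k, n] is just an example I picked.

```python
import numpy as np


def contract_reference(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Reference contraction C[m, n, p] = sum_k A[m, k, p] * B[k, n], via einsum."""
    return np.einsum("mkp,kn->mnp", A, B)


def contract_batched_gemm(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Same contraction computed as P independent (M x K) @ (K x N) matrix multiplies.

    np.moveaxis returns a view, so the batch axis is moved to the front without copying;
    np.matmul then treats axis 0 as the batch dimension and broadcasts B across it.
    """
    A_batch = np.moveaxis(A, 2, 0)      # (P, M, K): slice p is the matrix A[:, :, p]
    C_batch = np.matmul(A_batch, B)     # (P, M, N): one GEMM per slice
    return np.moveaxis(C_batch, 0, 2)   # back to (M, N, P)


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))   # M=4, K=3, P=5
B = rng.standard_normal((3, 2))      # K=3, N=2
C = contract_batched_gemm(A, B)
print(C.shape)  # → (4, 2, 5)
print(np.allclose(C, contract_reference(A, B)))  # → True
```

On a GPU, cuBLAS exposes this pattern directly as a strided batched GEMM (for example, `cublasSgemmStridedBatched`), which runs all the slice multiplies in a single call by reading each slice at a fixed memory stride.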
I hope to continue building bridges between academia and industry, and between theory and practice in my new role.
AI4science is a new initiative launched at Caltech that aims to broaden the impact of AI and ML across all areas of sciences. The inaugural workshop was held on Aug. 1st. My student Jeremy Bernstein wrote a detailed article on the workshop. The slides of the talks are also available there.
A short blurb of the article: Across science—from astrophysics to molecular biology to economics—a common problem persists: scientists are overwhelmed by the sheer amount of data they are collecting. But this problem might be better viewed as an opportunity, since with appropriate computing resources and algorithmic tools, scientists might hope to unlock insights from these swathes of data to carry their field forward. AI4science is a new initiative at Caltech aiming to bring together computer scientists with experts in other disciplines. While somewhat of a suitcase term, AI or artificial intelligence here means the combination of machine learning algorithms with large compute resources.
Professor Yisong Yue of Caltech’s Computing & Mathematical Sciences department (CMS) gave the first talk, where he gave a general overview of machine learning algorithms and their relevance across science and engineering. Professor Andrew Stuart, also in CMS, gave the talk following Professor Yue. Stuart discussed his interest in fusing data science techniques with known physical law. Frederick Eberhardt, professor of philosophy, spoke next. He discussed his work on causal inference. Professor Anima Anandkumar of the CMS department was the last computer scientist to speak. Anandkumar gave an overview of a successful machine learning technique known as artificial neural networks, which have dramatically improved the ability of computers to understand images and natural language. Anandkumar also spoke about tensor methods in machine learning. The remainder of the day was devoted to talks from scientists who have had success applying machine learning techniques in their respective fields. For more, check out the AI4science website.
I recently left my role as principal scientist at Amazon Web Services (AWS). In this blog post, I want to recollect the rich learning experiences I had and the amazing things we accomplished over the last two years.
This was my first “industry” job out of academia. I chose AWS for several reasons. I saw huge potential to democratize AI, since AWS is the most comprehensive and broadly adopted cloud platform. Two years ago, cloud AI was still uncharted territory, and this made it an exciting adventure. I was also attracted by the mandate to bring applied AI research to AWS.
My early days were exciting and busy. We were a new team in the Bay Area with a startup-like environment, while still being connected with the Seattle team and the larger Amazon ecosystem. There was a steep learning curve to understand all the AWS services, software engineering practices, product management, etc., and I loved it. We were busy growing the team, designing new AI services, and thinking about research directions, all at the same time.
I am proud of what we accomplished over the last two years. We launched a vast array of AI services at all levels of the stack. At the bottom are the compute instances: the latest GPU instances feature the powerful NVIDIA Tesla V100s. The middle layer consists of SageMaker, a fully managed service with high-performance dockerized ML algorithms, and DeepLens, the first deep learning camera with seamless AWS integration. The top layer includes services for computer vision, natural language processing, speech recognition and so on. We made large strides in customer adoption, and today AWS has the largest number of customer references for cloud ML services. In addition, the AWS ML lab provides advanced solutions for custom use cases. I got to interact with many customers, and it was eye-opening to learn about real-world AI deployment in diverse domains.
I was most closely involved in the design, development and launch of SageMaker. Its broad adoption led to AWS increasing its ML user base by more than 250 percent over the last year. SageMaker removes heavy lifting, complexity, and guesswork from each step of productionizing ML. It was personally fulfilling to build topic modeling on SageMaker (and Amazon Comprehend) based on my academic research, which uses tensor decompositions. SageMaker topic modeling automatically categorizes documents at scale and is several times faster than any other (open-source) framework. Check out my talk at re:Invent for more details. Taking the tensor algorithm from its theoretical roots to an AWS production service was a big highlight for me.
It was exciting to grow applied research at AWS. I looked for problems that posed the biggest obstacles in the real world. The “data problem” is the proverbial “elephant in the room”. While researchers work on established benchmark datasets, most of the time and effort in the real world goes into data collection and cleanup. We developed and tested efficient deep active learning, crowdsourcing and semi-supervised learning methods in a number of domains. We found that deep networks can be trained with significantly less data (~25%). For an overview, check out the tutorial slides at UAI. I was also happy to connect my earlier research on tensors with deep learning to obtain a new class of deep networks that naturally encode the data dimensions and higher-order correlations. They are more compact and generalize better in many domains. Tensorly is a Keras-like frontend for easily using tensor algebraic operations in deep learning with any backend. Moreover, I realized that in practice, simple methods work even when theory cannot explain why. We tried to close this gap by looking for conditions under which simple methods provably succeed, and then experimentally verifying these conditions. For instance, we showed, both in theory and in practice, that 1-bit gradient quantization incurs almost no accuracy loss while reducing the communication requirements of distributed ML. All these projects were executed with an excellent cohort of interns and AWS scientists.
Being at AWS gave me a platform for community outreach to democratize AI. I worked to build partnerships with universities and non-profit organizations. The Caltech-Amazon partnership funded graduate fellowships and cloud credits, which are transforming fundamental scientific research at Caltech. This partnership also resulted in a new AWS office in Pasadena with Stefano Soatto and Pietro Perona at the helm.
I am happy that I got to represent AWS at many prominent venues, including Deep Learning Indaba 2017, the first pan-African deep learning summit; the Mulan forum for Chinese women entrepreneurs; the Geekpark forum for startups in China; and Shaastra 2018 at IIT Madras, a student-run techfest, where we held the largest deep learning workshop in India.
I had the privilege to work with and learn from so many amazing individuals. It was enlightening to hear about the early days of AWS from veteran AWS engineer and team VP Swami Sivasubramanian. I tried to learn new skills from the very best: product management from Joseph Spisak, team management from Craig Wiley, software engineering from Leo Dirac, clear exposition from Zachary Lipton, ML practice from Sunil Mallya, to name a few. Attending MARS and interacting with Jeff Bezos and many other superstars was a big highlight.
I learnt good management principles and business practices at Amazon. Amazon's Leadership Principles are a succinct list of desirable leadership qualities. But some principles are at odds with others, so they need to be balanced. For instance, the “dive deep” principle requires time and effort, while “bias for action” calls for expediency. I also found that working backwards from customer needs and having “two pizza” teams resulted in focused discussions with great outcomes. Another effective strategy I learnt was the Press Release (PR)-FAQ. Presentations are banned at Amazon, and to pitch any new idea, one has to start by writing its press release. This entails having clarity on the product goals and its target customers right from the beginning. I could see the effectiveness of all these principles and their role in making Amazon the huge success it is today.
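For readers unfamiliar with active learning, here is a minimal sketch of one of its simplest forms, least-confidence uncertainty sampling: the model scores the unlabeled pool, and the examples it is least sure about are sent to annotators first, so labeling effort goes where it is likely to be most informative. This is a generic textbook heuristic, not the deep active learning method we developed, and the example ids and predicted probabilities are made up.

```python
from typing import Dict, List


def least_confidence(probs: List[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_for_labeling(pool: Dict[str, List[float]], budget: int) -> List[str]:
    """Pick the `budget` unlabeled examples whose predictions are least confident.

    `pool` maps a (hypothetical) example id to the model's predicted class
    probabilities for that example.
    """
    ranked = sorted(pool, key=lambda ex: least_confidence(pool[ex]), reverse=True)
    return ranked[:budget]


pool = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction: little value in labeling
    "img_002": [0.40, 0.35, 0.25],  # very unsure: good candidate
    "img_003": [0.55, 0.44, 0.01],  # torn between two classes
    "img_004": [0.90, 0.05, 0.05],
}
print(select_for_labeling(pool, budget=2))  # → ['img_002', 'img_003']
```

In a full loop, the selected examples would be labeled (by a crowd or a private workforce), added to the training set, and the model retrained before choosing the next batch.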
To summarize, I am very thankful for the learning experience I had at AWS. In the next post, I will talk about my upcoming plans. Stay tuned!
Over the years, I have followed many good academic blogs, e.g. Scott Aaronson, Moritz Hardt, Zachary Lipton, Lior Pachter etc. Until now I didn’t think I had time for blogging. Maybe one of the luxuries of tenure is making room for things I have always wanted to do, and this is one of them 🙂
Besides, writing has never been my strong suit. My verbal skills developed much later than my mathematical skills. My mom tells me that at the age of 3, I could barely speak but could solve lots of puzzles. Apparently, my grandma had written out the calendar for the next 10 years, and I had memorized it and could name the day of the week for any date. At school, I struggled with essays. Fast forward to grad school: I was frustrated that papers had to be written and that just deriving math equations and proofs was not enough. Math was my native language and everything else felt foreign.
This didn’t change until a few years ago, when I learnt the importance of writing and communication the hard way. I interviewed at most of the top schools, and amidst a hectic schedule, I did not polish my presentations or think through how to communicate my research to people outside my field. I did not get any offers, even though there was strong initial interest. This forced me to think about how I should communicate complex research to a general audience. Richard Feynman was an inspiration during my teenage years, and I went back to looking at how he structured his lectures. I also had Zachary Lipton work with me. He wrote beautiful prose, and I learnt a lot from him.
I am now thankful that I spent time improving my verbal and presentation skills, because it has opened up so many doors. At Amazon, I have interacted with product managers, marketing people, engineers with no ML background, and customers. This requires customizing explanations and listening to their concerns. I realize how critical these skills are for me to be effective. These days I insist that all my students take courses on communication and be able to write well. I will give them editorial comments, but I will not write on their behalf. I keep emphasizing the importance of clear communication.
This blog is an attempt to get my thoughts out to the world. You will hear more in the coming days.