Reproducibility Debate: Should Code Release be Compulsory for Conference Publications?

Update: Added discussions in the end based on Twitter conversations. 

Yesterday, I was on the debate team at DALI conference in gorgeous George in South Africa. The topic was:

“DALI believes it is justified for industry researchers not to release code for reproducibility because of proprietary code and dependencies.”

I was opposing the motion, and this matched by personal beliefs.  I am happy to talk about my own stance but I cannot disclose the arguments of others, since it was off the records (and their arguments were not necessarily their own personal opinions).

Edit: Uri Shalit and I formed the team opposing the motion. I checked with him to see if he is fine with me mentioning it. We collaboratively came back with the points below. 

This topic is timely since ICML 2019 has added reproducibility as one of the factors to be considered by the reviewers. When it first came up, it seemed natural to set standards for reproducibility: the same way we set standards for a publication at our top-tier conferences. However, I was disheartened to see vocal opposition, especially from many “big-name” industry researchers. So with that background, DALI decided to focus the reproducibility debate on industry researchers.

My main reasons for opposing the motion:

  • Pseudo-code is just not enough: Anyone who has tried to implement an algorithm from another paper knows how terribly frustrating and time consuming it can be. With complex DL algorithms, every tiny detail matters: from hyperparameters to the randomness of the machine. It is another matter that this brittleness of DL is a huge cause of concern. See excellent talk by Joelle Pineau on reproducibility issues in reinforcement learning. In the current peer-review environment, it is nearly impossible to get a paper accepted unless all comparisons are made. I have personally had papers rejected even after we clearly stated that we could not reproduce the results of another paper.
  • Unfair to academic researchers: The cards are already stacked against academic researchers: they do not have access to vast compute and engineering resources. This is exasperated by the lack of reproducibility. It is grossly unfair to expect a graduate student to reproduce the results of a 100-person engineering team. It is critical to keep academia competitive: we are training the next generation and much of basic research still happens only in academia.
  • Accountability and fostering healthy environment: As AI gets deployed in the real world, we need to be responsible and accountable. We would not allow new medical drugs into the market without careful trials.  The same standards should apply to AI , especially in safety critical applications. It first starts with setting  rigorous standards for our research publications. Having accessible code allows the research community to extensively test the claims of the paper. Only then, it can be called legitimate science.
  • No incentives for voluntary release of code: Jessica Forde gave me some depressing statistics: currently only one third of the papers voluntarily release code. Many argue that making it compulsory is Draconian.  I will take Draconian any day if it ensures a fair environment that promotes honest progress.  There is also the broader issue that the current review system is broken: fair credit assignment is not ensured and false hype is unfairly rewarded. I am proud how the AI field, industry in particular, has embraced the culture of open sourcing.  This is arguably the single most important factor for rapid progress. There is incentive for industries to open source since it allows them to capture a user base. These incentives have a smaller effect on release of individual papers. It is therefore needed to enforce standards.
  • To increase synergistic impacts of the field: Counter-intuitively, code release will move the field away from leaderboard chasing. When code is readily available, barriers of entry for incremental research are lowered. Researchers are incentivized to do “deeper” investigation of the algorithms. Without this, we are  surely headed for the next AI winter. 

Countering the arguments that support the motion:

  • Cannot separate code from internal infrastructure: There exist (admittedly imperfect) solutions such as containerization. But this is a technical problem, and we are good at coming up with solutions for such well-defined problems.
  • Will drive away industry researchers and will slow down progress of AI: First of all,  progress of AI is not just dependent on industry researchers. Let us not have an “us vs. them” mentality. We need both industry and academia to make AI progress. I am personally happy if we can drive away researchers who are not ready to provide evidence for their claims. This will create a much healthier environment and will speed up progress.
  • Reproducibility is not enough: Certainly! But it is a great first step. As next steps, we need to ensure usable and modular code. We need abstractions that allows for easy repurposing of parts of the code. These are great technical challenges: ones our community is very well equipped to tackle.

Update from Twitter conversations

There was enthusiastic participation on Twitter. A summary below:


Useful tools for reproducibility:

Screen Shot 2019-01-28 at 4.45.56 PMScreen Shot 2019-01-28 at 4.42.29 PMScreen Shot 2019-01-28 at 4.50.25 PMScreen Shot 2019-01-28 at 5.07.10 PMScreen Shot 2019-01-28 at 4.51.04 PMScreen Shot 2019-01-28 at 4.42.03 PMScreen Shot 2019-01-28 at 5.17.36 PMScreen Shot 2019-01-28 at 5.07.24 PMLessons from other communities: 

Screen Shot 2019-01-28 at 5.18.37 PMScreen Shot 2019-01-28 at 5.18.58 PMScreen Shot 2019-01-28 at 4.46.25 PMIt is not just about code, but data, replication etc: 

Screen Shot 2019-01-28 at 5.16.42 PMScreen Shot 2019-01-28 at 5.08.23 PMScreen Shot 2019-01-28 at 5.17.56 PMScreen Shot 2019-01-28 at 5.20.26 PMDisagreements: 

Screen Shot 2019-01-28 at 5.22.31 PMScreen Shot 2019-01-28 at 5.23.24 PM

I assume that the Tweet above does not represent the official position of Deep mind, but I am not surprised.

I do not agree with the premise that it is a worthwhile exercise for others to reinvent the wheel, only to find out it is just vaporware. It is unfair to academia and unfair to graduate students whose careers depend on this.

I also find it ironic that the comment states that if an algorithm is so brittle to hyperparameters we should not trust these results. YES! That is the majority of deep RL results that are hyped up (and we know who the main culprit is).

What happens behind the doors: Even though there is overwhelming public support, I know that such efforts get thwarted in committee meetings of popular conferences like ICML and NeurIPS. We need to apply more pressure to have better accountability.

It is time to burst the bubble on hyped up AI vaporware with no supporting evidence. Let the true science begin!


2018 in Review

This post reviews my experiences in 2018. I welcomed the year in the gorgeous beaches of Goa and am now ending it in the wilderness of South Africa. My highlights of 2018 are the following:

Joining NVIDIA: I joined NVIDIA in September and started a new research group on core AI/ML. I am hiring at full pace and have started many new projects. I am also excited about many new launches from NVIDIA over the last few months:

  1. Rapids: Apache open-source multi-GPU ML library.
  2. Clara: Platform for medical imaging.
  3. Physx: Open source 3D simulation framework.

Honor of being the youngest named chair professor at CaltechI was one of the six faculty members that Caltech recognized during the 2017-18 academic year. This is the Institute’s most distinguished award for individual faculty.

Launching AWS Ground Truth: Before leaving AWS, I was working on the ground truth service which got launched during ReInvent conference in November. Data is a big barrier to adoption of AI. The availability of private workforce and not just the public crowd on Mturk will be a game changer in many applications. My team did the prototyping and many research projects on active learning, crowdsourcing and building intelligence into the data collection process.

Exciting research directions:

  1. Autonomous Systems: CAST at Caltech was launched in October 2017 to develop foundations for autonomy. This has been an exciting new area of research for me. We got a DARPA Physics of AI project funded that infuses physics into AI algorithms. The first paper to come out of this project has been the neural lander that uses neural networks to improve landing of drones while guaranteeing stability. Check out its videos here.
  2. AI4Science at Caltech: Along with Yuxin Chen and Yisong Yue, I launched AI4Science initiative at Caltech. The goal is to do truly integrated research that brings about new advances in many scientific domains. Some great use cases are high energy physics, earthquake detection, spinal cord therapy etc.
  3. Core ML research: We have pushed for a holistic view of AI as data + algorithms + systems.
    • Active learning and crowdsourcing for intelligent data gathering that significantly reduces data requirements.
    • Neural rendering model combines generation and prediction in a single model for semi-supervised learning of images.
    • SignSGD yields drastic gradient compression with almost no loss in accuracy.
    • Symbols + Numbers: Instead of indulging in pointless Twitter debates over which is better, can we just unify both? We combine symbolic expressions and numerical data in a common framework for neural programming.
    • Principled approaches in reinforcement learning: We develop efficient Bayesian DQN that improves exploration in high dimensions.  We derive new trust-region policy optimization for partially observable models with guaranteed monotonic improvement. We show negative results for combining model-based and model-free RL frameworks.
    • Domain adaptation: We derive generalization bounds when there are shifts in label distribution between source and target. This is applicable for AI cloud services where training distribution can have different proportions of categories from the serving distribution.
    • Tensorly: The open-source framework that allows you to write tensor algorithms in Python and choosing any of the backends: PyTorch, TensorFlow, NumPy or MxNet. It has many new features now and is now part of PyTorch ecosystem.

On academic job market: My graduating student Kamyar Azzizadenesheli has done ground-breaking work in reinforcement learning (some of which I outlined above). Hire him!

Having grandkids: academically speaking 😉 It is great to see my former student Furong Huang and my former postdoc Rose Yu thrive in their faculty careers.

Outreach and Democratization of AI: It has been very fulfilling to educate the public about AI around the world. I gave my first TEDx talkI shared the stage with so many luminaries such as his holiness Dalai Lama. It was special to speak to a large crowd of Chinese women entrepreneurs at the Mulan event.

2018 NYTimes GoodTech award: for raising awareness about diversity and inclusion. 2018 has been a defining year for me and for many #womeninTech. A large part of my energy went into fighting vicious sexism in our research communities. It is impossible to distill this into few sentences. I have had to fend off numerous pushbacks, trolls and threats. But the positive part has been truly uplifting: countless women have hugged me and said that I am speaking on their behalf. I have found numerous male allies who have pledged to fight sexism and racism.

I want to end the year in a positive light. I hope for a great 2019! I know it is not going to be easy, but I won’t give up. Stay strong and fight for what you truly believe in!

AI4Science @Caltech

AI4science is a new initiative launched at Caltech that aims to broaden the impact of AI and ML across all areas of sciences. The inaugural workshop was held on Aug. 1st. My student Jeremy Bernstein wrote a detailed article on the workshop. The slides of the talks are also available there.

A short blurb of the article: Across science—from astrophysics to molecular biology to economics—a common problem persists: scientists are overwhelmed by the sheer amount of data they are collecting. But this problem might be better viewed as an opportunity, since with appropriate computing resources and algorithmic tools, scientists might hope to unlock insights from these swathes of data to carry their field forward. AI4science is a new initiative at Caltech aiming to bring together computer scientists with experts in other disciplines. While somewhat of a suitcase term, AI or artificial intelligence here means the combination of machine learning algorithms with large compute resources.

Professor Yisong Yue of Caltech’s Computing & Mathematical Sciences department (CMS) gave the first talk, where he gave a general overview of machine learning algorithms and their relevance across science and engineering. Professor Andrew Stuart, also in CMS, gave the talk following Professor Yue. Stuart discussed his interest in fusing data science techniques with known physical law. Frederick Eberhardt, professor of philosophy, spoke next. He discussed his work on causal inference. Professor Anima Anandkumar of the CMS department was the last computer scientist to speak. Anandkumar gave an overview of a successful machine learning technique known as artificial neural networks, which have dramatically improved the ability of computers to understand images and natural language. Anandkumar also spoke about tensor methods in machine learning. The remainder of the day was devoted to talks from scientists who have had success applying machine learning techniques in their respective fields. For more, check out the AI4science website. 


My first blog post

Over the years, I have followed many good academic blogs, e.g. Scott Aaronson, Moritz HardtZachary LiptonLior Pachter etc. Until now I didn’t think I had time for blogging. May be one of the luxuries of tenure is to make room for things I always wanted to do  and this is one of them 🙂

Besides writing has never been my strongest suite. My verbal skills developed much later than my mathematical skills. At the age of 3, my mom tells me that I could barely speak but could solve lots of puzzles. Apparently my grandma had written up the calendar for the next 10 years that I had memorized and could name the day for any date. At school, I struggled with essays. Fast forward, in grad school I was frustrated that papers had to be written and that just deriving math equations and proofs was not enough. Math was my native language and everything else felt foreign.

This didn’t change until a few years ago when I learnt the importance of writing and communication the hard way. I interviewed at most of the top schools and amidst a hasty schedule, I did not polish my presentations or gather thoughts on how to communicate my research to people outside my field. I did not get any offers, even though there was strong initial interest. This forced me to think how I should communicate complex research to general audience. Richard Feynman was an inspiration during my teenage years and I went back to looking at how he structured his lectures. I also had Zachary Lipton work with me. He wrote beautiful prose and I learnt a lot from him.

I am now thankful that I spent time to improve my verbal and presentation skills because it has opened up so many doors. At Amazon, I have interacted with product managers, marketing people, engineers with no ML background and customers. This requires providing customizing the explanations and listening to their concerns.  I realize how critical these skills are for me to be effective. These days I insist that all my students take courses on communication and be able to write well. I will give them editorial comments, but I will not write on their behalf. I keep emphasizing the importance of clear communication.

This blog is an attempt to get my thoughts out to the world. You will hear more in the coming days.