David Dao
PhD candidate at ETH Zurich | Past: Stanford, Berkeley, MIT

I'm the founder of GainForest and a Ph.D. candidate at ETH Zurich on AI and Data Systems for the SDGs, advised by Prof. Ce Zhang and Prof. Gustavo Alonso.
GainForest is a non-profit grantee of Microsoft’s AI for Earth program, and a $10M XPRIZE Rainforest Semi-Finalist. We leverage decentralized technology to prevent deforestation. At ETH Zurich, I founded the Climate + AI initiative at DS3Lab and maintain GitHub's most starred collection on ethical use of AI. Previously, I was an engineer in Silicon Valley and a research fellow at Berkeley AI Research (BAIR), Stanford University, and the Broad Institute of MIT and Harvard. I'm a Global Shaper (Davos 50) at the World Economic Forum, a Core Member of Climate Change AI, a Climate Leader at the Climate Reality Project, a Mentor at Creative Destruction Lab, and a United Nations delegate at COP (since COP23 Bonn). I have also organized conferences with thousands of attendees in Germany, Silicon Valley, and at Harvard.
🗞️ Media features (selected):
GainForest featured in MIT Technology Review, Microsoft, United Nations, World Economic Forum, Swiss Re, The Edge Malaysia, ETH News, Swiss Tagblatt
Decentralized Science featured in WIRED, The New York Times, MIT Technology Review and ETH News
Ethics & AI featured in Radio Télévision Suisse, Goethe Institute
Previous research at MIT featured in The Scientist

🎨 Artwork:
Awful AI featured in Fotomuseum Winterthur
Provocation in BeFantastic

🎙️ Interviews:
BBC Radio 4, ETH Spotlight, ETH Podcast, digitalculture.la, Körber-Stiftung

🎥 Featured Talks
COP26 Goals House: Youth Leadership
UN Climate Change: GainForest
UN Climate Change: Radical Transparency in Monitoring
TEDx Geneva: Learning from Nature's Stewards
WEF Davos: A Wake-Up Call from Nature
WEF Davos: Addressing the Drivers of Eco-Anxiety

👇 In short, for millennials:
Founder @GainForest. Using 🛰 and ⛓ to restore 🌴
PhD candidate @DS3Lab. AI for Sustainable Development 🌱
Goal: Save the world with crazy technology 🌍
Academic: ETH, Past: Stanford, Berkeley, MIT

Research · Medium Blog

Want to work on the UN's Sustainable Development Goals (SDGs) and finish your thesis at the same time? Here are some thesis proposals!

Data Markets and Valuation
Data Valuation: How much is your data worth? · Joint project with UC Berkeley
AI Systems to help restore the natural world
Komorebi: Deforestation prediction of the Amazon Rainforest using satellite imagery · 2 x Microsoft AI for Earth Grant · Grand Prize UNFCCC Hack4Climate at COP23 · Presentation at COP24 · Presentation at COP25
ForestBench: A global benchmark for forest carbon stock prediction · Joint project with MIT and TUM
GainForest XPRIZE RainForest: Biodiversity monitoring with satellites, drones and eDNA · Joint project with Stanford
Decentralized Science
Kara: A privacy-preserving tokenized data market for medical data · Joint project with UC Berkeley and Stanford
Piximi: Interactive machine learning for cell biology · Joint project with Broad Institute of MIT and Harvard

tl;dr: Design systems so that society and AI gain from each other.
Here is my mission statement and a timeline.

Selected Publications

ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock with Deep Learning and Aerial Imagery
Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms
Towards Efficient Data Valuation Based on the Shapley Value
CellProfiler Analyst: interactive data exploration, analysis and classification of large biological image sets

All Publications · Google Scholar


Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines
ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock with Deep Learning and Aerial Imagery


RumbleML: program the lakehouse with JSONiq
Toward Foundation Models for Earth Monitoring: Proposal for a Climate Change Benchmark
Challenges in KDD and ML for Sustainable Development
Tackling the Overestimation of Forest Carbon with Deep Learning and Aerial Imagery 🏆 Spotlight (Top 5%)
Scalability vs. Utility: Do We Have To Sacrifice One for the Other in Data Importance Quantification?
Ease. ML: A Lifecycle Management System for Machine Learning


TrueBranch: Metric Learning-based Verification of Forest Conservation Projects 🏆 Best Proposal Award (Top 2%)
Xingu: Curating Weak Supervision Signals for Sustainable Climate Finance


GeoLabels: Towards Efficient Ecosystem Monitoring using Data Programming on Geospatial Information
Data Capsule: A New Paradigm for Automatic Compliance with Data Privacy Regulations
Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms
GainForest: Scaling Climate Finance for Forest Conservation using Interpretable Machine Learning on Satellite Imagery
Towards Efficient Data Valuation Based on the Shapley Value


DataBright: A Data Curation Platform for Machine Learning based on Markets and Trusted Computation
A Demonstration of Sterling: A Privacy-Preserving Data Marketplace


An open-source solution for advanced imaging flow cytometry data analysis using machine learning


CellProfiler Analyst: interactive data exploration, analysis and classification of large biological image sets


Anatomy of BioJS, an open source community for the life sciences


Automated Plausibility Analysis of Large Phylogenies

Open Source Software · GitHub

Almost all of my work is open source


Awful-AI 😈 is a curated list to track current scary usages of AI - hoping to raise awareness
Awesome-Deep-Learning 🔥 is a curated list of papers about very deep neural networks
Spatial Transformer 🌐 is part of TensorFlow Models (where I'm co-author)
CellProfiler Analyst 🔬 is an adaptive machine learning tool for biologists
Green Artificial Intelligence Standard 🌱 aims to develop a standard and raise awareness for best environmental practices in AI research and development
BioJS 🔬 is an interactive visualization ecosystem for life science

Scientific Collaborators

I'm grateful to work with my scientific collaborators

· Lucas Czech (Carnegie Institution of Science)
· Crowther Lab (ETH Zurich)
· Björn Lütjens (MIT)
· Dawn Song (UC Berkeley)
· Robert Chang (Stanford Medicine)
· Anne Carpenter (MIT Broad Imaging Platform)
· Joe Near (University of Vermont)
· Yan Meng (Mercedes-Benz Research)

Former/Current Students

I'm proud of my students

· Lasse Wolff Anthony (ETH)
· Thomas Huber (Uppsala University)
· Nils Lehmann (UvA / Now TUM)
· Ghjulia Sialelli (ETH)
· Marc Watine (ETH / Went to Harvard)
· Gyri Reiersen (TUM / Now CTO Tanso)
· Kenza Amara (ETH / Went to Facebook AI)
· Simona Santamaria (ETH / Now CTO RYVER.AI)
· Iveta Rott (ETH / Went to McKinsey)
· Mina Huh (KAIST)
· Levin Moser (ETH / Went to MIT)
· Catherine Cang (UC Berkeley / Went to AirBnB)
· Ming Zhang (ETH / Went to Roche)
· Luca Lanzendorfer (ETH / Went to Mercedes-Benz Research)
· Florian Chlan (ETH / Went to Amazon)
· Nino Weingart (ETH / Went to BSI)
· Christopher Friedrich (Reutlingen / Went to MIT)

Scientific Service

Proud and active member of the scientific community

· Program Committee NeurIPS Climate Change AI'22
· Reviewer for CCAI Innovation Grant (total value of 1.8 million USD)
· Organizer CCAI Side Event at COP26 (together with ClimateTRACE and CAIC)
· Program Committee NeurIPS Climate Change AI'21
· Tutorial KDD Challenges in ML for Sustainable Development'21
· Program Committee ICML Climate Change AI'21
· Co-Lead Organizer NeurIPS Climate Change AI'20
· Organizer ICML Economics of Privacy and Data Labor'20
· Co-Organizer ICLR Climate Change AI'20
· Program Committee NeurIPS Climate Change AI'19
· Reviewer ISC-HPC'15


Erdős number: 4 (via A. Stamatakis)

Climate Emergency

You can follow me on Twitter at @dwddao.