2 Introduction and overview
2.1 Is this course for me?
2.1.1 Industry drivers of causal modeling
Three practical forces are driving the need for data scientists and machine learning engineers with causal inference expertise:
- The need for better design and analysis of experiments on online platforms.
- The need to improve in-production machine learning.
- The need for decision-making agents (as in reinforcement learning) that can reason causally.
This course provides the foundation and the tools needed to solve these practical problems in industry. If these problems exist in your company, or you know you want to be able to tackle them in your career, then you are definitely in the right place.
2.1.2 Extending deep learning with causal modeling
Yoshua Bengio is known for his significant contributions to deep learning, for which he won the Turing Award. So it was interesting to see this patriarch of deep learning sitting in on a causal inference panel at NeurIPS 2019 (third from right).

Shortly before I took this photo, Bengio said in an interview:
I think we need to consider the hard challenges of AI and not be satisfied with short-term, incremental advances. I’m not saying I want to forget deep learning. On the contrary, I want to build on it. But we need to be able to extend it to do things like reasoning, learning causality, and exploring the world to learn and acquire information.
Building causal models with deep learning architectures is a primary objective of this course. We do this with a tool called Pyro, a library for generative machine learning in Python built on top of the deep learning framework PyTorch. This course uses Pyro both to illustrate basic causal modeling and inference concepts and algorithms, and to demonstrate concrete examples of causal models that incorporate deep learning architectures.
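To give a flavor of what that looks like, here is a minimal sketch of a causal model and an intervention in Pyro. The variables (rain, sprinkler, wet) and their probabilities are hypothetical, chosen only to illustrate the modeling style:

```python
# A minimal sketch of a causal model in Pyro. The variables and numbers
# below are hypothetical illustrations, not from the course material.
import torch
import pyro
import pyro.distributions as dist
from pyro.poutine import do

def sprinkler_model():
    # Exogenous cause: did it rain?
    rain = pyro.sample("rain", dist.Bernoulli(0.3))
    # The sprinkler is less likely to run when it rains.
    sprinkler = pyro.sample(
        "sprinkler", dist.Bernoulli(0.1 if rain.item() else 0.6)
    )
    # The grass is almost surely wet if either cause is active.
    p_wet = 0.99 if (rain.item() or sprinkler.item()) else 0.01
    return pyro.sample("wet", dist.Bernoulli(p_wet))

# Intervene: force the sprinkler on, severing its dependence on rain.
intervened_model = do(sprinkler_model, data={"sprinkler": torch.tensor(1.0)})
print(intervened_model())
```

Pyro's `do` handler fixes the value at the named sample site, so the intervened variable no longer responds to its usual causes; later chapters develop this idea in depth.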
If you have experience with tensor-based machine learning frameworks such as PyTorch, TensorFlow, or Theano, then this course will enable you to build causal modeling experience directly on that technical foundation.
Conversely, if you have some causal inference background but lack experience with cutting-edge machine learning frameworks, then you are in luck. This course grounds causal inference in a probabilistic machine learning approach that employs these tools.
2.2 Course outcomes
- Familiarity with techniques in probabilistic machine learning
- The ability to extend those techniques to build causal models
- Familiarity with core causal modeling concepts, including interventions, the “do-calculus,” and counterfactual reasoning
- Familiarity with how to engineer deep probabilistic models with Pyro
2.3 Prerequisites
To get the most from this course, you should have a basic understanding of probability, including probability distributions and random variables, conditional probability, Bayes' rule, and expectation.
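As a quick self-check on that background, recall Bayes' rule, stated here with generic symbols (H for a hypothesis, E for evidence):

```latex
% Bayes' rule: the posterior over a hypothesis H given evidence E,
% with the evidence term expanded by the law of total probability.
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)},
\qquad
P(E) = \sum_{h} P(E \mid h)\, P(h)
```

If this identity and the expectation of a random variable feel comfortable, you have the background you need.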
I recommend the following books for a gentle introduction to these core ideas:
2.4 What is not covered
Causal inference spans many other concepts, and we won’t be able to cover all of them. Though the concepts below are essential, they are out of scope for this course.
Causal discovery. Causal discovery is the problem of learning cause-and-effect relationships from data, typically in the form of graphical model structures. It is a broad topic that could constitute a whole course on its own. For resources on causal discovery, I suggest the website for the R package bnlearn (I am a past contributor to this package). The site contains links to useful references and books, as well as a large body of example code and vignettes.
In general, we will construct our causal structure from our theories and domain knowledge about the real-world process that generates the data we are modeling. To see whether that structure needs to change, we will rely on model evaluation methods that are standard in Bayesian machine learning. That said, this class will provide an excellent foundation for an exploration of causal discovery algorithms.
Advanced approaches from the potential outcomes framework. This course provides an introduction to core ideas in the potential outcomes framework, such as ignorability, the g-formula, and single world intervention graphs. It explains those ideas and how they translate to the context of probabilistic machine learning. It gives you enough to discuss these topics with practitioners who use these frameworks, and it provides a starting point for self-study of advanced topics that use potential outcomes as a foundation.
Deep-dives into causal inference with linear regression. Much of causal inference research assumes a basic linear model and builds out a great deal of math that relies on those underlying assumptions. This course assumes that the problems you will work on in practice will not typically be suitable for a linear model; we expect them to be non-linear and hierarchical. We will rely on our probabilistic modeling framework to build bespoke models suited to our beliefs about the data-generating process.
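To make that contrast concrete, here is a hedged sketch of the kind of bespoke, non-linear hierarchical model this approach makes easy to write in Pyro. The variable names (group, treatment, outcome) and functional forms are hypothetical illustrations, not a model from the course:

```python
# A hypothetical sketch of a non-linear, hierarchical outcome model in Pyro.
# Names and functional forms are illustrative assumptions only.
import torch
import pyro
import pyro.distributions as dist

def hierarchical_model(group, treatment):
    n_groups = int(group.max().item()) + 1
    # Population-level (hyper) parameters shared across groups.
    mu = pyro.sample("mu", dist.Normal(0.0, 1.0))
    sigma = pyro.sample("sigma", dist.HalfNormal(1.0))
    # Group-level treatment effects drawn from the population distribution.
    with pyro.plate("groups", n_groups):
        effect = pyro.sample("effect", dist.Normal(mu, sigma))
    # A non-linear response: the outcome probability saturates via a sigmoid.
    logits = effect[group] * treatment
    with pyro.plate("data", len(treatment)):
        return pyro.sample("outcome", dist.Bernoulli(logits=logits))

# Simulate from the prior for two groups and binary treatments.
group = torch.tensor([0, 0, 1, 1])
treatment = torch.tensor([0.0, 1.0, 0.0, 1.0])
print(hierarchical_model(group, treatment))
```

Nothing here depends on linearity: the group-level structure, the sigmoid response, and the priors all encode beliefs about the data-generating process directly, which is the workflow this course develops.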