Instructors & Organizers

Course Description

What is this course about?

Questions we will explore:
  1. Why do we care about interpretable AI? Why do we care about fair AI? Real-life examples.
  2. What are model explanations? What is fair AI?
  3. Given a black-box AI model, how do you explain it? Taxonomies of fundamental and state-of-the-art methods.
  4. Given an application, what type of model should you use (interpretability vs performance trade-off)?
  5. When a model is 'unfair', what can you do?
  6. Regulatory trends and societal perspectives on interpretable and fair AI.

Topic: As black-box AI models grow increasingly relevant in human-centric applications, explainability and fairness become increasingly necessary for trust in adopting AI models. This seminar class introduces students to major problems in AI explainability and fairness, and explores key state-of-the-art methods so that students can apply them in future work. Key technical topics include surrogate methods, feature visualization, network dissection, adversarial debiasing, and fairness metrics. There will also be a survey of recent legal and policy trends.

From this course, students will receive an overview of the most important recent methods for explaining AI models, how explanation is currently done in practice, how to choose models based on the level of interpretability needed, and the implications of interpretability and fairness for users in applications such as healthcare, facial recognition, and regulatory decision-making.

Format: Each week, a guest lecturer from AI research, industry, or a related policy field will present an open problem and a solution, followed by a short roundtable Q&A and discussion with the class.

Participation: This class is offered C/NC. You should attend at least 6 sessions to receive credit. Given the time zone differences and the extra challenges that come with this quarter, we will accept watching the lecture videos in place of live attendance for C/NC credit if you reach out to us. We strongly encourage attending class to get the most out of the interactive, small, seminar-style lectures, but we recognize the additional challenges of this quarter. Since we keep the class size small to facilitate discussion with guest lecturers, auditing may be approved on a case-by-case basis depending on student numbers, with permission from the course staff.

Contact: If you have questions about the class, please email cs81si-spr1920-staff@lists.stanford.edu.


Week Description
1 What is AI Interpretability (& Fairness)? - Marco Ribeiro, Microsoft Research
Why should you care? Motivation: research progress, applications, regulatory and ethical considerations.
Definitions and desiderata of interpretability and fairness, terminology (model agnostic, whitebox vs blackbox, global vs local interpretability)
Taxonomy of AI interpretability research, summary of major recent developments. Tradeoff between interpretability and model accuracy.
Overview of model-agnostic methods, with LIME as an example.
2 Visualizing black-box models - David Bau, MIT
Learned content map reconstruction, latent layer observation, unit attribution.
Feature Visualization and attribution methods, Network dissection.
3 Human-Friendly Attribution - Been Kim, Google Brain
Saliency Map Sanity Checks, Tradeoffs of Interpretability Methods
Post-training explanations, Concept Activation Vector Testing
Applications and Concept Representations
4 Feature Representation & Motifs - Chris Olah, OpenAI
Neural Network abstractions, Features & Weights Visualization
Synthetic Reconstructions, Interpreting Neuron Connections and Circuitry
5 What is AI Fairness? Motivation, Theory, and Applications
Case studies and motivation (Word embedding gender biases, Amazon facial recognition, Google employment tool, COMPAS).
Desiderata and fairness terminology. Types of bias and fairness.
Legal and Ethical Considerations in Fairness: GDPR, disparate impact and treatment examples.
6 Explanations in Policy and Law
Recent trends and barriers to adoption in policy and law.
Desiderata of AI explanations, predictions.
Case study on AI explanations gone wrong and right in research and industry.
Interpretability application to adversarial attacks and privacy.
7 Interpretability as a Pipeline
Choosing models: when to use whitebox models (decision trees, monotonic GBMs, Supersparse Linear Integer Models), performance tradeoff.
Who are explanations for?
Evaluating AI Interpretability, tradeoffs of methods mentioned in class.
8 Evaluating Fairness
Quantitative and qualitative fairness evaluation.
Statistical measures of discrimination. Fairness metrics example 1: Model Cards for Model Reporting.
Model calibration.
9 Fairness Model Treatments
Reweighing, adversarial debiasing, and reject option-based classification.
Causal fairness - Learning with imbalanced data: techniques.

Last updated April 2020. Reach out to Eva if there are website issues. HTML design template thanks to 224N course staff.