People of ACM - Cynthia Rudin

March 10, 2026

Cynthia D. Rudin is a professor who leads the Interpretable Machine Learning Lab at Duke University. Her lab, which seeks to design predictive machine learning models that people can understand, focuses on areas including healthcare, criminal justice, and energy reliability.

Among her honors, she has received the Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity from the Association for the Advancement of Artificial Intelligence (AAAI), as well as the IJCAI John McCarthy Award. Rudin was recently named an ACM Fellow for contributions to and leadership in interpretable machine learning and societal applications.

Early in your career, you used machine learning to predict which manholes in New York City were at risk of exploding due to degrading and overlooked electrical circuitry.  Will you discuss how this experience eventually led to the ideas expressed in your paper “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead”?

I was working with power engineers from Con Edison in New York City. They had an impressive amount of data collected since the very beginning of the power grid in the 1880s, but a lot of the data was very challenging to work with—it included trouble tickets typed by dispatchers while they were supervising groups of repair crews. It took us months just to understand what the documents meant, as they were not written in natural language meant for data scientists! Once the data were in reasonable shape, I tried many different machine learning methods, and they all performed very similarly. I wasn’t seeing an advantage of the extra complexity of fancy machine learning approaches over, for instance, logistic regression. The biggest benefit of a much simpler model was that the power engineers could give us feedback on it because they understood it! And some of their feedback turned out to be essential. With their feedback, the models could predict into the future; without it, the models were useless. So, interpretability saved the day, not complexity.

As it turns out, many data science problems are like this, where there’s no benefit to having a more complex model, but there’s a massive benefit to having a model you can understand. In fact, this occurs quite often with tabular data. When I realized this, I started trying to design interpretable models—really tiny models that people could understand—that didn’t sacrifice accuracy for the sake of interpretability. These models could be compared to little scorecards, or little decision trees—perhaps a formula so small that a doctor could memorize it.

At the time, interpretability held no value to the academic machine learning community. It was hard to work in a field where people didn’t value what I was trying to do, even though I knew it was what the world needed. Heck, if databases are not trustworthy, how can machine learning models built from them be trusted? I’m glad people have started to realize the essential role interpretability plays in high-stakes situations involving machine learning.

A key idea in your Rashomon set theory and Rashomon set paradigm is that “optimizing for simplicity in machine learning models won’t sacrifice accuracy.” Can you briefly discuss this and why it is important?

This relates back to the (fallacy of the) accuracy-interpretability tradeoff that I was speaking about above. Indeed, for tabular data, we can often find very simple models that perform as well as the best black box machine learning models. Why is that? It’s totally unintuitive, but it’s true! Several years ago, I started working with PhD student Lesia Semenova and my colleague Ron Parr to try to mathematically explain this unintuitive phenomenon. Our theory currently explains that noise in the world causes enough uncertainty that we can’t tell models apart statistically. This creates a lot of equally good models—a “Rashomon Effect,” as coined by the famous statistician Leo Breiman. Then, when there are a lot of equally good models, some of them are simple models. So, if you are interested in finding a good simple model, you can optimize for simplicity, constraining it to perform well. In that case, we have obtained a simple model that performs as well as more complex models.
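The reasoning above can be made concrete with a toy sketch: enumerate a family of very simple models, and keep every model whose accuracy matches the best one found. The dataset, features, and candidate rules below are invented for illustration; this is not the lab’s actual algorithm, just a minimal picture of what a Rashomon set is.

```python
# Toy Rashomon set: enumerate single-feature threshold rules and keep
# every rule whose accuracy is within epsilon of the best. All data and
# feature names here are made up for the sketch.

def accuracy(model, rows):
    """Fraction of rows where 'predict True if feature >= threshold' is right."""
    feat, thresh = model
    return sum((row[feat] >= thresh) == label for row, label in rows) / len(rows)

# Tiny synthetic tabular dataset: (features, binary label).
rows = [
    ({"age": 70, "bp": 150}, True),
    ({"age": 30, "bp": 120}, False),
    ({"age": 65, "bp": 145}, True),
    ({"age": 40, "bp": 125}, False),
    ({"age": 80, "bp": 160}, True),
    ({"age": 25, "bp": 118}, False),
]

# Candidate simple models: predict True when the feature meets a threshold.
candidates = [(feat, t) for feat in ("age", "bp") for t in range(20, 200, 5)]

best = max(accuracy(m, rows) for m in candidates)
epsilon = 0.0  # widen this to admit slightly-worse models too
rashomon_set = [m for m in candidates if accuracy(m, rows) >= best - epsilon]
```

Even on this tiny example, many distinct rules tie for the best accuracy; the Rashomon set paradigm hands all of them to the user, who can then pick the one that makes the most domain sense.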

But once you’ve developed simple models, another problem shows up in practice. Often when we give the user an interpretable model thinking they are going to be delighted, they aren’t! As soon as they can understand what the model is doing, they want to change it. This doesn’t happen with black box models because it’s not possible for humans to understand or critique black boxes, but it happens all the time for interpretable models. So, we needed to add flexibility for users to select a good model. However, the standard machine learning paradigm isn’t interactive; machine learning algorithms produce only one model at a time, creating an “interaction bottleneck” with users. This is why I worked with Chudi Zhong, Margo Seltzer, Rui Xin, Zhi Chen and others to create the “Rashomon set paradigm” for machine learning. In the Rashomon set paradigm, machine learning algorithms produce all good interpretable models at the same time, and the user can look through them using a visualization interface. It allows users the flexibility to design models themselves instead of being stuck with whatever the algorithm gives them.

I’m really excited about this paradigm. I think it’s going to be the way people construct machine learning models for high stakes decisions. It’s just so much easier this way!

You’ve noted that a central challenge in interpretable machine learning is building optimal risk scores. For those who are unfamiliar with your field, what are optimal risk scores? What recent advances make you hopeful about progress in this area?

Risk scores are arguably the most popular type of interpretable machine learning model. These are very tiny models that add up “points” to produce a predicted risk. These kinds of models are used throughout medicine and criminal justice and in many other areas, and they have been used in criminal justice for over a century (well before computers existed!).

The most famous risk score is probably the CHADS2 score, which estimates stroke risk in patients with atrial fibrillation. It gives patients a point for each of the following: history of congestive heart failure, hypertension, age 75 or older, and diabetes, and it gives patients two points for a past stroke. CHADS2 is a full-blown predictive model even though it’s really tiny. It was constructed by a team of doctors, whereas my lab is trying to construct these kinds of models directly from datasets. One of the models my lab created (with Berk Ustun, Brandon Westover, and Aaron Struck) is called the 2HELPS2B score, and it is widely used in intensive care units across the country. It predicts whether a patient is likely to have a seizure from their EEG brain signals. It helps doctors make decisions about how to care for very ill patients. It is a rare case of a machine learning model being used in a high-stakes situation. And that can only happen because the model is interpretable. Doctors don’t need to trust it, and they can judge for themselves whether to use it.
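The point arithmetic behind CHADS2 is simple enough to write out directly, which is exactly what makes such models interpretable. A sketch (the parameter names are my own; the criteria are the standard clinical ones, with one point each for congestive heart failure, hypertension, age 75 or older, and diabetes, and two points for a prior stroke):

```python
def chads2_score(chf, hypertension, age, diabetes, prior_stroke):
    """CHADS2 stroke-risk score for atrial fibrillation patients.

    One point each for congestive heart failure, hypertension,
    age >= 75, and diabetes; two points for a prior stroke.
    Returns an integer from 0 (lowest risk) to 6 (highest risk).
    """
    score = 0
    score += 1 if chf else 0
    score += 1 if hypertension else 0
    score += 1 if age >= 75 else 0
    score += 1 if diabetes else 0
    score += 2 if prior_stroke else 0
    return score

# Example: an 80-year-old with hypertension and a past stroke
# scores 1 (age) + 1 (hypertension) + 2 (stroke) = 4.
```

A doctor can verify every step of this computation by hand, which is why such a tiny model can be trusted in a way a black box cannot.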

Our latest code for producing risk scores is called FasterRisk, written with Jiachang Liu, Chudi Zhong, and Margo Seltzer. We’ve found it to be really useful for producing risk scores, and the code is public so anyone can use it!

The PaCMAP algorithm for data visualization (which you co-authored with Yingfan Wang, Haiyang Huang, and Yaron Shaposhnik) has won two software awards from the American Statistical Association. Why has this algorithm been particularly popular with scientists working in bioinformatics, biology and ecology?

Before working on PaCMAP, I had not realized how incredibly useful dimension reduction for data visualization is. It is amazingly useful! Almost every project in my lab now uses PaCMAP to visualize data because it gives a bird’s eye view of clusters, as well as manifolds and branching structures in the data that we can’t see any other way. It is a great way to make hypotheses that can be tested to make scientific discoveries.

PaCMAP is particularly useful because it preserves the structure of high dimensional data better than previous methods like t-SNE and UMAP; PaCMAP’s results are just more trustworthy. The students who developed it—Yingfan Wang and Haiyang Huang—figured out that they could simultaneously preserve both the large-scale global structure and the small-scale local structure of the data. This is useful for researchers in bioinformatics and biology since it leads to fewer false discoveries that they could waste a lot of time investigating.

PaCMAP is used in many different fields—it’s used in everything from ecology to marketing to the study of people’s names. It had over 43K downloads last month from PyPI alone, which is only one of several ways it can be accessed. My students Yiyang Sun and Gaurav Parikh have been doing a great job maintaining it, and it has a great ecosystem of users that help us, too!

What’s another important trend in your field we should be keeping an eye on?

An important thing to remember about interpretable machine learning is that it is not the same as explainable machine learning. In explainable machine learning (XAI), the goal is to “explain” predictions of a black box model. The explanations are incomplete and sometimes flawed, and such explanations cannot be used in high stakes decisions. Interpretable machine learning is different. There is no black box at all, and the models are constrained so that humans can directly understand their reasoning processes.

Unfortunately, people get confused between these subfields since they have names that mean the same thing in English. Plus, many people doing explainability research don’t use consistent terminology. But they are different! One of them (interpretability) can be used in high stakes decisions and the other one (explainability) cannot. What I have been concerned about for a long time is that people will forgo designing interpretable models for high stakes decisions when they think they can just design black box models and “explain” them instead.

Another important trend in my field is trying to figure out what to do with LLMs. So far, they have mostly defied being explained. The closest we’ve come is to ask LLMs to use interpretable tools when they can, so that more of their reasoning process can be understood. But LLMs are hard to train, which makes it difficult to do research in that area.

