Tell Me a Story: How to Generate Textual Explanations for Predictive Models

TL;DR: If you are going to explain predictions of a black-box model, you should combine statistical charts with natural language descriptions. This combination is more powerful than SHAP/LIME/PDP/Break Down charts alone. This summer Adam Izdebski implemented this feature for explanations generated in R with the DALEX library. How did he do it? Find out here:

Long version:
Amazing things were created during summer internships at MI2DataLab this year. One of them is the generator of natural language descriptions for DALEX explainers developed by Adam Izdebski.

Is text better than charts for explanations?
Packages from the DrWhy.AI toolbox generate lots of graphical explanations for predictive models. The available statistical charts make it easier to understand how a model works in general (global perspective) or for a specific prediction (local perspective).
Yet for domain experts without training in mathematics or computer science, graphical explanations may be insufficient. Charts are great for exploration and discovery, but for explanations they introduce some ambiguity. Have I read everything? Maybe I missed something?
To address this problem we introduced the describe() function, which automatically generates textual explanations for predictive models. Right now these natural language descriptions are implemented in the R packages ingredients and iBreakDown.

Insufficient interpretability
Domain experts without formal training in mathematics or computer science often find statistical explanations hard to interpret. There are various reasons for this. First of all, explanations are often displayed as complex plots without instructions. Often there is no clear narration or interpretation visible. Plots use different scales, mixing probabilities with relative changes in the model’s prediction. The order of variables may also be misleading. See for example a Break-Down plot.

The figure displays the prediction, generated with Random Forest, that a selected passenger survived the sinking of the Titanic. The model’s average response on the titanic data set (intercept) is equal to 0.324. The model predicts that the selected passenger survived with probability 0.639. The figure also displays all the variables that contributed to that prediction. Once the plot is described it is easy to interpret, as it has a very clear graphical layout.
However, interpreting it for the first time may be tricky.

Properties of a good description
Effective communication and argumentation is a difficult craft. For that reason, we use strategies from guides to winning debates as guidance for generating persuasive textual explanations. First of all, any description should be intelligible and persuasive. We achieve this by using:

  • Fixed structure: Effective communication requires a rigid structure. Thus we generate descriptions from a fixed template that always includes a proper introduction, argumentation and conclusion. This makes the description more predictable, hence more intelligible.
  • Situation recognition: To make a description more trustworthy, we begin generating the text by identifying which scenario we are dealing with. Currently, the following scenarios are available:
    • The model prediction is significantly higher than the average model prediction. In this case, the description should convince the reader why the prediction is higher than the average.
    • The model prediction is significantly lower than the average model prediction. In this case, the description should convince the reader why the prediction is lower than the average.
    • The model prediction is close to the average. In this case, the description should convince the reader that either the variables contradict each other or the variables are insignificant.

Identifying what should be justified is a crucial step in generating persuasive descriptions.
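The scenario recognition step can be sketched in a few lines of base R. This is only an illustration, not the code used in iBreakDown: the function name recognise_scenario() and the 0.1 threshold that stands for “significantly” are our assumptions.

```r
# Hypothetical helper (not part of iBreakDown): decide which scenario
# the description has to argue for.
recognise_scenario <- function(prediction, average, threshold = 0.1) {
  # 'threshold' is an assumed absolute tolerance around the average
  difference <- prediction - average
  if (difference > threshold) {
    "higher"   # justify why the prediction is higher than the average
  } else if (difference < -threshold) {
    "lower"    # justify why the prediction is lower than the average
  } else {
    "close"    # variables contradict each other or are insignificant
  }
}

# Values from the Titanic example: prediction 0.639, average 0.324
scenario <- recognise_scenario(prediction = 0.639, average = 0.324)
print(scenario)
```

In practice the threshold would probably depend on the spread of the model’s predictions rather than being a fixed number.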

Description template for persuasive argumentation
As noted before, to achieve clarity we generate descriptions with three separate components: an introduction, an argumentation part, and a summary.

An introduction should provide a claim. It is the basic point that an arguer wishes to make. In our case, it is the model’s prediction. Displaying additional information about the distribution of predictions helps to place it in context: is it low, high or close to the average?

An argumentation part should provide evidence and a reason that connects the evidence to the claim. In a normal setting this works as follows: This particular passenger survived the catastrophe (claim) because they were a child (evidence no. 1) and children were evacuated from the ship first, as in the phrase “women and children first” (reason no. 1). What is more, the child was traveling in first class (evidence no. 2) and first-class passengers had the best cabins, which were close to the rescue boats (reason no. 2).

The tricky part is that we are not able to make up a reason automatically, as it is a matter of context and interpretation. However, what we can do is highlight the main evidence that made the model produce the claim. If a model is making its predictions for the right reasons, the evidence should make sense and it should be easy for the reader to build a story that connects the evidence to the claim. If the model displays evidence that makes little sense, this should be a clear signal that the model may not be trustworthy.

A summary is just the rest of the justification. It states that the other pieces of evidence are less important, so they may be omitted. A good rule of thumb is to display the three most important pieces of evidence, so as not to make the picture too complex. This scheme corresponds to what guides to winning debates call creating relational arguments.
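The three-part template can be sketched in base R as follows. This is a simplified illustration under our own assumptions, not the iBreakDown implementation: describe_contributions() is a hypothetical name, the wording of the sentences is ours, and the contributions are made-up numbers chosen only so that they add up to the difference between 0.639 and 0.324.

```r
# Hypothetical sketch of a template-based description (not the iBreakDown code).
describe_contributions <- function(model_label, prediction, average,
                                   contributions, max_arguments = 3) {
  # Introduction: state the claim and place it in context.
  relation <- if (prediction >= average) "higher than" else "lower than"
  introduction <- sprintf(
    "%s predicts %.3f for the selected instance, which is %s the average (%.3f).",
    model_label, prediction, relation, average
  )

  # Argumentation: keep only the strongest pieces of evidence.
  ordered <- contributions[order(abs(contributions), decreasing = TRUE)]
  top <- head(ordered, max_arguments)
  rest <- ordered[-seq_along(top)]
  increasing <- names(top)[top > 0]
  decreasing <- names(top)[top < 0]
  argumentation <- c(
    if (length(increasing) > 0)
      sprintf("The most important variables that increase the prediction: %s.",
              paste(increasing, collapse = ", ")),
    if (length(decreasing) > 0)
      sprintf("The most important variables that decrease the prediction: %s.",
              paste(decreasing, collapse = ", "))
  )

  # Summary: everything else is reported as one aggregated number.
  summary_part <- sprintf(
    "Other variables are less important. Their total contribution is %.3f.",
    sum(rest)
  )

  paste(c(introduction, argumentation, summary_part), collapse = "\n")
}

# Made-up contributions, for illustration only
contributions <- c(gender = 0.30, fare = 0.15, class = -0.072,
                   age = -0.04, embarked = -0.023)
description <- describe_contributions("Random Forest", 0.639, 0.324, contributions)
cat(description, "\n")
```

Keeping the three parts as separate strings makes it easy to switch any of them off, for example to produce a shorter description.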

The logic described above is implemented in the ingredients and iBreakDown packages.

To generate a description, pass the explanation generated by ceteris_paribus(), break_down() or shap() to the describe() function.

# Random Forest predicts, that the prediction for the selected instance is 0.639 which is higher than the average.
# The most important variables that increase the prediction are gender, fare.
# The most important variable that decrease the prediction is class.
# Other variables are with less importance. The contribution of all other variables is -0.063 .

There are various parameters that control the display of the description making it more flexible, thus suited for more applications. They include:

  • generating a short version of descriptions,
  • displaying predictions’ distribution details,
  • generating more detailed argumentation.

While explanations generated by iBreakDown are feature attribution explanations that aim at providing interpretable reasons for the model’s prediction, explanations generated by ingredients are rather speculative. In short, they explain how the model’s prediction would change if we perturbed the instance being explained. For example, the ceteris_paribus() explanation explores how the prediction would change if we changed the values of a single feature while keeping the other features unchanged.

describe(ceteris_paribus_explanation, variables = "age")
# For the selected instance Random Forest predicts that , prediction is equal to 0.699.
# The highest prediction occurs for (age = 16), while the lowest for (age = 74).
# Breakpoint is identified at (age = 40).
# Average model responses are *lower* for variable values *higher* than breakpoint.
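To show where the numbers in such a description can come from, here is a base R sketch that builds a ceteris paribus profile by hand and summarises it. Everything in it is an assumption made for illustration: toy_predict() stands in for a fitted model, the instance is invented, and the breakpoint rule (the grid point after the largest change in prediction) is our simplification, not necessarily the rule used in ingredients.

```r
# Toy stand-in for a fitted model (an assumption): the predicted survival
# probability decreases with age and increases with fare.
toy_predict <- function(data) {
  plogis(1.5 - 0.05 * data$age + 0.01 * data$fare)
}

# Invented instance to be explained
instance <- data.frame(age = 8, fare = 72)

# Ceteris paribus profile: vary age on a grid, keep everything else fixed.
age_grid <- seq(0, 80, by = 2)
profile_data <- instance[rep(1, length(age_grid)), , drop = FALSE]
profile_data$age <- age_grid
profile <- data.frame(age = age_grid, prediction = toy_predict(profile_data))

# Ingredients of the description
highest <- profile$age[which.max(profile$prediction)]
lowest <- profile$age[which.min(profile$prediction)]

# Assumed breakpoint rule: the grid point right after the largest change.
changes <- abs(diff(profile$prediction))
breakpoint <- profile$age[which.max(changes) + 1]

cat(sprintf(
  "The highest prediction occurs for (age = %g), while the lowest for (age = %g).\n",
  highest, lowest
))
cat(sprintf("Breakpoint is identified at (age = %g).\n", breakpoint))
```

For a real model the profile would come from ceteris_paribus(), and only the summarising step would be needed.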

Applications and future work

Generating natural language explanations is a sensitive task, as interpretability always depends on the end user’s cognition. For this reason, experiments should be designed to assess the usefulness of the generated descriptions. Furthermore, more flexible vocabulary could be added to make the descriptions more human-like. Descriptions could also be integrated with a chatbot that would explain predictions interactively, using the framework described here. Finally, better discretization techniques could be used to generate better textual explanations for continuous ceteris paribus and aggregated profiles.

modelDown is now on CRAN!

The modelDown package turns classification or regression models into static HTML websites.
With one command you can convert one or more models into a website with visual and tabular model summaries, such as model performance, feature importance, single feature response profiles and basic model audits.

modelDown uses DALEX explainers, so it is model agnostic (feel free to combine random forest with glm) and easy to extend and parameterise.

Here you can browse an example website automatically created for 4 classification models (random forest, gradient boosting, support vector machines, k-nearest neighbours). The R code behind this example is here.

Fun facts:

– archivist hooks are generated for every documented object. So you can easily extract R objects from the HTML website. Try


– Session info is automatically recorded, so you can check the versions of packages available at model development (

– This package was initially created by Magda Tatarynowicz, Kamil Romaszko and Mateusz Urbański from Warsaw University of Technology as a student project.

iBreakDown: faster, prettier and more precise explanations for predictive models (with interactions)

LIME and SHAP are two very popular methods for instance-level explanations of machine learning models (XAI).
They work nicely for image and text inputs, but share a similar weakness for tabular data: explanations are additive while complex models are (sometimes) not. iBreakDown addresses this problem.

iBreakDown is a successor of the breakDown package. It arrived on CRAN yesterday. Key new features are:

– It identifies and shows feature interactions (if there are local interactions in the model).
– It is much faster. For additive explanations the complexity is O(p) instead of O(p^2).
– The plotD3 function creates an interactive D3-based break-down plot (thanks to r2d3).
– iBreakDown has a new design, created by Hanna Dyrcz. We will give a talk about it, “Machine learning meets design. Design meets machine learning.”, at satRdays. Try the new theme `theme_drwhy()`!
– It shows explanation level uncertainty – how good are explanations?
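The difference between additive and non-additive models can be illustrated with a base R sketch of the sequential break-down idea: variables are fixed one by one at the values of the explained instance, and each change in the average prediction is attributed to the variable just fixed. For one fixed ordering this takes p steps. The function break_down_sketch(), the synthetic data and both toy models are our assumptions for illustration, not the iBreakDown implementation.

```r
set.seed(1)

# Synthetic data and toy prediction functions (assumptions for illustration)
data <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
additive_model <- function(d) 1 + 2 * d$x1 - d$x2 + 0.5 * d$x3
interaction_model <- function(d) 1 + 2 * d$x1 * d$x2 + 0.5 * d$x3
instance <- data.frame(x1 = 1, x2 = 2, x3 = -1)

# Hypothetical helper: contributions for one fixed ordering of variables.
break_down_sketch <- function(predict_fun, data, instance, variable_order) {
  baseline <- mean(predict_fun(data))   # average model response (intercept)
  current <- data
  previous <- baseline
  contributions <- numeric(0)
  for (variable in variable_order) {
    # Fix this variable at the instance's value for all observations
    current[[variable]] <- instance[[variable]]
    value <- mean(predict_fun(current))
    contributions[variable] <- value - previous
    previous <- value
  }
  c(intercept = baseline, contributions)
}

# Additive model: contributions do not depend on the ordering
add_forward <- break_down_sketch(additive_model, data, instance, c("x1", "x2", "x3"))
add_backward <- break_down_sketch(additive_model, data, instance, c("x3", "x2", "x1"))

# Model with an interaction: contributions of x1 and x2 depend on the ordering
int_forward <- break_down_sketch(interaction_model, data, instance, c("x1", "x2", "x3"))
int_swapped <- break_down_sketch(interaction_model, data, instance, c("x2", "x1", "x3"))

print(rbind(add_forward, add_backward[names(add_forward)]))
print(rbind(int_forward, int_swapped[names(int_forward)]))
```

In both cases the intercept and the contributions sum to the prediction for the instance. When the contributions change with the ordering, that is a hint that a local interaction is present and a purely additive explanation may mislead.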

The methodology behind this package is described in iBreakDown: Uncertainty of Model Explanations for Non-additive Predictive Models.

A nice titanic-powered use-case is described in the titanic vignette.

An example of the D3 interactive explainer is here.

Some intuition is introduced in Visual Exploration, Explanation and Debugging (working version, still in progress).

iBreakDown is part of the DrWhy.AI family of explainers and is consistent with DALEX.

Let us know if you like it. Feel free to create a pull request with new features, open an issue with a new idea, or star the GitHub repository if you like this package.

The bank will have to explain… or, on explainable predictive models

What are explainable predictive models?

Interpretable machine learning (IML) or explainable artificial intelligence (XAI) is a relatively new, and recently very rapidly developing, branch of machine learning.

In short, the point is to build models for which a human can understand where the model’s decisions come from. Complex models such as random forests or deep networks are fine, as long as we can somehow explain what influenced a particular decision of the model.

What for?

In recent years machine learning has often been practised “Kaggle style”. The only criterion for evaluating a model was its performance on some fixed test set. Framing the problem this way often turns into pointlessly squeezing out the last 0.00001% of accuracy on the test set.

Models squeezed this hard most often fail epically when they collide with reality. In my presentations I like to mention the examples of Google Flu, Watson for Oncology, Amazon CV, COMPAS and recidivism, or examples from the book “Weapons of Math Destruction”. But the list is much longer.

Why is it so important?

In February the Panoptykon foundation wrote “An end to the ‘black box’ in granting loans”. Last Thursday (21 March) Bankier published an interesting article, “The bank will have to explain why it refused a loan”, which describes some of the consequences of the act passed by the Senate.

An example quote:
“The act also introduces a provision requiring banks to present the customer with an explanation of which personal data influenced the final creditworthiness assessment. This obligation will apply both when the decision was made in a fully automated process, on the basis of so-called algorithms, and when a human also took part in making the decision.”

So it looks like explainable machine learning will soon meet us at bank counters whenever credit decisions are made.

Not only banks

It turns out that on Thursday explainability was discussed not only in the Senate. That day I happened to be at a very interesting conference, the Polish Business Analytics Summit, where Dr Andrey Sharapov talked about how Lidl uses XAI and IML techniques to better support decisions.

Building a model is easy, but showing its results to the business in a way that lets it make better decisions based on them is the real challenge for XAI. Andrey Sharapov runs an interesting LinkedIn group where he posts materials on explainable machine learning. Many items can also be found on this list.

The photo below happens to show an example of using the Break Down technique (made in MI2 Data Lab!!!) to support decisions about marketing campaigns.

Warsaw, for the third time

It is hard to believe the coincidence, but on the same day (yes, I am still writing about 21 March) at Spotkania Entuzjastów R (the R Enthusiasts Meetings) Professor Marco Robnik discussed various explainability techniques based on permutations.

He focused on the EXPLAIN and IME techniques, but also covered LIME and SHAP, and some slides showed our DALEX and live (although we would probably now be advertising the newer solution by Mateusz Staniak, the localModels package).

Btw, the meeting was recorded, so it should be available on YouTube soon.

Where can I learn more?

Explainable machine learning is the research subject of a large share of the people at MI2DataLab. We are developing DrWhy.AI, a platform for automated analysis, exploration and explanation of predictive models.

Soon I will write more about materials and occasions where you can learn more about interesting applications of explainable machine learning techniques in finance, personalised medicine and other interesting places.

DALEX has a new skin! Learn how it was designed at gdansk2019.satRdays

DALEX is an R package for visual explanation, exploration, diagnostics and debugging of predictive ML models (aka XAI – eXplainable Artificial Intelligence). It has a bunch of visual explainers for different aspects of predictive models. Some of them are useful during model development, others for fine tuning, model diagnostics or model explanations.

Recently Hanna Dyrcz designed a new beautiful theme for these explainers. It’s implemented in the `DALEX::theme_drwhy()` function.
Find some teaser plots below. A nice Interpretable Machine Learning story for the Titanic data is presented here.

Hanna is a very talented designer. So I’m super happy that at the next satRdays @ gdansk2019 we will have a joint talk, “Machine Learning meets Design. Design meets Machine Learning”.

New plots are available in the GitHub version of DALEX 0.2.8 (please star it if you like it or use it; this helps to attract new developers). It will get to CRAN soon (I hope).

Instance level explainers, like Break Down or SHAP

Instance level profiles, like Ceteris Paribus or Partial Dependency

Global explainers, like Variable Importance Plots

See you at satRdays!