DALEX for keras and parsnip

DALEX is a set of tools for explanation, exploration and debugging of predictive models. The nice thing about it is that it can be easily connected to different model factories.

Recently Michal Maj wrote a nice vignette how to use DALEX with models created in keras (an open-source neural-network library in python with an R interface created by RStudio). Find the vignette here.
Michal compared a keras model against deeplearning from h2o package, so you can check which model won on the Titanic dataset.

Next nice vignette was created by Szymon Maksymiuk. In this vignette Szymon shows how to use DALEX with parsnip models (parsnip is a part of the tidymodels ecosystem, created by Max Kuhn and Davis Vaughan). Models like boost_tree, mlp and svm_rbf are competing on the Titanic data.

These two new vignettes add to our collection how to use DALEX with mlr, caret, h2o and others model factories.

Explore the landscape of R packages for automated data exploration

Do you spend a lot of time on data exploration? If yes, then you will like today’s post about AutoEDA written by Mateusz Staniak.

If you ever dreamt of automating the first, laborious part of data analysis when you get to know the variables, print descriptive statistics, draw a lot of histograms and scatter plots – you weren’t the only one. Turns out that a lot of R developers and users thought of the same thing. There are over a dozen R packages for automated Exploratory Data Analysis and the interest in them is growing quickly. Let’s just look at this plot of number of downloads from the official CRAN repository.

Replicate this plot with

stats <- archivist::aread("mstaniak/autoEDA-resources/autoEDA-paper/52ec")
 
stat <- stats %>%
  filter(date > "2014-01-01" ) %>%
  arrange(date) %>%
  group_by(package) %>%
  mutate(cums = cumsum(count),
         packages = paste0(package, " (",max(cums),")"))
 
stat$packages <- reorder(stat$packages, stat$cums, function(x)-max(x))
 
ggplot(stat, aes(date, cums, color = packages)) +
  geom_step() +
  scale_x_date(name = "", breaks = as.Date(c("2014-01-01", "2015-01-01",
                                           "2016-01-01", "2017-01-01",
                                           "2018-01-01", "2019-01-01")),
               labels = c(2014:2019)) +
  scale_y_continuous(name = "", labels = comma) + 
  DALEX::theme_drwhy() +
  theme(legend.position = "right", legend.direction = "vertical") +
  scale_color_discrete(name="") +
  ggtitle("Total number of downloads", "Based on CRAN statistics")

New tools arrive each year with a variety of functionalities: creating summary tables, initial visualization of a dataset, finding invalid values, univariate exploration (descriptive and visual) and searching for bivariate relationships.

We compiled a list of R packages dedicated to automated EDA, where we describe twelve packages: their capabilities, their strong aspects and possible extensions. You can read our review paper on arxiv: https://arxiv.org/abs/1904.02101.

Spoiler alert: currently, automated means simply fast. The packages that we describe can perform typical data analysis tasks, like drawing bar plot for each categorical feature, creating a table of summary statistics, plotting correlations, with a single command. While this speeds up the work significantly, it can be problematic for high-dimensional data and it does not take the advantage of AI tools for actual automatization. There is a lot of potential for intelligent data exploration (or model exploration) tools.

More extensive list of software (including Python libraries and web applications) and papers is available on Mateusz’s GitHub. Researches can follow our autoEDA project on ResearchGate.

Kto myśli na rok do przodu sieje zboże (…) a kto myśli na wiele wiele lat do przodu wychowuje młodzież

Dzisiaj rozpoczyna się strajk nauczycieli. Gorąco kibicuję nauczycielom. I jako rodzic dzieci w wieku szkolnym, i jako nauczyciel akademicki, i jako entuzjasta edukacji dzieci i młodzieży. Bardzo dużo zawdzięczam moim nauczycielom, a los zetknął mnie z wieloma pozytywnie zakręconymi pasjonatami.

W czasach gospodarki opartej na wiedzy to edukacja jest sprawą kluczową. A nie ma dobrej edukacji bez pozytywnej selekcji, którą zapewnić mogą dobre warunki pracy. Dobre zarówno jeżeli chodzi o wynagrodzenia jak i stabilne podstawy programowe, możliwości rozwoju i odpowiednie wyposażenie szkół.
Dlatego popieram strajkujących nauczycieli.

Przemysław Biecek

Btw: Poniższy wykres z twittera KPRM ma współczynnik Lie-Factor przekraczający 350%. Jednak warto zwiększyć liczbę godzin matematyki w szkołach.

iBreakDown: faster, prettier and more precise explanations for predictive models (with interactions)

LIME and SHAP are two very popular methods for instance level explanations of machine learning models (XAI).
They work nicely for images and text inputs, but share similar weakness in case of tabular data: explanations are additive while complex models are (sometimes) not. iBreakDown addresses this problem.

iBreakDown is a a successor of the breakDown package. Yesterday it has arrived on CRAN. Key new features are:

– It identifies and shows feature interactions (if there are local interactions in the model).
– It is much faster. For additive explanations the complexity is O(p) instead of O(p^2).
– The plotD3 function creates an interactive D3-based break-down plot (thanks to r2d3).
– iBreakDown has a new design, created by Hanna Dyrcz. We will have a talk about it ,,Machine learning meets design. Design meets machine learning.” at satRdays. Try the new theme `theme_drwhy()`!.
– It shows explanation level uncertainty – how good are explanations?

A methodology behind this package is described in the iBreakDown: Uncertainty of Model Explanations for Non-additive Predictive Models.

A nice titanic-powered use-case is described in the titanic vignette.

An example of the D3 interactive explainer is here.

Some intuition is introduced in the Visual Exploration, Explanation and Debugging (working version, still in progress).

iBreakDown is a part of the DrWhy.AI family of explainers consistent with the DALEX.

Let us know if you like it. Feel free to create a pull request with new features, add issue with new idea or star the github repository if you like this package.