Explore the landscape of R packages for automated data exploration

Do you spend a lot of time on data exploration? If yes, then you will like today’s post about AutoEDA written by Mateusz Staniak.

If you ever dreamt of automating the first, laborious part of data analysis when you get to know the variables, print descriptive statistics, draw a lot of histograms and scatter plots – you weren’t the only one. It turns out that a lot of R developers and users thought of the same thing. There are over a dozen R packages for automated Exploratory Data Analysis and interest in them is growing quickly. Let’s just look at this plot of the number of downloads from the official CRAN repository.

Replicate this plot with

New tools arrive each year with a variety of functionalities: creating summary tables, initial visualization of a dataset, finding invalid values, univariate exploration (descriptive and visual) and searching for bivariate relationships.

We compiled a list of R packages dedicated to automated EDA, in which we describe twelve packages: their capabilities, their strengths and possible extensions. You can read our review paper on arXiv: https://arxiv.org/abs/1904.02101.

Spoiler alert: currently, automated simply means fast. The packages that we describe can perform typical data analysis tasks, like drawing a bar plot for each categorical feature, creating a table of summary statistics, or plotting correlations, with a single command. While this speeds up the work significantly, it can be problematic for high-dimensional data and it does not take advantage of AI tools for actual automation. There is a lot of potential for intelligent data exploration (or model exploration) tools.
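To make the "single command" idea concrete, here is a minimal base-R sketch of what such a function might do. The helper `quick_eda()` is hypothetical and is not taken from any of the reviewed packages; it only illustrates the pattern of looping over columns and producing summaries, plots and correlations in one call.

```r
# Hypothetical helper for illustration only; not part of any reviewed package.
quick_eda <- function(df, plot = TRUE) {
  is_num <- vapply(df, is.numeric, logical(1))

  # Summary statistics for every numeric column (one row per column)
  num_summary <- if (any(is_num)) {
    t(vapply(df[is_num], function(x) {
      c(mean = mean(x, na.rm = TRUE),
        sd = sd(x, na.rm = TRUE),
        min = min(x, na.rm = TRUE),
        max = max(x, na.rm = TRUE),
        n_missing = sum(is.na(x)))
    }, c(mean = 0, sd = 0, min = 0, max = 0, n_missing = 0)))
  } else {
    NULL
  }

  # Frequency table for every non-numeric column
  cat_tables <- lapply(df[!is_num], function(x) table(x, useNA = "ifany"))

  # One histogram per numeric column, one bar plot per categorical column
  if (plot) {
    for (name in names(df)) {
      if (is_num[[name]]) {
        hist(df[[name]], main = name, xlab = name)
      } else {
        barplot(table(df[[name]]), main = name)
      }
    }
  }

  # Pairwise correlations between numeric columns
  correlations <- if (sum(is_num) > 1) {
    cor(df[is_num], use = "pairwise.complete.obs")
  } else {
    NULL
  }

  list(numeric = num_summary, categorical = cat_tables, correlations = correlations)
}

report <- quick_eda(iris, plot = FALSE)
print(round(report$numeric, 2))
```

Calling `quick_eda(iris)` with the default `plot = TRUE` also draws one plot per column. With hundreds of columns this produces hundreds of plots, which is exactly the high-dimensional problem mentioned above.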

A more extensive list of software (including Python libraries and web applications) and papers is available on Mateusz’s GitHub. Researchers can follow our autoEDA project on ResearchGate.

Whoever thinks a year ahead sows grain (…) and whoever thinks many, many years ahead educates the young

The teachers’ strike begins today. I wholeheartedly support the teachers: as a parent of school-age children, as an academic teacher, and as an enthusiast of educating children and young people. I owe a great deal to my teachers, and fate has brought me together with many wonderfully quirky enthusiasts.

In a knowledge-based economy, education is the key issue. And there is no good education without positive selection, which only good working conditions can ensure. Good in terms of pay as well as a stable core curriculum, opportunities for development and properly equipped schools.
That is why I support the striking teachers.

Przemysław Biecek

Btw: The chart below, from the KPRM Twitter account, has a Lie Factor exceeding 350%. It really would be worth increasing the number of maths hours in schools after all.

iBreakDown: faster, prettier and more precise explanations for predictive models (with interactions)

LIME and SHAP are two very popular methods for instance-level explanations of machine learning models (XAI).
They work nicely for image and text inputs, but share a similar weakness for tabular data: the explanations are additive, while complex models are (sometimes) not. iBreakDown addresses this problem.

iBreakDown is a successor of the breakDown package. It arrived on CRAN yesterday. Key new features are:

– It identifies and shows feature interactions (if there are local interactions in the model).
– It is much faster. For additive explanations the complexity is O(p) instead of O(p^2), where p is the number of features.
– The plotD3 function creates an interactive D3-based break-down plot (thanks to r2d3).
– iBreakDown has a new design, created by Hanna Dyrcz. We will give a talk about it, “Machine learning meets design. Design meets machine learning.”, at satRdays. Try the new theme, theme_drwhy()!
– It shows explanation-level uncertainty: how reliable are the explanations?
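The ideas behind these features can be sketched in a few lines of base R. This is only an illustration of break-down style attributions, not the package's implementation: the helpers `mean_prediction()`, `break_down_path()` and `single_effect_order()` are hypothetical names, and the linear model with an interaction on `mtcars` is an assumed toy example. Variables are fixed one by one at the values of the explained observation, and each change in the average prediction is recorded as that variable's contribution.

```r
# Illustrative sketch only; function names are hypothetical, not the iBreakDown API.

# Average prediction after fixing the variables in `fixed` at the values of `x`
mean_prediction <- function(model, data, x, fixed) {
  modified <- data
  for (v in fixed) modified[[v]] <- x[[v]]
  mean(predict(model, newdata = modified))
}

# Contributions along one given ordering: a single pass with p + 1 evaluations
break_down_path <- function(model, data, x, ordering) {
  baseline <- mean_prediction(model, data, x, character(0))
  contributions <- numeric(length(ordering))
  names(contributions) <- ordering
  previous <- baseline
  for (i in seq_along(ordering)) {
    current <- mean_prediction(model, data, x, ordering[seq_len(i)])
    contributions[i] <- current - previous
    previous <- current
  }
  c(intercept = baseline, contributions)
}

# Cheap ordering heuristic: rank variables by the effect of fixing each one alone.
# This also costs about p evaluations, so ordering plus one path stays linear in p.
single_effect_order <- function(model, data, x, variables) {
  baseline <- mean_prediction(model, data, x, character(0))
  effects <- vapply(
    variables,
    function(v) mean_prediction(model, data, x, v) - baseline,
    numeric(1)
  )
  variables[order(abs(effects), decreasing = TRUE)]
}

# A deliberately non-additive toy model: wt and hp interact
model <- lm(mpg ~ wt * hp, data = mtcars)
x <- mtcars[1, ]

ordering <- single_effect_order(model, mtcars, x, c("wt", "hp"))
path_a <- break_down_path(model, mtcars, x, c("wt", "hp"))
path_b <- break_down_path(model, mtcars, x, c("hp", "wt"))

print(ordering)
print(path_a)
print(path_b)
```

Both paths sum to the model's prediction for `x`, but the contribution assigned to `wt` differs between them because of the interaction. That spread across orderings is one way to think about explanation-level uncertainty, and it disappears for a purely additive model. A greedy search that re-evaluates all remaining variables at every step would need on the order of p^2 evaluations instead.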

The methodology behind this package is described in the paper iBreakDown: Uncertainty of Model Explanations for Non-additive Predictive Models.

A nice Titanic-powered use case is described in the titanic vignette.

An example of the D3 interactive explainer is here.

Some intuition is introduced in Visual Exploration, Explanation and Debugging (working version, still in progress).

iBreakDown is part of the DrWhy.AI family of explainers and is consistent with DALEX.

Let us know if you like it. Feel free to create a pull request with new features, open an issue with a new idea, or star the GitHub repository.