No worries! Afterthoughts from UseR 2018


This year the UseR conference took place in Brisbane, Australia. UseR is my favorite conference and this one was my 11th (counting from Dortmund 2008).
Every UseR is unique. Every UseR is great. But my feeling is that European UseRs are (on average) more about math, statistics and methodology, while US UseRs are more about big data, data science, technology and tools.

So, how was the one in Australia? Was it more similar to the European or the US ones?

IMHO – neither of them. 
This one was (for me) about welcoming new users, being open to a diverse community, being open to change, and caring about R culture. Traces of these values were present in most keynotes.

Speaking of keynotes: all of them were great, but “Teaching R to New Users” given by Roger Peng was outstanding. I will use the video or the essay as MUST READ material for students in my R programming classes.

The venue, talks and atmosphere were great as well (thanks to the organizing crew led by Di Cook). Lots of people (including myself) spent time around the hex wall looking for their favorite packages (you can read more about it here). There was an engaging team exercise during the conference dinner (how much does your table know about R?). The poster session was run on TV screens, so some posters were interactive (Miles McBain had a poster related to R and Virtual Reality, cool).

Last but not least, there was a great mixture of contributed talks and workshops. Everyone could find something for themselves. All too often it was hard to choose between a few tempting options (fortunately, talks are recorded).
Here I would like to mention three talks I found inspiring.

“The Minard Paradox” given by Paul Murrell was refreshing.
One may think that nowadays we are so good at data vis, with all these shiny tools and interactive widgets. Yet Paul showed how hard it is to reproduce great works like Minard’s Map even in cutting-edge software (i.e. R). God is in the detail. Watch Paul’s talk here.

“Data Preprocessing using Recipes” given by Max Kuhn touched on an important, yet often neglected truth: columns in the source data are not necessarily the final features. Between ‘read the data’ and ‘fit the model’ there is an important process of feature engineering. This process needs to be reproducible and needs to be based on a well-planned grammar. The recipes package helps here. Find the recipes talk here (the tutorial is also recorded).

“Glue strings to data in R” given by James Hester presents a package that does only one thing (gluing strings) but does it extremely well. I did not expect 20 absorbing minutes focused only on gluing strings. Yet, this is my third favourite. Watch it here.

David Smith shared his highlights here. You will find quite a collection of links there.

Videos of recorded talks, keynotes and tutorials are on the R Consortium YouTube channel.

Local Goodness-of-Fit Plots / Wangkardu Explanations – a new DALEX companion

The next DALEX workshop will take place in 4 days at UseR. In the meantime I am working on a new explainer for a single observation.
Something like a diagnostic plot for a single observation. Something that extends Ceteris Paribus Plots. Something similar to Individual Conditional Expectation (ICE) Plots. An experimental version is implemented in the ceterisParibus package.
 
Intro

For a single observation, Ceteris Paribus Plots (What-If plots) show how model predictions change along a single variable while all other variables are kept fixed. But they do not tell whether the model is well fitted around this observation.
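To make the idea of a Ceteris Paribus profile concrete, here is a minimal sketch in plain Python. It is not the ceterisParibus implementation: the function `ceteris_paribus_profile`, the toy model `toy_predict` and the apartment values are all hypothetical and serve only as an illustration.

```python
def ceteris_paribus_profile(predict, observation, variable, grid):
    """Return (value, prediction) pairs obtained by varying one variable only."""
    profile = []
    for value in grid:
        modified = dict(observation)   # copy, so the original observation is untouched
        modified[variable] = value     # change only the selected variable
        profile.append((value, predict(modified)))
    return profile


# Hypothetical toy model: price grows with surface and drops with floor.
def toy_predict(x):
    return 1000 + 30 * x["surface"] - 20 * x["floor"]


apartment = {"surface": 85, "floor": 3}
profile = ceteris_paribus_profile(toy_predict, apartment, "surface", [50, 85, 120])
```

Plotting `profile` (surface on OX, prediction on OY) gives one line of a Ceteris Paribus Plot. Note that the true response y appears nowhere in this computation, which is why such a profile alone says nothing about the quality of the fit.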

Here is an idea of how to fix this:
(1) Take the N points from the validation dataset that are closest to the selected observation (Gower distance is used by default).
(2) Plot N Ceteris Paribus Plots for these points.
(3) Since we know the true y for these points, we can also plot the model residuals at these points.
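Steps (1) and (3) can be sketched as follows. This is an illustration under stated assumptions, not the code from the ceterisParibus package: all features are assumed numeric, so Gower distance reduces to the mean of range-scaled absolute differences (the full Gower distance also handles categorical variables), and the names `gower_distance`, `local_fit`, `toy_predict` as well as the data are hypothetical.

```python
def gower_distance(a, b, ranges):
    """Gower distance for numeric features: mean of range-scaled absolute differences."""
    return sum(abs(a[k] - b[k]) / ranges[k] for k in ranges) / len(ranges)


def local_fit(predict, observation, validation, n):
    """Return the n nearest validation rows with their residuals (true y - prediction)."""
    # Feature ranges over the validation set; `or 1.0` guards against constant columns.
    ranges = {
        k: (max(row["x"][k] for row in validation)
            - min(row["x"][k] for row in validation)) or 1.0
        for k in observation
    }
    nearest = sorted(
        validation, key=lambda row: gower_distance(observation, row["x"], ranges)
    )[:n]
    return [(row["x"], row["y"] - predict(row["x"])) for row in nearest]


# Hypothetical toy model and validation data (x = features, y = true response).
def toy_predict(x):
    return 1000 + 30 * x["surface"] - 20 * x["floor"]


validation = [
    {"x": {"surface": 80, "floor": 3}, "y": 3400},
    {"x": {"surface": 30, "floor": 9}, "y": 1700},
    {"x": {"surface": 90, "floor": 2}, "y": 3600},
    {"x": {"surface": 130, "floor": 1}, "y": 5000},
]
neighbours = local_fit(toy_predict, {"surface": 85, "floor": 3}, validation, n=2)
```

Each selected neighbour would then get its own Ceteris Paribus profile (step 2), and its residual would be drawn as a vertical segment between the prediction and the true y.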
 
Examples

Here we have an example for a random forest model. The validation dataset has 9000 observations. We use the N=18 observations closest to the observation of interest to show the model stability and the local goodness-of-fit.



The empty circle in the middle stands for the observation of interest. We can read off its surface (OX axis, around 85 sq meters) and the model prediction (OY axis, around 3260 EUR).
The thick line stands for the Ceteris Paribus Plot for the observation of interest.
Grey points stand for the 18 closest observations from the validation dataset, while grey lines are their Ceteris Paribus Plots.
Red and blue lines stand for the residuals for these neighbours. The corresponding true values of y are marked with red and blue circles.

Red and blue intervals are short and symmetric, so one may say that the model is well fitted around the observation of interest.