ML models: What they can’t learn?

What I love in conferences are the people, that come after your talk and say: It would be cool to add XYZ to your package/method/theorem.

After the eRum (great conference by the way) I was lucky to hear from Tal Galili: It would be cool to use DALEX for teaching, to show how different ML models are learning relations.

Cool idea. So let’s see what can and what cannot be learned by the most popular ML models. Here we will compare random forest against linear models against SVMs.
Find the full example here. We simulate variables from uniform U[0,1] distribution and calculate y from following equation

In all figures below we compare PDP model responses against the true relation between variable x and the target variable y (pink color). All these plots are created with DALEX package.

For x1 we can check how different models deal with a quadratic relation. The linear model fails without prior feature engineering, random forest is guessing the shape but the best fit if found by SVMs.

With sinus-like oscillations the story is different. SVMs are not that flexible while random forest is much closer.

Turns out that monotonic relations are not easy for these models. The random forest is close but event here we cannot guarantee the monotonicity.

The linear model is the best one when it comes to truly linear relation. But other models are not that far.

The abs(x) is not an easy case for neither model.

Find the R codes here.

Of course the behavior of all these models depend on number of observation, noise to signal ratio, correlation among variables and interactions.
Yet is may be educational to use PDP curves to see how different models are learning relations. What they can grasp easily and what they cannot.