MinechaRts #1 (Minecraft + R + Edgar Anderson’s Iris Data)

How to use R to draw 3D scatterplots in Minecraft? Let’s see.

Minecraft is a game about placing blocks and going on adventures (source). Blocks are usually placed by players, but there are add-ons that let you add, modify, or remove blocks through an external API.
This feature is used in educational materials that show how to use Minecraft to learn Python (or how to use Python to modify Minecraft worlds, see this book for example). You need to master loops to build a pyramid or a cube, and you need a bit of math to build an igloo or a fractal. You can find a lot of cool examples by googling ’minecraft python programming’.

So Python + Minecraft is a great combination, but how do you do similar things in R?
You need to do just three things:

  1. Install the Spigot Minecraft Server along with all required dependencies. Detailed instructions on how to do this are here.
  2. Create a socket connection to the Minecraft Server on port 4711. In R it’s just a single line
    conn <- socketConnection(host="localhost", port = 4711, blocking=TRUE, server=FALSE, open="r+")
  3. Send building instructions through this connection. For example
    writeLines("world.setBlocks(0,70,0,10,80,10, 46)", conn)

    will create an 11x11x11 cube made of TNT blocks (id=46 is TNT, see the full list here), placed between coordinates (0,70,0) and (10,80,10). You can add and remove blocks, move players, spawn entities and so on. See a short overview of the server API.
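As a minimal sketch of step 3, the command string can be assembled by a small helper instead of being typed by hand. The helper names `set_blocks_cmd` and `box_size` are hypothetical (they are not part of any package), and the snippet only builds the text; sending it still requires an open connection `conn` to a running server.

```r
# Hypothetical helper: builds the text command that fills the box
# between two corners with a single block id.
set_blocks_cmd <- function(x1, y1, z1, x2, y2, z2, id) {
  paste0("world.setBlocks(",
         paste(x1, y1, z1, x2, y2, z2, id, sep = ","),
         ")")
}

# Both corners are inclusive, so each side spans (max - min + 1) blocks.
box_size <- function(x1, y1, z1, x2, y2, z2) {
  c(x = abs(x2 - x1) + 1, y = abs(y2 - y1) + 1, z = abs(z2 - z1) + 1)
}

cmd  <- set_blocks_cmd(0, 70, 0, 10, 80, 10, 46)
size <- box_size(0, 70, 0, 10, 80, 10)
# writeLines(cmd, conn)  # would send the command, given an open `conn`
```

Because both corners are included, the box from (0,70,0) to (10,80,10) has 11 blocks along each side.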

The R code below creates a connection to the Minecraft server, builds a flat grassland around the spawning point, and plots a 3D scatterplot with 150 blocks (surprise, surprise: block coordinates correspond to the Sepal.Length, Sepal.Width and Petal.Length variables from the iris dataset).

# A useful function
addBlock <- function(x, y, z, b, conn) {
  # coordinates and block id are rounded because the API expects integers
  writeLines(paste0("world.setBlock(",round(x),",",round(y),",",round(z),",",round(b),")"), conn)
}

# Connect to the server (install and run https://www.nostarch.com/download/LTPWM_ch01_update_online.pdf)
conn <- socketConnection(host="localhost", port = 4711, blocking=TRUE, server=FALSE, open="r+")
baseline <- 70

# Add two layers of grass and wipe out everything above
writeLines(paste0("world.setBlocks(-80,",baseline-2,",-80,180,",baseline,",180,2)"), conn)
writeLines(paste0("world.setBlocks(-50,",baseline+1,",-50,150,",baseline+50,",150,0)"), conn)

# And now add blocks based on iris data
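for (i in 1:nrow(iris)) {
  # Only the y (height) line survived in this listing. The x and z lines are
  # reconstructed by analogy with it, and the block id 46 (TNT) is a
  # placeholder assumption.
  addBlock(10*(iris[i,"Sepal.Length"] - min(iris[,"Sepal.Length"])),
           baseline + 2 + 10*(iris[i,"Sepal.Width"] - min(iris[,"Sepal.Width"])),
           10*(iris[i,"Petal.Length"] - min(iris[,"Petal.Length"])),
           46, conn)
}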

If you do not like scatterplots try barcharts 😉
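A bar chart can be sketched in the same spirit: each bar is one `world.setBlocks` box. The snippet below only builds the command strings. The helper name `bar_cmds`, the bar width and spacing, the scaling factor, and the block id 46 are all illustrative assumptions, and sending the commands requires an open connection to a running server.

```r
# Hypothetical helper: one setBlocks command per bar.
# Bars start at `baseline + 1`, are 2 blocks wide and 3 blocks apart.
bar_cmds <- function(heights, baseline = 70, id = 46) {
  h  <- pmax(1, round(heights))        # at least one block per bar
  x1 <- 3 * (seq_along(h) - 1)         # left edge of each bar
  paste0("world.setBlocks(",
         x1, ",", baseline + 1, ",0,",
         x1 + 1, ",", baseline + h, ",1,",
         id, ")")
}

# Mean Sepal.Length per species, doubled so the bars differ visibly
means <- tapply(iris$Sepal.Length, iris$Species, mean)
cmds  <- bar_cmds(2 * means)
# for (cmd in cmds) writeLines(cmd, conn)  # given an open connection `conn`
```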

A link that can tell more than dozens of lines of R code – what’s new in archivist?

Can you spot the difference between this plot:

And this one:

You are right! The latter has an embedded piece of R code.
What for?

It’s a call to the function aread() from archivist, a package that manages external copies of R objects. This piece of code was added by the function addHooksToPrint(), which enriches knitr reports with links to all objects of a given class, e.g. ggplot.

You can copy this hook to your R session and you will automagically recreate this plot in your local session.


But it’s not all.
Actually here the story is just beginning.

Don’t you think that this plot is badly annotated? It is not clear what is being presented. Something about terrorism, but for which year? Are these results for all countries, or is there some filtering? What is on the axes? Why did the author skip all this important information? Why didn’t he include the full R code that explains how this plot was created?

Actually, with this single link you can get answers to all these questions.

First, let’s download the plot and extract the data out of it.

pl <- archivist::aread('pbiecek/SmarterPoland_blog/arepo/e44de65f1e56ea42d2df2598c083d1ce')
## ceed21e997efd00940cdbcba497559c7

This data object is also in the repository so I can download it with the aread function.

dat <- archivist::aread('pbiecek/SmarterPoland_blog/arepo/ceed21e997efd00940cdbcba497559c7')
head(dat)
#          country_txt sum_kills sum_wounds    n
# 1        Afghanistan      6208       6958 1926
# 2            Algeria        21         19   16
# 3            Bahrain         5         22   18
# 4         Bangladesh        76        695  465
# 5 Bosnia-Herzegovina         4          6    6
# 6       Burkina Faso         6          9    5

But here is the coolest part.
Given an object, one can (in some cases) examine its history, i.e. check how it was created. Here is how to do this:

archivist::ahistory(md5hash = 'pbiecek/SmarterPoland_blog/arepo/ceed21e997efd00940cdbcba497559c7')

#   small_data                           [d2ad05ac3e93aeaca02f57aa4f9f58bf]
#-> dplyr::filter(iyear == "2015")       [01205474e0515ad29d3bae33ad4ba821]
#-> group_by(country_txt)                [e0d9c060107803889fbc7ffdea7a23f7]
#-> dplyr::summarise(sum_kills = sum(nkill, na.rm = TRUE), 
#                     sum_wounds = sum(nwound, na.rm = TRUE), 
#                     n = n())           [a78cf8a8e9cf10bdb1158af38422723d]
#-> dplyr::filter(sum_kills > 2, 
#                 sum_wounds > 2)        [ceed21e997efd00940cdbcba497559c7]

Now you can see which operations were used to create the data shown in this plot. It’s clear how the aggregation was done, what the filtering condition is, and so on.
You also have the hashes of all objects created along the way, so you can download the partial results. This history is recorded with the operator `%a%`, which works in a similar fashion to `%>%`.
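The idea of recording history along a pipeline can be illustrated with a toy operator in base R. This is only a sketch of the concept, not how archivist implements `%a%`: the operator name `%h%`, the "history" attribute, and the helper `keep_setosa` are made up for illustration, and no hashes or repository are involved.

```r
# Toy operator: applies the function on the right to the value on the left
# and appends the function's name (as written) to a "history" attribute.
`%h%` <- function(lhs, rhs) {
  step   <- paste(deparse(substitute(rhs)), collapse = " ")
  prev   <- attr(lhs, "history")       # NULL for a fresh object
  result <- rhs(lhs)
  attr(result, "history") <- c(prev, step)
  result
}

keep_setosa <- function(d) d[d$Species == "setosa", ]

out <- iris %h% keep_setosa %h% nrow
attr(out, "history")   # the steps that produced `out`, in order
```

archivist goes further than this toy version: as the output of ahistory() shows, every intermediate object gets its own hash and can be downloaded separately.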

We have the plot, now we know what is being presented, let’s change some annotations.

pl + ggtitle("Victims of terrorism in 2015\nCountries with > 2 Fatalities") + theme_bw()

The ahistory() function for remote repositories was introduced to archivist in version 2.1 (on CRAN since yesterday). Another new feature is support for repositories in shiny applications. Now you can enrich your app with links to copies of R objects generated by shiny.
You can find more information about these and other features in the useR2016 presentation about archivist (html, video).
Or look for Marcin Kosiński’s talk at the European R users meeting in Poznań.

The data presented here is just a small fraction of the data from the National Consortium for the Study of Terrorism and Responses to Terrorism (START) (2016), retrieved from http://www.start.umd.edu/gtd.

Shiny + archivist = reproducible interactive exploration

Shiny is a great tool for interactive exploration (and not only for that). But, due to its architecture, all objects/results that are generated are stored in a separate R process so you cannot access them easily from your R console.

In some cases you may wish to retrieve a model or a plot that you have just generated. Or maybe you wish to store all R objects (plots, data sets, models) that have ever been generated by your Shiny application. Or maybe you would like to do some further tuning or validation of a selected model or plot. Or maybe you wish to collect and compare all lm() models ever generated by your app. Or maybe you would like to have R code that will recover a given R object in the future.

So, how to do this?

Continue reading: Shiny + archivist = reproducible interactive exploration

eRum 2016 — last days of call for papers

Only 6 days left to the end of the call for papers for eRum 2016! Register and submit your talk proposal at www.erum.ue.poznan.pl.

The European R users meeting will be a great place to learn and share ideas on R. Moreover, we have already confirmed the following invited talks:

  • Browse Till You Die: Scalable Hierarchical Bayesian Modeling of cookie deletion — Jakub Glinka, GfK Data Lab,
  • Design of Experiments in R — Ulrike Grömping, Beuth University of Applied Sciences Berlin,
  • Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm and its R interface — Marek Gagolewski, Systems Research Institute, Polish Academy of Sciences,
  • Addressing the Gender Gap in the R Project — Heather Turner, University of Warwick,
  • Heteroscedastic Discriminant Analysis and its integration into „mlR” package for uniform machine learning — Katarzyna Stąpor, Institute of Computer Science, Silesian Technical University,
  • How to use R to hack the publicly available data about skills of 2M+ worldwide students? — Przemysław Biecek, University of Warsaw,
  • A survey of tools for Bayesian data analysis in R — Rasmus Bååth, Lund University,
  • Geo-located point data: measurement of agglomeration and concentration — Katarzyna Kopczewska, University of Warsaw.

The European R users meeting is an international conference that aims at integrating users of the R language. eRum 2016 will be a good chance to exchange experiences, broaden knowledge of R and collaborate. One can participate in eRum 2016:
(1) with a regular oral presentation,
(2) with a lightning talk,
(3) with a poster presentation,
(4) or by attending without a presentation or poster.

Due to the limited space at the conference venue, the organizers have set the limit of participants at 250 (only 97 places left!).

Frequency analysis challenge – a console-based game for R/python

Six months ago we introduced ’The Proton’, a console-based R game with six data wrangling puzzles. Around 15-30 minutes of fun with data. The game is on CRAN in the package `BetaBit`.

And just a few days ago we added a second game, `frequon()`: eight puzzles related to frequency analysis of encoded messages.

It’s much harder than `proton`.
Expect around two hours of playing with ciphers.
Try it yourself. To get the R version just type

install.packages("BetaBit")
library(BetaBit)
frequon()

You can also try the experimental python version.

pip install --upgrade https://github.com/BetaAndBit/BetaBitPython/archive/master.tar.gz
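To give a flavour of what frequency analysis involves, here is a minimal base R sketch that counts how often each letter occurs in a message. The function name `letter_freq` and the sample string are made up for illustration; they are not taken from the game.

```r
# Count letters in a message, ignoring case and non-letter characters,
# and return the counts sorted from most to least frequent.
letter_freq <- function(text) {
  chars <- strsplit(tolower(text), "")[[1]]
  chars <- chars[chars %in% letters]
  sort(table(chars), decreasing = TRUE)
}

freq <- letter_freq("Aab, ba!")
freq
```

In a simple substitution cipher, the most frequent symbols of a long encoded message are good candidates for the most frequent letters of the underlying language.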

If you like these games and are going to attend useR2016 (June, Stanford, USA) or eRum2016 (October, Poznań, Poland), feel free to ping me (Przemyslaw.Biecek).

All your models belong to us: how to combine package archivist and function trace()

Let’s see how to collect all linear regression models that you will ever create in R.

It’s easy with the trace() function. It is a really powerful, yet not that popular function that allows you to inject any R code at any point in the body of any function.
It is useful in debugging and has other interesting applications.
Below I will show how to use this function to store a copy of every linear model that is created with lm(). In the same way you may store copies of plots/other models/data frames/anything.

To store a persistent copy of an object one can simply use the save() function. But we are going to use the archivist package instead. It stores objects in a repository and gives you some nice features, like searching within the repository, sharing the repository with other users, checking session info for a particular object or restoring packages to versions consistent with a selected object.

To use archivist with the trace() function you need just two lines. The first one creates an empty repo, and the second executes saveToLocalRepo() (called here through its alias saveToRepo()) at the end of each call to the lm() function.

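library(archivist)
# create an empty repo and make it the default one
createLocalRepo("allModels", default = TRUE)
# add tracing code; z is the name lm() uses internally for the fitted model
trace(lm, exit = quote(saveToRepo(z)))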

Now, at the end of every call to lm(), the fitted model will be stored in the repository.
Let’s see this in action.

> lm(Sepal.Length~., data=iris) -> m1
Tracing lm(Sepal.Length ~ ., data = iris) on exit 

> lm(Sepal.Length~ Petal.Length, data=iris) -> m1
Tracing lm(Sepal.Length ~ Petal.Length, data = iris) on exit 

> lm(Sepal.Length~-Species, data=iris) -> m1
Tracing lm(Sepal.Length ~ -Species, data = iris) on exit

All models are stored as rda files in a disk-based repository.
You can load them into R with the asearch() function.
Let’s get all lm objects, apply the AIC function to each of them and sort by AIC.

> asearch("class:lm") %>% 
    sapply(., AIC) %>% 
    sort
4c3ae060f3aaa2509b2faf63d857358e 5c5751e36b31b2251d2767d96993320a 
                        79.11602                        160.04042 

The aread() function will download the selected model.

> aread("4c3ae060f3aaa2509b2faf63d857358e")

Call:
lm(formula = Sepal.Length ~ ., data = iris)

Coefficients:
      (Intercept)        Sepal.Width       Petal.Length        Petal.Width  
           2.1713             0.4959             0.8292            -0.3152  
Speciesversicolor   Speciesvirginica  
          -0.7236            -1.0235 

Now you can just create model after model and if needed they all can be restored.
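The trace() mechanism itself can be tried without archivist. Below is a minimal sketch that keeps the fitted models in memory instead of a repository; the names `store` and `.store` are hypothetical. It relies on the same detail as the code above, namely that lm() keeps the fitted model in a local variable called z.

```r
# In-memory store; an environment, so the traced code can modify it in place
store <- new.env()
store$models <- list()

# bquote() embeds the `store` environment itself into the expression, so the
# exit code does not depend on where `store` can be found by name.
trace(lm, print = FALSE, exit = bquote({
  .store <- .(store)
  .store$models[[length(.store$models) + 1L]] <- z  # z: lm()'s fitted model
}))

m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)
m2 <- lm(Sepal.Length ~ Sepal.Width, data = iris)

untrace(lm)                       # restore the original lm()

n_models <- length(store$models)  # number of models collected so far
aics <- sapply(store$models, AIC) # one AIC value per collected model
```

Unlike the archivist repository, this store disappears with the R session, so it only illustrates how the tracing hook works.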

Read more about the archivist here: http://pbiecek.github.io/archivist/.

Call for Papers: eRum 2016 (European R users meeting)


The European R users meeting (eRum) is an international conference that aims at integrating users of the R language. eRum 2016 will be held on October 13 and 14, 2016, in Poznan, Poland at the Poznan University of Economics and Business. We have already confirmed the following invited speakers: Rasmus Bååth, Romain Francois, Ulrike Grömping, Matthias Templ, Heather Turner, Przemysław Biecek, Marek Gągolewski, Jakub Glinka, Katarzyna Kopczewska and Katarzyna Stąpor.

We would like to bring together participants from around the world. It will be a good chance to exchange experiences, broaden knowledge of R and collaborate. The conference will cover topics including:

• Bayesian Statistics,
• Bioinformatics,
• Economics, Finance and Insurance,
• High Performance Computing,
• Reproducible Research,
• Industrial Applications,
• Statistical Learning with Big Data,
• Spatial Statistics,
• Teaching,
• Visualization & Graphics,
• and many more.

We invite you to participate in eRum 2016:
(1) with a regular oral presentation,
(2) with a lightning talk,
(3) with a poster presentation,
(4) or without a presentation or poster.

Due to limited space at the conference venue, the organizers have set a limit of 250 participants. Persons with regular talks, lightning talks or posters will be considered first; those attending without a presentation or poster will be handled on a first-come, first-served basis.

Please make your submission online at http://erum.ue.poznan.pl/#register. The submission deadline is June 15, 2016. Submitters will be notified via email by July 1, 2016 of acceptance. Additional details will be announced via the eRum conference website.

Why should you backup your R objects?

There is a saying that there are two groups of people: those who already do backups and those who will. So how is this linked with reproducible research and R?

If your work is to analyze data then you often face a need to restore/recreate/update results that you have generated some time ago.
You may think, "I have knitr reports for everything!". That’s great! It will save you a lot of trouble. But to be 100% sure of getting exactly the same results, you need exactly the same environment and the same versions of packages.
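One small step in that direction is to record the versions in use next to the results they produced. The sketch below uses only base R; the helper name `snapshot_versions` and the choice of packages are illustrative assumptions, and sessionInfo() gives a fuller picture of the session.

```r
# Return the R version and the versions of the given packages
# as a named character vector.
snapshot_versions <- function(pkgs) {
  c(R = as.character(getRversion()),
    vapply(pkgs, function(p) as.character(packageVersion(p)), character(1)))
}

versions <- snapshot_versions(c("stats", "utils"))
versions
```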

Do you know how many R packages have been updated during the last 12 months?

I took the list of the top 20 R packages from here, scraped the dates of their current and older CRAN releases from here, and generated a plot with the dates of submissions to CRAN, sorted by the date of the last submission.

Continue reading: Why should you backup your R objects?