Proficiency levels @ PISA and visualisation challenge @ useR!2014

16 days to go for submissions in the DataVis contest at useR!2014 (see contest webpage).
The contest is focused on PISA data and students’ skills. The main variables that reflect pupil skills in math / reading / science are plausible values, e.g. the columns PV1MATH, PV1READ, PV1SCIE in the dataset.
But these values are normalized to have mean 500 and sd 100, so it is not easy to understand what a skill level of 600 means, or whether a difference of 12 points in averages is big. To overcome this, PISA has introduced seven proficiency levels (from 0 to 6) that are based on plausible values, with cutoffs 358, 420, 482, 545, 607, 669.
It is assumed that, for example, at level 6 “students can conceptualize, generalize, and utilize information based on their investigations and modeling of complex problem situations, and can use their knowledge in relatively non-standard contexts”.
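As a small illustration of how a plausible value maps to a proficiency level, here is a sketch that uses the cutoffs quoted above. The scores are made up for the example, and treating each cutoff as the upper end of its interval (the default of `cut()`) is an assumption rather than PISA's official rule.

```r
# Cutoffs quoted in the text; 0 and 1000 are only outer bounds for cut()
cutoffs <- c(0, 358, 420, 482, 545, 607, 669, 1000)

# Hypothetical scores, for illustration only
scores <- c(350, 500, 600, 700)

# Intervals are (lower, upper], the default behaviour of cut()
levels_out <- cut(scores, breaks = cutoffs, labels = paste("level", 0:6))
print(data.frame(score = scores, level = levels_out))

# Width of the inner levels, to put a 12-point difference in context
print(diff(cutoffs[2:7]))
```

The inner levels are about 62 points wide, so a 12-point difference in averages is roughly a fifth of one proficiency level.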

So, instead of looking at means, we can now take a look at the fractions of students at each proficiency level. To have some fun, we use the sp, rworldmap and RColorBrewer packages to draw country shapes instead of bars; the shapes are filled with dots that are supposed to represent pupils that take part in the study. The downside is that area does not correspond to height, so it might be confusing. We add horizontal lines to expose the height.
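To make "fractions of students at a given proficiency level" concrete, here is a minimal sketch on a toy data frame. The column names PV1MATH and W_FSTUWT follow the PISA dataset; the eight rows and their weights are invented for the example, and the labels 0 to 6 are my own choice.

```r
# Toy data: plausible values and final student weights (made-up numbers)
toy <- data.frame(
  PV1MATH  = c(340, 400, 450, 500, 560, 620, 700, 510),
  W_FSTUWT = c(10, 20, 30, 40, 30, 20, 10, 40)
)

breaks <- c(0, 358, 420, 482, 545, 607, 669, 1000)
toy$level <- cut(toy$PV1MATH, breaks, labels = paste("level", 0:6))

# Sum the weights within each level, then normalise so they add up to 1
w <- tapply(toy$W_FSTUWT, toy$level, sum)
w[is.na(w)] <- 0          # a level with no students would otherwise be NA
fractions <- prop.table(w)
print(round(fractions, 2))
```

Summing weights rather than counting rows matters because each sampled student stands for a different number of pupils in the population.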

And here is the R code:

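library(sp)            # point.in.polygon()
library(ggplot2)       # map_data(); needs the maps package installed
library(rworldmap)
library(RColorBrewer)  # brewer.pal()
# country outlines as a data frame with columns long, lat, group, region
map.world <- map_data(map = "world")
cols <- brewer.pal(n=7, "PiYG")
# read students data from PISA 2012
# directly from URL
con <- url("")
load(con)  # assumption: the file is an .rda that defines student2012
# cutoffs between proficiency levels; labels 1..7 stand for PISA levels 0..6
prof.scores <- c(0, 358, 420, 482, 545, 607, 669, 1000)
prof.levels <- cut(student2012$PV1MATH, prof.scores, paste("level", 1:7))
plotCountry <- function(cntname = "Poland", cntname2 = cntname) {
  # weighted fraction of students at each proficiency level
  wts <- tapply(student2012$W_FSTUWT[student2012$CNT == cntname],
         prof.levels[student2012$CNT == cntname],
         sum)
  wts[is.na(wts)] <- 0  # levels with no students in this country
  props <- prop.table(wts)
  # about 5000 candidate dots, split between levels according to props
  cntlevels <- rep(1:7, times=round(props*5000))
  # keep only the largest polygon of the country (e.g. the mainland)
  cntcontour <- map.world[map.world$region == cntname2,]
  cntcontour <- cntcontour[cntcontour$group == names(which.max(table(cntcontour$group))), ]
  wspx <- range(cntcontour[,1])
  wspy <- range(cntcontour[,2])
  N <- length(cntlevels)
  # random points in the bounding box; latitudes are sorted so that
  # lower levels end up at the bottom of the shape
  px <- runif(N) * diff(wspx) + wspx[1]
  py <- sort(runif(N) * diff(wspy) + wspy[1])
  # keep only the points that fall strictly inside the country contour
  sel <- which(point.in.polygon(px, py, cntcontour[,1], cntcontour[,2], mode.checked=FALSE) == 1)
  df <- data.frame(long = px[sel], lat = py[sel], level=cntlevels[sel])  
  par(pty="s", mar=c(0,0,4,0))
  plot(df$long, df$lat, col=cols[df$level], pch=19, cex=3,
       bty="n", xaxt="n", yaxt="n", xlab="", ylab="")
}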
# PISA and World maps are using different country names,
# thus in some cases we need to give two names
plotCountry(cntname = "Korea", cntname2 = "South Korea")
plotCountry(cntname = "Japan", cntname2 = "Japan")
plotCountry(cntname = "Finland")
plotCountry(cntname = "Poland")
plotCountry(cntname = "France", cntname2 = "France")
plotCountry(cntname = "Italy", cntname2 = "Italy")
plotCountry(cntname = "United States of America", cntname2 = "USA")
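One possible way to get the horizontal lines mentioned earlier is sketched below. It is not necessarily how the original plots were produced. Since the dots are sorted by latitude before levels are assigned, each level occupies a horizontal band, and a line can be placed halfway between the top dot of one level and the bottom dot of the next. The helper name `levelBoundaries` is hypothetical.

```r
# Hypothetical helper: latitudes at which the level changes.
# lat   - latitudes of the plotted dots
# level - proficiency level of each dot
levelBoundaries <- function(lat, level) {
  top    <- tapply(lat, level, max)   # highest dot within each level
  bottom <- tapply(lat, level, min)   # lowest dot within each level
  k <- length(top)
  if (k < 2) return(numeric(0))       # a single level has no boundary
  # midpoint between the top of level i and the bottom of level i + 1
  as.numeric((top[-k] + bottom[-1]) / 2)
}

# Toy example: six dots, two per level
demo_bounds <- levelBoundaries(lat = c(1, 2, 3, 4, 5, 6),
                               level = c(1, 1, 2, 2, 3, 3))
print(demo_bounds)
```

Inside plotCountry, after the call to plot(), something like `abline(h = levelBoundaries(df$lat, df$level), lty = 3)` would then draw the lines.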

3 thoughts on “Proficiency levels @ PISA and visualisation challenge @ useR!2014”

  1. It’s a letter against country rankings; most of us agree that rankings do not give any insights and are not a good path.
    But it is not a letter against data-driven improvements.
