PISA 2015 – how to read/process/plot the data with R

Yesterday the OECD published the results and data from the PISA 2015 study (Programme for International Student Assessment). It’s a very cool study: over 500 000 pupils (15-year-olds) are examined every 3 years. The raw data is publicly available, so one can easily access detailed information about pupils’ academic performance along with detailed survey data from students, parents and school officials (~2 000 variables). Lots of stories to be found.

You can download the dataset in SPSS format from this webpage. Then use the foreign package to read the sav files and the intsvy package to calculate aggregates/averages/tables/regression models (for the 2015 data you should use the GitHub version of the package).

Below is a short example of how to read the data, calculate weighted averages by gender and country, and plot the results with ggplot2. Here you will find other use cases for the intsvy package.
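
To make "weighted average by gender and country" concrete, here is a minimal base R sketch on made-up data. The `toy` data frame, its scores and its weights are invented for illustration. It only shows the idea of a weighted group mean and is not what intsvy does internally, since intsvy works with the plausible values and survey weights in the PISA file.

```r
# Toy data: made-up scores and sampling weights, not real PISA records.
toy <- data.frame(
  CNT    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  gender = c("Female", "Female", "Male", "Male",
             "Female", "Female", "Male", "Male"),
  score  = c(500, 520, 510, 530, 480, 490, 470, 500),
  weight = c(1, 3, 2, 2, 1, 1, 3, 1)
)

# Split by country and gender, then take the weighted mean in each group.
groups <- split(toy, list(toy$CNT, toy$gender), drop = TRUE)
toyMeans <- do.call(rbind, lapply(groups, function(g) {
  data.frame(CNT = g$CNT[1], gender = g$gender[1],
             Mean = weighted.mean(g$score, g$weight))
}))
rownames(toyMeans) <- NULL
toyMeans
```

A pupil with weight 3 counts three times as much as a pupil with weight 1, which is why the group means differ from the plain averages of the scores.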



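library(foreign)   # read.spss()
library(intsvy)    # pisa2015.mean.pv(); use the GitHub version for 2015 data
library(dplyr)     # select(), filter(), %>%
library(tidyr)     # spread()
library(ggplot2)

# Read the student questionnaire file into a data frame
stud2015 <- read.spss("CY6_MS_CMB_STU_QQQ.sav", use.value.labels = TRUE, to.data.frame = TRUE)
# Mean math score by country (CNT) and gender (ST004D01T)
genderMath <- pisa2015.mean.pv(pvlabel = "MATH", by = c("CNT", "ST004D01T"), data = stud2015)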

genderMath <- genderMath[, c(1, 2, 4, 5)]
# Reshape to one row per country with one column per gender (Female, Male)
genderMathWide <- genderMath %>%
  select(CNT, ST004D01T, Mean) %>%
  spread(ST004D01T, Mean)

# Countries to highlight and label in the plot
genderMathSelected <-
  genderMathWide %>%
  filter(CNT %in% c("Austria", "Japan", "Switzerland", "Poland", "Singapore", "Finland", "Korea", "United States"))

# Scatter plot of girls' vs boys' average math score, one point per country
pl <- ggplot(genderMathWide, aes(Female, Male)) +
  geom_point() +
  geom_point(data = genderMathSelected, color = "red") +
  geom_text(data = genderMathSelected, aes(label = CNT), color = "grey20") +
  # Solid diagonal: equal averages; dashed lines: a 20-point gap either way
  geom_abline(slope = 1, intercept = 0) +
  geom_abline(slope = 1, intercept = 20, linetype = 2, color = "grey") +
  geom_abline(slope = 1, intercept = -20, linetype = 2, color = "grey") +
  # annotate() draws each label once instead of once per data row
  annotate("text", x = 425, y = 460, label = "Boys +20 points", angle = 45, color = "grey", size = 8) +
  annotate("text", x = 460, y = 425, label = "Girls +20 points", angle = 45, color = "grey", size = 8) +
  coord_fixed(xlim = c(400, 565), ylim = c(400, 565)) +
  theme_bw() + ggtitle("PISA 2015 in Math - Gender Gap") +
  xlab("PISA 2015 Math score for girls") +
  ylab("PISA 2015 Math score for boys")

pl  # print the plot
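
The dashed lines in the plot mark a 20-point difference between the two averages. The same reading can be done numerically from a wide table like genderMathWide. The sketch below uses a made-up stand-in (`toyWide`, with invented country codes and means), so the numbers are purely illustrative.

```r
# Toy stand-in for genderMathWide: one row per country, made-up means.
toyWide <- data.frame(
  CNT    = c("A", "B", "C"),
  Female = c(515, 485, 530),
  Male   = c(520, 510, 505)
)

# Positive gap: boys score higher; negative gap: girls score higher.
toyWide$Gap <- toyWide$Male - toyWide$Female

# Say on which side of the dashed +/-20 lines each country would fall.
toyWide$Band <- ifelse(toyWide$Gap > 20, "boys ahead by more than 20",
                ifelse(toyWide$Gap < -20, "girls ahead by more than 20",
                       "within 20 points"))

toyWide[order(toyWide$Gap), ]
```

With the real data, replacing `toyWide` by genderMathWide should give the gap for every country in the plot, assuming the gender columns are named Female and Male as in the ggplot call above.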

11 thoughts on “PISA 2015 – how to read/process/plot the data with R”

  1. Hopefully, although the trend is going the other way. Last time the data came in SPSS, SAS and text-file formats; now there are no text files any more.
    But two keen R users have passed through the PISA group, so maybe there will be an RData version one day.

  2. Can you make the graphs more readable, please?
    I don’t understand the graph.
    Maybe add a legend explaining what the black and red dots mean?

    And at least a simple conclusion from the results, which would give a clue how to interpret this graph.

    1. Red dots are for countries with names, black dots are for countries without names.
      The selection of named countries is subjective.

      1. And the story: on average (both between and within countries) boys are doing slightly better in math than girls.
        But these ‘gaps’ differ between countries. Finland and Korea: higher average for girls; Austria and Japan: higher average for boys.
        Singapore shows that in PISA both genders can perform equally well in math, so maybe Austria could do better in the overall ranking if they tried to close the gap between boys and girls.

        1. Thanks for the reply. So from the graph alone there is no way to draw your own conclusions, because there is no concrete indication of what the points are and what they mean?
          So to be meaningful to an ordinary reader who is just looking at them, the graphs need another layer of description?

          1. Yes, thanks.
            Here my goal was to show how to download, read, process and plot the PISA 2015 data (as the title suggests) with the intsvy package for R.
            But I agree that in order to use this graphic in ‘story mode’, one needs to add many more explanations and annotations.

        2. I would much appreciate even some basic annotations and conclusions for your graphs in general.
          They could be a valuable source of information on many topics, and a source of reference, if they could be understood just by analyzing them as they are.
