Yesterday the OECD published the results and data from the PISA 2015 study (Programme for International Student Assessment). It's a very cool study: over 500 000 pupils (15-year-olds) are examined every 3 years. The raw data is publicly available, and one can easily access detailed information about pupils' academic performance along with detailed survey data from students, parents and school officials (~2 000 variables). Lots of stories to be found.

You can download the dataset in the SPSS format from this webpage. Then use the foreign package to read the sav files and the intsvy package to calculate aggregates, averages, tables and regression models (for the 2015 data you should use the GitHub version of the package).
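To give a feel for what a weighted average over plausible values involves, here is a minimal base R sketch on made-up numbers. It is only an illustration of the general idea (compute the weighted mean for each plausible value, then average those means), not a reproduction of what intsvy does internally, and it leaves out standard errors entirely. The data frame `toy`, its column `weight` and the helper `weighted_by_group` are hypothetical names; the two `PV*MATH` columns stand in for the larger set of plausible values in the real file.

```r
# Toy data: hypothetical values, NOT real PISA results.
toy <- data.frame(
  CNT     = c("A", "A", "A", "B", "B", "B"),
  gender  = c("Female", "Male", "Female", "Female", "Male", "Male"),
  weight  = c(1, 2, 3, 3, 1, 1),            # stands in for a student sampling weight
  PV1MATH = c(500, 480, 520, 450, 470, 460),
  PV2MATH = c(510, 490, 510, 440, 480, 470)
)

pv_cols <- c("PV1MATH", "PV2MATH")

# Weighted mean of one plausible value within each country x gender group
weighted_by_group <- function(pv) {
  groups <- split(toy, list(toy$CNT, toy$gender), drop = TRUE)
  sapply(groups, function(g) weighted.mean(g[[pv]], g$weight))
}

# One column of group means per plausible value, then average across columns
per_pv    <- sapply(pv_cols, weighted_by_group)
toy_means <- rowMeans(per_pv)
toy_means
```

The result is a named vector with one entry per country and gender combination (names such as `A.Female`). In practice you would let intsvy handle this, since it also takes care of the standard errors.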

Below you will find a short example of how to read the data, calculate weighted averages by gender and country, and plot the results with ggplot2. Here you will find other use cases for the intsvy package.

library("foreign")
library("intsvy")
library("dplyr")
library("ggplot2")
library("tidyr")

# Read the student questionnaire file (SPSS format) into a data frame
stud2015 <- read.spss("CY6_MS_CMB_STU_QQQ.sav", use.value.labels = TRUE, to.data.frame = TRUE)

# Weighted mean math score by country (CNT) and gender (ST004D01T)
genderMath <- pisa2015.mean.pv(pvlabel = "MATH", by = c("CNT", "ST004D01T"), data = stud2015)

# Keep a subset of columns by position, then reshape to one row per country
# with one column of means per gender
genderMath <- genderMath[, c(1, 2, 4, 5)]
genderMathWide <- genderMath %>%
  select(CNT, ST004D01T, Mean) %>%
  spread(ST004D01T, Mean)

# Countries to highlight and label in the plot
genderMathSelected <- genderMathWide %>%
  filter(CNT %in% c("Austria", "Japan", "Switzerland", "Poland", "Singapore",
                    "Finland", "Korea", "United States"))

# Girls on the x axis, boys on the y axis; the solid diagonal marks equal scores
# and the dashed lines mark a 20-point gap in either direction
pl <- ggplot(genderMathWide, aes(Female, Male)) +
  geom_point() +
  geom_point(data = genderMathSelected, color = "red") +
  geom_text(data = genderMathSelected, aes(label = CNT), color = "grey20") +
  geom_abline(slope = 1, intercept = 0) +
  geom_abline(slope = 1, intercept = 20, linetype = 2, color = "grey") +
  geom_abline(slope = 1, intercept = -20, linetype = 2, color = "grey") +
  # annotate() draws each label once instead of once per data row
  annotate("text", x = 425, y = 460, label = "Boys +20 points", angle = 45, color = "grey", size = 8) +
  annotate("text", x = 460, y = 425, label = "Girls +20 points", angle = 45, color = "grey", size = 8) +
  coord_fixed(xlim = c(400, 565), ylim = c(400, 565)) +
  theme_bw() +
  ggtitle("PISA 2015 in Math - Gender Gap") +
  xlab("PISA 2015 Math score for girls") +
  ylab("PISA 2015 Math score for boys")

# Display the plot
print(pl)
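If you prefer numbers to a picture, the gap shown in the plot can also be tabulated directly from a wide table with one `Female` and one `Male` column per country. The sketch below uses a small made-up table, `genderMathWideToy`, with hypothetical country codes and scores, so that it runs without the PISA file; with the real data you would apply the same two steps to the wide table built above.

```r
# Toy wide table: hypothetical countries and scores, NOT real PISA results.
genderMathWideToy <- data.frame(
  CNT    = c("A", "B", "C"),
  Female = c(500, 480, 510),
  Male   = c(505, 470, 530)
)

# Positive gap: boys score higher (point above the diagonal in the plot);
# negative gap: girls score higher (point below the diagonal)
genderMathWideToy$Gap <- genderMathWideToy$Male - genderMathWideToy$Female

# Sort from the largest advantage for boys to the largest advantage for girls
gapSorted <- genderMathWideToy[order(-genderMathWideToy$Gap), ]
gapSorted
```

Countries whose gap exceeds 20 in absolute value are the ones that would fall outside the dashed lines in the plot.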