Three Thoughts on the PISA 2018 Results


Yue-Yi Hwa

RISE Directorate

Blavatnik School of Government, University of Oxford

Here are my top three reflections on last week’s launch of the PISA 2018 results.

1. PISA results should come with a health warning: “correlation ≠ causation.”

PISA is designed to give a snapshot of some aspects of 15-year-old students’ learning in some countries at a single point in time (albeit repeated every three years). It is not designed for statistical identification of cause and effect. Researchers have demonstrated this time and again with scatterplots of PISA results against ice-cream consumption, among other spurious relationships.

Still, as one of my friends remarked after the PISA results launch, “It’s much more fun to have random hypotheses than to be nuanced and sensible.” And the PISA results reports have an abundance of fun scatterplots.

In his PISA 2018 Insights and Interpretations, OECD Director of Education and Skills Andreas Schleicher presents a scatterplot of country-level reading performance against cumulative education spending per student, accompanied by the following text:

PISA results show that there is a positive relationship between investment in education and average performance – up to a threshold of USD 50 000 in cumulative expenditure per student from age 6 to 15 … However, after that threshold, there is almost no relationship between the amount invested in education and student performance. … What may matter more after a threshold is reached is how resources are allocated. (p. 20).

It’s true that appropriate resource allocation is often more important than the absolute level of resources. But Schleicher’s remarks seem to imply that education systems below the spending threshold can expect to see performance gains if they increase their educational budgets, irrespective of what the budgets are spent on.

Yet spending more on schools doesn't necessarily translate into more learning, even for systems well below the threshold Schleicher identified. According to the data underlying Schleicher's observation (available as an Excel download), per-pupil education spending in Indonesia is less than a third of that USD 50,000 threshold. However, when the RISE Indonesia team examined Indonesian learning levels between 2000 and 2014, a period that saw extensive educational reforms and a tripling of the real education budget, they found slight declines in student learning rather than gains. These declines were evident even in primary school, which had nearly full enrolment before the reforms began. (This general trend is echoed in Indonesia's PISA trajectory, which has been hump-shaped for reading and mathematics, with little overall change from 2000 to the present.) The allocation of resources clearly matters for learning, even when resource levels are low to begin with.

This is not to suggest that Schleicher would argue that indiscriminately pumping money into low-resourced education systems will magically improve learning levels. In fact, he acknowledges in his Insights and Interpretations that "[c]orrelations are often deceptive," and that PISA data "do not really say much about cause and effect" (p. 55). But this acknowledgement comes at the end of the document, after many broad statements based on messy correlations. It is all too easy for consumers of the PISA results to read too much into such correlations, especially when such over-reading is more fun than being nuanced and sensible.

2. PISA shocks only work if they shock the right people into changing the right things.

In a PISA 2018 commentary, researchers at the Center for Global Development observed that "a high profile PISA 'shock' does not result in big improvement." They cite the cases of Denmark, Germany, and Japan, where public outcry over disappointing PISA results prompted a wave of education reforms that had a similarly disappointing impact. (In the German case, however, this conclusion depends partly on the period of analysis. While the CGD blog looks at Germany's hump-shaped performance curve from 2000 to 2018, the OECD page celebrating Germany's PISA shock focuses on its upward trajectory between 2000 and 2009.)

But it shouldn't be surprising that PISA shocks don't reliably improve student learning. PISA results are just data. For a PISA shock to improve learning, (a) these data need to shock not only the people who write news headlines, but also the people who have the power to change the education system; and (b) the changes they introduce need to improve system-level coherence for learning. Neither of these conditions is a foregone conclusion.

To illustrate (a), Ruth Dixon and co-authors show in their analysis of press coverage of PISA and PIRLS 2006 that there is no straightforward relationship between how well countries perform in PISA, how negatively the media report on this performance, and whether politicians introduce reforms as a result. Finland far outperformed France in PISA 2006, but news articles indicated similarly negligible responses from Finnish and French politicians.

To illustrate (b), consider why electric shocks from defibrillators can save lives during certain types of cardiac arrest. (With thanks to Lant Pritchett for supplying the analogy and the Mayo Clinic for supplementing my scanty prior knowledge.) In a well-functioning heart, muscles contract rhythmically in response to electrical signals, producing the heartbeat. If these electrical signals go awry, the person's life is in danger, not because the heart muscles fail to contract, but because the heartbeat becomes irregular or dangerously fast. A defibrillator can halt this life-threatening situation by resetting the rhythm with a jolt of electrical current, thus re-aligning the system around a well-functioning heartbeat.

Analogously, a PISA shock could resuscitate a failing education system if the shock triggers the political will to re-align elements of the education system around student learning. Conversely, subpar PISA performance, however shocking, is unlikely to trigger improvement if the policy response does not improve system-level coherence for learning. In the wake of England's disappointing PISA 2012 results, then-Education Secretary Michael Gove introduced policies that purportedly mimicked "systems like Singapore, Shanghai and Hong Kong," but actually reflected local political priorities, as Yun You observed in a 2017 study. England has yet to see any spectacular gains in its PISA scores. (Whether or not it is advisable to copy the policies of high-performing systems is another discussion. Spoiler: not usually.)

Should we give up on the pathway of PISA shock → political will → policy reforms that improve system-level coherence for learning → student learning gains? Perhaps not. Barbara Bruns and co-authors trace such a pathway in Peru, where PISA results triggered a sustained series of teaching and learning reforms that have yielded consistent improvements in PISA scores, as well as in Ecuador, where dismal results in SERCE, the regional assessment, sparked a similar pathway of change.

3. PISA is a good (but limited) source of education data (among many).

Ecuador's experience is a reminder that PISA is just one among many sources of assessment data for monitoring education systems. Besides international and regional large-scale assessments and national administrative datasets on standardised exams, RISE researchers have tracked changes in learning levels using data from cross-country surveys and within-country panel surveys that were not designed specifically to monitor education, as well as from household assessments administered by local volunteers.

Compared to other datasets, PISA certainly has strengths. It covers a large number of countries, with 79 countries participating in the 2018 cycle (although this sample is skewed toward higher-income countries). PISA also includes rich contextual data on students and schools, which can be used in far more sophisticated analyses than the ubiquitous scatterplots.

But PISA also has clear weaknesses. One key issue is that PISA tests students at age 15, by which time many will have fallen far behind the curriculum by failing to master basic skills early in their school careers. The importance of foundational skills undergirds arguments for testing and benchmarking in the middle of primary school. For example, the World Bank’s new Learning Poverty measure looks at whether children are able to read and understand a simple text by age 10. Large-scale assessments that can fulfil this middle-primary benchmarking function include one of PISA’s lower-profile cousins, the IEA’s Progress in International Reading Literacy Study (PIRLS) for 4th graders, as well as citizen-led assessments, many of which use one-to-one oral testing to gauge the skills of children who cannot yet read and write with ease.

A related issue is that PISA assesses relatively complex skills, so most children in low-performing education systems fall below the lowest performance thresholds. Commendably, PISA for Development (PISA-D), which broadened the reach of PISA to cover more lower-income countries, aims to extend the performance scale below the conventional PISA thresholds. However, PISA-D results have been worryingly low, and PISA-D eligibility is restricted to 15-year-olds enrolled in Grade 7 and above, so fewer than half of all 15-year-olds in the seven countries that have participated in PISA-D thus far have been eligible to take part. Both facts suggest that many countries still face considerable floor effects even in PISA-D. Many low-income countries will have to look beyond PISA to monitor their learning outcomes meaningfully.

Taking advantage of other existing sources of educational data is not only efficient but also reduces the temptation to let a single measure become the overarching target, which can generate incentives to distort the measure in question. I say this as a Malaysian who was disappointed to find that Malaysia's PISA 2015 data had been excluded from the main survey because of response-rate issues that compromised sampling, issues that likely stemmed from intentional manipulation. (I am happy to report that Malaysia's PISA 2018 data do not display similar issues.)

In short, PISA 2018 is a rich dataset that can play useful roles in educational monitoring and research. But if we lean too hard on PISA, we may be setting ourselves up for a fall.

RISE blog posts and podcasts reflect the views of the authors and do not necessarily represent the views of the organisation or our funders.