Blavatnik School of Government, University of Oxford
What We Learned from Our RISE Baseline Diagnostic Exercise
A key part of the RISE agenda is to focus on getting to systems of basic education that are coherent around learning. All of the RISE country program research focuses on system changes, not on the evaluation of “pilots” or “field experiments” (one education minister recently complained, “All pilots fly, but at the end of the day we just have pilots and papers.”). Moreover, the RISE focus on the “coherence” of accountability systems (similar to the notion of the “alignment” of actors) stems from a recognition that the pieces all have to work together, as in a car with a motor, transmission, axles, and wheels. Fixing one piece at a time sometimes makes no difference. To interpret the “evidence” about the impact of a particular reform (e.g., school plans, publicizing information about outcomes, changing teacher pay), one needs not only a detailed description of what changed but also a description of the key elements of the system that did not change. The impact of reforming any one element of the system is almost certainly affected by the entire context, and hence needs to be interpreted in that light.
Therefore, each of the first four country research teams (India, Pakistan, Tanzania, and Vietnam) has done a three-part baseline diagnostic exercise, and the Ethiopia and Indonesia teams will do the same. Part one was a symptomatic description of existing conditions: what is known about current levels of, and recent progress in, enrollment and learning, along with the standard descriptions of available inputs (e.g., class sizes, the teacher labor force). Part two was a more analytic diagnostic of key elements of the education system. Part three placed the country team’s research in the context of these descriptions and the RISE analytic framework. It set out which system changes the team is investigating the impact of, and its hypotheses about why those changes are (or are not) likely to produce impacts. None of these parts was intended to be original research or a stand-alone publication. Rather, the exercise provides a common background in a common analytic frame, which helps the individual country research products add up to more than the sum of their parts.
The main diagnostic component (part two) was originally based on the SABER (Systems Approach for Better Education Results) frameworks and protocols for the six RISE-relevant domains: Teachers, Education Management Information Systems, Engaging the Private Sector, School Accountability and Autonomy, School Finance, and Student Assessment. We started from the SABER frameworks rather than from scratch for two reasons. Each had gone through a development process that included reviews of the literature to identify the features that mattered, and each had seen some use in practice. We then modified them based on a two-day workshop with the country research teams. The main challenge with the SABER exercise is that it is explicitly about de jure (formal) policies. The gap between de jure policies and de facto practices is a major feature of developing countries generally, and it plagues widely publicized international assessments like the Doing Business indicators. For that reason, all involved felt it was important to assess key elements of the education system not just on paper but in practice.
What did I, as the research director of RISE, learn from this exercise? There are two sets of lessons, which I address in two different blog posts:
- What are the lessons about the realities of doing education system diagnostics?
- What are the lessons about education from the education systems diagnostics that were done?
This post takes up the first set. From the process of having four country teams do education system diagnostics, I learned two primary lessons. Unfortunately, they are deeply intertwined:
Lesson One: Doing an education system diagnostic on a globally replicable and comparable basis on countries’ actual (de facto) performance in key elements of education systems and the coherence/alignment of those elements is vitally important.
Lesson Two: Doing an education system diagnostic on a globally replicable and comparable basis on countries’ actual (de facto) performance in key elements of education systems and the coherence/alignment of those elements is going to be extremely challenging.
Lesson one, the importance of de facto diagnostics of systems, was illustrated by the four country diagnostics. They made clear that neither (a) simple “enrollment and input” measures nor (b) de jure policies explain country success or failure in learning, either across countries or over time. This was illustrated in three ways:
First, the Vietnam diagnostic covers a country with fantastic performance on standard international measures of learning like PISA, as well as on detailed measures of learning over time. Yet the diagnostic readily shows that neither the standard inputs (e.g., class size, expenditure per pupil) nor standard descriptive variables of systems, like those from PISA, come even close to explaining Vietnam’s success. Whatever explains Vietnam’s success is deeper and more subtle than what existing tools measure (and that is precisely where the team’s research is headed).
Second, the India diagnostic shows a country, and states within it, where much has been done in the last 15 years. Standard inputs have improved, and expenditures have increased (real expenditure per pupil has tripled). Information about standard inputs has improved (e.g., the DISE district and state “report cards”), and there has even been action in some reform areas (like expanding information about student performance in Madhya Pradesh). Yet neither NGO assessments nor the state’s own assessments have shown any sustained progress in learning.
Third, the Pakistan diagnostic of Punjab province showed a very impressive track record of projects, policies, and programs in the education sector. Many of these can show from their own data that they achieved their proximate objectives. And yet there is still no evidence of truly sustained improvement in either attainment or learning commensurate with the efforts.
This makes a sophisticated diagnostic more important, because as the learning crisis gets more attention, the most likely first response is “we (global actors) must do more,” and “more” risks being “more of the same.” That is, acknowledging the learning crisis may lead to more attention on “thin input” indicators that may or may not have any impact. Examples include expenditure per pupil, the share of GDP spent on education, measures of standard inputs, and more formal training of teachers. The pressure to address the learning crisis will create incentives to build simple “dashboard” indicators of items that are easy to measure and politically popular, yet not strongly or reliably related to learning.
That makes a sophisticated diagnostic of the de facto performance of systems and their coherence more important. It also means more is riding on the difficulty of getting that diagnostic right.
Lesson two: I wanted four highly competent, well-meaning, and motivated country teams to take a common analytical structure, and even a common set of questionnaires and frameworks (adapted from SABER), and produce a diagnostic. The aim was not even a strictly comparable diagnostic, merely a common one. That proved much harder than I had hoped. I learned three things. In retrospect I should have known some of them, but I learned them nonetheless.
First, it is extremely difficult to do a common diagnostic if function does not strictly and strongly follow form. That is, if there is one and only one recipe, then assessing the recipe and the process of recipe compliance is nearly as good as assessing the food. But, if the goal is to “produce tasty and nutritious food,” then assessing the cook against a single recipe is silly.
On a number of dimensions, we are much closer to being able to describe the function an education system or sub-system is intended to perform than to being able to describe its form. Even if we were to describe one form that successfully accomplishes that function in a given context, we are pretty sure a number of other forms could be equally successful.
So, functionally, the “teacher compensation” sub-system should attract, retain, and motivate high-quality teachers. Certainly, a single number like “pay of the average teacher relative to GDP per capita” is inadequate as a summary of whether a teacher compensation sub-system is meeting its functional goals. Suppose we were more specific and said: “a teacher compensation system is effective at its ‘retention’ function if those with more effective teaching practice remain as teachers.” There still might be multiple forms for achieving that function. One system could use compensation to retain good teachers, while another could try to push out bad teachers early.
An education system diagnostic will have to assess several things, and more. Does the system use information on learning well? Can it generate and diffuse better teaching and learning practices? Does it balance autonomy for action and accountability for results well? Does it attract and retain good teachers? In none of these domains do we expect “one size fits all” or a single form of “best practice,” yet we also see many systems failing in these key domains.
Second, as one moves from de jure to de facto, the problem of variation becomes central and, perhaps, intractable. Suppose I want to assess whether the official curriculum of country X identifies specific skill levels in arithmetic for Grade 3. The answer is that it does, or it doesn’t, or it isn’t clear. But suppose I want to assess whether a specific arithmetic skill is actually being taught in country X’s Grade 3 classrooms. Then the answer is almost certainly that it is taught in some classrooms and not in others. “It depends” is the answer, because there will be variability.
In the “Teacher” domain of SABER, one functional element is “Setting Clear Expectations for Teachers,” with the question “Are there standards for what students must know and be able to do?” Answering this question de jure is not necessarily easy, but it is tractable. But what is the answer de facto? Are teachers aware of the standards? Do teachers use the standards? Are the standards part of a teacher’s “clear expectations”? The answer to all of these will ultimately be “yes for some and no for others,” in which case the methodological problem becomes an order of magnitude more difficult (and more costly).
Third, doing this diagnostic in a way that will be useful, for either global advocacy or internal pressure that will help solve the learning crisis, is going to be very challenging. One lesson from successful global movements is that measurements that are regular, reliable, comparable, and comprehensive can be an important part of effective advocacy. Of course, one of the movements that illustrates this is the movement for basic education itself. The target of “universal schooling” created pressure for regular, reliable, comparable, and comprehensive data on the enrollment of children in school. Like any measurement system, this data effort has weaknesses, but it has been wildly successful. One can go online to the UNESCO Institute for Statistics (UIS) website and access thousands of indicators for hundreds of countries. This itself can create a dynamic of both positive and negative (e.g., “naming and shaming”) pressures for performance.
The recent example of the Doing Business indicators illustrates the pluses and minuses. On the plus side, these indicators measure the “climate” for doing business in countries on a regular (it is done every year), comparable (countries are ranked), and comprehensive ([nearly] all countries are included) basis. In doing so, they have been fantastically successful at promoting the objectives of the index’s proponents. But, as recent controversies and data show, they are far, far from perfect indicators.
It would be great if we, as a broad group of interested individuals and organizations working to solve the learning crisis, had a set of diagnostic indicators that were both good data (regular, reliable, comparable, and comprehensive) and tightly causally linked to improving learning outcomes. The experience of just trying to do a realistic de facto assessment of four countries with super high-quality teams taught me how hard this will be.
RISE blog posts and podcasts reflect the views of the authors and do not necessarily represent the views of the organisation or our funders.