Intervention(s)
The school closures induced by the COVID-19 outbreak have placed heightened emphasis on alternative ways to measure and track student learning besides in-person assessments. Even beyond school closures, situations like humanitarian crises and natural disasters, or students simply living in physically remote locations, can hinder the proper assessment of their learning. A potential option is phone-based assessment, in which an assessor calls students and asks them to solve questions remotely. Work such as Angrist et al. (2020a) has already used these assessments as outcomes, and that work has also led to practical recommendations for assessing children over the phone (Angrist et al., 2020b). However, to the best of our knowledge, there has not yet been a formal validation of these learning assessments, in which the scores obtained over the phone are correlated with the same students’ classroom scores or other measures of achievement. Furthermore, it is still unclear whether these correlations hold up across subgroups of interest, such as gender, baseline performance, grade, or the degree of rurality of each school (which may be particularly important for access to technology).
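To fix ideas, the sketch below illustrates the kind of validation check we have in mind: correlating phone-based scores with classroom scores, overall and within subgroups. It assumes a hypothetical file assessments.csv with columns phone_score, classroom_score, gender, grade, and rurality; none of these names come from the study itself.

```python
import pandas as pd

# Hypothetical data: one row per student who completed both a phone-based
# and a classroom assessment.
df = pd.read_csv("assessments.csv")

# Overall validation: correlation between phone and classroom scores.
overall_r = df["phone_score"].corr(df["classroom_score"])
print(f"Overall correlation: {overall_r:.3f}")

# Subgroup validation: does the correlation hold within each subgroup of interest?
for group_var in ["gender", "grade", "rurality"]:
    by_group = df.groupby(group_var).apply(
        lambda g: g["phone_score"].corr(g["classroom_score"])
    )
    print(f"\nCorrelation by {group_var}:")
    print(by_group)
```

Under this logic, the phone-based assessment would be considered a valid measure to the extent that these correlations are high and remain stable across subgroups.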
Beyond the validation of these assessments, other important features of phone-based assessments are yet to be studied (Lupu and Michelitch, 2018). Social scientists have long discussed and identified “enumerator effects” in in-person surveys (Bischoping and Schuman, 1992; Lupu and Michelitch, 2018; Schaeffer et al., 2010; West and Blom, 2017), whereby observable characteristics of enumerators drive differential response rates and scores among seemingly similar populations in developing countries (Adida et al., 2016; Benstead, 2014a; Benstead, 2014b; Blaydes and Gillum, 2013; Blohm et al., 2007; Durrant et al., 2010; Flores-Macías and Lawson, 2008; Kane and Macaulay, 1993; Liu and Stainback, 2013; Olson and Peytchev, 2007). For instance, Di Maio and Fiala (2018) find that in Uganda most observable characteristics yield minimal enumerator effects, except when enumerators ask highly sensitive political preference questions, where enumerator characteristics account for over 30 percent of the variation in responses. To cleanly identify enumerator effects and avoid confounding enumerator and respondent characteristics, researchers would ideally fully randomize the assignment of assessors to assessees, creating what West and Blom (2017) call “fully interpenetrated designs”. Despite the large body of work suggesting the presence of enumerator effects in in-person assessments, the physical logistics of fully interpenetrated designs can be challenging, and the few studies that have implemented them were conducted only in the United States and with small samples (Di Maio and Fiala, 2018). Typically, the logistical issues have been dealt with by assigning assessors to areas small enough for them to travel within, yet large enough to capture as much variability in assessee-assessor assignments as possible (Lupu and Michelitch, 2018). For example, in the Uganda experiment of Di Maio and Fiala (2018), the most disaggregated unit to which assessors could feasibly be assigned was the village.
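To illustrate how the size of enumerator effects can be quantified under a fully interpenetrated design, here is a minimal sketch using a random-intercept model to estimate the share of score variance attributable to enumerators. The file and column names (assessments.csv, enumerator_id, score) are hypothetical, and this variance decomposition is one standard approach rather than the method of any study cited above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per completed assessment, with the (randomly
# assigned) enumerator's ID and the student's score.
df = pd.read_csv("assessments.csv")

# Random-intercept model: score_ij = mu + u_j + e_ij, where u_j is the
# enumerator effect. Full randomization of students to enumerators means
# the between-enumerator variance is not confounded by student composition.
model = smf.mixedlm("score ~ 1", data=df, groups="enumerator_id")
result = model.fit()

# Share of total variance attributable to enumerators (intraclass correlation).
enum_var = result.cov_re.iloc[0, 0]   # between-enumerator variance
resid_var = result.scale              # within-enumerator (residual) variance
icc = enum_var / (enum_var + resid_var)
print(f"Enumerator variance share: {icc:.3f}")
```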
Phone-based assessments lend themselves to a more rigorous documentation of enumerator effects, in general and for learning assessments more specifically, as the enumerators are centrally located and can be randomly allocated across the full sample. One could hypothesize that at such a personal level of assessment between assessor and assessee, especially one with a degree of power dynamics between students and teachers, the level of comfort in the relationship could indeed lead to differential response rates.
We leverage the data collection process from a phone-based assessment in Kenya to add to the literature described above. As part of the outcomes measured in another RCT evaluation, students are given a short phone-based assessment consisting of math questions, a student survey question, and a few parent survey questions. Assessors are teachers from within the educational system where the RCT is conducted. Students in 3rd, 5th, and 6th grade across all 105 schools in the RCT sample were randomly selected to receive a phone-based assessment. Selected students were then fully randomized to an enumerator, as well as to the order in which they are called. Therefore, the assessor-assessee match, the day each student is called, and the order in which students are reached are all randomly assigned. Strong protocols are in place to ensure that this order is preserved.
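A minimal sketch of what this two-step randomization could look like in practice is below. The file names, column names, and seeds are hypothetical illustrations, not the study’s actual assignment code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=20210101)  # fixed seed so the draw is reproducible

students = pd.read_csv("sampled_students.csv")       # one row per sampled student
enumerators = pd.read_csv("enumerators.csv")["id"]   # roster of assessor IDs

# Fully interpenetrated design: every student is equally likely to be assigned
# to any enumerator, independent of school, grade, or location.
students["enumerator_id"] = rng.choice(enumerators, size=len(students))

# Randomize the order in which each enumerator calls their assigned students:
# shuffle all rows, then number students within each enumerator's caseload.
shuffled = students.sample(frac=1, random_state=1)
students["call_order"] = shuffled.groupby("enumerator_id").cumcount() + 1
```

Because every student is equally likely to be matched to any enumerator, differences in response rates or scores across enumerators cannot be driven by the composition of the students they are assigned.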
In particular, we will explore the following research questions:
1. Are phone-based assessments valid measures of learning?
2. To what extent are there differential response rates by enumerators?
3. Does matching on observable characteristics of assessors and assessees (e.g., same gender) drive differential response rates and scores?
4. Do assessor experience and teaching skill change the average and variability of scores?
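To make question 3 concrete, a hedged sketch of one possible specification is below: regressing assessment completion and scores on an indicator for whether assessor and student share a gender, with enumerator fixed effects. All variable names are hypothetical, and this is an illustrative specification rather than our registered estimating equation.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("assessments.csv")  # hypothetical: one row per attempted assessment

# Indicator for whether the assessor and the student share a gender.
df["same_gender"] = (df["assessor_gender"] == df["student_gender"]).astype(int)

# With the assessor-assessee match fully randomized, same_gender is exogenous.
# Enumerator fixed effects absorb assessor-level traits (including gender),
# and the student-gender control isolates the match effect itself.
# "completed" is a 0/1 outcome, so the first model is a linear probability model.
formula = "{y} ~ same_gender + C(student_gender) + C(enumerator_id) + C(grade)"
completion = smf.ols(formula.format(y="completed"), data=df).fit(cov_type="HC1")
scores = smf.ols(formula.format(y="score"), data=df).fit(cov_type="HC1")

print("Match effect on completion:", completion.params["same_gender"])
print("Match effect on scores:", scores.params["same_gender"])
```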
References
Adida, C.L., Ferree, K.E., Posner, D.N., Robinson, A.L. (2016). Who's asking? Interviewer coethnicity effects in African survey data. Comparative Political Studies. 49: 1630–60
Angrist, N., Bergman, P., Matsheng, M. (2020a). School’s Out: Experimental Evidence on Limiting Learning Loss Using 'Low-Tech' in a Pandemic. Working Paper. https://papers.ssrn.com/sol3/Papers.cfm?abstract_id=3735967
Angrist, N., Bergman, P., Evans, D. K., Hares, S., Jukes, M. C. H., & Letsomo, T. (2020b). Practical lessons for phone-based assessments of learning. BMJ Global Health, 5(7), e003030. https://doi.org/10.1136/bmjgh-2020-003030
Benstead, L.J. (2014a). Does interviewer religious dress affect survey responses? Evidence from Morocco. Politics and Religion. 7: 734–60
Benstead, L.J. (2014b). Effects of interviewer–respondent gender interaction on attitudes toward women and politics: findings from Morocco. International Journal of Public Opinion Research. 26: 369–83
Bischoping, K., Schuman, H. (1992). Pens and polls in Nicaragua: an analysis of the 1990 preelection surveys. American Journal of Political Science. 36: 331–50
Blaydes, L., Gillum, R.M. (2013). Religiosity-of-interviewer effects: assessing the impact of veiled enumerators on survey response in Egypt. Politics and Religion. 6: 459–82
Blohm, M., Hox, J., Koch, A. (2007). The influence of interviewers’ contact behavior on the contact and cooperation rate in face-to-face household surveys. International Journal of Public Opinion Research. 19: 97–111
Di Maio, M., Fiala, N. (2018). Be Wary of Those Who Ask: A Randomized Experiment on the Size and Determinants of the Enumerator Effect. Policy Research Working Paper No. 8671. World Bank, Washington, DC. https://openknowledge.worldbank.org/handle/10986/30993
Durrant, G.B., Groves, R.M., Staetsky, L., Steele, F. (2010). Effects of interviewer attitudes and behaviors on refusal in household surveys. Public Opinion Quarterly. 74: 1–36
Flores-Macías, F., Lawson, C. (2008). Effects of interviewer gender on survey responses: findings from a household survey in Mexico. International Journal of Public Opinion Research. 20: 100–10
Kane, E.W., Macaulay, L.J. (1993). Interviewer gender and gender attitudes. Public Opinion Quarterly. 57: 1–28
Liu, M., Stainback, K. (2013). Interviewer gender effects on survey responses to marriage-related questions. Public Opinion Quarterly. 77: 606–18
Lupu, N., & Michelitch, K. (2018). Advances in Survey Methods for the Developing World. Annual Review of Political Science, 21(1), 195–214. https://doi.org/10.1146/annurev-polisci-052115-021432
Olson, K., Peytchev, A. (2007). Effect of interviewer experience on interview pace and interviewer attitudes. Public Opinion Quarterly. 71: 273–86
Schaeffer, N.C., Dykema, J., Maynard, D.W. (2010). Interviewers and interviewing. In Handbook of Survey Research, 2nd ed., ed. P.V. Marsden, J.D. Wright, pp. 437–70. Bingley, UK: Emerald Group
West, B.T., Blom, A.G. (2017). Explaining interviewer effects: a research synthesis. Journal of Survey Statistics and Methodology. 5: 175–211