The disruptions in online assessments that roiled Montana, Nevada, and North Dakota in the spring of 2015 appear to have had both negative and positive effects on individual students’ test scores, according to a new analysis.
While some students’ scores appeared to have been hurt by “logout” interruptions, evidence suggests others may have actually benefited from temporarily getting booted out of the system, a conclusion the authors of the report base on statistical modeling.
The independent analysis released Friday was conducted by the National Center for the Improvement of Educational Assessment, a research and consulting nonprofit hired by the Smarter Balanced Assessment Consortium, the group that sponsored the exams.
The report also suggests the impact of the logout interruptions varied greatly by school and district. The vast majority of districts in Montana and North Dakota, for instance, had zero students forced to cope with a logout interruption, but in some school systems in those states, 100 percent of students ran into problems.
But authors Joseph Martineau and Nathan Dadey also said their review was hamstrung by vendors’ testing systems only collecting data on one type of interruption—students getting kicked out due to server failure—and not other problems. That means the review only examines a portion of the total interruptions schools faced.
‘Tamp Down the Panic’
The report also points to what the authors see as a major flaw in how test vendors and public officials think about online testing.
Today, industry and education officials assume that computer-based testing will work relatively flawlessly, without interruptions, despite the vicissitudes and complexities associated with Internet service providers, routers, and district infrastructure, the report says.
But the evidence suggests that policymakers and vendors should reverse their thinking and go into online testing periods expecting that interruptions will occur, the authors say. Doing so would compel states and vendors to take a systemic approach to fixing problems quickly and communicating immediately with schools. And it would “tamp down the panic,” Martineau said in an interview.
There needs to be a faster, more coordinated response that carries “from the vendor all the way down to the student,” he added.
As things now stand, interruptions have potentially “dire” implications for states because they don’t have the ability to respond and make fixes immediately, the report argues.
To date, “both states and their vendors have treated interruptions as abnormalities, rather than eventualities,” the authors write, leading to “ad-hoc” responses.
The Impact of the Interruptions
The testing disruptions in Montana, Nevada, and North Dakota deeply angered state officials, who complained about lost instructional time and the overall confusion created among teachers and students. Nevada Attorney General Adam Paul Laxalt later recouped $1.8 million in penalties from Smarter Balanced, and $1.3 million from the state’s testing vendor at the time, Measured Progress, as a result of the interruptions.
After the breakdowns, Measured Progress officials said their work was hindered because another vendor, the American Institutes for Research, failed to deliver source code in time to test the system adequately. AIR, which disputed that claim, had developed an open-source test delivery system for the exam.
The Center for Assessment’s analysis looked at the testing interruptions in all three states. Among its conclusions:
+ Students appear to have experienced both positive and negative effects on their test scores as a result of the interruptions. (The authors’ methods, which rely on creating a comparison with a matched peer group of students, don’t allow for a definite cause-and-effect link between interruptions and test scores.) There were some indications that the breakdowns had a relatively large effect on limited-English-proficient students’ scores. But the sample size of students in that category was small, limiting the results.
+ A relatively small portion of students across states dealt with testing interruptions—with the highest number being 6 percent for any combination of subject, grade, or state. The average percentage of students who coped with interruptions was smaller—about 3 percent, Martineau said. But the authors don’t know the extent to which other breakdowns added to those totals, because they could not collect that data.
+ Participation in the test suffered because of the interruptions. The portion of students with valid scores at the district level fell by between 6 percent and 16 percent across subjects, grades, and states, compared with baselines from the previous three years.
+ Vendors’ testing systems should be required to collect and document more data about the nature of interruptions, the authors recommend. They also call for the creation of independently operating databases, on independent servers, to help gather this information.
While it might seem obvious why a student’s test score could be negatively affected by a test interruption, the fact that other students may have benefited from the disruption shouldn’t be a shocker, Martineau said.
The testing glitches may have given some students time to think about the nature of the questions, take a break, and refocus, Martineau speculated. Others, if they were “highly motivated,” might have even tried to go to an outside source and look up answers, he said.
Improving the System
Despite public worries about test interruptions, the report is a reminder that “students are more resilient than we give them credit for,” Martineau said.
Smarter Balanced will look at the report’s recommendations and talk with member states about the possibility of amending contracts with test vendors to take in that advice, said Tony Alpert, the executive director of the consortium.
But he also argued that Smarter Balanced states have already progressed beyond the setbacks that played out during the 2014-15 academic year. This year’s round of state tests was overwhelmingly problem-free, he said.
The goal of the report was “not to identify any source of blame, but to improve the system,” Alpert said. “We’re really moving on.”