“Once again, another author confuses learning with recalling information.”
“I personally would like to avoid as many tests as possible, especially with my grade on the line. Trying to learn in a stressful environment is no way to help retain information.”
“Nobody should care whether memorization is enhanced by practice testing or not. Our children cannot do much of anything anymore.”4
Forget memorization, many commenters argued; education should be about high-order skills. Hmmm. If memorization is irrelevant to complex problem solving, don’t tell your
Make It Stick ê 30
neurosurgeon. The frustration many people feel toward standardized, “dipstick” tests given for the sole purpose of measuring learning is understandable, but it steers us away from appreciating one of the most potent learning tools available to us. Pitting the learning of basic knowledge against the development of creative thinking is a false choice. Both need to be cultivated. The stronger one’s knowledge about the subject at hand, the more nuanced one’s creativity can be in addressing a new problem. Just as knowledge amounts to little without the exercise of ingenuity and imagination, creativity absent a sturdy foundation of knowledge builds a shaky house.
Studying the Testing Effect in the Lab
The testing effect has a solid pedigree in empirical research. The first large-scale investigation was published in 1917.
Children in grades 3, 5, 6, and 8 studied brief biographies from Who’s Who in America. Some of them were directed to spend varying lengths of the study time looking up from the material and silently reciting to themselves what it contained.
Those who did not do so simply continued to reread the material. At the end of the period, all the children were asked to write down what they could remember. The recall test was repeated three to four hours later. All the groups who had engaged in the recitation showed better retention than those who had not done so but had merely continued to review the material. The best results were from those spending about 60
percent of the study time in recitation.
A second landmark study, published in 1939, tested over three thousand sixth graders across Iowa. The kids studied six-hundred-word articles and then took tests at various times before a final test two months later. The experiment showed two interesting results: first, the longer the first test was delayed, the greater the forgetting; and second, once a student
To Learn, Retrieve ê 31
had taken a test, the forgetting nearly stopped, and the student’s score on subsequent tests dropped very little.5
Around 1940, interest turned to the study of forgetting, and investigating the potential of testing as a form of retrieval practice and as a learning tool fell out of favor. So did the use of testing as a research tool: since testing interrupts forgetting, you can’t use it to measure forgetting because that “contaminates” the subject.
Interest in the testing effect resurfaced in 1967 with the publication of a study showing that research subjects who were presented with lists of thirty-six words learned as much from repeated testing after initial exposure to the words as they did from repeated studying. These results, that testing led to as much learning as studying did, challenged the received wisdom, turned researchers’ attention back to the potential of testing as a learning tool, and stimulated a boomlet in testing research.
In 1978, researchers found that massed studying (cramming) leads to higher scores on an immediate test but results in faster forgetting compared to practicing retrieval. In a second test two days after an initial test, the crammers had forgotten 50 percent of what they had been able to recall on the initial test, while those who had spent the same period practicing retrieval instead of studying had forgotten only 13 percent of the information recalled initially.
A subsequent study was aimed at understanding what effect taking multiple tests would have on subjects’ long-term retention. Students heard a story that named sixty concrete objects. Those students who were tested immediately after exposure recalled 53 percent of the objects on this initial test but only 39 percent a week later. On the other hand, a group of students who learned the same material but were not tested at all until a week later recalled 28 percent. Thus, taking a single test boosted performance by 11 percentage points after a week.
But what effect would three immediate tests have relative to one? Another group of students were tested three times after initial exposure, and a week later they were able to recall 53 percent of the objects, the same as on the initial test for the group receiving one test. In effect, the group that received three tests had been “immunized” against forgetting, compared to the one-test group, and the one-test group remembered more than those who had received no test immediately following exposure. Thus, and in agreement with later research, multiple sessions of retrieval practice are generally better than one, especially if the test sessions are spaced out.6
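The arithmetic behind these comparisons can be laid out in a few lines of Python. This is only an illustrative sketch: the recall figures are the ones reported above, while the group labels and the helper name `point_gain` are assumptions made for the example. The differences are expressed in percentage points, not as relative change.

```python
# Recall of the sixty objects one week after exposure, in percent,
# as reported in the text. The group labels are illustrative only.
recall_after_week = {
    "no immediate test": 28,
    "one immediate test": 39,
    "three immediate tests": 53,
}

# Recall on the immediate test for the one-test group, in percent.
initial_recall_one_test = 53


def point_gain(group: str, baseline: str = "no immediate test") -> int:
    """Percentage-point advantage of a group over the baseline after a week."""
    return recall_after_week[group] - recall_after_week[baseline]


# Percentage points lost over the week by the group that took a single test.
forgotten_one_test = initial_recall_one_test - recall_after_week["one immediate test"]

print(point_gain("one immediate test"))     # 11
print(point_gain("three immediate tests"))  # 25
print(forgotten_one_test)                   # 14
```

Read this way, a single immediate test is worth 11 points of recall a week later, and three immediate tests are worth 25 points over no immediate test.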
In another study, researchers showed that simply asking a subject to fill in a word’s missing letters resulted in better memory of the word. Consider a list of word pairs. For a pair like foot-shoe, those who studied the pair intact had lower subsequent recall than those who studied the pair from a clue as obvious as foot-s_ _e. This experiment was a demonstration of what researchers call the “generation effect.” The modest effort required to generate the cued answer while studying the pairs strengthened memory of the target word tested later (shoe).
Interestingly, this study found that the ability to recall the word pair on later tests was greater if the practice retrieval was delayed by twenty intervening word pairs than when it came immediately after first studying the pair.7 Why would that be?
One argument suggested that the greater effort required by the delayed recall solidified the memory better. Researchers began to ask whether the schedule of testing mattered.
The answer is yes. When retrieval practice is spaced, allowing some forgetting to occur between tests, it leads to stronger long-term retention than when it is massed.
Researchers began looking for opportunities to take their inquiries out of the lab and into the classroom, using the kinds of materials students are required to learn in school.
Studying the Testing Effect “In the Wild”
In 2005, we and our colleagues approached Roger Chamberlain, the principal of a middle school in nearby Columbia, Illinois, with a proposition. The positive effects of retrieval practice had been demonstrated many times in controlled laboratory settings but rarely in a regular classroom setting.
Would the principal, teachers, kids, and parents of Columbia Middle School be willing subjects in a study to see how the testing effect would work “in the wild”?
Chamberlain had concerns. If this was just about memorization, he wasn’t especially interested. His aim was to raise the school’s students to higher forms of learning: analysis, synthesis, and application, as he put it. And he was concerned about his teachers, an energetic faculty with curricula and varied instructional methods he was loath to disrupt. On the other hand, the study’s results could be instructive, and participation would bring enticements in the form of smart boards and “clickers” (automated response systems) for the classrooms of participating teachers. Money for new technology is famously tight.
A sixth grade social studies teacher, Patrice Bain, was eager to give it a try. For the researchers, a chance to work in the classroom was compelling, and the school’s terms were accepted: the study would be minimally intrusive by fitting within existing curricula, lesson plans, test formats, and teaching methods. The same textbooks would be used. The only difference in the class would be the introduction of occasional short quizzes. The study would run for three semesters (a year and a half), through several chapters of the social studies textbook, covering topics such as ancient Egypt, Mesopotamia, India, and China. The project was launched in 2006. It would prove to be a good decision.
For the six social studies classes, a research assistant, Pooja Agarwal, designed a series of quizzes that would test students on roughly one-third of the material covered by the teacher.
These quizzes were for “no stakes,” meaning that scores were not counted toward a grade. The teacher excused herself from the classroom for each quiz so as to remain unaware of which material was being tested. One quiz was given at the start of class, on material from assigned reading that hadn’t yet been discussed. A second was given at the end of class after the teacher had covered the material for the day’s lesson. And a review quiz was given twenty-four hours before each unit exam.
There was concern that if students tested better on the final exam on material that had been quizzed than on material not quizzed, it could be argued that the simple act of reexposing them to the material in the quizzes was responsible for the superior learning, not the retrieval practice. To counter this possibility, some of the nonquizzed material was interspersed with the quiz material, provided as simple review statements, like “The Nile River has two major tributaries: the White Nile and the Blue Nile,” with no retrieval required. The facts were quizzed for some classes but just restudied for others.
The quizzes took only a few minutes of classroom time.
After the teacher stepped out of the room, Agarwal projected a series of slides onto the board at the front of the room and read them to the students. Each slide presented either a multiple choice question or a statement of fact. When the slide contained a question, students used clickers (handheld, cell-phone-like remotes) to indicate their answer choice: A, B, C, or D. When all had responded, the correct answer was revealed, so as to provide feedback and correct errors. (Although teachers were not present for these quizzes, under normal circumstances,