5/7/2021
Bringing Literacy Policy to the Next Level with Effect Size
So far, 2021 is giving us reason for hope, with teachers getting vaccinated and educators focused on closing the gaps created by a year of dislocated learning.
In addition, Congress passed the American Rescue Plan in March. This legislation earmarked $5 billion in funds to support students of all ages who have been significantly impacted by COVID-19.
Over the course of my career, I have never seen the education system receive so much government largess at once—but I've never seen it faced with so much upheaval at once, either. Even with government funding, educators have a big task ahead of them when it comes to identifying and evaluating learning programs to help their students catch up (especially in core areas like literacy), and some are seizing the moment to advance educational initiatives already in progress.
For example, the Florida Department of Education (FDOE) updated its Coronavirus Aid, Relief, and Economic Security (CARES) Act and Elementary and Secondary School Emergency Relief Fund (ESSER) grant criteria for K–3 reading programs to require strong, moderate, or promising Every Student Succeeds Act (ESSA) evidence of effectiveness AND a minimum effect size of 0.20. Selecting literacy programs based on both strength of ESSA evidence and effect size is a first for a state department of education, and it represents a potentially seismic shift.
The fact that not all research is created equal has long been an issue for educators: one study does not constitute an evidence base, and research that has not been externally reviewed may be labeled by vendors as more rigorous than it actually is. In light of this, I hope and expect that other states will follow Florida’s lead, and it is worth understanding how adding effect size to the mix allows educators to make more informed decisions about the edtech literacy programs they choose.
Effectiveness as defined by ESSA
ESSA’s predecessor, the No Child Left Behind (NCLB) Act, defined a program of merit as being built upon “scientifically based research,” a definition that ended up being too vague to be of much use to educators.
In contrast, ESSA promotes evidence-based programs: those with a demonstrated capacity to produce results and improve outcomes. ESSA levels of evidence reflect the quality, rigor, and statistical significance of research study designs and findings, and the kind of evidence described in ESSA has generally been produced through formal studies and research. Under ESSA, there are four tiers, or "levels," of evidence:
- Tier 1 – Strong Evidence: supported by one or more well-designed and well-implemented randomized controlled experimental studies
- Tier 2 – Moderate Evidence: supported by one or more well-designed and well-implemented quasi-experimental studies
- Tier 3 – Promising Evidence: supported by one or more well-designed and well-implemented correlational studies (with statistical controls for selection bias)
- Tier 4 – Demonstrates a Rationale: practices have a well-defined logic model or theory of action; are supported by research; and have some effort underway by a state education agency (SEA), local education agency (LEA), or outside research organization to determine their effectiveness
Despite being an improvement over NCLB standards, these evidence requirements still put the onus on educators to determine whether a literacy program is grounded in up-to-date research, proven to be an effective teaching tool, and capable of addressing their students' unique needs. That’s why the addition of an effect size threshold is such a significant breakthrough.
The value of effect size
What is effect size? The Centre for Evaluation and Monitoring defines it as "simply a way of quantifying the size of the difference between two groups. It is easy to calculate, readily understood and can be applied to any measured outcome in Education…"
Placing the emphasis on the size of the effect helps educators understand the strength of the intervention and the potential impact on student outcomes. Effect size is especially helpful when evaluating two educational tools side by side.
As the University of Michigan explains, “To know if an observed difference is not only statistically significant but also important or meaningful, you will need to calculate its effect size. Rather than reporting the difference in terms of, for example, the number of points earned on a test … effect size is standardized. In other words, all effect sizes are calculated on a common scale—which allows you to compare the effectiveness of different programs on the same outcome.”
How effect size is calculated and evaluated
Effect size is the standardized mean difference between the two groups: the mean score of the intervention group minus the mean score of the control group, divided by the pooled standard deviation of the scores.
Here’s an example: An effect size of 0.6 means an average student in the intervention group scores 0.6 standard deviations higher than an average student in the control group (that is, the scores of students in the intervention group exceed 73% of the scores of students who did not receive the intervention).
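To make the calculation concrete, here is a minimal sketch in Python (illustrative only; this is not Lexia's, Florida's, or Evidence for ESSA's methodology, and the score lists are invented):

# A minimal, illustrative sketch (not any program's or state's official methodology):
# compute Cohen's d, the standardized mean difference between an intervention group
# and a control group, and translate it into the percentile interpretation used above.
from math import erf, sqrt
from statistics import mean, stdev

def cohens_d(intervention, control):
    # Pooled standard deviation weights each group's variance by its degrees of freedom.
    n1, n2 = len(intervention), len(control)
    pooled_sd = sqrt(((n1 - 1) * stdev(intervention) ** 2 +
                      (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2))
    return (mean(intervention) - mean(control)) / pooled_sd

def percentile_exceeded(d):
    # Standard normal CDF at d: the share of control-group scores that fall below
    # the average intervention student, assuming roughly normal score distributions.
    return 0.5 * (1 + erf(d / sqrt(2)))

# Hypothetical test scores, invented purely for illustration.
intervention_scores = [78, 85, 82, 90, 76, 88, 84]
control_scores = [78, 83, 79, 86, 73, 82, 80]

d = cohens_d(intervention_scores, control_scores)
print(f"Effect size (Cohen's d): {d:.2f}")
print(f"Average intervention student outscores {percentile_exceeded(d):.0%} of the control group")
print(f"For d = 0.6, that figure is {percentile_exceeded(0.6):.0%}, matching the example above")

Because the result is expressed in standard deviation units rather than raw test points, the same calculation can be compared across different assessments and programs, which is the "common scale" the University of Michigan describes above.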
Although the equation is straightforward, EdWeek has noted that effect-size studies “vary in quality, and many features of studies give hugely inflated estimates of effect sizes.”
Florida’s decision will move effect size into the spotlight, prompting more states to add it to their literacy program selection criteria and more vendors to include it in their marketing materials. With this in mind, EdWeek has identified red flags (features of study design and measurement that tend to inflate results) for educators to consider before taking a vendor’s effect size claims at face value.
Flaws like these can easily produce effect sizes of +1.00 or more, and educators who are serious about knowing what does and doesn’t work in real classrooms should not give such studies much weight. That’s the wisdom behind Florida’s innovation of combining ESSA standards with effect size: it reduces the risk of relying on either alone.
The characterization of effect size also calls for caution. While statistician Jacob Cohen initially described an effect size of 0.20 as “small,” 0.50 as “medium,” and 0.80 as “large,” the Institute of Education Sciences has since challenged these characterizations, pointing out that in real-life educational experiments with broad measures of achievement and random assignment to treatments, effect sizes as large as +0.50—let alone +0.80—are hardly ever seen, except on occasion in studies of one-to-one tutoring.
This suggests that effect sizes up to about 0.50 represent a more realistic range for evaluating educational tools.
The future of effect size
At Lexia, we applaud the Florida DOE for introducing effect size into the state's literacy tool evaluation criteria, and we hope to see other states follow suit.
Lexia also supports the work of Evidence for ESSA, a website created by the nonprofit Center for Research and Reform in Education at Johns Hopkins University to identify programs and practices that meet ESSA evidence standards. The site provides an impartial source of up-to-date, reliable information on those programs; with effect size as an “apples-to-apples” measurement of an intervention’s impact, educators can make more informed investments with greater confidence.
Research is the bedrock of Lexia’s educational mission. Founded in 1984 with a grant from the National Institute of Child Health and Human Development (NICHD), Lexia’s ongoing commitment to rigorous efficacy and learning outcomes research is at the center of our pedagogical approach.
Lexia’s Core5® Reading is one of the most rigorously researched, independently evaluated, and respected reading programs in the world. In an independent review of Lexia Core5 Reading research, Evidence for ESSA awarded the program a “Strong” rating, concluding the following:
“The impact of Core5 was examined in a cluster-randomized study of five schools in the greater Chicago metropolitan area. The study focused on 116 students in grades K–5 receiving special education support for reading difficulties. Students received 'push-in' and/or 'pull-out' support from a special education teacher. After one year, students who used Core5 had significantly higher MAP scores compared to a control group (ES = +0.23), qualifying it for an ESSA 'Strong' rating.”
Lexia is hopeful that effect size will continue to be adopted by states and districts as a clear and reliable metric to evaluate literacy programs. We are confident that Core5 will continue to meet the high standards established by independent organizations such as the National Center on Intensive Intervention (NCII) and Evidence for ESSA.