Will EVAAS Make Wake Schools Better? Part II


This story is the second in a two-part series. Read Part 1 here.

The Wake County Public School System, like other districts in North Carolina, uses the Education Value-Added Assessment System (EVAAS) to rate its teachers.

But EVAAS has drawn fire from education academics, teachers and teacher organizations in the state and around the country.

Despite these criticisms, Wake Superintendent Tony Tata is already attempting to use EVAAS to staff struggling schools and classrooms with teachers deemed effective by the program.

Chief among EVAAS’s critics is Audrey Amrein-Beardsley, assistant professor at Arizona State University, who has published the most comprehensive studies articulating problems with the program.

The North Carolina Association of Educators (NCAE), the state’s largest teacher organization, has also heard complaints from educators across the state and documented abuses of EVAAS.

Critics say EVAAS:

  • Does not take into account a student’s background
  • Has been used as a punishment
  • Has not been sufficiently assessed in peer-reviewed scholarly journals
  • Is confusing for many teachers
  • Encourages teaching to the test

Student Background

Just before the last school board election in 2009, a report surfaced that criticized WCPSS’s in-house system for rating its teachers – the Effectiveness Index.

William Sanders, developer of EVAAS and senior director of the EVAAS K-12 division at SAS, helped author the report.

SAS criticized the Effectiveness Index for adjusting student performance data by considering socioeconomic factors.

“For a rich white kid and a poor black kid we should expect the same level of academic achievement and want the same expectation for both kids next year,” Sanders said in a phone interview with The Record.

The report seemed to lend some credence to conservative school board contenders — who went on to win that October — looking to do away with the policy of assigning students to schools based on socioeconomic status.

Of course, the new conservative board majority accomplished this goal in 2010. But many education academics believe socioeconomic status is essential to any study of teacher effectiveness.

Beardsley points to a preponderance of data showing socioeconomic background predicts a student’s performance in school. Poorer kids tend to do worse academically than their more affluent counterparts.

“The real question here is … do students from, let’s say, low socioeconomic backgrounds learn as fast as students who have two parents, who travel, who have tutors outside of the classroom perhaps, who have homework priorities, who read as part of their leisure time, etc.?” Beardsley wrote in an e-mail to The Record.

If so, she said, then EVAAS — “the only sophisticated analytical model for measuring student achievement that does not account for student background factors” — will show inherent biases against teachers of low-income students and in favor of teachers of affluent students.

The current school year marks the third full year that WCPSS has used EVAAS. Tata has repeatedly said that he wants to use EVAAS data to pair high-performing teachers with low-performing students.

Sanders claims he has evidence that teachers who move from one environment — affluent and high-achieving — to a radically different one — low-income and underachieving — generally continue to do just as well in EVAAS terms.

[pullquote]“The real question here is … do students from, let’s say, low socioeconomic backgrounds learn as fast as students who have two parents, who travel, who have tutors outside of the classroom perhaps, who have homework priorities, who read as part of their leisure time, etc.?”~Audrey Amrein-Beardsley[/pullquote]

But Sanders offers no proof of this claim.

Whether teachers perform well with varying sets of students at new schools may depend on how the matches are made.

“There should be some kind of perk,” said Angela Farthing, manager of the Center for Teaching and Learning of the NCAE. “Just because a teacher is successful at Panther Creek High School and Enloe High School doesn’t mean they’ll be successful at Halifax County Schools. Involuntary transfer is not going to help. It takes all the staff at a school to be successful. If you transfer a teacher to a school, take a high-performing teacher from one school and put them in schools that are not high-performing, that may not be the only piece to help turn the school around.”

“Hammer Instead of a Carrot”

As potential data for teacher evaluations, EVAAS ratings are protected as confidential personnel information under state law. But the law has not always been respected.

Farthing cites Guilford County, where EVAAS teacher ratings were posted in school hallways, as an example of districts using EVAAS to harm instead of help.

Also under state law, EVAAS ratings can be part of a teacher’s evaluation but cannot be used as the sole basis for termination.

Yet throughout the state, said Farthing, administrators imply or directly state that decisions about termination are linked to testing data.

“The reason that [EVAAS] has gotten a negative connotation is that principals say that if your scores are bad, you’re gone,” she said.

Using EVAAS data alone — apart from observations, artifacts and other effectiveness measures — to make decisions about a teacher’s future goes against scientific standards, Beardsley said.

“Given this standard [of multiple measures] was developed by the American Educational Research Association, American Psychological Association and the National Council on Measurement in Education, those arguing, not evidencing, that these models are so strong that they can indeed operate alone and in isolation are essentially giving the best measurement specialists and researchers in the field an ideological middle finger,” Beardsley wrote.

Sanders said EVAAS was never designed to be used in isolation.

Wake County uses EVAAS data as part of the teacher evaluation process, but Assistant Superintendent for Evaluation and Research David Holdzkom denied the district uses the program to punish teachers.

Farthing knows of no instances of misuse of EVAAS data in Wake County. But Tata has expressed his willingness to publish the effectiveness ratings of individual teachers.

“I know if my kid was in school and you were a highly effective teacher, I would want to know,” said Tata, after a press briefing in April.

“It’s being used as a hammer instead of a carrot,” Farthing said. “Students are more than test scores. We’re focusing more on the test scores than on helping students be learners and thinkers.”

Some education academics question the assumptions programs like EVAAS make about teachers.

“I’m not sure the discussion about good teachers is even a useful one,” said Jason Osborne, associate professor of educational psychology at North Carolina State University. “I’m sure the vast majority of teachers are doing their darnedest to do what’s right for students. It may be that it’s not the right environment. It may be that the students aren’t coming to the table and allowing teachers to help them.

“We need to get away from seeing teachers as heroes and villains,” he added. “Teachers are only half the story, or less than half.”

EVAAS is password-protected and accessed online. Other than board members and staff in the offices of the superintendent and the Evaluation and Research Department, the only school personnel that have access to EVAAS are principals and any designees that a principal chooses, such as department chairs. Teacher ratings are not visible to other teachers, but student ratings are, making it simple to pair a student’s rating with his or her teacher.


Peer Review

Multiple reports on EVAAS have noted the shortage of available data on the program in respected academic journals.

Sanders points to the U.S. Department of Education’s approval of Tennessee’s growth model, which uses a version of EVAAS, as evidence the program has been fully vetted and validated. But Beardsley and other researchers do not find this evidence adequate.

“Because these [value-added models] certainly make sense, and nearly everyone agrees that these models are improving the current accountability system, our federal and state policymakers … have essentially gone ‘all in’ without nearly enough evidence that they work in the ways intended, and for which they are being oversold,” Beardsley wrote.

Moreover, researchers have questioned the reliability of EVAAS, complaining that in some cases teachers are rated effective one year and ineffective the next.

Sanders acknowledges that with less data EVAAS’s ratings become less reliable. He says that with only one or two years of data available for a teacher, the margin of error can be as high as 50 percent.

“This simply isn’t good enough to make consequential decisions,” Beardsley wrote. Even with three years of data, she noted, a district can be only 80 to 90 percent certain that the ratings are accurate.
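One rough way to see why ratings built on only a year or two of data are so noisy is to treat each annual estimate as the teacher’s true effect plus random error: averaging more years shrinks the error roughly with the square root of the number of years. The sketch below is purely illustrative — the effect size and noise level are invented numbers, not EVAAS parameters, and EVAAS’s actual model is far more complex.

```python
import random

random.seed(42)

TRUE_EFFECT = 2.0   # hypothetical teacher effect, in test-score points
NOISE_SD = 8.0      # hypothetical year-to-year noise around that effect

def estimate_se(n_years, trials=20000):
    """Standard error of an estimate averaged over n_years of noisy data."""
    squared_errors = []
    for _ in range(trials):
        yearly = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(n_years)]
        estimate = sum(yearly) / n_years
        squared_errors.append((estimate - TRUE_EFFECT) ** 2)
    return (sum(squared_errors) / trials) ** 0.5

for n in (1, 2, 3, 5):
    print(f"{n} year(s) of data -> standard error of roughly "
          f"{estimate_se(n):.2f} points")
```

Under these made-up assumptions, one year of data leaves an error band roughly twice as wide as four years would, which mirrors the general complaint that single-year ratings swing from “effective” to “ineffective.”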

Teacher Confusion

Farthing believes that to use EVAAS correctly and to its full potential, teachers and principals have to understand it completely. That is impossible, she said, as long as SAS keeps its algorithms proprietary.

“I have seen presentation after presentation [about EVAAS] and I couldn’t tell you how it all works,” Farthing said. “It’s something that statisticians understand, but the average educator and the average parent does not understand.”

Farthing believes EVAAS works best when principals grant access to all of their teachers.

Sanders counters that the complexity of EVAAS is its strength.

“The No. 1 huge disadvantage of more simple value-added models is that they throw away data that they don’t have for kids,” he said.

EVAAS compensates for missing data due to students entering the district late or missing tests, problems that disproportionately affect poorer children.
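Sanders’s point can be illustrated with a toy contrast. A bare-bones gain-score model must drop any student without a prior-year score, and those dropped students — often late arrivals — are frequently the ones who are struggling. EVAAS’s actual handling of missing longitudinal data is proprietary and far more elaborate; the crude mean-imputation below stands in only to show how the two approaches can diverge on the same (invented) classroom.

```python
# Toy records: (prior_score, current_score); None = missing prior-year test.
students = [(50, 55), (60, 66), (None, 48), (70, 74), (None, 61)]

# A simple gain-score model must discard students missing a prior score.
complete = [(p, c) for p, c in students if p is not None]
simple_gain = sum(c - p for p, c in complete) / len(complete)

# A model that uses whatever data exists keeps all five students.
# (Here crudely, by imputing the mean prior score for missing values —
# a stand-in for, not a description of, EVAAS's method.)
mean_prior = sum(p for p, _ in complete) / len(complete)
all_gain = sum(c - (p if p is not None else mean_prior)
               for p, c in students) / len(students)

print(f"students used by simple model: {len(complete)} of {len(students)}")
print(f"simple-model gain estimate: {simple_gain:.1f}")
print(f"all-students gain estimate: {all_gain:.1f}")
```

In this contrived example the simple model, seeing only the three students with complete records, paints a rosier picture of the classroom than a model that accounts for the two late arrivals — the kind of distortion Sanders argues disproportionately affects schools serving poorer children.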

“I shudder to think how many places have adopted that simple approach [of other value-added models] without realizing how grossly inaccurate they are,” Sanders said.

Teaching to the Test

Since EVAAS uses end-of-grade and end-of-course test data, some argue the program pushes teachers to focus only on content and skills featured on those exams.

“Some policy makers say that’s what we should be doing,” said Farthing, who used to teach social studies. “But for a long time the curriculum was a mile wide and an inch deep.”

The North Carolina Standard Course of Study is the state’s curriculum for most subjects. This curriculum has been improving, Farthing said, but the standardized tests are still lackluster.

“The U.S. History EOC is an awful test,” she said. “Recall a constitutional amendment or a paper from The Federalist Papers. It’s just a recall test.”

A focus on testing makes for uninspired teaching, Farthing believes.

“To teach to the test means you have to practice the test,” Farthing said. “Students get tired and bored.”

Some teachers — those in grades K-2 and those not teaching core subjects — do not have to deal with the standardized tests. This puts unfair pressure on teachers of higher grades and subjects like reading and math, Farthing said.

She acknowledges that EVAAS has proved useful in student math placement.

“One thing I’ve heard [from teachers] is that it does help them if they know a student is going to need help in algebra,” she said.




2 thoughts on “Will EVAAS Make Wake Schools Better? Part II”

  1. The question whether EVAAS should control for socioeconomic factors is complex. The answer may be different when you are talking about using EVAAS to predict an individual student’s trajectory—where you might have a comprehensive set of scores that incorporate all the nonteacher effects that are consistent over time for that child—and when you are using EVAAS to evaluate a teacher’s “effectiveness” at any given point in time. In the latter case, what is labelled “teacher” effectiveness is really the effectiveness of the class’s overall educational environment, which includes anything outside of the teacher’s efforts that causes class test scores to change in some manner different from that of the average class. This might include things that happen in school but not due to the teacher and things that happen in the tremendous amount of time students spend away from the school environment altogether.

    Because of state mandates, the promises North Carolina has made in our Race to the Top grant applications, and our superintendent’s commitment to the tool, we will be using it for the latter purpose. As we do that, it would be prudent to bring on some academic researchers to look at interesting questions like the extent to which the ratings change when teachers are put in different settings, e.g. moved from low poverty settings to high poverty settings, or simply change over time in the same setting. Variability in the ratings would be good evidence that better tools are needed; consistency would be good evidence that we are on the right track, assuming we agree that pushing test scores up is an appropriate and important educational goal.

  2. Ms. Amrein-Beardsley has published the same content about value-added results in two different publications with only slight modification in the second publication. To my surprise, the second publication reflects no awareness on her part that SAS responded publicly to the concerns expressed initially. The following link is a newer SAS response to value-added modeling considerations.


The responses to Criticisms 4 and 5 in the publicly available paper are the most relevant to the concerns raised in her recent interview.

    Simply put, Ms. Amrein-Beardsley wonders if teachers are disadvantaged when the statistical modeling makes no direct adjustment for student demographic variables. The research evidence provided at the above link shows these adjustments are typically unnecessary if the modeling is sufficiently robust, particularly at the student level. Because the EVAAS modeling includes so much historical testing information per student, statisticians consider it to have the necessary robustness.

While we understand the spirit and intent of such adjustments, the unintended consequence is often a lesser expectation on the part of educators for appropriate academic progress for poor and minority students. Such expectations can become a self-fulfilling prophecy for their students.

    Response to Criticism 4 offers evidence that contradicts Ms. Amrein-Beardsley’s concerns, and shows that comparably prepared students within the same classroom make comparable progress, regardless of poverty level or ethnicity. It is possible to include demographic variables in robust modeling, but policy makers are encouraged to carefully weigh the potential risks to students as well as the potential risks to educators before they make a final decision on value-added modeling requirements. It’s important that struggling students not have those struggles masked by unnecessary adjustments.

    Although Ms. Amrein-Beardsley is very concerned about ‘fairness’ to teachers, her positions regarding simplistic analyses and transparency actually encourage modeling that makes teachers more vulnerable to misclassification. In truth, the use of simplistic modeling is to the detriment of teachers. Response to Criticism 5 provides additional references on this topic.