
Saturday, November 15, 2014

9 Reminders for School Leaders When Reviewing Value-Added Data with Teachers

“A VAM (Value-Added Model) score may provide teachers and administrators with information on their students’ performance and identify areas where improvement is needed, but it does not provide information on how to improve the teaching.” American Statistical Association
Today, I spent a little time looking over the American Statistical Association’s "ASA Statement on Using Value-Added Models for Educational Assessment.” That statement serves as a reminder to school leaders regarding what these models can and cannot do. Here in North Carolina and in other states, as school leaders begin looking at No Child Left Behind waiver-imposed value-added rankings of teachers, they would do well to remind themselves of the cautions described by the ASA last April. Here are some particularly pointed reminders from that statement:
  • “Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.”
  • “VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.”
  • “VAMs typically measure correlation, not causation: Effects—positive or negative—attributed to a teacher may actually be caused by other factors that are not captured in the model.”
  • “Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.”
  • “Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions.”
  • “Ranking teachers by their VAM scores can have unintended consequences that reduce quality.”
  • “The measure of student achievement is typically a score on a standardized test, and VAMs are only as good as the data fed into them.”
  • “Most VAMs predict only performance on the test and not necessarily long-range learning outcomes.”
  • “The VAM scores themselves have large standard errors, even when calculated using several years of data.”
In this season of VAM-viewing, it is vital that informed school leaders remind themselves of the limitations of this data. You can’t take the word of companies promoting these models as “objective” and “fool-proof” measures of teacher quality; after all, they hold multimillion-dollar contracts that they stand to lose if doubt is cast on VAM use. A 21st century school leader needs a more balanced view of VAM and its limitations.

Value-added ratings should never be used to inform school leaders about teacher quality; there are just too many problems. Still, in the spirit of reviewing VAM data with teachers, here are my top nine reminders and cautions about using value-added data to judge teacher quality:

1. Remember the limitations of the data. Though many states and companies providing VAM data fail to offer extensive explanations and discussion of the limitations of their particular value-added model, rest assured those limitations exist. It is common to bury them in statistical lingo and jargon, but as a school leader, you would do well to read the fine print, do your own research, and understand value-added modeling for yourself. Once you understand the limitations of VAMs, you will be reluctant to make high-stakes decisions based on such data.

2. Remember that VAMs are based on imperfect standardized test scores. No test directly measures a teacher’s contributions to student learning. In fact, in many states, the tests used in VAMs were never intended to judge teacher quality. For example, the ACT is commonly used in VAMs to determine teacher quality, but it was not designed for that purpose. As you review your VAM data, keep in mind the imperfect testing system your state has. That should give you pause before concluding that VAM data tells you anything flawless about a teacher’s quality.

3. Because VAMs measure correlation, not causation, remind yourself as you look at a teacher’s VAM data that he or she alone did not cause those scores. Many, many other factors could have had a hand in them. No matter what promises statistics companies or policymakers make, remember that VAMs are as imperfect as the tests, the teacher, the students, and the system. VAM data should not be used to make causal inferences about the quality of teaching.
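
To see how an omitted factor can masquerade as a teacher effect, consider a minimal toy simulation (my own hypothetical numbers, not any state's actual model): two teachers with identical true effects, where one simply teaches more students affected by something the model leaves out, such as poverty.

```python
# Toy simulation (hypothetical numbers, not any state's actual VAM):
# two teachers with IDENTICAL true effects, where one has more
# students affected by a factor the model omits (here, poverty).
import random

random.seed(1)

def simulate_class(n, poverty_rate):
    """Return simulated test-score gains for one class."""
    gains = []
    for _ in range(n):
        poor = random.random() < poverty_rate
        # True teacher effect is 0 for BOTH teachers; the omitted
        # poverty factor lowers measured gains by 5 points on average.
        gains.append(random.gauss(0, 10) - (5 if poor else 0))
    return gains

class_a = simulate_class(100, poverty_rate=0.8)  # high-poverty roster
class_b = simulate_class(100, poverty_rate=0.2)  # low-poverty roster

print(f"Teacher A naive 'value added': {sum(class_a) / len(class_a):+.1f}")
print(f"Teacher B naive 'value added': {sum(class_b) / len(class_b):+.1f}")
# A model that omits poverty ranks B above A, even though by
# construction both teachers are identical.
```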

4. Remember that different VAM models produce different rankings, and even choosing one model over another reflects subjective judgment. For example, some states choose VAMs that do not control for variables such as student demographic background because they feel doing so would excuse lower performance by low-socioeconomic students. That is a subjective value judgment about which VAM to use, and because of it, these models aren’t perfectly objective. All VAM models are not equal.
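
Here is a sketch of that sensitivity (hypothetical numbers again, not any vendor's algorithm): the same ten classrooms ranked under two different modeling choices, one that ignores a demographic factor and one that adjusts for it.

```python
# Hypothetical sketch, not any state's real model: rank the same
# classrooms with and without a crude demographic adjustment.
import random

random.seed(2)

teachers = []
for t in range(10):
    true_effect = random.gauss(0, 2)   # what we actually want to measure
    poverty_rate = random.random()     # roster characteristic
    # Observed mean gain mixes teacher effect, poverty, and noise.
    observed = true_effect - 6 * poverty_rate + random.gauss(0, 1)
    teachers.append((t, observed, poverty_rate))

# Model 1: rank on raw observed gains (no demographic control).
by_raw = [t for t, _, _ in sorted(teachers, key=lambda x: -x[1])]

# Model 2: rank after crediting back an assumed poverty penalty.
by_adj = [t for t, _, _ in sorted(teachers, key=lambda x: -(x[1] + 6 * x[2]))]

print("raw-model ranking:     ", by_raw)
print("adjusted-model ranking:", by_adj)
# The two orderings typically disagree: a teacher's rank depends on
# which model a state happens to choose.
```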

5. Remind yourself that most VAM studies find that teachers account for about 1% to 14% of the variability in test scores. This means teachers may not have as much control over test scores as many of those using VAMs to determine teacher quality assume. In a perfect manufacturing system where teachers are responsible for churning out test scores, VAMs might make sense. Our schools are far from perfect, and many, many other things impact scores. Teaching is not a manufacturing process, nor will it ever be.
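
A bit of back-of-the-envelope arithmetic makes the point concrete. The numbers below are my own toy values, not data from any study, chosen so that teacher effects explain about 10% of score variance:

```python
# Back-of-the-envelope sketch with assumed toy numbers: if teacher
# effects explain ~10% of score variance, the other ~90% of what
# moves scores lies outside the teacher's control.
teacher_sd = 3.0   # spread of true teacher effects (assumed)
other_sd = 9.0     # spread of everything else: students, homes, luck

teacher_var = teacher_sd ** 2
total_var = teacher_var + other_sd ** 2
print(f"Teacher share of score variance: {teacher_var / total_var:.0%}")
# Prints 10% -- squarely inside the 1%-14% range the ASA cites, with
# student and system-level factors accounting for the rest.
```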

6. Remind yourself that should you use VAMs in a high-stakes manner, you may actually decrease the quality of student learning and harm the climate of your school. Turning your school into a place where only test scores matter and teaching to the test is everybody’s business is a real possibility if you place too much emphasis on VAM data. Schools that obsess over test scores aren’t fun places for anybody, teachers or students. A balanced view of VAM data, as well as test data, is important.

7. Remember that all VAM models are only as good as the data fed into them. In practical terms, keep the imperfect nature of all standardized tests in mind as you discuss VAM data. Even though states don’t always acknowledge the limitations of their tests, that doesn’t mean you can’t. Keep the imperfect nature of tests and VAMs in mind always; perhaps then you won’t use the data unfairly.

8. Remember that VAMs only predict performance on a single test. They do not tell you anything about the long-range impact of that teacher on student learning.

9. Finally, VAMs can have large standard errors. Without getting entangled in statistical lingo, suffice it to say that the VAM scores themselves are imperfect. Keep that in mind when reviewing the data with teachers.
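
To illustrate what a large standard error means in practice, here is a hedged toy simulation (my own assumed numbers, not any vendor's computation) of a truly average teacher rated year after year:

```python
# Toy illustration with assumed numbers: estimate a truly AVERAGE
# teacher's "value added" from ten successive classes of 25 students
# and watch how much the estimate bounces around.
import random

random.seed(4)

def one_year_estimate(n=25, true_effect=0.0, noise_sd=10.0):
    """Mean gain for one simulated class of n students."""
    gains = [true_effect + random.gauss(0, noise_sd) for _ in range(n)]
    return sum(gains) / n

estimates = [one_year_estimate() for _ in range(10)]
print("Ten yearly estimates for one average teacher:")
print(", ".join(f"{e:+.1f}" for e in estimates))
# With 25 students and a noise SD of 10, each estimate's standard
# error is 10 / sqrt(25) = 2 points -- enough for the same teacher
# to look "above average" one year and "below average" the next.
```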

The improper use of VAM data by school leaders can downright harm education. It can turn schools into places where in-depth learning matters less than test content. It can turn teaching into a scripted process of just covering the content. It can turn schools from places of high engagement into places where no one really wants to be. School leaders can prevent that by keeping VAM data in proper perspective, as the "ASA Statement on Using Value-Added Models for Educational Assessment" does.

Saturday, June 14, 2014

Value-Added Measures and 'Consulting Chicken Entrails' for High-Stakes Decision-Making

“Like the magician who consults a chicken's entrails, many organizational decision makers insist that the facts and figures be examined before a policy decision is made, even though the statistics provide unreliable guides as to what is likely to happen in the future.” Gareth Morgan, Images of Organization: The Executive Edition

Could it be that using value-added data is the equivalent of consulting “chicken entrails” before making certain high-stakes decisions? With all the voodoo, wizardry, and hidden computations that educators are simply supposed to accept on faith from the companies crunching the data, value-added data might as well be “chicken entrails,” and the “Wizards of VAM” might as well be high priests or magicians reading those innards and making declarations of effectiveness and telling fortunes. The problem, though, is that value-added measures are prone to mistakes, despite those who say “it’s the best we have.” Such reasoning simply accepts the imperfections: one need only hold one’s nose and take the medicine.

What President Obama, Arne Duncan, and our own North Carolina state education leaders do not get is that value-added measures simply are not transparent. Anyone who reads the current literature on these statistical models immediately sees many, many imperfections. There are certainly enough errors of concern to argue that VAMs have zero place in making high-stakes decisions.

As the “Wizards of VAM” prepare to do their number crunching and “entrails reading” in North Carolina, we await their prognostications and declarations of “are we effective or ineffective?” Let’s hope it doesn’t smell too bad.

Thursday, April 10, 2014

Let the VAM Lawsuits Begin: Issues and Concerns with Their High-Stakes Use

Lawsuits against states using value-added models in teacher evaluation decisions have begun in earnest. There are now three lawsuits underway challenging the use of this controversial statistical methodology and the use of test scores to determine teacher effectiveness. This increase in litigation is an indication both of how rapidly states have adopted the practice and of how these same states failed to address so many issues and concerns with using VAMs in this manner.

Two lawsuits have now been filed in Tennessee against the use of value-added assessment, known as TVAAS, as a part of teacher evaluation. The first was filed against Knox County Schools by the Tennessee Education Association on behalf of an alternative school teacher who was denied a bonus because of her TVAAS ratings. (See “Tennessee Education Association Sues Knox County Schools Over Bonus Plan.”) In this case, the teacher was told she would receive system-wide TVAAS estimates because of her position at an alternative school, but 10 of her students were used anyway in her TVAAS score, resulting in a lower rating and no bonus. This lawsuit contests the arbitrariness of TVAAS estimates that use only a small number of a teacher’s students to determine overall effectiveness.

In the second lawsuit, filed against Knox County Schools as well as Tennessee Governor Bill Haslam, state Commissioner of Education Kevin Huffman, and the Knox County Board of Education, an eighth grade science teacher claims he too was unfairly denied a bonus after his TVAAS value-added rating was based on only 22 of his 142 students. (See “TEA Files Second Lawsuit Against KCS, Adds Haslam and Huffman as Defendents.”) Again, the lawsuit points to the arbitrariness of the TVAAS ratings.
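
Some rough arithmetic shows why basing a rating on a fraction of a roster matters (this assumes independent student scores, a simplification that real models only complicate):

```python
# Rough arithmetic on the claim itself (assuming independent student
# scores): basing the rating on 22 students instead of all 142
# inflates its standard error by sqrt(142 / 22).
import math

full_roster, subset = 142, 22
inflation = math.sqrt(full_roster / subset)
print(f"SE inflation from rating on {subset} of {full_roster} students: "
      f"{inflation:.1f}x")
# Roughly 2.5 times more statistical noise in the number that
# decided the bonus.
```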

A third lawsuit has been filed in Rochester, New York by the Rochester Teachers Association, alleging that officials in that state “failed to adequately account for the effects of severe poverty, and as a result, unfairly penalized Rochester teachers on their Annual Professional Performance Review,” or yearly teacher evaluations. (See “State Failed to Account for Poverty in Evaluations.”) While it appears the Rochester suit disputes the use of growth score models rather than value-added models, it also challenges the whole assumption, a recent fad pushed by politicians and policymakers, of using test scores to evaluate teachers.

North Carolina jumped on the value-added bandwagon in response to US Department of Education coercion, and the state now uses its version of TVAAS, called EVAAS, or Education Value-Added Assessment System, as part of teacher and principal evaluations. Fortunately, no districts have yet had to make high-stakes decisions using the disputed measures, so the lawsuit floodgates haven't opened in our state, but I am sure that once EVAAS is used to make employment decisions, the lawsuits will begin. When they do, the American Statistical Association has perhaps outlined some of the areas of contention in its "ASA Statement on Using Value-Added Models for Educational Assessment." Here are some points from that position statement that clearly outline the questions surrounding the use of VAMs, a highly questionable statistical methodology, in teacher evaluations:
  • “VAMs (Value-added models) are complex statistical models, and high-level statistical expertise is needed to develop the models and interpret their results.” States choosing to use these models are trusting third-party vendors to develop them and provide the ratings, while expecting educators to interpret the results effectively. Obviously, so much can go wrong in the interpretation of VAM results that the ASA is warning of the need for people with the expertise to interpret them. I wonder how many of the states that have implemented these models have spent time and money training teachers and administrators to interpret the results, beyond subjecting educators to one-time webinars or "sit-n-gets"?
  • “Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. THESE LIMITATIONS ARE PARTICULARLY RELEVANT IF VAMS ARE USED FOR HIGH-STAKES PURPOSES (emphasis mine).” I can’t speak for other states, but in North Carolina there has been little to no disclosure or discussion of the limitations of value-added data. There has been more public relations, advertising, and promotion of the methodology as a new way of evaluating educators; they even have SAS promoting the methodology for them. The Obama administration has done this as well. The attitude in North Carolina seems to be, “We’re gonna evaluate teachers this way, so deal with it.” There needs to be discussion and disclosure about SAS’s EVAAS model and the whole process of using tests to evaluate teachers in North Carolina. Sadly, that’s missing, and I can bet it’s the same in other states too.
  • “VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.” In other words, VAMs only tell you how students do on standardized tests. They can’t tell you about the many, many other ways teachers contribute to students’ lives. The main underlying assumption in using VAMs for teacher evaluations is that only test scores matter, regardless of what supporting policymakers say. While it's true that the North Carolina evaluation model does include other standards, how long will it take administrators and policymakers to ignore those standards and zero in on test scores as the ones that matter most? The adage "What gets tested gets taught!" is true, and "What gets emphasized the most through media and promotion matters the most" is equally true. When standard 6 or 8 is the only standard on the educator evaluation where an educator is "In Need of Improvement," you can bet test scores suddenly matter more than anything else.
  • “VAMs typically measure correlation, not causation: Effects—positive or negative—attributed to a teacher may actually be caused by other factors that are not captured in the model.” There are certainly many, many things (poverty, lack of breakfast, runny noses) that can contribute to a student’s test score, yet there is a belief, especially among those pushing VAMs in teacher evaluations, that a teacher directly causes a test score to happen. The biggest assumption made by those promoting VAMs in teacher evaluations is that the teacher's job, or part of it, is the production of test scores. In reality, teaching is far more complex than that, and those reducing it to a test score have probably not spent much time teaching.
  • “Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of the opportunities for quality improvement are found in system-level conditions.” Yet in most states, educational improvement falls almost entirely on the backs of the educators in schools, in the form of VAM-powered teacher evaluations. There is little effort to improve the system itself: to improve classroom working conditions, fund professional development, or provide adequate materials and resources. Instead of looking at how the system prevents excellence and innovation with its top-down mandates and other ineffective measures, many states, including North Carolina, and the Obama administration place accountability squarely on the backs of the educators in classrooms and schools. If the education system is broken, you don't focus on the parts; you improve the whole.
  • “Ranking teachers by their VAM scores can have unintended consequences that reduce quality.” If all important learning could be reduced to a one-time bubble-sheet test, then all would be well for VAM and the ranking of teachers. But every educator knows that tests measure only a minuscule portion of important learning; many important learning experiences can't be measured by tests at all. Yet if you elevate tests in a high-stakes manner, those results become the most important outcome of the school and the classroom. The end result is teaching to the test and test prep, where the test becomes the curriculum and high test scores become the goal of teaching. If that's the goal of teaching, who would want to be a teacher? Elevating test scores through VAM will only escalate the exit of teachers from the profession and discourage others from entering it, because there's nothing fulfilling about merely raising test scores. We didn't become educators to raise test scores; we became educators because we wanted to teach kids.
  • “The measure of student achievement is typically a score on a standardized test, and VAMs are only as good as the data fed into them.” Ultimately, VAMs are only as good as the tests administered to provide the data that feeds the model. If tests don't adequately measure the content, or if they are not standardized or otherwise of high quality, then the VAM estimates are of equally dubious quality. North Carolina scrambled to create multiple tests on the fly in many high school, middle school, and elementary subjects just to have data to feed its EVAAS model. Yet those tests, the process of their creation and field testing, and even how they're administered make them questionable candidates for serious VAM use. VAMs require high-quality data to produce high-quality estimates; the idea that any old test will do is anathema to them.
The American Statistical Association's position statement does make some supportive points about the use of value-added models too: they can be used effectively as part of the data teachers draw on to adjust classroom teaching. But when a state does not return those scores until October or later, three months into the school year, it's impossible to use that data to inform teaching, and a rating alone does little to inform it anyway. Testing gives policymakers an opportunity to provide teachers with valuable data to improve teaching; sadly, the data currently provided is too little, too late.

As the VAM-fed teacher evaluation craze continues to grow, it is important for all educators to inform themselves about this controversial statistical practice. It is not a methodology without issues, despite what the Obama administration and state education leaders say. Being knowledgeable about it means understanding its limitations as well as how to properly interpret and use such data. Don't wait for states and the federal government to provide that information: they are too busy promoting its use. The points made in the American Statistical Association's "Statement on Using Value-Added Models for Educational Assessment" are excellent points of entry for learning more.

Wednesday, November 27, 2013

Misplaced Faith in Value-Added Measures for Teacher Evaluations

Due to Race to the Top and the No Child Left Behind waivers, 41 states have now elected to use value-added measures, or VAMs, as a part of teacher evaluations. This has been done without regard to the limitations of these statistical models and without any supporting research showing that doing so will increase student achievement. What are those limitations? In a recent post entitled "Top Ten Bits of VAMmunition," the authors of Vamboozled provide research-based data that educators can use to defend themselves against this massive, non-research-based shift toward a model of teacher evaluation that will most likely do more damage to education than No Child Left Behind or any other education "reform" of modern times.

I recently uncovered a journal article entitled "Sentinels Guarding the Grail: Value-Added Measurement and the Quest for Education Reform," which describes a rhetorical study by Rachel Gabriel and Jessica Nina Lester examining the discourse at meetings of the Tennessee Teacher Evaluation Advisory Committee, or TEAC, from March 2010 through April 2011. TEAC was a 15-member panel appointed by the governor of Tennessee to develop a new teacher evaluation policy. The authors examined the language used by panel members as they deliberated over the various components of a teacher evaluation policy.

What is interesting about this study is that the language employed by those in these meetings betrays some important assumptions and beliefs about teaching, learning, testing, and value-added measures that aren't entirely supported by research or common sense.

According to Gabriel and Lester, value-added measurement became a sort of "sentinel of trust" and "holy grail" for measuring teacher effectiveness in these meetings, in spite of all the research and literature pointing to its limitations. According to the authors, here are some of the assumptions that those in the TEAC meetings demonstrated through the language they used:

1) Value-added measures alone define effectiveness.
2) Value-added measures are the only "objective" option.
3) Concerns about value-added measures are minimal and not worthy of consideration.

As far as I can see, there is enormous danger when those making education policy buy into these three mistaken assumptions about value-added measures.

First of all, VAMs alone do not define effectiveness. They are based on imperfect tests, and often on a single score collected at one point in time. Tests can't possibly carry the role of defining teacher effectiveness, because no test is capable of capturing all that students learn. Of course, if you believe on faith that test scores alone equal student achievement, then sure, VAMs are the "objective salvation" you've been waiting for. However, those of us who have spent a great deal of time in schools and classrooms know tests hardly deserve such an exalted position.

Secondly, value-added measures are not as objective as those who push them would like them to be. The selection of which value-added model to use is riddled with subjective judgments, as is the choice of which factors to include in or exclude from the model. Decisions about how to rate teachers using these measures require subjective judgment as well, not to mention that the "objective tests" feeding VAMs aren't entirely objective either. All the decisions surrounding their development, implementation, and use rest on values and beliefs. There is nothing totally objective about VAMs. About the only objective number that results from value-added measures is the amount of money states pay consulting and data firms to generate them.

Finally, those who support value-added measures often simply dismiss concerns about them as not a real problem, arguing that VAMs, flawed as they are, are the "best measures" we currently have. Now that's some kind of argument! Suppose I were your surgeon and decided whether to operate on a brain tumor by "tapping on your head," because tapping was the best tool I had. The whole it's-the-best-we-have argument does not negate the many flaws, issues, and potential harms of using value-added measures. Instead of dismissing the issues and concerns about VAMs, those who advocate their use in teacher evaluations need to address every concern. They need to be willing to acknowledge the limitations, not simply wave them away.

I offer one final caution to my fellow teachers and school leaders: it is time to begin asking the tough questions about the use of VAMs in evaluations. I strongly suggest we learn all we can about the methodology. If anyone uses the phrase "Well, it's too difficult to explain," we need to demand that they explain anyway. Just because something looks complicated does not mean it's effective; sometimes we educators are too easily dazzled by the "complicated" anyway. The burden is on those who support these measures to explain them adequately and to support their use with peer-reviewed research, not company white papers and studies by those who developed the measures in the first place.