
Sunday, November 12, 2017

Building a Better Teacher Through VAMs? Not So Fast According to Mark Paige's Book

As part of my research explorations, I stumbled across a relatively new book, published in 2016, about the problems with using value-added measures in teacher evaluations. The book, entitled Building a Better Teacher: Understanding Value-Added Models in the Law of Teacher Evaluation, is a concise read for any administrator who currently encounters value-added data in teacher evaluations.

Paige's argument is rather straightforward: value-added models have statistical flaws, are highly problematic, and should not be used to make high-stakes decisions about educators. Scholars across the board have made clear that there are problems with VAMs, enough that they should only be used in research and only to draw cautious conclusions about teaching. Later, Paige also offers advice to opponents of using value-added models in teacher evaluation. Attempting to challenge the use of value-added models in teacher evaluations through the federal courts may be fruitless. According to Paige:
"At least at the federal level, courts will tolerate an unfair law, so long as it may be constitutional." p. 24
In other words, our courts will allow the use of VAMs in teacher evaluations, even if used unfairly. Instead, Paige encourages action on the legislative side. Educator opponents of VAMs should inform legislators of the many issues with the statistical measures and push for laws that restrict their use. In states with teacher unions, he encourages teachers to use the collective bargaining process to ensure that VAMs are not used unwisely.

Throughout Paige's short read, there are reviews of legal cases that have developed around the use of VAMs to determine teacher effectiveness and lots of information about the negative consequences of this practice.

Here are some key points from chapter 1 of Mark Paige's book Building a Better Teacher: Understanding Value-Added Models in the Law of Teacher Evaluation.

  • VAMs are statistical models that attempt to estimate a teacher's contribution to student achievement.
  • There are at least six different VAMs, each with relative strengths and weaknesses.
  • VAMs rely heavily on standardized tests to assess student achievement.
  • VAMs have been criticized on a number of grounds as offending various statistical principles that ensure accuracy. Scholars have noted that VAMs are biased and unstable, for example.
  • VAMs originated in the field of economics as a means to improve efficiency and productivity.
  • The American Statistical Association has cautioned against using VAMs in making causal conclusions between a teacher's instruction and a student's achievement as measured on standardized tests.
  • VAMs raise numerous nontechnical issues that are potentially problematic to the health of a school or its learning climate. These include the narrowing of curriculum offerings and a negative impact on workforce morale.
Throughout his book, Paige offers numerous key points that should allow one to pause and interrogate the practice of using VAMs to determine teacher effectiveness.
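To make the first of those points concrete, here is a minimal sketch, in Python with fabricated data, of the basic arithmetic behind a value-added estimate: predict each student's current score from a prior score, then average the leftover (the residual) for each teacher's students. This is only an illustration of the idea; it is not Paige's model, nor any state's actual VAM, which add many more variables and far more statistical machinery.

```python
# A minimal, illustrative sketch of a value-added estimate (fabricated data).
# Not any vendor's or state's actual model; real VAMs are far more elaborate.
import numpy as np

rng = np.random.default_rng(0)

n_students = 200
prior = rng.normal(50, 10, n_students)      # last year's test scores
teacher = rng.integers(0, 8, n_students)    # which of 8 teachers each student has
noise = rng.normal(0, 8, n_students)        # everything the model cannot see
current = 5 + 0.9 * prior + noise           # this year's test scores

# Step 1: predict this year's score from last year's score.
slope, intercept = np.polyfit(prior, current, 1)
predicted = intercept + slope * prior

# Step 2: a teacher's "value-added" is the average amount by which that
# teacher's students beat (or miss) the prediction.
residual = current - predicted
for t in range(8):
    mask = teacher == t
    print(f"Teacher {t}: estimated value-added = {residual[mask].mean():+.2f} "
          f"(n = {mask.sum()})")
```

Note that in this fabricated example no teacher has any true effect at all, so whatever nonzero "value-added" the printout shows is pure noise, which is a small-scale version of the bias and instability criticisms summarized above.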


Using VAMs to Determine Teacher Effectiveness: Turning Schools into Test Result Production Factories

"But VAMs have fatal shortcomings. The chief complaint: they are statistically flawed. VAMs are unreliable, producing a wide range of ratings for the same teacher. VAMs do not provide any information about what instructional practices lead to particular results. This complicates efforts to improve teacher quality; many teachers and administrators are left wondering how and why their performance shifted so drastically, yet their teaching methods remained the same." Mark Paige, Building a Better Teacher: Understanding Value-Added Models in the Law of Teacher Evaluation
Mark Paige's book is a quick, accessible overview of the problems with using value-added models as part of teacher evaluations. As he points out, the statistical flaws are a fatal shortcoming when VAMs are used to definitively settle whether a teacher is effective. In his book, he points to two examples of teachers whose ratings fluctuated widely. When a teacher swings from "most effective" to "not effective" within a single year, especially when that teacher used the same methods with similar students, that should give us pause and prompt some hard questions.

Now, VAM proponents would immediately diagnose the situation thus: "It is rather obvious that the teacher did not meet the needs of students where they are." What is wrong with the logic of this argument? On the surface, arguing that the teacher failed to "differentiate" makes sense. But if there exist universal teaching methods and strategies that foster student learning no matter the context, then what would explain the difference? The real danger of using VAMs in this way is that the logic of "differentiation" invalidates the idea that there are universal, research-based practices to which teachers can turn to improve student outcomes. What's worse, teaching becomes a game of pursuit every single year, in which the teacher seeks out not necessarily the best methods for producing learning of value but becomes, in effect, a chaser of test results. Ultimately, the school becomes a place where teachers are simply production workers whose job is to produce acceptable test results, in this case, acceptable VAM results.

The American Statistical Association has made it clear: VAMs measure correlation, not causation. To conclude that "what the teacher did" is the sole cause of test results is to ignore a whole world of other possibilities and factors that have a hand in producing those results. Administrators should be open to the possibility that VAMs do not definitively determine a teacher's effectiveness.
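The correlation-versus-causation point is easy to demonstrate with a toy simulation (fabricated data, not any real VAM): give every teacher an identical true effect and let one class be hit by a factor the model never records, and the estimate attributes that factor to the teacher anyway.

```python
# Toy simulation of the correlation-vs-causation problem (fabricated data).
# Every teacher teaches identically well, but class 3 suffers an unmeasured
# shock (say, a rough semester outside the teacher's control).
import numpy as np

rng = np.random.default_rng(1)
n_classes, class_size = 6, 25

records = []
for t in range(n_classes):
    prior = rng.normal(50, 10, class_size)
    shock = -6.0 if t == 3 else 0.0                      # unmeasured factor
    current = 5 + 0.9 * prior + shock + rng.normal(0, 8, class_size)
    records.append((prior, current))

all_prior = np.concatenate([p for p, _ in records])
all_current = np.concatenate([c for _, c in records])
slope, intercept = np.polyfit(all_prior, all_current, 1)

for t, (prior, current) in enumerate(records):
    va = np.mean(current - (intercept + slope * prior))
    print(f"Teacher {t}: estimated value-added = {va:+.2f}")

# Teacher 3 will typically look "ineffective" even though every teacher here
# did exactly the same job; the model simply blamed the unmeasured shock on her.
```

The model cannot distinguish "what the teacher did" from anything else that moved the scores, which is exactly why causal language around VAM results is so dangerous.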

If we continue down the path of using test score results to determine the validity and effectiveness of every practice, every policy, and everything we do in our buildings, we will turn our schools into factories whose sole purpose is to produce test scores. I certainly hope we are prepared to accept the lifelong consequences of such decisions.


NOTE: This post continues a series of posts about the practice of using value-added measures to determine teacher effectiveness, based on my recently completed dissertation research. I make no effort to hide the fact that I think using VAMs to determine the effectiveness of schools, teachers, and educators is poor, misinformed practice. There is enough research out there to indicate that VAMs are flawed and that their application in evaluation systems has serious consequences.

Wednesday, November 11, 2015

Value-Added Models Aren’t Settled Science No Matter What Ed Leaders Say

The American Educational Research Association (AERA) has released a new statement about the use of value-added models (VAMs) in educator evaluations and in evaluating educator preparation programs. It is no secret that, as an experienced educator of twenty-six years, I do not find VAMs very useful or fair indicators of teacher effectiveness. The results are often not available until three or four months into the school year, so even if they were in a form that could inform specific classroom instruction, they arrive much too late to be of use, at least for first-semester students.

But the AERA clearly points out that VAMs are still too deeply flawed to be used as indicators of teacher effectiveness. (See the AERA statement for yourself here.) Many states, including North Carolina, have charged full speed ahead after being blackmailed by the Obama administration into adopting VAMs. This has occurred in spite of concerns over the limitations and flaws in their use.

There is certainly no disagreement from me that there is always room for improvement in teacher effectiveness, but I also think our false faith in the objectivity of value-added models treats these statistical models as some kind of "savior of public education," which they are not and never will be. Their limitations are too great for them to be useful as anything except a small piece of data schools can consider about how their students are doing.

Here’s limitations outlined by the AERA statement:

  • Current state tests are too limited to measure teacher effectiveness, and most were not designed for that purpose anyway. They cover only a limited amount of the content teachers teach, and they are too imprecise to be used in determining teacher quality. They also measure only grade-level standards, so they fail to capture the growth of students above or below those standards.
  • VAM estimates have not been shown to effectively isolate a teacher's contribution from other school factors or outside-of-school factors. To expect VAMs to do this entirely is unrealistic and foolhardy.

As usual, the adoption of VAMs illustrates one very bad habit of education leaders and policymakers: they adopt what they see as "common sense" measures without critically and empirically examining whether those policies will work as intended. The history of public education is littered with such actions, and you would think a wise education leader would learn that what seems like "common sense" or "conventional wisdom" is perhaps nothing of the sort.

Saturday, November 15, 2014

9 Reminders for School Leaders When Reviewing Value-Added Data with Teachers

“A VAM (Value-Added Model) score may provide teachers and administrators with information on their students’ performance and identify areas where improvement is needed, but it does not provide information on how to improve the teaching.” American Statistical Association
Today, I spent a little time looking over the American Statistical Association's "ASA Statement on Using Value-Added Models for Educational Assessment." That statement serves as a reminder to school leaders regarding what these models can and cannot do. Here in North Carolina and in other states, as school leaders begin looking at No Child Left Behind waiver-imposed value-added rankings of teachers, they would do well to remind themselves of the cautions described by the ASA last April. Here are some really poignant reminders from that statement:
  • “Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.”
  • “VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.”
  • “VAMs typically measure correlation, not causation: Effects—positive or negative—attributed to a teacher may actually be caused by other factors that are not captured in the model.”
  • “Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.”
  • “Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions.”
  • “Ranking teachers by their VAM scores can have unintended consequences that reduce quality.”
  • “The measure of student achievement is typically a score on a standardized test, and VAMs are only as good as the data fed into them.”
  • “Most VAMs predict only performance on the test and not necessarily long-range learning outcomes.”
  • “The VAM scores themselves have large standard errors, even when calculated using several years of data.”
In this season of VAM-viewing, it is vital that informed school leaders remind themselves of the limitations of this data. You can't take the word of the companies promoting these models as proof that they are "objective" and "fool-proof" measures of teacher quality. After all, those companies have multimillion-dollar contracts to keep, and they stand to lose them if doubt is cast on VAM use. Still, a 21st century school leader needs to have a more balanced view of VAM and its limitations.

Value-added ratings should never be used to inform school leaders about teacher quality. There are just too many problems. In the spirit of reviewing VAM data with teachers, here are my nine reminders, or cautions, about using value-added data to judge teacher quality:

1. Remember the limitations of the data. Though many states and companies providing VAM data fail to offer extensive explanation and discussion of the limitations of their particular value-added model, rest assured those limitations are there. It is common to hide them in statistical lingo and jargon, but as a school leader, you would do well to read the fine print, do your own research, and understand value-added modeling for yourself. Once you understand the limitations of VAMs, you will be reluctant to make high-stakes decisions based on such data.

2. Remember that VAMs are based on imperfect standardized test scores. No test directly measures a teacher's contribution to student learning. In fact, in many states, the tests used in VAMs were never intended to judge teacher quality. For example, the ACT is commonly used in VAMs to determine teacher quality, but it was not designed for that purpose. As you review your VAM data, keep in mind the imperfect testing system your state has. That should give you pause before assuming that VAM data flawlessly tells you anything about a teacher's quality.

3. Because VAMs measure correlation, not causation, remind yourself as you look at a teacher's VAM data that he or she alone did not cause those scores. Many, many other things could have had a hand in them. No matter what promises statistics companies or policymakers make, remember that VAMs are as imperfect as the tests, the teacher, the students, and the system. VAM data should not be used to make causal inferences about the quality of teaching.

4. Remember that different VAM models produce different rankings. Even choosing one model over another reflects subjective judgment. For example, some states choose VAMs that do not control for variables such as student demographic background because they feel doing so would excuse lower performance for low-socioeconomic students. That is a subjective value judgment about which VAM to use, and because of such judgments, VAMs are not perfectly objective. Not all VAM models are equal.

5. Remind yourself that most VAM studies find that teachers account for about 1% to 14% of the variability in test scores. This means that teachers may not have as much control over test scores as many of those using VAMs to determine teacher quality assume. In a perfect manufacturing system where teachers are responsible for churning out test scores, VAMs make sense. Our schools are far from perfect, and there are many, many things out there impacting scores. Teaching is not a manufacturing process, nor will it ever be.

6. Remind yourself that should you use VAMs in a high-stakes manner, you may actually decrease the quality of student learning and harm the climate of your school. Turning your school into a place where only test scores matter, where teaching to the test is everybody's business, is a real possibility should you place too much emphasis on VAM data. Schools that obsess over test scores aren't fun places for anybody, teachers or students. A balanced view of VAM data, as well as test data, is important.

7. Remember that all VAM models are only as good as the data fed into them. In practical terms, remember the imperfect nature of all standardized tests as you discuss VAM data. Even though states don't always acknowledge the limitations of their tests, that doesn't mean you can't. Keep the imperfect nature of tests and VAMs in mind always. Perhaps then you won't use the data unfairly.

8. Remember that VAMs only predict performance on a single test. They do not tell you anything about the long-range impact of that teacher on student learning.

9. Finally, VAMs can have large standard errors. Without getting entangled in statistical lingo, let it suffice to say that the estimates themselves are imprecise; the small simulation sketched below gives a feel for just how noisy a class-sized average can be. Keep that in mind when reviewing the data with teachers.
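For readers who want a feel for reminder 9 without the statistical lingo, here is a small fabricated simulation: even when a teacher's true effect is held at exactly zero, the class-sized average that becomes the VAM estimate swings noticeably from year to year, because its standard error is large relative to the differences being ranked. This is an illustration under invented assumptions (class size, noise level), not output from any state's actual model.

```python
# Fabricated sketch of reminder 9: the year-to-year instability of a
# class-sized average when the teacher's true effect is exactly zero.
import numpy as np

rng = np.random.default_rng(2)
class_size = 25          # assumed class size
noise_sd = 8             # assumed spread of unexplained student-level factors

for year in range(1, 6):
    residuals = rng.normal(0, noise_sd, class_size)   # true teacher effect = 0
    estimate = residuals.mean()
    std_error = residuals.std(ddof=1) / np.sqrt(class_size)
    print(f"Year {year}: estimate = {estimate:+.2f}, standard error = {std_error:.2f}")

# With a true effect of zero, the estimate still drifts a couple of points in
# either direction from year to year; in a ranking system, that is enough to
# move the same teacher across effectiveness categories.
```

Several years of data shrink the error somewhat, but as the ASA statement notes, the standard errors remain large even when several years are used.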

The improper use of VAM data by school leaders can downright harm education. It can turn schools into places where in-depth learning matters less than test content. It can turn teaching into a scripted process of just covering the content. It can turn schools from places of high engagement into places where no one really wants to be. School leaders can prevent that by keeping VAM data in proper perspective, as the "ASA Statement on Using Value-Added Models for Educational Assessment" does.

Saturday, June 14, 2014

Value-Added Measures and 'Consulting Chicken Entrails' for High-Stakes Decision-Making

“Like the magician who consults a chicken's entrails, many organizational decision makers insist that the facts and figures be examined before a policy decision is made, even though the statistics provide unreliable guides as to what is likely to happen in the future.” Gareth Morgan, Images of Organization: The Executive Edition

Could it be that using value-added data is the equivalent of consulting "chicken entrails" before making certain high-stakes decisions? With all the voodoo, wizardry, and hidden computations that educators are simply supposed to accept on faith from the companies crunching the data, value-added data might as well be "chicken entrails," and the "Wizards of VAM" might as well be high priests or magicians reading those innards, making declarations of effectiveness, and telling fortunes. The problem, though, is that value-added measures are prone to mistakes, despite those who say "it's the best we have." Such reasoning amounts to simply accepting the imperfections. One need only hold one's nose and take the medicine.

What President Obama, Arne Duncan, and our own North Carolina state education leaders do not get is that value-added measures simply are not transparent. Anyone who reads the current literature on these statistical models immediately sees many, many imperfections. There are certainly enough errors of concern to argue that VAMs have no place in making high-stakes decisions.

As the “Wizards of VAM” prepare to do their number crunching and “entrails reading” in North Carolina, we await their prognostications and declarations of “are we effective or ineffective?” Let’s hope it doesn’t smell too bad.

Thursday, April 10, 2014

Let the VAM Lawsuits Begin: Issues and Concerns with Their High-Stakes Use

Lawsuits against states using value-added models in teacher evaluation decisions have begun in earnest. There are now three lawsuits underway challenging the use of this controversial statistical methodology and the use of test scores to determine teacher effectiveness. This increase in litigation is an indication both of how rapidly states have adopted the practice and of how these same states have failed to address so many issues and concerns with the use of VAMs in this manner.

Two lawsuits have now been filed in Tennessee against the use of value-added assessment, known as TVAAS, as part of teacher evaluation. The first lawsuit was filed against Knox County Schools by the Tennessee Education Association on behalf of an alternative school teacher who was denied a bonus because of her TVAAS ratings. (See "Tennessee Education Association Sues Knox County Schools Over Bonus Plan.") In this case, the teacher was told she would receive system-wide TVAAS estimates because of her position at an alternative school, but 10 of her own students were used in her TVAAS score anyway, resulting in a lower rating and no bonus. This lawsuit contests the arbitrariness of TVAAS estimates that use only a small number of a teacher's students to determine overall effectiveness.

In the second lawsuit, also filed against Knox County Schools, as well as against Tennessee Governor Bill Haslam, state Commissioner of Education Kevin Huffman, and the Knox County Board of Education, an eighth grade science teacher claims he too was unfairly denied a bonus after his TVAAS value-added rating was based on only 22 of his 142 students. (See "TEA Files Second Lawsuit Against KCS, Adds Haslam and Huffman as Defendents.") Again, the lawsuit points to the arbitrariness of the TVAAS ratings.

A third lawsuit has been filed in Rochester, New York, by the Rochester Teachers Association, alleging that officials in that state "failed to adequately account for the effects of severe poverty, and as a result, unfairly penalized Rochester teachers on their Annual Professional Performance Review," or yearly teacher evaluations. (See "State Failed to Account for Poverty in Evaluations.") While it appears that the Rochester suit disputes the use of growth-score models rather than value-added models, it also challenges the whole assumption, and the recent fad pushed by politicians and policymakers, of using test scores to evaluate teachers.

North Carolina jumped on the value-added bandwagon in response to US Department of Education coercion, and the state now uses its own version of TVAAS, called EVAAS, or the Education Value-Added Assessment System, as part of teacher and principal evaluations. Fortunately, no districts have yet had to make high-stakes decisions using the disputed measures, so the lawsuit floodgate hasn't opened in our state, but I am sure that once EVAAS is used to make employment decisions, the lawsuits will begin. When they do, the American Statistical Association has already outlined some areas of contention about the use of VAMs in educator evaluations in its ASA Statement on Using Value-Added Models for Educational Assessment. Here are some points from that position statement that clearly outline the questions surrounding this highly questionable statistical methodology and its use in teacher evaluations.
  • "VAMs (Value-added models) are complex statistical models, and high-level statistical expertise is needed to develop the models and interpret their results." States choosing to use these models are trusting third-party vendors to develop them and provide the ratings, while expecting educators to interpret those results effectively. Obviously, so much can go wrong with the interpretation of VAM results that the ASA warns of the need for people with the expertise to interpret them. I wonder how many of the states that have implemented these models have spent time and money training teachers and administrators to interpret the results, other than subjecting educators to one-time webinars or "sit-n-gets"?
  • “Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. THESE LIMITATIONS ARE PARTICULARLY RELEVANT IF VAMS ARE USED FOR HIGH STAKES PURPOSES (Emphasis Mine).” I can't speak for other states, but in North Carolina there has been little to no disclosure or discussion of the limitations of value-added data. There has been more public relations, advertising, and promotion of the methodology as a new way of evaluating educators. They even have SAS promoting the methodology for them. The Obama administration has done this as well. The attitude in North Carolina seems to be, "We're gonna evaluate teachers this way, so deal with it." There needs to be discussion and disclosure about SAS's EVAAS model and the whole process of using tests to evaluate teachers in North Carolina. Sadly, that's missing. I can bet it's the same in other states too.
  • "VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes." In other words, VAMs only tell you how students do on standardized tests. They can't tell you about all the other many, many ways teachers contribute to students' lives. The main underlying assumption of using VAMs in teacher evaluations is that only test scores matter, regardless of what the policymakers supporting them say. While it's true that the North Carolina evaluation model does include other standards, how long will it take administrators and policymakers to ignore those standards and zero in on test scores because they are seen as the most important? The adage "What gets tested gets taught!" is true, and "What gets emphasized the most through media and promotion matters the most" is equally true. When standard 6 or 8 is the only standard on the educator evaluation where an educator is "In Need of Improvement," you can bet test scores suddenly matter more than anything else.
  • “VAMs typically measure correlation, not causation: Effects—positive or negative—attributed to a teacher may actually be caused by other factors that are not captured in the model.” There are certainly many, many things, from poverty to a missed breakfast to a runny nose, that can contribute to a student's test score, yet there is a belief, especially among those pushing VAMs in teacher evaluations, that a teacher directly causes a test score to happen. The biggest assumption made by those promoting VAMs in teacher evaluations is that the teacher's sole job, or at least part of it, is the production of test scores. In reality, teaching is far more complex than that, and those reducing it to a test score have probably not spent much time teaching themselves.
  • “Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of the opportunities for quality improvement are found in system-level conditions.” Yet in most states, educational improvement falls almost entirely on the backs of educators in the schools, in the form of VAM-powered teacher evaluations. There is little effort to improve the system: no effort to improve classroom working conditions, to provide professional development funding and resources, or to fund adequate materials. Instead of looking at how the system prevents excellence and innovation with its top-down mandates and many other ineffective measures, many states, including North Carolina, and the Obama administration place accountability entirely and squarely on the backs of educators in classrooms and schools. If the education system is broken, you don't focus on the parts; you improve the whole.
  • “Ranking teachers by their VAM scores can have unintended consequences that reduce quality.” If all important learning could be reduced to a bubble-sheet test administered once, then all would be well for VAMs and the ranking of teachers. But every educator knows that tests measure only a minuscule portion of important learning; many important learning experiences can't be measured by tests at all. If you elevate tests in a high-stakes manner, though, those results become the most important outcome of the school and the classroom. The end result is teaching to the test and test prep, where the test becomes the curriculum and getting high test scores becomes the goal of teaching. If that's the goal of teaching, who would want to be a teacher? Elevating test scores through VAMs will only escalate the exit of teachers from the profession and discourage others from entering it, because there is nothing fulfilling about merely raising test scores. We didn't become educators to raise test scores; we became educators because we wanted to teach kids.
  • “The measure of student achievement is typically a score on a standardized test, and VAMs are only as good as the data fed into them.” Ultimately, VAMs are only as good as the tests administered to provide the data that feeds the model. If tests don't adequately measure the content, or are not standardized or otherwise of high quality, then the VAM estimates are of equally dubious quality. The same is true when states scramble to create tests on the fly rather than developing quality assessments. North Carolina scrambled to create multiple tests in many high school, middle, and elementary subjects just to have data to feed its EVAAS model. Yet those tests, the process of their creation and field testing, and even how they are administered make them questionable candidates for serious VAM use. VAMs require high-quality data to provide high-quality estimates; the idea that "any old test will do" is anathema to them.
The American Statistical Association's position statement on using value-added models in educational assessment makes some supporting statements about their use too. VAM results can be used effectively as part of the data teachers draw on to adjust classroom teaching. But when a state does not return those scores until October or later, it is impossible to use that data to inform teaching three months into the school year. And simply receiving a rating does little to inform teaching. Testing could give policymakers an opportunity to provide teachers with valuable data to improve instruction; sadly, the data currently provided is too little, too late.

As the VAM-fed teacher evaluation fad continues to grow, it is important for all educators to inform themselves about this controversial statistical practice. It is not a methodology without issues, despite what the Obama administration and state education leaders say. Being knowledgeable about it means understanding its limitations as well as how to properly interpret and use such data. Don't wait for states and the federal government to provide that information: they are too busy promoting its use. The points made in the American Statistical Association's Statement on Using Value-Added Models for Educational Assessment are excellent points of entry for learning more.

Saturday, January 4, 2014

More About Using Voodoo Value-Added Measures to Determine Teacher Candidate Quality

Two days ago, I posted about Teacher Match and Hanover Research, two companies that are now using value-added statistical modeling to predict prospective teachers' ability to raise test scores. ("Using Statistical Models to Predict Future Effectiveness of Teacher Candidates: A Snake Oil Approach") As I pointed out, there are some major flaws, especially in the assumptions about teaching, in using this approach as even a part of a teacher candidate selection process. Here are some more thoughts on this heinous practice:

1. It elevates standardized testing even higher in schools' decision-making processes. It uses imperfect assessments to decide whether a new teacher can raise test scores. States haven't done the validation studies to prove that the assumptions they make based on scores are valid. States develop tests on the cheap, or they purchase ready-made tests of questionable validity that were not designed for the purposes for which they are being used. Tests do not deserve this level of emphasis. This practice, by default, treats raising test scores as the goal of good teaching.

2. It makes the hiring processes of schools and districts even more mysterious. In one district where Teacher Match is used, a source inside that district reported a major decrease in applicants because candidates were being asked to submit to this mysterious process before being hired. I suspect this would be a problem with any of these kinds of products. Besides, who wants to go into teaching to become the best test-score raiser in the business? These voodoo products will only make it harder to find teaching candidates, not easier. They give teachers the wrong message up front: your primary job is to raise test scores.

3. It is just another expensive drain on already scarce educational resources. One district's contract with Teacher Match showed it paying well over $30,000 per year for the service. In tight budgetary times, when teachers are spending hundreds of dollars of their own money on school supplies, it amazes me that a district could morally justify spending this kind of money on a statistical gimmick. Districts are throwing more and more money into these statistical quackery schemes while so many other pressing needs go unmet.

4. School districts, as I have witnessed many, many times in my 24 years as an educator, are purchasing products like Teacher Match based entirely on the companies' promises and marketing. Instead of accepting a company's word that its product will do what it says, districts should demand independent, peer-reviewed research. If the company can't produce those studies, tell them to come back when they can. And because I am not a firm believer that high test scores equal good teaching, they need to use measures other than test scores to prove their product is effective.

5. The fact that companies like Teacher Match and Hanover Research even exist in the education industry now is due to the Obama administration's insistence on elevating the importance of test scores in everything a school does. This legacy will leave public education in worse shape than George W. Bush's No Child Left Behind. Arne Duncan and his Department of Education believe that data is data and that any old data will do as long as it is "objective." That alone shows that he and his cohorts do not have a clue about education. When non-educators like Duncan and half his Department of Education are in charge, you get these kinds of detrimental approaches to education.

6. A major assumption behind Teacher Match and other statistical quackery products like it is that schools can be operated like businesses whose business is churning out high test scores. This assumption about public education is wrong. Because of current federal policy, public schools are being viewed even more like a business whose product is high test scores. That might be acceptable if your goal as an education system is to produce "high-quality test takers." What the education policy of President Obama and Arne Duncan is doing is destroying the culture of public education, test score by test score.

Teacher Match's Educator's Professional Inventory and Hanover Research's Paragon K12 are the latest value-added voodoo products to be peddled to school districts. They will only serve to elevate the importance of test scores even higher than it already is. Districts even thinking about purchasing this snake oil should be ashamed of wasting limited education money on such products. There comes a time when you have to realize that statistics aren't going to tell you everything you really need to know; not everything can be reduced to numbers subject to statistical analysis. My fear is that some administrators who see test scores as the sole goal of their school are going to use this data as the only basis for hiring someone. Can you imagine a profession where whether you can produce high test scores determines your entry, and whether you can keep producing them determines whether you can stay? That, folks, is a factory model of educational delivery if I have ever heard of one!

Friday, December 20, 2013

Is EVAAS a 'Clear Path to Global Ed Excellence' or Product of Grandiose Marketing?

According to a recent post by Audrey Amrein-Beardsley on her blog VAMboozled!, "VAMs (Value-added measures) have been used in Tennessee for more than 20 years," and they are the brainchild of William Sanders, who was an agricultural statistician and adjunct professor at the University of Tennessee, Knoxville, when the models were introduced. Sanders simply thought, according to Amrein-Beardsley, "that educators struggling with student achievement in the state could simply use more advanced statistics, similar to those used when modeling genetic reproductive trends among cattle, to measure growth, hold teachers accountable for that growth, and solve educational measurement woes facing the state at that time."

Sanders went on to develop the TVAAS (Tennessee Value-Added Assessment System), which later became EVAAS (Education Value-Added Assessment System) and is now owned and marketed by SAS Institute in North Carolina. Today, SAS EVAAS is the "most widely adopted and used, and likely the most controversial VAM in the country," according to Amrein-Beardsley. According to her post "What's Happening in Tennessee?", these are some of the lesser-known and controversial aspects of SAS's EVAAS:

  • "It is a proprietary model (costly and used/marketed under the exclusive legal rights of the inventors/operators.)" EVAAS is the property of a private company whose responsibility is to profits, not necessarily to what's good for kids or teachers. Four states, Tennessee, North Carolina, Ohio, and Pennsylvania, pay millions for the ability to use this Value-added model.
  • EVAAS is "akin to a 'black box' model. It is protected by SAS with a great deal of secrecy and total lack of transparency. This model has not been independently validated, and Sanders has never allowed access for others to independently validate the model.
  • "The SAS EVAAS web site developers continue to make grandiose marketing claims without much caution or any research evidence to support these claims. 
  • "VAMs have been pushed  on American public schools by the Obama Administration and Race to the Top."
  • SAS makes this marketing claim on their web site: "Effectively implemented, SAS EVAAS for K-12 allows educators to recognize progress and growth over time, and provides a clear path to achieve the US goal to lead the world in college completion by the year 2020."
There's no doubt that EVAAS, or some other VAM product, has been foisted on states and school districts by direct mandate from the Obama administration. It is also true that one could argue EVAAS is a "black box" model: it hasn't been independently studied, the inferences our state is making with it have not been independently validated, and SAS keeps the model hidden behind claims of proprietary ownership.

Finally, are the marketing claims as grandiose as Amrein-Beardsley indicates? I would have to agree that the claim that EVAAS provides "a clear path to achieve the US goal to lead the world in college completion by the year 2020" is pretty far out there. On what research do they base that claim? What studies have they used to validate it? None are provided. The SAS web site employs a number of statements that offer no supporting research. Then again, it's about marketing a product, not about making a case for its validity. The problem is, SAS does not provide the supporting research anywhere else either.

But I'll set aside the concerns about the technical aspects of the model. For me, the whole problem with EVAAS is that it elevates test scores to a level they do not deserve. North Carolina's state testing system is haphazardly assembled and far from trustworthy enough to base any kind of high-stakes decision upon. I also find something fundamentally inequitable in using EVAAS to determine any kind of rating for educators. Educators deserve to understand how those ratings are derived, down to the decimal points and computations. If the formula can't be explained so that educators can understand all aspects of it, it has no place in evaluations.

But it seems issues are surfacing in the birthplace of EVAAS. Interestingly, Amrein-Beardsley points out that Tennessee is having trouble with its use: school boards across the state are increasingly opposing the use of TVAAS in high-stakes decisions. Some of the reasons, according to Amrein-Beardsley:
  • TVAAS is too complex to understand.
  • Teachers' scores are highly and unacceptably inconsistent from one year to the next, which makes them invalid.
  • Teachers are being held accountable for things that are out of their control, such as what happens to students outside the school building.
North Carolina has jumped on the VAM bandwagon and is holding on for dear life. To make the whole system work, our state has implemented the largest number of state tests in history. Let's just hope all this emphasis on test scores doesn't destroy our schools. I certainly hope we don't have to live with this for 20 years!

Wednesday, November 27, 2013

Misplaced Faith in Value-Added Measures for Teacher Evaluations

Due to Race to the Top and the No Child Left Behind waivers, 41 states have now elected to use value-added measures, or VAMs, as a part of teacher evaluations. This has been done without regard to the limitations of these statistical models and without any supporting research showing that doing so will increase student achievement. What are those limitations? In a recent post entitled "Top Ten Bits of VAMmunition," the authors of VAMboozled! provided research-based data that educators can use to defend themselves against this massive, non-research-based shift toward a model of teacher evaluation that will most likely do more damage to education than No Child Left Behind or any other education "reform" of modern times.

I recently uncovered a journal article entitled "Sentinels Guarding the Grail: Value-Added Measurement and the Quest for Education Reform," which describes a rhetorical study by Rachel Gabriel and Jessica Nina Lester examining the discourse of the Tennessee Teacher Evaluation Advisory Committee, or TEAC, from March 2010 through April 2011. TEAC was a 15-member panel appointed by the governor of Tennessee to develop a new teacher evaluation policy. The authors examined the language used by those on this panel as they deliberated over the various components of a teacher evaluation policy.

What is interesting about this study is that the language employed by those in these meetings betrays some important assumptions and beliefs about teaching, learning, testing, and value-added measures that aren't entirely supported by research or common sense.

According to Gabriel and Lester, value-added measurement became a sort of "sentinel of trust" and "holy grail" for measuring teacher effectiveness during these meetings, in spite of all the research and literature that points to its limitations. According to the authors of the study, here are some of the assumptions those in the TEAC meetings demonstrated through the language they used:

1) Value-added measures alone define effectiveness.
2) Value-added measures are the only "objective" option.
3) Concerns about value-added measures are minimal and not worthy of consideration.

As far as I can see, there is enormous danger when those making education policy buy into these three mistaken assumptions about value-added measures.

First of all, VAMs alone do not define effectiveness. They are based on imperfect tests and often on a single score collected at one point in time. Tests can't possibly carry the role of defining teacher effectiveness, because no test is capable of capturing all that students learn. Of course, if you believe by faith that test scores alone equal student achievement, then sure, VAMs are the "objective salvation" you've been waiting for. However, those of us who have spent a great deal of time in schools and classrooms know tests hardly deserve such an exalted position.

Secondly, value-added measures are not as objective as those who push them would like them to be. The selection of which value-added model to use is riddled with subjective judgments, and which factors to include and exclude from the model is a subjective judgment too. Choices about how to rate teachers using these measures require subjective judgment as well, not to mention that VAMs are not entirely based on "objective tests" either. All the decisions surrounding their development, implementation, and use require subjective judgment based on values and beliefs. There is nothing totally objective about VAMs. About the only objective number that results from value-added measures is the amount of money states pay consulting and data firms to generate them.

Finally, those who support value-added measures often simply dismiss concerns about them as not a real problem. They argue that VAMs, flawed as they are, are the "best measures" we currently have. Now that's some kind of argument! Suppose I were your surgeon and used "tapping on your head" to decide whether to operate on a brain tumor because tapping was the best tool I had. The whole it's-the-best-we-have argument does not negate the many flaws and issues, or the potential harm, of using value-added measures. Instead of dismissing the issues and concerns about VAMs, those who advocate for their use in teacher evaluations need to address every concern. They need to be willing to acknowledge the limitations, not simply dismiss them.

I offer one final caution to my fellow teachers and school leaders: it is time to begin asking the tough questions about the use of VAMs in evaluations. I strongly suggest that we learn all we can about the methodology. If anyone uses the phrase, "Well, it's too difficult to explain," we need to demand that they explain it anyway. Just because something looks complicated does not mean it's effective; sometimes we as educators are too easily dazzled by the "complicated." The burden is on those who support these measures to explain them adequately and to support their use with peer-reviewed research, not company white papers and studies by those who developed the measures in the first place.