
R&D Connections • No. 7 • October 2008

Standard Setting: What Is It? Why Is It Important?


By Isaac I. Bejar

Standard setting is a critical part of educational, licensing, and certification testing. But outside of the cadre of practitioners, this aspect of test development is not well understood.

Standard setting is the methodology used to define levels of achievement or proficiency and the cutscores corresponding to those levels. A cutscore is simply the score that serves to classify the students whose score is below the cutscore into one level and the students whose score is at or above the cutscore into the next and higher level. Clearly, unless the cutscores are appropriately set, the results of the assessment could come into question. For that reason, standard setting is a critical component of the test development process.

This brief article does not address the technicalities of the process, for which readers can consult several references (Cizek & Bunch, 2007; Hambleton & Pitoniak, 2006; Zieky, Perie, & Livingston, 2008). Instead, this article illustrates the importance of standard setting with reference to accountability testing in K-12 and suggests that some of the questions that have emerged concerning standard setting in that context can be addressed by considering standard setting as an integral aspect of the test development process, which has not been standard practice in the past.

In tests used for certification and licensing purposes, test takers are typically classified into two categories: those who “pass”—that is, those who score at or above the cutscore—and those who “fail.” These types of tests, therefore, require a single cutscore. In tests of educational progress, such as those required under the No Child Left Behind Act (NCLB), students are typically classified into one of three or four achievement levels, such as below basic, basic, proficient, and advanced (United States Congress, 2001). As a result, with four achievement levels, three cutscores need to be determined.¹

¹ NCLB is an example of standards-based reform. It differs significantly from previous attempts at educational reform characterized by “minimum competency.” According to Linn and Gronlund (2000), standards-based reform is characterized by the adoption of ambitious educational goals; the use of forms of assessment that emphasize extended responses, rather than only multiple-choice testing; making schools accountable for student achievement; and, finally, including all students in the assessment.

In a K-12 context, decisions based on cutscores affect not only individual students, but also the educational system. In the latter case, group test results are summarized at the school, district, or state level to determine the proportion of students in each proficiency category. As part of NCLB legislation, for example, a school’s progress toward educational goals is expressed as the proportion of students classified as proficient.
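Once the cutscores are fixed, the classification rule itself is mechanical. The short Python sketch below illustrates the logic just described: ordered achievement levels, a score at or above a cutscore falling into the next and higher level, and a school-level summary as the proportion of students at each level. The level labels, cutscores, and scores are hypothetical, invented only for illustration.

    # A minimal sketch of the classification rule described above.
    # Cutscores and scores are hypothetical, not from any real assessment.
    from bisect import bisect_right
    from collections import Counter

    LEVELS = ["below basic", "basic", "proficient", "advanced"]
    CUTSCORES = [30, 50, 70]  # four levels require three cutscores

    def classify(score: float) -> str:
        """A score at or above a cutscore falls into the next, higher level."""
        return LEVELS[bisect_right(CUTSCORES, score)]

    def proportions(scores: list[float]) -> dict[str, float]:
        """School-level summary: the share of students at each level."""
        counts = Counter(classify(s) for s in scores)
        return {level: counts[level] / len(scores) for level in LEVELS}

    school = [22, 48, 50, 63, 71, 88]  # hypothetical student scores
    print(classify(50))       # 'proficient': exactly at the second cutscore
    print(proportions(school))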

So, how do we know if the cutscores for a given assessment are set appropriately? The “right” cutscores should be both consistent with the intended educational policy and psychometrically sound.

The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) suggest several soundness criteria, such as: “When proposed score interpretations involve one or more cutscores, the rationale and procedures used for establishing cutscores should be clearly documented” (p. 59). The accompanying comment further states that “Adequate precision in regions of score scales where cut points are established is prerequisite to reliable classification of examinees into categories” (p. 59).
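The comment about precision can be made concrete with a small computation. The sketch below assumes a simple normal error model with a constant standard error of measurement (SEM) and estimates how often a student would land on the other side of the cutscore on a parallel test form. The numbers are hypothetical, and the model is an illustration of the idea, not an analysis prescribed by the Standards.

    # A minimal sketch: misclassification risk near a cut point under a
    # normal error model with constant SEM. All numbers are hypothetical.
    from statistics import NormalDist

    def misclassification_risk(observed: float, cutscore: float, sem: float) -> float:
        """P(a parallel-form score falls on the other side of the cut),
        treating the observed score as the student's expected score."""
        p_below = NormalDist().cdf((cutscore - observed) / sem)
        return p_below if observed >= cutscore else 1.0 - p_below

    print(misclassification_risk(observed=71, cutscore=70, sem=4))  # ~0.40
    print(misclassification_risk(observed=85, cutscore=70, sem=4))  # ~0.0001

Under these assumptions, a student scoring one point above the cut is classified differently on retest roughly 40% of the time, while a student scoring well above the cut almost never is; hence the Standards’ emphasis on precision in the regions where cut points are established.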
Differing State Policies

A further criterion in judging the meaning of the different classifications, especially the designation of proficient, involves an audit or comparison with an external test (Koretz, 2006). Two recent reports (Braun & Qian, 2007; Cronin, Dahlin, Adkins, & Kingsbury, 2007) took that approach by examining proficiency levels across states against a national benchmark. Both studies found that states differed markedly in the proportion of students designated as proficient.

Is one state’s educational system really that much better than another’s? It is difficult to say by simply looking at the proportions of students classified as proficient, because each state is free to design its own test and arrive at its own definition of proficient through its own standard-setting process.

However, by comparing the results of each state against a common, related, nationwide assessment, it is possible to judge whether the variability in states’ proportions of proficient students is due to some states having better or worse educational systems rather than being due to the states inadvertently applying different standards.

The study by Braun and Qian (2007) used the National Assessment of Educational Progress (NAEP²) as the common yardstick for comparing states’ proportions of students classified into the different levels of reading and mathematics proficiency against the NAEP results for each state.

² http://nces.ed.gov/nationsreportcard/

NAEP covers reading and mathematics, just as all states do with their NCLB tests, but NAEP has its own definition of proficiency levels and its own approach to assessing reading and mathematics, which differs from each state’s own approach. For example, NAEP includes a significant portion of items requiring constructed responses—that is, test questions that require test takers to supply their own answers, such as essays or fill-in-the-blank answers, rather than choosing from standard multiple-choice options. Nevertheless, NAEP provides as close as we can get to a common yardstick by virtue of the fact that a representative sample of students from each state participates in the NAEP assessment.

The conclusion in the Braun and Qian (2007) and Cronin et al. (2007) reports was that the differences in the levels of achievement across states seemed to be a function of each state’s definition of proficiency—that is, the specific cutscores each state used to define achievement levels. The differences in levels of achievement were not necessarily due to variability in the quality of educational systems from state to state.

In short, standard setting matters: It is not simply a methodological procedure but rather an opportunity to incorporate educational policy into a state’s assessment system. Ideally, the standard-setting process elicits educational policy and incorporates it into the test development process to ensure that the cutscores that a test eventually produces not only reflect a state’s policy but also are well supported psychometrically.
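The yardstick comparison at the heart of these studies can be sketched in a few lines. The sketch below captures only the basic idea, with invented data, and is not Braun and Qian’s (2007) enhanced mapping method: find the NAEP score that cuts off the same proportion of a state’s NAEP sample as the proportion the state itself reports as proficient. States whose NAEP-equivalent cutscores differ widely are, in effect, applying standards of different demandingness.

    # A minimal sketch of the "common yardstick" idea: find the NAEP score
    # with pct_proficient of a state's NAEP sample at or above it. The data
    # are invented; this is not Braun & Qian's (2007) enhanced method.

    def naep_equivalent_cut(naep_scores: list[float], pct_proficient: float) -> float:
        """NAEP score with pct_proficient of the sample at or above it."""
        ordered = sorted(naep_scores, reverse=True)       # highest first
        k = max(1, round(pct_proficient * len(ordered)))  # students at/above
        return ordered[k - 1]

    state_sample = [215, 224, 231, 238, 242, 249, 255, 263, 270, 281]
    # A state reporting 40% proficient on its own test implies a cut at the
    # 4th-highest NAEP score in its sample:
    print(naep_equivalent_cut(state_sample, 0.40))        # -> 255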


Cutscores that do not represent intended policy or do not yield reliable classifications of students can have significant repercussions for students and their families; fallible student-level classifications can provide an inaccurate sense of an educational system’s quality and the progress it is making towards educating its students.

Setting Standards

As mentioned earlier, the standard setting process has been well documented in several sources (Cizek & Bunch, 2007; Hambleton & Pitoniak, 2006; Zieky et al., 2008). In this section, we emphasize the relationship of the standard setting process to test development.

While setting standards appropriately is critical to making sound student- and policy-level decisions, it is equally important that the content of the test and its difficulty level be appropriate for the decisions to be made based on the test results. We cannot expect a test that does not cover the appropriate content or is not at the appropriate level of difficulty to lead to appropriate decisions—regardless of how the process of setting cutscores is carried out.

Producing a test that targets content and difficulty toward the decisions to be made requires that item writers have a strong working understanding of those decisions. When developers design a test in this fashion, it is more likely that the cutscores will lead to meaningful and psychometrically sound categorizations of students.

This means, however, that standard setting must be done in concert with the test development process and not be treated as a last or separate step independent of the process (Bejar, Braun, & Tannenbaum, 2007). In fact, Cizek and Bunch (2007, p. 247) proposed that “standard setting be made an integral part of planning for test development.”

The integration of standard setting into the test development process becomes more crucial in light of NCLB.³ As part of NCLB legislation, schools test adjacent grades every year. Because the legislation calls for all students to reach the level of proficient by 2014, inferences about the proportion of students in different achievement categories in adjacent grades, or in the same grade in subsequent years, are inevitable because they are prima facie evidence about the progress, or lack of progress, the educational system is making towards the 2014 goal. More likely than not, there will be variability in the rates of proficiency in adjacent grades.

³ In light of the upcoming national elections in the United States, it will be necessary to monitor how federal educational policy will evolve, but there is reason to believe standard setting will continue to be part of the American educational landscape (Ryan & Shepard, 2008, p. xii).

For example, one explanation for the variability in observed achievement levels across grades is that the standards across grades are not comparable. The cutscores that define a proficient student in two adjacent grades could, inadvertently, not be equally demanding.

This can occur if the standard-setting process for each grade is done in isolation, without taking the opportunity to align the results across grades (see Perie, 2006, for an approach to the problem). Similarly, failure to make the scores themselves comparable across years could generate variability in the proportion of students classified as proficient (Fitzpatrick, 2008).


An alternative explanation for the variability in the different rates of achievement across grades is that higher proportions of students classified as proficient do, in fact, accurately reflect a better or improving quality of education in some grades. To reach that conclusion, it is necessary to rule out the first explanation. However, compensating for incomparable standards is a complex and unfamiliar process. In contrast, the process of equating—making scores from different forms comparable (Holland & Dorans, 2006)—has a long history.
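For readers unfamiliar with equating, the sketch below shows one classical textbook technique, linear (mean-sigma) equating, which maps a Form X score to the Form Y score occupying the same standardized position. The score data are hypothetical, and operational equating designs are considerably more involved (Holland & Dorans, 2006).

    # A minimal sketch of linear (mean-sigma) equating: map a Form X score
    # to the Form Y scale by matching means and standard deviations.
    # The score data are hypothetical.
    from statistics import mean, stdev

    def linear_equate(x: float, form_x: list[float], form_y: list[float]) -> float:
        """Form Y equivalent of x: same z-score, expressed in Form Y units."""
        mx, sx = mean(form_x), stdev(form_x)
        my, sy = mean(form_y), stdev(form_y)
        return my + sy * (x - mx) / sx

    form_x_scores = [48, 52, 55, 60, 64, 71]  # scores on Form X
    form_y_scores = [50, 55, 59, 63, 68, 75]  # scores on Form Y
    print(round(linear_equate(60.0, form_x_scores, form_y_scores), 1))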

The process of compensating for differing standards is unfamiliar because psychometricians in the United States had not dealt with the issue prior to NCLB legislation, which introduced the testing of students in adjacent grades. Since that time, the field of educational measurement has recognized the issue and proposed solutions (e.g., Lissitz & Huynh, 2003).

However, attempting to compensate for incomparable standards after the fact—that is, as a step just prior to releasing results—risks the possibility that satisfactory compensation may not be feasible. For that reason, it would be preferable to develop the assessments for different grades with comparable standards across grades as an explicit criterion from the start, to avoid the problem as much as possible.

The foregoing complexities have motivated the formulation of alternative accountability models, as has a general dissatisfaction with the “status model” approach to accountability promoted by NCLB (Linn, 2005). Under this current accountability model, results at a single point in time are the basis for decisions as to whether a school is making adequate progress. Several states have proposed alternative “growth models” (U.S. Department of Education, 2005); an analysis of their different features is available (Dunn & Allen, 2008). If growth models go forward, standard setting will be equally relevant for deciding what growth rate is adequate (Betebenner, 2008). In short, standard setting is likely to continue to play a critical role in the future.

Conclusions

Standard setting should be seen as a critical aspect of the test development process, best carried out in concert with all other aspects of the development process. Far from being a purely methodological process, standard setting ideally involves policy makers, test developers, and measurement specialists early on to ensure that the test results will be useful and defensible.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Bejar, I. I., Braun, H., & Tannenbaum, R. (2007). A prospective, predictive and progressive approach to standard setting. In R. W. Lissitz (Ed.), Assessing and modeling cognitive development in school: Intellectual growth and standard setting (pp. 1-30). Maple Grove, MN: JAM Press.

Betebenner, D. W. (2008). Toward a normative understanding of student growth. In K. E. Ryan & L. A. Shepard (Eds.), The future of test-based educational accountability (pp. 155-170). New York: Routledge.

Braun, H. I., & Qian, J. (2007). An enhanced method for mapping state standards onto the NAEP scale. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 313-338). New York: Springer.


Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.

Cronin, J., Dahlin, M., Adkins, D., & Kingsbury, G. G. (2007). The proficiency illusion. Retrieved September 20, 2008, from the Thomas B. Fordham Institute Web site: http://edexcellence.net/doc/The_Proficiency_Illusion.pdf

Dunn, J. L., & Allen, J. (2008, March). The interaction of measurement, model, and accountability: What are the NCLB growth models measuring? Paper presented at the annual meeting of the National Council on Measurement in Education, New York.

Fitzpatrick, A. R. (2008, March). The impact of anchor test configuration on student proficiency rates. Paper presented at the annual meeting of the National Council on Measurement in Education, New York.

Hambleton, R. K., & Pitoniak, M. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433-470). Westport, CT: Praeger.

Holland, P., & Dorans, N. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187-220). Westport, CT: Praeger.

Koretz, D. (2006). Testing for accountability in K-12. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 531-578). Westport, CT: Praeger.

Linn, R. L. (2005). Conflicting demands of No Child Left Behind and state systems: Mixed messages about school performance. Education Policy Analysis Archives, 13. Retrieved September 20, 2008, from http://epaa.asu.edu/epaa/v13n33/

Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching (8th ed.). Upper Saddle River, NJ: Prentice-Hall.

Lissitz, R. W., & Huynh, H. (2003). Vertical equating for state assessments: Issues and solutions in determining adequate yearly progress and school accountability. Practical Assessment, Research, & Evaluation, 8. Retrieved September 20, 2008, from http://pareonline.net/getvn.asp?v=8&n=10

Perie, M. (2006). Convening an articulation panel after a standard setting meeting: A how-to guide. Retrieved September 20, 2008, from the Center for Assessment Web site: http://www.nciea.org/publications/RecommendforArticulation_MAP06.pdf

Ryan, K. E., & Shepard, L. A. (Eds.). (2008). The future of test-based educational accountability. New York: Routledge.

U.S. Department of Education. (2005). Secretary Spellings announces growth model pilot study [Press release]. Washington, DC: Author. Retrieved September 20, 2008, from http://www.ed.gov/news/pressreleases/2005/11/11182005.html

United States Congress. (2001). No Child Left Behind Act of 2001: Conference report to accompany H.R. 1, report 107-334. Washington, DC: Government Printing Office.

Zieky, M. J., Perie, M., & Livingston, S. (2008). Cutscores: A manual for setting standards of performance on educational and occupational tests. Available from http://www.amazon.com/Cutscores-Standards-Performance-Educational-Occupational/dp/1438250304/


Acknowledgements
I am grateful to Rick Tannenbaum, Mike
Zieky, and, especially, to Dan Eignor. Their
comments, I believe, have improved the article.
Of course, I’m solely responsible for any
remaining problems.

R&D Connections is published by


ETS Research & Development
Educational Testing Service
Rosedale Road, 19-T
Princeton, NJ 08541-0001
Send comments about this publication to the above address or via
the Web at:
http://www.ets.org/research/contact.html
Copyright © 2008 by Educational Testing Service. All rights reserved.
Educational Testing Service is an Affirmative Action/Equal
Opportunity Employer.

ETS, the ETS logo, and LISTENING. LEARNING. LEADING. are registered trademarks of Educational Testing Service (ETS).

