Testing in The Classroom and Its Effectiveness in Predicting Student Achievement and Understanding
Ecaroh Jackson
Abstract
This paper explores the validity of summative assessments in K-12 classrooms. It extends the familiar conversation about standardized testing: in many ways, standardized tests and classroom assessments are similar, but the sheer frequency of classroom assessments makes them a focal point deserving of their own research. While summative assessments are not usually used to determine state-level accountability, they can be used for school- or district-level monitoring. Given the flexibility allowed when testing students informally, the data these tests produce may not be as legitimate as it could be.
Additionally, other factors such as dishonesty, test anxiety, and human error are common when
testing in the classroom environment. This paper examines the test scores of students in a high-achieving eighth-grade class and compares those results to their in-class performance.
Through the years, testing in schools has been subject to praise and criticism that has
molded the current testing model. Although criticism of testing is probably at an all-time high,
testing has never been more prevalent. Starting in the third grade, students are subjected to
standardized testing that holds the key to many of their futures. Additionally, because of the No
Child Left Behind Act, students are being tested for classroom placement, disabilities, and
general performance. High school students rely on ACT and SAT scores to land them a position
at the top universities and undergraduate students focus on acing the GRE for admission into
graduate programs. Then, after all of this time spent in school, students must test again to gain
certification in their specific fields. Countless studies have examined the validity of standardized test scores, and while this is a crucial focal point of testing in schools, it is also important to consider the testing that takes place within classrooms. This testing involves summative assessments, ranging from unit tests to benchmarks, which are given more frequently than standardized tests. Like standardized testing, summative testing in classrooms may
not be as indicative of student success and understanding as once thought and should be viewed
more critically.
Background
The students involved in this study are 8th grade students taking Algebra I at the high
school level. They were individually picked for this class by their Algebra I teacher who
observed them in the previous grades to see if they had the skills needed to make the jump from
7th grade math to Algebra I. The demographics of the class somewhat resemble those of the school, which is 42.7% White, 30.2% African American, 24.7% Hispanic, and 2.4%
Two or More Races (Murphy & Daniel, 2015). The 8th grade Algebra I class is 50% White, 42%
Black, and 8% Hispanic (Murphy & Daniel, 2015). The class demographics are slightly skewed because of the small sample size. This was unavoidable, since the secondary school, which serves grades 6-12, consists of only 255 students, with an average of 8 students per teacher (Murphy & Daniel,
2015).
During this study, I had the opportunity to observe the twelve students twice a day: once
in science and once in math. The data presented was entirely taken from the math class, but the
anecdotal evidence was gathered from observations garnered throughout the day.
Methodology
Two types of measurements were taken over the course of twenty-four weeks. For the purpose of analysis, the data collected was separated into four six-week periods. Summative
assessments, which included unit tests, were tracked and formative assessments, consisting of
homework, worksheets, and quizzes, were documented. The data was compiled at the
conclusion of the study and used to identify trends pertaining to testing and achievement.
Additionally, the students’ test anxiety, dishonesty, and likelihood of making minute ‘human errors’ were evaluated. To determine the students’ levels of test anxiety, they were given a questionnaire
created by Nist and Diehl (see Appendix A for the questionnaire given). I created three
categories for human error mistakes and dishonesty instances – Low, Medium, and High. I
defined human error mistakes as mistakes due to a calculation or transfer error. This does not
include mistakes that were due to a lack of understanding. Low meant that a student rarely (fewer than 10 instances) made human error mistakes, Medium included 10 to 19 instances, and High signified 20 or more. These values were derived from the students’ turned-in work and from in-class work that I assisted with. This same scale was used for
the dishonesty instances. Dishonesty instances in this study are defined as instances in which a sheet of homework was copied or the student was caught cheating or attempting to cheat on an assessment.
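Purely as an illustration (the function name and the sample counts below are hypothetical, not study data), the banding scheme described above can be sketched as a short script:

```python
def categorize(count):
    """Map a count of human error mistakes (or dishonesty instances,
    which used the same scale) into the Low/Medium/High bands."""
    if count < 10:       # fewer than 10 instances
        return "Low"
    elif count <= 19:    # 10 to 19 instances
        return "Medium"
    return "High"        # 20 or more instances

# Hypothetical counts for three students
print([categorize(c) for c in [4, 12, 25]])  # ['Low', 'Medium', 'High']
```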
Results
Discussion
The overall average is higher than the overall assessment average because bonus points
are given for signed progress reports and the lowest two minor grades each six weeks are
dropped. Additionally, students have a binder check each six weeks which serves as a major
grade; therefore, as long as they take their notes, they will receive a 100 as a major grade. The
overall formative assessment average was 4.65625 points higher than the overall summative
assessment average. This means that the students consistently performed better on their
homework assignments than on their tests. The increase in performance could be attributed to
multiple factors, including that the lowest two homework grades are dropped each six weeks, that homework is easier to cheat on than tests, and that we help the students frequently when they do their
homework. For the few students who had higher summative assessment averages than formative assessment averages, the discrepancy can most likely be attributed to the students not turning in
their assignments or turning them in partially finished. Another interesting aspect of this data is
that students with the most instances of dishonesty tended to have lower formative assessment averages. This is intriguing: the students who cheated or attempted to cheat still received lower grades on their assignments than their peers who did not.
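The differential discussed throughout this section is simple arithmetic: the formative average minus the summative average for each student. A minimal sketch, using made-up scores rather than the study’s actual data:

```python
def differential(formative, summative):
    """Return (formative average, summative average, formative - summative)."""
    f_avg = sum(formative) / len(formative)
    s_avg = sum(summative) / len(summative)
    return f_avg, s_avg, f_avg - s_avg

# Hypothetical student: homework/quiz grades vs. unit-test grades
f_avg, s_avg, diff = differential([95, 88, 92, 90], [84, 86])
print(f_avg, s_avg, diff)  # 91.25 85.0 6.25
```

A positive differential means the student performed better on formative work, as most students in this study did.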
I was slightly shocked when I discovered the substantial difference between the students’
major and minor grade averages. The students receive an ample amount of help when taking their tests and are able to ask an unlimited number of questions. Although not all of the students
take advantage of our willingness to help, the majority do seek out assistance when in need. This
is where the human error aspect comes in. The students who frequently made minute errors that
caused them to miss questions rarely asked questions about those specific problems because they
had done the steps correctly and were confident that they hadn’t made any mistakes. Therefore, it was not their understanding that was lacking, but rather their attention to detail.
The test anxiety survey I administered was brief so that I would get genuine answers from the students. Students fell into all three categories of test anxiety (low, medium, and high). There was no correlation between the students’ test anxiety scores and their
summative assessment scores. Only one student scored high on the test anxiety questionnaire.
This particular student also had one of the largest differentials (5.125) between her formative and
summative assessment averages, which could be indicative of her test anxiety impacting her
performance on the tests. The student with the highest averages in all categories and the lowest
differential (0.375) between his formative and summative assessment averages scored low on the
test anxiety questionnaire. While this could signify that Bob’s test grades are a decent representation of his understanding and achievement, there were too many contradictions in the data overall to draw a firm conclusion.
Other Studies
Interestingly, as I scoured the internet for information pertaining to testing in the classroom, I ran across few scholarly articles on the topic. By contrast, the more notorious form of testing, standardized testing, has an overabundance of data available to review. While I agree that standardized testing should be the focal point of our current studies, because it not only affects our students temporarily but also informs the curriculum that our educators use, I think it is also imperative that we pay closer attention to the summative assessments our students are taking.

In 2015, the Obama administration released its Testing Action Plan, which sought to reexamine how tests are utilized in schools (Arnett, 2016). At one
point in the announcement, President Obama said that students "should only take tests that are
worth taking — tests that are high quality, aimed at good instruction, and make sure everyone is
on track" (Arnett, 2016). He also mentioned that assessments shouldn’t consume the student’s
classroom time and should only be one of many tools to identify student progress (Arnett, 2016).
The administration went as far as to say that tests that are low-quality, redundant, or unnecessary should be eliminated. The administration listed seven crucial points that assessments should meet:
• They must be worth taking: “Testing should be a part of good instruction, not a departure from it.”
• They must be high-quality: “High-quality assessment results in actionable, objective information about student knowledge and skills.”
• They must be time-limited.
• They must be fair: “Assessments should be fair, including providing fair measures of student learning for students with disabilities and English learners. Accessibility features and accommodations must level the playing field so tests accurately reflect what students really know and can do.”
• They must be “fully transparent” to students and parents: “States and districts should ensure that every parent gets understandable information about the assessments their students are taking.”
• They must be just one evaluation measure: “Assessments provide critical information about student learning, but no single assessment should ever be the sole factor in making an educational decision about a student, an educator, or a school.”
• They must be tied to improved learning: “While some tests are for accountability purposes only, the vast majority of assessments should be tools in a broader strategy to improve teaching and learning.” (Strauss, 2015)
This plan was teacher-led and had a four-pronged approach that included “financial
support for states to develop and use better, less burdensome assessments, expertise to states and
school districts looking to reduce time spent on testing, flexibility from federal mandates and
greater support to innovate and reduce testing, and reducing the reliance on student test scores
through our rules and executive actions” (U.S. Department of Education, 2015). While this plan was positively ambitious, reversing a culture of testing is easier said than done. Therefore, the plan was not deemed successful, and standardized testing, as well as frequent classroom testing, has persisted.
Although Obama’s Testing Action Plan wasn’t carried out fully, I think the ideas in it were sound and could serve as the foundation of a new plan that shifts the focus away from test scores.
At my current school, students are tested at least three times every six weeks. That is an
average of one test every two weeks. This is not necessarily problematic, but may be
unnecessary if the results aren’t indicative of the students’ actual levels of understanding.
Additionally, for students, the term “test” already has a negative connotation. Every time a new
test is announced, the announcement is contested and met with groans. Students do not want to be assessed at all, so over-testing them benefits neither the students nor the teachers.
According to Shepard, Penuel, and Pellegrino, classroom assessments should be used to support
learning rather than as a “business-as-usual” model (Shepard, Penuel, & Pellegrino, 2018). A
shared curriculum is not in place for every school in every state and therefore, state standards
and assessments cannot possibly be fully aligned with classroom assessments. The main concern
with this is that students may excel on classroom assessments and fail to perform on state
assessments. Neither type of assessment alone can verify a student’s understanding of the material, and neither is accurate in providing a holistic report on a student’s achievement. Even so, state assessments alone are used for district, school, and teacher accountability. If there is such a discrepancy between state and local assessments, then neither assessment should be used on its own to make high-stakes decisions.
Students are well aware of the accountability systems in place which unfortunately
probably affects the way they view and perform on tests. Since state assessments carry more
weight and can affect a student’s future, students will likely take them more seriously than a
classroom assessment that will only be viewed by their teacher and administrator. This pattern can be explained by expectancy-value theory:
Applied to large-scale testing, expectancy-value theory states that a test taker’s motivation to engage in
activities related to large-scale testing depends on their belief about experiencing success on the test and
the value that they place on the content, process, and/or outcomes of the test. That is, if a test taker
believes they will experience success on the large-scale test and they value it, they are more likely to be
motivated and engage with the tasks to the best of their ability. (Barneveld & Brinson, 2017)
The amount of effort put into a test can predict the results. As educators, we need results based on learning rather than on effort given on a particular day. This is one reason why testing should not be the sole measure of achievement. Classroom assessments generally fall into two categories: summative assessments and formative assessments. Typically, formative assessments are used by teachers
to inform their teaching practices and help them to determine what to modify and what to keep,
while summative assessments are mostly used for data collection by school districts and states.
Although each type of test has its niche, both can be used interchangeably in the classroom. By varying the type of assessment given to students, the burnout often seen with typical tests may be reduced. Dixson and Worrell discuss two types of formative assessments in their
article: spontaneous and planned (Dixson & Worrell, 2016). Spontaneous formative assessments aren’t reasonable data-collecting assessments, but planned formative assessments such as quizzes and homework can be if done correctly. Formative assessments are usually given frequently
throughout the school year and can provide a more in-depth look into student success than one
standardized test can. Some examples of formative assessments that can be taken as grades are
major projects, portfolios, worksheets, quizzes, homework assignments, and exit tickets. These
assessments, while valuable, cannot and will not completely replace testing in classrooms.
Instead, hopefully teachers can find a balance between formative and summative assessments so
that the students’ achievements won’t be completely dependent on tests they don’t want to take.
I am well aware that testing in the classroom is not going anywhere anytime soon due to
the accountability constraints created by school districts and the state, but there are steps we can
take to lessen it. First and foremost, testing in the classroom needs to be evaluated and studied further. Like standardized testing, in-class testing can also carry high stakes and therefore cause unnecessary stress for students and teachers. Additionally, with more studies, the
accuracy and validity of classroom testing can be analyzed to see how informative it actually is
regarding students’ achievement and understanding. If classroom testing is not serving its
purpose, then educators need to reevaluate their assessment methods and determine if the
constant testing is essential to the students’ success. More specifically, educators should make
sure that their assessment strategies are benefiting students rather than hindering them.
References
Arnett, A. A. (2016, April 18). Why testing prevails in K-12 education. Retrieved from
https://www.educationdive.com/news/why-testing-prevails-in-k-12-education-1/417294/
Barneveld, C. V., & Brinson, K. (2017). The Rights and Responsibility of Test Takers when Large-Scale Testing Is Used for Classroom Assessment. Canadian Journal of Education, 40(1), 1-22.
Dixson, D. D., & Worrell, F. C. (2016). Formative and Summative Assessment in the Classroom. Theory Into Practice, 55(2), 153-159.
Murphy, R., & Daniel, A. (2015, December 08). Snook Secondary School. Retrieved from
https://schools.texastribune.org/districts/snook-isd/snook-secondary-school/
Shepard, L. A., Penuel, W. R., & Pellegrino, J. W. (2018). Classroom Assessment Principles to Support Learning and Avoid the Harms of Testing. Educational Measurement: Issues and Practice, 37(1), 52-57.
Strauss, V. (2015, October 27). Why Obama's new plan to cap standardized testing won't work.
sheet/wp/2015/10/27/why-obamas-new-plan-to-cap-standardized-testing-wont-
work/?utm_term=.673fdcff2119
U.S. Department of Education. (2015, October 24). Fact Sheet: Testing Action Plan. Retrieved
from https://www.ed.gov/news/press-releases/fact-sheet-testing-action-plan
Appendix A
4. _____ I read through the test and feel that I do not know any of the
answers.
7. _____ I remember the information that I blanked on once I get out of the testing situation.
Tables
Table 1
Note: This table presents all of the students’ overall data from the first 24 weeks of school. The last 12 weeks of school are not included. Pseudonyms were assigned to each student.
Tables 2-4
2nd 6 Weeks
3rd 6 Weeks
4th 6 Weeks
Table 5
Anecdotal Evidence
This table contains data pertaining to the students’ test anxiety, human error mistakes, and
dishonesty instances. These items were tracked over a period of thirteen weeks and aside from