Test, Measurement, Evaluation, and Assessment

A. Test
: A procedure designed to elicit a certain behavior from which one can make inferences about certain characteristics of an individual. A procedure (an instrument accompanied by instructions) to reveal the students' communicative competence through their performance. A method used to measure the level of achievement or performance.

B. Measurement
: Process of quantifying individuals' characteristics according to specific rules and procedures. Process of assigning numbers (quantifying) to qualities or characteristics of an object or person according to some rule or scale, and analyzing that data based on psychometric and statistical theory.

C. Assessment
: An ongoing process and a kind of measurement which encompasses a wider ___domain than a test and is carried out in direct and indirect ways. The systematic gathering of information for the purpose of making decisions. Process of gathering, describing, or quantifying information about performance by documenting knowledge, skills, attitudes, and beliefs, usually in measurable terms. It is used to make improvements. In an educational context, assessment is the process of describing, collecting, recording, scoring, and interpreting information about learning.

D. Evaluation
: Process of making judgments based on criteria and evidence by examining information about the components being evaluated (e.g., student work, schools, or a specific educational program) and comparing or judging their quality, worth, or effectiveness in order to make decisions.
3. The steps in developing a test and a non-test

A. The steps in developing a test

a. Clarify the purpose of the test
The teacher can start developing a good test by deciding what decisions need to be made based on the test results. The quality of a test is determined by the extent to which it leads to appropriate or right decisions.
b. Define the construct
The construct is what the test measures: the particular knowledge, skill, or ability the test must measure to enable it to support the right decisions.

c. Design the test
The design of the test is a document called the Test Specification. It is an operational definition of the construct.

d. Create the items (or tasks)
Test items are very complex, and item writers cannot always imagine the many ways test takers can respond to their items. So after they have been written, items need to be reviewed.

e. Pilot every item (or task)
The test items must be tried out on a representative sample of target test takers, and the sample must be large enough for the particular statistical procedures to be carried out.

f. Select the measurement model
The teacher needs to take the items and turn them into a measurement instrument. There are a number of different ways to do that, using Classical Test Theory or Item Response Theory (IRT).

g. Create the IRT scale
The first step of this process is to create an IRT scale and calibrate all items on that scale, using the data from the pilot administration. The Rasch scale is a probabilistic scale, with both item difficulty and test taker ability expressed on the same scale. The units of the Rasch scale are logits.

h. Evaluate the items
Based on this analysis, items are evaluated for quality and appropriacy. This usually involves looking at their difficulty, their fit to the Rasch model, and their correlations with other items or parts of the test. Good items are kept, while poor ones are thrown out or sent for revision.
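The core of the Rasch scale mentioned above can be sketched in a few lines: the probability of a correct response depends only on the difference between the test taker's ability and the item's difficulty, both expressed in logits. This is a minimal illustration of the model, not a calibration procedure.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model.
    Both ability and difficulty are expressed in logits, so a
    difference of zero means a 50% chance of success."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A test taker whose ability equals the item's difficulty
# has exactly a 50% chance of answering correctly.
print(rasch_probability(0.0, 0.0))   # 0.5

# One extra logit of ability raises the probability to about 0.73,
# because each logit multiplies the odds of success by e.
print(rasch_probability(1.0, 0.0))   # ~0.73
```

In practice, item difficulties and abilities are estimated jointly from the pilot data by specialized calibration software; the function above only shows what the resulting logit values mean.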
i. Assemble the test forms
Versions of the test are then built, based on the Test Specifications, item content, and the statistical qualities of the items. They should be parallel in structure, in layout, in the number of items of each type, and in content. They should be designed to measure the same skills.

j. Create a reporting scale and equate the forms
A reporting scale needs to be created that score users feel more comfortable with. This must be a linear transformation of the Rasch scale, and can be anything that is acceptable. As part of this process, the various test forms are equated. Since the different test forms contain items of different difficulties, the forms will vary in difficulty: one form will be harder than another, and a certain number of items correct on one form will not represent the same ability level as the same number correct on a different form. Thus each form will have a different conversion to the reporting scale, to take account of this.

k. Set performance standards
A standard setting study will be needed if the tests are to be used for certification purposes, or if passing scores need to be set.

l. Write up documentation
Any testing system needs to be accompanied by many different documents: item development manuals, administration manuals, test taker guidelines, score interpretation guides, technical manuals, validation reports, and research studies.

m. Field test and validate the system
For many large-scale testing systems it is normal to carry out large-scale field testing of the system. This has a number of purposes: it tests the operational aspects of the system to see how well things are working, it provides normative data for specific groups of interest, and it provides evidence of the validity of the test.
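Step j can be sketched as follows. The slope, intercept, and raw-score-to-ability tables below are invented for illustration only; real values come from the calibration and equating study. The sketch shows why each form needs its own conversion: the same raw score on an easier form represents a lower ability, yet both forms report on the same shared scale.

```python
def to_reporting_scale(theta, slope=10.0, intercept=50.0):
    """Linear transformation from the Rasch (logit) scale to a
    reporting scale. Slope and intercept are illustrative choices."""
    return slope * theta + intercept

# Hypothetical raw-score-to-theta conversion tables for two equated
# forms. Form B is easier, so the same number correct maps to a
# lower ability estimate (theta, in logits).
form_a = {20: -0.5, 25: 0.4, 30: 1.3}   # raw score -> theta, Form A
form_b = {20: -0.9, 25: 0.0, 30: 0.9}   # raw score -> theta, Form B (easier)

# 25 correct on the harder Form A reports higher than 25 on Form B,
# even though the raw scores are identical.
print(to_reporting_scale(form_a[25]))   # 54.0
print(to_reporting_scale(form_b[25]))   # 50.0
```

Because the transformation is linear, differences on the reporting scale remain proportional to differences in logits, preserving the measurement properties of the Rasch scale.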
n. Ongoing test development, review, and evaluation
During the useful life of the test, new items will need to be written, and new test forms will need to be developed. Test performance needs to be monitored on a regular basis. Test takers' data needs to be analyzed on a regular basis, and revised technical reports need to be created. Validation is an ongoing requirement, and any high-stakes testing system needs to accumulate a variety of research studies and validation reports, which continue to explore the test and the meaning of the test scores.
B. The steps in developing a non-test

Developing performance tasks or performance assessments seems reasonably straightforward, for the process consists of only three steps.

1. Listing the skills and knowledge you wish to have students learn as a result of completing a task
As tasks are designed, one should begin by identifying the types of knowledge and skills students are expected to learn and practice. These should be of high value, worth teaching to, and worth learning. In order to be authentic, they should be similar to those which are faced by adults in their daily lives and work. Herman, Aschbacher, and Winters (1992, pp. 25-26) suggest that educators need to ask themselves five questions as they identify what is to be learned or practiced by completing a performance task. Their questions, with examples, follow:
a) What important cognitive skills or attributes do I want my students to develop? (e.g., to communicate effectively in writing; to analyze issues using primary source and reference materials; to use algebra to solve everyday problems)
b) What social and affective skills or attributes do I want my students to develop? (e.g., to work independently; to work cooperatively with others; to have confidence in their abilities; to be conscientious)
c) What metacognitive skills do I want my students to develop? (e.g., to reflect on the writing process they use; to evaluate the effectiveness of their research strategies; to review their progress over time)
d) What types of problems do I want them to be able to solve? (e.g., to undertake research; to understand the types of practical problems that geometry will help them solve; to solve problems which have no single, correct answer)
e) What concepts and principles do I want my students to be able to apply? (e.g., to understand cause-and-effect relationships; to apply principles of ecology and conservation in everyday life)
2. Designing a performance task which requires the students to demonstrate these skills and knowledge
The performance tasks should motivate students. They also should be challenging, yet achievable; that is, they must be designed so that students are able to complete them successfully. In addition, one should seek to design tasks with sufficient depth and breadth so that valid generalizations about overall student competence can be made. Herman, Aschbacher, and Winters (p. 31) have a list of questions which are helpful in guiding the process of developing performance tasks. Those questions, with their recommendations, follow:
a) How much time will it take students to develop or acquire the skill or accomplishment? The authors recommend that assessment tasks should take at least one week for students to complete. Others recommend that worthwhile tasks require far more time.
b) There are no rules regarding the appropriate length or complexity of a task; however, there are problems associated with developing overly complex and creative performance tasks (Cronin, 1993). To begin with, relatively modest performance tasks are easier to develop. Furthermore, if they are well crafted and reasonably short (a few days rather than a few weeks), they are more likely to hold the interest of students. Finally, if a task fails to accomplish its purposes, it is best if the task is limited in duration.
c) How does the desired skill or accomplishment relate to other complex cognitive, social, and affective skills? Priority should be given to those which apply to a variety of situations.
d) How does the desired skill or accomplishment relate to long-term school and curricular goals? Skills or accomplishments which are integral to long-range goals should receive the most attention.
e) How does the desired skill relate to the school improvement plan? Priority should be given to those which are valued in the plan.
f) What is the intrinsic importance of the desired skills or accomplishment? Emphasis should be given to those which are important, while others should be eliminated.
g) Are the desired skills and accomplishments teachable and attainable for your students? Priority should be given to tasks which represent realistic goals for teaching and learning.
3. Developing explicit performance criteria which measure the extent to which students have mastered the skills and knowledge
It is recommended that there be a scoring system for each performance task. The performance criteria consist of a set of score points which define in explicit terms the range of student performance. Well-defined performance criteria will indicate to students what sorts of processes and products are required to show mastery, and will also provide the teacher with an "objective" scoring guide for evaluating student work. The performance criteria should be based on those attributes of a product or performance which are most critical to attaining mastery. It is also recommended that students be provided with examples of high-quality work, so they can see what is expected of them.

Additional Recommendations for Developing Performance Tasks
a) Keep in mind that the concepts of performance/authentic assessment are not new. Teachers have always assigned tasks which require their students to perform or develop products.
b) If possible, groups of educators should work together to design performance tasks. Tasks designed this way are more likely to be interdisciplinary. In addition, the process allows for discussion and exchange of ideas.
c) Develop tasks which are fair and free of bias. Tasks should not give particular advantage to certain students.
d) Develop tasks which are interesting, challenging, and achievable. This means that the tasks should be neither too complex and demanding, nor too simple and routine.
e) Develop tasks which are maximally self-sustaining, with clear, step-by-step directions and with the record-keeping responsibilities placed mostly on the students. If this is done, the teacher need not guide activity every step of the way and record massive amounts of information throughout the process.
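The idea of performance criteria as a set of explicit score points can be made concrete with a minimal sketch. The criterion names, descriptors, and score values below are invented for illustration; a real rubric would be written for the specific task being assessed.

```python
# A minimal sketch of explicit performance criteria: each criterion
# defines its score points and the descriptor attached to each point.
# All names and descriptors here are hypothetical examples.
rubric = {
    "organization": {
        3: "Ideas are logically ordered with clear transitions.",
        2: "Ideas are mostly ordered; some transitions are missing.",
        1: "Ideas are presented without a discernible order.",
    },
    "use of evidence": {
        3: "Claims are supported by relevant primary sources.",
        2: "Claims are partially supported.",
        1: "Claims are unsupported.",
    },
}

def score_performance(ratings):
    """Sum a student's ratings across criteria, rejecting any rating
    that is not one of the rubric's defined score points."""
    total = 0
    for criterion, rating in ratings.items():
        if rating not in rubric[criterion]:
            raise ValueError(
                f"{rating} is not a defined score point for {criterion}")
        total += rating
    return total

print(score_performance({"organization": 3, "use of evidence": 2}))  # 5
```

Writing the descriptors down in this explicit form is what gives students and teachers a shared, "objective" basis for judging work against each score point.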