Performance-based assessments fall into the category of alternative or authentic assessment (Sweet, 1993). They rely on “authentic tasks that assess what a student knows and can do” (Caffrey, 2009). On such assessments, students are required “to perform a task rather than select an answer from a ready-made list” (Sweet, 1993). Assessments such as the Scholastic Aptitude Test (SAT), American College Testing (ACT), Advanced Placement (AP) and International Baccalaureate (IB) tests, and even some state exams often include a performance-based portion in which a student must, for example, write an essay, solve a problem, or explain how he or she would perform a hypothetical experiment. Proponents of this type of assessment believe that “because they require students to actively demonstrate what they know, performance-based assessments may be a more valid indicator of students’ knowledge and abilities” (Sweet, 1993).
Performance-based assessments go beyond measuring students’ acquisition of knowledge; ideally, they demand far more than memorization of rules or facts and “good test taking strategies.” These authentic assessments aim to determine whether students know how to apply their knowledge, demonstrating what they have learned through a variety of tasks (Adamson & Darling-Hammond, 2010). They can take many forms: essays, speeches, projects, exhibitions, open-ended or extended-response exercises, extended tasks, or even portfolios (Tung, 2010; Sweet, 1993). In Sound Performance Assessments in the Guidance Context (1995), Richard Stiggins notes three critical components of performance assessments:
(1) Specification of a performance to be evaluated;
(2) Development of exercises or tasks used to elicit that performance;
(3) Design of a scoring and recording scheme for results.
Benefits of using performance-based assessments
Performance-based assessments are able to provide teachers with more detailed information than standard multiple-choice tests. They serve both a summative and formative purpose; they can tell teachers about what content a student has or has not mastered, and additionally offer insight into what concepts students are struggling with or where they get lost in a process. There are many benefits to utilizing performance-based assessments:
…research shows that well-designed performance assessments yield a more complete picture of students’ abilities and weaknesses, and can overcome some of the validity challenges of assessing English language learners and students with disabilities. (Adamson & Darling-Hammond, 2010)
Seeing a student’s work, rather than simply an aggregate score, enhances the formative use of performance-based assessments, providing teachers with the opportunity to “engage students more in their own learning and interests” through the inclusion of “reflection and demonstration of thinking processes” (Tung, 2010). Furthermore, this form of assessment “encourag[es] schools to build professional collaborative cultures through integrating curriculum, instruction, and assessment” (Tung, 2010).
The involvement of teachers as performance-based assessment evaluators serves as a professional development opportunity, helping teachers to “become more knowledgeable about how to evaluate and teach to challenging standards” (Adamson & Darling-Hammond, 2010). Research has shown that the use of performance measures “increase[s] intellectual challenge in classrooms and support[s] higher-quality teaching” (Adamson & Darling-Hammond, 2010). Moreover, performance-based assessments provide an opportunity to measure our nation’s progress in teaching students to be college and career ready, as this form of assessment is “better suited for measuring ‘21st Century Skills,’ such as critical thinking and problem solving skills, collaboration, and creativity and innovation” (Caffrey, 2009).
With regard to the effects performance-based assessments have on teaching and learning, Edward Haertel (1999) notes:
These new forms of assessment would promote active engagement both in learning and in demonstrating what had been learned. They would serve as models of sound instructional activities…As the line between teaching and testing blurred, classroom time would be better employed.
Using large-scale, performance-based assessments can also have positive effects on the curriculum used in the classroom, resulting in a greater focus on both higher-level skills and application of knowledge. These tests promote the development of 21st Century skills, and by giving teachers, administrators, schools, etc. a clearer picture of what students have or have not learned, these assessments facilitate changes in instruction that can better prepare students and ensure mastery of content. In Performance Assessment and the New Standards Project: A Story of Serendipitous Success, Elizabeth Spalding notes the potential for performance-based assessments to positively influence curriculum and instruction; she believes
…performance-based assessments and portfolios look like our best hope for providing meaningful information about the performance capabilities of students and for bringing about institutional change. (Spalding, 2000)
While assessment must not be mistaken for reform, the effects of performance-based assessment on teaching culture and the classroom could contribute substantially to systemic education reform.
Shortcomings of performance-based assessments
While “performance assessments can measure some important kinds of outcomes that multiple choice tests cannot”, their high cost of implementation and questionable validity and reliability are significant barriers to large-scale implementation of these assessments (Haertel, 1999). In the 1950s, the invention of the high-speed optical scanner revolutionized multiple-choice testing; “…multiple-choice tests could now be scored at a fraction of the cost, with better apparent objectivity and increased reliability” (Madaus & O’Dwyer, 1999). This advancement led to their largely unchallenged, widespread use. As Madaus and O’Dwyer (1999) point out in A Short History of Performance Assessment: Lessons Learned, performance assessments are generally less efficient, difficult to administer, more time-consuming, not easily standardized, and additionally test a considerably smaller sample from the overall knowledge domain. Furthermore, because these assessments allow significant latitude in student interpretation and response and require expert judgment for evaluation, it is very difficult for performance-based assessments to meet “criteria related to such validity issues as reliability, generalizability, and comparability of assessments—at least as they are typically defined and operationalized” (Moss, 1992). The primary concerns with performance assessment are:
1. Time and Content: Performance-based assessments are not able to test as much material as multiple-choice tests in the same amount of time. Therefore, performance-based assessments usually require additional time to administer—this takes away from instructional time. There is also an inherent trade-off between time and content; the more content the performance-based assessment attempts to cover, the more time it will take to design, administer, and score.
2. Reliability: The main threat to reliability comes from the necessity of having ‘experts’ score the performance-based assessments. Even with a set rubric, there may be variation in scoring among different raters. In particular, the scoring of performance-based assessment is “susceptible to two general classes of measurement error: random and systematic” (Raymond & Viswesvaran, 1993). Random error is unpredictable disagreement among raters scoring the same performance, and it shows up as low inter-rater reliability. Systematic error, also called leniency error, “is present when the mean of a rater summed over candidates differs from the mean of all raters” (Raymond & Viswesvaran, 1993).
3. Validity: Internal validity is the extent to which each question (or task) on the test measures the objective, skill, etc. that it intends to measure. External validity, also referred to as generalizability, is the extent to which a student’s performance on a question (or an entire test) represents their overall ability, the extent to which their performance on one question can be generalized to the domain of knowledge the task represents, or even the extent to which various performance assessments within the same domain can be compared. Performance-based assessments are vulnerable to both internal and external validity threats because they are difficult to design, and because time and money constrain the sample one is able to test from the domain (Madaus & O’Dwyer, 1999).
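The two kinds of rater error described under Reliability can be illustrated with a small calculation. The sketch below is only an illustration of the quoted definition of leniency error (a rater's mean compared with the mean of all raters); it is not the correction procedure used by Raymond and Viswesvaran. The rater names, student labels, scores, and function names are all hypothetical, and the sketch assumes every rater scores every candidate on the same rubric scale.

```python
from statistics import mean, pstdev

# Hypothetical rubric scores: scores[rater][candidate].
# Assumption: every rater has scored every candidate.
scores = {
    "rater_a": {"s1": 3, "s2": 4, "s3": 2},
    "rater_b": {"s1": 4, "s2": 5, "s3": 3},
    "rater_c": {"s1": 2, "s2": 3, "s3": 1},
}

def leniency_offsets(scores):
    """Systematic error: each rater's mean minus the mean of all raters."""
    rater_means = {r: mean(by_cand.values()) for r, by_cand in scores.items()}
    grand_mean = mean(rater_means.values())
    return {r: m - grand_mean for r, m in rater_means.items()}

def residual_spread(scores):
    """Random error proxy: per-candidate spread across raters
    after each rater's leniency offset has been subtracted."""
    offsets = leniency_offsets(scores)
    candidates = list(next(iter(scores.values())))
    return {
        c: pstdev([scores[r][c] - offsets[r] for r in scores])
        for c in candidates
    }

offsets = leniency_offsets(scores)
spread = residual_spread(scores)
print(offsets)  # positive = more lenient than average, negative = harsher
print(spread)   # values near zero = little disagreement left after adjustment
```

In this made-up data, the raters differ only by a constant amount, so all of the disagreement is systematic: once each rater's offset is removed, the three raters give every student the same score. Real scoring data would typically show both a nonzero offset and some leftover spread.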
The aforementioned concerns contributed to the demise of the performance-based assessment movement in the 1990s. If we want to implement performance-based assessments in the United States on a large scale, all of these issues will need to be adequately addressed.
References
Adamson, F., & Darling-Hammond, L. (2010, April). Beyond Basic Skills: The Role of Performance Assessment in Achieving 21st Century Standards of Learning. Stanford, CA: Stanford Center for Opportunity Policy in Education.
Caffrey, E. D. (2009). Assessment in Elementary and Secondary Education: A Primer. Washington, DC: Congressional Research Service.
Stiggins, R. J. (1995). Sound Performance Assessments in the Guidance Context. ERIC Digest. Retrieved from ERIC.ed.gov
Sweet, D. (1993, September). Performance Assessment. (Office of Research, Office of Educational Research and Improvement (OERI) of the U.S. Department of Education.) Retrieved from Education Research Consumer Guide: http://www2.ed.gov/pubs/OR/ConsumerGuides/perfasse.html
Tung, R. (2010). Including Performance Assessments in Accountability Systems: A Review of Scale Up Efforts. Center for Collaborative Education and the Nellie Mae Education Foundation.