Seizing the Opportunity for Performance Assessment: Resources and State Perspectives


Laura Gutmann is a researcher at the Stanford Center for Assessment, Learning and Equity. Christina Jean is the director of Next Generation Learning at the Colorado Education Initiative. Joey Hunziker is the interim director of partnerships at the Innovation Lab Network at the Council of Chief State School Officers.

This article reports from Stanford University’s Innovative Assessments Institute on the development of performance assessment at scale, along with implementation recommendations.

In recent years, as states adopted higher standards to define student success in college and careers, they discovered a significant challenge: how to assess higher-order thinking and complex skills. For the most part, standardized tests typically in use did not sufficiently answer that challenge.

Assessment systems such as the Partnership for Assessment of Readiness for College and Careers (PARCC) and Smarter Balanced attempted to go beyond basic skills and look for deeper learning – that is, not just the right answer, but also evidence that the student understood and could apply knowledge. Open-ended and “constructed response” questions allow students to show how they thought critically and analytically, how they organized their thinking, and how they constructed a coherent argument.1 But limitations remained to the depth and function of the information about student learning gleaned from these assessments.

This is not a new challenge. For years, while researchers, policy makers, superintendents, principals, and teachers have understood the limitations of standardized tests, they have racked their brains trying to develop ways to better assess students’ harder-to-measure higher-order skills. One potential solution that many state and local leaders have been exploring emerges from the significant advances over the last two decades in the fields of performance assessment and student portfolios. Across the country, state education agencies (SEAs) and local education agencies (LEAs, or districts) have begun to collaborate with teachers on the ground to develop performance-based assessment systems that increase our understanding of how students process complex texts, respond to challenging prompts, and employ important skills to solve real-world challenges.

An accountability system built on the implementation of performance assessments has the potential to foster deeper and more authentic learning for students and more agency and assessment literacy for educators and school leaders. By investing in educators’ capacity for performance assessment, states and districts support the development of better-prepared, more-empowered educators who use curriculum-embedded performance assessments as part of an instructional cycle.

Real Assessment for Real Students

During the autumn of 2016, 120 teacher leaders, coaches, district coordinators, professional development providers, and administrators from 16 states and a number of districts gathered at Stanford University to build their capacity to measure deeper learning. They had been selected to participate in Stanford’s Innovative Assessments Institute in order to further leverage their roles in facilitating the effective implementation of K-12 performance assessments within their local contexts. Stanford Center for Assessment, Learning and Equity (SCALE) assessment experts from across the core content areas had organized the conference with support from the Hewlett Foundation and the Stanford Center for Opportunity Policy in Education (SCOPE), with the goal of demonstrating how newly developed resources to guide the design and development of performance tasks could be utilized to bolster a growing interest in performance assessment across the country.

Conference attendees listened to students enthusiastically describe how engaged they were with the meaningful instruction and assessment at their schools, which all had a history of developing assessments that ask students to perform, create, or produce something that would authentically demonstrate their learning.

Yet familiar themes began to emerge as the conversation turned toward the practical. Participants wondered how to vet existing tasks for quality, better link such assessments to instructional units, and align their efforts to broader accountability standards. Teachers were providing students with opportunities to apply their knowledge, skills, and content understandings to novel problems and issues in the world, but lacked a consistent, coherent assessment system. States, districts, and teachers needed resources to build, grade, and validate performance-based assessments on a large scale.

Responding to the Growing Need for Performance Assessment Resources: PARB

One tool for implementing performance-based assessments at scale is the Performance Assessment Resource Bank (PARB), a selection of performance assessment tasks, supplemented by related design and implementation resources, launched in October 2016. PARB was created by SCALE and SCOPE, in collaboration with members of the Council of Chief State School Officers’ Innovation Lab Network (ILN). PARB serves as a complement to emerging assessment policies and practices that foster deeper learning experiences for all students (Cook-Harvey & Stosich, 2016; Darling-Hammond et al., 2016).

The resource bank was built over the course of three years, spurred by the need to fill a gap in readily accessible performance assessment tools that had been vetted for quality. The partnership between SCALE, SCOPE, and the ILN was formed as a coordinated response to this issue, based on feedback from state leaders who recognized that their local capacity to establish quality tools was lacking. Although there were pockets of progress where educators had developed their own ground-level assessments, they could not adequately respond to the increased demand for performance-based tasks that met uniform learning and design standards. Instead, a mixed bag of "home-cooking" had left consumers looking online for tasks with no way to distinguish between high- and poor-quality tasks or to easily find assessments that aligned with their instructional objectives.

SCALE solicited contributions from dozens of like-minded organizations at the forefront of performance assessment, such as the Literacy Design Collaborative (LDC), Educational Policy Improvement Center (EPIC), and Center for Collaborative Education (CCE) in order to collect and curate curriculum-embedded tasks that allow K-12 students to more authentically demonstrate their learning in ELA, math, science, and history/social studies. PARB staff standardized a review process and calibrated reviewers according to set quality metrics in order to create a user-friendly bank of searchable rubrics, tasks, tools, and relevant policy research. Individual educators also played an important part in developing PARB materials, both as beta testers of the bank and as authors of performance assessments that were then reviewed according to the bank’s criteria. PARB community members can also submit resources and tasks for potential inclusion in the bank and receive feedback according to quality criteria. States can take advantage of PARB’s existing performance assessment resources while submitting their own tasks to the bank for validation.

Lessons Learned about Implementing Performance Assessments

Educators and assessment experts from states, districts, and education organizations worked hard to make high-quality performance assessments and exemplars widely available by contributing resources to PARB. However, pilot testing during the beta phase of bank development leading up to its launch revealed that building a collection of resources is only the first step towards promoting effective use of those materials. To facilitate effective use of performance assessments to drive instructional decisions, we must also develop deeper district and network-level partnerships within the field and offer customized support to educators trying out bank resources that considers their varied contexts for teaching and learning and makes use of their on-the-ground expertise. The work samples and benchmarks provided through the PARB are most useful when positioned as a starting point for a deeper process.

For example, one day of the Stanford event was dedicated to examining sample student work products generated by performance assessment tasks, so participants could see firsthand the connection between the implementation of PARB resources and the impact on student learning. Facilitators emphasized to the educators the importance of examining student work to check the assessment’s instructional effectiveness and identify adjustments to the task that would make it better at eliciting high-quality student work. A participant noted that “having teachers review samples of student work, apply rubrics, and develop shared understandings of intended learning outcomes (as well as inter-rater reliability) is vitally important” and emphasized that these practices should be incorporated early in the process of developing performance-based assessments.

One enthusiastic state leader pointed out that discussing student work products relative to assessment expectations makes the most impact when multiple teachers are able to come together to share the results of a common task that was applied across a network of schools or classrooms. When that process is established, “examining student work and calibrating with colleagues is paramount in helping identify misconceptions and can transform the use of assessment to inform instruction and to provide useful feedback.” Educators can further analyze performance assessment results to home in on student needs that might otherwise be overlooked, such as providing scaffolds for English learners that address the language demands embedded within performance task assignments.

Thus, teachers are most effective when they go beyond the bank to consider the application of performance tasks to their particular classrooms, within a structure for assessment implementation that provides time and space for collaboration, conversation, and adaptation. Ideally, as participants in the Stanford event hoped, “analyzing the content of the student work” will reveal “patterns, trends, strengths, and needs” that teachers can then respond to, rather than placing “too much of an emphasis on the grade.”

Recommendations for Leaders

As LEA and SEA leaders consider their next steps related to performance assessments and accountability, we offer the following considerations for shaping future work:

  • Balance maintaining high standards for both academic and non-academic learning with flexibility to allow for local innovation. If we are to continue to drive gains in equity for all students, LEAs and SEAs must build systems of accountability and assessment that ensure readiness for college and career with regard to both academic and non-academic skills. This balance will require a reorientation away from the “traditional” role of the district and state focused on compliance and toward allowing local innovation to flourish in the development and implementation of assessment and accountability policies.
  • Support educators and school and district leaders in developing assessment and data literacy. To engage networks of educators in developing quality assessments across a region or state, SEAs and LEAs must invest in supporting practitioners systemically and systematically in order to enable reliable, valid, comparable, and equitable performance assessment implementation and scoring, as has been the case in New Hampshire’s PACE pilot.
  • Seek out established resources to guide performance assessment goals. The creation of digital libraries like PARB with a range of formative assessment instruments, curriculum resources, and instructional modules has the potential to provide educators with robust tools for assessment implementation, while moving away from a one-size-fits-all approach to measuring learning.
  • Make connections with educators and leaders engaged in similar efforts. States and districts building systems of quality performance assessment can learn from each other’s work as they share best practices, contribute to open-ended educational resources, and receive coordinated support.


Educators from across the globe can access the Performance Assessment Resource Bank (PARB) without charge. Thousands of educators are already engaged in implementing PARB resources, and the bank’s creators continue to collaborate to meet the needs of the field and create equitable access to tools for developing meaningful instruction and assessment practices, especially for English language learners and other students who have been traditionally underserved.


1 PARCC and Smarter Balanced were developed through collaborations between groups of states and educators in response to new, more rigorous Common Core academic standards adopted by most states in 2010 and 2011. A “constructed response” question is one that requires students to supply their own response, rather than pick among multiple-choice answers.

Cook-Harvey, C. M. & Stosich, E. L. (2016). Redesigning school accountability and support: Progress in pioneering states. Stanford, CA: Learning Policy Institute and Stanford Center for Opportunity Policy in Education.

Darling-Hammond, L., Bae, S., Cook-Harvey, C., Lam, L., Mercer, C., Podolsky, A., & Stosich, E. L. (2016). Pathways to new accountability through the Every Student Succeeds Act. Stanford, CA: Learning Policy Institute and Stanford Center for Opportunity Policy in Education.

Wei, R. C., Pecheone, R. L., & Wilczak, K. L. (2014). Performance assessment 2.0: Lessons from large-scale policy and practice. Stanford, CA: Stanford University.

Wei, R. C., Pecheone, R. L., & Wilczak, K. L. (2015). Measuring what really matters. Phi Delta Kappan, 97(1), 8-13.