Full Program Listing*

The ATP ECPS Summit will be broadcast live on June 4 and 5, 2024. All sessions will be recorded and available to registered attendees for 6 months following the Summit. Attendees will receive details on how to access the recordings shortly after the Summit.

Monday | June 3, 2024

11:00 AM - 1:00 PM EDT | Pre-Summit Workshop: Automated Scoring and Feedback using Transformer Models

Cambium Assessment, Inc. (CAI) has been using transformer models, which are the basis for generative AI, in its automated scoring engine since 2020. This approach has led to high-quality scoring for our state-level clients, particularly when combined with human scoring. It also led to two grand prize awards in the NAEP reading comprehension automated scoring challenge. Our work on transformer architectures for automated scoring has laid the foundation for a writing feedback tool that shows students the structure of their argumentative essays and provides feedback aligned with K-12 standards in conventions. This work has also led to instruction fine-tuning and reinforcement learning of large language models such as Llama 2, Mistral, and Claude models for scoring feedback, generic scoring, and item passage development. In this session, we will provide an overview of the architecture of our engines, describe the benefits and challenges in their development and use, present the promising and less promising results of our work, and offer recommendations on the skills and resources needed to build, deploy, and use these complex systems.

Speaker: Sue Lottridge, Cambium Assessment

Tuesday | June 4, 2024

11:00 AM - 12:15 PM EDT | Summit Welcome and Opening Keynote: Unlocking Creativity in the Age of AI: Innovative Approaches to Assessment, Scoring, and Learning

Creativity is universally recognized as a pivotal skill in the era of generative AI. However, the absence of robust assessment and learning tools has hindered our ability to foster this crucial competency. This presentation unveils approaches developed through collaboration between The LEGO Foundation, OECD (PISA 2022), and BrainPOP. We introduce innovative assessment techniques, AI-powered scoring systems, and engaging learning-through-play activities designed to cultivate and measure creative thinking throughout the K-Career continuum. Through empirical evidence and real-world examples, we demonstrate the effectiveness of these approaches in assessing creativity, unlocking its potential, and preparing individuals for the challenges and opportunities presented by the AI revolution and the evolving world of education and workforce.

Speaker: Yigal Rosen, BrainPOP

With over 20 years of experience in product development and educational research, Dr. Yigal Rosen is the Chief Academic Officer at BrainPOP, a leading provider of engaging and effective digital learning products for K-8 students, teachers, and administrators. He leads a team of AI engineers, learning and assessment designers, and efficacy research scientists who create and evaluate the impact of BrainPOP's innovative learning solutions that leverage human-centered AI/ML technologies, formative assessments, and data-driven insights. Yigal is also a recognized leader and contributor in the field of educational assessment, especially in the areas of creative thinking, holistic skills, and learning progressions. He led the development of the first global assessment of creative thinking for the OECD's PISA 2022, and a research project at The LEGO Foundation that aims to advance research and practice on playful assessments for the development of holistic skills in children. He has also led several research and innovation projects for the U.S. NAEP, taught assessment design at Harvard University Graduate School of Education, and gained 10 years of experience teaching Math, Physics, and Computer Science in K-12. Yigal's mission is to improve learning outcomes and academic gains for learners of all ages and backgrounds, and to empower educators with effective tools and strategies.

12:15 PM - 12:30 PM EDT | Check out our Sponsors

12:30 PM - 1:15 PM EDT | Leveraging AI to Transform Learning: Insights from the Product Development Journey

In the rapidly evolving landscape of educational technology (EdTech), artificial intelligence (AI) has emerged as a pivotal force in reshaping the modalities of learning. Many companies have embarked on a pioneering journey to integrate AI technologies into their product ecosystems, offering a highly adaptive, personalized, and effective learning experience to users worldwide. This panel presents an in-depth discussion of the methodologies and innovations implemented by various companies, highlighting the transformative impact of AI on education. Furthermore, ATP’s commitment to inclusivity and accessibility is reflected in many member organizations’ product design, which leverages AI to improve access and inclusivity. This panel offers valuable insights into the challenges and opportunities presented by AI in educational products, serving as a blueprint for future advancements in the sector.

Speakers: Oscar Santiago, Duolingo; Paul Muir, Surpass; Ashok Sarathy, GMAC; Brodie Wise, ITS

Moderator: Ada Woo, Ascend Learning

1:15 PM - 1:45 PM EDT | Break

1:45 PM - 2:30 PM EDT | Securing the Integrity of Examinations: Legal Implications and Testing Security

Test security encompasses the policies, procedures, and practices employed to protect the integrity of examinations from unauthorized access, fraud, cheating, and other forms of misconduct. This concept is crucial across various domains, including education, certification, and professional licensing, to ensure that test outcomes accurately reflect the knowledge and abilities of the examinees. The legal implications of test security are multifaceted and significant. They can include breaches of copyright law, when unauthorized individuals copy or distribute test materials; violations of contract law, when examinees or administrators fail to adhere to the terms set forth by testing organizations; and potential civil and criminal penalties for individuals or entities engaging in fraudulent activities. Additionally, inadequate test security measures can lead to challenges of test validity and reliability, potentially resulting in legal disputes over the fairness and equity of the testing process. Ensuring robust test security is thus not only an ethical and professional obligation but also a legal necessity to maintain the credibility and legitimacy of the testing process.

Speakers: Marc Weinstein, Marc J. Weinstein PLLC; Rose Hastings, Duolingo

Moderator: Gary Behrens, Fifth Theory

2:30 PM - 2:45 PM EDT | Break

2:45 PM - 3:30 PM EDT | Computational Psychometrics And Digital-First Assessments

Join us for a session from the Duolingo English Test, revisiting this topic. The term ‘computational psychometrics’ was introduced in 2015 to describe a new approach to measurement that includes machine learning algorithms to take advantage of the rich and diverse data types generated from innovative digital assessments. The first published work using computational psychometrics examined process data, which is associated with the way test takers respond to questions (such as navigational patterns and response time).

Since its initial implementation, computational psychometrics has defined the way in which test development work has evolved alongside technology. This includes the transition to AI-based automatic item generation and automatic scoring, and their integration into a psychometric framework; the management of database alignment, multimodal data, ancillary data, and multiple data sources; the development of interactive and collaborative assessments; and, last but not least, the transition to embedding assessments into personalized learning systems. In this presentation, the speaker will give an overview of computational psychometrics and how it contributes to assessment innovation.

Speaker: Alina von Davier, Duolingo

Wednesday | June 5, 2024

11:05 AM - 11:45 AM EDT | Is Tech Passing the Test? The Turbo Transformation of our Trade and Industry

Education and learning markets, especially the assessment industry, have always been slow to change. Most work is human, manual, and repetitive. There are still big warehouses where paper forms from large-scale standardized tests are processed. It can take six months before kids get a raw score report, which is well into the next school year.

But technology is catching up with our industry fast. It’s coming like an avalanche. AI, large language and computational models are automating each link of the assessment value chain, from development to delivery, from measuring to methodology. Quantum, cloud and edge computing make it possible to analyze in real-time, even in intricate and immersive learning and assessment systems and in extended realities.

The new technological models are not only changing the way we teach, the way we tutor and the way we assess, they also change what people learn: skills that are relevant for future jobs. The models also remodel our markets and industries, causing disruption, shake-outs, and consolidations. But competition turns into collaboration when portable learner information becomes interoperable and integrated to support a seamless lifelong learning journey. Fasten your seatbelts; there will be turbulence.

Speaker: Marten Roorda, Bill & Melinda Gates Foundation

Marten Roorda is a pioneer in educational technology, leading innovations in assessment, learning, and education since 2000. Prior to joining the Bill & Melinda Gates Foundation as a full-time senior advisor, he ran a global consultancy firm, Edtech Consult LLC. Before that, he was the CEO of ACT in Iowa City for five years, and the CEO of Cito in the Netherlands for thirteen years. In 2006, he co-founded Kryterion, the first company in the world to offer online proctoring. In an earlier career, Marten held several management positions in the publishing industry, at VNU and Reed Elsevier, after being a reporter and editor for a number of years. He has a master’s degree in literature from Utrecht University. Marten Roorda has been a member of the Board of Directors of the Association of Test Publishers for ten years and was its Chair in 2010. He also served on the Board of Directors of 1EdTech (formerly IMS Global) and as its interim Chair. He is a former member of the Reach Higher Advisory Board, chaired by Michelle Obama.

11:45 AM - 12:00 PM EDT | Break

12:00 PM - 1:00 PM EDT | Invited Sessions | Moderator: Hong Jiao

Invited Session 1: Training Language Models on Sequence-Based Process Data in Large-Scale Assessments

Increased use of computer-based assessments brings a great opportunity to track process data in order to gain deeper insight into respondents’ test-taking behavioral patterns and problem-solving strategies. Fine-grained process data often come in complex and multidimensional forms that call for data mining methods in addition to classical psychometric models. In this talk, I will give a brief overview of why and how to use process data in digital large-scale assessments with a variety of language models and sequence mining methods, such as n-gram models, sequence similarity measures, BERT and large language models, and latent sequence modeling. The goal of these studies is to leverage sequential process data in large-scale assessments to help understand how respondents interact with the items administered, thus supporting test construction, enhancing latent ability estimation, improving the validity of conclusions, and facilitating cross-national comparisons. A new trend of incorporating process data in adaptive testing and quality assurance will also be discussed.

Speaker: Qiwei He, Georgetown University

Invited Session 2: Qualitative Coding in Partnership with ChatGPT

Qualitative coding is an intensive, time-consuming process that relies upon multiple trained individuals. In this work, we investigate the possibility that ChatGPT can assist in multiple steps in this process, including developing coding categories, refining codebooks, and conducting the actual coding. We investigate several configurations of human-ChatGPT partnership, comparing them in terms of inter-rater reliability, the similarities and differences in the coding categories developed using different approaches, and the perceived scientific usefulness of the codes developed. We are conducting this work using multiple data sets, including tutor-student dialogues and political speeches.

Speaker: Ryan Baker, University of Pennsylvania

Invited Session 3: Enhancing Fairness and Trust in AI-Driven Educational Assessment Scoring

Modern AI capabilities, including large language models (LLMs), significantly enhance the efficiency of scoring educational tests and, in some cases, surpass human accuracy. However, ensuring fairness and trustworthiness in AI-scored assessments remains a critical challenge. I will present a psychometric framework for evaluating and mitigating bias in AI-generated scores, as well as approaches for explaining what contributes to scoring decisions. The approaches will be demonstrated using data from a large-scale assessment.

Speaker: Matt Johnson, ETS

1:00 PM - 1:15 PM EDT | Break

1:15 PM - 2:00 PM EDT | Advancing Assessment Validity in Professional and Workforce Education through AI Applications

While AI has revolutionized learning and assessment, it has also introduced significant challenges. This session will focus on the potential implementation of AI in assessment, the associated challenges, and strategies to address them. There are three presentations in this session.

The first presentation explores emerging AI applications in medical education and assessment, including natural language processing (NLP) and AI tools designed to enhance undergraduate and graduate training. It will also discuss how assessment may evolve to keep pace with educational advancements, citing developments at the National Board of Medical Examiners (NBME) in AI scoring for skills beyond multiple-choice questions, as well as ethical and legal considerations for AI in licensure assessments.

The second presentation addresses the impact of large language models (LLMs) on educational assessment, highlighting concerns about validity and the need to reassess which skills are essential for evaluation in an AI-enhanced world. It will examine strategies to mitigate these challenges and ensure relevant skill assessment.

The third presentation will explore various open-source LLMs and their ability to generate high-quality test items. It will also explore the potential of item embeddings, derived from LLMs, to improve item evaluation and identify enemy items.

Speakers: Kimberly A. Swygert, NBME; Yanyan Fu, GMAC; Paulius Satkus, GMAC

Moderator: Jinghua Liu, Collaborative Assessment

2:00 PM - 2:15 PM EDT | Break

2:15 PM - 2:45 PM EDT | Interview with Jean Hammond: How Accelerators Assess AI-centric Startups

In this interview, we will explore the mechanisms through which accelerators evaluate ed-tech business ideas purported to leverage artificial intelligence (AI). Understanding the data environment in which the startup will operate and the range of output metrics available over time allows the accelerator team to assess potential. This interview will look into the criteria and methodologies used by early-stage investors to sift through AI-driven business proposals, examining factors such as technological feasibility, market potential, scalability, and team expertise. By understanding how accelerators vet AI-centric startups, entrepreneurs and investors gain insights into the evolving landscape of AI innovation and entrepreneurship.

Speakers: Jean Hammond, LearnLaunch Accelerator; Brodie Wise, ITS

Jean Hammond is General Partner of the LearnLaunch Fund + Accelerator. She is a co-founder of LearnLaunch, and the fund has invested in 80 startups in K-12, higher education, and workforce training and up-skilling. The milestone-driven two-phase investment system increases company success, improves development of impact metrics, and reduces risks. The portfolio companies are diverse, with 66% led by women and BIPOC, and have reached over 50 million learners. Jean has been a supporter of startup ecosystems, serving as an entrepreneur-in-residence at MIT Trust Entrepreneurship Center and as an active angel, investing in hundreds of startups and training hundreds of new angel investors. Jean was a serial tech entrepreneur, cofounding AXON Networks (exited to 3Com) and Quarry Technologies. Jean has a BA from Boston University and an MS from Massachusetts Institute of Technology, Sloan School. She has served on many boards, including: Corporation of Massachusetts Institute of Technology, 2015-2020; MIT Sloan School Industry Advisory Board, 2014-present; MA Innovation Index Board; and MA Governor’s AI for Education Task Force.

2:45 PM - 3:00 PM EDT | Break

3:00 PM - 4:00 PM EDT | Five Editions of ECPS at ATP: Transforming Learning and Assessment One Summit at a Time

The five editions of the EdTech and Computational Psychometrics Summit at the Association of Test Publishers have been informative, blending cutting-edge AI advances with psychometrics and educational technologies. The Summit has emphasized the crucial role of human creativity in shaping AI-driven education and assessment. Sessions highlighted how educators and test developers collaborated with AI systems to design innovative assessment formats, blending quantitative data analysis with qualitative feedback and holistic evaluation methods. Steve Shapiro, a veteran technologist and entrepreneur, will look back on AI advances from the past five years and share his insight on how the technology is transforming education. This session will be moderated by Alina von Davier, the founder of computational psychometrics. Join us for this engaging conversation between two leading experts about the transformative possibilities of amplifying human creativity and leveling up the way we learn.

Speakers: Steve Shapiro, Finetune Learning; Alina von Davier, Duolingo

* Please note: Schedule is tentative. Check back regularly for any updates. All times listed are EDT.
