Addressing Bias in Automated Child Language Assessment

Spoken language systems deployed in automated language-based assessments of children can encode and amplify biases against children from underrepresented demographic groups, particularly along axes of dialect, race, and socioeconomic background. This work synthesizes evidence from across the assessment pipeline, from speech data collection and ASR model training to downstream scoring with language models, and traces how design choices at each stage produce disparate outcomes for children who speak varieties such as African American English. We propose evaluation protocols that explicitly measure performance gaps across demographic subgroups rather than reporting only aggregate accuracy, and we discuss data, modeling, and deployment-time mitigation strategies for building equitable assessment systems suitable for use in real classrooms.
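As a rough illustration of reporting subgroup gaps alongside aggregate accuracy, the following minimal Python sketch computes per-subgroup accuracy and the largest gap between subgroups. It is not the protocol from the paper: the function names, the group labels, and the toy records are all hypothetical, and a real evaluation would likely use task-appropriate metrics (for example, word error rate for ASR) and report uncertainty on each estimate.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return {group: accuracy} for records of (group, predicted, reference)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, reference in records:
        total[group] += 1
        correct[group] += int(predicted == reference)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Largest difference in accuracy between any two subgroups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Hypothetical toy data: (subgroup label, system score, human reference score).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 1, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 0, 1), ("group_b", 1, 1),
]

per_group = subgroup_accuracy(records)
aggregate = sum(p == r for _, p, r in records) / len(records)
gap = accuracy_gap(per_group)

print(f"aggregate accuracy: {aggregate:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")
print(f"largest subgroup gap: {gap:.2f}")
```

In this toy data the aggregate accuracy looks moderate while one subgroup is scored far less accurately than the other, which is the kind of disparity an aggregate-only report would hide.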

This work was published in the Journal of Educational Measurement.