Collision patterns and reporting blind spots in 970 California autonomous vehicle crash reports

Alam, M. S., Zhang, L., Li, J., Dou, F., Bazilinskyy, P.

Submitted for publication.

ABSTRACT Autonomous vehicle (AV) collision reports offer a rare view of how autonomous driving systems perform in mixed traffic, but they are difficult to analyse at scale because they combine structured form fields, checkbox-marked elements, and free-text narratives. We analysed 970 publicly available California Department of Motor Vehicles collision reports dated between October 2014 and March 2026, using a ChatGPT 5.4 Thinking extraction pipeline to derive a structured dataset for scenario-level analysis. Three recurring patterns dominated the corpus: rear-end crashes involving a stopped AV (266, 27.4%), intersection lateral conflicts (180, 18.6%), and lane-change or merge conflicts (156, 16.1%). The reports supported coarse scenario structure much better than fine-grained interaction reconstruction, with mean coarse and fine context scores of 0.97 and 0.48, respectively. Taken together, the findings suggest that many reported AV collisions occur in mixed-traffic situations where prediction, coordination, and road user expectations are difficult to align.
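
As a quick consistency check on the shares quoted in the abstract, the following minimal Python sketch recomputes each percentage from the reported counts and the corpus size of 970. The variable and function names are illustrative only and are not taken from the authors' pipeline. The combined and remainder figures are derived here, not reported in the abstract, and they assume the three patterns are mutually exclusive categories.

```python
# Counts and corpus size as reported in the abstract.
TOTAL_REPORTS = 970
pattern_counts = {
    "rear-end, stopped AV": 266,
    "intersection lateral conflict": 180,
    "lane-change or merge conflict": 156,
}


def share(count, total=TOTAL_REPORTS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage share of each reported pattern.
shares = {name: share(n) for name, n in pattern_counts.items()}

# Derived figures (assumption: the three patterns do not overlap).
covered = sum(pattern_counts.values())
remaining = TOTAL_REPORTS - covered

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"three patterns combined: {covered} reports ({share(covered)}%)")
print(f"all other reports: {remaining} ({share(remaining)}%)")
```

Under that non-overlap assumption, the three patterns together account for 602 of the 970 reports, leaving 368 reports in other scenario types.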