I am currently engaged in research on communication between multiple automated vehicles and multiple vulnerable road users. Most experiments in current research feature one, maybe two, human participants. But think about driving around a town in real life: traffic situations are highly complex. Being able to run experiments with 3, 4, …, 16 participants is essential for understanding the mechanics of communication, situation awareness, and collaborative decision making in both present-day and future traffic. To enable such research, I present an open-source simulator that supports a virtually unlimited number of human participants and is fine-tuned for high-precision data logging. It is aimed at, but not limited to, academic research.
Demo of the coupled simulator with three agents in the same traffic scene. One agent is wearing a motion suit and one has a head-mounted display.
During my PhD on human factors, I was astonished to realise that the majority of research features a sample of “10 white, male, highly educated students from Europe with an average age of 21.5 years”. I thought that this was not the right way to design and develop systems whose primary function is to save lives and which are to be used by the general public (especially in developing countries, which are impacted the most by unsafe traffic). Already in the second month of my PhD project, I launched my first study using the then-novel method of crowdsourcing: 1,205 respondents from 91 countries expressed their opinions about current and hypothetical auditory interfaces. In another study, I implemented synthesised speech in a crowdsourcing survey, an innovative approach given that most researchers in the domain focus on non-speech feedback. I developed a new framework to reliably present stimuli to participants online and replicated several well-established studies, but with a much larger sample of 2,669 participants from 95 countries. I then developed a JavaScript framework based on the jsPsych project for accurate online measurement of reaction times and asked 20,000 participants to react to 176 trials featuring auditory, visual, and multimodal stimuli. Next, I adapted the TurkEyes library to obtain accurate measurements of eye gaze in the browser without any eye tracker (see the video on the left for a demonstration of animated heatmaps of 2,000 pedestrians looking at 107 traffic scenes with different exposure times in a recent study). I also published the source code of an extendable framework for crowdsourced recording of eye gaze. This code can be used in combination with accurate measurement of keypresses. Being able to conduct crowdsourced research proved to be especially useful during the pandemic.
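To give a flavour of how such crowdsourced reaction-time data can be processed, here is a minimal sketch in Python; the file name, column names, and cut-off thresholds are assumptions for illustration, not the actual schema or pipeline used in the study.

```python
import pandas as pd

# Hypothetical export of the crowdsourced experiment: one row per trial,
# with a participant ID, stimulus modality, and keypress latency in ms.
df = pd.read_csv("reaction_times.csv")

# Discard implausible latencies (anticipations and lapses of attention).
df = df[(df["rt_ms"] >= 150) & (df["rt_ms"] <= 2500)]

# Average within each participant first, then across participants,
# so that participants with many valid trials do not dominate the estimate.
per_participant = (
    df.groupby(["participant_id", "modality"])["rt_ms"].mean().reset_index()
)
summary = per_participant.groupby("modality")["rt_ms"].agg(["mean", "std", "count"])
print(summary)
```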
The interactions between future cars and pedestrians should be designed to be understandable and safe worldwide. Crowdsourcing helps to go beyond WEIRD (Western, educated, industrialised, rich, democratic) studies. But we can go further and cover the whole world to gain a true understanding of how people behave around the globe, and thus a better grasp of how future cars and transportation infrastructure should be designed. We live in the 21st century, where the internet has become ubiquitous and universally adopted. This accessibility of technology has created the phenomenon of ASMR driving videos. I initiated work on populating the Pedestrians in YouTube (PYT) dataset, which includes 2,051 hand-picked hours of day and night urban dashcam footage from 1,268 towns and cities in 216 sovereign states and dependent territories. Using YOLO, we analysed aggregated pedestrian behaviour at both the city and country levels. We are now working on going beyond YOLO to obtain a more precise and complete understanding of what exactly happens on the streets of cities on all continents.
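As an illustration of the kind of per-frame analysis behind these aggregate statistics, the sketch below counts pedestrians in dashcam footage with an off-the-shelf YOLO model via the ultralytics package; the weights file, video path, and confidence threshold are illustrative assumptions, not the exact PYT pipeline.

```python
from ultralytics import YOLO

# Pretrained COCO weights; COCO class 0 corresponds to "person".
model = YOLO("yolov8n.pt")

pedestrian_counts = []
# stream=True yields one result per frame without loading the whole video.
for result in model("dashcam_clip.mp4", stream=True, conf=0.4, classes=[0]):
    pedestrian_counts.append(len(result.boxes))

print(f"Mean pedestrians per frame: "
      f"{sum(pedestrian_counts) / len(pedestrian_counts):.2f}")
```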
The logical next step is to question whether we even need human participants to conduct (basic) human factors research. Of course, some hypotheses must be tested in a highly controlled environment with expensive eye trackers. But some research questions can perhaps be answered through the “wisdom of humanity up until a certain point in time”, which is arguably what an AI (LLM) is. In this study, we used the GPT-4V vision-language model to compare LLM-based assessments of risk with findings from a crowdsourced study with 1,378 participants. The conclusion was that population-level human assessment of risk can be predicted with AI to a high degree of accuracy. We also explored the use of 11 LLMs to evaluate external human-machine interfaces (eHMIs) in automated vehicles.
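For the curious, a minimal sketch of how such an LLM-based risk assessment can be requested through the OpenAI API is shown below; the model name, prompt wording, and rating scale are assumptions for illustration rather than the exact setup of the study.

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY to be set in the environment

# Hypothetical traffic-scene image; in a full study each scene is rated in turn.
with open("scene_001.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name for this sketch
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Rate the risk of this traffic scene for a pedestrian on a "
                     "scale from 0 (no risk) to 100 (extreme risk). "
                     "Reply with the number only."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```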
During my work at SD-Insights, I developed NEXTeye, a portable sensor that collects information on the state of the environment. It is based on the Mapbox Vision SDK and the NVIDIA Jetson Nano. The sensor is plug-and-play, retrieves vehicle dynamics data, and performs real-time scene segmentation and object detection. The portability of NEXTeye allows it to be used not only inside a car but also as a wearable device for vulnerable road users. Multiple such sensors can be connected and synchronised.
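As a sketch of what "connected and synchronised" can look like on the data side, the snippet below aligns the logs of two sensors by timestamp; the file names, column names, and the 50 ms tolerance are assumptions for illustration, not the actual NEXTeye log format.

```python
import pandas as pd

# Hypothetical logs from a vehicle-mounted and a pedestrian-worn sensor,
# each with a wall-clock "timestamp" column recorded after clock synchronisation.
vehicle = pd.read_csv("nexteye_vehicle.csv", parse_dates=["timestamp"])
pedestrian = pd.read_csv("nexteye_pedestrian.csv", parse_dates=["timestamp"])

# Match each pedestrian sample with the nearest vehicle sample in time,
# discarding pairs that are more than 50 ms apart.
fused = pd.merge_asof(
    pedestrian.sort_values("timestamp"),
    vehicle.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("50ms"),
    suffixes=("_ped", "_veh"),
)
print(fused.head())
```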
My intrinsic motivation to do a PhD stemmed from the fact that automated vehicles have the potential to prevent virtually all road fatalities. To achieve that, automated vehicles must collaborate with humans both inside and outside the vehicle. During my PhD, I focused on auditory feedback for automated driving. With on-road and driving-simulator studies, I showed that multimodal feedback that takes the urgency of the traffic situation into account can effectively support collaboration between the driver and the automated vehicle.