Auditory interface for automated driving

Bazilinskyy, P.

PhD thesis (2018)

ABSTRACT

Automated driving may be a key to solving a number of problems that humanity faces today: large numbers of fatalities in traffic, traffic congestion, and increased gas emissions. However, unless the car drives itself fully automatically (such a car would not need to have a steering wheel, nor accelerator and brake pedals), the driver needs to receive information from the vehicle. Such information can be delivered by sound, visual displays, vibrotactile feedback, or a combination of two or three kinds of signals. Sound may be a particularly promising feedback modality, as sound can attract a driver’s attention irrespective of his/her momentary visual attention. Although ample research exists on warning systems and other types of auditory displays, what is less well known is how to design warning systems for automated driving specifically. Taking over control from an automated car is a spatially demanding task that may involve a high level of urgency, and warning signals (also called ‘take-over requests’, TORs) need to be designed so that the driver reacts as quickly and safely as possible. Furthermore, little knowledge is available on how to support the situation awareness and mode awareness of drivers of automated cars. The goal of this thesis is to discover how the auditory modality should be used during automated driving and to contribute towards the development of design guidelines. First, this thesis describes the state of the art (Chapter 2) by examining and improving the current sound design process in the industry, and by examining the requirements of the future users of automated cars, the public. Next, the thesis focuses on the design of discrete warnings/TORs (Chapter 3), the use of sound for supporting situation awareness (Chapter 4), and mode awareness (Chapter 5). Finally, Chapters 6 and 7 provide a future outlook, conclusions, and recommendations. The content of the thesis is described in more detail below.
Chapter 2 describes the state of the art in the use of sound in the automotive industry. Section 2.1 presents a new sound design process for the automotive industry developed with Continental AG, consisting of three stages: description, design/creation, and verification. An evaluation of the process showed that it supports the more efficient creation of auditory assets than the unstructured process that was previously employed in the company. Designing good feedback is not enough; it also needs to be appreciated by users. To this end, Section 2.2 describes a crowdsourced online survey (1,205 responses from 91 countries) that investigated people’s opinions on auditory interfaces in modern cars and their readiness to have auditory feedback in automated vehicles. The study was continued in another crowdsourced online survey described in Section 2.3, where 1,692 people were surveyed on auditory, visual, and vibrotactile TORs in scenarios of varying levels of urgency. Based on the results, multimodal TORs were the most preferred option in scenarios associated with high urgency. Sound-based TORs were the most favored choice in scenarios with low urgency. Auditory feedback was also preferred for confirmation that the system is ready to switch from manual to automated mode. Speech-based feedback was more accepted than artificial sounds, and the female voice was preferred over the male voice as a take-over request. To understand better how sound may be used during fully automated driving, it is crucial to acknowledge the opinion of potential end users of such vehicles on the technology. Section 2.4 investigates anonymous textual comments concerning fully automated driving by using data from three Internet-based surveys (including the surveys described in Sections 2.2 and 2.3) with 8,862 respondents from 112 countries.
The opinion was split: 39% of the comments were positive towards automated driving and 23% expressed a negative attitude towards it.
Chapter 3 focuses on the use of the auditory modality to support TORs. Section 3.1 describes a crowdsourcing experiment on reaction times to audiovisual stimuli with different stimulus onset asynchrony (SOA). A total of 1,823 participants each performed 176 reaction time trials consisting of 29 SOA levels and three visual intensity levels. The results replicated past research, with a V-shape of mean reaction time as a function of SOA. The study underlines the power of crowdsourced research, and shows that auditory and visual warnings need to be provided at exactly the same moment in order to generate optimally fast response times. The results also indicate large individual differences in reaction times to different SOA levels, a finding which implies that multimodal feedback has important advantages as compared to unimodal feedback. Section 3.2 then focuses on speech-based TORs. In a crowdsourced study, 2,669 participants from 95 countries listened to a random 10 out of 140 TORs, and rated each TOR on ease of understanding, pleasantness, urgency, and commandingness. An increased speech rate results in an increase of perceived urgency and commandingness. With a high level of background noise, the female voice was preferred over the male voice, which contradicts the literature. Furthermore, a take-over request spoken by a person with an Indian accent was easier to understand for participants from India than for participants from other countries. The results of the studies in Chapter 2 and Sections 3.1 and 3.2 were used to design a simulator-based study presented in Section 3.3. A total of 24 participants took part in three sessions in a highly automated car (different TOR modality in each session: auditory, vibrotactile, and auditory-vibrotactile).
TORs were played from the right, from the left, and from both left and right. The auditory TOR yielded comparatively low ratings of usefulness and satisfaction. Regardless of the directionality of the TOR, almost all drivers overtook the stationary vehicle on the left. Section 3.4 summarizes results from survey research (Sections 2.2, 2.3, 3.1, 3.2) and driving simulator experiments (including Section 3.3) on TORs executed with one or more of the three modalities. Results showed that vibrotactile TORs in the driver’s seat yielded relatively high ratings of self-reported usefulness and satisfaction. Auditory TORs in the form of beeps were regarded as useful but not satisfactory, and it was found that an increase of beep rate yields an increase of self-reported urgency. Visual-only feedback in the form of LEDs was seen by participants as neither useful nor satisfactory.
Chapter 4 draws attention to the use of auditory feedback for situation awareness during manual and automated driving. Section 4.1 investigates how to represent distance information by means of sound. Three sonification approaches were tested: Beep Repetition Rate, Sound Intensity, and Sound Fundamental Frequency. The three proposed methods produced a similar mean absolute distance error. These results were used in three simulator-based experiments (Sections 4.2–4.4) to examine whether it is possible to drive a car blindfolded with the use of continuous auditory feedback only. Different types of sonification (e.g., volume-based, beep-frequency-based) were used, and the auditory feedback was provided when deviating more than 0.5 m from lane center. In all experiments, people drove on a track with sharp 90-degree corners while speed control was automated.
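As an illustration of the sonification idea described for Chapter 4, the following minimal Python sketch maps a lateral offset from lane center onto the three sound parameters named in the text (beep repetition rate, intensity, and fundamental frequency). Only the 0.5 m threshold comes from the abstract; the function name `sonify`, the linear mapping, the 2.0 m saturation offset, and all parameter ranges are assumptions chosen for illustration, not the mappings used in the thesis.

```python
DEADBAND_M = 0.5  # from the text: feedback only beyond 0.5 m from lane center


def _scale(x, lo, hi):
    """Linearly interpolate between lo and hi for x in [0, 1]."""
    return lo + (hi - lo) * x


def sonify(offset_m, max_offset_m=2.0):
    """Map a lateral offset (m) to hypothetical sound parameters.

    Returns None inside the deadband (no sound). Otherwise returns a dict
    with one value per sonification approach; an experiment would
    typically vary only one of them at a time.
    """
    magnitude = abs(offset_m)
    if magnitude <= DEADBAND_M:
        return None
    # Normalise the excess offset to [0, 1]; saturate at max_offset_m (assumed).
    x = min((magnitude - DEADBAND_M) / (max_offset_m - DEADBAND_M), 1.0)
    return {
        "beep_rate_hz": _scale(x, 1.0, 8.0),   # assumed range
        "gain": _scale(x, 0.2, 1.0),           # assumed range, 0..1
        "pitch_hz": _scale(x, 440.0, 880.0),   # assumed range
    }
```

For example, `sonify(0.3)` returns `None` because the car is within the deadband, whereas larger offsets produce faster, louder, and higher-pitched beeps until the mapping saturates.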
Results showed no clear effects of sonification method on lane-keeping performance, but it was found that it is vital to base the feedback not on the current lateral position but on where the car will be about 2 seconds into the future. The predictor algorithm should consider the velocity vector of the car as well as the momentary steering wheel angle. Results showed that, with extensive practice and knowledge of the system, it is possible to drive on a track for 5 minutes without leaving the road. Drivers benefit from simple auditory feedback, and additional stimuli add workload without improving performance.
Chapter 5 examines the use of sound for mode awareness during highly automated driving. An on-road experiment in a heavy truck equipped with low-level automation is described. I used continuous auditory feedback on the status of ACC, lane offset, and headway, which blends with the engine and wind sounds that are already present in the cabin. A total of 23 truck drivers were presented with the additional sounds in isolation and in combination. Results showed that the sounds were easy to understand and that the lane-offset sound was regarded as somewhat useful. However, participants overall preferred a silent cabin and expressed displeasure with the idea of being presented with extra sounds on a continuous basis.
Chapter 6 provides an outlook on when fully automated driving may become a reality. In 12 crowdsourcing studies conducted between 2014 and 2017 (including the studies described in Sections 2.2, 2.3, 3.1, 3.2), 17,360 people from 129 countries were asked when they think that most cars will be able to drive fully automatically in their country of residence. The median reported year was 2030. Over the course of three years, respondents have moderated their expectations regarding the penetration of fully automated cars. The respondents appear to be more optimistic than experts.
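The look-ahead feedback described for the blindfolded-driving experiments in Chapter 4 (feedback based on where the car will be about 2 seconds ahead, using the velocity vector and the steering wheel angle) can be sketched as follows. This is a minimal sketch and not the predictor used in the thesis: it assumes a kinematic bicycle model, a straight lane, a constant speed and steering angle over the look-ahead window, and hypothetical values for the wheelbase and steering ratio. The function names are illustrative; only the 2 s look-ahead and the 0.5 m threshold are taken from the abstract.

```python
import math


def predict_lateral_offset(offset_m, heading_rad, speed_mps, steering_wheel_rad,
                           lookahead_s=2.0, wheelbase_m=2.7,
                           steering_ratio=15.0, dt=0.01):
    """Predict the lateral offset from lane center after lookahead_s seconds.

    heading_rad is the car's heading relative to the (assumed straight) lane.
    wheelbase_m and steering_ratio are hypothetical vehicle parameters.
    """
    # Convert steering wheel angle to road-wheel angle (assumed fixed ratio).
    road_wheel_rad = steering_wheel_rad / steering_ratio
    # Kinematic bicycle model: yaw rate from speed and road-wheel angle.
    yaw_rate = speed_mps / wheelbase_m * math.tan(road_wheel_rad)
    y, psi = offset_m, heading_rad
    # Forward-Euler integration over the look-ahead window.
    for _ in range(round(lookahead_s / dt)):
        y += speed_mps * math.sin(psi) * dt
        psi += yaw_rate * dt
    return y


def needs_feedback(predicted_offset_m, threshold_m=0.5):
    """Trigger feedback when the predicted offset exceeds the threshold."""
    return abs(predicted_offset_m) > threshold_m
```

With this kind of predictor, a car that is currently centered but heading slightly out of the lane would trigger feedback before it actually leaves the 0.5 m band, which is the motivation for predicting ahead rather than using the current lateral position.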
Chapter 7 presents a discussion and conclusions derived from all chapters in the thesis.
• The most preferred way to support a TOR is an auditory instruction in the form of a female voice.
• The preferences of people depend on the urgency of the situation.
• Reaction times are fastest when an auditory and a visual stimulus are presented at the same moment rather than with a temporal asynchrony.
• An increase of beep rate yields an increase of self-reported urgency.
• An increase in the speech rate results in an increase of perceived urgency and commandingness.
• If the goal is for drivers to react as quickly as possible, multimodal feedback should be used.
• It is important to use a preview controller (look-ahead time) for supporting drivers’ situation awareness in a lane keeping task.
• Truck drivers are not favorable towards adding additional continuous feedback to the cabin, even though the feedback is easy to understand.
In summary, in this thesis I evaluated the use of sound as discrete warnings, but also as a means of continuous/spatial support for situation/mode awareness.