Using fNIRS to Verify Trust in Highly Automated Driving

Trust in automation is crucial for the safe and appropriate adoption of automated driving technology. Current research methods to measure trust mainly rely on subjective scales, which have several intrinsic limitations. This empirical experiment proposes a novel method to measure trust objectively, using functional near-infrared spectroscopy (fNIRS). By manipulating participants’ expectations regarding driving automation credibility, we induced and successfully measured opposing levels of trust in automation. Most notably, our results provide evidence of two separate yet interrelated cortical mechanisms for trust and distrust. Trust is demonstrably linked to decreased monitoring and working memory, whereas distrust is event-related and strongly tied to affective (or emotional) mechanisms. This paper provides evidence that trust in automation and situation awareness are strongly interrelated during driving automation usage. Our findings are crucial for developing future driver state monitoring technology that mitigates the impact of inappropriate reliance, or overtrust, in automated driving systems.


I. INTRODUCTION
RESEARCH has identified trust in automation (TiA) as a critical human factor for the acceptance and correct usage of automated driving systems [1]. According to the current state-of-the-art, TiA has several layers. Whereas dispositional trust can be static during an adult lifespan, situational trust fluctuates with experience [2]. This is known as trust calibration [3]. Several subjective scales exist to measure dispositional and situational TiA, but these cannot measure TiA objectively and in real-time.
Lee and See described three processes for trust judgements at a contextual level: affective, analogic or analytic [4]. The stronger emotional component in the former, or executive component in the latter, defines their nature. The occurrence of trust judgements with greater emotional content or rationally calculated outcomes will depend on several factors, such as time availability, experience or expertise with the automated system. These processes could conceivably be a way to measure situational TiA using modern wearable neurophysiology equipment in realistic lab setups.
II. BACKGROUND
Previous experimental research in economics has explored the neural correlates of interpersonal trust and distrust using neurophysiology. These studies have explored reciprocal social exchanges [5], seller profile's trustworthiness [6], and trustworthiness evaluations of online offers [7] using fMRI. Overall, these studies provide evidence that: (1) Trust and distrust are distinct yet related constructs regarding the nature and type of neural responses involved and the timescale required for their development [6]. (2) The neural mechanisms of trust and distrust involve emotional and cognitive structures. Distrust is more dependent upon autonomic emotional processes, whereas trust is more dependent on intentional, calculated decision-making [5], [6], [7]. The overall trust process is similar for humans and automated agents, yet significant differences exist between interpersonal trust and TiA [8].
An emerging perspective that tackles TiA from a neuroergonomics approach [9], [10] offers the potential to apply known neural correlates of decision-making, theory of mind and anticipation of rewards or losses, to the current frameworks of TiA. Research in neuroergonomics has explored the neural correlates of TiA, often with wearable electroencephalogram (EEG) and functional near-infrared spectroscopy (fNIRS) devices.
Work in this domain identified event-related potential (ERP) components from the EEG signal in the anterior cingulate cortex as neural markers for error monitoring [10]. These ERP components were used to infer miscalibrated trust while participants with opposing algorithm credibility expectations (expected performance) monitored the algorithm's reliability (actual performance). Results indicated that greater attentional orienting responses to unexpected errors from a reliable algorithm were positively correlated with self-reported trust. Thus, participants quickly calibrated their trust toward the actual algorithm performance, ignoring the credibility expectation provided.
A similar experiment in neuroergonomics used EEG while participants monitored algorithm reliability and rated their trust levels [11]. The authors examined how credibility and reliability affect the causal relationships among different brain regions, that is, the way brain regions are influenced by credibility and reliability in the context of human-automation interaction. Their findings corroborated those from [10] in that initial credibility modulates the formation of initial trust, with automation reliability derived from the experience being the main factor influencing the calibration of TiA. Furthermore, their results also agree with previous literature evidencing that trust and distrust elicit different connectivity patterns in the brain, and thus, these are two distinct cognitive processes. In particular, distrust elicits a complex, quick and episodic top-down response involving several neural networks (i.e., prefrontal cortex, posterior cingulate cortex, and the temporoparietal junction) requiring additional cognitive resources compared to trust, which instead consists of a slower, cumulative and deliberate process based upon the long-term experience [6].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Hopko and Mehta used fNIRS to measure cortical activation and functional connectivity associated with trust during a human-robot collaboration task [12]. Their participants had to complete a surface finishing task in collaboration with a robot with varying reliability conditions. Results indicate increased cortical activation during unreliable robot behaviour within the dorsolateral prefrontal cortex (DLPFC), the anterior prefrontal cortex, primary motor cortex, and primary visual cortex.
In the context of automated driving, to the authors' knowledge, very limited research has attempted to explore the neural correlates of trust. Seet and colleagues used EEG power spectral density and functional connectivity analysis, combined with behavioural parameters and self-reports in a driving simulator study [13]. The simulation involved conditionally automated driving (SAE Level 3) and fully automated driving (SAE Level 5). Participants were exposed to several system failures across the study. Results indicated a significant reduction in right-frontal alpha band activity during system malfunction in fully automated driving, supported by decreased self-reported trust in this condition. The authors argued that the reduction of right-frontal power derives from a lateralised left-frontal power increase, and thus brain activity in the left-frontal area would have increased as the motivation to re-engage manual control. In other words, during a fully automated driving malfunction, participants reported lower trust and were motivated to take over control. These findings would only align partly with [11]. While both agree that distrust increased cognitive load due to increasing attention, vigilance and cognitive control, the authors of [11] observed connectivity increments directed towards the right prefrontal cortex instead.
What has been observed for distrust also seems aligned with studies exploring other constructs with certain similarities to distrust and low trust. For example, suspicion induced from a computer malware manipulation was associated with participants' increased oxygenated haemoglobin (HbO), measured with fNIRS, in Broca's area, the DLPC, the frontopolar region, the orbitofrontal cortex and the anterior cingulate cortex [14]. These HbO variations were mainly localised in the left hemisphere, aligned with the findings discussed in the previous paragraph from [13] but opposed to those from [11]. Hirshfield and colleagues argued that suspicion is a construct related to low trust and distrust because it also leads to a higher engagement in monitoring the automation [14], as noted by [13]. Indeed, the relationship of state-level suspicion with distrust was extensively described in [15]. According to the authors, distrust in automation increases mental workload and emotional arousal. This finding is important because it shows the main difference between interpersonal and human-automation trust. Higher interpersonal trust involves social engagement, whereas higher TiA implies disengagement from the driving tasks, and hence, conversely, low trust increases engagement with the driving task [16], [17].
The use of fNIRS in the domain of TiA is still scarce but promising. Work by Palmer et al. monitored participants with fNIRS while supervising aerial and ground uncrewed vehicles under varying levels of integrity and control during a military-related experimental task [18]. Uncrewed vehicles were controlled using a visual interface allowing three levels of automated support (i.e. assisted manual, assisted automated and fully automated), and their integrity was manipulated to generate correct and incorrect behaviours. Results indicated that the uncertainty of judging the reliability of the uncrewed vehicle's abilities under assisted manual and assisted automated increased oxygenation in the orbitofrontal cortex, specifically in Brodmann Area (BA) 10, and the right DLPC in BA 46. This finding could be associated with those from [12], who found increased activation in the DLPC and the anterior prefrontal cortex during robot malfunction, as well as with those from [6], who found the orbitofrontal and anterior cingulate cortex to play a crucial role in intentional engagement and hence in the intentional decision of trusting. Both orbitofrontal and cingulate cortex are concomitant areas and could be part of the neural network of active trust judgements.
Similarly, results from Palmer et al. also found that the ventrolateral prefrontal cortex (VLPC) (BAs 44, 45, 47) could be implicated in the development of distrust due to poor decision making [18]. The VLPC is very proximal to the insular cortex, an area triggered by intense emotions such as fear, the anticipation of losses and distrust [6]. Palmer and colleagues discuss the application of their findings with trust in automated vehicles [18], but their study used a military task context and should be interpreted cautiously concerning its transferability to highly automated driving (HAD, i.e., SAE levels 3-4).
Sibi et al. compared mental workload levels derived from several automated driving modes using fNIRS in a driving simulator study [19]. They observed that the DLPC activation during lane changes performed with partially automated mode was comparable to that during a manual lane change, suggesting that partially automated driving is as cognitively demanding for drivers as manual driving. In addition, they also decided to evaluate self-reported trust from each mode of automated driving control, but the results were inconclusive.
Although EEG has been the preferred technique for exploring the neural correlates of TiA, the use of fNIRS has been increasing over the last few years, as it has become a viable solution for realistic and naturalistic setups. Arguably, the two techniques are entirely different measures of different parameters. fNIRS offers good spatial resolution, allowing the localisation of specific functional regions with montages of roughly up to 80 sources, whereas EEG montages with over 256 electrodes are possible. Additionally, in terms of temporal resolution, fNIRS is not comparable to EEG, which is near-instantaneous (i.e. there is no lag in responses) and can have sampling rates over 20,000 Hz.
The use of fNIRS in human factors automotive research has also been growing, e.g., for detecting drivers' braking intentions [20], [21], mental workload [22], [23], [24], [25], attentional levels [26], responses to changing vehicle dynamics [27], inhibitory control [28], fatigue [29], and drowsiness [30]. However, none of these studies has focused on TiA yet. Hence, it remains unclear whether the findings described in this section relating to TiA would also be transferable to the HAD context and whether fNIRS would be a reliable technique for the real-time measurement of situational TiA.
Different trust levels can be induced through credibility expectations to naïve participants [10], [12], [31], [32], [33]. In other words, the automated system's performance credibility can be manipulated through previous information and induce trust/distrust in naïve participants. Thus, it could be argued that if naïve participants lack expertise with a given automated system, they would not be able to make analytic trust judgements, according to Lee & See's framework [4]. Arguably, they could only use preconceived expectations regarding automated systems, and thus affective or analogic processes would be controlling situational trust judgements in such a context. If so, experimentally inducing such expectations to naïve users should trigger either affective or analogic trust calibration processes when driving across varying scenarios.

III. RESEARCH HYPOTHESES
We designed a HAD simulator experiment to investigate TiA levels based on induced automation credibility. The experimenter verbally provided two opposing vehicle performance descriptions regarding automation credibility, which served as a grouping factor for our two groups of participants (i.e. low credibility vs high credibility). Automation credibility was expected to induce opposing levels of TiA. However, because vehicle dynamics and performance were equal for both groups, we expected to find a trust calibration similar to that in [10] and [11] during lower traffic complexity scenarios. Traffic complexity refers to the combination of several environmental features such as traffic volume, flow and lane change presence among other road users [34].
This research aims to measure variations of situational TiA during highly automated driving, using fNIRS in a realistic high-fidelity driving simulator setup. We aim to provide a methodological basis for real-time objective measurement of situational trust in highly automated vehicles, from which further research exploring this construct should benefit. The hypotheses proposed are: 1) Trust will quickly calibrate for the low credibility group (LC) aligned with vehicle performance, and thus, no group differences for brain activity will be observed during low complexity traffic conditions. 2) As the driving scenarios become more complex and risky, participants will recalibrate their TiA. Because the only information they will have available will be the credibility expectations induced, we expect group differences during complex traffic conditions, with the LC group showing greater brain activation across the orbitofrontal, ventrolateral and dorsolateral prefrontal cortex.

IV. METHOD

A. Participants
Thirty-four participants were recruited within the University of Warwick (UK) including undergraduate students, postgraduate students, university staff and other professionals. All of them held a UK-EU driving license and had no previous experience with HAD. Seven participants withdrew due to motion sickness and their data was excluded from analysis. A total of twenty-seven participants completed the trials and were included for data analysis (20 male and 7 female). Recruitment and data collection methods received approval from the Biomedical and Scientific Research Ethics Committee from the University of Warwick. Participants voluntarily agreed to participate in this experiment and were free to withdraw at any point. They all received a £10 voucher after the experiment.
Participants were randomly assigned to two groups of HAD credibility expectations. HAD performance was described to the low credibility (LC) group (N = 12) as a not-entirely-reliable, early prototype system capable of self-driving and adapting to road conditions but still under development. Conversely, the HAD system was described to the high credibility (HC) group (N = 15) as a fully reliable HAD system, capable of driving through any scenario and adjusting to all road conditions effectively. Importantly, vehicle-driving performance was equal for both groups across all driving conditions; only the induced reliability expectations were manipulated.
Eleven males and one female were randomly assigned to the LC group (11-1), whilst nine males and six females were assigned to the HC group (9-6). Consistent gender differences have not been described as a modulator of trust during driving conditions yet, and thereby there is no reason to suspect these would influence our results. Participants were mostly aged between 18 and 35 years old (85.19%). Twenty were students, and seven were professionals or in managerial roles. The distribution per group was: LC = 10 students + 2 professional/ managerial; HC = 10 students + 5 professional/managerial. Despite their young age, participants were relatively experienced, with seventeen of them (63%) holding a driving license for more than six years, and thirteen of them (48%) driving an average of more than 10k miles a year. Both groups were instructed not to attempt to take control of the vehicle under any circumstances to generate the vulnerability required for TiA [2], [4].

B. Apparatus
The trials were conducted using the 3xD driving simulator at the University of Warwick (Fig. 1). The 3xD is a fixed-base high-fidelity driving simulator, equipped with a whole-body Range Rover Evoque and eight projectors generating a 360° image, projected onto a cylindrical screen eight meters in diameter and three meters in height (for technical details see [35]). The simulated driving automation is capable of lateral and longitudinal control, adapting to speed limits, queuing behind leading vehicles, maintaining safe distances, emergency braking, and overtaking slower/stopped vehicles for predefined use cases. The simulator also generated road motion vibration through the seats and environmental sound through the in-vehicle sound system.
Neurophysiological data were obtained from the prefrontal cortex with a NIRSport CW-NIRS device (NIRx Medical Technologies LLC, USA) (Fig. 2). Data were extracted using NIRStar acquisition software (CA, USA; version 15.0). NIRSport is a non-invasive wearable device consisting of eight sources and seven detectors sampling at a frequency of 7.8125 Hz. The sources simultaneously emit infrared signals of two distinct wavelengths, 760 nm and 850 nm, allowing quantification of oxygenated haemoglobin (HbO), deoxygenated haemoglobin (HbR), and total haemoglobin (HbT = HbO + HbR). Both chromophores can be differentiated when light attenuation is measured at two or more wavelengths due to their differential absorption spectra in the near-infrared spectrum (600-950 nm).
Plastic spacers kept each source and detector 3 cm apart. Each such source-detector pair constitutes a recording channel, resulting in 22 recording channels. Channels were mounted within the Montreal Neurological Institute (MNI) coordinate space for consistency across head size variation [36]. These coordinates allow the fNIRS channels to be subset down to those directly measuring particular regions of interest (ROIs) (TABLE I).
Self-reported trust was collected using the Trust in Automated Systems Scale [37]. This scale is comprised of 12 items with a 7-point Likert scale. Items 1 to 5 assess the construct of distrust, and items 6 to 12 assess trust. A total score can also be obtained by reverse scoring those items corresponding to distrust. This is an established scale widely used in research to measure operators' trust in automated systems [38], [39], [40].
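The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring script: the function name `score_trust_scale` is hypothetical, and aggregating each subscale by its mean (rather than its sum) is an assumption, since the text does not state which was used.

```python
from statistics import mean

DISTRUST_ITEMS = range(1, 6)   # items 1 to 5 assess distrust
TRUST_ITEMS = range(6, 13)     # items 6 to 12 assess trust
SCALE_MAX = 7                  # 7-point Likert scale


def score_trust_scale(responses):
    """Score one completed questionnaire.

    responses: dict mapping item number (1-12) to a rating (1-7).
    Returns subscale means and a total with distrust items reverse scored.
    Using means rather than sums is an assumption of this sketch.
    """
    if set(responses) != set(range(1, 13)):
        raise ValueError("expected ratings for items 1 to 12")
    if any(not 1 <= r <= SCALE_MAX for r in responses.values()):
        raise ValueError("ratings must lie between 1 and 7")

    distrust = mean(responses[i] for i in DISTRUST_ITEMS)
    trust = mean(responses[i] for i in TRUST_ITEMS)

    # Reverse scoring on a 1-7 scale maps 1 -> 7, 2 -> 6, ..., 7 -> 1.
    reversed_distrust = [SCALE_MAX + 1 - responses[i] for i in DISTRUST_ITEMS]
    total = mean(reversed_distrust + [responses[i] for i in TRUST_ITEMS])
    return {"distrust": distrust, "trust": trust, "total": total}
```

For instance, a participant rating every distrust item 2 and every trust item 6 would obtain a Distrust score of 2, a Trust score of 6 and a Total trust score of 6 under these assumptions.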

C. Automated Driving Scenarios
Drivers during HAD are expected to be engaged in non-driving related tasks (NDRTs) since monitoring the driving task is not required during predefined HAD use cases. Hence, our first experimental condition involved performing a verbal 2-back task for 2 minutes. A 2-back is a working memory task involving speech and is well established for generating mental workload. Previous neuroergonomics studies have used fNIRS to measure mental workload elicited by an N-back task, both in flight and driving simulators [41], [42], [43]; as well as human factors studies measuring mental workload with other physiological devices [44], [45], [46], [47]. This NDRT was carried out while the highly automated vehicle (HAV) was driving across a highway scenario, and we expect this condition to be a control condition for mental workload, as the 2-back task should not affect TiA (Fig. 3).
HAD use cases will entail a wide range of different road layouts and traffic conditions. Each scenario involves different vehicle dynamics which could increase the driver's mental workload when assessing actual vehicle performance against expected automated system reliability. Such compensating behaviours are well known to reallocate cognitive resources depending on road layout changes [27]. Similarly, traffic conditions and road type cause higher stress. It was found that urban scenarios generated higher stress compared to highway/motorway driving [23], [47], [48]. Higher stress in such road layouts is influenced by the increased amount of contextual information and stimuli to process, which require cognitive resources allocation for increased attention and monitoring [23]. Indeed, traffic and road complexity have been found to modulate trust in the context of automated driving [49]. Vehicle users showed higher trust in an automated system interface displaying recognised traffic objects in augmented reality. Participants were more confident when they could ensure the vehicle was fully aware of the situation on their behalf. Since situational TiA may calibrate according to contextual changes [2], [50], it could be expected that high traffic density and urban environments will affect TiA in line with the credibility assigned to each group, and this would be observable with fNIRS (see Fig. 4 for a summary of this process).
HAVs will have to cope with unexpected events resulting from other road users when driving across complex traffic conditions on busy urban roads, prone to hazardous situations. The rationale for including a risky event in our simulated driving scenarios is that HAV users also perceive such scenarios as potentially risky [33], [51], and risk perception plays a crucial role in the calibration of TiA [2], [4]. According to [33]: "A highly reliable automated system or a driver's trust in the system will mitigate the perceived relational risk level even with a high level of situational risk on the road. On the other hand, if a driver distrusts the system, the perceived relational risk level will be high no matter whether situational risk is present (p. 181)." With this in mind, we designed a risky scenario (Fig. 3) where the HAV follows a van, and immediately after a left bend, both encounter a cyclist and proceed to overtake while approaching a junction with the right-of-way. Right after the van passes the junction, and while the HAV overtakes the cyclist, an ambulance with emergency lights and a siren moves into view at high speed from the left side of the junction. The HAV performs an emergency braking and evasive manoeuvre to avoid crashing against the ambulance, and immediately after, a police vehicle follows the ambulance, so the HAV has to brake again.
As discussed in hypothesis 2, if participants trust the automated system's capabilities, they should be less vigilant and engaged with the driving task than those who distrust. If so, we should expect a reduced cognitive load for the HC group and increased brain oxygenation across the pre-frontal cortex for the LC group during a risky scenario.

D. Procedure
Upon their arrival, participants were guided into the simulator control room, briefed on lab safety procedures and advised to follow the experimenter's instructions at all times. Consent forms and demographics questionnaires were filled in the week before the trial, so participants only had to complete the first TiA scale at the start of the experiment. Participants were instructed on the 2-back task and performed a short practice session. After the 2-back training, they were guided inside the driving simulator and asked to remain seated in the driver's seat while the NIRSport headband was attached to their forehead without causing pain or discomfort. They were instructed to be particularly careful not to apply any pressure to the sensors or stretch the cables to avoid signal spikes and artefacts. Driving simulator lights were switched off to achieve optimal signal quality during calibration. The signal was calibrated using the NIRStar acquisition software (version 15.0) until achieving excellent quality from all channels. Following this, we recorded participants' baseline for 2 minutes with the lights switched off and without projecting the driving scenario.
Participants started with a 5-minute familiarisation trial consisting of driving manually across empty rural roads, which minimised motion sickness impact [52]. Participants were instructed to drive cautiously at up to 20 mph to gain familiarity, respecting UK Highway Code rules. The vehicle had an automatic gearbox, so they only used the accelerator, brakes and steering wheel. The manual driving trial eventually led to a roundabout connecting to a highway. Here participants were instructed to engage in automated driving by pressing a button on the centre console after hearing the appropriate audio cue.
Experimental scenarios began once HAD was engaged. Two minutes after engaging HAD, participants heard an audio cue announcing they were about to start a 2-back task and providing the instructions concerning the task again. This was the first experimental condition and lasted four sets of 30 seconds each. After performing the 2-back, the highway HAD scenario continued for five more minutes until reaching a highway exit. A two-minute epoch was extracted from this period forming the second experimental condition, namely highway scenario. The vehicle stopped at a red traffic light in the highway exit roundabout.
At this point, the simulation paused as longer exposures to driving simulators tend to increase the risk of simulator sickness [52]. Participants left the vehicle and went into the control room to fill in the second TiA scale. The signal was calibrated once again before resuming the subsequent scenarios.
Upon resuming, the scenario began with HAD engaged from the same stopping point and leading to an interurban drive with low traffic complexity for 2 minutes (i.e. the third experimental condition). After this, the vehicle entered the suburbs, where traffic complexity slightly increased throughout the scenario (the fourth experimental condition). Two minutes later, traffic density increased, leading to a 2-minute city centre scenario (the fifth experimental condition). The experiment ended with the HAV performing an evasive manoeuvre, the risk scenario. After this, participants left the driving simulator and filled in the third TiA scale.

E. Data Pre-Processing
Raw fNIRS data were pre-processed using HomER 3 [53] scripts running on MATLAB R2019a (Mathworks Inc.) and followed the current recommendations for pre-processing fNIRS data [54] (IV-F). For fNIRS current best practices and publication guidelines, see Yücel et al. [55]. Corrected optical density data were then converted to HbO, HbR and HbT concentrations using the modified Beer-Lambert law. Once concentrations had been calculated, data were block averaged and exported as Hemodynamic Response Function (HRF) means.
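As a rough illustration of the conversion step, the modified Beer-Lambert law relates the optical density change at each wavelength to the concentration changes of both chromophores, so two wavelengths give a 2 x 2 linear system. The Python sketch below is not HomER 3's implementation. The function names are hypothetical, and the extinction coefficients and differential pathlength factors (DPF) are illustrative placeholder values assumed for the example, not parameters reported in this study.

```python
# Illustrative molar extinction coefficients (cm^-1 / M) at the two NIRSport
# wavelengths. These are assumed placeholder values for this sketch only.
EXTINCTION = {
    760: {"HbO": 586.0, "HbR": 1548.52},
    850: {"HbO": 1058.0, "HbR": 691.32},
}


def mbll_forward(d_hbo, d_hbr, distance_cm, dpf):
    """Optical density change per wavelength for given concentration changes (M)."""
    return {
        wl: (eps["HbO"] * d_hbo + eps["HbR"] * d_hbr) * distance_cm * dpf[wl]
        for wl, eps in EXTINCTION.items()
    }


def mbll_inverse(d_od, distance_cm, dpf):
    """Solve the 2x2 system for HbO and HbR changes; HbT is their sum."""
    (w1, e1), (w2, e2) = EXTINCTION.items()
    # Divide out the effective path length so that
    # y = eps_HbO * dHbO + eps_HbR * dHbR at each wavelength.
    y1 = d_od[w1] / (distance_cm * dpf[w1])
    y2 = d_od[w2] / (distance_cm * dpf[w2])
    # Cramer's rule on the 2x2 extinction matrix.
    det = e1["HbO"] * e2["HbR"] - e1["HbR"] * e2["HbO"]
    d_hbo = (y1 * e2["HbR"] - y2 * e1["HbR"]) / det
    d_hbr = (e1["HbO"] * y2 - e2["HbO"] * y1) / det
    return {"HbO": d_hbo, "HbR": d_hbr, "HbT": d_hbo + d_hbr}
```

A round trip such as `mbll_inverse(mbll_forward(1e-6, -3e-7, 3.0, dpf), 3.0, dpf)` with an assumed `dpf = {760: 6.0, 850: 6.0}` should recover the input concentration changes, which is a convenient sanity check for the algebra.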

F. Data Analysis
Block averaged HbO and HbR values from HomER 3 were exported in Excel files containing HRF means for each channel, condition and participant. The underlying ROIs were determined using the NIRS Brain AnalyzIR toolbox [56] to calculate the corresponding anatomical labels for each position. The toolbox creates a variable that lists the channels and BAs covered by the probe and relative 'weights' for each channel and BA. The weights for each BA add up to 1. The channel with the most sensitivity to a BA has the highest weight for that area. The relative weight is a helpful metric, but it does not give the complete picture, so we also extracted a 'depth' value for each channel and BA. Depth values represent the average distance between the channel and the BA (i.e., the further the distance, the lower the likelihood that the channel captures that BA). Therefore, we selected up to three channels accounting for at least a combined relative weight of 0.80 (i.e. covering at least 80% of a particular ROI) and for the lowest combined depth value (i.e. the smallest combined distance on average).
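The selection rule above can be read as a small search over channel subsets. The following Python sketch is one possible interpretation, not the procedure implemented in the AnalyzIR toolbox: the function name `select_channels` is hypothetical, 'combined depth' is assumed to be the sum of the depth values, and ties are assumed to be broken in favour of the higher combined weight.

```python
from itertools import combinations


def select_channels(candidates, min_weight=0.80, max_channels=3):
    """Pick channels for one Brodmann area.

    candidates: list of (channel_id, relative_weight, depth) tuples.
    Returns the ids of the subset of at most `max_channels` channels whose
    combined weight reaches `min_weight` with the smallest combined depth,
    or None if no subset qualifies.
    """
    best = None
    for k in range(1, max_channels + 1):
        for combo in combinations(candidates, k):
            weight = sum(c[1] for c in combo)
            # Small tolerance guards against floating-point rounding.
            if weight + 1e-9 < min_weight:
                continue
            depth = sum(c[2] for c in combo)  # assumption: depths are summed
            # Prefer smaller combined depth, then higher combined weight.
            key = (depth, -weight)
            if best is None or key < best[0]:
                best = (key, combo)
    return None if best is None else [c[0] for c in best[1]]
```

With hypothetical candidates `[("ch1", 0.50, 10.0), ("ch2", 0.35, 12.0), ("ch3", 0.10, 30.0), ("ch4", 0.05, 11.0)]`, only subsets containing both `ch1` and `ch2` reach the 0.80 threshold, and the pair alone has the lowest combined depth, so the sketch selects those two channels.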
The rationale for not averaging all channels together with a relative weight greater than 0 for a given BA is that some of these values are far too low, and if too many channels are averaged together, the response will be negated. Following [57], we averaged together at most three channels. The most sensitive channels of each ROI were grouped. This led to 10 ROIs: Bilateral BAs 08, 09, 10 and 46, and left BA44 and 45 (Fig. 5). Having grouped the relevant channels into ROIs, values were averaged within each ROI for each experimental condition. This results in seven means (one per experimental condition) per participant for each ROI and each chromophore (TABLE I). Each single mean concentration value was then transformed into Z-scores (M = 0; SD = 1) against the mean group baseline value and its standard deviation (i.e., Z = (X − baseline mean) / baseline SD) to enable inter-individual and intra-individual comparisons (see TABLE III for details). Data standardisation is a common procedure among fNIRS studies to allow for inter-individual comparisons in parametric statistical analysis using block averaged values [29], [41], [43], [58], [59], [60], [61].
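The standardisation step amounts to a one-line transform. A minimal Python sketch follows; the function name `standardise` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, as the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def standardise(condition_means, baseline_values):
    """Z-score condition means against the group baseline.

    condition_means: mean concentration values to transform.
    baseline_values: baseline means across the group for the same ROI
                     and chromophore.
    """
    base_mean = mean(baseline_values)
    base_sd = stdev(baseline_values)  # assumption: sample SD (n - 1)
    if base_sd == 0:
        raise ValueError("baseline standard deviation is zero")
    # Z = (X - baseline mean) / baseline SD
    return [(x - base_mean) / base_sd for x in condition_means]
```

A value equal to the baseline mean maps to Z = 0, and a value one baseline SD above it maps to Z = 1, which places all ROIs and participants on a common scale.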
The General Linear Model is the standard approach for analysing and interpreting hemodynamic responses [54], [62]. Among the range of possibilities this approach offers, the well-known analysis of variance (ANOVA) is a common technique to determine localised brain activation based on changes in simultaneous HbO and HbR concentrations in repeated measures designs [63]. Although it is common in the related literature to report only HbO, HbR or HbT (i.e. the combination of both), the hemodynamic response is bi-dimensional, and both chromophores, HbO and HbR, usually correlate negatively during brain stimulation. The rationale underlying this correlation is that increased blood flow produces an increase in oxygenated haemoglobin and a decrease in deoxygenated haemoglobin [64], [65], [66], [67]. Nonetheless, since these features may not always be reciprocal, several authors have argued that interpretations based exclusively on one chromophore would be incomplete and advocate reporting both features in tandem [68], [69], [70]. Therefore, following these recommendations, we performed 2 (low credibility/high credibility) by 7 (baseline, 2-back, highway, interurban, suburbs, city centre, and risk) mixed ANOVAs to determine changes in HbO, HbR and HbT concentrations. Mean HRF concentrations grouped in ROIs were imported and analysed with IBM SPSS Statistics 26 software. The Shapiro-Wilk test (p ≥ 0.05) was used to assess normality assumption violations, and Mauchly's test was used to assess the assumption of sphericity. Mixed ANOVAs were conducted for each ROI individually (i.e. BAs 8, 9, 10 and 46 bilateral, plus BAs 44 and 45 on the left hemisphere). Main effects and interactions were followed up by Bonferroni-corrected pairwise comparisons.
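To make the follow-up correction concrete, the Python sketch below applies a Bonferroni adjustment to a family of pairwise p-values. It is a generic illustration and does not reproduce the SPSS output. The function name `bonferroni` is hypothetical, and treating all pairs of the seven conditions as one family is an assumption, since the text does not state which comparisons formed each family.

```python
from math import comb


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# With seven within-subject conditions there are C(7, 2) = 21 possible
# pairwise comparisons (assuming every pair belongs to the same family).
n_conditions = 7
n_pairs = comb(n_conditions, 2)
```

Equivalently, each uncorrected p-value can be compared against 0.05 divided by the number of comparisons in the family.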

V. RESULTS
This experiment investigated variations in TiA during highly automated driving by inducing two opposing automation credibility expectations within our groups of participants. Participants sat in the driving simulator and experienced different scenarios during the trial (see Fig. 3 for details). We expected that these credibility expectations would inversely affect their trust calibration when being driven across complex traffic conditions and would trigger different neural responses for each group. The Trust in Automated Systems Scale was rated three times during the experiment (i.e. before, midway through and after the driving trial) and was used to explore whether credibility expectations had the hypothesised effect on self-reported TiA. These ratings were then used to infer the neural correlates of TiA from each credibility group using the data collected with fNIRS.
These findings indicate that TiA levels were aligned with credibility as expected. Distrust increased for the LC group, whilst Total trust increased for the HC group. Hence, in the next section, we discuss how brain activity within the HC group could be associated with trust, whilst that from the LC group could be attributed to distrust.

1) Oxygenated Haemoglobin Concentrations (HbO): HbO levels varied between participants
Similar effects between groups were also observed in the right DLPC (i.e., BA46-R) revealing higher HbO concentrations for the LC group (1.488 ± 1.572, p = 0.003) during the risk event compared to the HC group (−0.765 ± 1.732) (Fig. 7). A group by condition interaction supported these findings (TABLE V).
The main results from this experiment can be summarised as follows:
• The Low Credibility (LC) group reported higher Distrust and lower Total trust (Fig. 6).
• The LC group showed greater brain oxygen metabolism than the HC group in response to variations among the driving scenarios (Fig. 7 and Fig. 8).
• The LC group showed increased brain oxygen metabolism during the complex driving scenarios (i.e., interurban, city centre and risk) (Fig. 7 and Fig. 8).
• The High Credibility (HC) group reported higher Total trust (Fig. 6).
• The HC group showed significantly lower brain oxygen metabolism during the risk event (Fig. 7 and Fig. 8).

VI. DISCUSSION
This empirical research measured different levels of trust in highly automated driving (HAD) between two groups of participants with induced opposing automation credibility expectations for simulated driving scenarios with varying traffic complexities. We expected that inducing low automation credibility (LC) would increase participants' distrust whilst high credibility (HC) would increase trust. Assuming that trust and distrust affect drivers' monitoring and engagement with the driving task, we hypothesised that under simpler traffic conditions, trust in automation (TiA) would calibrate equally for both groups, and no major between-group differences in prefrontal brain activity would be observed. However, under more complex and challenging traffic conditions, TiA would recalibrate according to the credibility expectations initially induced, so the LC group would be more engaged with the driving task than the HC group and would consequently show greater brain activity across the prefrontal cortex.

A. Hypothesis 1 - Calibration of TiA
Our first hypothesis predicted that trust would initially calibrate in line with actual vehicle performance (equal for both groups) in low traffic conditions, and thus, brain activity would not differ between groups during the 2-back, highway, interurban and suburban conditions. Self-reports indicated that initial TiA did not calibrate according to vehicle performance as in [10], and that participants reported trust/distrust levels according to the credibility expectations initially induced as in [31], [32], and [33]. That is, at the mid-study pause, the LC group reported a significant increase in distrust, whereas the HC group reported increased Total trust scores. These results are informative regarding initial TiA calibration since they were taken after participants had experienced the highway scenario, which included performing the 2-back task to induce cognitive workload. Considering that vehicle performance was equal for both groups, and that the highway was a simplified driving layout, we expected the LC group to calibrate their TiA according to the vehicle performance, but they did not. Instead, they reported distrust even though the vehicle was driving reliably. Perhaps participants judged the vehicle's reliability based on their pre-existing expectations instead of the actual driving performance, which could be interpreted as an analogic trust judgement [4].
Such self-reported distrust was supported by unique variations in the right orbitofrontal cortex (i.e., BA10-R) for the LC group. This finding within the LC group was consistent among all three chromophores in two different driving scenarios (i.e., suburbs and city centre), thus suggesting the right orbitofrontal cortex might be involved in assessing the driving context to calibrate trust, as suggested in previous work [12], [18]. That being the case, this would indicate increased monitoring of changes in the driving environment among distrusting participants.
Aligned with our hypothesis, no group effects in haemoglobin concentrations were observed for the highway and suburbs scenarios. However, between-subjects effects were observed in the left VLPC (BA44) during 2-back for HbO levels (Fig. 7). The increase observed for the LC group could be attributed to meeting the task demands, as found in previous research [42]. However, for the HC group, HbO levels decreased unexpectedly. Because we did not compare participants' performance, a difference in performance cannot be ruled out as an explanation for this phenomenon. Nonetheless, we ensured that participants from both groups were engaged in the 2-back task by verbally encouraging them to continue with the task. Some fMRI researchers have associated this mirrored trend with neural suppression and blood flow redistribution during task execution from the reallocation of cognitive processing resources, also known as the "steal effect" [68], [69], [71]. Remarkably, this mirrored trend occurred in Broca's area, mainly known for language processing [69], during a verbal working memory task. Hence, the observed localised deactivation in BA44 during the 2-back task for the HC group might be due to a reallocation of cognitive resources towards other neural regions rather than a signal of poor task performance. A similar reallocation effect was also found in [13] during a highly automated vehicle (HAV) malfunction. The authors reported a reduction of right frontal brain activity accompanying a lateralised left-frontal increase, which would have arisen from distrust and the motivation to re-engage manual control.
In addition, group differences during the interurban driving scenario for HbO and HbT concentrations in BA44 and HbR concentrations in BA10-R were not aligned with our hypothesis either. These findings are likely indicative of the greater engagement in the driving task resulting from the distrust among the LC group. Indeed, [18] also found that increased HbO in the VLPC (including BA44) could be implicated in the development of distrust because of poor decision-making, as earlier noted by [7]. This area has also been associated with suspicion during computer malfunctions [14], deliberate deception (lying) [72] and frustration during automated driving [73], and was even identified as a predictor of emotional valence levels in a previous fNIRS study [74]. Furthermore, BA44 is anatomically proximal to the insular cortex, a region triggered by intense negative emotions, fear and anticipation of losses, associated with distrust [6], [7].
Overall, these results suggest that situational TiA did not calibrate according to vehicle performance for the LC group. Hence, the situational TiA calibration process was strongly biased by the initial distrust induced by poor expectations of reliability. Given the strong emotional component of distrust, it could be argued that this calibration followed the affective process of trust calibration described in [4].
In contrast, the high credibility (HC) expectations provided to the other group matched the actual vehicle performance, thus increasing trust and reducing engagement with the driving task. Similar findings were also observed in [12] with human collaborative robots. Our participants in the HC group based their trust calibration on the heuristics (i.e., the mental model concerning the HAV reliability) generated by the information provided about the vehicle's capabilities, thus indicating an analogic process of trust calibration according to [4].

B. Hypothesis 2 - Recalibration of TiA
The second hypothesis predicted a recalibration of TiA as driving scenarios became more complex and hazardous. Thus, group differences were expected, particularly during the city centre and risk conditions. Substantial evidence in favour of this hypothesis was found in self-reported data. Towards the end of the experiment, distrust had significantly increased for the LC group compared to the pre- and mid-study stages. This agrees with previous studies which have also provided information regarding the driving automation as an independent variable to manipulate TiA [10], [31], [32], [33], [75], [76].
Cortical haemodynamic concentrations also supported our hypothesis. Increased oxygenated haemoglobin (HbO) concentrations in the orbitofrontal cortex (i.e., BA10 right) have been associated with the uncertainty of judging the credibility of an uncrewed vehicle [18], as well as with unreliable conditions of human-robot collaboration [12]. The orbitofrontal and anterior cingulate cortex have been found to play a critical role in intentional engagement [6]. This suggests that our participants in the LC group were possibly judging the credibility of the driving automation, calibrating their TiA, and maybe even intending to take over manual control during the city centre and risk scenarios. This interpretation would agree not only with self-reported distrust (Fig. 6) but also with the variations observed exclusively within this group from baseline to city centre, indicating an increase in brain activity (↑HbO and ↑HbT) in BA10-right, possibly due to the uncertainty generated by increased traffic complexity (Fig. 7 and Fig. 8). HbR results were consistent with these trends, showing aligned variations in deoxygenated haemoglobin between the suburbs and city centre scenarios.
Another argument favouring this hypothesis was found in haemodynamic concentrations for the LC group in the VLPC (BA44). This group showed significantly greater HbO and HbT concentrations in this area during the risk scenario (Fig. 7 and Fig. 8). This finding would strongly agree with the broader literature linking this area with distrust [6], [7], [18] and intense negative emotions [6], [14], [72], [73].
Finally, a lateralised activation (↑HbO and ↑HbT) in the right DLPC (BA09-R and BA46-R) was also observed in the LC group, only during the risk scenario (Fig. 7 and Fig. 8). This seems to agree with the findings of [18], who found increased HbO in the right DLPC, and particularly in BA46, when participants were judging the credibility of the vehicle's abilities under assisted manual and assisted automated control. The DLPC was also found to be more active under low robot reliability conditions along with lower perceptions of trust [12]. Relatedly, [7] associated the DLPC with reflective processes and deliberate decision-making during the evaluation of trustworthiness. The right DLPC has also been shown to be critical for visuospatial working memory, visuomotor mapping and vigilance while driving [19], [27], [77]. In particular, [27] found bilateral DLPC increases in HbO during incongruent vehicle dynamics, thus supporting the role of the right DLPC in judging vehicle performance and possibly the calibration of situational TiA. These findings highlight the active role of the DLPC in situational TiA calibration and judging the contextual reliability of the HAV.
Overall, these findings align with Hypothesis 2, with the LC group showing higher activity in both the DLPC and VLPC during the risk scenario, when they reported the highest ratings for distrust. Both areas were identified as predictors of emotional valence levels in a previous fNIRS study [74]. This would agree with the broader literature in that distrust is quick and episodic (i.e., event-related) and linked to emotional brain mechanisms [5], [6], [7]. This could result from an affective decision-making process driven by the strong emotional cues generated by distrust. Even though these participants had no reason to distrust the HAV (as it proved reliable across the scenarios), they kept distrusting it.
On the contrary, the HC group seemed to follow the same trend reported in Hypothesis 1, i.e. actively trusting the HAV as the traffic context was becoming more complex, thus suggesting their trust calibration relied on heuristics. Once they observed that the vehicle performance matched their credibility expectations, they disengaged from the driving task. This was particularly evident during the risk scenario, where this group showed minimal cognitive workload overall as indicated by fNIRS data.
In summary, this research presents the first contribution to measuring situational TiA under HAD in a realistic driving simulator setup. The three mental processes for trust calibration described in [4] acknowledged the importance of emotional cues in distrust judgements. As shown, these cues can bias trust judgements irrespective of the actual HAV reliability. This finding is significant since affective and analogic judgements are prone to inappropriate behaviours like automation misuse due to overtrust or disuse due to distrust [78]. This is because affective and analogic judgements do not rely on actual knowledge of system limitations, capabilities and driving performance under specific contexts, but instead are led by feelings, beliefs or impressions.
The phenomenon known as "autonowashing" [79] refers to the usage of misleading terminology to describe current automated driving technology, exaggerating the actual capabilities of such systems. In this regard, research must focus on better understanding trust in automation and the reliance behaviours of automated vehicle users to ensure this technology is safer than manual driving. Driver state monitoring systems supporting safe take-over transitions and actively preventing automation misuse will be necessary. fNIRS offers specific advantages that could make it a vital tool for driving research in neuroergonomics [70]. This knowledge will help develop and integrate future AI-based driver state monitoring and infotainment systems [32], [80].

VII. CONCLUSION
We expect further related work will benefit from this knowledge since it provides a viable research methodology to assess TiA objectively and in real time. Neurophysiology has the potential to become the longed-for objective measure of TiA. Nevertheless, accurate self-report tools are required to interpret neurophysiological data, and in this case, the scale used may have limited this interpretation. Although established and widely used, this scale has been criticised by some authors as measuring a propensity or disposition to trust rather than situational trust [1], [81]. In addition, the experimental design with conditions in a fixed order, a relatively small sample and an fNIRS montage covering only the prefrontal cortex may have limited our findings. Future work should consider counterbalanced or Latin-Square experimental designs where practical in a simulated environment. In addition, a dropout rate of roughly 25% in driving simulator experiments due to motion sickness should be considered when recruiting the sample [52]. Ultimately, we recommend that further research include montage set-ups covering the temporal, parietal and occipital cortices.
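As a rough planning aid for the dropout recommendation above, the short Python sketch below estimates how many participants to recruit so that the expected number of completers reaches a target. The function name `participants_to_recruit` and the target of 30 completed data sets are hypothetical; only the dropout rate of roughly 25% comes from the text, and actual dropout will vary between simulators and samples.

```python
import math


def participants_to_recruit(target_completers, dropout_rate):
    """Smallest recruitment count whose expected completers reach the target."""
    if not 0.0 <= dropout_rate < 1.0:
        raise ValueError("dropout_rate must be in the interval [0, 1)")
    return math.ceil(target_completers / (1.0 - dropout_rate))


# Hypothetical target of 30 complete data sets, with ~25% expected dropout.
recruits = participants_to_recruit(30, 0.25)
print(f"Recruit at least {recruits} participants")
```

The estimate treats dropout as a fixed proportion; a more cautious plan would add a margin on top, since observed dropout can exceed the expected rate in any single study.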
These results expand our existing knowledge in the following areas:
• They provide supporting evidence of two separate neural processes for trust and distrust.
• Whereas distrust is event-related and strongly tied to affective mechanisms, trust seems to decrease monitoring and working memory.
• They thus support the view that TiA and situation awareness are strongly related during driving automation usage [82], [83].
Considering these results as a whole, orbitofrontal, ventrolateral and dorsolateral prefrontal cortex structures are the most promising candidate areas forming part of the neural network responsible for situational TiA in HAD.