Automated Repair of Feature Interaction Failures in Automated Driving Systems

Raja Ben Abdessalem, Annibale Panichella, Shiva Nejati,
Lionel C. Briand, and Thomas Stifter

Automated Driving Systems
• Traffic Sign Recognition (TSR)
• Pedestrian Protection (PP)
• Lane Departure Warning (LDW)
• Automated Emergency Braking (AEB)

Feature Interactions
[Figure: several autonomous features, each reading sensor/camera data and sending commands to an actuator; the features are built with different techniques (Deep Learning, Neural Net., K-means)]
• Braking (over time): 30 %, 20 %, …, 80 %
• Acceleration (over time): 60 %, 10 %, …, 20 %
• Steering (over time): 30 %, 20 %, …, 80 %

Integration Components
• Pedestrian Protection (PP)
• Automated Emergency Braking (AEB)
• Lane Departure Warning (LDW)
The integration is a rule set: each condition checks a specific feature interaction situation and resolves potential conflicts that may arise under that condition
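As a rough illustration of such a rule set, the sketch below models each rule as a condition over the current driving situation plus the feature whose actuator command wins under that condition; rules are checked in priority order. The `Situation` fields, the thresholds, the rule contents, and the `integrate` helper are all hypothetical and are not taken from the case study systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Situation:
    # Hypothetical snapshot of the environment at one time step.
    pedestrian_distance_m: float
    lead_vehicle_distance_m: float


@dataclass
class Rule:
    condition: Callable[[Situation], bool]
    winner: str  # feature whose actuator command is selected


def integrate(rules: List[Rule], situation: Situation,
              commands: Dict[str, float], default: str) -> float:
    """Return the braking command of the first rule whose condition holds."""
    for rule in rules:
        if rule.condition(situation):
            return commands[rule.winner]
    return commands[default]


# Hypothetical rule set: PP is checked before AEB; ACC is the fallback.
RULES = [
    Rule(lambda s: s.pedestrian_distance_m < 10.0, "PP"),
    Rule(lambda s: s.lead_vehicle_distance_m < 20.0, "AEB"),
]

commands = {"PP": 0.8, "AEB": 0.3, "ACC": 0.0}  # braking levels in [0, 1]
print(integrate(RULES, Situation(8.0, 15.0), commands, default="ACC"))
```

In this toy setup both conditions hold, and the order of the rules decides that the PP command is applied. A feature interaction failure would correspond to a wrong ordering or a wrong threshold in such a list.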

Testing Automated Driving Systems
Testing on-the-road
Simulation-based Testing

Simulation-Based Test Case
[Figure: Test Input → Simulator (Matlab/Simulink) with the Software Under Test (SUT) → Test Output]

Case Study
• Two case study systems from IEE (industrial partner)
• Designed by experts
• Manually tested for more than six months
• Different rules to integrate feature actuator commands
• 700K eLOC
• Two system-level test suites (≈30 min) with failing tests
• Both systems consist of four self-driving features
• ACC, AEB, TSR, PP

Feature Interaction Failures

Program Repair
C. Le Goues et al., TSE 2012; Martinez and Monperrus, ISSTA 2016

Genetic Programming
[Figure: GP repair loop: Faulty Program and Test Suite → Variants Generation → Patch Evaluation → Patch Selection]
• Variants Generation: potential patches are generated using crossover (AST cuts) and mutation (AST changes)
• Patch Evaluation: run the entire test suite against each generated patch
• Patch Selection: the patches with a lower number of failing test cases survive
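The evaluation and selection steps above can be sketched as follows. This is a minimal illustration rather than the implementation of any particular GP repair tool: a patch is reduced to a single integer threshold, a test is a predicate over that patch, and `count_failing` and `select_survivors` are hypothetical helper names.

```python
from typing import Callable, List, Sequence

Patch = int                     # toy patch: a single threshold value
Test = Callable[[Patch], bool]  # returns True when the test passes


def count_failing(patch: Patch, test_suite: Sequence[Test]) -> int:
    """Run the entire test suite against one patch and count the failures."""
    return sum(1 for test in test_suite if not test(patch))


def select_survivors(patches: Sequence[Patch], test_suite: Sequence[Test],
                     keep: int) -> List[Patch]:
    """Keep the patches with the lowest number of failing tests."""
    return sorted(patches, key=lambda p: count_failing(p, test_suite))[:keep]


# Toy test suite: the threshold must lie between 10 and 15.
TEST_SUITE = [lambda p: p >= 10, lambda p: p <= 15]
survivors = select_survivors([5, 12, 20], TEST_SUITE, keep=2)
print(survivors)  # [12, 5]
```

Note that the fitness here is only a pass/fail count, and every candidate costs one full run of the test suite.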

Genetic Programming
Implicit Assumptions:
• One-defect assumption
• The patches require few line changes
• Inexpensive test suites (a few seconds)
• No guiding heuristics (a test either fails or passes)
Automated Driving Systems:
• Multiple defects in different locations
• Up to 100 lines to change
• Each test suite requires 30 min
• Not all failures are equal (the intensity of the violation changes)

ARIEL
Automated Repair of IntEgration ruLes for ADS

ARIEL
ARIEL is a (1+1) Evolutionary Algorithm with an Archive
ICSE ’20, May 23-29, 2020, Seoul, South Korea
Algorithm 1: ARIEL
Input:
  (f1, ..., fn, Π): Faulty self-driving system
  TS: Test suite
Result: Π*: a repaired rule set satisfying all tc ∈ TS
1 begin
2   Archive ← Π
3   Ω ← RUN-EVALUATE(Π, TS)
4   while not (|Archive| == 1 & Archive satisfies all tc ∈ TS) do
5     Π_p ← SELECT-A-PARENT(Archive)  // Random selection
6     Π_o ← GENERATE-PATCH(Π_p, TS, Ω)
7     Ω_o ← RUN-EVALUATE(Π_o, TS)
8     Archive ← UPDATE-ARCHIVE(Archive, Π_o, Ω_o)
9   return Archive
[…] localization (Equation 1) and (2) mutating the rule set in Π_p. The routine GENERATE-PATCH is presented in subsection 3.2.1. Then, the offspring Π_o is evaluated (line 7) by running the test suite TS, extracting the remaining failures, and computing their corresponding objective scores (Ω_o). Note that the severities of the failures are our search objectives to optimize and are discussed in Section 3.2.3. The offspring Π_o is added to the archive (line 8 of Algorithm 1) if it decreases the severity of the failures compared to the patches currently stored in the archive. The archive and its updating routine are described in detail in subsection […] when the termination criteria are met […]
• Run the faulty program and compute the failure intensities (search objectives)
• Generate only one patch through customized fault localization and mutation
• Add the offspring to the archive if it is better than the archive for at least one failure
• (1+1): 1 parent + 1 offspring
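The loop of Algorithm 1 can be sketched in a few lines. This is a simplified illustration under several assumptions, not the authors' implementation: a rule set is a list of integers, the objective scores are one severity per test (0.0 means the test passes), archived patches dominated by a new offspring are dropped, an iteration budget replaces the time budget, and the toy `mutate` function perturbs a random position instead of using fault localization. All function names are hypothetical.

```python
import random
from typing import List, Tuple

RuleSet = List[int]         # toy stand-in for an integration rule set
Scores = Tuple[float, ...]  # one failure severity per test; 0.0 = test passes
Entry = Tuple[RuleSet, Scores]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Entry], offspring: RuleSet,
                   scores: Scores) -> List[Entry]:
    """Keep the offspring only if it beats every archived patch on some failure."""
    improves = any(
        all(scores[i] < archived[i] for _, archived in archive)
        for i in range(len(scores))
    )
    if not improves:
        return archive
    # Assumption: archived patches dominated by the offspring are dropped.
    survivors = [(r, s) for r, s in archive if not dominates(scores, s)]
    return survivors + [(offspring, scores)]


def ariel_sketch(faulty, evaluate, mutate, rng, max_iters=1000):
    archive = [(faulty, evaluate(faulty))]
    for _ in range(max_iters):  # assumption: iteration budget, not a time budget
        if len(archive) == 1 and all(s == 0.0 for s in archive[0][1]):
            break  # a single patch that passes every test
        parent, parent_scores = rng.choice(archive)      # random parent selection
        offspring = mutate(parent, parent_scores, rng)   # exactly one new patch
        archive = update_archive(archive, offspring, evaluate(offspring))
    return [rules for rules, _ in archive]


# Toy problem: each "test" measures the distance of one threshold to its target.
TARGET = [3, 2]


def evaluate(rules: RuleSet) -> Scores:
    return tuple(float(abs(r - t)) for r, t in zip(rules, TARGET))


def mutate(rules: RuleSet, scores: Scores, rng: random.Random) -> RuleSet:
    child = list(rules)
    child[rng.randrange(len(child))] += rng.choice([-1, 1])
    return child


repaired = ariel_sketch([0, 0], evaluate, mutate, random.Random(0))
print(repaired)
```

Because only one offspring is generated and evaluated per iteration, each step costs a single run of the test suite, which matters when a run takes about 30 minutes.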

Customized Fault Localization
Fault localization (FL) formulae measure the suspiciousness of (likely faulty) statements in the production code based on the number of failing tests
[…] denote by $w_{tc}$ the weight (severity) of the failure of $tc$. We then compute the suspiciousness of each statement $s$ as follows:

$$\mathit{Susp}(s) = \frac{\dfrac{\sum_{tc \in TS_f} \left[ w_{tc} \cdot cov(tc, s) \right]}{\sum_{tc \in TS_f} w_{tc}}}{\dfrac{passed(s)}{total\_passed} + \dfrac{failed(s)}{total\_failed}} \qquad (1)$$

where $passed(s)$ counts the number of passed test cases that have executed $s$ at some time step; $failed(s)$ counts the number of failed test cases that have executed $s$ at some time step; and $total\_passed$ and $total\_failed$ denote the total numbers of passing and failing test cases, respectively. Note that Equation 1 is equivalent to the standard Tarantula formula if we let the weight (severity) for failing test cases be equal to one (i.e., if $w_{tc} = 1$ for every $tc \in TS_f$):

$$\mathit{Susp}(s) = \frac{\dfrac{failed(s)}{total\_failed}}{\dfrac{passed(s)}{total\_passed} + \dfrac{failed(s)}{total\_failed}} \qquad (2)$$

For each test case $tc$ that fails at time step $u$ and violates some requirement $r$, we define $w_{tc} = |O(tc(u), r)|$. That is, $w_{tc}$ is the degree of violation caused by $tc$ at the time step $u$ when it fails. Hence, test cases that lead to more severe violations are assigned larger weights. Note that since we stop test cases as soon as they fail, each test case can violate at most one requirement.
Tarantula [Jones et al. 2002]
Suspicious statements are those covered mostly by failing tests
Our formula
Failing tests have weights that are
proportional to the severity of the failures
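The weighted formula (Equation 1) can be computed as in the sketch below. The data layout is an assumption made for illustration: `coverage` maps each test case to the set of statements it executed, `failing_weights` maps each failing test case to its severity weight, and `cov(tc, s)` is taken to be 1 when `tc` executed `s` and 0 otherwise.

```python
from typing import Dict, Set


def weighted_suspiciousness(stmt: str,
                            coverage: Dict[str, Set[str]],
                            failing_weights: Dict[str, float],
                            passing: Set[str]) -> float:
    """Severity-weighted Tarantula score of one statement (Equation 1)."""
    total_failed = len(failing_weights)
    total_passed = len(passing)
    weight_sum = sum(failing_weights.values())
    if total_failed == 0 or weight_sum == 0:
        return 0.0  # no failure to localize

    # Weighted share of the failures that executed the statement.
    numerator = sum(w for tc, w in failing_weights.items()
                    if stmt in coverage[tc]) / weight_sum

    failed_s = sum(1 for tc in failing_weights if stmt in coverage[tc])
    passed_s = sum(1 for tc in passing if stmt in coverage[tc])
    passed_ratio = passed_s / total_passed if total_passed else 0.0
    denominator = passed_ratio + failed_s / total_failed
    return numerator / denominator if denominator else 0.0


# Toy data: t1 and t2 fail (t1 more severely), t3 passes.
coverage = {"t1": {"s1", "s2"}, "t2": {"s2"}, "t3": {"s1"}}
weights = {"t1": 3.0, "t2": 1.0}
passing = {"t3"}

print(weighted_suspiciousness("s1", coverage, weights, passing))  # 0.5
print(weighted_suspiciousness("s2", coverage, weights, passing))  # 1.0
```

With all weights set to 1, the same function returns the standard Tarantula score of Equation 2, for example (1/2) / (1 + 1/2) for `s1` in the toy data.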

Customized Mutation
Potential patches are generated using only two operators:
• Changing the thresholds in the rules (e.g., minimum distance between cars)
• Shifting conditions within rule sets (changing the priorities of the checks/rules)
• No deletion (legal and ethical constraints)
Figure 5: Illustrating the shift operator: (a) selecting bs and path, and (b) applying the shift operator.
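The two operators can be illustrated on a simplified rule set. The sketch assumes a flat, ordered list of rules, each with one numeric threshold, which is much simpler than the rule structure shown in Figure 5; the `Rule` class, the rule names, and the threshold values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Rule:
    name: str         # feature prioritized by this rule
    threshold: float  # e.g., a minimum distance in metres


def modify_threshold(rules: List[Rule], index: int, delta: float) -> List[Rule]:
    """Return a patched copy in which one rule's threshold is changed by delta."""
    patched = list(rules)
    patched[index] = replace(patched[index],
                             threshold=patched[index].threshold + delta)
    return patched


def shift_rule(rules: List[Rule], src: int, dst: int) -> List[Rule]:
    """Return a patched copy in which one rule moves to a new priority position."""
    patched = list(rules)
    patched.insert(dst, patched.pop(src))
    return patched


rules = [Rule("AEB", 20.0), Rule("PP", 10.0), Rule("ACC", 50.0)]

print([r.name for r in shift_rule(rules, 1, 0)])    # ['PP', 'AEB', 'ACC']
print(modify_threshold(rules, 0, -5.0)[0].threshold)  # 15.0
```

Both operators return a rule set of the same length as the input, so no rule is ever deleted, in line with the "no deletion" constraint.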

Setting
Benchmark:
• SelfDrive1 and SelfDrive2 from our industrial partner
Baselines:
• Genetic Programming (GP)
• Random Search (RS)
Parameters:
• GP with population size of 10 patches
• Search time = 16h
• 50 repetitions

Results
[Plots: number of failing tests (y-axis) over search time in hours, 0 to 16 (x-axis), for GP, ARIEL, and Random]
• SelfDrive1: y-axis from 0 to 4 failing tests
• SelfDrive2: y-axis from 0 to 2 failing tests

Feedback From Domain Experts
• We interviewed software engineers involved in the development of SelfDrive1 and SelfDrive2
• ARIEL produces patches that differ from the patches developers would write manually (developers would add more integration rules)
• According to the developers, the patches generated by ARIEL are valid, understandable, useful, and optimal; moreover, engineers could not have produced them manually
Based on expert judgement, synthesized patches are superior to manually written patches
