Fixing robots, part 2
Pascal Cuoq - 20th May 2011This post follows that post.
For cases 2 and 3 in order to get more complete execution traces we again take advantage of Frama_C_dump_each(). Before launching the analysis we insert a call to this primitive right at the end of RoCo_process() so that local variables of that function will be displayed too.
--- roco.c.orig 2011-05-16 13:53:57.000000000 +0200
+++ roco.c 2011-05-16 13:55:18.000000000 +0200
@@ -303 4 +303 5 @@
else {
RoCo_engineVoltage = 0.0;
}
+ Frama_C_dump_each();
}
The static analysis done by the previously provided shell script (key command below) again confirms that the execution does not run into any of the C99 undefined behaviors that the value analysis detects (uninitialized access use of a dangling pointer overflows in signed integer arithmetics invalid memory access invalid comparison of pointers division by zero undefined logical shift overflows in conversions from floating-point to integer infinite or NaN resulting from a floating-point operation undefined side-effects in expressions). Yes I will now give the full list every time. One year ago I stumbled on a comparative study between PolySpace and Frama-C's value analysis that gave the full list of alarms for PolySpace but listed only two types of undefined behaviors for the value analysis. The author had lazily copy-pasted a sentence from the documentation's introduction but had crucially omitted the ellipsis. As a consequence I will now inflict the full list on everyone every time. Bwa-ha-ha. Also I plan to add three more categories of detected undefined behaviors in the near future. Bwa-ha-ha-ha-ha.
frama-c ${FILES} ${TESTCASE} -val -slevel 50000 -unspecified-access -val-signed-overflow-alarms -calldeps -save state$1
Case 2
The trace corroborates the "does not stop jiggling when it hits the target angle" diagnostic. We concentrate on relevant lines of the log using the command:
grep "\\(STATE\\|RoCo_legAngle \\|RoCo_engineVoltage\\|lastTime\\)" log2
Here is the bit when the target for the first command is reached:
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -29.9791361754
RoCo_engineVoltage ∈ -0.700730737571
lastTime ∈ {66520}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.0013178639
RoCo_engineVoltage ∈ -0.573709804476
lastTime ∈ {66540}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.0235995355
RoCo_engineVoltage ∈ -0.469713860267
lastTime ∈ {66560}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.0454486754
RoCo_engineVoltage ∈ -0.384569182548
lastTime ∈ {66580}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.0664935071
RoCo_engineVoltage ∈ -0.314858616438
lastTime ∈ {66600}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.0864858403
RoCo_engineVoltage ∈ -0.257784432149
lastTime ∈ {66620}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.1052716542
RoCo_engineVoltage ∈ 0.07897475281
lastTime ∈ {66640}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.1227678881
RoCo_engineVoltage ∈ 0.354689853917
lastTime ∈ {66660}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.1389442061
RoCo_engineVoltage ∈ 0.580426286282
lastTime ∈ {66680}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.1538087376
RoCo_engineVoltage ∈ 0.765243645549
lastTime ∈ {66700}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.1673969897
RoCo_engineVoltage ∈ 0.916559301284
lastTime ∈ {66720}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.1791116186
RoCo_engineVoltage ∈ 1.04044608206
lastTime ∈ {66740}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.1885670036
RoCo_engineVoltage ∈ 1.14187599937
lastTime ∈ {66760}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.195537956
RoCo_engineVoltage ∈ 1.22491979196
lastTime ∈ {66780}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.1999193366
RoCo_engineVoltage ∈ 1.29291029881
lastTime ∈ {66800}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.2016944051
RoCo_engineVoltage ∈ 1.34857621768
lastTime ∈ {66820}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -30.2009101399
RoCo_engineVoltage ∈ 1.39415161736
lastTime ∈ {66840}
...
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -29.9951370606
RoCo_engineVoltage ∈ 0.46939308602
lastTime ∈ {67120}
...
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ -29.8107877453
RoCo_engineVoltage ∈ -1.35491972386
lastTime ∈ {67400}
...
The voltage is still negative when the target angle of -30 is reached. The voltage becomes positive around the angle of -30.1 but the leg is still moving down at that time. We attribute this phenomenon to inertia. The angle reaches about -30.2 before it finally stops decreasing and starts increasing again.
Using the same method as in case 1 it is possible to identify some variables that influence the choice of value for Roco_Enginevoltage. This allows us to determine a number of independent changes that each fix the problem.
The first possible change is to make the robot leg move back more slowly when it has overshot its destination. This fixes the issue according to the Python script provided to be used as an oracle.
--- roco_config_testcase_2.h~ 2011-03-23 10:44:24.000000000 +0100 +++ roco_config_testcase_2.h 2011-05-19 20:15:56.000000000 +0200 @@ -35 7 +35 7 @@ real64 RoCo_maxAngle_PARAM = 90.0; real64 RoCo_minAngle_PARAM = -90.0; real64 RoCo_shutdownTimeout_PARAM = 12.0; -real64 RoCo_stepSpeed_PARAM = 10.0; +real64 RoCo_stepSpeed_PARAM = 5.0; real64 RoCo_TempFltT_PARAM = 0.1; real64 RoCo_TimeSlopeNeg_PARAM = 1.0; real64 RoCo_TimeSlopePos_PARAM = 1.0;
Another possible fix is to make the robot accept a larger error margin when deciding whether it is far enough of the intended angle to justify moving back.
--- roco_config_testcase_2.h~ 2011-03-23 10:44:24.000000000 +0100 +++ roco_config_testcase_2.h 2011-05-19 20:18:12.000000000 +0200 @@ -11 7 +11 7 @@ real64 RoCo_angleAtMark_PARAM = 25.0; real64 RoCo_angleReachedThreshold1_PARAM = 2.0; real64 RoCo_angleReachedThreshold1Fast_PARAM = 4.0; -real64 RoCo_angleReachedThreshold2_PARAM = 0.1; +real64 RoCo_angleReachedThreshold2_PARAM = 0.3; real64 RoCo_batteryLowLimit_PARAM = 15.0; real64 RoCo_batteryLowDelay_PARAM = 60.0; boolean RoCo_checkBatteryVoltage_PARAM = TRUE;
Finally my two personal favorite fixes are below. They are (most probably) completely wrong but they do fix the problem according to the Python script oracle. The first makes the voltage filter so strong that the oscillations become less visible to the script:
--- roco_config_testcase_2.h~ 2011-03-23 10:44:24.000000000 +0100 +++ roco_config_testcase_2.h 2011-05-19 20:19:10.000000000 +0200 @@ -39 7 +39 7 @@ real64 RoCo_TempFltT_PARAM = 0.1; real64 RoCo_TimeSlopeNeg_PARAM = 1.0; real64 RoCo_TimeSlopePos_PARAM = 1.0; -real64 RoCo_voltageFilter_PARAM = 0.1; +real64 RoCo_voltageFilter_PARAM = 0.7; real64 Engine_maxVoltage_PARAM = 18.0; real64 Engine_minVoltage_PARAM = -18.0;
My other favorite fix makes the voltage filter so weak that the motor no longer overshoots systematically when trying to correct its position.
--- roco_config_testcase_2.h~ 2011-03-23 10:44:24.000000000 +0100 +++ roco_config_testcase_2.h 2011-05-19 20:20:35.000000000 +0200 @@ -39 7 +39 7 @@ real64 RoCo_TempFltT_PARAM = 0.1; real64 RoCo_TimeSlopeNeg_PARAM = 1.0; real64 RoCo_TimeSlopePos_PARAM = 1.0; -real64 RoCo_voltageFilter_PARAM = 0.1; +real64 RoCo_voltageFilter_PARAM = 0.02; real64 Engine_maxVoltage_PARAM = 18.0; real64 Engine_minVoltage_PARAM = -18.0;
"Favorite" does not mean that I would recommend either of these two incompatible fixes. The first two seem more reasonable and less likely to break something else.
Case 3
According to the challenge description we should use our "tacit knowledge gained from several years of employment at RobotControllers.com". Since we have not really been employed at RobotControllers.com for several years it is only fair to use the tacit knowledge encoded in the provided acceptance Python script. This is probably what the organizers meant. Looking at the script we see that the robot leg destruction is caused by a sudden variation in the voltage applied to the motor. In practice the motor when carried away by the inertia of the mechanism it is supposed to set in motion becomes a generator. The difference between the voltage this generator provides and the voltage set by the environment should never be too large. A well-designed modern power supply can take the hit but a large voltage difference also means large intensities going through the motors' coils. The motor turns into a generator but it doesn't turn into a well-protected one.
The relevant part of the log is extracted as follows:
grep "\\(STATE\\|RoCo_legAngle \\|RoCo_engineVoltage\\|lastTime\\)" log3
...
[value] DUMPING STATE of file roco.c line 306
RoCo_engineVoltage ∈ -6.93042056043
lastTime ∈ {109960}
rampValue ∈ 0.998
[value] DUMPING STATE of file roco.c line 306
RoCo_engineVoltage ∈ -6.94303317304
lastTime ∈ {109980}
rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
RoCo_engineVoltage ∈ -4.41559004995
lastTime ∈ {110000}
rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
RoCo_engineVoltage ∈ -2.34629463843
lastTime ∈ {110020}
rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
RoCo_engineVoltage ∈ -0.652098847809
lastTime ∈ {110040}
rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
RoCo_engineVoltage ∈ 0.734991347706
lastTime ∈ {110060}
rampValue ∈ 1.
[value] DUMPING STATE of file roco.c line 306
RoCo_engineVoltage ∈ 1.87064474807
lastTime ∈ {110080}
I have included variable rampValue above because as one can see by exploring the code in the GUI as for case 1 in addition to the voltage filter the program has a mechanism to make the voltage increase slowly when transitioning from idle to active or from active to idle. Slice for the statement that computes the value of rampValue to see this mechanism more clearly for yourself: the command frama-c-gui -load state3 can be used to load the values previously computed. In the statement's contextual menu first select "Enable slicing" and then "Slicing → Slice stmt".
The log extract above shows that unfortunately this mechanism does not smooth the voltage when a new order arrives in the middle of the execution of the current order. Here the motor has just ramped out to full speed (rampValue==1.0) when the command to move in the opposite direction arrives at lastTime==110000. The value of rampValue remains pegged at 1.0 during the transition. This could be considered to be the bug that causes the voltage jump.
Apart from the sleazy way we obtain the numerical limit by reverse-engineering the python script it is impossible to argue against the inclusion of the patch below. Although it is written without understanding the root cause of the problem it exactly prevents a situation that is never desirable the destruction of the robot.
--- challenge/src/roco.c.orig 2011-05-16 13:53:57.000000000 +0200
+++ ppc/tests/ICPC/src/roco_sol3.c 2011-05-02 16:35:52.000000000 +0200
@@ -293 8 +293 17 @@
desiredEngineVoltage Engine_maxVoltage_PARAM);
limitationActive = (Engine_maxVoltage_PARAM == desiredEngineVoltage) ||
(Engine_minVoltage_PARAM == desiredEngineVoltage);
- RoCo_engineVoltage = PT1_Filter (&voltageFilter desiredEngineVoltage
- t13 dT);
+
+ { real64 tentativeV = PT1_Filter (&voltageFilter desiredEngineVoltage
+ t13 dT);
+ if (tentativeV - RoCo_engineVoltage > 0.390625)
+ RoCo_engineVoltage += 0.390625;
+ else if (tentativeV - RoCo_engineVoltage < -0.390625)
+ RoCo_engineVoltage -= 0.390625;
+ else
+ RoCo_engineVoltage = tentativeV;
+ }
+
wasInit = init;
wasActive = RoCo_isActive;
Timer_tick (&shutdownTimer dT);
Even if/when the root cause of the problem is fixed this patch should remain in place for the sake of defensive programming. It exactly prevents the voltage jumps that cause the destruction of the motor and has almost no practical effects otherwise so it does not hurt.
The best fix would perhaps be to do a ramp down followed by a ramp up when receiving in the middle of the execution of an order another order that causes a change in direction. This seems like a major change to the logic of the program so we won't attempt it and we will instead rely on the patch above to prevent destruction of the robot leg.
After applying the patch above we run and analyze testcase 3 again and we obtain another complaint from the validation script eval.py provided by the organizers:
src $ make && ./roco_3.exe 2> trace3 && python ../eval.py 3 < trace3 make: Nothing to be done for `all'. 149100: TOO SLOW / TARGET NOT REACHED! found 1 error(s)
To look on the bright side of life the robot didn't explode.
The analysis log again conveniently provides all variables' values in a single (if longish) step:
$ grep "\\(RoCo_engineVoltage\\|lastTime\\|STATE\\|rampValue\\|legAngle \\)" log3
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 19.970696807
RoCo_engineVoltage ∈ 6.00755510873
lastTime ∈ {115740}
rampValue ∈ 0.874
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 20.0821939081
RoCo_engineVoltage ∈ 5.99058545861
lastTime ∈ {115760}
rampValue ∈ 0.872
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 20.1933858392
RoCo_engineVoltage ∈ 5.9736158
lastTime ∈ {115780}
rampValue ∈ 0.87
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 20.304272565
RoCo_engineVoltage ∈ 5.95664613443
lastTime ∈ {115800}
rampValue ∈ 0.868
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 20.4148540546
RoCo_engineVoltage ∈ 5.93967646317
lastTime ∈ {115820}
rampValue ∈ 0.866
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 20.5251302809
RoCo_engineVoltage ∈ 5.92270678725
lastTime ∈ {115840}
rampValue ∈ 0.864
...
As illustrated in the log snippet above the robot leg overshoots the target position. This can be attributed to the value of variable rampValue being too high and specifically decreasing too slowly. Since there is now a piece of code to limit voltage jumps we can increase the speed at which this variable varies without fear of destroying the robot. In fact for all we know this variable was set to vary slowly in a misguided attempt to fix the explosion problem.
Since parameters RoCo_TimeSlopePos_PARAM and RoCo_TimeSlopeNeg_PARAM influence directly the computation of rampValue we decide to adjust these:
--- roco_config_testcase_3.h 2011-03-23 11:07:48.000000000 +0100 +++ roco_config_testcase_3_sol3.h 2011-05-02 16:35:52.000000000 +0200 @@ -37 8 +37 8 @@ real64 RoCo_shutdownTimeout_PARAM = 12.0; real64 RoCo_stepSpeed_PARAM = 8.0; real64 RoCo_TempFltT_PARAM = 0.15; -real64 RoCo_TimeSlopeNeg_PARAM = 0.1; -real64 RoCo_TimeSlopePos_PARAM = 0.1; +real64 RoCo_TimeSlopeNeg_PARAM = 0.5; +real64 RoCo_TimeSlopePos_PARAM = 0.5; real64 RoCo_voltageFilter_PARAM = 0.1; real64 Engine_maxVoltage_PARAM = 15.0;
$ make && ./roco_3.exe 2> trace3 && python ../eval.py 3 < trace3 ... found 0 error(s)
The ReadMe.txt reports that the field engineers also complained about voltage variations at time t=50. Since the voltage is perfectly stable around t=50s we can interpret this part of their report as caused by the hallucinogenic fumes famously released by burning electronics. Perhaps they mean t=66140ms when indeed the voltage can be seen to drop a bit sharply in the original execution trace. The reason why it drops sharply is clear on this selection of variable values:
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 12.3506194964
RoCo_engineVoltage ∈ 4.50997019947
lastTime ∈ {66120}
rampValue ∈ 0.614
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 12.4293066254
RoCo_engineVoltage ∈ 4.52254504977
lastTime ∈ {66140}
rampValue ∈ 0.616
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 12.5082201017
RoCo_engineVoltage ∈ 3.90576827098
lastTime ∈ {66160}
rampValue ∈ {0}
[value] DUMPING STATE of file roco.c line 306
RoCo_legAngle ∈ 12.5873599253
RoCo_engineVoltage ∈ 3.4007941544
lastTime ∈ {66180}
rampValue ∈ {0}
Variable rampValue drops from 0.616 to 0.0 and this cannot even be excused by the arrival of an order at that instant. It seems the ramping up/down part of the program might as well be thrown away and re-written from scratch.
Looking at values for instants 66140 and 66160 side-by-side it becomes clear what changes and causes rampValue to jump to 0. 
The variable rampTarget changes from 1.0 to 0.0.
We could execute cycle 66160 in a debugger but that would be banal. Instead let us use a trick to observe that cycle on its own in Frama-C's GUI. We replace the line RoCo_process(); in main_testcase_3.c with:
if (lastTime==66160) RoCo_process_66160(); else RoCo_process();
We then copy-paste the body of function RoCo_process() as the definition of an identical function RoCo_process_66160() and launch again the analysis. Yes this means waiting ten minutes again. You have to make sacrifices to be cool. This maneuver allows us to observe cycle 66160 in the GUI unmixed with the others with a lot more code displayed in orange (indicating that this code was not executed at that cycle and can be ignored) and values that are specific for that cycle.
Listing the differences between this approach and using a debugger is left as an exercise for the reader. This inspection mostly confirms that the value 0.0 seems normal for variable rampTarget and that it should be function Ramp_out() that makes sure that rampValue varies slowly even when rampTarget jumps around. Note in passing that the duplication of function RoCo_process() does not help to see what happens inside function Ramp_out() at cycle 66160. But you are probably growing a little restless so I won't suggest to launch an analysis again. We can perfectly well infer what happens inside function Ramp_out() by reading the code. Doing that we notice the tests below.
if (data->dir == -1) {
data->state = target;
}
...
if (data->dir == 1) {
data->state = target;
}
This doesn't make sense. There is never any reason to jump directly to the target. We suppress these lines and the voltage jump around 66160 disappears.


