Fixing robots, part 2

Pascal Cuoq - 20th May 2011

This post follows that post.

For cases 2 and 3 in order to get more complete execution traces we again take advantage of Frama_C_dump_each(). Before launching the analysis we insert a call to this primitive right at the end of RoCo_process() so that local variables of that function will be displayed too.

--- roco.c.orig	2011-05-16 13:53:57.000000000 +0200 
+++ roco.c	2011-05-16 13:55:18.000000000 +0200 
@@ -303 4 +303 5 @@ 
     else { 
     	RoCo_engineVoltage = 0.0; 
     } 
+    Frama_C_dump_each(); 
 } 

The static analysis done by the previously provided shell script (key command below) again confirms that the execution does not run into any of the C99 undefined behaviors that the value analysis detects (uninitialized access use of a dangling pointer overflows in signed integer arithmetics invalid memory access invalid comparison of pointers division by zero undefined logical shift overflows in conversions from floating-point to integer infinite or NaN resulting from a floating-point operation undefined side-effects in expressions). Yes I will now give the full list every time. One year ago I stumbled on a comparative study between PolySpace and Frama-C's value analysis that gave the full list of alarms for PolySpace but listed only two types of undefined behaviors for the value analysis. The author had lazily copy-pasted a sentence from the documentation's introduction but had crucially omitted the ellipsis. As a consequence I will now inflict the full list on everyone every time. Bwa-ha-ha. Also I plan to add three more categories of detected undefined behaviors in the near future. Bwa-ha-ha-ha-ha.

frama-c ${FILES} ${TESTCASE} -val -slevel 50000 -unspecified-access -val-signed-overflow-alarms -calldeps -save state$1 

Case 2

The trace corroborates the "does not stop jiggling when it hits the target angle" diagnostic. We concentrate on relevant lines of the log using the command:

grep "\\(STATE\\|RoCo_legAngle \\|RoCo_engineVoltage\\|lastTime\\)" log2 

Here is the bit when the target for the first command is reached:

[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -29.9791361754 
        RoCo_engineVoltage ∈ -0.700730737571 
        lastTime ∈ {66520} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.0013178639 
        RoCo_engineVoltage ∈ -0.573709804476 
        lastTime ∈ {66540} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.0235995355 
        RoCo_engineVoltage ∈ -0.469713860267 
        lastTime ∈ {66560} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.0454486754 
        RoCo_engineVoltage ∈ -0.384569182548 
        lastTime ∈ {66580} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.0664935071 
        RoCo_engineVoltage ∈ -0.314858616438 
        lastTime ∈ {66600} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.0864858403 
        RoCo_engineVoltage ∈ -0.257784432149 
        lastTime ∈ {66620} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.1052716542 
        RoCo_engineVoltage ∈ 0.07897475281 
        lastTime ∈ {66640} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.1227678881 
        RoCo_engineVoltage ∈ 0.354689853917 
        lastTime ∈ {66660} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.1389442061 
        RoCo_engineVoltage ∈ 0.580426286282 
        lastTime ∈ {66680} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.1538087376 
        RoCo_engineVoltage ∈ 0.765243645549 
        lastTime ∈ {66700} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.1673969897 
        RoCo_engineVoltage ∈ 0.916559301284 
        lastTime ∈ {66720} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.1791116186 
        RoCo_engineVoltage ∈ 1.04044608206 
        lastTime ∈ {66740} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.1885670036 
        RoCo_engineVoltage ∈ 1.14187599937 
        lastTime ∈ {66760} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.195537956 
        RoCo_engineVoltage ∈ 1.22491979196 
        lastTime ∈ {66780} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.1999193366 
        RoCo_engineVoltage ∈ 1.29291029881 
        lastTime ∈ {66800} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.2016944051 
        RoCo_engineVoltage ∈ 1.34857621768 
        lastTime ∈ {66820} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -30.2009101399 
        RoCo_engineVoltage ∈ 1.39415161736 
        lastTime ∈ {66840} 
... 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -29.9951370606 
        RoCo_engineVoltage ∈ 0.46939308602 
        lastTime ∈ {67120} 
... 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ -29.8107877453 
        RoCo_engineVoltage ∈ -1.35491972386 
        lastTime ∈ {67400} 
... 

The voltage is still negative when the target angle of -30 is reached. The voltage becomes positive around the angle of -30.1 but the leg is still moving down at that time. We attribute this phenomenon to inertia. The angle reaches about -30.2 before it finally stops decreasing and starts increasing again.

Using the same method as in case 1 it is possible to identify some variables that influence the choice of value for Roco_Enginevoltage. This allows us to determine a number of independent changes that each fix the problem.

The first possible change is to make the robot leg move back more slowly when it has overshot its destination. This fixes the issue according to the Python script provided to be used as an oracle.

--- roco_config_testcase_2.h~	2011-03-23 10:44:24.000000000 +0100 
+++ roco_config_testcase_2.h	2011-05-19 20:15:56.000000000 +0200 
@@ -35 7 +35 7 @@ 
 real64 RoCo_maxAngle_PARAM = 90.0; 
 real64 RoCo_minAngle_PARAM = -90.0; 
 real64 RoCo_shutdownTimeout_PARAM = 12.0; 
-real64 RoCo_stepSpeed_PARAM = 10.0; 
+real64 RoCo_stepSpeed_PARAM = 5.0; 
 real64 RoCo_TempFltT_PARAM = 0.1; 
 real64 RoCo_TimeSlopeNeg_PARAM = 1.0; 
 real64 RoCo_TimeSlopePos_PARAM = 1.0; 

Another possible fix is to make the robot accept a larger error margin when deciding whether it is far enough of the intended angle to justify moving back.

--- roco_config_testcase_2.h~	2011-03-23 10:44:24.000000000 +0100 
+++ roco_config_testcase_2.h	2011-05-19 20:18:12.000000000 +0200 
@@ -11 7 +11 7 @@ 
 real64 RoCo_angleAtMark_PARAM = 25.0; 
 real64 RoCo_angleReachedThreshold1_PARAM = 2.0; 
 real64 RoCo_angleReachedThreshold1Fast_PARAM = 4.0; 
-real64 RoCo_angleReachedThreshold2_PARAM = 0.1; 
+real64 RoCo_angleReachedThreshold2_PARAM = 0.3; 
 real64 RoCo_batteryLowLimit_PARAM = 15.0; 
 real64 RoCo_batteryLowDelay_PARAM = 60.0; 
 boolean RoCo_checkBatteryVoltage_PARAM = TRUE; 

Finally my two personal favorite fixes are below. They are (most probably) completely wrong but they do fix the problem according to the Python script oracle. The first makes the voltage filter so strong that the oscillations become less visible to the script:

--- roco_config_testcase_2.h~	2011-03-23 10:44:24.000000000 +0100 
+++ roco_config_testcase_2.h	2011-05-19 20:19:10.000000000 +0200 
@@ -39 7 +39 7 @@ 
 real64 RoCo_TempFltT_PARAM = 0.1; 
 real64 RoCo_TimeSlopeNeg_PARAM = 1.0; 
 real64 RoCo_TimeSlopePos_PARAM = 1.0; 
-real64 RoCo_voltageFilter_PARAM = 0.1; 
+real64 RoCo_voltageFilter_PARAM = 0.7; 
 real64 Engine_maxVoltage_PARAM = 18.0; 
 real64 Engine_minVoltage_PARAM = -18.0; 

My other favorite fix makes the voltage filter so weak that the motor no longer overshoots systematically when trying to correct its position.

--- roco_config_testcase_2.h~	2011-03-23 10:44:24.000000000 +0100 
+++ roco_config_testcase_2.h	2011-05-19 20:20:35.000000000 +0200 
@@ -39 7 +39 7 @@ 
 real64 RoCo_TempFltT_PARAM = 0.1; 
 real64 RoCo_TimeSlopeNeg_PARAM = 1.0; 
 real64 RoCo_TimeSlopePos_PARAM = 1.0; 
-real64 RoCo_voltageFilter_PARAM = 0.1; 
+real64 RoCo_voltageFilter_PARAM = 0.02; 
 real64 Engine_maxVoltage_PARAM = 18.0; 
 real64 Engine_minVoltage_PARAM = -18.0; 

"Favorite" does not mean that I would recommend either of these two incompatible fixes. The first two seem more reasonable and less likely to break something else.

Case 3

According to the challenge description we should use our "tacit knowledge gained from several years of employment at RobotControllers.com". Since we have not really been employed at RobotControllers.com for several years it is only fair to use the tacit knowledge encoded in the provided acceptance Python script. This is probably what the organizers meant. Looking at the script we see that the robot leg destruction is caused by a sudden variation in the voltage applied to the motor. In practice the motor when carried away by the inertia of the mechanism it is supposed to set in motion becomes a generator. The difference between the voltage this generator provides and the voltage set by the environment should never be too large. A well-designed modern power supply can take the hit but a large voltage difference also means large intensities going through the motors' coils. The motor turns into a generator but it doesn't turn into a well-protected one.

The relevant part of the log is extracted as follows:

grep "\\(STATE\\|RoCo_legAngle \\|RoCo_engineVoltage\\|lastTime\\)" log3 
... 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_engineVoltage ∈ -6.93042056043 
        lastTime ∈ {109960} 
        rampValue ∈ 0.998 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_engineVoltage ∈ -6.94303317304 
        lastTime ∈ {109980} 
        rampValue ∈ 1. 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_engineVoltage ∈ -4.41559004995 
        lastTime ∈ {110000} 
        rampValue ∈ 1. 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_engineVoltage ∈ -2.34629463843 
        lastTime ∈ {110020} 
        rampValue ∈ 1. 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_engineVoltage ∈ -0.652098847809 
        lastTime ∈ {110040} 
        rampValue ∈ 1. 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_engineVoltage ∈ 0.734991347706 
        lastTime ∈ {110060} 
        rampValue ∈ 1. 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_engineVoltage ∈ 1.87064474807 
        lastTime ∈ {110080} 

I have included variable rampValue above because as one can see by exploring the code in the GUI as for case 1 in addition to the voltage filter the program has a mechanism to make the voltage increase slowly when transitioning from idle to active or from active to idle. Slice for the statement that computes the value of rampValue to see this mechanism more clearly for yourself: the command frama-c-gui -load state3 can be used to load the values previously computed. In the statement's contextual menu first select "Enable slicing" and then "Slicing → Slice stmt".

slice_rampValue.png

The log extract above shows that unfortunately this mechanism does not smooth the voltage when a new order arrives in the middle of the execution of the current order. Here the motor has just ramped out to full speed (rampValue==1.0) when the command to move in the opposite direction arrives at lastTime==110000. The value of rampValue remains pegged at 1.0 during the transition. This could be considered to be the bug that causes the voltage jump.

Apart from the sleazy way we obtain the numerical limit by reverse-engineering the python script it is impossible to argue against the inclusion of the patch below. Although it is written without understanding the root cause of the problem it exactly prevents a situation that is never desirable the destruction of the robot.

--- challenge/src/roco.c.orig	2011-05-16 13:53:57.000000000 +0200 
+++ ppc/tests/ICPC/src/roco_sol3.c	2011-05-02 16:35:52.000000000 +0200 
@@ -293 8 +293 17 @@ 
                 desiredEngineVoltage  Engine_maxVoltage_PARAM); 
         limitationActive = (Engine_maxVoltage_PARAM == desiredEngineVoltage) || 
                              (Engine_minVoltage_PARAM == desiredEngineVoltage); 
-        RoCo_engineVoltage = PT1_Filter (&voltageFilter  desiredEngineVoltage  
-                t13  dT); 
+ 
+	{ real64 tentativeV = PT1_Filter (&voltageFilter  desiredEngineVoltage  
+					  t13  dT); 
+	  if (tentativeV - RoCo_engineVoltage > 0.390625) 
+	    RoCo_engineVoltage += 0.390625; 
+	  else if (tentativeV - RoCo_engineVoltage < -0.390625) 
+	    RoCo_engineVoltage -= 0.390625; 
+	  else  
+	    RoCo_engineVoltage = tentativeV; 
+	} 
+ 
         wasInit = init; 
         wasActive = RoCo_isActive; 
         Timer_tick (&shutdownTimer  dT); 

Even if/when the root cause of the problem is fixed this patch should remain in place for the sake of defensive programming. It exactly prevents the voltage jumps that cause the destruction of the motor and has almost no practical effects otherwise so it does not hurt.

The best fix would perhaps be to do a ramp down followed by a ramp up when receiving in the middle of the execution of an order another order that causes a change in direction. This seems like a major change to the logic of the program so we won't attempt it and we will instead rely on the patch above to prevent destruction of the robot leg.

After applying the patch above we run and analyze testcase 3 again and we obtain another complaint from the validation script eval.py provided by the organizers:

src $ make && ./roco_3.exe 2> trace3 && python ../eval.py 3 < trace3 
make: Nothing to be done for `all'. 
149100: TOO SLOW / TARGET NOT REACHED! 
found 1 error(s) 

To look on the bright side of life the robot didn't explode.

The analysis log again conveniently provides all variables' values in a single (if longish) step:

$ grep "\\(RoCo_engineVoltage\\|lastTime\\|STATE\\|rampValue\\|legAngle \\)" log3  
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 19.970696807 
        RoCo_engineVoltage ∈ 6.00755510873 
        lastTime ∈ {115740} 
        rampValue ∈ 0.874 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 20.0821939081 
        RoCo_engineVoltage ∈ 5.99058545861 
        lastTime ∈ {115760} 
        rampValue ∈ 0.872 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 20.1933858392 
        RoCo_engineVoltage ∈ 5.9736158 
        lastTime ∈ {115780} 
        rampValue ∈ 0.87 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 20.304272565 
        RoCo_engineVoltage ∈ 5.95664613443 
        lastTime ∈ {115800} 
        rampValue ∈ 0.868 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 20.4148540546 
        RoCo_engineVoltage ∈ 5.93967646317 
        lastTime ∈ {115820} 
        rampValue ∈ 0.866 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 20.5251302809 
        RoCo_engineVoltage ∈ 5.92270678725 
        lastTime ∈ {115840} 
        rampValue ∈ 0.864 
... 

As illustrated in the log snippet above the robot leg overshoots the target position. This can be attributed to the value of variable rampValue being too high and specifically decreasing too slowly. Since there is now a piece of code to limit voltage jumps we can increase the speed at which this variable varies without fear of destroying the robot. In fact for all we know this variable was set to vary slowly in a misguided attempt to fix the explosion problem.

Since parameters RoCo_TimeSlopePos_PARAM and RoCo_TimeSlopeNeg_PARAM influence directly the computation of rampValue we decide to adjust these:

--- roco_config_testcase_3.h	2011-03-23 11:07:48.000000000 +0100 
+++ roco_config_testcase_3_sol3.h	2011-05-02 16:35:52.000000000 +0200 
@@ -37 8 +37 8 @@ 
 real64 RoCo_shutdownTimeout_PARAM = 12.0; 
 real64 RoCo_stepSpeed_PARAM = 8.0; 
 real64 RoCo_TempFltT_PARAM = 0.15; 
-real64 RoCo_TimeSlopeNeg_PARAM = 0.1; 
-real64 RoCo_TimeSlopePos_PARAM = 0.1; 
+real64 RoCo_TimeSlopeNeg_PARAM = 0.5; 
+real64 RoCo_TimeSlopePos_PARAM = 0.5; 
 real64 RoCo_voltageFilter_PARAM = 0.1; 
 real64 Engine_maxVoltage_PARAM = 15.0; 
$ make && ./roco_3.exe 2> trace3 && python ../eval.py 3 < trace3  
... 
found 0 error(s) 

The ReadMe.txt reports that the field engineers also complained about voltage variations at time t=50. Since the voltage is perfectly stable around t=50s we can interpret this part of their report as caused by the hallucinogenic fumes famously released by burning electronics. Perhaps they mean t=66140ms when indeed the voltage can be seen to drop a bit sharply in the original execution trace. The reason why it drops sharply is clear on this selection of variable values:

[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 12.3506194964 
        RoCo_engineVoltage ∈ 4.50997019947 
        lastTime ∈ {66120} 
        rampValue ∈ 0.614 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 12.4293066254 
        RoCo_engineVoltage ∈ 4.52254504977 
        lastTime ∈ {66140} 
        rampValue ∈ 0.616 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 12.5082201017 
        RoCo_engineVoltage ∈ 3.90576827098 
        lastTime ∈ {66160} 
        rampValue ∈ {0} 
[value] DUMPING STATE of file roco.c line 306 
        RoCo_legAngle ∈ 12.5873599253 
        RoCo_engineVoltage ∈ 3.4007941544 
        lastTime ∈ {66180} 
        rampValue ∈ {0} 

Variable rampValue drops from 0.616 to 0.0 and this cannot even be excused by the arrival of an order at that instant. It seems the ramping up/down part of the program might as well be thrown away and re-written from scratch.

Looking at values for instants 66140 and 66160 side-by-side it becomes clear what changes and causes rampValue to jump to 0. 66140.png

The variable rampTarget changes from 1.0 to 0.0.

We could execute cycle 66160 in a debugger but that would be banal. Instead let us use a trick to observe that cycle on its own in Frama-C's GUI. We replace the line RoCo_process(); in main_testcase_3.c with:

                if (lastTime==66160) RoCo_process_66160(); else RoCo_process(); 

We then copy-paste the body of function RoCo_process() as the definition of an identical function RoCo_process_66160() and launch again the analysis. Yes this means waiting ten minutes again. You have to make sacrifices to be cool. This maneuver allows us to observe cycle 66160 in the GUI unmixed with the others with a lot more code displayed in orange (indicating that this code was not executed at that cycle and can be ignored) and values that are specific for that cycle.

gui_66160.png

Listing the differences between this approach and using a debugger is left as an exercise for the reader. This inspection mostly confirms that the value 0.0 seems normal for variable rampTarget and that it should be function Ramp_out() that makes sure that rampValue varies slowly even when rampTarget jumps around. Note in passing that the duplication of function RoCo_process() does not help to see what happens inside function Ramp_out() at cycle 66160. But you are probably growing a little restless so I won't suggest to launch an analysis again. We can perfectly well infer what happens inside function Ramp_out() by reading the code. Doing that we notice the tests below.

  if (data->dir == -1) { 
    data->state = target; 
  } 
   ... 
  if (data->dir == 1) { 
    data->state = target; 
  } 

This doesn't make sense. There is never any reason to jump directly to the target. We suppress these lines and the voltage jump around 66160 disappears.

Pascal Cuoq
20th May 2011