Proton BASIC Compiler - Floating point issues and alternatives

Contributed by John Drew

Level: Intermediate

The following description applies to all platforms, although there is reference to the Proton Development System (PDS) where it is helpful to do so. Limitations in floating point arithmetic apply to ALL platforms; differences in the degree of accuracy are mostly a result of the number of bytes allocated for storage of the floating point number. In most microcontrollers, with their limited memory and speed, we may have up to 32 bit storage for floating point, while on a desktop there may be 64 bit storage, although this varies depending on the language. In PDS, floats are stored in 32 bits (4 bytes). Note: most modern desktop languages offer both single precision (32 bit) and double precision (64 bit) floating point types.

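Floating point numbers are the computer's way of approximating real numbers that have a fractional part. These include rational numbers such as 1/10 or 2/3 (a ratio of two integers, with a numerator and a denominator) and irrational numbers such as pi. A float holds only a finite number of digits, so many of these values can only be stored approximately. This page is about the storage and use of such numbers and the limitations of doing this.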

There are many ways to display a number using decimal notation. Floating point numbers for humans use a sequence of numerals with a decimal point to indicate place value. For many of us, the transition is shown with a . although some countries use a ,. Some examples of floating point numbers are 3.14 or 0.017 or 234.0696 and so on. The position of the decimal point floats to inform the reader of the transition between units and tenths.

Computers need to store numbers in a standard way so that they may be read by different machines. To understand how this is done it is useful to show floating point numbers alongside their scientific notation equivalent.

Floating point notation / Scientific notation

3.14 becomes 3.14 * 10^0 (note 10^0 equates to 1)

31.4 becomes 3.14 * 10^1

3140 becomes 3.14 * 10^3

0.0314 becomes 3.14 * 10^-2

Picking up clues from the scientific notation way of doing things, it can be seen that it should be possible to store a number using just a series of numerals, eg 314 (the significand or mantissa), then a further number, eg -1 (the exponent including its sign), that tells where to place the decimal point to create the float of 31.4. To cater for negative numbers we would also need to store whether the number is - or +.

Floating point notation / Possible computer storage

3.14 becomes +314 (significand) and -2 (exponent)

31.4 becomes +314 and -1

3140 becomes +314 and +1

0.0314 becomes +314 and -4
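
The table above can be reproduced on a desktop with a short Python sketch (Python is used here only because it is easy to run on a PC; it is not Proton code). The helper name `split_decimal` is made up for this illustration. It returns a sign, an integer significand and a power-of-ten exponent, so that value = significand * 10^exponent.

```python
from decimal import Decimal

def split_decimal(text):
    """Split a decimal string into (sign, significand, exponent).

    The value equals significand * 10**exponent, with the sign stored separately.
    """
    # normalize() strips trailing zeros, so "3140" becomes 314 with exponent +1
    sign, digits, exponent = Decimal(text).normalize().as_tuple()
    significand = int("".join(str(d) for d in digits))
    return ("-" if sign else "+", significand, exponent)

for text in ("3.14", "31.4", "3140", "0.0314"):
    print(text, split_decimal(text))
```

Real floating point hardware and libraries use powers of 2 rather than powers of 10, but the idea of a sign, a significand and an exponent is the same.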

In the PDS help file, Les shows us how this is done in the system we use. Just 4 bytes are used to store:

a) The sign (one bit of the 32 available)

b) The mantissa (or significand) without a decimal point (23 bits of the 32)

c) The exponent (8 bits that provide the information on where to put the decimal point)

For more detail read the Help file under Floating Point Numbers or refer to this excellent reference in Wikipedia (http://en.wikipedia.org/wiki/Floating_point).

As you can see from above there are just 23 binary bits to store the number. The maximum number that can be stored in 23 bits is 8,388,607 (2^23-1). In the IEEE754 Standard there is an implicit bit, so effectively the maximum number becomes 16,777,215 (2^24-1).

The IEEE754 single precision standard has the following structure:
1 sign bit, 8 exponent bits, 23 significand bits, for a total of 32 bits.
The exponent bias is 127, precision bits are 24 (because of the implicit bit) and number of decimal digits ~7.
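
As a rough illustration of that layout, the following desktop Python sketch pulls the three fields out of an IEEE754 single precision value and rebuilds the number from them. The function names `float32_fields` and `rebuild` are invented for this example, and the sketch shows the standard IEEE754 layout only, not the Microchip modified format used by 8 bit Proton.

```python
import struct

def float32_fields(value):
    """Return (sign, exponent, fraction) of value stored as IEEE754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 stored significand bits
    return sign, exponent, fraction

def rebuild(sign, exponent, fraction):
    """Rebuild a normal (not zero, subnormal, infinite or NaN) value from its fields."""
    significand = 1 + fraction / 2**23   # the leading 1 is the implicit bit
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

print(float32_fields(1.0))
print(float32_fields(-2.5))
print(rebuild(*float32_fields(-2.5)))
```

The implicit bit is the `1 +` in `rebuild`: it is never stored, which is how 23 stored bits give 24 bits of precision.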

8 bit Proton uses a Microchip modified form of the IEEE754 standard and the implicit bit may not be implemented, therefore the maximum significand may be 8,388,607.

In summary, when using an 8 bit chip with Proton the accuracy of floating point should be considered <=7 digits.

Proton 24 single precision maths uses true IEEE754 and a full 24 bit significand (mantissa) is available so will be slightly more accurate than the implementation in Proton. I am unsure if the implicit bit is implemented.

Better still, use the double precision 64 bit maths in Proton24. The more efficient device and 64 bit floating point makes for a formidable solution when extra precision is required.

When using floating point, remember that many numbers do not have an exact binary equivalent. Well known examples include 1/3 or 0.1 or pi. What looks simple for humans may not be so for the machine, for example the square of 0.1 should calculate as 0.01 but instead results in 0.009999999776 in a 4 byte system. If you test for equality to 0.01 the test would fail.
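
The failed equality test can be reproduced on a PC. The sketch below is Python, not Proton, and it assumes IEEE754 single precision with round-to-nearest, emulated through the standard `struct` module. The helper `to_f32` is a name made up for this example, and the exact digits printed may differ from what an 8 bit Proton device shows.

```python
import struct

def to_f32(x):
    """Round a Python float (64 bit) to the nearest 32 bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

tenth = to_f32(0.1)
square = to_f32(tenth * tenth)   # 0.1 * 0.1 carried out in 32 bit precision
hundredth = to_f32(0.01)         # the closest 32 bit float to 0.01

print(repr(square), repr(hundredth))
print("Equal?", square == hundredth)
```

The two values differ only far down in the decimal places, but that is enough for an equality test to fail.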

With 4 byte floats only about 7 significant digits can be expected, because under most circumstances the result of a computation is rounded to 7 digits. Consider the following example:

  123456.7 (This number is stored as accurately as possible in 4 bytes)

+ 101.7654 (so is this one)

= 123558.4654 (which is rounded off to 123558.5 and so loses the last 3 digits)
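
The same sum can be tried on a desktop. This Python sketch assumes IEEE754 single precision, emulated with the standard `struct` module (`to_f32` is a helper name invented here), so the low-order digits may not match a Proton device exactly.

```python
import struct

def to_f32(x):
    """Round a Python float (64 bit) to the nearest 32 bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = to_f32(123456.7)
b = to_f32(101.7654)
total = to_f32(a + b)            # the sum, stored back into 32 bits

print(repr(total))               # full stored value
print(f"{total:.7g}")            # the same value shown to 7 significant digits
```

The stored result is close to 123558.4654 but not equal to it, and printing 7 significant digits is about as much as the format can honestly support.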

Another problem is the difficulty in expressing some rational numbers as a float. Consider 1/3 which you would expect as 0.333333333 (recurring). In real life, 32 bit floats will tend to store a number closer to 0.333333310. This is because after the 7th decimal place, we run out of bits to put the threes in, so the result is abruptly cut off, leading to an incorrect answer.

Subtraction of close numbers can generate significant errors. Similarly, multiplication and division calculations that lead to very large or very small results MAY be unusable. Trigonometry functions of numbers that cannot be represented exactly, such as pi, will not be accurate, e.g. sine(pi), which should equal 0, will compute as a small negative number. Tan(pi/2) will compute in single precision C as -22877332.0 instead of infinity.
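
A desktop Python sketch of the trigonometry case follows. It rounds pi to single precision with the standard `struct` module (`to_f32` is a made-up helper name) and then uses Python's 64 bit `math.sin` and `math.tan`, so it isolates the error that comes from storing pi in 4 bytes. It is not a model of Proton's own trig routines.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (64 bit) to the nearest 32 bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

pi32 = to_f32(math.pi)            # slightly larger than the true pi
half_pi32 = to_f32(math.pi / 2)   # slightly larger than the true pi/2

sin_pi = math.sin(pi32)
tan_half_pi = math.tan(half_pi32)

print(sin_pi)        # a small negative number rather than 0
print(tan_half_pi)   # a large negative number rather than infinity
```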

Even rules that we take to be a fundamental truth such as (a + b)+c =a+(b+c) may not be true in floating point practice because of the rounding that occurs and the need to represent numbers in a practical number of bytes.
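
A small Python sketch makes this concrete. It assumes IEEE754 single precision with round-to-nearest, emulated with the standard `struct` module; `to_f32` and `add32` are helper names invented for the example.

```python
import struct

def to_f32(x):
    """Round a Python float (64 bit) to the nearest 32 bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def add32(x, y):
    """Add two values and store the result back into 32 bits."""
    return to_f32(x + y)

a, b, c = to_f32(1.0e8), to_f32(-1.0e8), to_f32(1.0)

left = add32(add32(a, b), c)    # (a + b) + c : the big numbers cancel first
right = add32(a, add32(b, c))   # a + (b + c) : the 1.0 is lost when added to -1.0e8

print(left)    # 1.0
print(right)   # 0.0
```

Near 100,000,000 the gap between neighbouring 32 bit floats is 8, so adding 1.0 to -100,000,000 changes nothing.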

So what can we do about these inaccuracies?

1. Keep operations one to a line so that Z = Sin A * Sin B appears as
Code:
X = Sin A
Y = Sin B
Z = X * Y
2. Test for shortcuts. For example the sine of an angle approximates the angle (in radians) for small values.
Code:
If A <=  0.1 Then
X = A
Else
X = Sin A
EndIf
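3. Float tests may be inaccurate, so use X <= Y or X >= Y rather than X = Y. To test for equality, test for a small gap instead: checking that the absolute difference between X and Y is <= 0.000001 is a reasonable test. Take the absolute value, because a plain If Y - X <= 0.000001 also passes when Y is far below X.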
4. Understand that if you use floats the accuracy may be poor. You should especially check for values at the limits where one number is large and the other small. Make use of ISIS. Even the demo version is very useful.
5. Never use floating point maths for financial calculations. Always use integer arithmetic.
6. Significant figures should normally be taken into account when doing calculations. If data1 is accurate to 5 significant figures and you multiply it by data2 that is accurate to just 2 significant figures, then the result is only valid to 2 significant figures. For example 2.3 * 18.234 shows as 41.9382 on a calculator but should only be printed to 2 significant figures, that is, 42 (rounded).
7. If there is a possibility of a divide by zero, check the divisor first using something like If X <= 0.000001 Then Result = 999999.9, or whatever is acceptable to your program. If X can be negative, test its absolute value instead.
8. Don't assume rounding works in a particular way. There are two common schemes of rounding, the major difference being for negative numbers. In Proton a float is rounded using fRound to the nearest integer, e.g. 144.3 becomes 144, 0.6 becomes 1, 1.1 becomes 1, -0.6 becomes -1, -0.3 becomes 0 and -2.3 becomes -2. On the other hand, if you assign a float value to an integer it is truncated, e.g. 3.9 becomes 3.
9. If you need the fractional part of a floating point number in Proton turn off rounding, assign the value of the float to an integer large enough to accommodate the likely range, and then subtract the integer from the float.
Code:
_FP_FLAGS = 0    ' Disable Rounding
WordVar = FloatVar
_FP_FLAGS = 64     ' Enable Rounding
Float_FractionalPart = FloatVar - WordVar
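
The "small gap" test from tip 3 can be sketched in desktop Python as follows. This is not Proton code: `nearly_equal` and `to_f32` are names made up for the illustration, and 32 bit floats are emulated with the standard `struct` module.

```python
import struct

def to_f32(x):
    """Round a Python float (64 bit) to the nearest 32 bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def nearly_equal(x, y, tolerance=0.000001):
    """Treat x and y as equal when they differ by no more than tolerance."""
    return abs(x - y) <= tolerance   # abs() so the order of x and y does not matter

square = to_f32(to_f32(0.1) * to_f32(0.1))
target = to_f32(0.01)

print(square == target)              # exact comparison
print(nearly_equal(square, target))  # comparison with a small gap
print(nearly_equal(-5.0, 0.0))       # far apart, so not equal
```

A fixed tolerance such as 0.000001 suits values of around 1 or smaller. For much larger values the gap between neighbouring floats is bigger than that, so the tolerance would need to scale with the numbers being compared.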

Whenever possible use integer arithmetic
This is 100% accurate providing you work within the limits of the type. Choose an appropriate integer variable type that will accommodate the range you want to cover, e.g. a Byte for values from 0 to 255, a Word for 0 to 65535, a signed Dword for -2147483648 to +2147483647 or an unsigned Dword for 0 to 4294967295.

For example if you are sending data (with two numerals after the decimal point) over a serial link you might choose to first multiply each number by 100, convert it to an integer and send it. At the receiving end you may choose to manipulate the data as a Word type in the PIC® and then convert it to a float for display by dividing by 100 to print the result in its original 2 decimal places form.
Print At 1, 1, DEC2 Result
There are examples at the end of this document.

Alternatively, imagine the number to be sent was 28.32, firstly multiply it by 100. It becomes 2832 when assigned to a Word variable. It is sent over the serial link as an integer and then may be modified in the PIC®, perhaps it is averaged with a group of readings. If the result after integer arithmetic was 3851 you could send this to a display like this without ever converting it to a float:

Code:
Dim PrintVar As Word
Dim PrintVar1 As Word
PrintVar = 3851 / 100 ' PrintVar now has a value of 38
PrintVar1 = 3851 // 100 ' PrintVar1 contains the modulus value of 51
Print At 1, 1, DEC PrintVar, ".", DEC2 PrintVar1 ' The display reads 38.51 (DEC2 keeps a leading zero, e.g. 38.05)
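
The same divide-and-modulus idea can be checked on a PC with a few lines of Python. The function name `format_fixed` is invented for this sketch, and it only handles values of zero or above.

```python
def format_fixed(scaled, places):
    """Format a non-negative integer that holds a value multiplied by 10**places."""
    scale = 10 ** places
    whole, fraction = divmod(scaled, scale)   # integer divide and modulus in one step
    # Pad the fraction with leading zeros, otherwise 3805 would print as 38.5
    return f"{whole}.{fraction:0{places}d}"

print(format_fixed(3851, 2))   # 38.51
print(format_fixed(3805, 2))   # 38.05
print(format_fixed(2832, 2))   # 28.32
```

The leading-zero padding is the detail that is easy to miss: the fractional part must always be printed with as many digits as the scale factor has zeros.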
Things to remember when using integer math:
1. Remember where your decimal place is!
2. This is primary school mathematics: do the sum on paper the way you know, and then try to convert that to BASIC.
3. Respect significant figures. For example 13.3 / 8 = 1.6625, but the implied precision is useless as the input numbers are only to 3 and 1 significant figures.
4. When adding and subtracting in integer math, multiply both values by the same amount so there is no truncation or rounding being performed, then add or subtract as normal.
5. When multiplying in integer math, the output precision is equal to the sum of the two inputs' precision. Multiply both numbers by 10^Precision before executing the multiplication.
6. When dividing in integer math, the output precision is equal to the difference between the two inputs' precision.
7. Make sure you keep track of what is positive and what is negative. By default, DWords are signed, but this can be disabled using the code: Declare UNSIGNED_DWORDS = On
8. You may be reading a temperature sensor. Do all your arithmetic in integer types. Leave the conversion to a float until the last moment or never convert it, just use the strategy above to print it to the display.
9. Remember that a Byte rolls over to 0 when you exceed 255, a Word rolls over to 0 when you exceed 65535, and an unsigned Dword rolls over to 0 when you exceed 4294967295 (a signed Dword overflows when you exceed 2147483647). With integer subtraction the results are accurate unless the number you are subtracting is larger than the one you started with. For example a Byte of value 2 which has 3 subtracted from it will give 255, not -1.
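
Rollover is easy to experiment with on a desktop. The Python sketch below imitates fixed-width integers by masking; `wrap_unsigned` and `wrap_signed` are names made up for the example. The signed case assumes ordinary two's complement behaviour, which is an assumption here rather than something taken from the Proton manual.

```python
def wrap_unsigned(value, bits):
    """Keep only the lowest `bits` bits, as an unsigned variable of that width would."""
    return value & ((1 << bits) - 1)

def wrap_signed(value, bits):
    """Two's complement wraparound (assumed behaviour for a signed type)."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

print(wrap_unsigned(255 + 1, 8))          # Byte: 0
print(wrap_unsigned(65535 + 1, 16))       # Word: 0
print(wrap_unsigned(2 - 3, 8))            # Byte: 255
print(wrap_signed(2147483647 + 1, 32))    # signed 32 bit: -2147483648
```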
Examples using integer maths:
Contributed by Wastrix

SUBTRACTION (SIMILAR FOR ADDITION): Take 87.9482135 from 112.1987345
With integer math:
Code:
Dim DWord1 As DWord
Dim DWord2 As DWord
Dim Result As DWord
Dim Before As Byte
Dim After As DWord

DWord1 = 1121987345 ' Multiply both numbers by 10^7
DWord2 = 879482135

Result = DWord1 - DWord2
Before = Result / 10000000 ' Divide by 10^7 again
After = Result // 10000000

Print Dec Before, ".", DEC7 After
' Result is 24.2505210 (correct)
With floating point:
Code:
Dim Float1 As Float
Dim Float2 As Float
Dim ResultF As Float

Float1 = 112.1987345
Float2 =  87.9482135

ResultF = Float1 - Float2

Print $FE, $C0, DEC7 ResultF
' Result is 24.250564 (incorrect)
DIVISION: Divide 1 by 3
With integer math:
Code:
Dim DWord1 As DWord
Dim DWord2 As DWord
Dim Result As DWord
Dim Before As DWord
Dim After As DWord
DWord1 = 1000000000 ' Set variables to correct initial values
DWord2 = 3
Result = DWord1 / DWord2 ' Do first operation (1/3)
Before = Result / 1000000000 ' Get numbers before decimal
After = Result // 1000000000 ' Get numbers after decimal place
Print Dec Before, ".", DEC9 After
' Result is 0.333333333
With floating point:
Code:
Dim Float1 As Float
Dim Float2 As Float
Dim ResultF As Float
Float1 = 1
Float2 = 3
ResultF = Float1 / Float2
Print $FE, $C0, DEC8 ResultF
End
' Result is 0.333333310
MULTIPLICATION: Multiply $89.45 by 12.4
With integer math:
Code:
Dim WordOne As Word     ' We only need word size as we
Dim WordTwo As Word     ' are working with small numbers
Dim Result As DWord
Dim Before As Word     ' Likewise here...
Dim After As Word      ' The remainder can be up to 999, too big for a Byte

WordOne = 8945                 ' Set the values (89.45 * 10^2)
WordTwo = 124                  ' (12.4 * 10^1)

Result = WordOne * WordTwo
Before = Result / 1000         ' 10^(2+1), because we multiplied
After = Result // 1000         ' the inputs by 10^2 and 10^1

Print Dec Before, ".", DEC3 After ' 3 digits after the point, to match the 10^3 scale
' Result is 1109.180 (correct)
With floating point:
Code:
Dim Float1 As Float
Dim Float2 As Float
Dim ResultF As Float

Float1 = 89.45
Float2 = 12.4

ResultF = Float1 * Float2

Print $FE, $C0, DEC2 ResultF
' Result is 1109.17 (incorrect)
ALL OF THE ABOVE: Convert 34.5189 degrees Celsius to Fahrenheit
With integer math:
Code:
Dim Celsius As DWord
Dim Fahrenheit As DWord
Dim Before As DWord
Dim After As DWord

Celsius = 34518900 ' Multiplied by 10^6

Fahrenheit = Celsius * 9
Fahrenheit = Fahrenheit / 5
Fahrenheit = Fahrenheit + 32000000 ' Multiplied by 10^6

Before = Fahrenheit / 1000000 ' Divide by 10^6
After = Fahrenheit // 1000000

Print Dec Before, ".", DEC6 After ' Display to 6dp. Notice the 6?
' Result is 94.134020 (correct)
With floating point:
Code:
Dim Celsius As Float
Dim Fahrenheit As Float

Celsius = 34.5189
Fahrenheit = Celsius * 9 / 5
Fahrenheit = Fahrenheit + 32

Print $FE, $C0, DEC6 Fahrenheit
' Result is 94.134017 (incorrect)
These inaccuracies may seem small, but added together they can create large errors. This could cause significant issues with small, sensitive data or with financial information.
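
To get a feel for how small errors pile up, the following desktop Python sketch adds 0.1 one million times, once in emulated 32 bit floating point and once as an integer count of tenths. It assumes IEEE754 single precision, emulated with the standard `struct` module; `to_f32` is a helper name invented for the example, and a Proton device may drift by a different amount.

```python
import struct

def to_f32(x):
    """Round a Python float (64 bit) to the nearest 32 bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

COUNT = 1_000_000
step = to_f32(0.1)

float_total = 0.0
int_total = 0                     # running total held in tenths
for _ in range(COUNT):
    float_total = to_f32(float_total + step)   # every add is rounded to 32 bits
    int_total += 1                              # exact integer arithmetic

print("float total  :", float_total)
print("integer total:", int_total // 10, "and", int_total % 10, "tenths")
```

The integer total is exactly 100000, while the float total drifts away from it because each of the million additions is rounded.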