; gcc IEEE single precision floating point implementation for HC12 ; Copyright (C) 2002 Tim Housel ; All rights reserved. ; ; Redistribution and use in source and binary forms, with or without ; modification, are permitted provided that the following conditions are met: ; ; Redistributions of source code must retain the above copyright notice, this ; list of conditions and the following disclaimer. ; ; Redistributions in binary form must reproduce the above copyright notice, ; this list of conditions and the following disclaimer in the documentation ; and/or other materials provided with the distribution. ; ; The name of the copyright holder may not be used to endorse or promote ; products derived from this software without specific prior written ; permission. ; ; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER "AS IS" AND ANY EXPRESS ; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED ; WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ; ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER BE LIABLE FOR ANY ; DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ; (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR ; SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER ; CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT ; LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY ; WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY ; OF SUCH DAMAGE. ; All you have to do is link this code in and suddenly your floating point ; code will run much faster. One thing you should be aware of ; is that all floating point literals should be suffixed with 'f' or 'F' ; (ex. 1.0f or 3.14159F). By default, literals are doubles, and since this ; is only a single precision implementation, you don't want that. ; lots of IEEE things are not implemented - can't specify rounding mode, ; doesn't mess with infinity or NaN or exceptions. Except for 0, it doesn't ; really handle denormalized numbers (numbers between 0 and +/- 2^-126). ; In the application I used to test (which had over 3000 fp ops), There was ; a +/- 1 LSB difference from gcc's default softfloat implementation for ; each operation (naturally errors accumulate when you do lots of ; operations). .sect .text .globl __addsf3 .globl __subsf3 .globl __mulsf3 .globl __divsf3 .globl __negsf2 .globl __cmpsf2 .globl __eqsf2 .globl __gtsf2 .globl __gesf2 .globl __ltsf2 .globl __lesf2 .globl __fixsfsi .globl __fixunssfsi .globl __floatunsisf .globl __floatsisf ; adds two floating point numbers - actually this is just a front end ; the real work happens in sumsf3, which is used if the magnitude of the ; result is the sum of the magnitudes of the operands, and diffsf3, which ; is used when the result is the difference of the operands ; operand 1 is in x:d, operand 2 is at [sp + 2] __addsf3: ldy #1 bra checksigns ; subtracts two floating point numbers - actually this is just a front end ; the real work happens in sumsf3, which is used if the magnitude of the ; result is the sum of the magnitudes of the operands, and diffsf3, which ; is used when the result is the difference of the operands ; operand 1 is in x:d, operand 2 is at [sp + 2] __subsf3: ldy #0 ; this routine uses the sign of the two operands and the operation (which ; was put into y above, 1 = add, 0 = subtract) to determine whether ; the magitude of the result will increase (sumsf3) or decrease ; (diffsf3) checksigns: pshd ; save off first operand pshx ldaa 0,sp ; is second operand negative? bmi checkadd ; yes dbne y,negop ; if not, and we're subtracting, it's effectively ; negative anyway posop: ldaa 6,sp ; is the first op negative? bmi diffsf3 ; yes - magnitude of result will be less than the ; the magnitude of the operands bra sumsf3 ; no - magnitude of result will be greater than ; the magnitude of the operands checkadd: dbne y,posop ; the second op is negative - if we're subtracting, ; it's effectively positive negop: ldaa 6,sp ; is the first op negative? bmi sumsf3 ; yes - magnitude of result will be greater than the ; magnitude of the operands bra diffsf3 ; no - magnitude of result will be less than the ; magnitude of the operands ; this takes the sum of 2 numbers, regardless of sign (the sign of the result ; should be the sign of the first operand, unless the first operand ; is too small relative to the second) ; stack usage: ; 0 : exponent of sum ; 1-3 : mantissa of sum ; 5 : tmp storage ; 6 : sign of addend1 ; 7 : sign of addend2 ; 8 : exponent of addend1 ; 9-11 : mantissa of addend1 ; 12-13 : return address ; 14 : exponent of addend2 ; 15-17 : mantissa of addend2 sumsf3: leas -8,sp sty 2,sp ldy #0 jsr sizecheckandalign ; get mantissa and exponent, also normalize ldab 5,sp ; were the exponents too far apart? beq donesum ; yes - return operand with greater magnitude ldd 2,y ; add the normalized mantissas addd 2,x std 2,y ldaa 1,y adca 1,x staa 1,y bcc shiftexp ; was there a carry from the add? ror 1,y ; yes - shift right to renormalize ror 2,y ror 3,y inc 0,y shiftexp: rol 6,sp ; get the sign of the first operand ror 0,y ; reformat to IEEE bcs setexpbit clrexpbit: bclr 1,y, #0x80 bra donesum setexpbit: bset 1,y, #0x80 donesum: ldx 0,y ldd 2,y leas 12,sp rts ; this takes the difference of 2 numbers, regardless of sign (the sign of the ; result should be the sign of the larger operand for addition. subtraction ; is more complicated because it's noncommutative - if the largest operand ; is first, the sign for the result should be the sign of the larger operand, ; if it's second the sign of the result should be opposite the sign of the ; largest operand) ; stack usage: ; 0 : exponent of difference ; 1-3 : mantissa of difference ; 4 : offset (6 or 7) of the sign of the largest operand ; 5 : tmp storage ; 6 : sign of minuend ; 7 : sign of subtrahend ; 8 : exponent of minuend ; 9-11 : mantissa of minuend ; 12-13 : return address ; 14 : exponent of subtrahend ; 15-17 : mantissa of subtrahend diffsf3: leas -8,sp sty 2,sp ldy #0 jsr sizecheckandalign ; get mantissa and exponent, also normalize ldab 5,sp ; were the exponents too far apart? beq donediff ; yes - return operand with greater magnitude ldd 2,y ; subtract the normalized mantissas subd 2,x std 2,y ldaa 1,y sbca 1,x staa 1,y bcc checknormal borrow: ; there could be a borrow if the exponents were equal coma ; if so, take the 2's complement of the result staa 1,y ldd 2,y coma comb addd #1 std 2,y ldaa 1,y adca #0 staa 1,y ldab #7 ; also we now know the subtrahend was larger stab 4,sp checknormal: ; check to see if the difference mantissa is ldd 0,y ; normalized tstb bmi shiftexp2 renormbig: ; if it needs to be shifted more than 8 bits, tbeq a,saveexp ; use memory moves rather than bit shifts cmpa #8 blo renormsmall ldab 1,y bmi saveexp bne renormsmall ldx 2,y stx 1,y clr 3,y suba #8 bra renormbig renormsmall: ; use bit shifts to do the rest deca beq saveexp ; if we hit zero lsl 3,y rol 2,y rol 1,y bpl renormsmall saveexp: staa 0,y shiftexp2: ; check to see if we need to switch the sign ldab 4,sp ; (which is normally the sign of the larger operand, ldaa b,sp ; except in the case when we're subtracting and the tst 3,sp ; subtrahend was larger than the minuend) beq notsubtract cmpb #7 ; is the second operand greater? bne notsubtract ; no eora #0x80 ; switch signs for subtract and second operand greater notsubtract: ; reformat for IEEE rola ror 0,y bcs setexpbit2 clrexpbit2: bclr 1,y, #0x80 bra donediff setexpbit2: bset 1,y, #0x80 donediff: ldx 0,y ldd 2,y leas 12,sp rts ; this function does several useful tasks ; it shifts the IEEE format around to get the sign, exponent, ; and mantissa (with the hidden 1 for normalized numbers). ; Also, for add and subtract it normalizes the mantissas. ; And for multiply and divide it calculates the result's exponent sizecheckandalign: ldd 10,sp staa 8,sp ; save off sign bit for first operand lsld staa 10,sp ; save off exponent beq adjust2ndop ; don't add the hidden 1 if denormalized lsrb orab #0x80 stab 11,sp adjust2ndop: ldd 16,sp staa 9,sp ; save off sign bit for second operand lsld staa 16,sp ; save off exponent beq checkop ; don't add the hidden 1 if denormalized lsrb orab #0x80 stab 17,sp checkop: cpy #0 ; is this for add/sub or mul/div beq addsubcheckdiff muldivcheckdiff: bpl cmpexp suba #0x7f ; for division, we will subtract exponents nega adda #0x7d cmpexp: suba #0x7e bpl expgreatzero ; if the (second exponent - 126) is > 0, ; check for overflow explesszero: ldab 10,sp ; otherwise, check for underflow bmi expdiffsigns deca aba ble retzero ; if underflow, return zero inca bra retexp expgreatzero: ldab 10,sp bpl expdiffsigns inca aba bge retinfinity ; if overflow, return infinity deca bra retexp expdiffsigns: aba retexp: staa 16,sp ldab #1 stab 7,sp rts retinfinity: ldx #0xffff ldd #0xffff bra movetoreg retzero: ldx #0 clra clrb bra movetoreg addsubcheckdiff: suba 10,sp blo secondlow beq aligned ; exponents are equal - already aligned firstlow: ; the first exponent is less than the second leax 10,sp leay 16,sp ldab #7 ; for subtraction we need to know the larger operand stab 6,sp bra checkexpdiff secondlow: ; the second exponent is less than the first leax 16,sp leay 10,sp ldab #6 ; for subtraction we need to know the larger operand stab 6,sp nega ; get absolute value of difference between exponents checkexpdiff: cmpa #0x17 ; are the exponents too far apart? bls shiftright ; no - normalize the smaller toosmall: ; yes addb #2 ; mask off everything but the sign ldaa b,sp anda #0x80 lsr 0,y ; one of the operands is too small - return other bcs checksubtract bclr 1,y, #0x80 checksubtract: ; if the larger value is the second operand and cmpb #9 ; we are doing subtract, need to reverse the sign bne expbithigh tst 5,sp beq expbithigh eora #0x80 expbithigh: tsta bpl movetoreg bset 0,y, #0x80 movetoreg: clr 7,sp rts shiftright: ; normalize smaller operand pshy shiftbigloop: ; use memory moves for shifts of more than 8 tbeq a,doneloop cmpa #8 blo shiftloop ldy 1,x sty 2,x clr 1,x suba #8 bra shiftbigloop shiftloop: ; use bit shifts for the rest lsr 1,x ror 2,x ror 3,x dbne a,shiftloop doneloop: puly bra done aligned: ; exponents already aligned - assume first exponent ldab #6 ; is larger until proven otherwise stab 6,sp leax 16,sp leay 10,sp done: ldaa #1 staa 7,sp rts ; multiply ; if you take the 24-bit mantissa of the multiplier and represent it as ; A0 * 2^16 + A1, where A0 is the upper 8 bits, and A1 is the lower 16 bits ; if you represent the multiplicand the same way, then the problem becomes ; (A0 * 2^16 + A1) * (B0 * 2^16 + B1) = ; (A0 * B0) * 2^32 + ((B0 * A1) + (A0 * B1)) * 2^16 + A1 * B1 ; now we can use the emul instruction to do 16x16 multiplies ; stack usage: ; 0 : exponent of product ; 1-6 : mantissa of product (for 48-bit intermediate product) ; 7 : sign of product ; 9-11 : mantissa of multiplier ; 12-13 : return address ; 14 : exponent of product ; 15-17 : mantissa of multiplicand __mulsf3: pshd pshx leas -8,sp tfr x,d eora 14,sp anda #0x80 staa 0,sp ; save off sign of result ldy #0x1 jsr sizecheckandalign ldab 5,sp ; are the exponents too far apart? beq donemul ; yes - return result ldaa 0,sp staa 7,sp ; move sign to 7,sp ldd 10,sp ; do A1 * B1 ldy 16,sp emul std 5,sp sty 3,sp clra ldab 15,sp ; do B0 * A1 ldy 10,sp emul addd 3,sp std 3,sp bcc nocarry1 iny nocarry1: sty 1,sp clra ldab 9,sp ; do A0 * B1 ldy 16,sp emul addd 3,sp std 3,sp bcc nocarry2 iny nocarry2: ldd 1,sp leay d,y sty 1,sp ldaa 9,sp ; do A0 * B0 (8x8 multiply) ldab 15,sp mul addd 1,sp std 1,sp checknormal2: ; renormalize if necessary ldaa 14,sp ldab 1,sp tstb bmi finaladjust renormsmall2: deca beq finaladjust ; if we hit zero lsl 4,sp rol 3,sp rol 2,sp rol 1,sp bpl renormsmall2 finaladjust: staa 0,sp rol 7,sp ror 0,sp bcs setexpbit3 clrexpbit3: bclr 1,sp, #0x80 bra donediff2 setexpbit3: bset 1,sp, #0x80 donediff2: ldx 0,sp ldd 2,sp donemul: leas 12,sp rts ; This implements floating point division, the trickiest of the ; arithmetic operations. I looked to Knuth for help, and used the ; multiple precision algorithm described in The Art of Computer Programming ; Volume 2, Section 4.3.1. Good thing too - I would not have come up with ; that algorithm for picking quotient digits on my own. As a note, ; this implementation won't do denormalized numbers, and may allow the ; lsb to be a little off. It's not hard to fix, but adds more time, so ; I traded performance for accuracy. ; Because we're using normalized mantissas between [1.0, 2.0), the quotient ; is guaranteed to be between (0.5, 2.0). This allows me to take all kinds ; of shortcuts, and means in the end we'll need at most a single right ; shift to renormalize. ; stack usage: ; 0-3 : quotient ; 5-6 : tmp storage ; 7 : sign of quotient ; 9-11 : mantissa of dividend ; 12-13 : return address ; 14 : exponent of quotient ; 15-17 : mantissa of divisor __divsf3: pshd pshx leas -8,sp tfr x,d eora 14,sp anda #0x80 staa 0,sp ; save off sign of result ldy #0xffff ; lets sizecheckandalign know we're dividing jsr sizecheckandalign ldab 5,sp ; are the exponents too far apart? beq donediv ldaa 0,sp staa 7,sp ; move sign to 7,sp firstdiv: clra ; I modified things a little from Knuth ldab 9,sp ; since the mantissas are normalized, no need to tfr d,y ; change them, and I add preceding byte of 0 ldd 10,sp ; to the dividend that will give me the 24 bits ldx 15,sp ; of precision I need ediv ; This gets a quotient digit (the most significant ; 7-9 bits) sty 0,sp mulsub: ; multiply the quotient digit by the divisor & subtract ; you would think this would give you a 40 bit (24+16) ; result, but because the first quotient digit is ; maximum of 0x1ff (for the max case of ; 0x00ffffff / 0x800000), the result is actually 32 bits ldd 16,sp emul std 5,sp sty 3,sp ldy 0,sp clra ldab 15,sp emul addd 3,sp std 3,sp ; there won't be a carry from this ldaa 11,sp ; now subtract the result from the dividend clrb subd 5,sp ldy 9,sp std 10,sp bcc noborrow dey noborrow: tfr y,d subd 3,sp stab 9,sp ; the upper 8 bits are guaranteed to be zero, unless std 2,sp bcc seconddiv ; there was a borrow, of course addback: ; the quotient digit could be 1 or 2 larger than it ldy 0,sp ; should be. If so we need to decrement it, and add dey ; back the divisor sty 0,sp ldd 10,sp addd 16,sp std 10,sp ldd 2,sp adcb 15,sp adca #0 stab 9,sp std 2,sp bmi addback ; do we need to add back again? seconddiv: ; get the lower 16 bits of the quotient ldy 9,sp ldaa 11,sp clrb ldx 15,sp ediv ; there could be an overflow (result = 0x10000) bvc notoverflow ; from this divide overflow: clra ; if so, increment the most significant byte clrb std 2,sp ldx 0,sp inx stx 0,sp bra testshiftright notoverflow: sty 2,sp testshiftright: ; do we need to shift right to normalize? ldaa 14,sp ror 0,sp bcc insertexp ror 1,sp ror 2,sp ror 3,sp inca ; increment the exponent insertexp: ; put the exponent at 0,sp lsra bcs alreadyset bclr 1,sp, #0x80 alreadyset: oraa 7,sp ; put the sign in the msb ldab 1,sp ldx 2,sp exg d,x donediv: leas 12,sp rts ; negate operand __negsf2: exg x,d eora #0x80 exg x,d rts ; compare - return 1 if op1 < op2, 0 if op1 == op2, 0xffff (-1) if op1 > op2 __cmpsf2: __eqsf2: __gtsf2: __gesf2: __ltsf2: __lesf2: leas -6,sp checksigns2: ; easy decision if signs are different std 2,sp tfr x,d eora 8,sp bpl samesign tst 8,sp bmi op1hi op2hi: ldd #0xffff bra donecmp op1hi: ldd #1 bra donecmp samesign: ; now check the exponents tfr x,d lsld staa 4,sp ldd 8,sp lsld cmpa 4,sp bhi op2hi bne op1hi sameexp: ; now compare the upper 16-bits of the mantissa ldaa 9,sp anda #0x7f staa 9,sp stx 0,sp ldd 1,sp anda #0x7f cmpd 9,sp beq samehimant bhi op1hi bra op2hi samehimant: ; compare the low 8 bits of the mantissa ldaa 3,sp cmpa 11,sp bhi op1hi bne op2hi clra clrb donecmp: leas 6,sp rts ; convert from integer to floating point __floatunsisf: ldy #1 bra checkzero __floatsisf: ldy #0 checkzero: ; if zero, work is done cpx #0 bne notzero cpd #0 bne notzero rts notzero: pshd pshx pshx clr 0,sp dey beq initexp ldd 2,sp bpl initexp ; if signed and less than zero, need to get 2's compl. inc 0,sp coma comb ldd 4,sp coma comb addd #1 std 4,sp ldd 2,sp adcb #0 adca #0 std 2,sp initexp: ldaa #0x9e shiftleftbig: ldab 2,sp bmi saveexp2 ; already normalized bne shiftleftsmall ldx 2,sp stx 1,sp ldx 4,sp stx 3,sp clr 5,sp suba #8 bra shiftleftbig shiftleftsmall: deca lsl 4,sp rol 3,sp rol 2,sp bpl shiftleftsmall saveexp2: ror 0,sp rora staa 1,sp bcs donefixtofloat bclr 2,sp, #0x80 donefixtofloat: ldx 1,sp ldd 3,sp leas 6,sp rts ; convert float to integer __fixsfsi: ldy #1 bra checkexp __fixunssfsi: ldy #0 checkexp: psha pshd pshx tfr x,d lsld tbeq a,retzeroint dey bne notsigned deca notsigned: suba #0x9e nega bmi retmaxint cmpa #0x1f bhi retzeroint clr 4,sp bset 1,sp, #0x80 shiftrightbig: cmpa #8 blo shiftrightsmall beq checksigned ldab 3,sp stab 4,sp ldx 1,sp stx 2,sp clr 1,sp suba #8 bra shiftrightbig shiftrightsmall: lsr 1,sp ror 2,sp ror 3,sp ror 4,sp dbne a,shiftrightsmall checksigned: tst 0,sp bpl loadreg ldd 1,sp coma comb std 1,sp ldd 3,sp coma comb addd #1 std 3,sp ldd 1,sp adcb #0 adca #0 std 1,sp loadreg: ldx 1,sp ldd 3,sp bra donefloattofix retzeroint: ldx #0 clra clrb bra donefloattofix retmaxint: ldd #0xffff iny beq notsigned2 ldx #0x7fff bra donefloattofix notsigned2: ldx #0xffff donefloattofix: leas 5,sp rts