To accurately multiply two 16.16 numbers you shouldn't need to split them up
into separate 16 bit multiplies, the so-called "highschool math" (sic)
method, for example.
The solution to getting faster multiplication for 16.16 maths is to use the
ARM SMULL instruction, which does a signed 32x32 multiply with a 64 bit
result. This instruction takes four registers. From memory the first two
hold the 64 bit result (low word first, then high word) and the last two
are the two 32 bit registers to multiply.
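
For anyone who wants to see the arithmetic rather than the instruction, here is a minimal C sketch of what a 16.16 multiply through a 64 bit intermediate looks like. The names fix16 and fix16_mul are made up for illustration, and it assumes the compiler does an arithmetic right shift on negative signed values (true of the usual ARM compilers, but not guaranteed by the C standard).

```c
#include <stdint.h>

typedef int32_t fix16;  /* hypothetical name for a signed 16.16 value */

/* Multiply two 16.16 numbers using a full 64 bit product,
 * which is the value SMULL would leave in its two result registers. */
fix16 fix16_mul(fix16 a, fix16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b;  /* 32.32 intermediate */

    /* Drop the extra 16 fraction bits to get back to 16.16.
     * Assumes >> on a negative int64_t is an arithmetic shift. */
    return (fix16)(wide >> 16);
}
```

With 1.5 stored as 0x00018000 and 2.0 as 0x00020000, fix16_mul gives 0x00030000, i.e. 3.0. A compiler targeting ARM mode will often turn the 64 bit multiply into SMULL by itself, but that depends on the compiler and flags.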
The best bit about this instruction is that there is an SMLAL variant
(signed multiply-accumulate long) that accumulates into the 64 bit result
registers. This way you can do a series of 64 bit result multiplies with
accumulation and then do only one shift down by 16 bits at the end.
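
As a rough illustration of the accumulate-then-shift idea, here is a C sketch of a three component dot product. The name fix16_dot3 is hypothetical, each += stands in for one SMLAL-style step, and the same assumption about arithmetic right shift on negative values applies.

```c
#include <stdint.h>

typedef int32_t fix16;  /* hypothetical name for a signed 16.16 value */

/* Dot product of two 3 component 16.16 vectors.
 * All products are summed at full 64 bit width and the result
 * is shifted down by 16 bits only once, at the end. */
fix16 fix16_dot3(const fix16 a[3], const fix16 b[3])
{
    int64_t acc = 0;

    for (int i = 0; i < 3; i++) {
        /* One multiply-accumulate into the 64 bit total. */
        acc += (int64_t)a[i] * (int64_t)b[i];
    }

    /* Single shift back to 16.16 (assumes arithmetic >> on int64_t). */
    return (fix16)(acc >> 16);
}
```

Besides saving two shifts, keeping the sum at full width means the low fraction bits of each product are not thrown away before the addition.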
To do the shift at the end you can do:
Assume the 64 bit result is in r0,r1 where r1 is the lower 32 bits.
mov r1,r1,lsr#16
orr r1,r1,r0,lsl#16
The 16.16 result is then in r1.
Again, if I remember correctly. ;-)
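
To check the two instruction shift on paper, the same operation can be modelled in C on the two 32 bit halves. The function name fix16_from_halves is made up for this sketch; hi plays the part of r0 and lo the part of r1.

```c
#include <stdint.h>

/* Build the 16.16 result from the two halves of a 64 bit product.
 * hi = upper 32 bits (r0 above), lo = lower 32 bits (r1 above). */
int32_t fix16_from_halves(int32_t hi, uint32_t lo)
{
    uint32_t low_part  = lo >> 16;            /* mov r1,r1,lsr#16 */
    uint32_t high_part = (uint32_t)hi << 16;  /* r0,lsl#16        */

    /* orr: the top 16 bits of lo become the fraction and the bottom
     * 16 bits of hi become the integer part. The cast back to signed
     * assumes the usual two's complement conversion. */
    return (int32_t)(low_part | high_part);
}
```

Note that the top 16 bits of hi are simply discarded, so a product that does not fit in 16.16 wraps around silently.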
Since 16.16 numbers are really useful for 3D maths, the 3D maths calculation
functions should be moved into the 32K of fast internal RAM (IWRAM) and
executed as 32 bit ARM code, not Thumb. Just by doing this you will see a
speed up of about 3.5 to 4 times.
-----Original Message-----
From: tomstdenis [mailto:tomstdenis@...]
Sent: 02 May 2002 11:24
To: gbadev@yahoogroups.com
Subject: [gbadev] Re: fast 32x32=>64 multiplier for 16.16 fixed point
--- In gbadev@y..., tom st denis <tomstdenis@y...> wrote:
> I didn't find much in the archives for this so I wrote my own....
>
> This routine gets ~27000 multiplies per second in VBA which means on
[..]