gbadev
fast 32x32=>64 multiplier for 16.16 fixed point   Message List  
Message #11215 of 15019
RE: [gbadev] Re: fast 32x32=>64 multiplier for 16.16 fixed point

To accurately multiply two 16.16 numbers you shouldn't need to split them up
into separate 16 bit multiplies, commonly known as the "highschool math"
(sic) method.

The way to get faster multiplication for 16.16 maths is to use the ARM
SMULL instruction, which produces a 64 bit result. This instruction takes
four registers. From memory the first two hold the 64 bit result (low word
first, then high word) and the last two are the two 32 bit registers to
multiply.
The best bit about this instruction is that there is an accumulating form,
SMLAL, that adds the product into the 64 bit result registers. This way you
can do a series of 64 bit result multiplies with accumulation and then do
only one shift down by 16 bits at the end.
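
As a rough illustration of the accumulate-then-shift idea, here is a minimal C sketch rather than the ARM code itself. The names `fixed` and `fixed_dot` are made up for this example, and it assumes the compiler does an arithmetic right shift on negative 64 bit values (GCC does). An ARM compiler may or may not turn the loop into SMULL/SMLAL.

```c
#include <stdint.h>
#include <stddef.h>

typedef int32_t fixed;   /* 16.16 fixed point (hypothetical typedef) */

/* Dot product of two 16.16 vectors. Every 32x32 product is kept at its
   full 64 bit width and summed, and the total is shifted down by 16
   only once at the end. */
fixed fixed_dot(const fixed *a, const fixed *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int64_t)a[i] * b[i];   /* 64 bit multiply-accumulate */
    }
    /* Assumes >> on a negative int64_t is an arithmetic shift. */
    return (fixed)(acc >> 16);
}
```

For example, the vectors (1.5, 2.0) and (2.0, 0.5) are 0x18000, 0x20000 and 0x20000, 0x8000 in 16.16, and their dot product comes out as 0x40000, which is 4.0.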

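To do the shift at the end you can do the following.
Assume the 64 bit result is in r0,r1 where r0 is the upper 32 bits and r1 is the lower 32 bits.
mov r1,r1,lsr#16      @ r1 = low word >> 16
orr r1,r1,r0,lsl#16   @ r1 = r1 | (high word << 16)
Again, if I remember correctly. ;-)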
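
To check what that two-instruction shift computes, here is a small C model of it. This is only a sketch: `fixed_from_halves` and `fixed_mul_halves` are names invented for the example, and it relies on GCC-style behaviour for arithmetic right shift and for converting out-of-range unsigned values back to signed.

```c
#include <stdint.h>

/* Takes the 64 bit product as two 32 bit halves and returns bits 16..47,
   which is the 16.16 result. */
int32_t fixed_from_halves(uint32_t lo, int32_t hi)
{
    uint32_t r = lo >> 16;        /* low half shifted right (the lsr #16 step) */
    r |= (uint32_t)hi << 16;      /* OR in high half shifted left (the lsl #16 step) */
    return (int32_t)r;
}

/* Full 16.16 multiply built on the helper above. */
int32_t fixed_mul_halves(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;   /* 32x32 => 64, as SMULL would produce */
    return fixed_from_halves((uint32_t)p, (int32_t)(p >> 32));
}
```

With these definitions, 1.5 times 2.0 (0x18000 and 0x20000) gives 0x30000, and -1.0 times 1.5 gives -0x18000, so the sign carries through the high word correctly.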

Since 16.16 numbers are really useful for 3D maths, the 3D maths functions
should be moved into the 32K of fast internal RAM (IWRAM) and executed as
32 bit ARM code, not Thumb. Just by doing this you will see a speed up of
about 3.5 to 4 times.
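
One way to place a routine in fast RAM from C is sketched below. It assumes a GCC-based toolchain whose linker script provides an `.iwram` section mapped to the 32K internal RAM; that section name, the `IWRAM_CODE` macro and the function name `fixed_mul_iwram` are assumptions for this example, so check your own linker script. The source file also has to be built as ARM code (for instance with `-marm` instead of `-mthumb`).

```c
#include <stdint.h>

/* Assumption: the linker script defines an ".iwram" section in the 32K
   internal RAM. On non-ARM builds the macro expands to nothing so the
   file still compiles. */
#if defined(__arm__) && defined(__GNUC__)
#define IWRAM_CODE __attribute__((section(".iwram"), long_call))
#else
#define IWRAM_CODE
#endif

typedef int32_t fixed;   /* 16.16 fixed point (hypothetical typedef) */

/* Declared with the section attribute so calls from ROM use a long call. */
IWRAM_CODE fixed fixed_mul_iwram(fixed a, fixed b);

fixed fixed_mul_iwram(fixed a, fixed b)
{
    /* 32x32 => 64 multiply, then one shift down by 16. */
    return (fixed)(((int64_t)a * b) >> 16);
}
```

The `long_call` attribute is there because IWRAM is too far from ROM for a plain branch-and-link.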

-----Original Message-----
From: tomstdenis [mailto:tomstdenis@...]
Sent: 02 May 2002 11:24
To: gbadev@yahoogroups.com
Subject: [gbadev] Re: fast 32x32=>64 multiplier for 16.16 fixed point


--- In gbadev@y..., tom st denis <tomstdenis@y...> wrote:
> I didn't find much in the archives for this so I wrote my own....
>
> This routine gets ~27000 multiplies per second in VBA which means on
[..]




Thu May 2, 2002 10:47 am

fnagaton

I didn't find much in the archives for this so I wrote my own.... This routine gets ~27000 multiplies per second in VBA which means on the real hardware its...
tom st denis
tomstdenis
May 2, 2002
8:43 am

... on ... can't ... bits so ... I've managed to speed up the code 5x and I will tell y'all how. Really easy using some school math. Ok so the basic 16.16...
tomstdenis
May 2, 2002
10:32 am

To accurately multiply two 16.16 number you shouldn't need to split them up in to separate 16 bit multiplies, commonly known as "using the highschool math"...
Martin Piper
fnagaton
May 2, 2002
10:54 am

... split them up ... highschool ... Very true. Thanks guys... a small snippet fmul32: @ inputs are in r0 and r1, return in r0 smull r2,r3,r0,r1 mov...
tomstdenis
May 2, 2002
12:31 pm

For something that small, wouldn't it be better to put it inline than to do a branch and return? Unless, of course, you are in thumb mode at the time. ... ...
Dennis Munsie
bea_dennis
May 3, 2002
8:29 am