gbadev
fast 32x32=>64 multiplier for 16.16 fixed point
Message #11214 of 15019
Re: fast 32x32=>64 multiplier for 16.16 fixed point

--- In gbadev@y..., tom st denis <tomstdenis@y...> wrote:
> I didn't find much in the archives for this so I wrote my own....
>
> This routine gets ~27000 multiplies per second in VBA, which means on
> the real hardware it's probably around ~22000 [my gba broke so I can't
> test it myself...]
>
> What the code does is a 32x32 multiply then a shift right by 16 bits,
> so it emulates a multiply for a 16.16 system.

I've managed to speed up the code 5x and I will tell y'all how.
Really easy using some school math.

OK, so the basic 16.16 multiply looks like this:

x = (a*b)>>16

Where a*b is a 64-bit [or at least 48-bit] product. Problem: the GBA's
MUL instruction only produces 32 bits of product. Solution: only do
16x16 multiplies, since their products always fit in 32 bits!

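Let a = AB and b = CD, where A and C are the high 16 bits (integer
parts) and B and D are the low 16 bits (fraction parts). Then you have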

AB
*CD
---

Since a*b = ((A*C) << 32) + ((A*D + B*C) << 16) + B*D, after the right
shift by 16 you must compute and add up

(B*D) >> 16, A*D, B*C and (A*C) << 16

So in my new code I do just that. After taking care of the signs of
the inputs [all four of A, B, C and D must be unsigned at this point], I
mask/chop off the 16-bit halves and then perform the four
multiplications.

In VBA I routinely get >100k multiplications per second [instead of
20k]. I dunno how VBA emulates the multiplication, but I know one thing
for sure: this is going to be faster than my other method. My other
method took 7 instructions per bit, for a total of around 250
instructions. This routine has ~40 instructions, which is about 1/5 the
size :-)

Because I preshift the values [to keep this all within 32 bits], the
lower bits of the fraction are probably not exactly what they should be.
I've tested the routine with various inputs [negative and positive],
and it appears to work.

In my new library I've added an inverse (reciprocal) table so you can
compute division as a multiply by the reciprocal. I've also added 16.16
Cos/Sin/Tan tables to my lib [I already had 8.8 tables...]

That should be enough to implement pretty much any 3D graphics in a
larger world space.

You can see my multiplier code in /lib/amath.s; it's called "fmul32"
and is fairly well commented, assuming you can read ARM assembler.

All of my code is in Thumb mode because the instructions execute
quicker [and I don't have an ARM reference handy...]

http://tomstdenis.home.dhs.org/mylib.zip

I'd still be interested in tweaking my multiplier code if possible
so please reply if you have any ideas.

Also [I'm going to search too, but I thought I'd ask....] does anyone
have any good info on implementing atan() quickly? My trig is not 100%,
so I could use some help.

Tom

[mod note: I think you should take a look at the ARM long multiply instructions (UMULL/SMULL) for your
own good; this has also been discussed lots and lots previously on the list :]




Thu May 2, 2002 10:23 am

tomstdenis
Messages in this thread (sorted by date):

tom st denis (tomstdenis), May 2, 2002 8:43 am
I didn't find much in the archives for this so I wrote my own.... This routine gets ~27000 multiplies per second in VBA which means on the real hardware it's...

tomstdenis, May 2, 2002 10:32 am
... I've managed to speed up the code 5x and I will tell y'all how. Really easy using some school math. OK, so the basic 16.16...

Martin Piper (fnagaton), May 2, 2002 10:54 am
To accurately multiply two 16.16 numbers you shouldn't need to split them up into separate 16 bit multiplies, commonly known as "using the highschool math"...

tomstdenis, May 2, 2002 12:31 pm
... split them up ... highschool ... Very true. Thanks guys... a small snippet fmul32: @ inputs are in r0 and r1, return in r0 smull r2,r3,r0,r1 mov...

Dennis Munsie (bea_dennis), May 3, 2002 8:29 am
For something that small, wouldn't it be better to put it inline than to do a branch and return? Unless, of course, you are in thumb mode at the time. ... ...