> What kind of speed increase are you guys getting by using ARM32 asm
> directly? Because if it's significant I may have another attempt. As an
> example, I wrote a horizontal textured line blitter in C and ARM32 asm and
> got only a 10% speed increase. The only real difference I got was some very
> odd stack manipulation madness. I think it may be because I was writing
> directly to VRAM.
Hi,
the major bottleneck is often the EXRAM & ROM accesses. If your routine
spends most of its time copying stuff from ROM to RAM there is often no big
speed increase, but when you can access your stuff from internal RAM only,
the difference can be quite huge. E.g. the code for our sample mixer in our
GAX sound routines is about 8 times faster in hand-optimized ARM32 code than
in optimized C code. The code can make use of the conditional ARM32
instructions, so our main loop gets a lot smaller than the C one, which is
quite bloated.
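
To make the point about conditional instructions a bit more concrete, here is
a rough C sketch of the kind of per-sample work a mixer main loop does. This is
not the GAX code; the Channel struct, its field names, the 20.12 fixed-point
format and the 0..64 volume range are all assumptions made up for illustration.
The loop wrap and the two clamp tests are the sort of small "if"s that ARM32
code can handle with a compare followed by a conditionally executed
instruction instead of a branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical channel state; names and formats are illustrative only. */
typedef struct {
    const int8_t *data;   /* signed 8-bit sample data                 */
    uint32_t length;      /* sample length, 20.12 fixed point         */
    uint32_t pos;         /* current position, 20.12 fixed point      */
    uint32_t step;        /* position increment per output sample     */
    int32_t volume;       /* assumed range 0..64                      */
} Channel;

/* Add `count` output samples of one looping channel into `acc`. */
static void mix_channel(Channel *ch, int32_t *acc, size_t count)
{
    assert(ch->step < ch->length);   /* a single wrap per step suffices */
    for (size_t i = 0; i < count; i++) {
        acc[i] += ch->data[ch->pos >> 12] * ch->volume;
        ch->pos += ch->step;
        if (ch->pos >= ch->length)   /* loop wrap: one compare + one    */
            ch->pos -= ch->length;   /* conditional subtract on ARM     */
    }
}

/* Scale an accumulated value back to 8 bits and clamp it. */
static int8_t clamp_sample(int32_t v)
{
    v /= 64;                         /* undo the 0..64 volume scaling   */
    if (v > 127)  v = 127;           /* each clamp is a compare plus a  */
    if (v < -128) v = -128;          /* conditional move on ARM         */
    return (int8_t)v;
}
```

Whether a C compiler actually produces conditional instructions here depends
on the compiler and its settings, which is one reason a hand-written loop can
end up much shorter.
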
> Yep I agree the integrated shifter and conditional execution are excellent,
> just a pity there is no hardware divide or DMA mod / scale functionality, oh
> and a double speed mode would have been nice :) Actually I think I would
> have just settled with a complete 32 bit bus so I don't have to cache most
> of my code in WRAM :(
The ARM has a sort of hardware divide, don't you know? :)
You know that muls are fast and "X=Y/Z" can be translated to "X=Y*(1/Z)"....
As we have a couple of hundred DIVs each frame in our current game, it
was just not possible to use the normal div without going down to 30 fps.
With the above method I was able to speed up the divs by about 800%. The only
drawback is the need for a 16 KB table of 1/Z values.
So in the end everything is still 60 fps and we have enough time to mix 8
channels for music and fx at 21 kHz. :)
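
The multiply-by-reciprocal idea could look roughly like this in C. This is
only a sketch under my own assumptions, not our actual routine: the table
layout (4096 entries of 32 bits, which happens to give 16 KB), the 16.16
fixed-point format and the names recip_table, recip_init and fast_div are all
made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4096 entries * 4 bytes = 16 KB. */
#define RECIP_ENTRIES 4096u
#define RECIP_SHIFT   16

static uint32_t recip_table[RECIP_ENTRIES];

/* Fill the table with ceil(2^16 / z). Entry 0 is unused. */
static void recip_init(void)
{
    recip_table[0] = 0;
    for (uint32_t z = 1; z < RECIP_ENTRIES; z++)
        recip_table[z] = ((1u << RECIP_SHIFT) + z - 1) / z;
}

/*
 * Approximate y / z with one table lookup, one multiply and one shift.
 * Requires y < 65536 (so the 32-bit product cannot overflow) and
 * 0 < z < RECIP_ENTRIES. Because the reciprocal is rounded up, the
 * result is either the exact quotient or one above it.
 */
static uint32_t fast_div(uint32_t y, uint32_t z)
{
    assert(y < 65536u);
    assert(z != 0 && z < RECIP_ENTRIES);
    return (y * recip_table[z]) >> RECIP_SHIFT;
}
```

Call recip_init() once at startup (or ship the table precomputed), then use
fast_div(y, z) wherever an off-by-one in the result is acceptable. If you need
exact results or bigger numerators, a wider multiply or a correction step
would be needed, which this sketch leaves out.
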
regards
--
Manfred Linzner
(Project Manager)
Shin'en Multimedia
http://www.shinen.com
Tel.: ++49 (0)89 785 82 565
Fax.: ++49 (0)89 785 82 535