Hi Manfred
> the major bottleneck is often the EXRAM & ROM accesses. If your routine
> spends most of its time copying stuff from ROM to RAM there is often no
> big speed increase, but when you can access your stuff from internal RAM
> only the difference can be quite huge. E.g. the code for our sample mixer
> in our GAX sound routines is about 8 times faster in hand-optimized ARM32
> code than in optimized C code.
> The code can make use of the conditional ARM32 commands and so our
> main loop gets a lot smaller than the C one, which is quite bloated.
I'm rendering large textures from ROM directly to VRAM; I don't touch
EX_WRAM at the moment, but I'm going to need it soon for larger geometry
buffers.
> The ARM has a sort of hardware divide, don't you know? :)
> You know that muls are fast and "X=Y/Z" can be translated to
> "X=Y*(1/Z)"....
From which I gather you mean something along these lines:
n = 65536 / Z;       // Tabulate this (reciprocal scaled by 2^16)
X = (Y * n) >> 16;   // Shift matches the 2^16 scale of the table
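
As a rough sketch of that table approach: the names recip_table, recip_init and recip_div, the 2^16 scale and the 4096-entry size are all illustrative assumptions, not taken from either engine. It only shows the general shape of replacing a divide with a lookup, a multiply and a shift.

```c
#include <stdint.h>

#define RECIP_SHIFT 16     /* assumed scale: entries hold 2^16 / z */
#define RECIP_MAX   4096   /* assumed table size: divisors 1..4095 */

static uint32_t recip_table[RECIP_MAX];

/* Fill the table once at startup; entry 0 is left unused (stays 0). */
void recip_init(void)
{
    for (uint32_t z = 1; z < RECIP_MAX; ++z)
        recip_table[z] = (1u << RECIP_SHIFT) / z;
}

/* Approximate y / z with one multiply and one shift.
   Valid for 1 <= z < RECIP_MAX and y < 65536, so that the
   product still fits in 32 bits. */
uint32_t recip_div(uint32_t y, uint32_t z)
{
    return (y * recip_table[z]) >> RECIP_SHIFT;
}
```

Because each table entry is rounded down, the result can come out one lower than a true divide: with this sketch recip_div(1000, 10) gives 99 rather than 100.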
The numbers I am using require too much precision for me to use
reciprocals, and it's too late in the day for me to move the engine over
to 64-bit math. I tried reciprocals for geometry and perspective texture
mapping, but the accuracy was terrible.
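
To put a number on that loss of accuracy, here is a small host-side helper (hypothetical, meant for a PC test harness rather than the GBA) that brute-forces the worst-case error of a scaled-reciprocal divide for one divisor. The function name and parameters are my own, and it uses a 64-bit intermediate purely so the measurement itself cannot overflow.

```c
#include <stdint.h>

/* Largest absolute difference between the exact y / z and the
   approximation (y * (2^shift / z)) >> shift, over y in [0, y_max].
   Requires z >= 1 and shift < 32. */
uint32_t max_recip_error(uint32_t z, uint32_t y_max, unsigned shift)
{
    uint64_t recip = ((uint64_t)1 << shift) / z;  /* rounded-down reciprocal */
    uint32_t worst = 0;

    for (uint64_t y = 0; y <= y_max; ++y) {
        uint32_t exact  = (uint32_t)(y / z);
        uint32_t approx = (uint32_t)((y * recip) >> shift);
        /* recip is rounded down, so approx never exceeds exact. */
        uint32_t err = exact - approx;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

For z = 3 and y up to 65535, a 2^16-scaled reciprocal is off by at most 1, while a 2^8-scaled one is off by as much as 86, which gives a feel for how quickly the error grows when the reciprocal has too few fractional bits for the range of the numerator.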
> As we have in our current game a couple of hundred DIVs each frame it
> was just not possible to use the normal div without going down to 30fps.
> With the above method I was able to speed up the divs by about 800%. Only
> drawback is the need for a 16kb table.
The game I am working on requires a couple of thousand divs per frame :),
but with interpolation I managed to get it down by a factor of 16.
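
One common way to get a 16x reduction like this is to do an exact divide only every 16th step and interpolate linearly in between. The sketch below shows that general idea; it is not the engine's actual code, the span length of 16 and all names are assumptions, and the 64-bit intermediate in exact_quotient is there only to keep the example short.

```c
#include <stdint.h>

#define SPAN 16  /* assumed span length: one exact divide per 16 outputs */

/* Exact n / d as 16.16 fixed point. The 64-bit intermediate is for
   clarity only; a real GBA routine would use its own divide here. */
static int32_t exact_quotient(int32_t n, int32_t d)
{
    return (int32_t)(((int64_t)n * 65536) / d);
}

/* Writes count approximations of (num + i*dnum) / (den + i*dden) to
   out[] as 16.16 fixed point. Exact divides happen only at span
   boundaries; values in between are linearly interpolated.
   The denominator must stay non-zero up to the end of the last span,
   even if that span is only partly written.
   Returns the number of exact divides performed. */
int span_divide(int32_t num, int32_t dnum, int32_t den, int32_t dden,
                int count, int32_t *out)
{
    int divides = 0;
    if (count <= 0)
        return 0;

    int32_t q0 = exact_quotient(num, den);   /* quotient at span start */
    divides++;

    for (int i = 0; i < count; i += SPAN) {
        num += dnum * SPAN;
        den += dden * SPAN;
        int32_t q1 = exact_quotient(num, den);  /* quotient at span end */
        divides++;

        int32_t step = (q1 - q0) / SPAN;  /* constant divisor, no real div */
        int32_t q = q0;
        for (int j = 0; j < SPAN && i + j < count; ++j) {
            out[i + j] = q;
            q += step;
        }
        q0 = q1;
    }
    return divides;
}
```

For 32 outputs this performs 3 exact divides instead of 32. The interpolated values are only exact where the true quotient happens to be linear across the span, so the span length is a trade-off between speed and visible error.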
> So finally everything is still 60fps and we have enough time to mix 8
> channels for music and fx in 21khz. :)
>
What type of game is it? I'm currently getting between 30 and 45 fps, but
would like a solid 40+.
Regards
Mat