Sorry, this post I'm replying to is old ... and long ...
John Q. Pretentious wrote:
=== on the topic of the bit depth input into the GBA FIFO ===
> > In 9-bit mode the extra bit is just
> > "invented" by the system so it's still just 8-bit sound.
>
> The nine bits of data *are* being interpreted by the system. Otherwise,
> that ninth bit I put in the waveform should have wreaked havoc, and it
> didn't. Moreover, 8-bit data filled in in the 9-bit mode sounds
> completely wrong.
Are you saying that the data is sent into the FIFO packed? That's the only
way these comments of yours make sense ... so for 6-bit data you would send
three bytes of AAAAAABB BBBBCCCC CCDDDDDD? And for 9-bit you send AAAAAAAA
ABBBBBBB BBCCCCCC CCCDDDDD DDDDEEEE EEEEEFFF FFFFFFGG GGGGGGGH HHHHHHHH?
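For concreteness, here is a minimal C sketch of the packed layout I'm asking about: samples written MSB-first into a byte stream with no padding between them. The name pack_samples is made up for illustration, and I'm not claiming the GBA FIFO behaves this way; it only shows what "packed" would have to mean.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack `count` samples of `bits` width (1..16) MSB-first into `out`.
 * `out` must have room for (count * bits + 7) / 8 bytes.
 * Returns the number of bytes written. Hypothetical helper. */
size_t pack_samples(const uint16_t *in, size_t count, unsigned bits,
                    uint8_t *out)
{
    uint32_t acc = 0;   /* bit accumulator; only the low `held` bits matter */
    unsigned held = 0;  /* number of bits in acc not yet written out */
    size_t n = 0;
    uint32_t mask = (1u << bits) - 1u;

    for (size_t i = 0; i < count; i++) {
        acc = (acc << bits) | (in[i] & mask);
        held += bits;
        while (held >= 8) {             /* emit every complete byte */
            out[n++] = (uint8_t)(acc >> (held - 8));
            held -= 8;
        }
    }
    if (held > 0)                       /* zero-pad a partial final byte */
        out[n++] = (uint8_t)(acc << (8 - held));
    return n;
}
```

Eight 9-bit samples come out as exactly nine bytes, and four 6-bit samples as three, matching the bit diagrams above.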
It doesn't seem to work that way for me. I'm setting up the sound circuit
to 9-bit at 32kHz and I'm feeding it 8-bit data from ROM [using DMA] and it
sounds just fine. I can't imagine that it would even be recognizable if it
were expecting 8 lots of 9-bit data per 9 bytes. As far as I can tell, the
FIFO accepts 8-bit numbers and then either truncates them or extends them to
the correct bit-depth that is set up in the PWM circuit. In which case, how
could you possibly feed it 9-bit data?
Also, I note that on page 82 of the hardware manual there is a specific
"8 -> 9 bit" convertor. Look closely. The 4 legacy sound channels go
through a "4 -> 9 bit" convertor too. These are then mixed inside the "R/L
selection and addition" and only then must the signal be truncated to under
9 bits. On page 83 of the hardware manual it is also stated that "Linear
8-bit audio data can be played". Nothing about 9-bit audio. Granted, there
are 9 bits of *internal* accuracy, but that's only useful if you're using
DirectSound at 1/2 volume alongside legacy sound at full volume, and who
does that?
I'd love to give you the benefit of the doubt, because you do seem so sure
of yourself, but I can find no evidence to support your theory, and lots of
evidence to support mine. What code are you using to send these alleged
9-bit samples into the hardware?
=== on the topic of PWM ===
> Uh, no. While it is true that PWM is a 1-bit signal constructed into
> varying rates, there are absolute scads of logical errors. If the sound
> quality is equivalent, what's this about testing to see which sounds best?
Testing to see which sounds best is necessary because the ANALOG output
circuitry is a priori non-ideal and probably doesn't adapt when you change
the DSP output rate [anyone know for sure?]. However, the output qualities
are identical, in theory, given ideal analog output stages. These ideal
analog circuits differ for different sample rates.
All this is quite well documented in several DSP textbooks, but the way in
which theory and practice diverge so quickly is one of the things that makes
DSP a "black art". I don't like to make incorrect statements, hence my
rider.
Another reason is that if the sample frequency is too low then some legacy
sounds will alias [foldover around the Nyquist frequency] so you may need to
raise the frequency [lower the bit depth] if you are using some of these
sounds. If the sample frequency now mismatches with the analog filter
cutoff frequency the quality is reduced.
It's a complex system; YMMV.
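As a rough illustration of the rate/depth trade-off, the sketch below assumes the PWM counter runs from the GBA's 16.777216 MHz (2^24 Hz) system clock, so that one PWM period of 2^N clocks gives N bits of resolution. That assumption is mine rather than something quoted from the manual, though it does reproduce the 9-bit/32kHz pairing. The names are hypothetical.

```c
#include <stdint.h>

/* Assumed PWM counter clock: 2^24 Hz. */
#define GBA_CLOCK_HZ 16777216u

/* Output sample rate for a given PWM bit depth.
 * One PWM period lasts 2^bits clocks, so each extra bit halves the rate. */
uint32_t pwm_rate_hz(unsigned bits)
{
    return GBA_CLOCK_HZ >> bits;
}

/* Highest frequency representable at that depth without foldover. */
uint32_t pwm_nyquist_hz(unsigned bits)
{
    return pwm_rate_hz(bits) / 2u;
}
```

Under that assumption the four settings are 9 bits at 32768 Hz, 8 bits at 65536 Hz, 7 bits at 131072 Hz and 6 bits at 262144 Hz, which is why raising the frequency costs you bit depth.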
> And what makes you think that it cannot be predicted what effects sample
> rate and sample depth will have on a sample?
If you know the exact circuitry that Nintendo use in the GBA you can
certainly predict the results to a high degree of accuracy. But you don't
know, so predicting it would be an impressive feat of reverse engineering.
I'll be fascinated by your results.
I'm just saying, why bother? The theory is complex and there are only four
settings to try. There's nothing worse than doing hours [or days] of work
and coming up with the wrong answer when you can find the right answer
empirically in about five minutes.
=== on the topic of dithering ===
> +0.5 what, and -0.5 what?
+/-0.5 units. I think LSBs is the usual term. I'm assuming you're working
in fixed-point - say you use 8.24 resolution in your mixer stage. Then you
have to drop precision to 8 bits, so you add what would be (RND(1) - 0.5) in
BASIC - a random number between -0.5 and +0.5. This is a number between
0xFF800000 and 0x00800000 in 8.24 fixed-point.
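In C, that step might look something like the sketch below. It assumes "drop precision" means round to nearest, and it leaves the random source up to you: dither_from_random just maps any 32-bit random word onto the -0.5 to +0.5 LSB range. Both function names are made up for illustration.

```c
#include <stdint.h>

#define HALF_LSB 0x00800000   /* 0.5 in 8.24 fixed point */

/* Map a 32-bit random word to a dither value in [-0.5, +0.5) LSB,
 * i.e. 0xFF800000 .. 0x007FFFFF when viewed as 8.24 fixed point. */
int32_t dither_from_random(uint32_t rnd)
{
    return (int32_t)(rnd >> 8) - HALF_LSB;
}

/* Add dither to an 8.24 sample, then round to the nearest 8-bit value. */
int8_t quantize_824(int32_t sample, int32_t dither)
{
    /* Adding 0.5 LSB and taking the floor rounds to nearest. */
    int64_t v = (int64_t)sample + dither + HALF_LSB;
    int64_t q = v / (2 * HALF_LSB);
    if (v % (2 * HALF_LSB) < 0)
        q--;                      /* C division truncates; make it a floor */
    if (q > 127)  q = 127;        /* clamp to the signed 8-bit range */
    if (q < -128) q = -128;
    return (int8_t)q;
}
```

A sample that is already a whole number of LSBs comes out unchanged whatever the dither value; only the in-between values get pushed up or down at random.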
> Why would adding noise make something sound better? Is this to combat
> aliasing?
Not aliasing, no. Aliasing occurs when the signal being sampled contains
frequencies higher than the Nyquist frequency [half the sample rate]. This
can happen in a variety of situations, and is a
big problem in audio processing. The word "alias" means "another name" and
this refers to the fact that frequencies above the Nyquist frequency get
"aliased" with frequencies below the Nyquist after sampling or re-sampling.
However, what I am talking about has nothing to do with this at all.
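For what it's worth, the foldover itself is easy to compute. This is a small sketch, with a made-up function name, of where a pure tone ends up after being sampled at a given rate, assuming no anti-aliasing filter in front of the sampler.

```c
#include <stdint.h>

/* Frequency at which a tone of `f_hz` appears after sampling at `fs_hz`.
 * Tones at or below fs/2 are unchanged; anything higher folds back. */
uint32_t alias_hz(uint32_t f_hz, uint32_t fs_hz)
{
    uint32_t r = f_hz % fs_hz;          /* position within one sample-rate span */
    return (r > fs_hz / 2u) ? fs_hz - r : r;
}
```

So a 20000 Hz tone sampled at 32768 Hz turns up at 12768 Hz, while a 10000 Hz tone stays where it is.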
Like I said, it's exactly like dithering on a grayscale image [e.g. if you
only have 4 shades of grey]. I added some more explanation of "why" below.
> Certainly there's something better than randomness.
In the grayscale image case, what looks better than randomness? Randomness
is good because it doesn't favor one frequency over another - it is "white"
noise. Any other function is colored noise [pink noise, say] and will be
visible/audible in the output as such. Any cyclic function is a non-starter
since it
introduces definite frequencies in the output.
> For that matter, if random noise is good, why are uninsulated cables bad?
I didn't say random noise is "good". I said that when you add 0.5LSB of
noise before quantization the results are better than if you don't.
Dithering requires *exactly* 0.5LSB of noise - any less and the quantization
artifacts show; any more and you are just adding noise to the signal.
> They're not getting very much amplitude compared to the real signal at
> all, and though +0.5 has *no* *unit* (this is bad, 3rd grade math), I'm
> assuming you mean against a signal level of one.
When your samples go from -128 to +127 tell me what the "units" are? LSBs
is the nearest I can get, but it's not an ideal name.
> That said, unless you're a sound engineer, that paragraph is completely
> incomprehensible. And I'm kind of hesitant to believe that a sine wave,
> when played without noise, sounds like a square wave when listened to. Or
> whatever you meant.
OK, I'll explain it in a little more detail.
Imagine a sine wave at an amplitude of 1.0 LSB. What does it sound like?
With only -1.0, 0.0 and 1.0 available, it sounds pretty shitty - like a
squarewave, basically. Now, we know that a squarewave is a mixture of
harmonic sinewaves which means that the quantization process has introduced
harmonics onto your nice pure sinewave. This is bad.
By dithering you still only have those three levels available, but the actual
signal jumps around more - and the *average* levels become correct over a
period of time. The analog filters smooth this out into something close to
an actual sinewave plus noise. It's pretty clever stuff. Your ears and
brain can then separate the sinewave from the noise and hear it as an
independent entity [instead of it being a mess of quantization noise].
It's no different in principle to how your eyes and brain "smooth out" a
dithered image.
There is 0.5LSB of extra noise, true, but there is now far less quantization
noise - and if you experiment you'll quickly determine which sort of noise
is the least agreeable.
With a little imagination you can see that this is exactly how PWM works in
the first place. I'll leave the details as an exercise to the reader. :-)
> If you mean that the dither pattern has a random component, that's
> different, but those aren't random - they're pseudo-random, and in the
> case of dithering patterns, not very - they have rules regarding
> neighbors, etc. That's very different than random noise and should not be
> presented as such.
Dithering patterns are an approximation to true random dithering. They are
only used because they are economical in hardware [or software] compared to
a random number generator. These dithering matrices exhibit periodic
behavior [repetitions] over the screen, which the brain picks up like a shot
if the scene contains large patches of smooth gradients. It would be like
using a 32-sample noise segment - you'd hear a 1kHz tone [at a 32kHz sample
rate].
Pseudo-random noise is imperceptibly different to true random noise. 32-bit
pseudo-random noise at 44kHz doesn't repeat for 27 hours. No ear can detect
frequencies that low! :-) Just using rand() would not be good though, since
that repeats after a second or two.
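The arithmetic behind those figures, plus one example of a generator with a long enough period, is sketched below. I'm using Marsaglia's xorshift32 as the example, which is my choice and not anything the GBA provides; its period is 2^32 - 1 provided the state is never zero. The two helper names are made up.

```c
#include <stdint.h>

/* Marsaglia xorshift32 (shifts 13, 17, 5). State must be nonzero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Whole seconds before a noise sequence of `period` samples repeats. */
uint64_t repeat_seconds(uint64_t period, uint32_t rate_hz)
{
    return period / rate_hz;
}

/* Frequency of the tone produced by looping `loop_len` samples of noise. */
uint32_t loop_tone_hz(uint32_t rate_hz, uint32_t loop_len)
{
    return rate_hz / loop_len;
}
```

A period of 2^32 - 1 samples at 44100 Hz lasts 97391 seconds, which is the 27 hours mentioned above, and a 32-sample loop at 32000 Hz repeats 1000 times a second, hence the 1kHz tone.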
> Dithering is a term used specifically to refer to graphics, especially
> bitmap graphics. It comes from the old english word, meaning to shiver
> (merriam webster online comes up with that one; the better definition can
> be found at FOLDOC.org).
Explain why the old English word for shiver is unsuitable for the process of
adding random values to a sound signal? Seems pretty apt to me. I imagine
that's why the DSP community chose the term in the first place. I wasn't
aware of any attempt by the CG community to annex it.
In any case, it's the same function. Of course it has the same name.
> Now, I'm not claiming to know how people on the inside of the digital
> audio industry talk about things, but I think it's curious that every
> single one of the pages I come up with uses the phrase "anti-aliasing".
Anti-aliasing is a different thing. Aliasing in sound refers to the
foldover of higher frequencies than the Nyquist, and anti-aliasing refers to
the removal of such components. It's very important, hence you will easily
find it on a random selection of DSP-related pages.
> If you have a good way of demonstrating to me
> that the *total* resolution - sample rate and sample depth - is changed by
> this operation, I'll take it. Do remember, as I keep restating, that the
> data rate is unchanged.
Do you still mean the operation of adding 0.5LSB of noise? Of course the
total resolution isn't changed. That's not the point of the operation. The
point is to preserve the data you have got. If you don't dither, you're
throwing away information [in the low bits of the samples]. If you dither,
this information is somewhat folded in to the final output samples and the
quality is somewhat improved.
Ed xxx