A Crash Course in Floating Point Numbers (and why XNA is stupid)

March 29, 2009 at 2:50 pm | In Blogroll

(Note: If the following makes no sense, then I’m not very good at explaining the concept and hence I will most likely fail at least part of my exams this year, anyway…)

Floating point numbers are how a computer represents a number with a fractional part in memory. As most of you should know, a computer works with everything in 1s and 0s, which are bits. When a floating point number is created, the bits used to store it are split into two groups, an exponent and a mantissa (plus a single bit for the sign). The mantissa is a little binary fraction. It’s very fiddly because of the way the bits work: each place after the point is worth half of the one before it, so in binary 0.1 = 1/2, 0.01 = 1/4 and 0.11 = 3/4. If that doesn’t make sense to you, don’t worry, I can’t be asked to go into a huge amount of detail here. The mantissa is then multiplied by 2 to the power of the exponent, giving you a wide range of numbers to work with.
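
If you'd rather see that than read it, here is a rough Python sketch of the idea. The helper `binary_fraction` is a made-up name for illustration, and this is a simplified picture: real floats also store a sign bit and an implied leading 1, which I'm ignoring here. `math.frexp` is a standard library function that splits a number into a mantissa and an exponent.

```python
import math

def binary_fraction(bits: str) -> float:
    """Value of the binary digits after the point, e.g. "11" means 0.11 in binary."""
    # The first digit is worth 1/2, the second 1/4, the third 1/8, and so on.
    return sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits))

half = binary_fraction("1")             # binary 0.1
quarter = binary_fraction("01")         # binary 0.01
three_quarters = binary_fraction("11")  # binary 0.11

# math.frexp returns (m, e) such that x == m * 2**e and 0.5 <= abs(m) < 1.
mantissa, exponent = math.frexp(6.0)

# Multiplying the mantissa by 2 to the power of the exponent rebuilds the number.
rebuilt = mantissa * 2 ** exponent
```

So 6.0 is stored as the mantissa 0.75 (binary 0.11) with an exponent of 3, because 0.75 × 2³ = 6.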

Let’s focus on the mantissa. The more bits you have to work with, the more precise a number will be. Think about it: let’s try to represent 0.1 (normal 0.1, as in a tenth) in binary.

With 4 bits after the point, the closest you can get is 0.0010 (i.e. an eighth, 0.125).

However, if you use twice that, the closest is 0.00011010 (26/256, 0.1015625).

As you can see, using twice as many bits allows you to be far more accurate.
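
You can check this sort of thing yourself with a few lines of Python. This is only a sketch: `nearest_binary_fraction` is a name I've invented, and it only handles values comfortably between 0 and 1. Note that it rounds to the nearest value rather than just chopping off the extra digits.

```python
def nearest_binary_fraction(value: float, bits: int):
    """Closest multiple of 1 / 2**bits to value, for values between 0 and 1."""
    scale = 2 ** bits
    # Round to the nearest representable step instead of truncating.
    numerator = round(value * scale)
    # Pad with leading zeros so we always show exactly `bits` digits.
    digits = format(numerator, f"0{bits}b")
    return "0." + digits, numerator / scale

four_bits = nearest_binary_fraction(0.1, 4)
eight_bits = nearest_binary_fraction(0.1, 8)

# How far off each approximation is from a real tenth.
error_four = abs(four_bits[1] - 0.1)
error_eight = abs(eight_bits[1] - 0.1)
```

With 4 bits the error is 0.025, and with 8 bits it drops to about 0.0016.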

There are two main types of floating point numbers used in computers, single and double precision. Single precision uses 32 bits (split between the mantissa and the exponent) and double precision uses 64 bits. As numbers take up such a small amount of space and memory isn’t an issue in this day and age, we have been advised to use double precision numbers at every opportunity.

Why am I bringing this up? Well, XNA Game Studio doesn’t seem to understand some simple concepts and constantly requires you to add extra chunks of code to convert numbers from single to double precision (or vice versa), because it will moan if you use the wrong one. Not only that, but it asks you to use each one for specific things on different occasions.
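
To show what actually changes between the two, here is a small Python sketch. Python's own floats are double precision, so I'm assuming the standard `struct` module is a fair way to squeeze a value through 32 bits and back; `to_single` is my own helper name, not anything from XNA.

```python
import struct

def to_single(x: float) -> float:
    """Round a double precision value to the nearest single precision value."""
    # Packing with "f" stores the number in 4 bytes (32 bits);
    # unpacking reads it back into an ordinary Python (64 bit) float.
    return struct.unpack("<f", struct.pack("<f", x))[0]

tenth_double = 0.1
tenth_single = to_single(0.1)

# The two versions of "a tenth" are not the same number.
difference = abs(tenth_single - tenth_double)

# Values that fit exactly in a short binary fraction survive the trip unchanged.
half_single = to_single(0.5)
```

The difference is tiny (roughly a billionth), but it's enough that a language with both types makes you say explicitly when you want to throw the extra precision away.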

For example, a vector requires two single precision values.

For drawing something at a particular angle, you need it to be single precision; however, if you want to apply any trigonometry to it, it has to be double precision. Also, any inverse trig function will output a double precision number, so to use it in drawing an object, you need to convert.
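
Here's the pattern sketched in Python rather than C#, so treat it as an illustration and not as real XNA code. `facing_angle` and `to_single` are hypothetical names; in C# the equivalent step would be a cast along the lines of `(float)Math.Atan2(dy, dx)`.

```python
import math
import struct

def to_single(x: float) -> float:
    """Round a double precision value to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def facing_angle(dx: float, dy: float) -> float:
    """Angle in radians towards (dx, dy), narrowed to single precision for drawing."""
    # The inverse trig step works in double precision...
    angle = math.atan2(dy, dx)
    # ...so it has to be converted before a single precision draw call can take it.
    return to_single(angle)

straight_up = facing_angle(0.0, 1.0)
straight_right = facing_angle(1.0, 0.0)
```

The converted angle is within about a ten-millionth of a radian of the original, which is far more accuracy than you will ever see on screen.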

In fact, pretty much anything involving a calculation uses double precision and every drawing call uses single. Now, I know that’s the sensible way around to have it, but I don’t see what the problem would be if everything was designated to be a double precision number.

Add that to the fact you need to keep some things as integers and…GAH!
