The following is more an explanation of the principle than a precise description of float values in programming, since working with binary values has its own quirks, especially with values below one, but anyway:
Think about a number written as a few digits multiplied by a base raised to an exponent. For example, 1,000,000
can be represented as
1*10^6.
1,000,001 then becomes 1.000001*10^6.
If you want more precision, or bigger numbers at the same precision, you have to keep adding digits, and only a fixed number of digits can be stored.
So basically you can either store really large numbers in a floating-point value or really precise small numbers, but you cannot have both at the same time.
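To make the trade-off concrete, here is a minimal Python sketch. Python's own `float` is 64-bit, so the helper `to_float32` (a name made up for this example) rounds a value to 32-bit precision by packing it with the standard `struct` module. It assumes the usual IEEE 754 single-precision format.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Near 16 million, a 32-bit float can no longer tell x and x + 1 apart.
big = 16_777_216.0  # 2**24
print(to_float32(big + 1.0) == big)  # the added 1 is rounded away

# Near 1, the same format still resolves steps of about 0.0000001.
tiny_step = 2.0 ** -23
print(to_float32(1.0 + tiny_step) == 1.0)  # the small step survives
```

The first comparison prints `True` and the second prints `False`: the same 32 bits give fine steps near 1 and coarse steps near 16 million.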
Alternatively, since both floats (32 bit) and doubles (64 bit) are stored in binary, we can compare them directly to an int (32 bit) and a long (64 bit). A float has the same number of bit patterns as an int (2^32), and a double has the same number as a long (2^64). Slightly fewer of those are distinct numbers, because some patterns are reserved for NaN. That's quite a lot of values, but still ultimately limited.
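The comparison with integers can be seen directly by reinterpreting the 32 bits of a float as an unsigned integer. This is only a sketch: `float32_bits` and `bits_to_float32` are names made up for this example, and it assumes the IEEE 754 single-precision layout.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit pattern of x (rounded to float32) as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit unsigned int as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

one = float32_bits(1.0)
print(hex(one))                  # bit pattern of 1.0
print(bits_to_float32(one + 1))  # the very next float after 1.0
```

Adding 1 to the integer pattern steps to the neighbouring float, which is why there can never be more floats than there are 32-bit integers.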
Since we generally use decimal numbers that look like 1.5 or 3.14, it's set up so the values are clustered around 0. Each range between two consecutive powers of 2 holds the same number of values, so every time the magnitude doubles, the values are spread half as densely. That means you have high precision around zero (what you use and care about in practice) and less precision as you move towards negative infinity and positive infinity.
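A small sketch of how the gap between neighbouring values grows with magnitude, this time for 64-bit doubles (Python's native `float`). The helper `spacing` is a name made up for this example and is only meant for positive, finite inputs.

```python
import struct

def spacing(x: float) -> float:
    """Gap between a positive double x and the next larger double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    next_value = struct.unpack("<d", struct.pack("<Q", bits + 1))[0]
    return next_value - x

# The gap doubles each time x crosses a power of 2.
for x in (1.0, 2.0, 4.0, 1024.0, 1e16):
    print(x, spacing(x))
```

Around 1e16 the gap is already 2.0, so a double can no longer represent every whole number there.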
In essence it’s a fancy fraction that is most precise when it represents a small value and less precise as the value gets farther from zero.
Well that’s how floating point units work.
Thanks!