Bits, Bytes, and Data

Although I've been programming for the better part of two decades, I didn't really see a meaningful shift in the quality of my understanding and reasoning about software until I took a deep dive into the math behind how computers work. In this article we'll go all the way down to the fundamental building block of computers - the bit - and build up our understanding from there. We'll explore the basic primitive data structures of all programming languages - the boolean, integer, char, string, and float - to see how all of them fit together and provide a platform from which all computers communicate meaning. My intention is to introduce the minimal amount of math necessary to illuminate how the computer "thinks" and at the same time help the budding software engineer - you - develop a clearer intuition of how computers work.

What is a bit?

A bit is essentially a box that can be in one of two states. In the case of a box, it can be either empty or not empty. The bit is not what's in the box, but rather the box and its ability to contain something. If we translate this mathematically, we can say that an empty box can be represented by 0 and a not-empty box can be represented by 1.

[Figure 1.0: The Mighty Bit. A single bit, shown with the value 1.]

The Boolean

This concept of having only two states relates to the idea of something being either true or false. In the realm of logic, something can either be true or false. It can't be both true and false. It also can't be neither true nor false. If it's not true, then it has to be false. If it's not false, then it has to be true. Notice how we're not allowing anything but true or false? That's the point - there's nothing else that's allowed. If we look at the bit as a box that we can't see inside, you might wonder if the box is either empty or not empty. It can't be both empty and not empty. It also can't be neither empty nor not empty. That's simply confusing and weird. It either has something in it or it doesn't.

[Figure 1.1: The Boolean Bit. A single bit with the value 1, read as T (true).]
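To make the bit-as-boolean idea concrete, here is a minimal Python sketch. The helper name `bit_to_bool` is made up for this illustration; it simply treats 1 as true and 0 as false and refuses anything else, mirroring the "nothing else is allowed" rule.

```python
def bit_to_bool(bit: int) -> bool:
    """Interpret a bit (0 or 1) as a boolean; any other value is rejected."""
    if bit not in (0, 1):
        raise ValueError("a bit can only be 0 or 1")
    return bit == 1


print(bit_to_bool(1))         # True
print(bit_to_bool(0))         # False
print(int(True), int(False))  # 1 0
```

Python itself follows the same convention: `True` and `False` convert to the integers 1 and 0.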

So, how useful is this bit on its own? Honestly, it's useful enough to represent true or false, but not much more than that. We have to start putting a bunch of bits together and create some rules around how we interact with them in order to do something more useful. This is where some of the concepts we know from math come into play!

Place Value

Remember from math class (or everyday life) how if we put two numbers right next to each other they can mean something different than the two numbers on their own? For example, how 1, 2 and 12 are three different numbers even though 12 has both the 1 and the 2 in it? Or how "0012", "0120", "1200" are all different numbers even though they all have the same number of 1's, 2's, and 0's? Why is that?

Because we were taught this in elementary school, it's really easy to think it's simply "the way things work", but this style of representing numbers was invented. The math term we use for this is "place value" and we use it for writing numbers by reusing the numbers we already have. In writing the number "12" we're reusing the number "1" and the number "2" because we agree that the "1" actually represents "10" in the number "12". In the case of "120" the "1" actually represents "100".

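[Figure 1.2: Place Value. The digits 1 2 3 4, broken down by place:]

$$
\begin{aligned}
1234 &= 1{,}000 + 200 + 30 + 4 \\
&= (1 \times 1000) + (2 \times 100) + (3 \times 10) + (4 \times 1) \\
&= (1 \times 10^{3}) + (2 \times 10^{2}) + (3 \times 10^{1}) + (4 \times 10^{0})
\end{aligned}
$$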

Notice how every number has one more zero tacked onto it than the number to its right? This is what we mean by place value. Although I haven't made clear why we need this refresher, it'll make sense soon. We simply have one more math concept to get through - base number systems.
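Before moving on, the place-value breakdown can be sketched in a few lines of Python. The function name `place_values` is my own invention for this example, and it assumes a non-negative whole number written in base 10.

```python
def place_values(number: int) -> list[int]:
    """Split a non-negative integer into its base-10 place-value terms."""
    digits = str(number)
    terms = []
    for position, digit in enumerate(digits):
        power = len(digits) - 1 - position  # how many digits sit to the right
        terms.append(int(digit) * 10 ** power)
    return terms


print(place_values(1234))       # [1000, 200, 30, 4]
print(sum(place_values(1234)))  # 1234
print(place_values(120))        # [100, 20, 0]
```

Each digit is multiplied by 10 raised to the number of digits on its right, and adding the terms back together gives the original number.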

Number Systems

Have you ever wondered why the number 10 doesn't have its own symbol but instead reuses the "1" and the "0"? Mathematicians have explored this question in a concept called Number Systems. The one we're used to is called Base 10. That means the next number after 9 is written as a 1 followed by a 0. The reason for doing this is so that we don't have to invent a unique symbol for larger and larger numbers but rather can reuse the digits we have.

And yet, 10 isn't the only number we could pick for the base. Really, we could pick any whole number from 2 upward: 2, 3, 4, 5, 8, and so on. The most important quality is that when we change the base number, the number written as "10" actually references that base number. For example, if we choose base 2, the number "10" actually means 2. In base 3, the number "10" means 3. That's because in each of these systems there's a rule that says when you reach the base number you write a "1" followed by a "0" to proceed in counting. Here's a really nifty way to explore how a given number would be written in other base number systems.

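[Figure 1.3: Base Number Systems. Interactive diagram, shown here for the digits 1 3 in base 10:]

$$
\begin{aligned}
13_{10} &= (1 \times 10^{1}) + (3 \times 10^{0}) \\
&= (1 \times 10) + (3 \times 1) \\
&= 10 + 3
\end{aligned}
$$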

As you can see this gets pretty complicated looking very quickly, but let's just break it down a bit. The first line is an option to change the base number system so that you can explore how numbers are represented in different bases. The second line is a bit box that we can use to increment or decrement by 1 to see how numbers change. The next three lines show how the math in different base systems works. What you'll notice is that each grouping contains the base number raised to a given exponent. This is the little nugget of insight that helps all of this make sense. Whatever base we choose, each digit is a coefficient smaller than the base, multiplied by the base raised to the number of digits to its right. Check the numbers out to explore this relationship!
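If you'd like to explore this relationship without the diagram, here is a rough Python sketch. The names `to_digits` and `from_digits` are hypothetical helpers written for this article, and they assume non-negative whole numbers and a base of at least 2.

```python
def to_digits(number: int, base: int) -> list[int]:
    """Digits of a non-negative integer in the given base, most significant first."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return [0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(remainder)  # collects the least significant digit first
    return digits[::-1]


def from_digits(digits: list[int], base: int) -> int:
    """Rebuild the number: each digit times the base raised to its position."""
    total = 0
    for power, digit in enumerate(reversed(digits)):
        total += digit * base ** power
    return total


print(to_digits(13, 10))            # [1, 3]
print(to_digits(13, 2))             # [1, 1, 0, 1]
print(to_digits(13, 3))             # [1, 1, 1]
print(to_digits(2, 2))              # [1, 0]
print(from_digits([1, 1, 0, 1], 2)) # 13
```

Notice that the base itself always comes out as `[1, 0]`, which is the "10 means the base number" rule from earlier.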

Graduating from Bits to Bytes

So now that we've had some fun with showing how computers use bits to represent 1 or 0 and brushed up on place value and alternate base number systems, we're ready to discuss how computers put these together to store data. Specifically, computers use Base 2 to store data as either a 0 or a 1. Yet, how can we represent other data from this? If we kept stringing together bits as we saw above, all we'd really do is create a really long row of bits that form one very long number... This is where early computer scientists created the byte - an agreed-upon chunk of bits to which we can assign meaning. A byte is simply 8 bits chunked together - just like how a bunch of letters chunked together is considered a word.

[Figure 1.4: A Byte. Eight bits side by side, shown as 1 0 1 0 1 0 1 0.]

Now that we have an agreed-upon chunk of bits, what can we represent with this chunk? Well, if we take our Base 2 number system and give it 8 bits, we could represent the numbers 0 to 255. This type of number is referred to as an "unsigned integer" because it does not have a sign indicating positive or negative value. It's simply assumed to be zero or positive.

[Figure 1.5: A Byte Representing 0-255. Shown as 0 0 0 0 0 0 0 1, which is the number 1.]
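Here is a small Python sketch of reading a byte as an unsigned integer. The function name `byte_to_unsigned` is an assumption of mine, and it takes the byte as a string of eight "0"/"1" characters purely for readability.

```python
def byte_to_unsigned(bits: str) -> int:
    """Read a string of eight '0'/'1' characters as an unsigned integer."""
    if len(bits) != 8 or any(bit not in "01" for bit in bits):
        raise ValueError("expected exactly eight '0'/'1' characters")
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)  # shift one place left, then add the new bit
    return value


print(byte_to_unsigned("00000001"))  # 1
print(byte_to_unsigned("10101010"))  # 170
print(byte_to_unsigned("11111111"))  # 255
```

This is just the place-value idea from before applied in Base 2: every step to the left doubles what a bit is worth.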

Mathematically, there's a very interesting connection between the number of bits and how many numbers can be represented. Namely, if we raise 2 to the power of the number of bits we have, we get the total count of distinct numbers that can be represented. For a byte, that means:

$$2^{8} = 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 = 256$$

But, you may be wondering, why does a byte of "11111111" = 255 instead of 256? That's because the first number we represent is the number 0, or "00000000" as a byte. In other words, counting the numbers 0, 1, 2, ..., 255 gives a total of 256 numbers. If this doesn't quite make sense, check out the Base Number Systems diagram in Base 2.
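The same arithmetic works for any number of bits, not just 8. As a quick sketch (the function name `unsigned_range` is made up for this example):

```python
def unsigned_range(bit_count: int) -> tuple[int, int, int]:
    """Return (how many values, smallest value, largest value) for unsigned integers."""
    count = 2 ** bit_count
    return count, 0, count - 1  # the largest value is one less because we start at 0


for bits in (1, 4, 8, 16):
    print(bits, unsigned_range(bits))
# 1 (2, 0, 1)
# 4 (16, 0, 15)
# 8 (256, 0, 255)
# 16 (65536, 0, 65535)
```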

Now what if you want to represent a number bigger than 255? Or a number less than 0? Well, because a byte is a container of 8 bits, we can create a different set of rules describing how that byte can represent something else... And this is where the magic of creating standards - rules all computers follow - comes in. It's like creating your own language between your friends in which you make up words and all agree on what those words mean. So let's start with representing negative numbers. How might we do that? Well, what if we made one of those bits represent the positive or negative sign? If we made the left-most bit represent the sign of the number then we could store a number with the remaining 7 bits. If we raise 2 to the power of the remaining 7 bits, we get:

$$2^{7} = 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 = 128$$

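Like before, the maximum number possible is 127 because we need to represent the number 0. But what we've gained is the ability to represent numbers as low as -127! This is referred to in computer science as a "signed integer" because we use one bit to store whether the number is positive or negative. (The specific rule described here is known as "sign-magnitude". Most real hardware uses a different rule called "two's complement", which gives an 8-bit range of -128 to 127.)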

[Figure 1.6: A Byte Representing -127 to 127. Shown as 0 0 0 0 0 0 0 1, where the left-most bit is the sign.]
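Here is a rough Python sketch of the sign-bit rule described above (often called sign-magnitude). The function names are hypothetical, and the article doesn't say which value of the sign bit means "negative", so this sketch assumes the common convention that 1 means negative.

```python
def decode_sign_magnitude(bits: str) -> int:
    """Decode eight '0'/'1' characters: 1 sign bit followed by 7 magnitude bits."""
    if len(bits) != 8 or any(bit not in "01" for bit in bits):
        raise ValueError("expected exactly eight '0'/'1' characters")
    sign = -1 if bits[0] == "1" else 1  # assumption: a sign bit of 1 means negative
    magnitude = int(bits[1:], 2)        # remaining 7 bits, read as Base 2
    return sign * magnitude


def encode_sign_magnitude(value: int) -> str:
    """Encode an integer from -127 to 127 using the same rule."""
    if not -127 <= value <= 127:
        raise ValueError("value must be between -127 and 127")
    sign_bit = "1" if value < 0 else "0"
    return sign_bit + format(abs(value), "07b")


print(decode_sign_magnitude("00000001"))  # 1
print(decode_sign_magnitude("10000001"))  # -1
print(decode_sign_magnitude("11111111"))  # -127
print(encode_sign_magnitude(-5))          # 10000101
```

One quirk of this rule is that "00000000" and "10000000" both decode to 0, so one of the 256 bit patterns is spent on a second zero.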

Even with this new convention, we still can't represent that wide of a range of numbers. How would we represent a number less than -127? A number greater than 255? A number like 1.25? Nothing we have thus far explored can do this. But this is where some clever people decided to create a rule using scientific notation to represent these types of numbers. Scientific notation, as a brief refresher, represents a number as a series of digits times 10 raised to some exponent. For example:

$$1.25 = 125 \times 10^{-2}$$

What this means is we take 125 and move the decimal point 2 digits to the left, which places it just after the 1. If we were to say $125 \times 10^{2}$ we would move it two digits to the right, creating 12,500. This nifty system for writing numbers makes it easy to write really big or really small numbers without writing a bunch of digits to show the "place value". For example, writing a billion is $1 \times 10^{9}$ instead of 1,000,000,000.
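As a quick check of these examples, here is a small Python sketch. The helper `from_scientific` is my own name, and it uses the standard library's `Fraction` type so the arithmetic stays exact instead of relying on the very floating point numbers we're about to discuss.

```python
from fractions import Fraction


def from_scientific(coefficient: int, exponent: int) -> Fraction:
    """Evaluate coefficient x 10^exponent exactly."""
    return coefficient * Fraction(10) ** exponent


print(float(from_scientific(125, -2)))  # 1.25
print(from_scientific(125, 2))          # 12500
print(from_scientific(1, 9))            # 1000000000
```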

Let's say we want to create a rule that allows us to implement scientific notation in our byte. How might we do it? Well, we have to store four different pieces of information: 1) whether the number is positive or negative, 2) the number itself, 3) the exponent of 10 of that number, and 4) whether the exponent is positive or negative... But, since we only have 1 or 0 in our bit, the exponent will be applied to base 2 - allowing powers of 2 as opposed to powers of 10. A rule we can create to represent this is a simplified version of what computer scientists call "floating point numbers" and is as follows: the left-most bit represents the sign of the number. The second-left-most bit represents the sign of the exponent. The next three bits represent the exponent. The final three bits represent the base number - also known as the mantissa.

[Figure 1.7: A Byte Representing Scientific Notation. Shown as 0 0 0 0 0 0 0 1: sign bit, exponent sign bit, three exponent bits, three mantissa bits.]

It takes a little getting used to, but with this example we can store positive numbers as small as $1 \times 2^{-7} = 0.0078125$ or as large as $7 \times 2^{7} = 896$. Unfortunately, due to the tradeoff being made between the bits used for the base number (the mantissa) and the bits used for the exponent, we actually can't represent every number in this range. For example, we can't represent the number 9 exactly. We can get a number close to it - for example 8 - which can be represented as $2 \times 2^{2}$. This lack of precision is known as the "rounding error" of floating point numbers and is something mathematicians and computer scientists take into consideration when using floating point numbers.
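To see these limits for yourself, here is a minimal Python sketch of the 8-bit rule described above. It is only a toy: the function name `decode_toy_float` is made up, it assumes a sign bit of 1 means negative (the article doesn't specify), and it is not how real floating point formats lay out their bits.

```python
def decode_toy_float(bits: str) -> float:
    """Decode: sign bit, exponent sign bit, 3 exponent bits, 3 mantissa bits."""
    if len(bits) != 8 or any(bit not in "01" for bit in bits):
        raise ValueError("expected exactly eight '0'/'1' characters")
    sign = -1 if bits[0] == "1" else 1           # assumption: 1 means negative
    exponent_sign = -1 if bits[1] == "1" else 1  # assumption: 1 means negative
    exponent = int(bits[2:5], 2)                 # 0 to 7
    mantissa = int(bits[5:8], 2)                 # 0 to 7
    return sign * mantissa * 2.0 ** (exponent_sign * exponent)


# Decode every possible byte to see which numbers exist under this rule.
all_values = {decode_toy_float(format(n, "08b")) for n in range(256)}
positive = sorted(value for value in all_values if value > 0)

print(decode_toy_float("00010010"))  # 8.0, i.e. 2 x 2^2
print(min(positive))                 # 0.0078125
print(max(positive))                 # 896.0
print(9.0 in all_values)             # False
```

Many different bytes decode to the same number (8 can also be written as 1 x 2^3 or 4 x 2^1), which is part of why so few distinct values fit in the range.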

Going Deeper

With this whirlwind tour of how we can represent numbers with bits, we have only scratched the surface of what's possible. In future articles we'll explore how a byte can be used to represent letters and words. Another article will discuss using different base bit systems and what that means for computing (8bit, 16bit, 32bit, 64bit...). And in another we'll go deeper into boolean algebra and how we can represent logic - NOT, AND, OR - and how it's used in computers. Last but not least, once all the basics are covered we can explore how programming languages like LISP use all of the above concepts to help us make computers do what they do today.