Straight Stuff on Structures in the C Programming Language

What is a “struct”? How does it work? How should a C programmer think about them?

All but the simplest C programs are filled with structures. C was written to be a replacement for Unix’s original assembly language. Since its inventors decided that the semantics of the C language should reflect the actual machine and nothing more, the struct is merely a way to describe a block of variables, just like assembly language programmers did with “control blocks.” But since most of us haven’t done much assembly language programming, I’ll explain structs by starting with something more relatable—the array.

To understand structs and arrays is to understand the unifying idea that all of the computer’s memory, where variables and instructions live, is one big array of bytes. I’ll refer to the memory array as the “mem.”

So, what is a variable? A variable is an area of memory we use to hold a value. On the machine I’m typing this post on (a 64-bit Plan 9 system running on an Intel Xeon processor), each integer is a concatenation of four bytes. When I want to set that variable to zero, as in

   int a;
   a = 0;

the CPU stores four bytes of zeros into the memory locations allocated by the C compiler to hold my variable “a.” We say the address of a is at location so-and-so.
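To make that concrete, here is a minimal sketch that looks at a variable as the raw bytes it occupies. The helper names count_zero_bytes and zero_bytes_in_int are my own inventions for illustration, and the code uses sizeof instead of assuming that an int is four bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many bytes of an object's in-memory representation are zero.
   (Hypothetical helper, named for this example only.) */
size_t count_zero_bytes(const void *obj, size_t size)
{
    const unsigned char *p = obj;   /* view the object as raw bytes */
    size_t zeros = 0;

    assert(obj != NULL);
    for (size_t i = 0; i < size; i++)
        if (p[i] == 0)
            zeros++;
    return zeros;
}

/* Set an int to zero, then report how many of its bytes are zero. */
size_t zero_bytes_in_int(void)
{
    int a;

    a = 0;
    return count_zero_bytes(&a, sizeof a);
}
```

Calling zero_bytes_in_int() returns sizeof(int), which is 4 on a machine like mine: every byte the compiler set aside for “a” now holds zero.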

All Memory is an Array

I could also have an array of integers, say 100 of them. I would declare this array with the following statement:

   int ia[100];

I can then use a number to select which integer in the array I want. I could set one of them to zero like so:

   ia[42] = 0;

A single integer in an array is called an element. The 42 in this example is the index. We could, of course, use an integer variable instead of the constant as a selector.

The idea of an array has many analogies in the real world. One of the best is post office boxes. Each box is numbered and identical to the other boxes. The selection of which “box” to use is simply the number of the element.

Actually, in C the number is the offset to the box. The first element of an array in C is at index zero, so the first element in “ia” is “ia[0],” the second is “ia[1],” and so on. The indexes are offsets to the element from the beginning of the array.

The idea of an index as being an offset to an element is based on how computer memory works. Our computer programs run in main memory that lives ethereally on the DDR sticks plugged into your machine’s motherboard. The first byte in all of memory has an address (computerese for index) of 0, the next byte is at address 1, and so forth. The address is an index offset.

So, I can think of all of memory on my machine as

   char mem[274877906944];

(Don’t try that at home. You can’t declare an array that large.)

So our arrays are just segments of the larger mem array, and our indexes are offsets from the first mem element where the compiler put our array.

Bear with me on this. It’s a bit detailed, but if you can get through it, you’ll better understand what’s going on in C. Hopefully the effort will be worth your while.

My array “ia” is given by the C compiler an offset into memory, let’s say at location 10,000. When I type

   ia[42] = 0;

the compiler takes the index 42, multiplies it by the size of one element (four bytes, since our “int” is 32 bits long), and adds the result to 10,000 (the offset of the first element in the array), producing an address of 10,168. There we deposit four bytes of zeros.
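Here is a rough sketch of that calculation using a toy version of “mem.” The array size and the names int_elem_addr, store_int, and load_int are assumptions made up for this example; real memory is managed by the compiler and the operating system, not by a global array. Note that the index is scaled by the size of an int before it is added to the base.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for "mem": a small byte array, not real memory. */
static unsigned char mem[16384];

/* Byte address of element i of an int array that starts at address base. */
size_t int_elem_addr(size_t base, size_t i)
{
    return base + i * sizeof(int);
}

/* Deposit the bytes of an int into mem starting at addr. */
void store_int(size_t addr, int value)
{
    assert(addr + sizeof value <= sizeof mem);
    memcpy(&mem[addr], &value, sizeof value);
}

/* Read the bytes at addr back as an int. */
int load_int(size_t addr)
{
    int value;

    assert(addr + sizeof value <= sizeof mem);
    memcpy(&value, &mem[addr], sizeof value);
    return value;
}
```

With 4-byte ints, int_elem_addr(10000, 42) gives 10,168, and store_int(int_elem_addr(10000, 42), 0) does by hand what “ia[42] = 0” asks the compiler to do.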

Structures are Funny Arrays

Arrays are aggregates of homogeneous variables, and are referenced by a numerical offset to a given variable. Structures, on the other hand, are not necessarily made of homogeneous elements, but have the potential to be heterogeneous. As a result, it’s not very useful to use an offset to talk about the member. So Dennis decided to use a name for each variable in the structure.

As an example, here is a structure used to keep track of variables in a C compiler itself:

   struct Symbol
   {
      char      name[64];   /* the name of our variable */
      int       type;       /* what kind of symbol it is */
      long long addr;       /* where the symbol is */
   };

This doesn’t declare a variable but defines a new variable type, a structure with a tag of “Symbol.” A structure tag is a kind of type, but only for structures. We can declare a single variable of type “struct Symbol” as follows.

     struct Symbol symb;

We now have a variable (“symb”) that is made up of three variables (name, type, addr). Each of these variables is referred to as a field. A field is the element of a structure.

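In this example the fields are an array of 64 characters, a 4-byte integer, and an 8-byte long long. Added together, that comes to 76 bytes. (Many compilers pad the structure so that addr lands on an 8-byte boundary, in which case sizeof reports 80; I’ll stick with 76 to keep the arithmetic simple.)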

To select a particular variable we have to use a different notation than we used to select an array element. We can’t use an integer, so instead we use the name of the field we are interested in.

To set the field with the name “type” to 9, for example, we would say,

     symb.type = 9;

The “dot” notation is used to tell the compiler that we are talking about the integer variable named “type.”

Interestingly, what’s going on in the background is very much like what went on with our array. The structure variable “symb” is placed at some location in memory, let’s say 9,000. Each member of a structure is at an offset from the beginning of the structure. That offset is determined by the sizes of the fields that come before it. Our 32-bit integer is at offset 64 because the 64-byte character array appears just before it. To get to our integer, we add 64 to the location in memory where our structure variable begins, resulting in an offset into memory of 9,064.
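You can ask the compiler for these offsets with the standard offsetof macro from <stddef.h>. The sketch below sets the “type” field the long way, by adding the field’s offset to the address of the structure. The function name set_type_by_offset is hypothetical; in real code you would simply write symb.type = 9.

```c
#include <assert.h>
#include <stddef.h>

struct Symbol
{
    char      name[64];   /* the name of our variable */
    int       type;       /* what kind of symbol it is */
    long long addr;       /* where the symbol is */
};

/* Set the type field by hand: start of the structure plus the field offset.
   (Hypothetical helper; equivalent to s->type = value.) */
void set_type_by_offset(struct Symbol *s, int value)
{
    char *base = (char *)s;   /* address of the first byte of the structure */

    assert(s != NULL);
    *(int *)(base + offsetof(struct Symbol, type)) = value;
}
```

Here offsetof(struct Symbol, type) is 64, matching the arithmetic above, and after set_type_by_offset(&symb, 9) the field symb.type holds 9.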

Arrays of Structs

We can also have an array of structures, declared as follows:

     struct Symbol symtab[1000];

We can then use both the array notation and the structure notation. To set the type of entry 123 to 42 we say

     symtab[123].type = 42;

I’ll let you do the math. (Assuming the array begins at location 20,000 and each structure takes 76 bytes, the location of our integer is 20,000 + 123 * 76 + 64 = 29,412.)
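If you would rather have the machine check the math, here is a small sketch that computes the offset of symtab[i].type twice: once by hand and once by letting the compiler take the address. The function names are made up for this example. It uses sizeof(struct Symbol) instead of a hard-coded 76, because the compiler may pad the structure (to 80 bytes on many 64-bit systems).

```c
#include <assert.h>
#include <stddef.h>

struct Symbol
{
    char      name[64];   /* the name of our variable */
    int       type;       /* what kind of symbol it is */
    long long addr;       /* where the symbol is */
};

static struct Symbol symtab[1000];

/* Offset of symtab[i].type from the start of the table, computed by hand:
   skip i whole structures, then step to the field. */
size_t type_offset_by_hand(size_t i)
{
    assert(i < 1000);
    return i * sizeof(struct Symbol) + offsetof(struct Symbol, type);
}

/* The same offset, taken from the address the compiler computes. */
size_t type_offset_by_compiler(size_t i)
{
    assert(i < 1000);
    return (size_t)((char *)&symtab[i].type - (char *)symtab);
}
```

For any valid index the two functions agree, which is the whole point: array indexing and field selection are both just additions to a starting address.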

Pointing out Pointers

In reality we most often get at structure elements with the “points to” notation. If we declare a pointer variable

     struct Symbol *sp;

we can use it to loop over all the structures in the table with the following loop.

     for (sp = symtab; sp < &symtab[1000]; sp++)
          if (sp->type == 42)
               print("life, universe and everything\n");

Here we use the “address of” operator (the ampersand) to find the end of the array. We also see the “points to” operator (the “->” symbols) because we are using a pointer instead of a variable. The variable “sp” holds the offset in main memory of the structure, and ->type adds the offset of type to that value, giving the address in memory of our variable.
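A self-contained variation on that loop, for machines without Plan 9’s print, might look like the sketch below. The names NSYM, set_type, and count_type are assumptions of mine, and the function counts matching entries instead of printing a message for each one.

```c
#include <assert.h>
#include <stddef.h>

struct Symbol
{
    char      name[64];   /* the name of our variable */
    int       type;       /* what kind of symbol it is */
    long long addr;       /* where the symbol is */
};

#define NSYM 1000

static struct Symbol symtab[NSYM];   /* static storage starts out all zeros */

/* Set the type of entry i, using array and structure notation together. */
void set_type(size_t i, int type)
{
    assert(i < NSYM);
    symtab[i].type = type;
}

/* Count the entries of a given type by walking the table with a pointer. */
int count_type(int type)
{
    struct Symbol *sp;
    int n = 0;

    for (sp = symtab; sp < &symtab[NSYM]; sp++)
        if (sp->type == type)   /* sp->type means the same as (*sp).type */
            n++;
    return n;
}
```

After set_type(123, 42), a call to count_type(42) returns 1. Each sp++ moves the pointer forward by one whole structure, not by one byte.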

Cleaner Structs using Typedef

As an aside, when using C structures in Plan 9 we use a feature of C to make the structure type a real type. The “typedef” keyword allows us to add a type name to the compiler. It was invented to parameterize the first Unix port to a non-PDP-11 machine. But it was defined so well that it works for structures too. I would really write the above as

     typedef struct Symbol Symbol;

which defines a new type “Symbol” to be a structure with tag “Symbol.” I would then declare variables using just “Symbol” and avoid cluttering things up with the “struct” keyword everywhere, as in

     Symbol symtab[1000];
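Put together, a sketch of the typedef style could look like this. The helper symbol_init is hypothetical, added only to show the type name being used without the “struct” keyword.

```c
#include <assert.h>
#include <string.h>

typedef struct Symbol Symbol;   /* "Symbol" now works as a type name */

struct Symbol
{
    char      name[64];   /* the name of our variable */
    int       type;       /* what kind of symbol it is */
    long long addr;       /* where the symbol is */
};

/* Fill in one symbol. Names longer than 63 characters are truncated.
   (Hypothetical helper, not part of any library.) */
void symbol_init(Symbol *s, const char *name, int type, long long addr)
{
    assert(s != NULL && name != NULL);
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';   /* strncpy may not terminate */
    s->type = type;
    s->addr = addr;
}
```

A caller can then write “Symbol s;” followed by symbol_init(&s, "main", 9, 4096), with no “struct” in sight.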

Whew! That was a lot, but hopefully you have the idea. A block of variables we want to treat as a unit can be organized by collecting them into a structure definition. The memory model gives us a way to reduce the complexity to a simple unifying way to think about both structures and arrays.

For more information on structures, see Brian and Dennis’ book The C Programming Language. They explain this a whole bunch better than I just did. In my defense, no one explains technology as well as Brian.

Yaroslav Kolomiiets

Production Engineering Manager at Meta

9y

This makes a good read - thank you. What kind of Plan 9 do you use to run on Xeon - is it Nix or something? One note: in your example you stated that, assuming (long)ia==10000 the address of ia[42] is 10,042 which is not precisely that: the offset is multiplied by sizeof element, so having ia declared as an array of ints, (long)&ia[42] would evaluate to 10,168.
