INTERMISSION: Let's talk about linked lists and "container_of." (FREE LESSON)


What are these "intermission" things?

Nothing major -- in the midst of the subscriber-only content for this (and other upcoming) courses, occasionally I'll want to explain something that's so basic and fundamental that it really doesn't qualify as for-pay content, so I'll just toss it out and let everyone have at it. Eventually, I'll probably move it all back to the early free content when I completely rewrite this course. In any case, this tutorial is readable by all. Enjoy.

And why are we doing this?

Given the prevalence of doubly-linked lists in the Linux kernel, it's worth taking a minute to explain very, very clearly how those linked lists work since, I guarantee it, if you've never understood the implementation, you're going to be surprised. It's quite a clever implementation and, as far as I know, was developed specifically for the kernel to replace the various and incompatible implementations that had been scattered throughout the source. So ... to work.

OBLIGATORY TWEAK: I'm sure Linux kernel guru Robert Love will never forgive me for this but when I first encountered the earlier second edition of his "Linux Kernel Development," and read the section where he explained linked lists, I eventually dropped him a short note along the lines of, "Uh ... no." And I'm happy to say, his 3rd edition now explains it correctly. That just shows you how different this implementation really is.

Note: For the remainder of this tutorial and for the sake of brevity, I'll just use the word "list" to refer to a doubly-linked list. And I'm still not sure this is the best way to explain kernel linked lists but let's see how well this works.

MORE ADDENDUM: Turns out that Love's 3rd edition still isn't totally accurate, and I cover that here. Time for a rewrite of this page, I think.

What does a list normally look like?

Assuming that everyone has at least used a list at one time or another, the standard implementation is fairly straightforward -- each element in the list is a structure with three fields:

  • a "next" pointer,
  • a "prev" pointer, and
  • some data object to represent the "payload".

All of that should be trivially recognizable, and the only variation is normally whether the data payload at each position in the list is defined specifically for a given data type, or whether it's just a generic void* C pointer that points at the payload.

In addition, sitting above the list will be a single pointer that refers to the "head" of the list (or the first element, depending on how you want to define where the list "begins"), where that pointer will be NULL if the list is currently empty (has no elements).

And, finally, list operations should be obvious -- given the address of a list, you can iterate through the list elements, add and delete elements, and so on. Absolutely nothing exciting here, it should all look familiar. So far, so good?

So how does the kernel handle this?

Kernel linked lists are based primarily on the following simple data structure defined in include/linux/list.h to hold the elements of a list together:

struct list_head {
        struct list_head *next, *prev;
};

And now you're thinking -- what the hell? It's a structure with two pointers (forward and back), but where's the payload? Where's the data? And here's the novelty.

Unlike the lists you're used to, where the pointers that hold the list together live outside of the data payload you're tracking, kernel linked lists are implemented by defining that struct list_head structure as part of each payload object that you want to keep track of, as in:

struct my_data {
    int this;
    long that;
    char the_other_thing;
    struct list_head lh;
    ... snip ...
};

So for each object that you want to store on a list, you need to define -- as part of the object itself -- a struct list_head data member, whose job it is to simply point to the previous and next data objects on the list. Which seems confusing but let's finish the picture.

You're also going to need a global (as in, outside the list) object to refer to the list itself just so you know how to get to it, and that's also represented by a struct list_head object which starts off being, well, "empty" until you start adding elements to your list. But what does it mean to initialize your list to be "empty" with that data structure?

In that same header file, you see this chunk of code:

struct list_head {
        struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define LIST_HEAD(name) \
        struct list_head name = LIST_HEAD_INIT(name)

static inline void INIT_LIST_HEAD(struct list_head *list)
{
        list->next = list;
        list->prev = list;
}

which means that a totally empty list is represented by a struct list_head object, both of whose prev and next pointers refer to itself. So, in a funny way, even an empty list needs that out-of-list, global object to show that there's nothing on the list.

On the other hand, if the list does contain elements, you could iterate through the list by simply following the pointers from one list_head structure to the next until you returned to the beginning -- that would represent a full traversal of your doubly-linked list. But if you're still confused about how you get your data, there's a reason for that.

How do you get your payload?

If you've understood everything so far, the one thing that should still be confusing is that, if you start with the "beginning" of a list and simply follow pointers, all you're doing is moving from one list_head structure to the next, which is terrific except that, at each location, what you want is the corresponding data payload. And since you've buried your list pointers inside your actual data, how do you, in a sense, back out to get the address of the payload? And that's where the container_of macro comes in, as defined in include/linux/kernel.h:

/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr:        the pointer to the member.
 * @type:       the type of the container struct this is embedded in.
 * @member:     the name of the member within the struct.
 *
 */
#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})

What the above macro does is, given the address of a structure member, the name of that member, and the type of the enclosing structure, it hands you back the address of the enclosing structure. (NOTE to self: probably need to expand on this. Shortly.) For people used to user space C programming, note that the macro is based on the offsetof macro you might have seen before, at which point you're starting to see how this works.

In order to iterate through a list, you'll start with the list_head structure that represents the list itself. If its prev and next pointers both refer to itself, that represents an empty list and you're done.

On the other hand, if they're different, that means there's at least one element on the list, so you can do a normal iteration, following pointers from one struct list_head to the next; at each step, you use container_of() on the address of that simple list_head object to get its enclosing payload object, at which point you can do whatever you want with the payload, then move on to the next element in the list. You're done when you've returned to the "head" of the list.

Let's summarize

In order to pull all this together, let's briefly summarize what just happened so you can appreciate how different this implementation is from what you're used to.

  • Rather than the standard list implementation where you have data payloads and the pointers keeping track of them living outside the payload, the kernel implementation requires you to add prev and next list pointers as part of your payload. In short, data objects to be tracked on a list must carry around their own list pointers.
  • Traversing a list means simply following the list_head pointers from one to the next, at each stage having to determine the address of the actual payload object that it's embedded in.
  • Even an empty list is represented by a single list_head object, both of whose pointers refer to itself.
  • The list_head instance that represents the head of a list does not represent a payload. In other words, if you have a list with nine data elements, it will use 10 list_head objects, since the official head of a list is never meant to represent any actual data -- it's just your starting point.

A simple example

As an actual example, let's see how all of the Linux kernel tasks are tracked since (unsurprisingly) they're kept on a list.

Consider this snippet from arch/x86/kernel/init_task.c, which defines the very first system task:

struct task_struct init_task = INIT_TASK(init_task);
EXPORT_SYMBOL(init_task);

OK, so that's your very first system task. And what does that have to do with a list? Check out this declaration of the task_struct structure from include/linux/sched.h, which represents a single task:

struct task_struct {
        volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
        void *stack;
        atomic_t usage;
        unsigned int flags;     /* per process flags, defined below */
        unsigned int ptrace;
        ... snip ...
        struct list_head tasks;   <-- there!
        ... snip ...
};

As long as I've read this correctly, each existing kernel task has, embedded inside it, a list_head structure that's used to keep it on the single, kernel-wide, list of task structures. So you could, in kernel space, iterate through every task by starting at the initial task address, then following the list pointer to the next one until you returned to the beginning.

Exercise for the reader: I'm fairly sure I've run across an example in the kernel of some code traversing the entire task list, but I don't recall where I saw it. Anyone? Anyone? Bueller?

And in conclusion ...

I'm fairly sure I can add a couple more sections to this tutorial to make it even better but that will have to wait until later today. In the meantime, if you think this was useful, feel free to tweet it or link to it or whatever.

Later.

P.S. Feel free to leave a comment if you have any questions or observations.

ADDENDUM: Some additional observations

A couple more things about the consequences of how the kernel implements linked lists.

First, there is a massive collection of macros and inline routines that simplify linked list usage, testing and traversal in the kernel header file include/linux/list.h. For example:

static inline int list_empty(const struct list_head *head)
{
        return head->next == head;
}

That's clearly a routine that tests whether a linked list is empty, and you can see that "empty" is defined as whether that list_head's next pointer points back at the head itself. By definition, that means there are no list entries.

Other macros and routines make traversal easy, such as:

#define list_entry(ptr, type, member) \
        container_of(ptr, type, member)

#define list_for_each(pos, head) \
        for (pos = (head)->next; prefetch(pos->next), pos != (head); \
                pos = pos->next)

and so on. If you want to see how common linked list usage is in the drivers/ directory:

$ grep -rw list_for_each_entry drivers | wc -l
1981
$

so, clearly, this is how almost everyone implements their linked lists.

Another observation that is obvious once it's pointed out is that any data object or payload that you want to track on a list must explicitly define an internal list_head object for that list. More to the point, a payload is welcome to participate on several lists, but only after defining an internal list_head object for each list. In other words, unlike the payloads you're used to that contain only pure data and could (theoretically) be on an infinite number of lists at once, objects on kernel lists can be only on those lists for which they have an internal set of links. It's an interesting way to lock down what lists you're willing to be a member of.

And it should be obvious that the above design, while initially looking a bit strange, gives you a single, simple implementation that is usable by everyone.

Finally, there's some rationale for the design here.

I'm sure I can go back and tidy up this presentation even further but this should be sufficient for now. Again, there's a comments section below if you want to leave a note.

