Updates to LKD3
From Crashcourse Wiki
What's this page all about?
This is an unofficial page documenting various errors in, omissions from or changes to "Linux Kernel Development" (3rd ed) by Robert Love, or what I call "LKD3" as oopposed to the earlier "LKD2" edition. I'm throwing this open as a crowdsourcing project so if you want to play along, keep reading.
Not surprisingly, the instant any Linux book is published (especially any Linux kernel book), it slowly but inexorably starts to get out of date and, in the event there is ever an "LKD4", it would be nice to have a comprehensive list of necessary corrections and updates, as well as just being nice to have a list of what's changed in the book, anyway. And the things I'd like to document here fall into a few different categories.
First, there are things that should have been updated since LKD2 but were simply and mistakenly overlooked and, as I was the technical editor, I'd have to take the blame for that. As technical editor, it was my job to dig into the source and make sure it matched the manuscript, but without question, I missed some things so those should be pointed out.
The second category to document are changes that have occurred since my editing and the book publication and, as time goes on, there will certainly be more and more of that. As a simple example, consider this note: Updates_to_LKD3#The_atomic_t_type_loses_its_volatile. It seems like a simple enough change, and it's worth documenting, but let's use that as a demonstration of how you can dig into these changes.
Once you notice a discrepancy between the book and the latest kernel source, you might be able to use the git blame feature to see which git commit was responsible for that change:
$ git blame include/linux/types.h ... snip ... 81880d60 (Anton Blanchard 2010-05-17 14:34:57 +1000 213) int counter; ... snip ...
And once you have the commit, you can list it in its entirety with:
$ git show 81880d60It may not be essential to document every change to this extent, but perhaps it doesn't hurt to understand the rationale behind such things.
As another example, on p. 192 of LKD3, there's a reference to the alleged mutex macro DECLARE_MUTEX, which doesn't exist anymore, so a note should be (and will be) made of that.
Finally, there are things that can be documented that aren't so much errors or updates so much as features that deserve some (or more) mention. These could be features that are either new since LKD3 came out, or are features that have been there for a while but weren't mentioned in the book at all, but perhaps deserve to be.
In any event, if you're feeling ambitious, you can get your own copy of LKD3 and start perusing it, and if you find anything that you think deserves mention, drop me an e-mail at rpjday@crashcourse.ca with as much info as you can pull together, and I'll arrange that it gets added in the appropriate place with suitable credit given to the reporter.
Simple errors/typoes
Figure 3.2 (p. 26)
"struct thread_struct" should read "struct thread_info"
(Reported by Julie Sullivan, kernelnewbies list)
Page 94, reference to inode_find_handle()
Apparently, no such routine exists anymore, and hasn't for quite some time, so pick another example.
New kernel features/updates since book publication, by chapter
Introduction to the Linux Kernel
Getting Started with the Kernel
Kernel configuration with Qt-based make xconfig (p. 14)
In addition to mentioning the Gtk-based make gconfig command, it's worth mentioning the Qt-based make xconfig.
Installing the New Kernel (p. 16)
A couple observations here. First, with GRUB version 2, the configuration file is now /boot/grub/grub.cfg and you wouldn't edit it manually, you would instead install the new kernel and modules and run update-grub, as you would do these days on newer versions of Ubuntu.
Also, rather than just install the new modules with:
# make modules_install
it's worth knowing that you can save piles of disk space by stripping those module files during installation with:
# make INSTALL_MOD_STRIP=1 modules_install
Usage of inline (pp. 18-19)
This is currently a bit of a mess, with usage of all of inline, __inline and __inline__, as well as the usage of __always_inline and now, in the drivers/staging directory, defines of the form:
#define INLINE __inline
as if things weren't confusing enough. If this is ever clarified, it really needs to be explained.
Process Management
Process Scheduling
ENQUEUE_MIGRATE is now ENQUEUE_WAKING (p. 54)
Change in kernel/sched_fair.c based on:
commit 371fd7e7a56a5c136d31aa980011bd2f131c3ef5
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Mar 24 16:38:48 2010 +0100
sched: Add enqueue/dequeue flags
In order to reduce the dependency on TASK_WAKING rework the enqueue
interface to support a proper flags field.
Replace the int wakeup, bool head arguments with an int flags argument
and create the following flags:
ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task,
ENQUEUE_WAKING - the enqueue has relative vruntime due to
having sched_class::task_waking() called,
ENQUEUE_HEAD - the waking task should be places on the head
of the priority queue (where appropriate).
For symmetry also convert sched_class::dequeue() to a flags scheme.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This section probably deserves some expansion.
(Reported by Julie Sullivan, kernelnewbies list.)
System Calls
Kernel Data Structures
Linked lists
What makes the kernel linked list unique?
Personally, I think there's a different way to explain what makes the kernel linked list implementation so novel. The standard implementation defines the linked list node with one or two pointers, and typically a void * pointer to the payload -- that is, the actual data to be tracked. This has a couple fairly obvious consequences.
First, and trivially, the nodes of the linked list structure live outside of the payload; the payload data itself for each list node exists elsewhere and the list nodes are simply keeping track of that data. And given that the payload data is exactly that -- pure data -- it's easy to see that any single payload is capable of being on an unlimited number of different lists. All of that should be obvious.
The kernel implementation changes this drastically by adding the pointer/link structure:
struct list_head {
struct list_head *next;
struct list_head *prev;
};
into each object you want to track, as in:
struct fox {
... all my data member fields ...
struct list_head animals;
};
and this has two obvious consequences.
The first is that any data object you want to store on a list has to carry with in, in its own structure definition, a structure that links it to another payload object on that list. No longer do lists keep track of payload objects elsewhere; instead, payload objects embed their links inside their definition.
There is another consequence that isn't quite so obvious, until it's pointed out. Unlike the original data-only payloads, since each data object now needs to embed its link information, those payload objects can no longer participate on an unlimited number of lists. Instead, each payload has to explicitly define a struct list_head member field for each list of which it wants to be a member. In short, by simply looking at a structure definition, you can tell what lists that structure is allowed to be on.
A perfect example is from <linux/sched.h>, in the task_struct structure, where you can see just a subset of the number of different lists that this structure is supposed to be on:
struct task_struct {
... snip ...
struct list_head rcu_node_entry;
struct list_head tasks;
struct list_head children;
struct list_head sibling;
... etc etc ...
};
As you can see, given that a potential payload data object has to embed its own sibling pointer(s), a structure will have to explicitly define a member field for each list on which it wants to participate.
What's so special about a linked list "head" node?
It's important to understand how the head node of a kernel linked list works. Every linked list is defined by a head node that, while it is part of the normal pointer structure -- that is, it's on the list -- it does not represent any data; that is, it is not enclosed by any payload structure. This means that, when you're iterating through a linked list, jumping from one struct list_head to the next, you must explicitly know which of those nodes represents the head of the list so that you can avoid processing it.
That this is true can be seen from the definition of the list_for_each() routine in list.h:
#define list_for_each(pos, head) \
for (pos = (head)->next; prefetch(pos->next), pos != (head); \
pos = pos->next)
Note how, given the head of a list and a variable to be used for the iterator, that iteration clearly starts processing list elements after the head node, and terminates at the final list element before the head node, thereby avoiding processing that head node.
Another example that drives this home can be seen in the source file net/sctp/associola.c, where an infinite list iteration loop clearly has to skip the head node each time it runs across it:
while (1) {
/* Skip the head. */
if (pos->next == head)
pos = head->next;
else
pos = pos->next;
... snip ...
An empty list is defined by a head node whose next pointer simply points back to itself, as you can see from list.h:
/**
* list_empty - tests whether a list is empty
* @head: the list to test.
*/
static inline int list_empty(const struct list_head *head)
{
return head->next == head;
}
One last obvious observation: you must always keep track of the head node of a list. If you forget which node in a list is the head node, there's no way to recover that information and bad things will happen.
Incorrect description of head nodes on pp. 90-91
If you accept all of the above, it seems that the way head nodes are described in LKD3 is inaccurate. On p. 90, we read that a head node is "a special pointer that refers to your linked list, without being a list node itself." But as we've already seen, a head node is a member of the list; it just needs to be recognized as the head node so that it is never dereferenced to return its (non-existent) data container.
Page 91 also seems wrong, stating that, "because the lists are circular, you can generally pass any element for head." That can't be correct -- it's critically important to never lose track of where the head node of a list is, and to treat it differently from the rest of the nodes.
Page 93 could also use some tweaking since it claims that traversing a linked list iterates until each entry has been visited," whereas I'd add the qualifier that that doesn't include the head node.
The list_is_last() macro
Dave Hylands points out another useful macro defined in list.h that tells you whether you're at the last element in a list just before the head node:
static inline int list_is_last(const struct list_head *list,
const struct list_head *head)
{
return list->next == head;
}
The definition of that macro should be obvious, but one wonders how useful it is since the standard list iteration routines will stop at that point, anyway. Obviously, it would make sense if there's something you need to do immediately upon recognizing that you're at the end of a list that can't wait for after the body of the loop.
(As an aside, Boolean tests like that really should be defined to return Boolean values, not ints.)
Sorting loop entries
Another feature I'd never noticed before is the ability to "sort" the entries in a loop, as defined in list_sort.h:
#include <linux/types.h>
struct list_head;
void list_sort(void *priv, struct list_head *head,
int (*cmp)(void *priv, struct list_head *a,
struct list_head *b));
There doesn't appear to be a lot of use of that facility, so one wonders how much kernel code is sorting list entries manually, unaware that that feature exists.
As one possible example, consider this snippet from drivers/base/bus.c:
static void
device_insertion_sort_klist(
struct device *a,
struct list_head *list,
int (*compare)(const struct device *a,
const struct device *b))
{
struct list_head *pos;
struct klist_node *n;
struct device_private *dev_prv;
struct device *b;
list_for_each(pos, list) {
n = container_of(pos, struct klist_node, n_node);
dev_prv = to_device_private_bus(n);
b = dev_prv->device;
if (compare(a, b) <= 0) {
list_move_tail(&a->p->knode_bus.n_node,
&b->p->knode_bus.n_node);
return;
}
}
list_move_tail(&a->p->knode_bus.n_node, list);
}
I'm wondering if that's not somehow rewritable using the list sort library routine.
Debugging list manipulation with CONFIG_DEBUG_LIST
There is a kernel CONFIG variable CONFIG_DEBUG_LIST that adds a certain level of debugging to kernel linked lists, but I have to admit I'm a bit confused by it. Here's a snippet from list.h:
#ifndef CONFIG_DEBUG_LIST
static inline void list_del(struct list_head *entry)
{
__list_del(entry->prev, entry->next);
entry->next = LIST_POISON1;
entry->prev = LIST_POISON2;
}
#else
extern void list_del(struct list_head *entry);
#endif
Obviously, if you don't select that option, the implementation of list_del() is fairly simple, whereas if you do select it, you get this implementation from lib/list_debug.c:
void list_del(struct list_head *entry)
{
WARN(entry->next == LIST_POISON1,
"list_del corruption, next is LIST_POISON1 (%p)\n",
LIST_POISON1);
WARN(entry->next != LIST_POISON1 && entry->prev == LIST_POISON2,
"list_del corruption, prev is LIST_POISON2 (%p)\n",
LIST_POISON2);
WARN(entry->prev->next != entry,
"list_del corruption. prev->next should be %p, "
"but was %p\n", entry, entry->prev->next);
WARN(entry->next->prev != entry,
"list_del corruption. next->prev should be %p, "
"but was %p\n", entry, entry->next->prev);
__list_del(entry->prev, entry->next);
entry->next = LIST_POISON1;
entry->prev = LIST_POISON2;
}
What's confusing is that, even without that kernel parameter, the regular implementation of that routine poisons the deleted node's link pointers. Why?
Queues/kfifo (MORE TO COME)
I may have more to say here, but it's worth pointing out that the kernel source tree now has a samples/kfifo directory.
Yes, there's a fair bit more to say about queues that isn't in LKD3. As a single example, there's a macro DEFINE_KFIFO() that combines the functionality of DECLARE_KFIFO() and INIT_KFIFO(), as well as more routines, so this section can be expanded considerably.
Maps (TO DO)
Binary trees (TO DO)
Interrupts and Interrupt Handlers
Bottom Halves and Deferring Work
An Introduction to Kernel Synchronization
Since this chapter is simply an introduction to the principles of kernel synchronization, I'm not sure there's much here that will need updating. The actual kernel mechanisms for concurrency and synchronization are in the next chapter.
Kernel Synchronization Methods
The atomic_t type loses its volatile
On p. 176, the typedef for atomic_t is shown as:
typedef struct {
volatile int counter;
} atomic_t;
However, these days, that same typedef is in the header file <linux/types.h> as simply:
typedef struct {
int counter;
} atomic_t;
If you want the rationale behind the change:
$ git show 81880d60
commit 81880d603d00c645e0890d0a44d50711c503b72b
Author: Anton Blanchard <anton@samba.org>
Date: Mon May 17 14:34:57 2010 +1000
atomic_t: Remove volatile from atomic_t definition
... snip ...
DECLARE_MUTEX and init_MUTEX don't exist anymore
On pp. 192-3 in the section on semaphores, there are references to DECLARE_MUTEX, init_MUTEX and init_MUTEX_LOCKED that no longer exist. These have been replaced by DEFINE_MUTEX.
It might be worth breaking out a section on mutexes all on its own, separate from general semaphores, rather than having it a bit scattered as it is now.
Timers and Timer Management
"A Tickless OS"
On p. 212, the side note refers to the option to configure a tickless kernel, CONFIG_HZ. That should probably read CONFIG_NO_HZ.
On a more general note, given that tickless behaviour appears to be the standard these days, I suspect a good chunk of this chapter could be rewritten to reflect that.
Memory Management
It looks like there can be a fair bit added here so this is just a first pass; it will almost certainly be reorganized as more content is added.
Pages (pp. 231-2)
I have to admit, I'm still a little puzzled as to how the count fields work in a struct page, especially with the difference between head and tail, and compound pages defined in <linux/page-flags.h>. This will require more reading.
Zones (p. 233)
That section in the book describes four "primary" memory zones defined in <linux/mmzone.h> -- ZONE_DMA, ZONE_DMA32, ZONE_NORMAL and ZONE_HIGHMEM -- then refers to "two other, less notable ones." But, AFAICT, the only other zone defined in that header file is ZONE_MOVABLE; as in, just the additional one.
Is there really just that single additional zone? And might it be worth explaining, if only briefly, even if there is no further reference to it?
The Virtual Filesystem
The Block I/O Layer
Anticipatory scheduler (pp. 302-3)
Now gone:
commit 492af6350a5ccf087e4964104a276ed358811458
Author: Jens Axboe <jens.axboe@oracle.com>
Date: Sat Oct 3 09:37:51 2009 +0200
block: remove the anticipatory IO scheduler
AS is mostly a subset of CFQ, so there's little point in still
providing this separate IO scheduler. Hopefully at some point we
can get down to one single IO scheduler again, at least this brings
us closer by having only one intelligent IO scheduler.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
(Reported by Mulyadi Santosa, kernelnewbies list.)
The Process Address Space
The Page Cache and Page Writeback
Devices and Modules
Debugging
kdb
Yes, we probably should have a short discussion of kdb here. Coming soon ...
Portability
Patches, Hacking and the Community
Things that deserve more emphasis
This section is just my personal perspective on what beginning kernel programmers should know more about; not directly related to the book.
Useful macros in <linux/kernel.h> and other header files
Way too many programmers re-invent the wheel in terms of simple macros for alignment, integer values, determining the size of an array and so on. Take some time and peruse that header file to see what's already defined for you.
Other header files in that directory that might be useful:
- log2.h, for power-of-2 macros
- ... more as I think of them ...
getconf
I might be tempted to demonstrate the use of the getconf command fairly early. It can then be used later on to do things like display the machine page size with:
$ getconf PAGESIZE
or everything with:
$ getconf -a

