- Introduction: Lecture Note 1
- LAMP, virtualization, containers, datacenter computing infrastructure
- Review of Intel architecture based on Intel 64 and IA-32 Architectures
Software Developer's Manual (4684 pages! as of 9/1/2020)
- Short recap of Intel and AT&T Linux assembly
- "Hello, World!" in Intel assembly, Linux assembly, and inline assembly in C
- Setting up LXR (Linux cross referencer http://lxr.sourceforge.net/en/index.php) on your laptop
- Preparatory steps: Lecture Note 2
- Compiling the kernel, which will take hours the first time
- Module programming - see The Linux Kernel Module Programming Guide
- Writing your own system calls, adding to the syscall_64.tbl
- Booting - machine BIOS, disk MBR, GRUB Linux loader, preliminary setup (setup(), startup_32/64() 1 and 2)
- Overview of kernel startup and initialization - start_kernel()
- Memory - initialization
- Overview of memory spaces: logical segmentation, linear virtual, actual physical
- Detecting BIOS-provided e820 physical RAM map: detect_memory()
- Converting to memblocks: setup_arch(): e820__memory_setup(), e820__end_of_ram_pfn(), e820__memblock_setup(), init_mem_mapping(), initmem_init()
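A minimal userland sketch of the idea behind the e820 steps above, not kernel code. The sample map and the helper names end_of_ram_pfn() and usable_ranges() are hypothetical; only the type codes (1 = usable RAM, 2 = reserved) and the 4 KiB page size follow the x86 convention. The real e820__end_of_ram_pfn() also clamps the result to an architecture limit, which is omitted here.

```python
PAGE_SHIFT = 12          # 4 KiB pages on x86
E820_TYPE_RAM = 1        # usable RAM
E820_TYPE_RESERVED = 2   # reserved by firmware

# Hypothetical firmware map: (start address, size in bytes, type).
SAMPLE_MAP = [
    (0x00000000, 0x0009FC00, E820_TYPE_RAM),
    (0x0009FC00, 0x00000400, E820_TYPE_RESERVED),
    (0x00100000, 0x3FEF0000, E820_TYPE_RAM),
    (0x3FFF0000, 0x00010000, E820_TYPE_RESERVED),
]


def end_of_ram_pfn(entries):
    """Return one past the highest page frame number covered by usable RAM."""
    last = 0
    for start, size, kind in entries:
        if kind != E820_TYPE_RAM:
            continue  # skip reserved, ACPI, and other non-RAM ranges
        last = max(last, (start + size) >> PAGE_SHIFT)
    return last


def usable_ranges(entries):
    """Keep only RAM entries as memblock-style (base, size) pairs."""
    return [(start, size) for start, size, kind in entries
            if kind == E820_TYPE_RAM]


if __name__ == "__main__":
    print(hex(end_of_ram_pfn(SAMPLE_MAP)))
    for base, size in usable_ranges(SAMPLE_MAP):
        print(hex(base), hex(size))
```

The (base, size) pairs play the role of the regions handed to memblock; everything else in the map stays out of the allocator's reach.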
- Memory - Page allocator - paging, buddy system, setting up page directories (global, upper, middle), tables and PTEs
- (N)UMA, nodes, zones (DMA .. Normal), memory types (Unmovable .. CMA .. Isolate), free_areas
- Physical memory models: flatmem, discontigmem, and sparsemem; mem sections and subsections, pageblocks
- Setting up buddy system: x86_init.paging.pagetable_init(), paging_init(), zone_sizes_init()
- Allocating 1 to 1K contiguous pages from buddy system: __get_free_pages() to __rmqueue_smallest(), fastpath to slowpath to very slow path with swapping and compaction
- Freeing pages: free_pages() to __free_one_page(), fast path to slow path
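The arithmetic behind the buddy system can be sketched in a few lines of userland Python. This is an illustration under stated assumptions, not the kernel's implementation: MAX_ORDER = 11 is assumed so that blocks range from 1 to 1K pages, and the function names are hypothetical. The XOR trick for locating a buddy is the standard one: two buddies of a given order differ only in bit `order` of their page frame number.

```python
MAX_ORDER = 11  # assumed: orders 0..10, i.e. blocks of 1 to 1024 pages


def order_for_pages(nr_pages):
    """Smallest order whose block of 2**order pages can hold nr_pages."""
    if nr_pages < 1:
        raise ValueError("need at least one page")
    order = (nr_pages - 1).bit_length()
    if order >= MAX_ORDER:
        raise ValueError("request exceeds the largest buddy block")
    return order


def buddy_pfn(pfn, order):
    """Page frame number of the buddy of the block starting at pfn."""
    return pfn ^ (1 << order)


def merged_pfn(pfn, order):
    """Start of the combined block after merging a block with its buddy."""
    return pfn & ~(1 << order)


if __name__ == "__main__":
    print(order_for_pages(3))   # a 3-page request is served from a 4-page block
    print(buddy_pfn(12, 2))     # buddy of the 4-page block at pfn 12
    print(merged_pfn(12, 2))    # the merged 8-page block starts here
```

Freeing walks this merge step upward: as long as the buddy is also free, the two blocks combine and the order increases by one.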
- Memory - Slab allocator - setting up kmalloc_caches (slabs) for small objects of 8 bytes to 8 KB, through to large non-contiguous memory spaces
- Setting up slabs for small memory objects: mm_init() to kmem_cache_init() for initializing the general-purpose kmalloc_caches
- Allocating small memory chunks: kmalloc() of 8 bytes to 8 KB from slabs: fastpath to slowpath to very slowpath
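A simplified sketch of how a kmalloc()-style request maps to a size class, assuming power-of-two classes from 8 bytes to 8 KB as stated above. The function names are hypothetical, and real kernels typically add non-power-of-two caches (such as 96 and 192 bytes) that this model leaves out.

```python
KMALLOC_MIN = 8          # smallest size class, in bytes
KMALLOC_MAX = 8 * 1024   # largest size class covered here, in bytes


def size_class(nbytes):
    """Round a request up to the next power-of-two size class."""
    if not 0 < nbytes <= KMALLOC_MAX:
        raise ValueError("request outside the modeled kmalloc range")
    size = KMALLOC_MIN
    while size < nbytes:
        size *= 2
    return size


def internal_waste(nbytes):
    """Bytes lost to rounding when nbytes is served from its size class."""
    return size_class(nbytes) - nbytes


if __name__ == "__main__":
    for request in (1, 9, 100, 8192):
        print(request, size_class(request), internal_waste(request))
```

The rounding loss is the price of keeping every object in a slab the same size, which is what makes the fast path a simple pop from a free list.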
Test 1, 4-5:15 pm, Thur, 2/16/2023.
- Memory - Slab allocator - continues
- Freeing small memory chunks: kfree() to return to slabs, fast to slow path
- User malloc(): if time permits, will discuss briefly or provide pointers for reading on your own, as it is simple compared to __get_free_pages() and kmalloc()
- Process - structures, organization, initialization
- Structures: thread union, thread info, stack, task, and thread struct, PID0 (swapper)
- Macros to initialize PID0: INIT_THREAD_INFO(init_task), INIT_TASK(), INIT_TASK_TI(), ...
- Initial hardcoded structs: init_task, init_stack, init_mm, init_fs, etc.
- Process - creating the first five kernel threads
- Creating kernel threads: kernel_thread() - copy and insert to RB tree, create_kthread(), hot plug thread
- P0 creating PID1 (init) and PID2 (kthreadd) using kernel_thread()
- P1 and P2 creating PID3 (ksoftirqd), PID4 (migration) using create_kthread() through kthread_create_list
- Hot plug threads for creating P3 and on using kthread
- Sync mechanisms between P0, P1, and P2 for creating P1 and P2; and between (P1,P2) and (P3,P4,...) for creating P3 and on.
- Process - process scheduling (do_fork() to schedule() to rb_entry())
- Priority, nice value, weight, delta, weighted delta, actual runtime, virtual runtime, minimum vruntime, scheduling period, schedule slice
- schedule() - configurable scheduling policies: completely fair scheduler CFS, realtime RT, deadline DL
- IF TIME PERMITS, brief intro to asymmetric multiprocessing using PELT (Per Entity Load Tracking) and/or WALT (Window Assisted Load Tracking) for big.LITTLE-style architectures such as Apple M1 and Intel 16-core Alder Lake (8 performance and 8 efficient cores)
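The relationship among nice value, weight, weighted delta, and schedule slice under CFS can be sketched as follows. This is a userland model, not kernel code: the function names are hypothetical, and WEIGHTS holds only a handful of entries taken from the kernel's nice-to-weight table (nice 0 maps to 1024, and each nice step changes the weight by roughly 25%).

```python
NICE_0_LOAD = 1024  # weight of a nice-0 task

# A few entries from the nice-to-weight table; the full table has 40 entries.
WEIGHTS = {-20: 88761, -5: 3121, 0: 1024, 5: 335, 19: 15}


def weighted_delta(delta_exec_ns, nice):
    """Virtual runtime charged for delta_exec_ns of actual runtime."""
    return delta_exec_ns * NICE_0_LOAD // WEIGHTS[nice]


def slice_ns(period_ns, nice, runqueue_nices):
    """Share of the scheduling period given to a task with this nice value."""
    total_weight = sum(WEIGHTS[n] for n in runqueue_nices)
    return period_ns * WEIGHTS[nice] // total_weight


if __name__ == "__main__":
    one_ms = 1_000_000
    for nice in (-5, 0, 5):
        print(nice, weighted_delta(one_ms, nice))
    print(slice_ns(6 * one_ms, 0, [0, 0]))
```

A heavier (lower nice) task accumulates virtual runtime more slowly for the same actual runtime, so it stays toward the left of the red-black tree and is picked more often.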
- Process - process scheduling and switching (schedule() to __switch_to())
- Scheduling processes with red-black tree: pick_next_task(), put_prev_task()
- Context switches
- Switching to suspended process
- Interrupts - overview: PICs, APICs, exceptions (traps) and hard interrupts, IDTs, soft interrupts, ksoftirqd
- Hardware organization: Programmable Interrupt Controller, interrupt vectors, CPU interrupts and interrupt acknowledge to interrupt handlers
- Initialization: sort_main_extable(), trap_init(), init_IRQ(), softirq_init(), initializing IDTs, irq_desc and softirq_vec
- Exceptions: exception handlers, do_trap() to do_exit() if no trap handler
Test 2, 4-5:15 pm, Thur, 3/30/2023.
- Interrupts - hard interrupts
- Registering interrupt handlers request_irq() to do_IRQ() to handle_level_irq(), edge vs level trigger
- IDT handler asm_common_interrupt, asm idtentry_irq, asm error_entry
- C common_interrupt, device driver action->handler(), raising softirq invoke_softirq(), softirq_vec action
- Soft interrupts: softirq_init(), ksoftirqd
- Hardirq (schedule,producer,top half) vs softirq (action,consumer,bottom half)
- ksoftirqd (kernel thread 3, P3), do_softirq(), softirq action, run_timer_softirq() as an example
- Timer interrupts, soft interrupts, ksoftirqd, run_timer_softirq()
NOTE: By now, I find that most of you are completely overwhelmed. That is expected, because memory, processes, and interrupts are the three pillars of the kernel, which consists of well over 20 million lines of C and assembly. But the two topics below, file systems and networking, are exciting because they are two applications of what you have learned so far about the kernel. They are complex and difficult. More often than not, I find myself discussing only one of them, based on your preference. So be warned.
- File system - virtual file system, block IO, elevator scheduler, device driver, softirq, timer, delayed work, kblockd_workqueue
Note: Kernel-level file system is a semester course. Covering file system in a week or two is simply impractical.
- initialization: vfs_caches_init, mnt_init: Virtual file system VFS
- Registering, mounting
- Scheduler - completely fair queuing
- File system - device driver, Ext4 example (vfs_read() to scsi_dispatch_cmd())
- submit_bio(), scheduler
- The big loop in time and space: timer, softirq, delayed work, kblockd_workqueue, kworkers
- Ext4 disk organization: MBR, superblock, group descriptors, bitmaps, inode table, data blocks
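As a small worked example of the Ext4 layout listed above, the sketch below locates an inode within the block groups using the usual ext4 arithmetic: inode numbers start at 1, and each group's inode table holds a fixed number of inodes. The function name locate_inode() and the sample geometry (8192 inodes per group, 256-byte inodes) are assumptions for illustration; a real file system reads these values from its superblock.

```python
def locate_inode(ino, inodes_per_group, inode_size):
    """Return (block group, index in that group's inode table, byte offset)."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)
    return group, index, index * inode_size


if __name__ == "__main__":
    # Assumed geometry; inode 2 is the root directory in ext4.
    print(locate_inode(2, 8192, 256))
    print(locate_inode(8193, 8192, 256))
```

The group number selects a group descriptor, which in turn gives the starting block of that group's inode table; the byte offset is relative to that start.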
- Networking - receiving packets
Note: Kernel-level networking is a semester course. There is even a networking kernel book. Covering kernel-level networking in a week or two is simply impractical.
- receiving packets: softnet data, input_pkt_queue, process_queue, budget
- receiving packets: NIC, ISR, Softirq, IP, TCP, Inet, BSD, User
- User BSD sockets read(), tcp_recvmsg(), Inet socks, TCP and IP layer
- Softirq: net_rx_action(), ip_rcv(), tcp_v4_rcv()
- Networking - sending packets
- sending packets: output_queue, User, BSD, Inet, TCP, IP, Softirq, Qdisc, ISR, NIC
- User BSD sockets, Inet socks, TCP and IP layer
- Qdisc - p/bfifo packet/byte based FIFO queueing discipline, ISR interrupt service routine, NIC
- Softirq - dequeue, transmit, ISR interrupt service routine, NIC, requeue, delayed timer
- Final exam (week 15):
See the registrar's page: http://www.njit.edu/registrar