releases/2.6.30.6/tracing-fix-too-large-stack-usage-in-do_one_initcall.patch - pub/scm/linux/kernel/git/stable/stable-queue - Git at Google

 From 4a683bf94b8a10e2bb0da07aec3ac0a55e5de61f Mon Sep 17 00:00:00 2001
 From: Ingo Molnar <mingo@elte.hu>
 Date: Fri, 21 Aug 2009 12:53:36 +0200
 Subject: tracing: Fix too large stack usage in do_one_initcall()

 From: Ingo Molnar <mingo@elte.hu>

 commit 4a683bf94b8a10e2bb0da07aec3ac0a55e5de61f upstream.

 One of my testboxes triggered this nasty stack overflow crash
 during SCSI probing:

 [    5.874004] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 [    5.875004] device: 'sda': device_add
 [    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
 [    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
 [    5.878004] *pde = 00000000
 [    5.878004] Thread overran stack, or stack corrupted
 [    5.878004] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 [    5.878004] last sysfs file:
 [    5.878004]
 [    5.878004] Pid: 1, comm: swapper Not tainted (2.6.31-rc6-tip-01272-g9919e28-dirty #5685)
 [    5.878004] EIP: 0060:[<b1008321>] EFLAGS: 00010083 CPU: 0
 [    5.878004] EIP is at print_context_stack+0x81/0x110
 [    5.878004] EAX: cf8a3000 EBX: cf8a3fe4 ECX: 00000049 EDX: 00000000
 [    5.878004] ESI: b1cfce84 EDI: 00000000 EBP: cf8a3018 ESP: cf8a2ff4
 [    5.878004]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 [    5.878004] Process swapper (pid: 1, ti=cf8a2000 task=cf8a8000 task.ti=cf8a3000)
 [    5.878004] Stack:
 [    5.878004]  b1004867 fffff000 cf8a3ffc
 [    5.878004] Call Trace:
 [    5.878004]  [<b1004867>] ? kernel_thread_helper+0x7/0x10
 [    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
 [    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
 [    5.878004] *pde = 00000000
 [    5.878004] Thread overran stack, or stack corrupted
 [    5.878004] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC

 The oops did not reveal any more details about the real stack
 that we have and the system got into an infinite loop of
 recursive pagefaults.

 So i booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
 parameter. The box did not crash (timings/conditions probably
 changed a tiny bit to trigger the catastrophic crash), but the
 /debug/tracing/stack_trace file was rather revealing:

         Depth    Size   Location    (72 entries)
         -----    ----   --------
   0)     3704      52   __change_page_attr+0xb8/0x290
   1)     3652      24   __change_page_attr_set_clr+0x43/0x90
   2)     3628      60   kernel_map_pages+0x108/0x120
   3)     3568      40   prep_new_page+0x7d/0x130
   4)     3528      84   get_page_from_freelist+0x106/0x420
   5)     3444     116   __alloc_pages_nodemask+0xd7/0x550
   6)     3328      36   allocate_slab+0xb1/0x100
   7)     3292      36   new_slab+0x1c/0x160
   8)     3256      36   __slab_alloc+0x133/0x2b0
   9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
  10)     3216     108   create_object+0x28/0x250
  11)     3108      40   kmemleak_alloc+0x81/0xc0
  12)     3068      24   kmem_cache_alloc+0x162/0x1d0
  13)     3044      52   scsi_pool_alloc_command+0x29/0x70
  14)     2992      20   scsi_host_alloc_command+0x22/0x70
  15)     2972      24   __scsi_get_command+0x1b/0x90
  16)     2948      28   scsi_get_command+0x35/0x90
  17)     2920      24   scsi_setup_blk_pc_cmnd+0xd4/0x100
  18)     2896     128   sd_prep_fn+0x332/0xa70
  19)     2768      36   blk_peek_request+0xe7/0x1d0
  20)     2732      56   scsi_request_fn+0x54/0x520
  21)     2676      12   __generic_unplug_device+0x2b/0x40
  22)     2664      24   blk_execute_rq_nowait+0x59/0x80
  23)     2640     172   blk_execute_rq+0x6b/0xb0
  24)     2468      32   scsi_execute+0xe0/0x140
  25)     2436      64   scsi_execute_req+0x152/0x160
  26)     2372      60   scsi_vpd_inquiry+0x6c/0x90
  27)     2312      44   scsi_get_vpd_page+0x112/0x160
  28)     2268      52   sd_revalidate_disk+0x1df/0x320
  29)     2216      92   rescan_partitions+0x98/0x330
  30)     2124      52   __blkdev_get+0x309/0x350
  31)     2072       8   blkdev_get+0xf/0x20
  32)     2064      44   register_disk+0xff/0x120
  33)     2020      36   add_disk+0x6e/0xb0
  34)     1984      44   sd_probe_async+0xfb/0x1d0
  35)     1940      44   __async_schedule+0xf4/0x1b0
  36)     1896       8   async_schedule+0x12/0x20
  37)     1888      60   sd_probe+0x305/0x360
  38)     1828      44   really_probe+0x63/0x170
  39)     1784      36   driver_probe_device+0x5d/0x60
  40)     1748      16   __device_attach+0x49/0x50
  41)     1732      32   bus_for_each_drv+0x5b/0x80
  42)     1700      24   device_attach+0x6b/0x70
  43)     1676      16   bus_attach_device+0x47/0x60
  44)     1660      76   device_add+0x33d/0x400
  45)     1584      52   scsi_sysfs_add_sdev+0x6a/0x2c0
  46)     1532     108   scsi_add_lun+0x44b/0x460
  47)     1424     116   scsi_probe_and_add_lun+0x182/0x4e0
  48)     1308      36   __scsi_add_device+0xd9/0xe0
  49)     1272      44   ata_scsi_scan_host+0x10b/0x190
  50)     1228      24   async_port_probe+0x96/0xd0
  51)     1204      44   __async_schedule+0xf4/0x1b0
  52)     1160       8   async_schedule+0x12/0x20
  53)     1152      48   ata_host_register+0x171/0x1d0
  54)     1104      60   ata_pci_sff_activate_host+0xf3/0x230
  55)     1044      44   ata_pci_sff_init_one+0xea/0x100
  56)     1000      48   amd_init_one+0xb2/0x190
  57)      952       8   local_pci_probe+0x13/0x20
  58)      944      32   pci_device_probe+0x68/0x90
  59)      912      44   really_probe+0x63/0x170
  60)      868      36   driver_probe_device+0x5d/0x60
  61)      832      20   __driver_attach+0x89/0xa0
  62)      812      32   bus_for_each_dev+0x5b/0x80
  63)      780      12   driver_attach+0x1e/0x20
  64)      768      72   bus_add_driver+0x14b/0x2d0
  65)      696      36   driver_register+0x6e/0x150
  66)      660      20   __pci_register_driver+0x53/0xc0
  67)      640       8   amd_init+0x14/0x16
  68)      632     572   do_one_initcall+0x2b/0x1d0
  69)       60      12   do_basic_setup+0x56/0x6a
  70)       48      20   kernel_init+0x84/0xce
  71)       28      28   kernel_thread_helper+0x7/0x10

 There's a lot of fat functions on that stack trace, but
 the largest of all is do_one_initcall(). This is due to
 the boot trace entry variables being on the stack.

 Fixing this is relatively easy, initcalls are fundamentally
 serialized, so we can move the local variables to file scope.

 Note that this large stack footprint was present for a
 couple of months already - what pushed my system over
 the edge was the addition of kmemleak to the call-chain:

   6)     3328      36   allocate_slab+0xb1/0x100
   7)     3292      36   new_slab+0x1c/0x160
   8)     3256      36   __slab_alloc+0x133/0x2b0
   9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
  10)     3216     108   create_object+0x28/0x250
  11)     3108      40   kmemleak_alloc+0x81/0xc0
  12)     3068      24   kmem_cache_alloc+0x162/0x1d0
  13)     3044      52   scsi_pool_alloc_command+0x29/0x70

 This pushes the total to ~3800 bytes, only a tiny bit
 more was needed to corrupt the on-kernel-stack thread_info.

 The fix reduces the stack footprint from 572 bytes
 to 28 bytes.

 Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
 Cc: Thomas Gleixner <tglx@linutronix.de>
 Cc: Frederic Weisbecker <fweisbec@gmail.com>
 Cc: Steven Rostedt <srostedt@redhat.com>
 Cc: Catalin Marinas <catalin.marinas@arm.com>
 Cc: Jens Axboe <jens.axboe@oracle.com>
 Cc: Linus Torvalds <torvalds@linux-foundation.org>
 LKML-Reference: <new-submission>
 Signed-off-by: Ingo Molnar <mingo@elte.hu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

 ---
  init/main.c |    7 ++++---
  1 file changed, 4 insertions(+), 3 deletions(-)

 --- a/init/main.c
 +++ b/init/main.c
 @@ -702,13 +702,14 @@ asmlinkage void __init start_kernel(void
  int initcall_debug;
  core_param(initcall_debug, initcall_debug, bool, 0644);

 +static char msgbuf[64];
 +static struct boot_trace_call call;
 +static struct boot_trace_ret ret;
 +
  int do_one_initcall(initcall_t fn)
  {
  	int count = preempt_count();
  	ktime_t calltime, delta, rettime;
 -	char msgbuf[64];
 -	struct boot_trace_call call;
 -	struct boot_trace_ret ret;

  	if (initcall_debug) {
  		call.caller = task_pid_nr(current);
	From 4a683bf94b8a10e2bb0da07aec3ac0a55e5de61f Mon Sep 17 00:00:00 2001
	From: Ingo Molnar <mingo@elte.hu>
	Date: Fri, 21 Aug 2009 12:53:36 +0200
	Subject: tracing: Fix too large stack usage in do_one_initcall()

	From: Ingo Molnar <mingo@elte.hu>

	commit 4a683bf94b8a10e2bb0da07aec3ac0a55e5de61f upstream.

	One of my testboxes triggered this nasty stack overflow crash
	during SCSI probing:

	[ 5.874004] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
	[ 5.875004] device: 'sda': device_add
	[ 5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
	[ 5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
	[ 5.878004] *pde = 00000000
	[ 5.878004] Thread overran stack, or stack corrupted
	[ 5.878004] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
	[ 5.878004] last sysfs file:
	[ 5.878004]
	[ 5.878004] Pid: 1, comm: swapper Not tainted (2.6.31-rc6-tip-01272-g9919e28-dirty #5685)
	[ 5.878004] EIP: 0060:[<b1008321>] EFLAGS: 00010083 CPU: 0
	[ 5.878004] EIP is at print_context_stack+0x81/0x110
	[ 5.878004] EAX: cf8a3000 EBX: cf8a3fe4 ECX: 00000049 EDX: 00000000
	[ 5.878004] ESI: b1cfce84 EDI: 00000000 EBP: cf8a3018 ESP: cf8a2ff4
	[ 5.878004] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
	[ 5.878004] Process swapper (pid: 1, ti=cf8a2000 task=cf8a8000 task.ti=cf8a3000)
	[ 5.878004] Stack:
	[ 5.878004] b1004867 fffff000 cf8a3ffc
	[ 5.878004] Call Trace:
	[ 5.878004] [<b1004867>] ? kernel_thread_helper+0x7/0x10
	[ 5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
	[ 5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
	[ 5.878004] *pde = 00000000
	[ 5.878004] Thread overran stack, or stack corrupted
	[ 5.878004] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC

	The oops did not reveal any more details about the real stack
	that we have and the system got into an infinite loop of
	recursive pagefaults.

	So i booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
	parameter. The box did not crash (timings/conditions probably
	changed a tiny bit to trigger the catastrophic crash), but the
	/debug/tracing/stack_trace file was rather revealing:

	Depth Size Location (72 entries)
	----- ---- --------
	0) 3704 52 __change_page_attr+0xb8/0x290
	1) 3652 24 __change_page_attr_set_clr+0x43/0x90
	2) 3628 60 kernel_map_pages+0x108/0x120
	3) 3568 40 prep_new_page+0x7d/0x130
	4) 3528 84 get_page_from_freelist+0x106/0x420
	5) 3444 116 __alloc_pages_nodemask+0xd7/0x550
	6) 3328 36 allocate_slab+0xb1/0x100
	7) 3292 36 new_slab+0x1c/0x160
	8) 3256 36 __slab_alloc+0x133/0x2b0
	9) 3220 4 kmem_cache_alloc+0x1bb/0x1d0
	10) 3216 108 create_object+0x28/0x250
	11) 3108 40 kmemleak_alloc+0x81/0xc0
	12) 3068 24 kmem_cache_alloc+0x162/0x1d0
	13) 3044 52 scsi_pool_alloc_command+0x29/0x70
	14) 2992 20 scsi_host_alloc_command+0x22/0x70
	15) 2972 24 __scsi_get_command+0x1b/0x90
	16) 2948 28 scsi_get_command+0x35/0x90
	17) 2920 24 scsi_setup_blk_pc_cmnd+0xd4/0x100
	18) 2896 128 sd_prep_fn+0x332/0xa70
	19) 2768 36 blk_peek_request+0xe7/0x1d0
	20) 2732 56 scsi_request_fn+0x54/0x520
	21) 2676 12 __generic_unplug_device+0x2b/0x40
	22) 2664 24 blk_execute_rq_nowait+0x59/0x80
	23) 2640 172 blk_execute_rq+0x6b/0xb0
	24) 2468 32 scsi_execute+0xe0/0x140
	25) 2436 64 scsi_execute_req+0x152/0x160
	26) 2372 60 scsi_vpd_inquiry+0x6c/0x90
	27) 2312 44 scsi_get_vpd_page+0x112/0x160
	28) 2268 52 sd_revalidate_disk+0x1df/0x320
	29) 2216 92 rescan_partitions+0x98/0x330
	30) 2124 52 __blkdev_get+0x309/0x350
	31) 2072 8 blkdev_get+0xf/0x20
	32) 2064 44 register_disk+0xff/0x120
	33) 2020 36 add_disk+0x6e/0xb0
	34) 1984 44 sd_probe_async+0xfb/0x1d0
	35) 1940 44 __async_schedule+0xf4/0x1b0
	36) 1896 8 async_schedule+0x12/0x20
	37) 1888 60 sd_probe+0x305/0x360
	38) 1828 44 really_probe+0x63/0x170
	39) 1784 36 driver_probe_device+0x5d/0x60
	40) 1748 16 __device_attach+0x49/0x50
	41) 1732 32 bus_for_each_drv+0x5b/0x80
	42) 1700 24 device_attach+0x6b/0x70
	43) 1676 16 bus_attach_device+0x47/0x60
	44) 1660 76 device_add+0x33d/0x400
	45) 1584 52 scsi_sysfs_add_sdev+0x6a/0x2c0
	46) 1532 108 scsi_add_lun+0x44b/0x460
	47) 1424 116 scsi_probe_and_add_lun+0x182/0x4e0
	48) 1308 36 __scsi_add_device+0xd9/0xe0
	49) 1272 44 ata_scsi_scan_host+0x10b/0x190
	50) 1228 24 async_port_probe+0x96/0xd0
	51) 1204 44 __async_schedule+0xf4/0x1b0
	52) 1160 8 async_schedule+0x12/0x20
	53) 1152 48 ata_host_register+0x171/0x1d0
	54) 1104 60 ata_pci_sff_activate_host+0xf3/0x230
	55) 1044 44 ata_pci_sff_init_one+0xea/0x100
	56) 1000 48 amd_init_one+0xb2/0x190
	57) 952 8 local_pci_probe+0x13/0x20
	58) 944 32 pci_device_probe+0x68/0x90
	59) 912 44 really_probe+0x63/0x170
	60) 868 36 driver_probe_device+0x5d/0x60
	61) 832 20 __driver_attach+0x89/0xa0
	62) 812 32 bus_for_each_dev+0x5b/0x80
	63) 780 12 driver_attach+0x1e/0x20
	64) 768 72 bus_add_driver+0x14b/0x2d0
	65) 696 36 driver_register+0x6e/0x150
	66) 660 20 __pci_register_driver+0x53/0xc0
	67) 640 8 amd_init+0x14/0x16
	68) 632 572 do_one_initcall+0x2b/0x1d0
	69) 60 12 do_basic_setup+0x56/0x6a
	70) 48 20 kernel_init+0x84/0xce
	71) 28 28 kernel_thread_helper+0x7/0x10

	There's a lot of fat functions on that stack trace, but
	the largest of all is do_one_initcall(). This is due to
	the boot trace entry variables being on the stack.

	Fixing this is relatively easy, initcalls are fundamentally
	serialized, so we can move the local variables to file scope.

	Note that this large stack footprint was present for a
	couple of months already - what pushed my system over
	the edge was the addition of kmemleak to the call-chain:

	6) 3328 36 allocate_slab+0xb1/0x100
	7) 3292 36 new_slab+0x1c/0x160
	8) 3256 36 __slab_alloc+0x133/0x2b0
	9) 3220 4 kmem_cache_alloc+0x1bb/0x1d0
	10) 3216 108 create_object+0x28/0x250
	11) 3108 40 kmemleak_alloc+0x81/0xc0
	12) 3068 24 kmem_cache_alloc+0x162/0x1d0
	13) 3044 52 scsi_pool_alloc_command+0x29/0x70

	This pushes the total to ~3800 bytes, only a tiny bit
	more was needed to corrupt the on-kernel-stack thread_info.

	The fix reduces the stack footprint from 572 bytes
	to 28 bytes.

	Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
	Cc: Thomas Gleixner <tglx@linutronix.de>
	Cc: Frederic Weisbecker <fweisbec@gmail.com>
	Cc: Steven Rostedt <srostedt@redhat.com>
	Cc: Catalin Marinas <catalin.marinas@arm.com>
	Cc: Jens Axboe <jens.axboe@oracle.com>
	Cc: Linus Torvalds <torvalds@linux-foundation.org>
	LKML-Reference: <new-submission>
	Signed-off-by: Ingo Molnar <mingo@elte.hu>
	Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

	---
	init/main.c \| 7 ++++---
	1 file changed, 4 insertions(+), 3 deletions(-)

	--- a/init/main.c
	+++ b/init/main.c
	@@ -702,13 +702,14 @@ asmlinkage void __init start_kernel(void
	int initcall_debug;
	core_param(initcall_debug, initcall_debug, bool, 0644);

	+static char msgbuf[64];
	+static struct boot_trace_call call;
	+static struct boot_trace_ret ret;
	+
	int do_one_initcall(initcall_t fn)
	{
	int count = preempt_count();
	ktime_t calltime, delta, rettime;
	- char msgbuf[64];
	- struct boot_trace_call call;
	- struct boot_trace_ret ret;

	if (initcall_debug) {
	call.caller = task_pid_nr(current);