blob: ccba81f3f56ef7869bdcaa011affd44041bdef3a [file] [log] [blame]
From gregkh@mini.kroah.org Thu Sep 10 17:24:09 2009
Message-Id: <20090911002409.099894354@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:47 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Wei Yongjun <yjwei@cn.fujitsu.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 01/22] dccp: missing destroy of percpu counter variable while unload module
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=dccp-missing-destroy-of-percpu-counter-variable-while-unload-module.patch
Content-Length: 1799
Lines: 53
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Wei Yongjun <yjwei@cn.fujitsu.com>
[ Upstream commit 476181cb05c6a3aea3ef42309388e255c934a06f ]
percpu counter dccp_orphan_count is init in dccp_init() by
percpu_counter_init() while dccp module is loaded, but the
destroy of it is missing while dccp module is unloaded. We
can get the kernel WARNING about this. Reproduct by the
following commands:
$ modprobe dccp
$ rmmod dccp
$ modprobe dccp
WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
Hardware name: VMware Virtual Platform
list_add corruption. next->prev should be prev (c080c0c4), but was (null). (next
=ca7188cc).
Modules linked in: dccp(+) nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc
Pid: 1956, comm: modprobe Not tainted 2.6.31-rc5 #55
Call Trace:
[<c042f8fa>] warn_slowpath_common+0x6a/0x81
[<c053a6cb>] ? __list_add+0x27/0x5c
[<c042f94f>] warn_slowpath_fmt+0x29/0x2c
[<c053a6cb>] __list_add+0x27/0x5c
[<c053c9b3>] __percpu_counter_init+0x4d/0x5d
[<ca9c90c7>] dccp_init+0x19/0x2ed [dccp]
[<c0401141>] do_one_initcall+0x4f/0x111
[<ca9c90ae>] ? dccp_init+0x0/0x2ed [dccp]
[<c06971b5>] ? notifier_call_chain+0x26/0x48
[<c0444943>] ? __blocking_notifier_call_chain+0x45/0x51
[<c04516f7>] sys_init_module+0xac/0x1bd
[<c04028e4>] sysenter_do_call+0x12/0x22
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
net/dccp/proto.c | 1 +
1 file changed, 1 insertion(+)
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1159,6 +1159,7 @@ static void __exit dccp_fini(void)
kmem_cache_destroy(dccp_hashinfo.bind_bucket_cachep);
dccp_ackvec_exit();
dccp_sysctl_exit();
+ percpu_counter_destroy(&dccp_orphan_count);
}
module_init(dccp_init);
From gregkh@mini.kroah.org Thu Sep 10 17:24:09 2009
Message-Id: <20090911002409.235687801@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:48 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
=?ISO-8859-15?q?Krzysztof=20Ha=C5=82asa?= <khc@pm.waw.pl>,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 02/22] E100: fix interaction with swiotlb on X86.
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=e100-fix-interaction-with-swiotlb-on-x86.patch
Content-Length: 1527
Lines: 38
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Krzysztof Hałasa <khc@pm.waw.pl>
[ Upstream commit 6ff9c2e7fa8ca63a575792534b63c5092099c286 ]
E100 places it's RX packet descriptors inside skb->data and uses them
with bidirectional streaming DMA mapping. Data in descriptors is
accessed simultaneously by the chip (writing status and size when
a packet is received) and CPU (reading to check if the packet was
received). This isn't a valid usage of PCI DMA API, which requires use
of the coherent (consistent) memory for such purpose. Unfortunately e100
chips working in "simplified" RX mode have to store received data
directly after the descriptor. Fixing the driver to conform to the API
would require using unsupported "flexible" RX mode or receiving data
into a coherent memory and using CPU to copy it to network buffers.
This patch, while not yet making the driver conform to the PCI DMA API,
allows it to work correctly on X86 with swiotlb (while not breaking
other architectures).
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/net/e100.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -1764,7 +1764,7 @@ static int e100_rx_indicate(struct nic *
nic->ru_running = RU_SUSPENDED;
pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
sizeof(struct rfd),
- PCI_DMA_BIDIRECTIONAL);
+ PCI_DMA_FROMDEVICE);
return -ENODATA;
}
From gregkh@mini.kroah.org Thu Sep 10 17:24:09 2009
Message-Id: <20090911002409.373226576@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:49 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Tom Goff <thomas.goff@boeing.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 03/22] gre: Fix MTU calculation for bound GRE tunnels
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=gre-fix-mtu-calculation-for-bound-gre-tunnels.patch
Content-Length: 865
Lines: 28
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Tom Goff <thomas.goff@boeing.com>
[ Upstream commit 8cdb045632e5ee22854538619ac6f150eb0a4894 ]
The GRE header length should be subtracted when the tunnel MTU is
calculated. This just corrects for the associativity change
introduced by commit 42aa916265d740d66ac1f17290366e9494c884c2
("gre: Move MTU setting out of ipgre_tunnel_bind_dev").
Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
net/ipv4/ip_gre.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -952,7 +952,7 @@ static int ipgre_tunnel_bind_dev(struct
addend += 4;
}
dev->needed_headroom = addend + hlen;
- mtu -= dev->hard_header_len - addend;
+ mtu -= dev->hard_header_len + addend;
if (mtu < 68)
mtu = 68;
From gregkh@mini.kroah.org Thu Sep 10 17:24:09 2009
Message-Id: <20090911002409.524280560@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:50 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Ben McKeegan <ben@netservers.co.uk>,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 04/22] ppp: fix lost fragments in ppp_mp_explode() (resubmit)
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=ppp-fix-lost-fragments-in-ppp_mp_explode.patch
Content-Length: 3133
Lines: 101
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Ben McKeegan <ben@netservers.co.uk>
[ Upstream commit a53a8b56827cc429c6d9f861ad558beeb5f6103f ]
This patch fixes the corner cases where the sum of MTU of the free
channels (adjusted for fragmentation overheads) is less than the MTU
of PPP link. There are at least 3 situations where this case might
arise:
- some of the channels are busy
- the multilink session is running in a degraded state (i.e. with less
than its full complement of active channels)
- by design, where multilink protocol is being used to artificially
increase the effective link MTU of a single link.
Without this patch, at most 1 fragment is ever sent per free channel
for a given PPP frame and any remaining part of the PPP frame that
does not fit into those fragments is silently discarded.
This patch restores the original behaviour which was broken by commit
9c705260feea6ae329bc6b6d5f6d2ef0227eda0a 'ppp:ppp_mp_explode()
redesign'. Once all 'free' channels have been given a fragment, an
additional fragment is queued to each available channel in turn, as many
times as necessary, until the entire PPP frame has been consumed.
Signed-off-by: Ben McKeegan <ben@netservers.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/net/ppp_generic.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -1383,7 +1383,7 @@ static int ppp_mp_explode(struct ppp *pp
/* create a fragment for each channel */
bits = B;
- while (nfree > 0 && len > 0) {
+ while (len > 0) {
list = list->next;
if (list == &ppp->channels) {
i = 0;
@@ -1430,29 +1430,31 @@ static int ppp_mp_explode(struct ppp *pp
*otherwise divide it according to the speed
*of the channel we are going to transmit on
*/
- if (pch->speed == 0) {
- flen = totlen/nfree ;
- if (nbigger > 0) {
- flen++;
- nbigger--;
- }
- } else {
- flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
- ((totspeed*totfree)/pch->speed)) - hdrlen;
- if (nbigger > 0) {
- flen += ((totfree - nzero)*pch->speed)/totspeed;
- nbigger -= ((totfree - nzero)*pch->speed)/
+ if (nfree > 0) {
+ if (pch->speed == 0) {
+ flen = totlen/nfree ;
+ if (nbigger > 0) {
+ flen++;
+ nbigger--;
+ }
+ } else {
+ flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
+ ((totspeed*totfree)/pch->speed)) - hdrlen;
+ if (nbigger > 0) {
+ flen += ((totfree - nzero)*pch->speed)/totspeed;
+ nbigger -= ((totfree - nzero)*pch->speed)/
totspeed;
+ }
}
+ nfree--;
}
- nfree--;
/*
*check if we are on the last channel or
*we exceded the lenght of the data to
*fragment
*/
- if ((nfree == 0) || (flen > len))
+ if ((nfree <= 0) || (flen > len))
flen = len;
/*
*it is not worth to tx on slow channels:
@@ -1466,7 +1468,7 @@ static int ppp_mp_explode(struct ppp *pp
continue;
}
- mtu = pch->chan->mtu + 2 - hdrlen;
+ mtu = pch->chan->mtu - hdrlen;
if (mtu < 4)
mtu = 4;
if (flen > mtu)
From gregkh@mini.kroah.org Thu Sep 10 17:24:09 2009
Message-Id: <20090911002409.656912578@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:51 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Eric Dumazet <eric.dumazet@gmail.com>,
Cyrill Gorcunov <gorcunov@openvz.org>,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 05/22] pppol2tp: calls unregister_pernet_gen_device() at unload time
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=pppol2tp-calls-unregister_pernet_gen_device-at-unload-time.patch
Content-Length: 800
Lines: 26
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Eric Dumazet <eric.dumazet@gmail.com>
[ Upstream commit 446e72f30eca76d6f9a1a54adf84d2c6ba2831f8 ]
Failure to call unregister_pernet_gen_device() can exhaust memory
if module is loaded/unloaded many times.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/net/pppol2tp.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/net/pppol2tp.c
+++ b/drivers/net/pppol2tp.c
@@ -2682,6 +2682,7 @@ out_unregister_pppol2tp_proto:
static void __exit pppol2tp_exit(void)
{
unregister_pppox_proto(PX_PROTO_OL2TP);
+ unregister_pernet_gen_device(pppol2tp_net_id, &pppol2tp_net_ops);
proto_unregister(&pppol2tp_sk_proto);
}
From gregkh@mini.kroah.org Thu Sep 10 17:24:09 2009
Message-Id: <20090911002409.803547267@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:52 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Eric Dumazet <eric.dumazet@gmail.com>,
Pavel Emelyanov <xemul@openvz.org>,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 06/22] net: net_assign_generic() fix
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=net-net_assign_generic-fix.patch
Content-Length: 830
Lines: 27
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Eric Dumazet <eric.dumazet@gmail.com>
[ Upstream commit 144586301f6af5ae5943a002f030d8c626fa4fdd ]
memcpy() should take into account size of pointers,
not only number of pointers to copy.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
net/core/net_namespace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -498,7 +498,7 @@ int net_assign_generic(struct net *net,
*/
ng->len = id;
- memcpy(&ng->ptr, &old_ng->ptr, old_ng->len);
+ memcpy(&ng->ptr, &old_ng->ptr, old_ng->len * sizeof(void*));
rcu_assign_pointer(net->gen, ng);
call_rcu(&old_ng->rcu, net_generic_release);
From gregkh@mini.kroah.org Thu Sep 10 17:24:10 2009
Message-Id: <20090911002409.986396768@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:53 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 07/22] sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=sparc64-kill-spurious-nmi-watchdog-triggers-by-increasing-limit-to-30-seconds.patch
Content-Length: 2327
Lines: 71
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: David S. Miller <davem@davemloft.net>
[ Upstream commit e6617c6ec28a17cf2f90262b835ec05b9b861400 ]
This is a compromise and a temporary workaround for bootup NMI
watchdog triggers some people see with qla2xxx devices present.
This happens when, for example:
CPU 0 is in the driver init and looping submitting mailbox commands to
load the firmware, then waiting for completion.
CPU 1 is receiving the device interrupts. CPU 1 is where the NMI
watchdog triggers.
CPU 0 is submitting mailbox commands fast enough that by the time CPU
1 returns from the device interrupt handler, a new one is pending.
This sequence runs for more than 5 seconds.
The problematic case is CPU 1's timer interrupt running when the
barrage of device interrupts begin. Then we have:
timer interrupt
return for softirq checking
pending, thus enable interrupts
qla2xxx interrupt
return
qla2xxx interrupt
return
... 5+ seconds pass
final qla2xxx interrupt for fw load
return
run timer softirq
return
At some point in the multi-second qla2xxx interrupt storm we trigger
the NMI watchdog on CPU 1 from the NMI interrupt handler.
The timer softirq, once we get back to running it, is smart enough to
run the timer work enough times to make up for the missed timer
interrupts.
However, the NMI watchdogs (both x86 and sparc) use the timer
interrupt count to notice the cpu is wedged. But in the above
scenerio we'll receive only one such timer interrupt even if we last
all the way back to running the timer softirq.
The default watchdog trigger point is only 5 seconds, which is pretty
low (the softwatchdog triggers at 60 seconds). So increase it to 30
seconds for now.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
arch/sparc/kernel/nmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -103,7 +103,7 @@ notrace __kprobes void perfctr_irq(int i
}
if (!touched && __get_cpu_var(last_irq_sum) == sum) {
local_inc(&__get_cpu_var(alert_counter));
- if (local_read(&__get_cpu_var(alert_counter)) == 5 * nmi_hz)
+ if (local_read(&__get_cpu_var(alert_counter)) == 30 * nmi_hz)
die_nmi("BUG: NMI Watchdog detected LOCKUP",
regs, panic_on_timeout);
} else {
From gregkh@mini.kroah.org Thu Sep 10 17:24:10 2009
Message-Id: <20090911002410.161673663@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:54 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 08/22] sparc64: Validate linear D-TLB misses.
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=sparc64-validate-linear-d-tlb-misses.patch
Content-Length: 7747
Lines: 236
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: David S. Miller <davem@davemloft.net>
[ Upstream commit d8ed1d43e17898761c7221014a15a4c7501d2ff3 ]
When page alloc debugging is not enabled, we essentially accept any
virtual address for linear kernel TLB misses. But with kgdb, kernel
address probing, and other facilities we can try to access arbitrary
crap.
So, make sure the address we miss on will translate to physical memory
that actually exists.
In order to make this work we have to embed the valid address bitmap
into the kernel image. And in order to make that less expensive we
make an adjustment, in that the max physical memory address is
decreased to "1 << 41", even on the chips that support a 42-bit
physical address space. We can do this because bit 41 indicates
"I/O space" and thus covers non-memory ranges.
The result of this is that:
1) kpte_linear_bitmap shrinks from 2K to 1K in size
2) we need 64K more for the valid address bitmap
We can't let the valid address bitmap be dynamically allocated
once we start using it to validate TLB misses, otherwise we have
crazy issues to deal with wrt. recursive TLB misses and such.
If we're in a TLB miss it could be the deepest trap level that's legal
inside of the cpu. So if we TLB miss referencing the bitmap, the cpu
will be out of trap levels and enter RED state.
To guard against out-of-range accesses to the bitmap, we have to check
to make sure no bits in the physical address above bit 40 are set. We
could export and use last_valid_pfn for this check, but that's just an
unnecessary extra memory reference.
On the plus side of all this, since we load all of these translations
into the special 4MB mapping TSB, and we check the TSB first for TLB
misses, there should be absolutely no real cost for these new checks
in the TLB miss path.
Reported-by: heyongli@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
arch/sparc/include/asm/pgtable_64.h | 12 +++++++---
arch/sparc/kernel/ktlb.S | 42 +++++++++++++++++++++++++++++++----
arch/sparc/mm/init_64.c | 43 ++++++++++++++++++++----------------
arch/sparc/mm/init_64.h | 7 ++++-
4 files changed, 76 insertions(+), 28 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -726,11 +726,17 @@ extern unsigned long pte_file(pte_t);
extern pte_t pgoff_to_pte(unsigned long);
#define PTE_FILE_MAX_BITS (64UL - PAGE_SHIFT - 1UL)
-extern unsigned long *sparc64_valid_addr_bitmap;
+extern unsigned long sparc64_valid_addr_bitmap[];
/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
-#define kern_addr_valid(addr) \
- (test_bit(__pa((unsigned long)(addr))>>22, sparc64_valid_addr_bitmap))
+static inline bool kern_addr_valid(unsigned long addr)
+{
+ unsigned long paddr = __pa(addr);
+
+ if ((paddr >> 41UL) != 0UL)
+ return false;
+ return test_bit(paddr >> 22, sparc64_valid_addr_bitmap);
+}
extern int page_in_phys_avail(unsigned long paddr);
--- a/arch/sparc/kernel/ktlb.S
+++ b/arch/sparc/kernel/ktlb.S
@@ -151,12 +151,46 @@ kvmap_dtlb_4v:
* Must preserve %g1 and %g6 (TAG).
*/
kvmap_dtlb_tsb4m_miss:
- sethi %hi(kpte_linear_bitmap), %g2
- or %g2, %lo(kpte_linear_bitmap), %g2
+ /* Clear the PAGE_OFFSET top virtual bits, shift
+ * down to get PFN, and make sure PFN is in range.
+ */
+ sllx %g4, 21, %g5
- /* Clear the PAGE_OFFSET top virtual bits, then shift
- * down to get a 256MB physical address index.
+ /* Check to see if we know about valid memory at the 4MB
+ * chunk this physical address will reside within.
*/
+ srlx %g5, 21 + 41, %g2
+ brnz,pn %g2, kvmap_dtlb_longpath
+ nop
+
+ /* This unconditional branch and delay-slot nop gets patched
+ * by the sethi sequence once the bitmap is properly setup.
+ */
+ .globl valid_addr_bitmap_insn
+valid_addr_bitmap_insn:
+ ba,pt %xcc, 2f
+ nop
+ .subsection 2
+ .globl valid_addr_bitmap_patch
+valid_addr_bitmap_patch:
+ sethi %hi(sparc64_valid_addr_bitmap), %g7
+ or %g7, %lo(sparc64_valid_addr_bitmap), %g7
+ .previous
+
+ srlx %g5, 21 + 22, %g2
+ srlx %g2, 6, %g5
+ and %g2, 63, %g2
+ sllx %g5, 3, %g5
+ ldx [%g7 + %g5], %g5
+ mov 1, %g7
+ sllx %g7, %g2, %g7
+ andcc %g5, %g7, %g0
+ be,pn %xcc, kvmap_dtlb_longpath
+
+2: sethi %hi(kpte_linear_bitmap), %g2
+ or %g2, %lo(kpte_linear_bitmap), %g2
+
+ /* Get the 256MB physical address index. */
sllx %g4, 21, %g5
mov 1, %g7
srlx %g5, 21 + 28, %g5
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -145,7 +145,8 @@ static void __init read_obp_memory(const
cmp_p64, NULL);
}
-unsigned long *sparc64_valid_addr_bitmap __read_mostly;
+unsigned long sparc64_valid_addr_bitmap[VALID_ADDR_BITMAP_BYTES /
+ sizeof(unsigned long)];
EXPORT_SYMBOL(sparc64_valid_addr_bitmap);
/* Kernel physical address base and size in bytes. */
@@ -1876,7 +1877,7 @@ static int pavail_rescan_ents __initdata
* memory list again, and make sure it provides at least as much
* memory as 'pavail' does.
*/
-static void __init setup_valid_addr_bitmap_from_pavail(void)
+static void __init setup_valid_addr_bitmap_from_pavail(unsigned long *bitmap)
{
int i;
@@ -1899,8 +1900,7 @@ static void __init setup_valid_addr_bitm
if (new_start <= old_start &&
new_end >= (old_start + PAGE_SIZE)) {
- set_bit(old_start >> 22,
- sparc64_valid_addr_bitmap);
+ set_bit(old_start >> 22, bitmap);
goto do_next_page;
}
}
@@ -1921,20 +1921,21 @@ static void __init setup_valid_addr_bitm
}
}
+static void __init patch_tlb_miss_handler_bitmap(void)
+{
+ extern unsigned int valid_addr_bitmap_insn[];
+ extern unsigned int valid_addr_bitmap_patch[];
+
+ valid_addr_bitmap_insn[1] = valid_addr_bitmap_patch[1];
+ mb();
+ valid_addr_bitmap_insn[0] = valid_addr_bitmap_patch[0];
+ flushi(&valid_addr_bitmap_insn[0]);
+}
+
void __init mem_init(void)
{
unsigned long codepages, datapages, initpages;
unsigned long addr, last;
- int i;
-
- i = last_valid_pfn >> ((22 - PAGE_SHIFT) + 6);
- i += 1;
- sparc64_valid_addr_bitmap = (unsigned long *) alloc_bootmem(i << 3);
- if (sparc64_valid_addr_bitmap == NULL) {
- prom_printf("mem_init: Cannot alloc valid_addr_bitmap.\n");
- prom_halt();
- }
- memset(sparc64_valid_addr_bitmap, 0, i << 3);
addr = PAGE_OFFSET + kern_base;
last = PAGE_ALIGN(kern_size) + addr;
@@ -1943,15 +1944,19 @@ void __init mem_init(void)
addr += PAGE_SIZE;
}
- setup_valid_addr_bitmap_from_pavail();
+ setup_valid_addr_bitmap_from_pavail(sparc64_valid_addr_bitmap);
+ patch_tlb_miss_handler_bitmap();
high_memory = __va(last_valid_pfn << PAGE_SHIFT);
#ifdef CONFIG_NEED_MULTIPLE_NODES
- for_each_online_node(i) {
- if (NODE_DATA(i)->node_spanned_pages != 0) {
- totalram_pages +=
- free_all_bootmem_node(NODE_DATA(i));
+ {
+ int i;
+ for_each_online_node(i) {
+ if (NODE_DATA(i)->node_spanned_pages != 0) {
+ totalram_pages +=
+ free_all_bootmem_node(NODE_DATA(i));
+ }
}
}
#else
--- a/arch/sparc/mm/init_64.h
+++ b/arch/sparc/mm/init_64.h
@@ -5,10 +5,13 @@
* marked non-static so that assembler code can get at them.
*/
-#define MAX_PHYS_ADDRESS (1UL << 42UL)
-#define KPTE_BITMAP_CHUNK_SZ (256UL * 1024UL * 1024UL)
+#define MAX_PHYS_ADDRESS (1UL << 41UL)
+#define KPTE_BITMAP_CHUNK_SZ (256UL * 1024UL * 1024UL)
#define KPTE_BITMAP_BYTES \
((MAX_PHYS_ADDRESS / KPTE_BITMAP_CHUNK_SZ) / 8)
+#define VALID_ADDR_BITMAP_CHUNK_SZ (4UL * 1024UL * 1024UL)
+#define VALID_ADDR_BITMAP_BYTES \
+ ((MAX_PHYS_ADDRESS / VALID_ADDR_BITMAP_CHUNK_SZ) / 8)
extern unsigned long kern_linear_pte_xor[2];
extern unsigned long kpte_linear_bitmap[KPTE_BITMAP_BYTES / sizeof(unsigned long)];
From gregkh@mini.kroah.org Thu Sep 10 17:24:10 2009
Message-Id: <20090911002410.310962876@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:55 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 09/22] sparc64: Fix bootup with mcount in some configs.
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=sparc64-fix-bootup-with-mcount-in-some-configs.patch
Content-Length: 2493
Lines: 81
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: David S. Miller <davem@davemloft.net>
[ Upstream commit bd4352cadfacb9084c97c853b025fac010266c26 ]
Functions invoked early when booting up a cpu can't use
tracing because mcount requires a valid 'current_thread_info()'
and TLB mappings to be setup.
The code path of sun4v_register_mondo_queues --> register_one_mondo
is one such case. sun4v_register_mondo_queues already has the
necessary 'notrace' annotation, but register_one_mondo does not.
Normally register_one_mondo is inlined so the bug doesn't trigger,
but with some config/compiler combinations, it won't be so we
must properly mark it notrace.
While we're here, add 'notrace' annoations to prom_printf and
prom_halt so that early error handling won't have the same problem.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Reported-by: Leif Sawyer <lsawyer@gci.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
arch/sparc/kernel/irq_64.c | 2 +-
arch/sparc/prom/misc_64.c | 2 +-
arch/sparc/prom/printf.c | 7 +++----
3 files changed, 5 insertions(+), 6 deletions(-)
--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -902,7 +902,7 @@ void notrace init_irqwork_curcpu(void)
* Therefore you cannot make any OBP calls, not even prom_printf,
* from these two routines.
*/
-static void __cpuinit register_one_mondo(unsigned long paddr, unsigned long type, unsigned long qmask)
+static void __cpuinit notrace register_one_mondo(unsigned long paddr, unsigned long type, unsigned long qmask)
{
unsigned long num_entries = (qmask + 1) / 64;
unsigned long status;
--- a/arch/sparc/prom/misc_64.c
+++ b/arch/sparc/prom/misc_64.c
@@ -88,7 +88,7 @@ void prom_cmdline(void)
/* Drop into the prom, but completely terminate the program.
* No chance of continuing.
*/
-void prom_halt(void)
+void notrace prom_halt(void)
{
#ifdef CONFIG_SUN_LDOMS
if (ldom_domaining_enabled)
--- a/arch/sparc/prom/printf.c
+++ b/arch/sparc/prom/printf.c
@@ -14,14 +14,14 @@
*/
#include <linux/kernel.h>
+#include <linux/compiler.h>
#include <asm/openprom.h>
#include <asm/oplib.h>
static char ppbuf[1024];
-void
-prom_write(const char *buf, unsigned int n)
+void notrace prom_write(const char *buf, unsigned int n)
{
char ch;
@@ -33,8 +33,7 @@ prom_write(const char *buf, unsigned int
}
}
-void
-prom_printf(const char *fmt, ...)
+void notrace prom_printf(const char *fmt, ...)
{
va_list args;
int i;
From gregkh@mini.kroah.org Thu Sep 10 17:24:10 2009
Message-Id: <20090911002410.490751860@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:56 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
"David S. Miller" <davem@davemloft.net>
Subject: [patch 10/22] sparc: sys32.S incorrect compat-layer splice() system call
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=sparc-sys32.s-incorrect-compat-layer-splice-system-call.patch
Content-Length: 1243
Lines: 36
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
[ Upstream commit e2c6cbd9ace61039d3de39e717195e38f1492aee ]
I think arch/sparc/kernel/sys32.S has an incorrect splice definition:
SIGN2(sys32_splice, sys_splice, %o0, %o1)
The splice() prototype looks like :
long splice(int fd_in, loff_t *off_in, int fd_out,
loff_t *off_out, size_t len, unsigned int flags);
So I think we should have :
SIGN2(sys32_splice, sys_splice, %o0, %o2)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
arch/sparc/kernel/sys32.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/sparc/kernel/sys32.S
+++ b/arch/sparc/kernel/sys32.S
@@ -134,7 +134,7 @@ SIGN1(sys32_getpeername, sys_getpeername
SIGN1(sys32_getsockname, sys_getsockname, %o0)
SIGN2(sys32_ioprio_get, sys_ioprio_get, %o0, %o1)
SIGN3(sys32_ioprio_set, sys_ioprio_set, %o0, %o1, %o2)
-SIGN2(sys32_splice, sys_splice, %o0, %o1)
+SIGN2(sys32_splice, sys_splice, %o0, %o2)
SIGN2(sys32_sync_file_range, compat_sync_file_range, %o0, %o5)
SIGN2(sys32_tee, sys_tee, %o0, %o1)
SIGN1(sys32_vmsplice, compat_sys_vmsplice, %o0)
From gregkh@mini.kroah.org Thu Sep 10 17:24:10 2009
Message-Id: <20090911002410.660886830@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:57 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Massimo Cirillo <maxcir@gmail.com>,
Artem Bityutskiy <Artem.Bityutskiy@nokia.com>,
David Woodhouse <David.Woodhouse@intel.com>
Subject: [patch 11/22] JFFS2: add missing verify buffer allocation/deallocation
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=jffs2-add-missing-verify-buffer-allocation-deallocation.patch
Content-Length: 1287
Lines: 45
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Massimo Cirillo <maxcir@gmail.com>
commit bc8cec0dff072f1a45ce7f6b2c5234bb3411ac51 upstream.
The function jffs2_nor_wbuf_flash_setup() doesn't allocate the verify buffer
if CONFIG_JFFS2_FS_WBUF_VERIFY is defined, so causing a kernel panic when
that macro is enabled and the verify function is called. Similarly the
jffs2_nor_wbuf_flash_cleanup() must free the buffer if
CONFIG_JFFS2_FS_WBUF_VERIFY is enabled.
The following patch fixes the problem.
The following patch applies to 2.6.30 kernel.
Signed-off-by: Massimo Cirillo <maxcir@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/jffs2/wbuf.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/fs/jffs2/wbuf.c
+++ b/fs/jffs2/wbuf.c
@@ -1268,10 +1268,20 @@ int jffs2_nor_wbuf_flash_setup(struct jf
if (!c->wbuf)
return -ENOMEM;
+#ifdef CONFIG_JFFS2_FS_WBUF_VERIFY
+ c->wbuf_verify = kmalloc(c->wbuf_pagesize, GFP_KERNEL);
+ if (!c->wbuf_verify) {
+ kfree(c->wbuf);
+ return -ENOMEM;
+ }
+#endif
return 0;
}
void jffs2_nor_wbuf_flash_cleanup(struct jffs2_sb_info *c) {
+#ifdef CONFIG_JFFS2_FS_WBUF_VERIFY
+ kfree(c->wbuf_verify);
+#endif
kfree(c->wbuf);
}
From gregkh@mini.kroah.org Thu Sep 10 17:24:10 2009
Message-Id: <20090911002410.794447090@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:58 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Eric Dumazet <eric.dumazet@gmail.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Pekka Enberg <penberg@cs.helsinki.fi>
Subject: [patch 12/22] slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=slub-fix-kmem_cache_destroy-with-slab_destroy_by_rcu.patch
Content-Length: 1275
Lines: 42
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Eric Dumazet <eric.dumazet@gmail.com>
commit d76b1590e06a63a3d8697168cd0aabf1c4b3cb3a upstream.
kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close() and
*before* sysfs_slab_remove() or risk rcu_free_slab() being called after
kmem_cache is deleted (kfreed).
rmmod nf_conntrack can crash the machine because it has to kmem_cache_destroy()
a SLAB_DESTROY_BY_RCU enabled cache.
Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
mm/slub.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2490,8 +2490,6 @@ static inline int kmem_cache_close(struc
*/
void kmem_cache_destroy(struct kmem_cache *s)
{
- if (s->flags & SLAB_DESTROY_BY_RCU)
- rcu_barrier();
down_write(&slub_lock);
s->refcount--;
if (!s->refcount) {
@@ -2502,6 +2500,8 @@ void kmem_cache_destroy(struct kmem_cach
"still has objects.\n", s->name, __func__);
dump_stack();
}
+ if (s->flags & SLAB_DESTROY_BY_RCU)
+ rcu_barrier();
sysfs_slab_remove(s);
} else
up_write(&slub_lock);
From gregkh@mini.kroah.org Thu Sep 10 17:24:11 2009
Message-Id: <20090911002410.971758034@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:59 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Subject: [patch 13/22] nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=nilfs2-fix-preempt-count-underflow-in-nilfs_btnode_prepare_change_key.patch
Content-Length: 3658
Lines: 73
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
commit b1f1b8ce0a1d71cbc72f7540134d52b79bd8f5ac upstream.
This will fix the following preempt count underflow reported from
users with the title "[NILFS users] segctord problem" (Message-ID:
<949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID:
<debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>):
WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0()
Hardware name: HP Compaq 6530b (KR980UT#ABC)
Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod
Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7
Call Trace:
[<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0
[<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0
[<ffffffff8024715f>] warn_slowpath_null+0xf/0x20
[<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0
[<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2]
[<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2]
[<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2]
[<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2]
[<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2]
[<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40
[<ffffffff803c06e0>] ? __up_write+0xe0/0x150
[<ffffffff80262959>] ? up_write+0x9/0x10
[<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2]
[<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2]
[<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2]
[<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2]
[<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2]
[<ffffffff80252633>] ? add_timer+0x13/0x20
[<ffffffff802370da>] ? __wake_up_common+0x5a/0x90
[<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
[<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
[<ffffffff8025e556>] kthread+0x56/0x90
[<ffffffff8020cdea>] child_rip+0xa/0x20
[<ffffffff8025e500>] ? kthread+0x0/0x90
[<ffffffff8020cde0>] ? child_rip+0x0/0x20
This problem was caused due to a missing radix_tree_preload() call in
the retry path of nilfs_btnode_prepare_change_key() function.
Reported-by: Eric A <eric225125@yahoo.com>
Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jerome Poulin <jeromepoulin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/nilfs2/btnode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -206,6 +206,7 @@ int nilfs_btnode_prepare_change_key(stru
* We cannot call radix_tree_preload for the kernels older
* than 2.6.23, because it is not exported for modules.
*/
+retry:
err = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
if (err)
goto failed_unlock;
@@ -216,7 +217,6 @@ int nilfs_btnode_prepare_change_key(stru
(unsigned long long)oldkey,
(unsigned long long)newkey);
-retry:
spin_lock_irq(&btnc->tree_lock);
err = radix_tree_insert(&btnc->page_tree, newkey, obh->b_page);
spin_unlock_irq(&btnc->tree_lock);
From gregkh@mini.kroah.org Thu Sep 10 17:24:11 2009
Message-Id: <20090911002411.157180912@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:00 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Chris Wright <chrisw@sous-sol.org>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
Matthew Wilcox <matthew@wil.cx>,
Yu Zhao <yu.zhao@intel.com>,
Jesse Barnes <jbarnes@virtuousgeek.org>
Subject: [patch 14/22] PCI SR-IOV: correct broken resource alignment calculations
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=pci-sr-iov-correct-broken-resource-alignment-calculations.patch
Content-Length: 5432
Lines: 155
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Chris Wright <chrisw@sous-sol.org>
commit 6faf17f6f1ffc586d16efc2f9fa2083a7785ee74 upstream.
An SR-IOV capable device includes an SR-IOV PCIe capability which
describes the Virtual Function (VF) BAR requirements. A typical SR-IOV
device can support multiple VFs whose BARs must be in a contiguous region,
effectively an array of VF BARs. The BAR reports the size requirement
for a single VF. We calculate the full range needed by simply multiplying
the VF BAR size with the number of possible VFs and create a resource
spanning the full range.
This all seems sane enough except it artificially inflates the alignment
requirement for the VF BAR. The VF BAR need only be aligned to the size
of a single BAR not the contiguous range of VF BARs. This can cause us
to fail to allocate resources for the BAR despite the fact that we
actually have enough space.
This patch adds a thin PCI specific layer over the generic
resource_alignment() function which is aware of the special nature of
VF BARs and does sorting and allocation based on the smaller alignment
requirement.
I recognize that while resource_alignment is generic, it's basically a
PCI helper. An alternative to this patch is to add PCI VF BAR specific
information to struct resource. I opted for the extra layer rather than
adding such PCI specific information to struct resource. This does
have the slight downside that we don't cache the BAR size and re-read
for each alignment query (happens a small handful of times during boot
for each VF BAR).
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/pci/iov.c | 23 +++++++++++++++++++++++
drivers/pci/pci.h | 13 +++++++++++++
drivers/pci/setup-bus.c | 4 ++--
drivers/pci/setup-res.c | 6 +++---
4 files changed, 41 insertions(+), 5 deletions(-)
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -595,6 +595,29 @@ int pci_iov_resource_bar(struct pci_dev
}
/**
+ * pci_sriov_resource_alignment - get resource alignment for VF BAR
+ * @dev: the PCI device
+ * @resno: the resource number
+ *
+ * Returns the alignment of the VF BAR found in the SR-IOV capability.
+ * This is not the same as the resource size which is defined as
+ * the VF BAR size multiplied by the number of VFs. The alignment
+ * is just the VF BAR size.
+ */
+int pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
+{
+ struct resource tmp;
+ enum pci_bar_type type;
+ int reg = pci_iov_resource_bar(dev, resno, &type);
+
+ if (!reg)
+ return 0;
+
+ __pci_read_base(dev, type, &tmp, reg);
+ return resource_alignment(&tmp);
+}
+
+/**
* pci_restore_iov_state - restore the state of the IOV capability
* @dev: the PCI device
*/
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -234,6 +234,7 @@ extern int pci_iov_init(struct pci_dev *
extern void pci_iov_release(struct pci_dev *dev);
extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
enum pci_bar_type *type);
+extern int pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
extern void pci_restore_iov_state(struct pci_dev *dev);
extern int pci_iov_bus_range(struct pci_bus *bus);
#else
@@ -259,4 +260,16 @@ static inline int pci_iov_bus_range(stru
}
#endif /* CONFIG_PCI_IOV */
+static inline int pci_resource_alignment(struct pci_dev *dev,
+ struct resource *res)
+{
+#ifdef CONFIG_PCI_IOV
+ int resno = res - dev->resource;
+
+ if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
+ return pci_sriov_resource_alignment(dev, resno);
+#endif
+ return resource_alignment(res);
+}
+
#endif /* DRIVERS_PCI_H */
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -25,7 +25,7 @@
#include <linux/ioport.h>
#include <linux/cache.h>
#include <linux/slab.h>
-
+#include "pci.h"
static void pbus_assign_resources_sorted(const struct pci_bus *bus)
{
@@ -355,7 +355,7 @@ static int pbus_size_mem(struct pci_bus
continue;
r_size = resource_size(r);
/* For bridges size != alignment */
- align = resource_alignment(r);
+ align = pci_resource_alignment(dev, r);
order = __ffs(align) - 20;
if (order > 11) {
dev_warn(&dev->dev, "BAR %d bad alignment %llx: "
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -145,7 +145,7 @@ int pci_assign_resource(struct pci_dev *
size = resource_size(res);
min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
- align = resource_alignment(res);
+ align = pci_resource_alignment(dev, res);
if (!align) {
dev_info(&dev->dev, "BAR %d: can't allocate resource (bogus "
"alignment) %pR flags %#lx\n",
@@ -236,7 +236,7 @@ void pdev_sort_resources(struct pci_dev
if (!(r->flags) || r->parent)
continue;
- r_align = resource_alignment(r);
+ r_align = pci_resource_alignment(dev, r);
if (!r_align) {
dev_warn(&dev->dev, "BAR %d: bogus alignment "
"%pR flags %#lx\n",
@@ -248,7 +248,7 @@ void pdev_sort_resources(struct pci_dev
struct resource_list *ln = list->next;
if (ln)
- align = resource_alignment(ln->res);
+ align = pci_resource_alignment(ln->dev, ln->res);
if (r_align > align) {
tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
From gregkh@mini.kroah.org Thu Sep 10 17:24:11 2009
Message-Id: <20090911002411.326362527@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:01 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
James Bottomley <James.Bottomley@HansenPartnership.com>
Subject: [patch 15/22] SCSI: sd: fix bug in SCSI async probing
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=scsi-sd-fix-bug-in-scsi-async-probing.patch
Content-Length: 2696
Lines: 94
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: James Bottomley <James.Bottomley@HansenPartnership.com>
commit 601e7638254c118fca135af9b1a9f35061420f62 upstream.
The async split up of probing in sd.c created a potential failure case where
something goes wrong with device_add(), but which we don't recover properly.
Since, in general, asynchronous error handling is hard, move the device_add()
into the asynchronous path (it should be fast) and make sure all the deferred
processing cannot fail.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/scsi/sd.c | 45 +++++++++++++++++++++------------------------
1 file changed, 21 insertions(+), 24 deletions(-)
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1902,24 +1902,6 @@ static void sd_probe_async(void *data, a
index = sdkp->index;
dev = &sdp->sdev_gendev;
- if (!sdp->request_queue->rq_timeout) {
- if (sdp->type != TYPE_MOD)
- blk_queue_rq_timeout(sdp->request_queue, SD_TIMEOUT);
- else
- blk_queue_rq_timeout(sdp->request_queue,
- SD_MOD_TIMEOUT);
- }
-
- device_initialize(&sdkp->dev);
- sdkp->dev.parent = &sdp->sdev_gendev;
- sdkp->dev.class = &sd_disk_class;
- dev_set_name(&sdkp->dev, dev_name(&sdp->sdev_gendev));
-
- if (device_add(&sdkp->dev))
- goto out_free_index;
-
- get_device(&sdp->sdev_gendev);
-
if (index < SD_MAX_DISKS) {
gd->major = sd_major((index & 0xf0) >> 4);
gd->first_minor = ((index & 0xf) << 4) | (index & 0xfff00);
@@ -1954,11 +1936,6 @@ static void sd_probe_async(void *data, a
sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
sdp->removable ? "removable " : "");
-
- return;
-
- out_free_index:
- ida_remove(&sd_index_ida, index);
}
/**
@@ -2026,6 +2003,24 @@ static int sd_probe(struct device *dev)
sdkp->openers = 0;
sdkp->previous_state = 1;
+ if (!sdp->request_queue->rq_timeout) {
+ if (sdp->type != TYPE_MOD)
+ blk_queue_rq_timeout(sdp->request_queue, SD_TIMEOUT);
+ else
+ blk_queue_rq_timeout(sdp->request_queue,
+ SD_MOD_TIMEOUT);
+ }
+
+ device_initialize(&sdkp->dev);
+ sdkp->dev.parent = &sdp->sdev_gendev;
+ sdkp->dev.class = &sd_disk_class;
+ dev_set_name(&sdkp->dev, dev_name(&sdp->sdev_gendev));
+
+ if (device_add(&sdkp->dev))
+ goto out_free_index;
+
+ get_device(&sdp->sdev_gendev);
+
async_schedule(sd_probe_async, sdkp);
return 0;
@@ -2055,8 +2050,10 @@ static int sd_probe(struct device *dev)
**/
static int sd_remove(struct device *dev)
{
- struct scsi_disk *sdkp = dev_get_drvdata(dev);
+ struct scsi_disk *sdkp;
+ async_synchronize_full();
+ sdkp = dev_get_drvdata(dev);
device_del(&sdkp->dev);
del_gendisk(sdkp->disk);
sd_shutdown(dev);
From gregkh@mini.kroah.org Thu Sep 10 17:24:11 2009
Message-Id: <20090911002411.498265593@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:02 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Clemens Ladisch <clemens@ladisch.de>,
Takashi Iwai <tiwai@suse.de>
Subject: [patch 16/22] sound: oxygen: handle cards with missing EEPROM
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=sound-oxygen-handle-cards-with-missing-eeprom.patch
Content-Length: 1145
Lines: 31
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Clemens Ladisch <clemens@ladisch.de>
commit 92653453c3015c083b9fe0ad48261c6b2267d482 upstream.
The card model detection code introduced in 2.6.30 that tries to work
around partially broken EEPROM contents by reading the EEPROM directly
does not handle cards where the EEPROM has been omitted. In this case,
we have to use the default ID to allow the driver to load.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Reported-and-tested-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
sound/pci/oxygen/oxygen_lib.c | 3 +++
1 file changed, 3 insertions(+)
--- a/sound/pci/oxygen/oxygen_lib.c
+++ b/sound/pci/oxygen/oxygen_lib.c
@@ -260,6 +260,9 @@ oxygen_search_pci_id(struct oxygen *chip
* chip didn't if the first EEPROM word was overwritten.
*/
subdevice = oxygen_read_eeprom(chip, 2);
+ /* use default ID if EEPROM is missing */
+ if (subdevice == 0xffff)
+ subdevice = 0x8788;
/*
* We use only the subsystem device ID for searching because it is
* unique even without the subsystem vendor ID, which may have been
From gregkh@mini.kroah.org Thu Sep 10 17:24:11 2009
Message-Id: <20090911002411.613796207@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:03 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Clemens Ladisch <clemens@ladisch.de>,
Takashi Iwai <tiwai@suse.de>
Subject: [patch 17/22] sound: oxygen: fix MCLK rate for 192 kHz playback
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=sound-oxygen-fix-mclk-rate-for-192-khz-playback.patch
Content-Length: 1042
Lines: 31
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Clemens Ladisch <clemens@ladisch.de>
commit b91ab72b830e1494c2c7f8de05ccb2ab2c9cfb26 upstream.
Do not forget to program the MCLK ratio for the I2S output.
Otherwise, the master clock frequency can be too high for
the DACs at sample frequencies above 96 kHz.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
sound/pci/oxygen/oxygen_pcm.c | 2 ++
1 file changed, 2 insertions(+)
--- a/sound/pci/oxygen/oxygen_pcm.c
+++ b/sound/pci/oxygen/oxygen_pcm.c
@@ -469,9 +469,11 @@ static int oxygen_multich_hw_params(stru
oxygen_write16_masked(chip, OXYGEN_I2S_MULTICH_FORMAT,
oxygen_rate(hw_params) |
chip->model.dac_i2s_format |
+ oxygen_i2s_mclk(hw_params) |
oxygen_i2s_bits(hw_params),
OXYGEN_I2S_RATE_MASK |
OXYGEN_I2S_FORMAT_MASK |
+ OXYGEN_I2S_MCLK_MASK |
OXYGEN_I2S_BITS_MASK);
oxygen_update_dac_routing(chip);
oxygen_update_spdif_source(chip);
From gregkh@mini.kroah.org Thu Sep 10 17:24:11 2009
Message-Id: <20090911002411.791389158@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:04 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Jonathan Brassow <jbrassow@redhat.com>,
Alasdair G Kergon <agk@redhat.com>
Subject: [patch 18/22] dm raid1: do not allow log_failure variable to unset after being set
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=dm-raid1-do-not-allow-log_failure-variable-to-unset-after-being-set.patch
Content-Length: 2236
Lines: 57
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Jonathan Brassow <jbrassow@redhat.com>
commit d2b698644c97cb033261536a4f2010924a00eac9 upstream.
This patch fixes a bug which was triggering a case where the primary leg
could not be changed on failure even when the mirror was in-sync.
The case involves the failure of the primary device along with
the transient failure of the log device. The problem is that
bios can be put on the 'failures' list (due to log failure)
before 'fail_mirror' is called due to the primary device failure.
Normally, this is fine, but if the log device failure is transient,
a subsequent iteration of the work thread, 'do_mirror', will
reset 'log_failure'. The 'do_failures' function then resets
the 'in_sync' variable when processing bios on the failures list.
The 'in_sync' variable is what is used to determine if the
primary device can be switched in the event of a failure. Since
this has been reset, the primary device is incorrectly assumed
to be not switchable.
The case has been seen in the cluster mirror context, where one
machine realizes the log device is dead before the other machines.
As the responsibilities of the server migrate from one node to
another (because the mirror is being reconfigured due to the failure),
the new server may think for a moment that the log device is fine -
thus resetting the 'log_failure' variable.
In any case, it is inappropiate for us to reset the 'log_failure'
variable. The above bug simply illustrates that it can actually
hurt us.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/md/dm-raid1.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -648,7 +648,13 @@ static void do_writes(struct mirror_set
*/
dm_rh_inc_pending(ms->rh, &sync);
dm_rh_inc_pending(ms->rh, &nosync);
- ms->log_failure = dm_rh_flush(ms->rh) ? 1 : 0;
+
+ /*
+ * If the flush fails on a previous call and succeeds here,
+ * we must not reset the log_failure variable. We need
+ * userspace interaction to do that.
+ */
+ ms->log_failure = dm_rh_flush(ms->rh) ? 1 : ms->log_failure;
/*
* Dispatch io.
From gregkh@mini.kroah.org Thu Sep 10 17:24:12 2009
Message-Id: <20090911002411.963437354@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:05 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Mikulas Patocka <mpatocka@redhat.com>,
Alasdair G Kergon <agk@redhat.com>
Subject: [patch 19/22] dm snapshot: refactor zero_disk_area to use chunk_io
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=dm-snapshot-refactor-zero_disk_area-to-use-chunk_io.patch
Content-Length: 2547
Lines: 88
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Mikulas Patocka <mpatocka@redhat.com>
commit 02d2fd31defce6ff77146ad0fef4f19006055d86 upstream.
Refactor chunk_io to prepare for the fix in the following patch.
Pass an area pointer to chunk_io and simplify zero_disk_area to use
chunk_io. No functional change.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/md/dm-snap-persistent.c | 26 +++++++-------------------
1 file changed, 7 insertions(+), 19 deletions(-)
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -188,7 +188,8 @@ static void do_metadata(struct work_stru
/*
* Read or write a chunk aligned and sized block of data from a device.
*/
-static int chunk_io(struct pstore *ps, chunk_t chunk, int rw, int metadata)
+static int chunk_io(struct pstore *ps, void *area, chunk_t chunk, int rw,
+ int metadata)
{
struct dm_io_region where = {
.bdev = ps->store->cow->bdev,
@@ -198,7 +199,7 @@ static int chunk_io(struct pstore *ps, c
struct dm_io_request io_req = {
.bi_rw = rw,
.mem.type = DM_IO_VMA,
- .mem.ptr.vma = ps->area,
+ .mem.ptr.vma = area,
.client = ps->io_client,
.notify.fn = NULL,
};
@@ -240,7 +241,7 @@ static int area_io(struct pstore *ps, in
chunk = area_location(ps, ps->current_area);
- r = chunk_io(ps, chunk, rw, 0);
+ r = chunk_io(ps, ps->area, chunk, rw, 0);
if (r)
return r;
@@ -254,20 +255,7 @@ static void zero_memory_area(struct psto
static int zero_disk_area(struct pstore *ps, chunk_t area)
{
- struct dm_io_region where = {
- .bdev = ps->store->cow->bdev,
- .sector = ps->store->chunk_size * area_location(ps, area),
- .count = ps->store->chunk_size,
- };
- struct dm_io_request io_req = {
- .bi_rw = WRITE,
- .mem.type = DM_IO_VMA,
- .mem.ptr.vma = ps->zero_area,
- .client = ps->io_client,
- .notify.fn = NULL,
- };
-
- return dm_io(&io_req, 1, &where, NULL);
+ return chunk_io(ps, ps->zero_area, area_location(ps, area), WRITE, 0);
}
static int read_header(struct pstore *ps, int *new_snapshot)
@@ -297,7 +285,7 @@ static int read_header(struct pstore *ps
if (r)
return r;
- r = chunk_io(ps, 0, READ, 1);
+ r = chunk_io(ps, ps->area, 0, READ, 1);
if (r)
goto bad;
@@ -359,7 +347,7 @@ static int write_header(struct pstore *p
dh->version = cpu_to_le32(ps->version);
dh->chunk_size = cpu_to_le32(ps->store->chunk_size);
- return chunk_io(ps, 0, WRITE, 1);
+ return chunk_io(ps, ps->area, 0, WRITE, 1);
}
/*
From gregkh@mini.kroah.org Thu Sep 10 17:24:12 2009
Message-Id: <20090911002412.104202809@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:06 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Mikulas Patocka <mpatocka@redhat.com>,
Alasdair G Kergon <agk@redhat.com>
Subject: [patch 20/22] dm snapshot: fix header corruption race on invalidation
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=dm-snapshot-fix-header-corruption-race-on-invalidation.patch
Content-Length: 3890
Lines: 139
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Mikulas Patocka <mpatocka@redhat.com>
commit 61578dcd3fafe6babd72e8db32110cc0b630a432 upstream.
If a persistent snapshot fills up, a race can corrupt the on-disk header
which causes a crash on any future attempt to activate the snapshot
(typically while booting). This patch fixes the race.
When the snapshot overflows, __invalidate_snapshot is called, which calls
snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
calls write_header. write_header constructs the new header in the "area"
location.
Concurrently, an existing kcopyd job may finish, call copy_callback
and commit_exception method, that goes to persistent_commit_exception.
persistent_commit_exception doesn't do locking, relying on the fact that
callbacks are single-threaded, but it can race with snapshot invalidation and
overwrite the header that is just being written while the snapshot is being
invalidated.
The result of this race is a corrupted header being written that can
lead to a crash on further reactivation (if chunk_size is zero in the
corrupted header).
The fix is to use separate memory areas for each.
See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/md/dm-snap-persistent.c | 44 ++++++++++++++++++++++++++++++----------
1 file changed, 34 insertions(+), 10 deletions(-)
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -106,6 +106,13 @@ struct pstore {
void *zero_area;
/*
+ * An area used for header. The header can be written
+ * concurrently with metadata (when invalidating the snapshot),
+ * so it needs a separate buffer.
+ */
+ void *header_area;
+
+ /*
* Used to keep track of which metadata area the data in
* 'chunk' refers to.
*/
@@ -148,16 +155,27 @@ static int alloc_area(struct pstore *ps)
*/
ps->area = vmalloc(len);
if (!ps->area)
- return r;
+ goto err_area;
ps->zero_area = vmalloc(len);
- if (!ps->zero_area) {
- vfree(ps->area);
- return r;
- }
+ if (!ps->zero_area)
+ goto err_zero_area;
memset(ps->zero_area, 0, len);
+ ps->header_area = vmalloc(len);
+ if (!ps->header_area)
+ goto err_header_area;
+
return 0;
+
+err_header_area:
+ vfree(ps->zero_area);
+
+err_zero_area:
+ vfree(ps->area);
+
+err_area:
+ return r;
}
static void free_area(struct pstore *ps)
@@ -169,6 +187,10 @@ static void free_area(struct pstore *ps)
if (ps->zero_area)
vfree(ps->zero_area);
ps->zero_area = NULL;
+
+ if (ps->header_area)
+ vfree(ps->header_area);
+ ps->header_area = NULL;
}
struct mdata_req {
@@ -285,11 +307,11 @@ static int read_header(struct pstore *ps
if (r)
return r;
- r = chunk_io(ps, ps->area, 0, READ, 1);
+ r = chunk_io(ps, ps->header_area, 0, READ, 1);
if (r)
goto bad;
- dh = (struct disk_header *) ps->area;
+ dh = ps->header_area;
if (le32_to_cpu(dh->magic) == 0) {
*new_snapshot = 1;
@@ -339,15 +361,15 @@ static int write_header(struct pstore *p
{
struct disk_header *dh;
- memset(ps->area, 0, ps->store->chunk_size << SECTOR_SHIFT);
+ memset(ps->header_area, 0, ps->store->chunk_size << SECTOR_SHIFT);
- dh = (struct disk_header *) ps->area;
+ dh = ps->header_area;
dh->magic = cpu_to_le32(SNAP_MAGIC);
dh->valid = cpu_to_le32(ps->valid);
dh->version = cpu_to_le32(ps->version);
dh->chunk_size = cpu_to_le32(ps->store->chunk_size);
- return chunk_io(ps, ps->area, 0, WRITE, 1);
+ return chunk_io(ps, ps->header_area, 0, WRITE, 1);
}
/*
@@ -667,6 +689,8 @@ static int persistent_ctr(struct dm_exce
ps->valid = 1;
ps->version = SNAPSHOT_DISK_VERSION;
ps->area = NULL;
+ ps->zero_area = NULL;
+ ps->header_area = NULL;
ps->next_free = 2; /* skipping the header and first area */
ps->current_committed = 0;
From gregkh@mini.kroah.org Thu Sep 10 17:24:12 2009
Message-Id: <20090911002412.247250943@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:07 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Mikulas Patocka <mpatocka@redhat.com>,
Alasdair G Kergon <agk@redhat.com>
Subject: [patch 21/22] dm exception store: split set_chunk_size
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=dm-exception-store-split-set_chunk_size.patch
Content-Length: 1663
Lines: 47
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Mikulas Patocka <mpatocka@redhat.com>
commit 2defcc3fb4661e7351cb2ac48d843efc4c64db13 upstream.
Break the function set_chunk_size to two functions in preparation for
the fix in the following patch.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/md/dm-exception-store.c | 8 ++++++++
drivers/md/dm-exception-store.h | 4 ++++
2 files changed, 12 insertions(+)
--- a/drivers/md/dm-exception-store.c
+++ b/drivers/md/dm-exception-store.c
@@ -171,6 +171,14 @@ static int set_chunk_size(struct dm_exce
*/
chunk_size_ulong = round_up(chunk_size_ulong, PAGE_SIZE >> 9);
+ return dm_exception_store_set_chunk_size(store, chunk_size_ulong,
+ error);
+}
+
+int dm_exception_store_set_chunk_size(struct dm_exception_store *store,
+ unsigned long chunk_size_ulong,
+ char **error)
+{
/* Check chunk_size is a power of 2 */
if (!is_power_of_2(chunk_size_ulong)) {
*error = "Chunk size is not a power of 2";
--- a/drivers/md/dm-exception-store.h
+++ b/drivers/md/dm-exception-store.h
@@ -168,6 +168,10 @@ static inline chunk_t sector_to_chunk(st
int dm_exception_store_type_register(struct dm_exception_store_type *type);
int dm_exception_store_type_unregister(struct dm_exception_store_type *type);
+int dm_exception_store_set_chunk_size(struct dm_exception_store *store,
+ unsigned long chunk_size_ulong,
+ char **error);
+
int dm_exception_store_create(struct dm_target *ti, int argc, char **argv,
unsigned *args_used,
struct dm_exception_store **store);
From gregkh@mini.kroah.org Thu Sep 10 17:24:12 2009
Message-Id: <20090911002412.397568158@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:23:08 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk,
Mikulas Patocka <mpatocka@redhat.com>,
Alasdair G Kergon <agk@redhat.com>
Subject: [patch 22/22] dm snapshot: fix on disk chunk size validation
References: <20090911002246.666327880@mini.kroah.org>
Content-Disposition: inline; filename=dm-snapshot-fix-on-disk-chunk-size-validation.patch
Content-Length: 3106
Lines: 89
2.6.30-stable review patch. If anyone has any objections, please let us know.
------------------
From: Mikulas Patocka <mpatocka@redhat.com>
commit ae0b7448e91353ea5f821601a055aca6b58042cd upstream.
Fix some problems seen in the chunk size processing when activating a
pre-existing snapshot.
For a new snapshot, the chunk size can either be supplied by the creator
or a default value can be used. For an existing snapshot, the
chunk size in the snapshot header on disk should always be used.
If someone attempts to load an existing snapshot and has the 'default
chunk size' option set, the kernel uses its default value even when it
is incorrect for the snapshot being loaded. This patch ensures the
correct on-disk value is always used.
Secondly, when the code does use the chunk size stored on the disk it is
prudent to revalidate it, so the code can exit cleanly if it got
corrupted as happened in
https://bugzilla.redhat.com/show_bug.cgi?id=461506 .
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/md/dm-exception-store.c | 5 +++++
drivers/md/dm-snap-persistent.c | 22 ++++++++++++++--------
2 files changed, 19 insertions(+), 8 deletions(-)
--- a/drivers/md/dm-exception-store.c
+++ b/drivers/md/dm-exception-store.c
@@ -191,6 +191,11 @@ int dm_exception_store_set_chunk_size(st
return -EINVAL;
}
+ if (chunk_size_ulong > INT_MAX >> SECTOR_SHIFT) {
+ *error = "Chunk size is too high";
+ return -EINVAL;
+ }
+
store->chunk_size = chunk_size_ulong;
store->chunk_mask = chunk_size_ulong - 1;
store->chunk_shift = ffs(chunk_size_ulong) - 1;
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -286,6 +286,7 @@ static int read_header(struct pstore *ps
struct disk_header *dh;
chunk_t chunk_size;
int chunk_size_supplied = 1;
+ char *chunk_err;
/*
* Use default chunk size (or hardsect_size, if larger) if none supplied
@@ -329,20 +330,25 @@ static int read_header(struct pstore *ps
ps->version = le32_to_cpu(dh->version);
chunk_size = le32_to_cpu(dh->chunk_size);
- if (!chunk_size_supplied || ps->store->chunk_size == chunk_size)
+ if (ps->store->chunk_size == chunk_size)
return 0;
- DMWARN("chunk size %llu in device metadata overrides "
- "table chunk size of %llu.",
- (unsigned long long)chunk_size,
- (unsigned long long)ps->store->chunk_size);
+ if (chunk_size_supplied)
+ DMWARN("chunk size %llu in device metadata overrides "
+ "table chunk size of %llu.",
+ (unsigned long long)chunk_size,
+ (unsigned long long)ps->store->chunk_size);
/* We had a bogus chunk_size. Fix stuff up. */
free_area(ps);
- ps->store->chunk_size = chunk_size;
- ps->store->chunk_mask = chunk_size - 1;
- ps->store->chunk_shift = ffs(chunk_size) - 1;
+ r = dm_exception_store_set_chunk_size(ps->store, chunk_size,
+ &chunk_err);
+ if (r) {
+ DMERR("invalid on-disk chunk size %llu: %s.",
+ (unsigned long long)chunk_size, chunk_err);
+ return r;
+ }
r = dm_io_client_resize(sectors_to_pages(ps->store->chunk_size),
ps->io_client);
From gregkh@mini.kroah.org Thu Sep 10 17:24:09 2009
Message-Id: <20090911002246.666327880@mini.kroah.org>
User-Agent: quilt/0.48-1
Date: Thu, 10 Sep 2009 17:22:46 -0700
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: stable-review@kernel.org,
torvalds@linux-foundation.org,
akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk
Subject: [patch 00/22] 2.6.30.7-stable review
Content-Length: 2473
Lines: 56
This is the start of the stable review cycle for the 2.6.30.7 release.
There are 22 patches in this series, all will be posted as a response to
this one. If anyone has any issues with these being applied, please let
us know. If anyone is a maintainer of the proper subsystem, and wants
to add a Signed-off-by: line to the patch, please respond with it.
These patches are sent out with a number of different people on the Cc:
line. If you wish to be a reviewer, please email stable@kernel.org to
add your name to the list. If you want to be off the reviewer list,
also email us.
Responses should be made by Sunday, September 12, 2009 00:00:00 UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.30.7-rc1.gz
and the diffstat can be found below.
thanks,
greg k-h
-----------
Makefile | 2 +-
arch/sparc/include/asm/pgtable_64.h | 12 ++++-
arch/sparc/kernel/irq_64.c | 2 +-
arch/sparc/kernel/ktlb.S | 42 +++++++++++++++--
arch/sparc/kernel/nmi.c | 2 +-
arch/sparc/kernel/sys32.S | 2 +-
arch/sparc/mm/init_64.c | 43 +++++++++--------
arch/sparc/mm/init_64.h | 7 ++-
arch/sparc/prom/misc_64.c | 2 +-
arch/sparc/prom/printf.c | 7 +--
drivers/md/dm-exception-store.c | 13 +++++
drivers/md/dm-exception-store.h | 4 ++
drivers/md/dm-raid1.c | 8 +++-
drivers/md/dm-snap-persistent.c | 88 +++++++++++++++++++++--------------
drivers/net/e100.c | 2 +-
drivers/net/ppp_generic.c | 34 +++++++------
drivers/net/pppol2tp.c | 1 +
drivers/pci/iov.c | 23 +++++++++
drivers/pci/pci.h | 13 +++++
drivers/pci/setup-bus.c | 4 +-
drivers/pci/setup-res.c | 6 +-
drivers/scsi/sd.c | 45 ++++++++---------
fs/jffs2/wbuf.c | 10 ++++
fs/nilfs2/btnode.c | 2 +-
mm/slub.c | 4 +-
net/core/net_namespace.c | 2 +-
net/dccp/proto.c | 1 +
net/ipv4/ip_gre.c | 2 +-
sound/pci/oxygen/oxygen_lib.c | 3 +
sound/pci/oxygen/oxygen_pcm.c | 2 +
30 files changed, 264 insertions(+), 124 deletions(-)