releases/2.6.32.1/mbox - pub/scm/linux/kernel/git/stable/stable-queue - Git at Google

 From linux@linux.site Thu Dec 10 21:25:40 2009
 Message-Id: <20091211052540.442199443@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:13 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Sebastian Andrzej Siewior <sebastian@breakpoint.cc>,
  Oleg Nesterov <oleg@redhat.com>,
  Roland McGrath <roland@redhat.com>,
  Kyle McMartin <kyle@mcmartin.ca>,
  Thomas Gleixner <tglx@linutronix.de>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [01/34] signal: Fix alternate signal stack check
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=signal-fix-alternate-signal-stack-check.patch
 Content-Length: 2919
 Lines: 83

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>

 commit 2a855dd01bc1539111adb7233f587c5c468732ac upstream.

 All architectures in the kernel increment/decrement the stack pointer
 before storing values on the stack.

 On architectures where the stack grows down, sas_ss_sp == sp is not
 on the alternate signal stack, while sas_ss_sp + sas_ss_size == sp is
 on the alternate signal stack.

 On architectures where the stack grows up, sas_ss_sp == sp is on
 the alternate signal stack, while sas_ss_sp + sas_ss_size == sp is not
 on the alternate signal stack.

 The current implementation fails for architectures where the stack
 grows down in the corner case where sas_ss_sp == sp. This was
 reported as Debian bug #544905 on AMD64.
 Simplified test case: http://download.breakpoint.cc/tc-sig-stack.c

 The test case creates the following stack scenario:
    0xn0300	stack top
    0xn0200	alt stack pointer top (when switching to alt stack)
    0xn01ff	alt stack end
    0xn0100	alt stack start == stack pointer

 When the signal is sent, the stack pointer points to the base
 address of the alt stack, and the kernel erroneously decides that it
 has already switched to the alternate stack because of the current
 check for "sp - sas_ss_sp < sas_ss_size".

 On parisc (stack grows up) the scenario would be:
    0xn0200	stack pointer
    0xn01ff	alt stack end
    0xn0100	alt stack start = alt stack pointer base
    		    	  	  (when switching to alt stack)
    0xn0000	stack base

 This is handled correctly by the current implementation.

 [ tglx: Modified for archs which have the stack grow up (parisc), which
   	would fail with the correct implementation for stack grows
   	down. Added a check for sp >= current->sas_ss_sp which is
   	not strictly necessary but makes the code symmetric for both
   	variants ]
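
The boundary rules described above can be modelled in userspace. This is only a sketch: `struct alt_stack` and the three helper names are hypothetical stand-ins for the `current->sas_ss_sp` / `current->sas_ss_size` fields, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task alternate stack fields. */
struct alt_stack {
	unsigned long base;	/* plays the role of sas_ss_sp */
	unsigned long size;	/* plays the role of sas_ss_size */
};

/* Old check: accepts sp == base regardless of stack direction. */
static bool on_alt_stack_old(const struct alt_stack *s, unsigned long sp)
{
	return sp - s->base < s->size;
}

/* Stack grows down: sp is on the alt stack in (base, base + size]. */
static bool on_alt_stack_down(const struct alt_stack *s, unsigned long sp)
{
	return sp > s->base && sp - s->base <= s->size;
}

/* Stack grows up: sp is on the alt stack in [base, base + size). */
static bool on_alt_stack_up(const struct alt_stack *s, unsigned long sp)
{
	return sp >= s->base && sp - s->base < s->size;
}
```

With base 0x100 and size 0x100, the old check reports sp == 0x100 as already on the alternate stack, while the grows-down variant rejects it and accepts sp == 0x200 instead.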

 Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
 Cc: Oleg Nesterov <oleg@redhat.com>
 Cc: Roland McGrath <roland@redhat.com>
 Cc: Kyle McMartin <kyle@mcmartin.ca>
 LKML-Reference: <20091025143758.GA6653@Chamillionaire.breakpoint.cc>
 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

 ---
  include/linux/sched.h |   13 ++++++++++---
  1 file changed, 10 insertions(+), 3 deletions(-)

 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -2086,11 +2086,18 @@ static inline int is_si_special(const st
  	return info <= SEND_SIG_FORCED;
  }

 -/* True if we are on the alternate signal stack.  */
 -
 +/*
 + * True if we are on the alternate signal stack.
 + */
  static inline int on_sig_stack(unsigned long sp)
  {
 -	return (sp - current->sas_ss_sp < current->sas_ss_size);
 +#ifdef CONFIG_STACK_GROWSUP
 +	return sp >= current->sas_ss_sp &&
 +		sp - current->sas_ss_sp < current->sas_ss_size;
 +#else
 +	return sp > current->sas_ss_sp &&
 +		sp - current->sas_ss_sp <= current->sas_ss_size;
 +#endif
  }

  static inline int sas_ss_flags(unsigned long sp)


 From linux@linux.site Thu Dec 10 21:25:41 2009
 Message-Id: <20091211052540.941627509@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:14 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  James Smart <james.smart@emulex.com>,
  James Bottomley <James.Bottomley@suse.de>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [02/34] SCSI: scsi_lib_dma: fix bug with dma maps on nested scsi objects
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=scsi-scsi_lib_dma-fix-bug-with-dma-maps-on-nested-scsi-objects.patch
 Content-Length: 5210
 Lines: 149

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 From: James Bottomley <James.Bottomley@suse.de>

 commit d139b9bd0e52dda14fd13412e7096e68b56d0076 upstream.

 Some of our virtual SCSI hosts don't have a proper bus parent at the
 top, which can be a problem for doing DMA on them.

 This patch makes the host device cache a pointer to the physical bus
 device and provides an extra API for setting it (the normal API picks
 it up from the parent).  This patch also modifies the qla2xxx and lpfc
 vport logic to use the new DMA host setting API.
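
The shape of the API change can be sketched outside the kernel. Everything here is a simplified, hypothetical model (`struct host`, `host_add()`, `host_add_with_dma()`); it only illustrates the idea of caching a separate DMA device while keeping the simple call as a thin wrapper.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct device {
	const char *name;
};

/* Hypothetical stand-in for the host: parent for topology, dma_dev for DMA. */
struct host {
	struct device *parent;
	struct device *dma_dev;
};

/* Extended call: the caller names the physical device used for DMA. */
static int host_add_with_dma(struct host *h, struct device *dev,
			     struct device *dma_dev)
{
	if (!h)
		return -1;
	h->parent = dev;
	h->dma_dev = dma_dev;
	return 0;
}

/* Simple call: the parent doubles as the DMA device. */
static int host_add(struct host *h, struct device *dev)
{
	return host_add_with_dma(h, dev, dev);
}

/* DMA mapping code would consult the cached device, not the parent. */
static struct device *host_dma_device(const struct host *h)
{
	return h->dma_dev;
}
```

A virtual port would pass its own device as the parent and the underlying PCI device as the DMA device; ordinary hosts keep calling the two-argument form.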

 Acked-By: James Smart  <james.smart@emulex.com>
 Signed-off-by: James Bottomley <James.Bottomley@suse.de>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

 ---
  drivers/scsi/hosts.c            |   13 ++++++++++---
  drivers/scsi/lpfc/lpfc_init.c   |    2 +-
  drivers/scsi/qla2xxx/qla_attr.c |    3 ++-
  drivers/scsi/scsi_lib_dma.c     |    4 ++--
  include/scsi/scsi_host.h        |   16 +++++++++++++++-
  5 files changed, 30 insertions(+), 8 deletions(-)

 --- a/drivers/scsi/hosts.c
 +++ b/drivers/scsi/hosts.c
 @@ -180,14 +180,20 @@ void scsi_remove_host(struct Scsi_Host *
  EXPORT_SYMBOL(scsi_remove_host);

  /**
 - * scsi_add_host - add a scsi host
 + * scsi_add_host_with_dma - add a scsi host with dma device
   * @shost:	scsi host pointer to add
   * @dev:	a struct device of type scsi class
 + * @dma_dev:	dma device for the host
 + *
 + * Note: You rarely need to worry about this unless you're in a
 + * virtualised host environments, so use the simpler scsi_add_host()
 + * function instead.
   *
   * Return value:
   * 	0 on success / != 0 for error
   **/
 -int scsi_add_host(struct Scsi_Host *shost, struct device *dev)
 +int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 +			   struct device *dma_dev)
  {
  	struct scsi_host_template *sht = shost->hostt;
  	int error = -EINVAL;
 @@ -207,6 +213,7 @@ int scsi_add_host(struct Scsi_Host *shos

  	if (!shost->shost_gendev.parent)
  		shost->shost_gendev.parent = dev ? dev : &platform_bus;
 +	shost->dma_dev = dma_dev;

  	error = device_add(&shost->shost_gendev);
  	if (error)
 @@ -262,7 +269,7 @@ int scsi_add_host(struct Scsi_Host *shos
   fail:
  	return error;
  }
 -EXPORT_SYMBOL(scsi_add_host);
 +EXPORT_SYMBOL(scsi_add_host_with_dma);

  static void scsi_host_dev_release(struct device *dev)
  {
 --- a/drivers/scsi/lpfc/lpfc_init.c
 +++ b/drivers/scsi/lpfc/lpfc_init.c
 @@ -2408,7 +2408,7 @@ lpfc_create_port(struct lpfc_hba *phba,
  	vport->els_tmofunc.function = lpfc_els_timeout;
  	vport->els_tmofunc.data = (unsigned long)vport;

 -	error = scsi_add_host(shost, dev);
 +	error = scsi_add_host_with_dma(shost, dev, &phba->pcidev->dev);
  	if (error)
  		goto out_put_shost;

 --- a/drivers/scsi/qla2xxx/qla_attr.c
 +++ b/drivers/scsi/qla2xxx/qla_attr.c
 @@ -1654,7 +1654,8 @@ qla24xx_vport_create(struct fc_vport *fc
  			fc_vport_set_state(fc_vport, FC_VPORT_LINKDOWN);
  	}

 -	if (scsi_add_host(vha->host, &fc_vport->dev)) {
 +	if (scsi_add_host_with_dma(vha->host, &fc_vport->dev,
 +				   &ha->pdev->dev)) {
  		DEBUG15(printk("scsi(%ld): scsi_add_host failure for VP[%d].\n",
  			vha->host_no, vha->vp_idx));
  		goto vport_create_failed_2;
 --- a/drivers/scsi/scsi_lib_dma.c
 +++ b/drivers/scsi/scsi_lib_dma.c
 @@ -23,7 +23,7 @@ int scsi_dma_map(struct scsi_cmnd *cmd)
  	int nseg = 0;

  	if (scsi_sg_count(cmd)) {
 -		struct device *dev = cmd->device->host->shost_gendev.parent;
 +		struct device *dev = cmd->device->host->dma_dev;

  		nseg = dma_map_sg(dev, scsi_sglist(cmd), scsi_sg_count(cmd),
  				  cmd->sc_data_direction);
 @@ -41,7 +41,7 @@ EXPORT_SYMBOL(scsi_dma_map);
  void scsi_dma_unmap(struct scsi_cmnd *cmd)
  {
  	if (scsi_sg_count(cmd)) {
 -		struct device *dev = cmd->device->host->shost_gendev.parent;
 +		struct device *dev = cmd->device->host->dma_dev;

  		dma_unmap_sg(dev, scsi_sglist(cmd), scsi_sg_count(cmd),
  			     cmd->sc_data_direction);
 --- a/include/scsi/scsi_host.h
 +++ b/include/scsi/scsi_host.h
 @@ -677,6 +677,12 @@ struct Scsi_Host {
  	void *shost_data;

  	/*
 +	 * Points to the physical bus device we'd use to do DMA
 +	 * Needed just in case we have virtual hosts.
 +	 */
 +	struct device *dma_dev;
 +
 +	/*
  	 * We should ensure that this is aligned, both for better performance
  	 * and also because some compilers (m68k) don't automatically force
  	 * alignment to a long boundary.
 @@ -720,7 +726,9 @@ extern int scsi_queue_work(struct Scsi_H
  extern void scsi_flush_work(struct Scsi_Host *);

  extern struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *, int);
 -extern int __must_check scsi_add_host(struct Scsi_Host *, struct device *);
 +extern int __must_check scsi_add_host_with_dma(struct Scsi_Host *,
 +					       struct device *,
 +					       struct device *);
  extern void scsi_scan_host(struct Scsi_Host *);
  extern void scsi_rescan_device(struct device *);
  extern void scsi_remove_host(struct Scsi_Host *);
 @@ -731,6 +739,12 @@ extern const char *scsi_host_state_name(

  extern u64 scsi_calculate_bounce_limit(struct Scsi_Host *);

 +static inline int __must_check scsi_add_host(struct Scsi_Host *host,
 +					     struct device *dev)
 +{
 +	return scsi_add_host_with_dma(host, dev, dev);
 +}
 +
  static inline struct device *scsi_get_device(struct Scsi_Host *shost)
  {
          return shost->shost_gendev.parent;


 From linux@linux.site Thu Dec 10 21:25:42 2009
 Message-Id: <20091211052541.550415868@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:15 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Martin Michlmayr <tbm@cyrius.com>,
  Boaz Harrosh <bharrosh@panasas.com>,
  James Bottomley <James.Bottomley@suse.de>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [03/34] SCSI: osd_protocol.h: Add missing #include
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=scsi-osd_protocol.h-add-missing-include.patch
 Content-Length: 708
 Lines: 24

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 From: Martin Michlmayr <tbm@cyrius.com>

 commit 0899638688f223fd9e9fee60d662665e11693d12 upstream.

 include/scsi/osd_protocol.h uses ALIGN() without an #include
 <linux/kernel.h>, leading to:
 | include/scsi/osd_protocol.h:362: error: implicit declaration of function 'ALIGN'

 Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
 Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
 Signed-off-by: James Bottomley <James.Bottomley@suse.de>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

 --- a/include/scsi/osd_protocol.h
 +++ b/include/scsi/osd_protocol.h
 @@ -17,6 +17,7 @@
  #define __OSD_PROTOCOL_H__

  #include <linux/types.h>
 +#include <linux/kernel.h>
  #include <asm/unaligned.h>
  #include <scsi/scsi.h>


 From linux@linux.site Thu Dec 10 21:25:42 2009
 Message-Id: <20091211052542.045664905@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:16 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  James Bottomley <James.Bottomley@suse.de>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [04/34] SCSI: megaraid_sas: fix 64 bit sense pointer truncation
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=scsi-megaraid_sas-fix-64-bit-sense-pointer-truncation.patch
 Content-Length: 1456
 Lines: 47

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 From: Yang, Bo <Bo.Yang@lsi.com>

 commit 7b2519afa1abd1b9f63aa1e90879307842422dae upstream.

 The current sense pointer is cast to a u32 pointer, which can truncate
 on 64 bits.  Fix by using unsigned long instead.
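
A userspace sketch of the truncation, assuming a 64-bit handle: the helper names and the use of `uint64_t` with `memcpy` are illustrative choices, not the driver's code (the patch itself switches the pointer type to unsigned long).

```c
#include <stdint.h>
#include <string.h>

/* Store a handle into a frame through a 32-bit slot: upper bits are lost. */
static void store_handle_32(unsigned char *frame, size_t off, uint64_t handle)
{
	uint32_t narrow = (uint32_t)handle;

	memcpy(frame + off, &narrow, sizeof(narrow));
}

/* Store the handle at its full width. */
static void store_handle_64(unsigned char *frame, size_t off, uint64_t handle)
{
	memcpy(frame + off, &handle, sizeof(handle));
}

/* Read back a full-width handle from the frame. */
static uint64_t load_handle_64(const unsigned char *frame, size_t off)
{
	uint64_t value;

	memcpy(&value, frame + off, sizeof(value));
	return value;
}
```

Writing a handle with non-zero upper bits through the 32-bit helper and reading it back at full width yields a different value, which is the failure mode the patch removes.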

 Signed-off-by: Bo Yang <bo.yang@lsi.com>
 Signed-off-by: James Bottomley <James.Bottomley@suse.de>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

 ---
  drivers/scsi/megaraid/megaraid_sas.c |    8 ++++----
  1 file changed, 4 insertions(+), 4 deletions(-)

 --- a/drivers/scsi/megaraid/megaraid_sas.c
 +++ b/drivers/scsi/megaraid/megaraid_sas.c
 @@ -3032,7 +3032,7 @@ megasas_mgmt_fw_ioctl(struct megasas_ins
  	int error = 0, i;
  	void *sense = NULL;
  	dma_addr_t sense_handle;
 -	u32 *sense_ptr;
 +	unsigned long *sense_ptr;

  	memset(kbuff_arr, 0, sizeof(kbuff_arr));

 @@ -3109,7 +3109,7 @@ megasas_mgmt_fw_ioctl(struct megasas_ins
  		}

  		sense_ptr =
 -		    (u32 *) ((unsigned long)cmd->frame + ioc->sense_off);
 +		(unsigned long *) ((unsigned long)cmd->frame + ioc->sense_off);
  		*sense_ptr = sense_handle;
  	}

 @@ -3140,8 +3140,8 @@ megasas_mgmt_fw_ioctl(struct megasas_ins
  		 * sense_ptr points to the location that has the user
  		 * sense buffer address
  		 */
 -		sense_ptr = (u32 *) ((unsigned long)ioc->frame.raw +
 -				     ioc->sense_off);
 +		sense_ptr = (unsigned long *) ((unsigned long)ioc->frame.raw +
 +				ioc->sense_off);

  		if (copy_to_user((void __user *)((unsigned long)(*sense_ptr)),
  				 sense, ioc->sense_len)) {


 From linux@linux.site Thu Dec 10 21:25:43 2009
 Message-Id: <20091211052542.664737460@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:17 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Curt Wohlgemuth <curtw@google.com>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [05/34] ext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPC
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0001-ext4-fix-potential-buffer-head-leak-when-add_dirent_.patch
 Content-Length: 3833
 Lines: 118

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 2de770a406b06dfc619faabbf5d85c835ed3f2e1)

 Previously add_dirent_to_buf() did not free its passed-in buffer head
 in the case of ENOSPC, since in some cases the caller still needed it.
 However, this led to potential buffer head leaks since not all callers
 dealt with this correctly.  Fix this by simplifying the freeing
 convention; now add_dirent_to_buf() *never* frees the passed-in buffer
 head, and leaves that to the responsibility of its caller.  This makes
 things cleaner and makes it easier to prove that the code is neither
 leaking buffer heads nor calling brelse() one time too many.
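
The "callee never frees, caller always releases" convention can be sketched with a toy reference count. `struct buf`, `buf_get()`, `buf_put()`, `try_add()` and `add_entry()` are hypothetical names for illustration; they do not model the real buffer head API.

```c
#include <errno.h>

/* Hypothetical reference-counted buffer. */
struct buf {
	int refs;
};

static void buf_get(struct buf *b)
{
	b->refs++;
}

static void buf_put(struct buf *b)
{
	b->refs--;
}

/* Callee: never releases b, whatever it returns. */
static int try_add(struct buf *b, int free_slots)
{
	(void)b;
	return free_slots > 0 ? 0 : -ENOSPC;
}

/* Caller: takes one reference and drops exactly one on every path. */
static int add_entry(struct buf *b, int free_slots)
{
	int ret;

	buf_get(b);
	ret = try_add(b, free_slots);
	buf_put(b);
	return ret;
}
```

Because the release happens in one place, the reference count returns to its starting value on both the success and the ENOSPC path.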

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Cc: Curt Wohlgemuth <curtw@google.com>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/namei.c |   30 ++++++++++++------------------
  1 file changed, 12 insertions(+), 18 deletions(-)

 --- a/fs/ext4/namei.c
 +++ b/fs/ext4/namei.c
 @@ -1292,9 +1292,6 @@ errout:
   * add_dirent_to_buf will attempt search the directory block for
   * space.  It will return -ENOSPC if no space is available, and -EIO
   * and -EEXIST if directory entry already exists.
 - *
 - * NOTE!  bh is NOT released in the case where ENOSPC is returned.  In
 - * all other cases bh is released.
   */
  static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
  			     struct inode *inode, struct ext4_dir_entry_2 *de,
 @@ -1315,14 +1312,10 @@ static int add_dirent_to_buf(handle_t *h
  		top = bh->b_data + blocksize - reclen;
  		while ((char *) de <= top) {
  			if (!ext4_check_dir_entry("ext4_add_entry", dir, de,
 -						  bh, offset)) {
 -				brelse(bh);
 +						  bh, offset))
  				return -EIO;
 -			}
 -			if (ext4_match(namelen, name, de)) {
 -				brelse(bh);
 +			if (ext4_match(namelen, name, de))
  				return -EEXIST;
 -			}
  			nlen = EXT4_DIR_REC_LEN(de->name_len);
  			rlen = ext4_rec_len_from_disk(de->rec_len, blocksize);
  			if ((de->inode? rlen - nlen: rlen) >= reclen)
 @@ -1337,7 +1330,6 @@ static int add_dirent_to_buf(handle_t *h
  	err = ext4_journal_get_write_access(handle, bh);
  	if (err) {
  		ext4_std_error(dir->i_sb, err);
 -		brelse(bh);
  		return err;
  	}

 @@ -1377,7 +1369,6 @@ static int add_dirent_to_buf(handle_t *h
  	err = ext4_handle_dirty_metadata(handle, dir, bh);
  	if (err)
  		ext4_std_error(dir->i_sb, err);
 -	brelse(bh);
  	return 0;
  }

 @@ -1471,7 +1462,9 @@ static int make_indexed_dir(handle_t *ha
  	if (!(de))
  		return retval;

 -	return add_dirent_to_buf(handle, dentry, inode, de, bh);
 +	retval = add_dirent_to_buf(handle, dentry, inode, de, bh);
 +	brelse(bh);
 +	return retval;
  }

  /*
 @@ -1514,8 +1507,10 @@ static int ext4_add_entry(handle_t *hand
  		if(!bh)
  			return retval;
  		retval = add_dirent_to_buf(handle, dentry, inode, NULL, bh);
 -		if (retval != -ENOSPC)
 +		if (retval != -ENOSPC) {
 +			brelse(bh);
  			return retval;
 +		}

  		if (blocks == 1 && !dx_fallback &&
  		    EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_DIR_INDEX))
 @@ -1528,7 +1523,9 @@ static int ext4_add_entry(handle_t *hand
  	de = (struct ext4_dir_entry_2 *) bh->b_data;
  	de->inode = 0;
  	de->rec_len = ext4_rec_len_to_disk(blocksize, blocksize);
 -	return add_dirent_to_buf(handle, dentry, inode, de, bh);
 +	retval = add_dirent_to_buf(handle, dentry, inode, de, bh);
 +	brelse(bh);
 +	return retval;
  }

  /*
 @@ -1561,10 +1558,8 @@ static int ext4_dx_add_entry(handle_t *h
  		goto journal_error;

  	err = add_dirent_to_buf(handle, dentry, inode, NULL, bh);
 -	if (err != -ENOSPC) {
 -		bh = NULL;
 +	if (err != -ENOSPC)
  		goto cleanup;
 -	}

  	/* Block full, should compress but for now just split */
  	dxtrace(printk(KERN_DEBUG "using %u of %u node entries\n",
 @@ -1657,7 +1652,6 @@ static int ext4_dx_add_entry(handle_t *h
  	if (!de)
  		goto cleanup;
  	err = add_dirent_to_buf(handle, dentry, inode, de, bh);
 -	bh = NULL;
  	goto cleanup;

  journal_error:


 From linux@linux.site Thu Dec 10 21:25:43 2009
 Message-Id: <20091211052543.277851362@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:18 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [06/34] ext4: avoid divide by zero when trying to mount a corrupted file system
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0002-ext4-avoid-divide-by-zero-when-trying-to-mount-a-cor.patch
 Content-Length: 1267
 Lines: 39

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 503358ae01b70ce6909d19dd01287093f6b6271c)

 If s_log_groups_per_flex is greater than 31, then groups_per_flex will
 overflow and cause a divide by zero error.  This can cause a kernel
 BUG if such a file system is mounted.
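
A minimal userspace sketch of guarding the shift before dividing. It uses an explicit range check rather than the patch's "compute, then test for < 2" approach, and both helper names and the rounding-up division are simplifications assumed for illustration.

```c
/*
 * Return the number of groups per flex group, or 0 when the on-disk
 * log value is unusable and flex groups should be treated as disabled.
 */
static unsigned int groups_per_flex(unsigned int log_groups)
{
	if (log_groups < 1 || log_groups > 31)
		return 0;
	return 1U << log_groups;
}

/* Round-up division that is never reached with a zero divisor. */
static unsigned int flex_group_count(unsigned int groups,
				     unsigned int log_groups)
{
	unsigned int per_flex = groups_per_flex(log_groups);

	if (per_flex == 0)
		return 0;
	return (groups + per_flex - 1) / per_flex;
}
```

A corrupted superblock value such as 40 then disables the feature instead of reaching the division.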

 Thanks to Nageswara R Sastry for analyzing the failure and providing
 an initial patch.

 http://bugzilla.kernel.org/show_bug.cgi?id=14287

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/super.c |    8 ++++----
  1 file changed, 4 insertions(+), 4 deletions(-)

 --- a/fs/ext4/super.c
 +++ b/fs/ext4/super.c
 @@ -1673,14 +1673,14 @@ static int ext4_fill_flex_info(struct su
  	size_t size;
  	int i;

 -	if (!sbi->s_es->s_log_groups_per_flex) {
 +	sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
 +	groups_per_flex = 1 << sbi->s_log_groups_per_flex;
 +
 +	if (groups_per_flex < 2) {
  		sbi->s_log_groups_per_flex = 0;
  		return 1;
  	}

 -	sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
 -	groups_per_flex = 1 << sbi->s_log_groups_per_flex;
 -
  	/* We allocate both existing and potentially added groups */
  	flex_group_count = ((sbi->s_groups_count + groups_per_flex - 1) +
  			((le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) + 1) <<


 From linux@linux.site Thu Dec 10 21:25:44 2009
 Message-Id: <20091211052543.772152436@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:19 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Akira Fujita <a-fujita@rs.jp.nec.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [07/34] ext4: fix the returned block count if EXT4_IOC_MOVE_EXT fails
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0003-ext4-fix-the-returned-block-count-if-EXT4_IOC_MOVE_E.patch
 Content-Length: 10970
 Lines: 349

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit f868a48d06f8886cb0367568a12367fa4f21ea0d)

 If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
 exchanged before the failure should be returned to the userspace
 caller.  Unfortunately, if the block size is not the same as the page
 size, the block count currently returned is the page-aligned block
 count instead of the actual block count.  This commit addresses this
 bug.
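
The calling convention the patch moves to (return the count of blocks actually replaced, report the error through an out-parameter) can be sketched as follows. `replace_blocks()`, `move_blocks()` and the `fail_at` knob are hypothetical and exist only to simulate a failure partway through.

```c
#include <errno.h>

/*
 * Replace up to count blocks. fail_at is a hypothetical index at which
 * the operation fails (negative means it never fails). The return value
 * is the number of blocks replaced; the error goes to *err.
 */
static int replace_blocks(int count, int fail_at, int *err)
{
	int replaced = 0;

	*err = 0;
	while (replaced < count) {
		if (replaced == fail_at) {
			*err = -EIO;
			break;
		}
		replaced++;
	}
	return replaced;
}

/* Caller: account for partial progress before acting on the error. */
static int move_blocks(int count, int fail_at, int *moved)
{
	int err;

	*moved += replace_blocks(count, fail_at, &err);
	return err;
}
```

Adding the partial count before checking the error is what lets the caller report the true number of exchanged blocks on failure.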

 Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/move_extent.c |  139 ++++++++++++++++++++++++++------------------------
  1 file changed, 73 insertions(+), 66 deletions(-)

 --- a/fs/ext4/move_extent.c
 +++ b/fs/ext4/move_extent.c
 @@ -661,6 +661,7 @@ mext_calc_swap_extents(struct ext4_exten
   * @donor_inode:	donor inode
   * @from:		block offset of orig_inode
   * @count:		block count to be replaced
 + * @err:		pointer to save return value
   *
   * Replace original inode extents and donor inode extents page by page.
   * We implement this replacement in the following three steps:
 @@ -671,19 +672,18 @@ mext_calc_swap_extents(struct ext4_exten
   * 3. Change the block information of donor inode to point at the saved
   *    original inode blocks in the dummy extents.
   *
 - * Return 0 on success, or a negative error value on failure.
 + * Return replaced block count.
   */
  static int
  mext_replace_branches(handle_t *handle, struct inode *orig_inode,
  			   struct inode *donor_inode, ext4_lblk_t from,
 -			   ext4_lblk_t count)
 +			   ext4_lblk_t count, int *err)
  {
  	struct ext4_ext_path *orig_path = NULL;
  	struct ext4_ext_path *donor_path = NULL;
  	struct ext4_extent *oext, *dext;
  	struct ext4_extent tmp_dext, tmp_oext;
  	ext4_lblk_t orig_off = from, donor_off = from;
 -	int err = 0;
  	int depth;
  	int replaced_count = 0;
  	int dext_alen;
 @@ -691,13 +691,13 @@ mext_replace_branches(handle_t *handle,
  	mext_double_down_write(orig_inode, donor_inode);

  	/* Get the original extent for the block "orig_off" */
 -	err = get_ext_path(orig_inode, orig_off, &orig_path);
 -	if (err)
 +	*err = get_ext_path(orig_inode, orig_off, &orig_path);
 +	if (*err)
  		goto out;

  	/* Get the donor extent for the head */
 -	err = get_ext_path(donor_inode, donor_off, &donor_path);
 -	if (err)
 +	*err = get_ext_path(donor_inode, donor_off, &donor_path);
 +	if (*err)
  		goto out;
  	depth = ext_depth(orig_inode);
  	oext = orig_path[depth].p_ext;
 @@ -707,9 +707,9 @@ mext_replace_branches(handle_t *handle,
  	dext = donor_path[depth].p_ext;
  	tmp_dext = *dext;

 -	err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
 +	*err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
  				      donor_off, count);
 -	if (err)
 +	if (*err)
  		goto out;

  	/* Loop for the donor extents */
 @@ -718,7 +718,7 @@ mext_replace_branches(handle_t *handle,
  		if (!dext) {
  			ext4_error(donor_inode->i_sb, __func__,
  				   "The extent for donor must be found");
 -			err = -EIO;
 +			*err = -EIO;
  			goto out;
  		} else if (donor_off != le32_to_cpu(tmp_dext.ee_block)) {
  			ext4_error(donor_inode->i_sb, __func__,
 @@ -726,20 +726,20 @@ mext_replace_branches(handle_t *handle,
  				"extent(%u) should be equal",
  				donor_off,
  				le32_to_cpu(tmp_dext.ee_block));
 -			err = -EIO;
 +			*err = -EIO;
  			goto out;
  		}

  		/* Set donor extent to orig extent */
 -		err = mext_leaf_block(handle, orig_inode,
 +		*err = mext_leaf_block(handle, orig_inode,
  					   orig_path, &tmp_dext, &orig_off);
 -		if (err < 0)
 +		if (*err)
  			goto out;

  		/* Set orig extent to donor extent */
 -		err = mext_leaf_block(handle, donor_inode,
 +		*err = mext_leaf_block(handle, donor_inode,
  					   donor_path, &tmp_oext, &donor_off);
 -		if (err < 0)
 +		if (*err)
  			goto out;

  		dext_alen = ext4_ext_get_actual_len(&tmp_dext);
 @@ -753,35 +753,25 @@ mext_replace_branches(handle_t *handle,

  		if (orig_path)
  			ext4_ext_drop_refs(orig_path);
 -		err = get_ext_path(orig_inode, orig_off, &orig_path);
 -		if (err)
 +		*err = get_ext_path(orig_inode, orig_off, &orig_path);
 +		if (*err)
  			goto out;
  		depth = ext_depth(orig_inode);
  		oext = orig_path[depth].p_ext;
 -		if (le32_to_cpu(oext->ee_block) +
 -				ext4_ext_get_actual_len(oext) <= orig_off) {
 -			err = 0;
 -			goto out;
 -		}
  		tmp_oext = *oext;

  		if (donor_path)
  			ext4_ext_drop_refs(donor_path);
 -		err = get_ext_path(donor_inode, donor_off, &donor_path);
 -		if (err)
 +		*err = get_ext_path(donor_inode, donor_off, &donor_path);
 +		if (*err)
  			goto out;
  		depth = ext_depth(donor_inode);
  		dext = donor_path[depth].p_ext;
 -		if (le32_to_cpu(dext->ee_block) +
 -				ext4_ext_get_actual_len(dext) <= donor_off) {
 -			err = 0;
 -			goto out;
 -		}
  		tmp_dext = *dext;

 -		err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
 +		*err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off,
  					   donor_off, count - replaced_count);
 -		if (err)
 +		if (*err)
  			goto out;
  	}

 @@ -796,7 +786,7 @@ out:
  	}

  	mext_double_up_write(orig_inode, donor_inode);
 -	return err;
 +	return replaced_count;
  }

  /**
 @@ -808,16 +798,17 @@ out:
   * @data_offset_in_page:	block index where data swapping starts
   * @block_len_in_page:		the number of blocks to be swapped
   * @uninit:			orig extent is uninitialized or not
 + * @err:			pointer to save return value
   *
   * Save the data in original inode blocks and replace original inode extents
   * with donor inode extents by calling mext_replace_branches().
 - * Finally, write out the saved data in new original inode blocks. Return 0
 - * on success, or a negative error value on failure.
 + * Finally, write out the saved data in new original inode blocks. Return
 + * replaced block count.
   */
  static int
  move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
  		  pgoff_t orig_page_offset, int data_offset_in_page,
 -		  int block_len_in_page, int uninit)
 +		  int block_len_in_page, int uninit, int *err)
  {
  	struct inode *orig_inode = o_filp->f_dentry->d_inode;
  	struct address_space *mapping = orig_inode->i_mapping;
 @@ -829,9 +820,11 @@ move_extent_per_page(struct file *o_filp
  	long long offs = orig_page_offset << PAGE_CACHE_SHIFT;
  	unsigned long blocksize = orig_inode->i_sb->s_blocksize;
  	unsigned int w_flags = 0;
 -	unsigned int tmp_data_len, data_len;
 +	unsigned int tmp_data_size, data_size, replaced_size;
  	void *fsdata;
 -	int ret, i, jblocks;
 +	int i, jblocks;
 +	int err2 = 0;
 +	int replaced_count = 0;
  	int blocks_per_page = PAGE_CACHE_SIZE >> orig_inode->i_blkbits;

  	/*
 @@ -841,8 +834,8 @@ move_extent_per_page(struct file *o_filp
  	jblocks = ext4_writepage_trans_blocks(orig_inode) * 2;
  	handle = ext4_journal_start(orig_inode, jblocks);
  	if (IS_ERR(handle)) {
 -		ret = PTR_ERR(handle);
 -		return ret;
 +		*err = PTR_ERR(handle);
 +		return 0;
  	}

  	if (segment_eq(get_fs(), KERNEL_DS))
 @@ -858,9 +851,9 @@ move_extent_per_page(struct file *o_filp
  	 * Just swap data blocks between orig and donor.
  	 */
  	if (uninit) {
 -		ret = mext_replace_branches(handle, orig_inode,
 -						 donor_inode, orig_blk_offset,
 -						 block_len_in_page);
 +		replaced_count = mext_replace_branches(handle, orig_inode,
 +						donor_inode, orig_blk_offset,
 +						block_len_in_page, err);

  		/* Clear the inode cache not to refer to the old data */
  		ext4_ext_invalidate_cache(orig_inode);
 @@ -870,27 +863,28 @@ move_extent_per_page(struct file *o_filp

  	offs = (long long)orig_blk_offset << orig_inode->i_blkbits;

 -	/* Calculate data_len */
 +	/* Calculate data_size */
  	if ((orig_blk_offset + block_len_in_page - 1) ==
  	    ((orig_inode->i_size - 1) >> orig_inode->i_blkbits)) {
  		/* Replace the last block */
 -		tmp_data_len = orig_inode->i_size & (blocksize - 1);
 +		tmp_data_size = orig_inode->i_size & (blocksize - 1);
  		/*
 -		 * If data_len equal zero, it shows data_len is multiples of
 +		 * If data_size equal zero, it shows data_size is multiples of
  		 * blocksize. So we set appropriate value.
  		 */
 -		if (tmp_data_len == 0)
 -			tmp_data_len = blocksize;
 +		if (tmp_data_size == 0)
 +			tmp_data_size = blocksize;

 -		data_len = tmp_data_len +
 +		data_size = tmp_data_size +
  			((block_len_in_page - 1) << orig_inode->i_blkbits);
 -	} else {
 -		data_len = block_len_in_page << orig_inode->i_blkbits;
 -	}
 +	} else
 +		data_size = block_len_in_page << orig_inode->i_blkbits;
 +
 +	replaced_size = data_size;

 -	ret = a_ops->write_begin(o_filp, mapping, offs, data_len, w_flags,
 +	*err = a_ops->write_begin(o_filp, mapping, offs, data_size, w_flags,
  				 &page, &fsdata);
 -	if (unlikely(ret < 0))
 +	if (unlikely(*err < 0))
  		goto out;

  	if (!PageUptodate(page)) {
 @@ -911,10 +905,17 @@ move_extent_per_page(struct file *o_filp
  	/* Release old bh and drop refs */
  	try_to_release_page(page, 0);

 -	ret = mext_replace_branches(handle, orig_inode, donor_inode,
 -					 orig_blk_offset, block_len_in_page);
 -	if (ret < 0)
 -		goto out;
 +	replaced_count = mext_replace_branches(handle, orig_inode, donor_inode,
 +					orig_blk_offset, block_len_in_page,
 +					&err2);
 +	if (err2) {
 +		if (replaced_count) {
 +			block_len_in_page = replaced_count;
 +			replaced_size =
 +				block_len_in_page << orig_inode->i_blkbits;
 +		} else
 +			goto out;
 +	}

  	/* Clear the inode cache not to refer to the old data */
  	ext4_ext_invalidate_cache(orig_inode);
 @@ -928,16 +929,16 @@ move_extent_per_page(struct file *o_filp
  		bh = bh->b_this_page;

  	for (i = 0; i < block_len_in_page; i++) {
 -		ret = ext4_get_block(orig_inode,
 +		*err = ext4_get_block(orig_inode,
  				(sector_t)(orig_blk_offset + i), bh, 0);
 -		if (ret < 0)
 +		if (*err < 0)
  			goto out;

  		if (bh->b_this_page != NULL)
  			bh = bh->b_this_page;
  	}

 -	ret = a_ops->write_end(o_filp, mapping, offs, data_len, data_len,
 +	*err = a_ops->write_end(o_filp, mapping, offs, data_size, replaced_size,
  			       page, fsdata);
  	page = NULL;

 @@ -951,7 +952,10 @@ out:
  out2:
  	ext4_journal_stop(handle);

 -	return ret < 0 ? ret : 0;
 +	if (err2)
 +		*err = err2;
 +
 +	return replaced_count;
  }

  /**
 @@ -1367,15 +1371,17 @@ ext4_move_extents(struct file *o_filp, s
  		while (orig_page_offset <= seq_end_page) {

  			/* Swap original branches with new branches */
 -			ret1 = move_extent_per_page(o_filp, donor_inode,
 +			block_len_in_page = move_extent_per_page(
 +						o_filp, donor_inode,
  						orig_page_offset,
  						data_offset_in_page,
 -						block_len_in_page, uninit);
 -			if (ret1 < 0)
 -				goto out;
 -			orig_page_offset++;
 +						block_len_in_page, uninit,
 +						&ret1);
 +
  			/* Count how many blocks we have exchanged */
  			*moved_len += block_len_in_page;
 +			if (ret1 < 0)
 +				goto out;
  			if (*moved_len > len) {
  				ext4_error(orig_inode->i_sb, __func__,
  					"We replaced blocks too much! "
 @@ -1385,6 +1391,7 @@ ext4_move_extents(struct file *o_filp, s
  				goto out;
  			}

 +			orig_page_offset++;
  			data_offset_in_page = 0;
  			rest_blocks -= block_len_in_page;
  			if (rest_blocks > blocks_per_page)


 From linux@linux.site Thu Dec 10 21:25:44 2009
 Message-Id: <20091211052544.287395070@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:20 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Akira Fujita <a-fujita@rs.jp.nec.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [08/34] ext4: fix lock order problem in ext4_move_extents()
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0004-ext4-fix-lock-order-problem-in-ext4_move_extents.patch
 Content-Length: 10372
 Lines: 310

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit fc04cb49a898c372a22b21fffc47f299d8710801)

 ext4_move_extents() checks the logical block contiguousness
 of the original file with ext4_ext_find_extent() and mext_next_extent().
 Therefore the extent that the ext4_ext_path structure points to
 must not change between those two calls.

 But in the current implementation, there is no i_data_sem protection
 between ext4_ext_find_extent() and mext_next_extent().  So the extent
 that the ext4_ext_path structure points to may be overwritten by
 delalloc.  As a result, ext4_move_extents() will exchange the wrong
 blocks between the original and donor files.  I changed where
 i_data_sem is acquired and released to solve this problem.

 Moreover, I changed move_extent_per_page() to start the transaction
 first, and then acquire i_data_sem.  Without this change, a deadlock
 is possible between mmap() and ext4_move_extents():

 * NOTE: "A", "B" and "C" are different processes

 A-1: ext4_move_extents() acquires i_data_sem of both inodes.

 B:   do_page_fault() starts the transaction (T),
      and then tries to acquire i_data_sem.
      But process "A" already holds it, so "B" is kept waiting.

 C:   While "A" and "B" are running, kjournald2 tries to commit
      transaction (T), but it is still being updated, so kjournald2
      waits for it.

 A-2: Calls ext4_journal_start() while holding i_data_sem,
      but transaction (T) is locked.

 Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/move_extent.c |  117 ++++++++++++++++++++++----------------------------
  1 file changed, 53 insertions(+), 64 deletions(-)

 --- a/fs/ext4/move_extent.c
 +++ b/fs/ext4/move_extent.c
 @@ -77,12 +77,14 @@ static int
  mext_next_extent(struct inode *inode, struct ext4_ext_path *path,
  		      struct ext4_extent **extent)
  {
 +	struct ext4_extent_header *eh;
  	int ppos, leaf_ppos = path->p_depth;

  	ppos = leaf_ppos;
  	if (EXT_LAST_EXTENT(path[ppos].p_hdr) > path[ppos].p_ext) {
  		/* leaf block */
  		*extent = ++path[ppos].p_ext;
 +		path[ppos].p_block = ext_pblock(path[ppos].p_ext);
  		return 0;
  	}

 @@ -119,9 +121,18 @@ mext_next_extent(struct inode *inode, st
  					ext_block_hdr(path[cur_ppos+1].p_bh);
  			}

 +			path[leaf_ppos].p_ext = *extent = NULL;
 +
 +			eh = path[leaf_ppos].p_hdr;
 +			if (le16_to_cpu(eh->eh_entries) == 0)
 +				/* empty leaf is found */
 +				return -ENODATA;
 +
  			/* leaf block */
  			path[leaf_ppos].p_ext = *extent =
  				EXT_FIRST_EXTENT(path[leaf_ppos].p_hdr);
 +			path[leaf_ppos].p_block =
 +					ext_pblock(path[leaf_ppos].p_ext);
  			return 0;
  		}
  	}
 @@ -155,40 +166,15 @@ mext_check_null_inode(struct inode *inod
  }

  /**
 - * mext_double_down_read - Acquire two inodes' read semaphore
 - *
 - * @orig_inode:		original inode structure
 - * @donor_inode:	donor inode structure
 - * Acquire read semaphore of the two inodes (orig and donor) by i_ino order.
 - */
 -static void
 -mext_double_down_read(struct inode *orig_inode, struct inode *donor_inode)
 -{
 -	struct inode *first = orig_inode, *second = donor_inode;
 -
 -	/*
 -	 * Use the inode number to provide the stable locking order instead
 -	 * of its address, because the C language doesn't guarantee you can
 -	 * compare pointers that don't come from the same array.
 -	 */
 -	if (donor_inode->i_ino < orig_inode->i_ino) {
 -		first = donor_inode;
 -		second = orig_inode;
 -	}
 -
 -	down_read(&EXT4_I(first)->i_data_sem);
 -	down_read(&EXT4_I(second)->i_data_sem);
 -}
 -
 -/**
 - * mext_double_down_write - Acquire two inodes' write semaphore
 + * double_down_write_data_sem - Acquire two inodes' write lock of i_data_sem
   *
   * @orig_inode:		original inode structure
   * @donor_inode:	donor inode structure
 - * Acquire write semaphore of the two inodes (orig and donor) by i_ino order.
 + * Acquire write lock of i_data_sem of the two inodes (orig and donor) by
 + * i_ino order.
   */
  static void
 -mext_double_down_write(struct inode *orig_inode, struct inode *donor_inode)
 +double_down_write_data_sem(struct inode *orig_inode, struct inode *donor_inode)
  {
  	struct inode *first = orig_inode, *second = donor_inode;

 @@ -207,28 +193,14 @@ mext_double_down_write(struct inode *ori
  }

  /**
 - * mext_double_up_read - Release two inodes' read semaphore
 + * double_up_write_data_sem - Release two inodes' write lock of i_data_sem
   *
   * @orig_inode:		original inode structure to be released its lock first
   * @donor_inode:	donor inode structure to be released its lock second
 - * Release read semaphore of two inodes (orig and donor).
 + * Release write lock of i_data_sem of two inodes (orig and donor).
   */
  static void
 -mext_double_up_read(struct inode *orig_inode, struct inode *donor_inode)
 -{
 -	up_read(&EXT4_I(orig_inode)->i_data_sem);
 -	up_read(&EXT4_I(donor_inode)->i_data_sem);
 -}
 -
 -/**
 - * mext_double_up_write - Release two inodes' write semaphore
 - *
 - * @orig_inode:		original inode structure to be released its lock first
 - * @donor_inode:	donor inode structure to be released its lock second
 - * Release write semaphore of two inodes (orig and donor).
 - */
 -static void
 -mext_double_up_write(struct inode *orig_inode, struct inode *donor_inode)
 +double_up_write_data_sem(struct inode *orig_inode, struct inode *donor_inode)
  {
  	up_write(&EXT4_I(orig_inode)->i_data_sem);
  	up_write(&EXT4_I(donor_inode)->i_data_sem);
 @@ -688,8 +660,6 @@ mext_replace_branches(handle_t *handle,
  	int replaced_count = 0;
  	int dext_alen;

 -	mext_double_down_write(orig_inode, donor_inode);
 -
  	/* Get the original extent for the block "orig_off" */
  	*err = get_ext_path(orig_inode, orig_off, &orig_path);
  	if (*err)
 @@ -785,7 +755,6 @@ out:
  		kfree(donor_path);
  	}

 -	mext_double_up_write(orig_inode, donor_inode);
  	return replaced_count;
  }

 @@ -851,6 +820,11 @@ move_extent_per_page(struct file *o_filp
  	 * Just swap data blocks between orig and donor.
  	 */
  	if (uninit) {
 +		/*
 +		 * Protect extent trees against block allocations
 +		 * via delalloc
 +		 */
 +		double_down_write_data_sem(orig_inode, donor_inode);
  		replaced_count = mext_replace_branches(handle, orig_inode,
  						donor_inode, orig_blk_offset,
  						block_len_in_page, err);
 @@ -858,6 +832,7 @@ move_extent_per_page(struct file *o_filp
  		/* Clear the inode cache not to refer to the old data */
  		ext4_ext_invalidate_cache(orig_inode);
  		ext4_ext_invalidate_cache(donor_inode);
 +		double_up_write_data_sem(orig_inode, donor_inode);
  		goto out2;
  	}

 @@ -905,6 +880,8 @@ move_extent_per_page(struct file *o_filp
  	/* Release old bh and drop refs */
  	try_to_release_page(page, 0);

 +	/* Protect extent trees against block allocations via delalloc */
 +	double_down_write_data_sem(orig_inode, donor_inode);
  	replaced_count = mext_replace_branches(handle, orig_inode, donor_inode,
  					orig_blk_offset, block_len_in_page,
  					&err2);
 @@ -913,14 +890,18 @@ move_extent_per_page(struct file *o_filp
  			block_len_in_page = replaced_count;
  			replaced_size =
  				block_len_in_page << orig_inode->i_blkbits;
 -		} else
 +		} else {
 +			double_up_write_data_sem(orig_inode, donor_inode);
  			goto out;
 +		}
  	}

  	/* Clear the inode cache not to refer to the old data */
  	ext4_ext_invalidate_cache(orig_inode);
  	ext4_ext_invalidate_cache(donor_inode);

 +	double_up_write_data_sem(orig_inode, donor_inode);
 +
  	if (!page_has_buffers(page))
  		create_empty_buffers(page, 1 << orig_inode->i_blkbits, 0);

 @@ -1236,16 +1217,16 @@ ext4_move_extents(struct file *o_filp, s
  		return -EINVAL;
  	}

 -	/* protect orig and donor against a truncate */
 +	/* Protect orig and donor inodes against a truncate */
  	ret1 = mext_inode_double_lock(orig_inode, donor_inode);
  	if (ret1 < 0)
  		return ret1;

 -	mext_double_down_read(orig_inode, donor_inode);
 +	/* Protect extent tree against block allocations via delalloc */
 +	double_down_write_data_sem(orig_inode, donor_inode);
  	/* Check the filesystem environment whether move_extent can be done */
  	ret1 = mext_check_arguments(orig_inode, donor_inode, orig_start,
  					donor_start, &len, *moved_len);
 -	mext_double_up_read(orig_inode, donor_inode);
  	if (ret1)
  		goto out;

 @@ -1308,6 +1289,10 @@ ext4_move_extents(struct file *o_filp, s
  			 ext4_ext_get_actual_len(ext_cur), block_end + 1) -
  		     max(le32_to_cpu(ext_cur->ee_block), block_start);

 +	/* Discard preallocations of two inodes */
 +	ext4_discard_preallocations(orig_inode);
 +	ext4_discard_preallocations(donor_inode);
 +
  	while (!last_extent && le32_to_cpu(ext_cur->ee_block) <= block_end) {
  		seq_blocks += add_blocks;

 @@ -1359,14 +1344,14 @@ ext4_move_extents(struct file *o_filp, s
  		seq_start = le32_to_cpu(ext_cur->ee_block);
  		rest_blocks = seq_blocks;

 -		/* Discard preallocations of two inodes */
 -		down_write(&EXT4_I(orig_inode)->i_data_sem);
 -		ext4_discard_preallocations(orig_inode);
 -		up_write(&EXT4_I(orig_inode)->i_data_sem);
 -
 -		down_write(&EXT4_I(donor_inode)->i_data_sem);
 -		ext4_discard_preallocations(donor_inode);
 -		up_write(&EXT4_I(donor_inode)->i_data_sem);
 +		/*
 +		 * Up semaphore to avoid following problems:
 +		 * a. transaction deadlock among ext4_journal_start,
 +		 *    ->write_begin via pagefault, and jbd2_journal_commit
 +		 * b. racing with ->readpage, ->write_begin, and ext4_get_block
 +		 *    in move_extent_per_page
 +		 */
 +		double_up_write_data_sem(orig_inode, donor_inode);

  		while (orig_page_offset <= seq_end_page) {

 @@ -1381,14 +1366,14 @@ ext4_move_extents(struct file *o_filp, s
  			/* Count how many blocks we have exchanged */
  			*moved_len += block_len_in_page;
  			if (ret1 < 0)
 -				goto out;
 +				break;
  			if (*moved_len > len) {
  				ext4_error(orig_inode->i_sb, __func__,
  					"We replaced blocks too much! "
  					"sum of replaced: %llu requested: %llu",
  					*moved_len, len);
  				ret1 = -EIO;
 -				goto out;
 +				break;
  			}

  			orig_page_offset++;
 @@ -1400,6 +1385,10 @@ ext4_move_extents(struct file *o_filp, s
  				block_len_in_page = rest_blocks;
  		}

 +		double_down_write_data_sem(orig_inode, donor_inode);
 +		if (ret1 < 0)
 +			break;
 +
  		/* Decrease buffer counter */
  		if (holecheck_path)
  			ext4_ext_drop_refs(holecheck_path);
 @@ -1429,7 +1418,7 @@ out:
  		ext4_ext_drop_refs(holecheck_path);
  		kfree(holecheck_path);
  	}
 -
 +	double_up_write_data_sem(orig_inode, donor_inode);
  	ret2 = mext_inode_double_unlock(orig_inode, donor_inode);

  	if (ret1)


 From linux@linux.site Thu Dec 10 21:25:45 2009
 Message-Id: <20091211052544.890897126@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:21 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Akira Fujita <a-fujita@rs.jp.nec.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [09/34] ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXT
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0005-ext4-fix-possible-recursive-locking-warning-in-EXT4_.patch
 Content-Length: 1075
 Lines: 32

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 49bd22bc4d603a2a4fc2a6a60e156cbea52eb494)

 If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
 will trigger a false-positive warning of a recursive lock.  Since we
 take i_data_sem for the two inodes ordered by their inode numbers,
 this isn't a problem.  Use of down_write_nested() will notify the lock
 dependency checker machinery that there is no problem here.
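
 The ordering rule itself can be illustrated with a rough userspace
 sketch.  This is not the kernel code: struct node, double_lock() and
 double_unlock() are hypothetical names, and plain pthread mutexes
 stand in for i_data_sem.  Because every caller takes the lower id
 first, two threads locking the same pair can never wait on each other
 in a cycle.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an inode: an id plus a lock guarding its data. */
struct node {
	unsigned long id;
	pthread_mutex_t lock;
};

/* Lock both nodes, always taking the one with the lower id first. */
static void double_lock(struct node *a, struct node *b)
{
	struct node *first = a, *second = b;

	assert(a->id != b->id);	/* the sketch assumes two distinct nodes */
	if (b->id < a->id) {
		first = b;
		second = a;
	}
	pthread_mutex_lock(&first->lock);
	pthread_mutex_lock(&second->lock);
}

/* Release order does not matter for deadlock avoidance. */
static void double_unlock(struct node *a, struct node *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}
```

 In the kernel, the second acquisition additionally needs the
 down_write_nested() annotation because both locks belong to the same
 lock class as far as the dependency checker is concerned.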

 This problem was reported by Brian Rogers:

 	http://marc.info/?l=linux-ext4&m=125115356928011&w=1

 Reported-by: Brian Rogers <brian@xyzw.org>
 Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/move_extent.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 --- a/fs/ext4/move_extent.c
 +++ b/fs/ext4/move_extent.c
 @@ -189,7 +189,7 @@ double_down_write_data_sem(struct inode
  	}

  	down_write(&EXT4_I(first)->i_data_sem);
 -	down_write(&EXT4_I(second)->i_data_sem);
 +	down_write_nested(&EXT4_I(second)->i_data_sem, SINGLE_DEPTH_NESTING);
  }

  /**


 From linux@linux.site Thu Dec 10 21:25:45 2009
 Message-Id: <20091211052545.443549269@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:22 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [10/34] ext4: plug a buffer_head leak in an error path of ext4_iget()
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0006-ext4-plug-a-buffer_head-leak-in-an-error-path-of-ext.patch
 Content-Length: 2427
 Lines: 82

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 567f3e9a70d71e5c9be03701b8578be77857293b)

 One of the invalid error paths in ext4_iget() forgot to brelse() the
 inode buffer head.  Fix it by adding a brelse() in the common error
 return path, which also simplifies the function.
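
 The single-exit cleanup pattern can be sketched in userspace as
 follows.  load_record() and its arguments are hypothetical; malloc()
 and free() stand in for getting and releasing the buffer head.  The
 pointer starts out NULL so the common error label is safe to reach
 before the buffer exists, in the same way that brelse() accepts a
 NULL buffer head.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical loader: every failure jumps to one label that frees buf. */
static int load_record(int fail_early, int fail_late)
{
	char *buf = NULL;	/* NULL so cleanup is safe before allocation */
	int ret = 0;

	if (fail_early) {
		ret = -EINVAL;
		goto bad;
	}

	buf = malloc(64);
	if (!buf) {
		ret = -ENOMEM;
		goto bad;
	}

	if (fail_late) {
		ret = -EIO;
		goto bad;
	}

	free(buf);
	return 0;

bad:
	free(buf);	/* free(NULL) is a no-op, so no path can leak or crash */
	return ret;
}
```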

 Thanks to Andi Kleen <ak@linux.intel.com> for reporting the problem.

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/inode.c |   11 +++--------
  1 file changed, 3 insertions(+), 8 deletions(-)

 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -4781,7 +4781,6 @@ struct inode *ext4_iget(struct super_blo
  	struct ext4_iloc iloc;
  	struct ext4_inode *raw_inode;
  	struct ext4_inode_info *ei;
 -	struct buffer_head *bh;
  	struct inode *inode;
  	long ret;
  	int block;
 @@ -4793,11 +4792,11 @@ struct inode *ext4_iget(struct super_blo
  		return inode;

  	ei = EXT4_I(inode);
 +	iloc.bh = 0;

  	ret = __ext4_get_inode_loc(inode, &iloc, 0);
  	if (ret < 0)
  		goto bad_inode;
 -	bh = iloc.bh;
  	raw_inode = ext4_raw_inode(&iloc);
  	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
  	inode->i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
 @@ -4820,7 +4819,6 @@ struct inode *ext4_iget(struct super_blo
  		if (inode->i_mode == 0 ||
  		    !(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS)) {
  			/* this inode is deleted */
 -			brelse(bh);
  			ret = -ESTALE;
  			goto bad_inode;
  		}
 @@ -4852,7 +4850,6 @@ struct inode *ext4_iget(struct super_blo
  		ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
  		if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
  		    EXT4_INODE_SIZE(inode->i_sb)) {
 -			brelse(bh);
  			ret = -EIO;
  			goto bad_inode;
  		}
 @@ -4905,10 +4902,8 @@ struct inode *ext4_iget(struct super_blo
  		/* Validate block references which are part of inode */
  		ret = ext4_check_inode_blockref(inode);
  	}
 -	if (ret) {
 -		brelse(bh);
 +	if (ret)
  		goto bad_inode;
 -	}

  	if (S_ISREG(inode->i_mode)) {
  		inode->i_op = &ext4_file_inode_operations;
 @@ -4936,7 +4931,6 @@ struct inode *ext4_iget(struct super_blo
  			init_special_inode(inode, inode->i_mode,
  			   new_decode_dev(le32_to_cpu(raw_inode->i_block[1])));
  	} else {
 -		brelse(bh);
  		ret = -EIO;
  		ext4_error(inode->i_sb, __func__,
  			   "bogus i_mode (%o) for inode=%lu",
 @@ -4949,6 +4943,7 @@ struct inode *ext4_iget(struct super_blo
  	return inode;

  bad_inode:
 +	brelse(iloc.bh);
  	iget_failed(inode);
  	return ERR_PTR(ret);
  }


 From linux@linux.site Thu Dec 10 21:25:46 2009
 Message-Id: <20091211052545.995802406@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:23 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [11/34] ext4: make sure directory and symlink blocks are revoked
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0007-ext4-make-sure-directory-and-symlink-blocks-are-revo.patch
 Content-Length: 2052
 Lines: 58

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 50689696867d95b38d9c7be640a311494a04fb86)

 When an inode gets unlinked, the functions ext4_clear_blocks() and
 ext4_remove_blocks() call ext4_forget() for all the buffer heads
 corresponding to the deleted inode's data blocks.  If the inode is a
 directory or a symlink, the is_metadata parameter must be non-zero so
 ext4_forget() will revoke them via jbd2_journal_revoke().  Otherwise,
 if these blocks are reused for a data file, and the system crashes
 before a journal checkpoint, the journal replay could end up
 corrupting these data blocks.
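
 The classification used for the flag can be tried out in userspace
 with the POSIX file type macros.  is_metadata_mode() is a
 hypothetical helper name; the test itself is the same expression the
 patch adds to ext4_clear_blocks().

```c
#define _XOPEN_SOURCE 700	/* expose S_IFLNK and S_ISLNK */
#include <assert.h>
#include <sys/stat.h>

/*
 * Return 1 if blocks owned by an inode with this mode hold metadata
 * (directory entries or a symlink target) and so must be revoked.
 */
static int is_metadata_mode(mode_t mode)
{
	return S_ISDIR(mode) || S_ISLNK(mode);
}
```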

 Thanks to Curt Wohlgemuth for pointing out potential problems in this
 area.

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/extents.c |    2 +-
  fs/ext4/inode.c   |    6 ++++--
  2 files changed, 5 insertions(+), 3 deletions(-)

 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2074,7 +2074,7 @@ static int ext4_remove_blocks(handle_t *
  		ext_debug("free last %u blocks starting %llu\n", num, start);
  		for (i = 0; i < num; i++) {
  			bh = sb_find_get_block(inode->i_sb, start + i);
 -			ext4_forget(handle, 0, inode, bh, start + i);
 +			ext4_forget(handle, metadata, inode, bh, start + i);
  		}
  		ext4_free_blocks(handle, inode, start, num, metadata);
  	} else if (from == le32_to_cpu(ex->ee_block)
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -4120,6 +4120,8 @@ static void ext4_clear_blocks(handle_t *
  			      __le32 *last)
  {
  	__le32 *p;
 +	int	is_metadata = S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode);
 +
  	if (try_to_extend_transaction(handle, inode)) {
  		if (bh) {
  			BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
 @@ -4150,11 +4152,11 @@ static void ext4_clear_blocks(handle_t *

  			*p = 0;
  			tbh = sb_find_get_block(inode->i_sb, nr);
 -			ext4_forget(handle, 0, inode, tbh, nr);
 +			ext4_forget(handle, is_metadata, inode, tbh, nr);
  		}
  	}

 -	ext4_free_blocks(handle, inode, block_to_free, count, 0);
 +	ext4_free_blocks(handle, inode, block_to_free, count, is_metadata);
  }

  /**


 From linux@linux.site Thu Dec 10 21:25:47 2009
 Message-Id: <20091211052546.544464652@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:24 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Julia Lawall <julia@diku.dk>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [12/34] ext4: fix i_flags access in ext4_da_writepages_trans_blocks()
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0008-ext4-fix-i_flags-access-in-ext4_da_writepages_trans_.patch
 Content-Length: 846
 Lines: 25

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 30c6e07a92ea4cb87160d32ffa9bce172576ae4c)

 We need to be testing the i_flags field in the ext4 specific portion
 of the inode, instead of the (confusingly aliased) i_flags field in
 the generic struct inode.
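
 A minimal sketch of the aliasing, with hypothetical names
 (struct vfs_inode, struct fs_inode_info, FS_I(), FS_EXTENTS_FL)
 standing in for the kernel structures: both structures carry a field
 called i_flags, but only the filesystem-specific one holds the
 extents bit, so testing the generic field silently gives the wrong
 answer.

```c
#include <assert.h>
#include <stddef.h>

#define FS_EXTENTS_FL 0x00080000u	/* hypothetical fs-specific flag bit */

struct vfs_inode {
	unsigned int i_flags;		/* generic flags */
};

struct fs_inode_info {
	unsigned int i_flags;		/* filesystem-specific flags */
	struct vfs_inode vfs_inode;	/* embedded generic inode */
};

/* Map the embedded generic inode back to its containing structure. */
static struct fs_inode_info *FS_I(struct vfs_inode *inode)
{
	return (struct fs_inode_info *)
		((char *)inode - offsetof(struct fs_inode_info, vfs_inode));
}

/* Test the flag in the filesystem-specific field, not the generic one. */
static int uses_extents(struct vfs_inode *inode)
{
	return (FS_I(inode)->i_flags & FS_EXTENTS_FL) != 0;
}
```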

 Signed-off-by: Julia Lawall <julia@diku.dk>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/inode.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -2788,7 +2788,7 @@ static int ext4_da_writepages_trans_bloc
  	 * number of contiguous block. So we will limit
  	 * number of contiguous block to a sane value
  	 */
 -	if (!(inode->i_flags & EXT4_EXTENTS_FL) &&
 +	if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) &&
  	    (max_blocks > EXT4_MAX_TRANS_DATA))
  		max_blocks = EXT4_MAX_TRANS_DATA;


 From linux@linux.site Thu Dec 10 21:25:47 2009
 Message-Id: <20091211052547.065677730@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:25 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Eric Sandeen <sandeen@redhat.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [13/34] ext4: journal all modifications in ext4_xattr_set_handle
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0009-ext4-journal-all-modifications-in-ext4_xattr_set_han.patch
 Content-Length: 1254
 Lines: 39

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 86ebfd08a1930ccedb8eac0aeb1ed4b8b6a41dbc)

 ext4_xattr_set_handle() was zeroing out an inode outside
 of journaling constraints; this is one of the accesses that
 was causing the crc errors in journal replay as seen in
 kernel.org bugzilla #14354.

 Reviewed-by: Andreas Dilger <adilger@sun.com>
 Signed-off-by: Eric Sandeen <sandeen@redhat.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/xattr.c |    7 ++++---
  1 file changed, 4 insertions(+), 3 deletions(-)

 --- a/fs/ext4/xattr.c
 +++ b/fs/ext4/xattr.c
 @@ -988,6 +988,10 @@ ext4_xattr_set_handle(handle_t *handle,
  	if (error)
  		goto cleanup;

 +	error = ext4_journal_get_write_access(handle, is.iloc.bh);
 +	if (error)
 +		goto cleanup;
 +
  	if (EXT4_I(inode)->i_state & EXT4_STATE_NEW) {
  		struct ext4_inode *raw_inode = ext4_raw_inode(&is.iloc);
  		memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
 @@ -1013,9 +1017,6 @@ ext4_xattr_set_handle(handle_t *handle,
  		if (flags & XATTR_CREATE)
  			goto cleanup;
  	}
 -	error = ext4_journal_get_write_access(handle, is.iloc.bh);
 -	if (error)
 -		goto cleanup;
  	if (!value) {
  		if (!is.s.not_found)
  			error = ext4_xattr_ibody_set(handle, inode, &i, &is);


 From linux@linux.site Thu Dec 10 21:25:48 2009
 Message-Id: <20091211052547.644399594@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:26 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [14/34] ext4: dont update the superblock in ext4_statfs()
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0010-ext4-don-t-update-the-superblock-in-ext4_statfs.patch
 Content-Length: 1341
 Lines: 31

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 3f8fb9490efbd300887470a2a880a64e04dcc3f5)

 commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs()
 to update the on-disk superblock counters, but modified this buffer
 directly without any journaling of the change.  This is one of the
 accesses that was causing the crc errors in journal replay as seen in
 kernel.org bugzilla #14354.

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/super.c |    2 --
  1 file changed, 2 deletions(-)

 --- a/fs/ext4/super.c
 +++ b/fs/ext4/super.c
 @@ -3668,13 +3668,11 @@ static int ext4_statfs(struct dentry *de
  	buf->f_blocks = ext4_blocks_count(es) - sbi->s_overhead_last;
  	buf->f_bfree = percpu_counter_sum_positive(&sbi->s_freeblocks_counter) -
  		       percpu_counter_sum_positive(&sbi->s_dirtyblocks_counter);
 -	ext4_free_blocks_count_set(es, buf->f_bfree);
  	buf->f_bavail = buf->f_bfree - ext4_r_blocks_count(es);
  	if (buf->f_bfree < ext4_r_blocks_count(es))
  		buf->f_bavail = 0;
  	buf->f_files = le32_to_cpu(es->s_inodes_count);
  	buf->f_ffree = percpu_counter_sum_positive(&sbi->s_freeinodes_counter);
 -	es->s_free_inodes_count = cpu_to_le32(buf->f_ffree);
  	buf->f_namelen = EXT4_NAME_LEN;
  	fsid = le64_to_cpup((void *)es->s_uuid) ^
  	       le64_to_cpup((void *)es->s_uuid + sizeof(u64));


 From linux@linux.site Thu Dec 10 21:25:48 2009
 Message-Id: <20091211052548.201782286@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:27 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [15/34] ext4: fix uninit block bitmap initialization when s_meta_first_bg is non-zero
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0011-ext4-fix-uninit-block-bitmap-initialization-when-s_m.patch
 Content-Length: 875
 Lines: 29

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 8dadb198cb70ef811916668fe67eeec82e8858dd)

 The number of old-style block group descriptor blocks is
 s_first_meta_bg when the meta_bg feature flag is set.
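
 The rule can be sketched outside the kernel as follows.  struct
 sb_info and num_gdb_nometa() are hypothetical, simplified stand-ins
 for the superblock fields and for ext4_bg_num_gdb_nometa(); on-disk
 endianness handling is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the superblock fields involved. */
struct sb_info {
	bool has_meta_bg;		/* meta_bg feature flag is set */
	unsigned long first_meta_bg;	/* value of s_first_meta_bg */
	unsigned long gdb_count;	/* total group descriptor blocks */
};

/* Number of old-style descriptor blocks stored in a given group. */
static unsigned long num_gdb_nometa(const struct sb_info *sb,
				    bool group_has_super)
{
	if (!group_has_super)
		return 0;

	return sb->has_meta_bg ? sb->first_meta_bg : sb->gdb_count;
}
```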

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/balloc.c |    8 +++++++-
  1 file changed, 7 insertions(+), 1 deletion(-)

 --- a/fs/ext4/balloc.c
 +++ b/fs/ext4/balloc.c
 @@ -761,7 +761,13 @@ static unsigned long ext4_bg_num_gdb_met
  static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb,
  					ext4_group_t group)
  {
 -	return ext4_bg_has_super(sb, group) ? EXT4_SB(sb)->s_gdb_count : 0;
 +	if (!ext4_bg_has_super(sb, group))
 +		return 0;
 +
 +	if (EXT4_HAS_INCOMPAT_FEATURE(sb,EXT4_FEATURE_INCOMPAT_META_BG))
 +		return le32_to_cpu(EXT4_SB(sb)->s_es->s_first_meta_bg);
 +	else
 +		return EXT4_SB(sb)->s_gdb_count;
  }

  /**


 From linux@linux.site Thu Dec 10 21:25:49 2009
 Message-Id: <20091211052548.726431621@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:28 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [16/34] ext4: fix block validity checks so they work correctly with meta_bg
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0012-ext4-fix-block-validity-checks-so-they-work-correctl.patch
 Content-Length: 1411
 Lines: 39

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 1032988c71f3f85483b2b4319684d1205a704c02)

 The block validity checks used by ext4_data_block_valid() weren't
 correctly written to check file systems with the meta_bg feature.  Fix
 this.

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/block_validity.c |    2 +-
  fs/ext4/inode.c          |    5 +----
  2 files changed, 2 insertions(+), 5 deletions(-)

 --- a/fs/ext4/block_validity.c
 +++ b/fs/ext4/block_validity.c
 @@ -160,7 +160,7 @@ int ext4_setup_system_zone(struct super_
  		if (ext4_bg_has_super(sb, i) &&
  		    ((i < 5) || ((i % flex_size) == 0)))
  			add_system_zone(sbi, ext4_group_first_block_no(sb, i),
 -					sbi->s_gdb_count + 1);
 +					ext4_bg_num_gdb(sb, i) + 1);
  		gdp = ext4_get_group_desc(sb, i, NULL);
  		ret = add_system_zone(sbi, ext4_block_bitmap(sb, gdp), 1);
  		if (ret)
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -4883,10 +4883,7 @@ struct inode *ext4_iget(struct super_blo

  	ret = 0;
  	if (ei->i_file_acl &&
 -	    ((ei->i_file_acl <
 -	      (le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block) +
 -	       EXT4_SB(sb)->s_gdb_count)) ||
 -	     (ei->i_file_acl >= ext4_blocks_count(EXT4_SB(sb)->s_es)))) {
 +	    !ext4_data_block_valid(EXT4_SB(sb), ei->i_file_acl, 1)) {
  		ext4_error(sb, __func__,
  			   "bad extended attribute block %llu in inode #%lu",
  			   ei->i_file_acl, inode->i_ino);


 From linux@linux.site Thu Dec 10 21:25:49 2009
 Message-Id: <20091211052549.341684525@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:29 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Jan Kara <jack@suse.cz>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [17/34] ext4: avoid issuing unnecessary barriers
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0013-ext4-avoid-issuing-unnecessary-barriers.patch
 Content-Length: 1115
 Lines: 37

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 6b17d902fdd241adfa4ce780df20547b28bf5801)

 We don't need to issue an I/O barrier on an error or if we force commit
 because we are doing data journaling.

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Cc: Jan Kara <jack@suse.cz>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/fsync.c |    8 +++-----
  1 file changed, 3 insertions(+), 5 deletions(-)

 --- a/fs/ext4/fsync.c
 +++ b/fs/ext4/fsync.c
 @@ -60,7 +60,7 @@ int ext4_sync_file(struct file *file, st

  	ret = flush_aio_dio_completed_IO(inode);
  	if (ret < 0)
 -		goto out;
 +		return ret;
  	/*
  	 * data=writeback:
  	 *  The caller's filemap_fdatawrite()/wait will sync the data.
 @@ -79,10 +79,8 @@ int ext4_sync_file(struct file *file, st
  	 *  (they were dirtied by commit).  But that's OK - the blocks are
  	 *  safe in-journal, which is all fsync() needs to ensure.
  	 */
 -	if (ext4_should_journal_data(inode)) {
 -		ret = ext4_force_commit(inode->i_sb);
 -		goto out;
 -	}
 +	if (ext4_should_journal_data(inode))
 +		return ext4_force_commit(inode->i_sb);

  	if (!journal)
  		ret = sync_mapping_buffers(inode->i_mapping);


 From linux@linux.site Thu Dec 10 21:25:50 2009
 Message-Id: <20091211052549.883933582@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:30 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Jan Kara <jack@suse.cz>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [18/34] ext4: fix error handling in ext4_ind_get_blocks()
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0014-ext4-fix-error-handling-in-ext4_ind_get_blocks.patch
 Content-Length: 733
 Lines: 25

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 2bba702d4f88d7b010ec37e2527b552588404ae7)

 When an error happened in ext4_splice_branch we failed to notice that
 in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem
 by checking for error properly.
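
 The control-flow problem can be illustrated with a small user-space
 sketch.  alloc_step(), splice_step() and the "mapped" flag are
 hypothetical stand-ins for the allocation step, ext4_splice_branch()
 and set_buffer_new(); this is not the kernel code itself.

```c
#include <errno.h>

/* Hypothetical stand-ins for the two steps; each may be told to fail. */
static int alloc_step(int fail)  { return fail ? -EIO : 0; }
static int splice_step(int fail) { return fail ? -ENOSPC : 0; }

/* Old flow: the "else" only catches a failure of the first step. */
static int map_block_old(int alloc_fail, int splice_fail, int *mapped)
{
	int err = alloc_step(alloc_fail);

	*mapped = 0;
	if (!err)
		err = splice_step(splice_fail);
	else
		goto cleanup;

	*mapped = 1;	/* reached even when splice_step() failed */
cleanup:
	return err;
}

/* New flow: err is re-checked after the second step as well. */
static int map_block_new(int alloc_fail, int splice_fail, int *mapped)
{
	int err = alloc_step(alloc_fail);

	*mapped = 0;
	if (!err)
		err = splice_step(splice_fail);
	if (err)
		goto cleanup;

	*mapped = 1;	/* only reached when both steps succeeded */
cleanup:
	return err;
}
```

 With a failing second step, the old flow still marks the block as
 mapped while returning the error; the new flow does not.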

 Signed-off-by: Jan Kara <jack@suse.cz>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/inode.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -1021,7 +1021,7 @@ static int ext4_ind_get_blocks(handle_t
  	if (!err)
  		err = ext4_splice_branch(handle, inode, iblock,
  					 partial, indirect_blks, count);
 -	else
 +	if (err)
  		goto cleanup;

  	set_buffer_new(bh_result);


 From linux@linux.site Thu Dec 10 21:25:50 2009
 Message-Id: <20091211052550.441771142@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:31 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Eric Sandeen <sandeen@redhat.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [19/34] ext4: make trim/discard optional (and off by default)
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0015-ext4-make-trim-discard-optional-and-off-by-default.patch
 Content-Length: 4275
 Lines: 124

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 5328e635315734d42080de9a5a1ee87bf4cae0a4)

 It is anticipated that when sb_issue_discard starts doing
 real work on trim-capable devices, we may see issues.  Make
 this mount-time optional, and default it to off until we know
 that things are working out OK.
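
 As a rough user-space sketch of the option handling, assuming a plain
 unsigned word in place of sbi->s_mount_opt: the macros below are
 simplified forms of the ext4 ones, and apply_mount_option() and
 should_issue_discard() are hypothetical helper names used only for
 illustration.

```c
#include <string.h>

#define EXT4_MOUNT_DISCARD	0x40000000u	/* value taken from the patch */

/* Simplified option macros; they act directly on an unsigned word. */
#define set_opt(o, opt)		((o) |= EXT4_MOUNT_##opt)
#define clear_opt(o, opt)	((o) &= ~EXT4_MOUNT_##opt)
#define test_opt(o, opt)	((o) & EXT4_MOUNT_##opt)

/* Hypothetical parser for the two new tokens; returns -1 if unknown. */
static int apply_mount_option(unsigned int *mount_opt, const char *token)
{
	if (strcmp(token, "discard") == 0)
		set_opt(*mount_opt, DISCARD);
	else if (strcmp(token, "nodiscard") == 0)
		clear_opt(*mount_opt, DISCARD);
	else
		return -1;
	return 0;
}

/* Discards are only issued when the flag was set at mount time. */
static int should_issue_discard(unsigned int mount_opt)
{
	return test_opt(mount_opt, DISCARD) != 0;
}
```

 Because the flag starts out clear, a mount without either option
 behaves like "nodiscard".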

 Signed-off-by: Eric Sandeen <sandeen@redhat.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  Documentation/filesystems/ext4.txt |    6 ++++++
  fs/ext4/ext4.h                     |    1 +
  fs/ext4/mballoc.c                  |   21 +++++++++++++--------
  fs/ext4/super.c                    |   14 +++++++++++++-
  4 files changed, 33 insertions(+), 9 deletions(-)

 --- a/Documentation/filesystems/ext4.txt
 +++ b/Documentation/filesystems/ext4.txt
 @@ -353,6 +353,12 @@ noauto_da_alloc		replacing existing file
  			system crashes before the delayed allocation
  			blocks are forced to disk.

 +discard		Controls whether ext4 should issue discard/TRIM
 +nodiscard(*)		commands to the underlying block device when
 +			blocks are freed.  This is useful for SSD devices
 +			and sparse/thinly-provisioned LUNs, but it is off
 +			by default until sufficient testing has been done.
 +
  Data Mode
  =========
  There are 3 different data modes:
 --- a/fs/ext4/ext4.h
 +++ b/fs/ext4/ext4.h
 @@ -750,6 +750,7 @@ struct ext4_inode_info {
  #define EXT4_MOUNT_DELALLOC		0x8000000 /* Delalloc support */
  #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000 /* Abort on file data write */
  #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000 /* Block validity checking */
 +#define EXT4_MOUNT_DISCARD		0x40000000 /* Issue DISCARD requests */

  #define clear_opt(o, opt)		o &= ~EXT4_MOUNT_##opt
  #define set_opt(o, opt)			o |= EXT4_MOUNT_##opt
 --- a/fs/ext4/mballoc.c
 +++ b/fs/ext4/mballoc.c
 @@ -2529,7 +2529,6 @@ static void release_blocks_on_commit(jou
  	struct ext4_group_info *db;
  	int err, count = 0, count2 = 0;
  	struct ext4_free_data *entry;
 -	ext4_fsblk_t discard_block;
  	struct list_head *l, *ltmp;

  	list_for_each_safe(l, ltmp, &txn->t_private_list) {
 @@ -2559,13 +2558,19 @@ static void release_blocks_on_commit(jou
  			page_cache_release(e4b.bd_bitmap_page);
  		}
  		ext4_unlock_group(sb, entry->group);
 -		discard_block = (ext4_fsblk_t) entry->group * EXT4_BLOCKS_PER_GROUP(sb)
 -			+ entry->start_blk
 -			+ le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
 -		trace_ext4_discard_blocks(sb, (unsigned long long)discard_block,
 -					  entry->count);
 -		sb_issue_discard(sb, discard_block, entry->count);
 -
 +		if (test_opt(sb, DISCARD)) {
 +			ext4_fsblk_t discard_block;
 +			struct ext4_super_block *es = EXT4_SB(sb)->s_es;
 +
 +			discard_block = (ext4_fsblk_t)entry->group *
 +						EXT4_BLOCKS_PER_GROUP(sb)
 +					+ entry->start_blk
 +					+ le32_to_cpu(es->s_first_data_block);
 +			trace_ext4_discard_blocks(sb,
 +					(unsigned long long)discard_block,
 +					entry->count);
 +			sb_issue_discard(sb, discard_block, entry->count);
 +		}
  		kmem_cache_free(ext4_free_ext_cachep, entry);
  		ext4_mb_release_desc(&e4b);
  	}
 --- a/fs/ext4/super.c
 +++ b/fs/ext4/super.c
 @@ -899,6 +899,9 @@ static int ext4_show_options(struct seq_
  	if (test_opt(sb, NO_AUTO_DA_ALLOC))
  		seq_puts(seq, ",noauto_da_alloc");

 +	if (test_opt(sb, DISCARD))
 +		seq_puts(seq, ",discard");
 +
  	ext4_show_quota_options(seq, sb);

  	return 0;
 @@ -1079,7 +1082,8 @@ enum {
  	Opt_usrquota, Opt_grpquota, Opt_i_version,
  	Opt_stripe, Opt_delalloc, Opt_nodelalloc,
  	Opt_block_validity, Opt_noblock_validity,
 -	Opt_inode_readahead_blks, Opt_journal_ioprio
 +	Opt_inode_readahead_blks, Opt_journal_ioprio,
 +	Opt_discard, Opt_nodiscard,
  };

  static const match_table_t tokens = {
 @@ -1144,6 +1148,8 @@ static const match_table_t tokens = {
  	{Opt_auto_da_alloc, "auto_da_alloc=%u"},
  	{Opt_auto_da_alloc, "auto_da_alloc"},
  	{Opt_noauto_da_alloc, "noauto_da_alloc"},
 +	{Opt_discard, "discard"},
 +	{Opt_nodiscard, "nodiscard"},
  	{Opt_err, NULL},
  };

 @@ -1565,6 +1571,12 @@ set_qf_format:
  			else
  				set_opt(sbi->s_mount_opt,NO_AUTO_DA_ALLOC);
  			break;
 +		case Opt_discard:
 +			set_opt(sbi->s_mount_opt, DISCARD);
 +			break;
 +		case Opt_nodiscard:
 +			clear_opt(sbi->s_mount_opt, DISCARD);
 +			break;
  		default:
  			ext4_msg(sb, KERN_ERR,
  			       "Unrecognized mount option \"%s\" "


 From linux@linux.site Thu Dec 10 21:25:51 2009
 Message-Id: <20091211052551.004437667@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:32 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Eric Sandeen <sandeen@redhat.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [20/34] ext4: make "norecovery" an alias for "noload"
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0016-ext4-make-norecovery-an-alias-for-noload.patch
 Content-Length: 1856
 Lines: 53

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit e3bb52ae2bb9573e84c17b8e3560378d13a5c798)

 Users on the linux-ext4 list recently complained about differences
 across filesystems w.r.t. how to mount without a journal replay.

 In the discussion it was noted that xfs's "norecovery" option is
 perhaps more descriptively accurate than "noload," so let's make
 that an alias for ext4.

 Also show this status in /proc/mounts
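
 The alias works because two patterns in the token table may map to the
 same token.  A minimal user-space sketch, assuming exact string
 matches only; struct opt_entry and lookup_option() are hypothetical
 names and not the kernel's parser API.

```c
#include <stddef.h>
#include <string.h>

enum { Opt_noload, Opt_nobh, Opt_err };

/* Hypothetical table entry: one token, one literal pattern. */
struct opt_entry {
	int token;
	const char *pattern;
};

static const struct opt_entry opt_table[] = {
	{Opt_noload, "noload"},
	{Opt_noload, "norecovery"},	/* alias: same token as "noload" */
	{Opt_nobh, "nobh"},
	{Opt_err, NULL},		/* terminator */
};

/* Return the token for an option string, or Opt_err if none matches. */
static int lookup_option(const char *s)
{
	const struct opt_entry *p;

	for (p = opt_table; p->pattern; p++)
		if (strcmp(s, p->pattern) == 0)
			return p->token;
	return Opt_err;
}
```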

 Signed-off-by: Eric Sandeen <sandeen@redhat.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  Documentation/filesystems/ext4.txt |    4 ++--
  fs/ext4/super.c                    |    4 ++++
  2 files changed, 6 insertions(+), 2 deletions(-)

 --- a/Documentation/filesystems/ext4.txt
 +++ b/Documentation/filesystems/ext4.txt
 @@ -153,8 +153,8 @@ journal_dev=devnum	When the external jou
  			identified through its new major/minor numbers encoded
  			in devnum.

 -noload			Don't load the journal on mounting.  Note that
 -                     	if the filesystem was not unmounted cleanly,
 +norecovery		Don't load the journal on mounting.  Note that
 +noload			if the filesystem was not unmounted cleanly,
                       	skipping the journal replay will lead to the
                       	filesystem containing inconsistencies that can
                       	lead to any number of problems.
 --- a/fs/ext4/super.c
 +++ b/fs/ext4/super.c
 @@ -902,6 +902,9 @@ static int ext4_show_options(struct seq_
  	if (test_opt(sb, DISCARD))
  		seq_puts(seq, ",discard");

 +	if (test_opt(sb, NOLOAD))
 +		seq_puts(seq, ",norecovery");
 +
  	ext4_show_quota_options(seq, sb);

  	return 0;
 @@ -1108,6 +1111,7 @@ static const match_table_t tokens = {
  	{Opt_acl, "acl"},
  	{Opt_noacl, "noacl"},
  	{Opt_noload, "noload"},
 +	{Opt_noload, "norecovery"},
  	{Opt_nobh, "nobh"},
  	{Opt_bh, "bh"},
  	{Opt_commit, "commit=%u"},


 From linux@linux.site Thu Dec 10 21:25:52 2009
 Message-Id: <20091211052551.564396025@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:33 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Akira Fujita <a-fujita@rs.jp.nec.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [21/34] ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0017-ext4-Fix-double-free-of-blocks-with-EXT4_IOC_MOVE_EX.patch
 Content-Length: 2565
 Lines: 75

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 94d7c16cbbbd0e03841fcf272bcaf0620ad39618)

 At the beginning of ext4_move_extents(), we call
 ext4_discard_preallocations() to discard inode PAs of orig and donor
 inodes.  But in the following case, blocks can be double freed, so
 move ext4_discard_preallocations() to the end of ext4_move_extents().

 1. Discard inode PAs of orig and donor inodes with
    ext4_discard_preallocations() in ext4_move_extents().

    orig : [ DATA1 ]
    donor: [ DATA2 ]

 2. While data blocks are being exchanged between the orig and donor
    inodes, new inode PAs are created for orig by another process's
    block allocation (since there are semaphore gaps in
    ext4_move_extents()), and the new inode PAs are partially used (2-1).

    2-1 Create new inode PAs to orig inode
    orig : [ DATA1 | used PA1 | free PA1 ]
    donor: [ DATA2 ]

 3. The donor inode, which now holds the old orig inode's blocks, is
    deleted after EXT4_IOC_MOVE_EXT finishes (3-1, 3-2), so the block
    bitmap bits corresponding to the old orig inode's blocks are freed.

    3-1 After EXT4_IOC_MOVE_EXT finished
    orig : [ DATA2 |  free PA1 ]
    donor: [ DATA1 |  used PA1 ]

    3-2 Delete donor inode
    orig : [ DATA2 |  free PA1 ]
    donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]

 4. The double-free of blocks occurs when close() is called on the
    orig inode, because ext4_discard_preallocations() for the orig inode
    frees both used PA1 and free PA1, though used PA1 was already freed
    in step 3.

    4-1 Double-free of blocks occurs
    orig : [ DATA2 |  FREE SPACE(free PA1) ]
    donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]

 Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/move_extent.c |    9 +++++----
  1 file changed, 5 insertions(+), 4 deletions(-)

 --- a/fs/ext4/move_extent.c
 +++ b/fs/ext4/move_extent.c
 @@ -1289,10 +1289,6 @@ ext4_move_extents(struct file *o_filp, s
  			 ext4_ext_get_actual_len(ext_cur), block_end + 1) -
  		     max(le32_to_cpu(ext_cur->ee_block), block_start);

 -	/* Discard preallocations of two inodes */
 -	ext4_discard_preallocations(orig_inode);
 -	ext4_discard_preallocations(donor_inode);
 -
  	while (!last_extent && le32_to_cpu(ext_cur->ee_block) <= block_end) {
  		seq_blocks += add_blocks;

 @@ -1410,6 +1406,11 @@ ext4_move_extents(struct file *o_filp, s

  	}
  out:
 +	if (*moved_len) {
 +		ext4_discard_preallocations(orig_inode);
 +		ext4_discard_preallocations(donor_inode);
 +	}
 +
  	if (orig_path) {
  		ext4_ext_drop_refs(orig_path);
  		kfree(orig_path);


 From linux@linux.site Thu Dec 10 21:25:52 2009
 Message-Id: <20091211052552.134440580@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:34 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Kazuya Mio <k-mio@sx.jp.nec.com>,
  Akira Fujita <a-fujita@rs.jp.nec.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [22/34] ext4: initialize moved_len before calling ext4_move_extents()
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0018-ext4-initialize-moved_len-before-calling-ext4_move_e.patch
 Content-Length: 2439
 Lines: 72

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 446aaa6e7e993b38a6f21c6acfa68f3f1af3dbe3)

 The move_extent.moved_len field is used to pass back the number of
 exchanged blocks to user space.  Currently the caller must clear this
 field; but we spend more code space checking for this requirement than
 simply zeroing the field ourselves, so let's just make life easier for
 everyone all around.
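
 A minimal sketch of the idea, assuming a simplified argument struct
 and a worker that accumulates into the counter.  struct
 move_extent_args, do_move() and move_ext_ioctl() are hypothetical
 names for illustration, not the ext4 definitions.

```c
#include <stdint.h>

/* Simplified stand-in for the ioctl argument block. */
struct move_extent_args {
	uint64_t orig_start;
	uint64_t donor_start;
	uint64_t len;
	uint64_t moved_len;	/* output: number of blocks exchanged */
};

/* Hypothetical worker: adds to *moved_len as blocks are exchanged. */
static int do_move(uint64_t len, uint64_t *moved_len)
{
	*moved_len += len;
	return 0;
}

/* The handler clears the output field itself instead of rejecting
 * callers that left a stale value in it. */
static int move_ext_ioctl(struct move_extent_args *me)
{
	me->moved_len = 0;
	return do_move(me->len, &me->moved_len);
}
```

 A caller that passes in a non-zero moved_len no longer gets -EINVAL;
 the stale value is simply overwritten.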

 Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
 Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/ioctl.c       |    1 +
  fs/ext4/move_extent.c |   14 +++-----------
  2 files changed, 4 insertions(+), 11 deletions(-)

 --- a/fs/ext4/ioctl.c
 +++ b/fs/ext4/ioctl.c
 @@ -239,6 +239,7 @@ setversion_out:
  			}
  		}

 +		me.moved_len = 0;
  		err = ext4_move_extents(filp, donor_filp, me.orig_start,
  					me.donor_start, me.len, &me.moved_len);
  		fput(donor_filp);
 --- a/fs/ext4/move_extent.c
 +++ b/fs/ext4/move_extent.c
 @@ -947,7 +947,6 @@ out2:
   * @orig_start:		logical start offset in block for orig
   * @donor_start:	logical start offset in block for donor
   * @len:		the number of blocks to be moved
 - * @moved_len:		moved block length
   *
   * Check the arguments of ext4_move_extents() whether the files can be
   * exchanged with each other.
 @@ -955,8 +954,8 @@ out2:
   */
  static int
  mext_check_arguments(struct inode *orig_inode,
 -			  struct inode *donor_inode, __u64 orig_start,
 -			  __u64 donor_start, __u64 *len, __u64 moved_len)
 +		     struct inode *donor_inode, __u64 orig_start,
 +		     __u64 donor_start, __u64 *len)
  {
  	ext4_lblk_t orig_blocks, donor_blocks;
  	unsigned int blkbits = orig_inode->i_blkbits;
 @@ -1010,13 +1009,6 @@ mext_check_arguments(struct inode *orig_
  		return -EINVAL;
  	}

 -	if (moved_len) {
 -		ext4_debug("ext4 move extent: moved_len should be 0 "
 -			"[ino:orig %lu, donor %lu]\n", orig_inode->i_ino,
 -			donor_inode->i_ino);
 -		return -EINVAL;
 -	}
 -
  	if ((orig_start > EXT_MAX_BLOCK) ||
  	    (donor_start > EXT_MAX_BLOCK) ||
  	    (*len > EXT_MAX_BLOCK) ||
 @@ -1226,7 +1218,7 @@ ext4_move_extents(struct file *o_filp, s
  	double_down_write_data_sem(orig_inode, donor_inode);
  	/* Check the filesystem environment whether move_extent can be done */
  	ret1 = mext_check_arguments(orig_inode, donor_inode, orig_start,
 -					donor_start, &len, *moved_len);
 +				    donor_start, &len);
  	if (ret1)
  		goto out;


 From linux@linux.site Thu Dec 10 21:25:53 2009
 Message-Id: <20091211052552.682377360@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:35 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Akira Fujita <a-fujita@rs.jp.nec.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [23/34] ext4: move_extent_per_page() cleanup
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0019-ext4-move_extent_per_page-cleanup.patch
 Content-Length: 2733
 Lines: 87

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit ac48b0a1d068887141581bea8285de5fcab182b0)

 Integrate duplicate lines (acquire/release semaphore and invalidate
 extent cache in move_extent_per_page()) into mext_replace_branches(),
 to reduce source and object code size.

 Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/move_extent.c |   30 +++++++++---------------------
  1 file changed, 9 insertions(+), 21 deletions(-)

 --- a/fs/ext4/move_extent.c
 +++ b/fs/ext4/move_extent.c
 @@ -660,6 +660,9 @@ mext_replace_branches(handle_t *handle,
  	int replaced_count = 0;
  	int dext_alen;

 +	/* Protect extent trees against block allocations via delalloc */
 +	double_down_write_data_sem(orig_inode, donor_inode);
 +
  	/* Get the original extent for the block "orig_off" */
  	*err = get_ext_path(orig_inode, orig_off, &orig_path);
  	if (*err)
 @@ -755,6 +758,11 @@ out:
  		kfree(donor_path);
  	}

 +	ext4_ext_invalidate_cache(orig_inode);
 +	ext4_ext_invalidate_cache(donor_inode);
 +
 +	double_up_write_data_sem(orig_inode, donor_inode);
 +
  	return replaced_count;
  }

 @@ -820,19 +828,9 @@ move_extent_per_page(struct file *o_filp
  	 * Just swap data blocks between orig and donor.
  	 */
  	if (uninit) {
 -		/*
 -		 * Protect extent trees against block allocations
 -		 * via delalloc
 -		 */
 -		double_down_write_data_sem(orig_inode, donor_inode);
  		replaced_count = mext_replace_branches(handle, orig_inode,
  						donor_inode, orig_blk_offset,
  						block_len_in_page, err);
 -
 -		/* Clear the inode cache not to refer to the old data */
 -		ext4_ext_invalidate_cache(orig_inode);
 -		ext4_ext_invalidate_cache(donor_inode);
 -		double_up_write_data_sem(orig_inode, donor_inode);
  		goto out2;
  	}

 @@ -880,8 +878,6 @@ move_extent_per_page(struct file *o_filp
  	/* Release old bh and drop refs */
  	try_to_release_page(page, 0);

 -	/* Protect extent trees against block allocations via delalloc */
 -	double_down_write_data_sem(orig_inode, donor_inode);
  	replaced_count = mext_replace_branches(handle, orig_inode, donor_inode,
  					orig_blk_offset, block_len_in_page,
  					&err2);
 @@ -890,18 +886,10 @@ move_extent_per_page(struct file *o_filp
  			block_len_in_page = replaced_count;
  			replaced_size =
  				block_len_in_page << orig_inode->i_blkbits;
 -		} else {
 -			double_up_write_data_sem(orig_inode, donor_inode);
 +		} else
  			goto out;
 -		}
  	}

 -	/* Clear the inode cache not to refer to the old data */
 -	ext4_ext_invalidate_cache(orig_inode);
 -	ext4_ext_invalidate_cache(donor_inode);
 -
 -	double_up_write_data_sem(orig_inode, donor_inode);
 -
  	if (!page_has_buffers(page))
  		create_empty_buffers(page, 1 << orig_inode->i_blkbits, 0);


 From linux@linux.site Thu Dec 10 21:25:53 2009
 Message-Id: <20091211052553.196951652@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:36 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [24/34] jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer()
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0020-jbd2-Add-ENOMEM-checking-in-and-for-jbd2_journal_wri.patch
 Content-Length: 1035
 Lines: 38

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit e6ec116b67f46e0e7808276476554727b2e6240b)

 OOM happens.

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/jbd2/commit.c  |    4 ++++
  fs/jbd2/journal.c |    4 ++++
  2 files changed, 8 insertions(+)

 --- a/fs/jbd2/commit.c
 +++ b/fs/jbd2/commit.c
 @@ -636,6 +636,10 @@ void jbd2_journal_commit_transaction(jou
  		JBUFFER_TRACE(jh, "ph3: write metadata");
  		flags = jbd2_journal_write_metadata_buffer(commit_transaction,
  						      jh, &new_jh, blocknr);
 +		if (flags < 0) {
 +			jbd2_journal_abort(journal, flags);
 +			continue;
 +		}
  		set_bit(BH_JWrite, &jh2bh(new_jh)->b_state);
  		wbuf[bufs++] = jh2bh(new_jh);

 --- a/fs/jbd2/journal.c
 +++ b/fs/jbd2/journal.c
 @@ -358,6 +358,10 @@ repeat:

  		jbd_unlock_bh_state(bh_in);
  		tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS);
 +		if (!tmp) {
 +			jbd2_journal_put_journal_head(new_jh);
 +			return -ENOMEM;
 +		}
  		jbd_lock_bh_state(bh_in);
  		if (jh_in->b_frozen_data) {
  			jbd2_free(tmp, bh_in->b_size);


 From linux@linux.site Thu Dec 10 21:25:54 2009
 Message-Id: <20091211052553.749907435@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:37 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Roel Kluin <roel.kluin@gmail.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [25/34] ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks()
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0021-ext4-Return-the-PTR_ERR-of-the-correct-pointer-in-se.patch
 Content-Length: 595
 Lines: 21

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit c09eef305dd43846360944ad072f051f964fa383)

 Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/resize.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 --- a/fs/ext4/resize.c
 +++ b/fs/ext4/resize.c
 @@ -247,7 +247,7 @@ static int setup_new_group_blocks(struct
  			goto exit_bh;

  		if (IS_ERR(gdb = bclean(handle, sb, block))) {
 -			err = PTR_ERR(bh);
 +			err = PTR_ERR(gdb);
  			goto exit_bh;
  		}
  		ext4_handle_dirty_metadata(handle, NULL, gdb);


 From linux@linux.site Thu Dec 10 21:25:54 2009
 Message-Id: <20091211052554.355331485@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:38 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Jan Kara <jack@suse.cz>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [26/34] ext4: Avoid data / filesystem corruption when write fails to copy data
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0022-ext4-Avoid-data-filesystem-corruption-when-write-fai.patch
 Content-Length: 2923
 Lines: 84

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit b9a4207d5e911b938f73079a83cc2ae10524ec7f)

 When ext4_write_begin fails after allocating some blocks or
 generic_perform_write fails to copy data to write, we truncate blocks
 already instantiated beyond i_size.  Although these blocks were never
 inside i_size, we have to truncate the pagecache of these blocks so
 that corresponding buffers get unmapped.  Otherwise subsequent
 __block_prepare_write (called because we are retrying the write) will
 find the buffers mapped, not call ->get_block, and thus the page will
 be backed by already freed blocks leading to filesystem and data
 corruption.

 Signed-off-by: Jan Kara <jack@suse.cz>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/inode.c |   20 +++++++++++++++-----
  1 file changed, 15 insertions(+), 5 deletions(-)

 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -1534,6 +1534,16 @@ static int do_journal_get_write_access(h
  	return ext4_journal_get_write_access(handle, bh);
  }

 +/*
 + * Truncate blocks that were not used by write. We have to truncate the
 + * pagecache as well so that corresponding buffers get properly unmapped.
 + */
 +static void ext4_truncate_failed_write(struct inode *inode)
 +{
 +	truncate_inode_pages(inode->i_mapping, inode->i_size);
 +	ext4_truncate(inode);
 +}
 +
  static int ext4_write_begin(struct file *file, struct address_space *mapping,
  			    loff_t pos, unsigned len, unsigned flags,
  			    struct page **pagep, void **fsdata)
 @@ -1599,7 +1609,7 @@ retry:

  		ext4_journal_stop(handle);
  		if (pos + len > inode->i_size) {
 -			ext4_truncate(inode);
 +			ext4_truncate_failed_write(inode);
  			/*
  			 * If truncate failed early the inode might
  			 * still be on the orphan list; we need to
 @@ -1709,7 +1719,7 @@ static int ext4_ordered_write_end(struct
  		ret = ret2;

  	if (pos + len > inode->i_size) {
 -		ext4_truncate(inode);
 +		ext4_truncate_failed_write(inode);
  		/*
  		 * If truncate failed early the inode might still be
  		 * on the orphan list; we need to make sure the inode
 @@ -1751,7 +1761,7 @@ static int ext4_writeback_write_end(stru
  		ret = ret2;

  	if (pos + len > inode->i_size) {
 -		ext4_truncate(inode);
 +		ext4_truncate_failed_write(inode);
  		/*
  		 * If truncate failed early the inode might still be
  		 * on the orphan list; we need to make sure the inode
 @@ -1814,7 +1824,7 @@ static int ext4_journalled_write_end(str
  	if (!ret)
  		ret = ret2;
  	if (pos + len > inode->i_size) {
 -		ext4_truncate(inode);
 +		ext4_truncate_failed_write(inode);
  		/*
  		 * If truncate failed early the inode might still be
  		 * on the orphan list; we need to make sure the inode
 @@ -3091,7 +3101,7 @@ retry:
  		 * i_size_read because we hold i_mutex.
  		 */
  		if (pos + len > inode->i_size)
 -			ext4_truncate(inode);
 +			ext4_truncate_failed_write(inode);
  	}

  	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))


 From linux@linux.site Thu Dec 10 21:25:55 2009
 Message-Id: <20091211052554.925382177@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:39 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Josef Bacik <josef@redhat.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [27/34] ext4: wait for log to commit when umounting
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0023-ext4-wait-for-log-to-commit-when-umounting.patch
 Content-Length: 1540
 Lines: 46

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit d4edac314e9ad0b21ba20ba8bc61b61f186f79e1)

 There is a potential race when a transaction is committing right when
 the file system is being unmounted.  This could result in a race
 because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
 before the commit code calls a callback so the mballoc code can
 release freed blocks in the transaction, resulting in a panic trying
 to access the freed s_group_info.

 The fix is to wait for the transaction to finish committing before we
 shutdown the multiblock allocator.
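
 The ordering issue can be modelled in user space.  This is only a
 sketch under simplifying assumptions: struct fake_sb and all function
 names below are hypothetical, a NULL pointer stands in for freed
 memory, and the commit callback runs synchronously from the journal
 teardown.

```c
#include <stddef.h>

struct fake_sb {
	int *group_info;	/* stands in for s_group_info */
	int commit_pending;	/* a transaction is still committing */
	int touched_freed;	/* set if the callback ran after release */
};

/* Commit callback: releases blocks, so it needs group_info. */
static void commit_callback(struct fake_sb *sb)
{
	if (!sb->group_info) {
		sb->touched_freed = 1;	/* would be a use-after-free */
		return;
	}
	sb->group_info[0]++;
}

/* Journal teardown waits for the commit, which runs the callback. */
static void journal_destroy(struct fake_sb *sb)
{
	if (sb->commit_pending) {
		commit_callback(sb);
		sb->commit_pending = 0;
	}
}

/* Allocator teardown: group_info is gone afterwards. */
static void mb_release(struct fake_sb *sb)
{
	sb->group_info = NULL;
}

/* Old order: allocator released before the journal is destroyed. */
static void put_super_old(struct fake_sb *sb)
{
	mb_release(sb);
	journal_destroy(sb);
}

/* New order: journal destroyed first, then the allocator. */
static void put_super_new(struct fake_sb *sb)
{
	journal_destroy(sb);
	mb_release(sb);
}
```

 In this model only the old order lets the callback see the released
 state.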

 Signed-off-by: Josef Bacik <josef@redhat.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/super.c |   10 ++++++----
  1 file changed, 6 insertions(+), 4 deletions(-)

 --- a/fs/ext4/super.c
 +++ b/fs/ext4/super.c
 @@ -603,10 +603,6 @@ static void ext4_put_super(struct super_
  	if (sb->s_dirt)
  		ext4_commit_super(sb, 1);

 -	ext4_release_system_zone(sb);
 -	ext4_mb_release(sb);
 -	ext4_ext_release(sb);
 -	ext4_xattr_put_super(sb);
  	if (sbi->s_journal) {
  		err = jbd2_journal_destroy(sbi->s_journal);
  		sbi->s_journal = NULL;
 @@ -614,6 +610,12 @@ static void ext4_put_super(struct super_
  			ext4_abort(sb, __func__,
  				   "Couldn't clean up the journal");
  	}
 +
 +	ext4_release_system_zone(sb);
 +	ext4_mb_release(sb);
 +	ext4_ext_release(sb);
 +	ext4_xattr_put_super(sb);
 +
  	if (!(sb->s_flags & MS_RDONLY)) {
  		EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
  		es->s_state = cpu_to_le16(sbi->s_mount_state);


 From linux@linux.site Thu Dec 10 21:25:55 2009
 Message-Id: <20091211052555.487338959@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:40 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Curt Wohlgemuth <curtw@google.com>,
  "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [28/34] ext4: remove blocks from inode prealloc list on failure
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0024-ext4-remove-blocks-from-inode-prealloc-list-on-failu.patch
 Content-Length: 1476
 Lines: 49

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit b844167edc7fcafda9623955c05e4c1b3c32ebc7)

 This fixes a leak of blocks in an inode prealloc list if device failures
 cause ext4_mb_mark_diskspace_used() to fail.

 Signed-off-by: Curt Wohlgemuth <curtw@google.com>
 Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/mballoc.c |   19 +++++++++++++++++++
  1 file changed, 19 insertions(+)

 --- a/fs/ext4/mballoc.c
 +++ b/fs/ext4/mballoc.c
 @@ -3011,6 +3011,24 @@ static void ext4_mb_collect_stats(struct
  }

  /*
 + * Called on failure; free up any blocks from the inode PA for this
 + * context.  We don't need this for MB_GROUP_PA because we only change
 + * pa_free in ext4_mb_release_context(), but on failure, we've already
 + * zeroed out ac->ac_b_ex.fe_len, so group_pa->pa_free is not changed.
 + */
 +static void ext4_discard_allocated_blocks(struct ext4_allocation_context *ac)
 +{
 +	struct ext4_prealloc_space *pa = ac->ac_pa;
 +	int len;
 +
 +	if (pa && pa->pa_type == MB_INODE_PA) {
 +		len = ac->ac_b_ex.fe_len;
 +		pa->pa_free += len;
 +	}
 +
 +}
 +
 +/*
   * use blocks preallocated to inode
   */
  static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
 @@ -4295,6 +4313,7 @@ repeat:
  			ac->ac_status = AC_STATUS_CONTINUE;
  			goto repeat;
  		} else if (*errp) {
 +			ext4_discard_allocated_blocks(ac);
  			ac->ac_b_ex.fe_len = 0;
  			ar->len = 0;
  			ext4_mb_show_ac(ac);


 From linux@linux.site Thu Dec 10 21:25:56 2009
 Message-Id: <20091211052556.043172197@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:41 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Dmitry Monakhov <dmonakhov@openvz.org>,
  Mingming Cao <cmm@us.ibm.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [29/34] ext4: ext4_get_reserved_space() must return bytes instead of blocks
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0025-ext4-ext4_get_reserved_space-must-return-bytes-inste.patch
 Content-Length: 718
 Lines: 23

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 8aa6790f876e81f5a2211fe1711a5fe3fe2d7b20)
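
 The upstream commit carries no changelog beyond its subject.  As a
 minimal sketch of the conversion the one-line change performs, assuming
 blkbits is log2 of the block size as inode->i_blkbits is (the helper
 name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a count of filesystem blocks into bytes. */
static uint64_t blocks_to_bytes(uint64_t nblocks, unsigned int blkbits)
{
	assert(blkbits < 64);   /* shifting by >= 64 would be undefined */
	return nblocks << blkbits;
}
```

 With 4 KiB blocks (blkbits == 12), three reserved blocks are reported
 as 12288 bytes rather than as 3.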

 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
 Reviewed-by: Eric Sandeen <sandeen@redhat.com>
 Acked-by: Mingming Cao <cmm@us.ibm.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/inode.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -1052,7 +1052,7 @@ qsize_t ext4_get_reserved_space(struct i
  		EXT4_I(inode)->i_reserved_meta_blocks;
  	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

 -	return total;
 +	return (total << inode->i_blkbits);
  }
  /*
   * Calculate the number of metadata blocks need to reserve


 From linux@linux.site Thu Dec 10 21:25:57 2009
 Message-Id: <20091211052556.560487193@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:42 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Dmitry Monakhov <dmonakhov@openvz.org>,
  Mingming Cao <cmm@us.ibm.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [30/34] ext4: quota macros cleanup
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0026-ext4-quota-macros-cleanup.patch
 Content-Length: 5167
 Lines: 138

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 5aca07eb7d8f14d90c740834d15ca15277f4820c)

 Currently all quota block reservation macros contain a hard-coded "2",
 i.e. the MAXQUOTAS value.  This is no good because in some places it is
 not obvious what this digit represents.  Let's introduce new macros with
 self-descriptive names.
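
 A small standalone sketch of the idea, not the ext4 headers themselves.
 MAXQUOTAS is 2 (user and group) as in the kernel of this era; the
 per-quota cost QUOTA_TRANS_BLOCKS below is a made-up constant, and the
 macro names are shortened for illustration.

```c
#include <assert.h>

#define MAXQUOTAS           2   /* user + group quota types */
#define QUOTA_TRANS_BLOCKS  2   /* hypothetical cost per quota type */

/* Before: the multiplier is a bare literal whose meaning is not obvious. */
#define OLD_NEEDED  (3 + 2 * QUOTA_TRANS_BLOCKS)

/* After: the multiplier is named, so the intent is visible at the call site. */
#define MAXQUOTAS_TRANS_BLOCKS  (MAXQUOTAS * QUOTA_TRANS_BLOCKS)
#define NEW_NEEDED  (3 + MAXQUOTAS_TRANS_BLOCKS)

/* The substitution must not change the computed credits. */
static int needed_credits(void)
{
	assert(NEW_NEEDED == OLD_NEEDED);
	return NEW_NEEDED;
}
```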

 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
 Acked-by: Mingming Cao <cmm@us.ibm.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/ext4_jbd2.h |    8 ++++++--
  fs/ext4/extents.c   |    2 +-
  fs/ext4/inode.c     |    2 +-
  fs/ext4/migrate.c   |    4 ++--
  fs/ext4/namei.c     |    8 ++++----
  5 files changed, 14 insertions(+), 10 deletions(-)

 --- a/fs/ext4/ext4_jbd2.h
 +++ b/fs/ext4/ext4_jbd2.h
 @@ -49,7 +49,7 @@

  #define EXT4_DATA_TRANS_BLOCKS(sb)	(EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + \
  					 EXT4_XATTR_TRANS_BLOCKS - 2 + \
 -					 2*EXT4_QUOTA_TRANS_BLOCKS(sb))
 +					 EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))

  /*
   * Define the number of metadata blocks we need to account to modify data.
 @@ -57,7 +57,7 @@
   * This include super block, inode block, quota blocks and xattr blocks
   */
  #define EXT4_META_TRANS_BLOCKS(sb)	(EXT4_XATTR_TRANS_BLOCKS + \
 -					2*EXT4_QUOTA_TRANS_BLOCKS(sb))
 +					EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))

  /* Delete operations potentially hit one directory's namespace plus an
   * entire inode, plus arbitrary amounts of bitmap/indirection data.  Be
 @@ -92,6 +92,7 @@
   * but inode, sb and group updates are done only once */
  #define EXT4_QUOTA_INIT_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_INIT_ALLOC*\
  		(EXT4_SINGLEDATA_TRANS_BLOCKS(sb)-3)+3+DQUOT_INIT_REWRITE) : 0)
 +
  #define EXT4_QUOTA_DEL_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_DEL_ALLOC*\
  		(EXT4_SINGLEDATA_TRANS_BLOCKS(sb)-3)+3+DQUOT_DEL_REWRITE) : 0)
  #else
 @@ -99,6 +100,9 @@
  #define EXT4_QUOTA_INIT_BLOCKS(sb) 0
  #define EXT4_QUOTA_DEL_BLOCKS(sb) 0
  #endif
 +#define EXT4_MAXQUOTAS_TRANS_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_TRANS_BLOCKS(sb))
 +#define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
 +#define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))

  int
  ext4_mark_iloc_dirty(handle_t *handle,
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2167,7 +2167,7 @@ ext4_ext_rm_leaf(handle_t *handle, struc
  			correct_index = 1;
  			credits += (ext_depth(inode)) + 1;
  		}
 -		credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
 +		credits += EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);

  		err = ext4_ext_truncate_extend_restart(handle, inode, credits);
  		if (err)
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -5231,7 +5231,7 @@ int ext4_setattr(struct dentry *dentry,

  		/* (user+group)*(old+new) structure, inode write (sb,
  		 * inode block, ? - but truncate inode update has it) */
 -		handle = ext4_journal_start(inode, 2*(EXT4_QUOTA_INIT_BLOCKS(inode->i_sb)+
 +		handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
  					EXT4_QUOTA_DEL_BLOCKS(inode->i_sb))+3);
  		if (IS_ERR(handle)) {
  			error = PTR_ERR(handle);
 --- a/fs/ext4/migrate.c
 +++ b/fs/ext4/migrate.c
 @@ -238,7 +238,7 @@ static int extend_credit_for_blkdel(hand
  	 * So allocate a credit of 3. We may update
  	 * quota (user and group).
  	 */
 -	needed = 3 + 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
 +	needed = 3 + EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);

  	if (ext4_journal_extend(handle, needed) != 0)
  		retval = ext4_journal_restart(handle, needed);
 @@ -477,7 +477,7 @@ int ext4_ext_migrate(struct inode *inode
  	handle = ext4_journal_start(inode,
  					EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
  					EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
 -					2 * EXT4_QUOTA_INIT_BLOCKS(inode->i_sb)
 +					EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)
  					+ 1);
  	if (IS_ERR(handle)) {
  		retval = PTR_ERR(handle);
 --- a/fs/ext4/namei.c
 +++ b/fs/ext4/namei.c
 @@ -1769,7 +1769,7 @@ static int ext4_create(struct inode *dir
  retry:
  	handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
  					EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
 -					2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
 +					EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
  	if (IS_ERR(handle))
  		return PTR_ERR(handle);

 @@ -1803,7 +1803,7 @@ static int ext4_mknod(struct inode *dir,
  retry:
  	handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
  					EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
 -					2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
 +					EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
  	if (IS_ERR(handle))
  		return PTR_ERR(handle);

 @@ -1840,7 +1840,7 @@ static int ext4_mkdir(struct inode *dir,
  retry:
  	handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
  					EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
 -					2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
 +					EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
  	if (IS_ERR(handle))
  		return PTR_ERR(handle);

 @@ -2253,7 +2253,7 @@ static int ext4_symlink(struct inode *di
  retry:
  	handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
  					EXT4_INDEX_EXTRA_TRANS_BLOCKS + 5 +
 -					2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
 +					EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
  	if (IS_ERR(handle))
  		return PTR_ERR(handle);


 From linux@linux.site Thu Dec 10 21:25:57 2009
 Message-Id: <20091211052557.153813326@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:43 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Dmitry Monakhov <dmonakhov@openvz.org>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [31/34] ext4: fix incorrect block reservation on quota transfer.
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0027-ext4-fix-incorrect-block-reservation-on-quota-transf.patch
 Content-Length: 1036
 Lines: 27

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 194074acacebc169ded90a4657193f5180015051)

 Inside a ->setattr() call both ATTR_UID and ATTR_GID may be valid.
 This means that we may end up transferring all quotas.  So we have to
 reserve QUOTA_DEL_BLOCKS for all quotas, as we do in the case of
 QUOTA_INIT_BLOCKS.
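
 The arithmetic can be sketched as follows.  This is an illustration
 only: the function name and the per-quota costs passed in are
 hypothetical, and MAXQUOTAS is taken to be 2 (user and group).

```c
#include <assert.h>

#define MAXQUOTAS 2   /* user + group quota types */

/*
 * Journal credits for a setattr that may transfer every quota type:
 * both the init and the delete cost are multiplied by MAXQUOTAS, plus
 * the fixed 3 blocks from the original expression.
 */
static int setattr_credits(int init_blocks, int del_blocks)
{
	assert(init_blocks >= 0 && del_blocks >= 0);
	return MAXQUOTAS * init_blocks + MAXQUOTAS * del_blocks + 3;
}
```

 For example, with an init cost of 10 and a delete cost of 4, the
 reservation is 31 blocks; multiplying only the init cost would reserve
 27 and leave one quota's delete cost uncovered.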

 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
 Reviewed-by: Mingming Cao <cmm@us.ibm.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/inode.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -5232,7 +5232,7 @@ int ext4_setattr(struct dentry *dentry,
  		/* (user+group)*(old+new) structure, inode write (sb,
  		 * inode block, ? - but truncate inode update has it) */
  		handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
 -					EXT4_QUOTA_DEL_BLOCKS(inode->i_sb))+3);
 +					EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb))+3);
  		if (IS_ERR(handle)) {
  			error = PTR_ERR(handle);
  			goto err_out;


 From linux@linux.site Thu Dec 10 21:25:58 2009
 Message-Id: <20091211052557.723287400@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:44 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Jan Kara <jack@suse.cz>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [32/34] ext4: Wait for proper transaction commit on fsync
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0028-ext4-Wait-for-proper-transaction-commit-on-fsync.patch
 Content-Length: 7849
 Lines: 252

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit b436b9bef84de6893e86346d8fbf7104bc520645)

 We cannot rely on buffer dirty bits during fsync because pdflush can run
 before fsync is called and clear the dirty bits without forcing a
 transaction commit.  Instead, we track which transaction last changed the
 inode and which transaction last changed its block allocation, and force
 the relevant one to disk on fsync.
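
 A minimal userspace model of the bookkeeping, assuming nothing about
 jbd2 beyond "each transaction has an id".  The struct and function
 names are invented for this sketch; in the patch the corresponding
 pieces are i_sync_tid, i_datasync_tid and
 ext4_update_inode_fsync_trans().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int tid_t;

/* Per-inode record of the transactions fsync/fdatasync must wait for. */
struct inode_sync_state {
	tid_t sync_tid;       /* last transaction that changed the inode */
	tid_t datasync_tid;   /* last one that matters for fdatasync */
};

/* Note that transaction tid modified the inode. */
static void note_change(struct inode_sync_state *s, tid_t tid, int datasync)
{
	assert(s != NULL);
	s->sync_tid = tid;
	if (datasync)
		s->datasync_tid = tid;
}

/* Pick the transaction that has to be committed before returning. */
static tid_t tid_to_commit(const struct inode_sync_state *s, int datasync)
{
	assert(s != NULL);
	return datasync ? s->datasync_tid : s->sync_tid;
}
```

 If transaction 7 changes allocation and transaction 9 later touches
 only the inode, fsync waits for 9 while fdatasync only needs 7.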

 Signed-off-by: Jan Kara <jack@suse.cz>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/ext4.h      |    7 +++++++
  fs/ext4/ext4_jbd2.h |   13 +++++++++++++
  fs/ext4/extents.c   |   14 ++++++++++++--
  fs/ext4/fsync.c     |   46 +++++++++++++++++-----------------------------
  fs/ext4/inode.c     |   29 +++++++++++++++++++++++++++++
  fs/ext4/super.c     |    2 ++
  fs/jbd2/journal.c   |    1 +
  7 files changed, 81 insertions(+), 31 deletions(-)

 --- a/fs/ext4/ext4.h
 +++ b/fs/ext4/ext4.h
 @@ -703,6 +703,13 @@ struct ext4_inode_info {
  	struct list_head i_aio_dio_complete_list;
  	/* current io_end structure for async DIO write*/
  	ext4_io_end_t *cur_aio_dio;
 +
 +	/*
 +	 * Transactions that contain inode's metadata needed to complete
 +	 * fsync and fdatasync, respectively.
 +	 */
 +	tid_t i_sync_tid;
 +	tid_t i_datasync_tid;
  };

  /*
 --- a/fs/ext4/ext4_jbd2.h
 +++ b/fs/ext4/ext4_jbd2.h
 @@ -258,6 +258,19 @@ static inline int ext4_jbd2_file_inode(h
  	return 0;
  }

 +static inline void ext4_update_inode_fsync_trans(handle_t *handle,
 +						 struct inode *inode,
 +						 int datasync)
 +{
 +	struct ext4_inode_info *ei = EXT4_I(inode);
 +
 +	if (ext4_handle_valid(handle)) {
 +		ei->i_sync_tid = handle->h_transaction->t_tid;
 +		if (datasync)
 +			ei->i_datasync_tid = handle->h_transaction->t_tid;
 +	}
 +}
 +
  /* super.c */
  int ext4_force_commit(struct super_block *sb);

 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -3064,6 +3064,8 @@ ext4_ext_handle_uninitialized_extents(ha
  	if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) {
  		ret = ext4_convert_unwritten_extents_dio(handle, inode,
  							path);
 +		if (ret >= 0)
 +			ext4_update_inode_fsync_trans(handle, inode, 1);
  		goto out2;
  	}
  	/* buffered IO case */
 @@ -3091,6 +3093,8 @@ ext4_ext_handle_uninitialized_extents(ha
  	ret = ext4_ext_convert_to_initialized(handle, inode,
  						path, iblock,
  						max_blocks);
 +	if (ret >= 0)
 +		ext4_update_inode_fsync_trans(handle, inode, 1);
  out:
  	if (ret <= 0) {
  		err = ret;
 @@ -3329,10 +3333,16 @@ int ext4_ext_get_blocks(handle_t *handle
  	allocated = ext4_ext_get_actual_len(&newex);
  	set_buffer_new(bh_result);

 -	/* Cache only when it is _not_ an uninitialized extent */
 -	if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0)
 +	/*
 +	 * Cache the extent and update transaction to commit on fdatasync only
 +	 * when it is _not_ an uninitialized extent.
 +	 */
 +	if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) {
  		ext4_ext_put_in_cache(inode, iblock, allocated, newblock,
  						EXT4_EXT_CACHE_EXTENT);
 +		ext4_update_inode_fsync_trans(handle, inode, 1);
 +	} else
 +		ext4_update_inode_fsync_trans(handle, inode, 0);
  out:
  	if (allocated > max_blocks)
  		allocated = max_blocks;
 --- a/fs/ext4/fsync.c
 +++ b/fs/ext4/fsync.c
 @@ -51,25 +51,30 @@
  int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
  {
  	struct inode *inode = dentry->d_inode;
 +	struct ext4_inode_info *ei = EXT4_I(inode);
  	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
 -	int err, ret = 0;
 +	int ret;
 +	tid_t commit_tid;

  	J_ASSERT(ext4_journal_current_handle() == NULL);

  	trace_ext4_sync_file(file, dentry, datasync);

 +	if (inode->i_sb->s_flags & MS_RDONLY)
 +		return 0;
 +
  	ret = flush_aio_dio_completed_IO(inode);
  	if (ret < 0)
  		return ret;
 +
 +	if (!journal)
 +		return simple_fsync(file, dentry, datasync);
 +
  	/*
 -	 * data=writeback:
 +	 * data=writeback,ordered:
  	 *  The caller's filemap_fdatawrite()/wait will sync the data.
 -	 *  sync_inode() will sync the metadata
 -	 *
 -	 * data=ordered:
 -	 *  The caller's filemap_fdatawrite() will write the data and
 -	 *  sync_inode() will write the inode if it is dirty.  Then the caller's
 -	 *  filemap_fdatawait() will wait on the pages.
 +	 *  Metadata is in the journal, we wait for proper transaction to
 +	 *  commit here.
  	 *
  	 * data=journal:
  	 *  filemap_fdatawrite won't do anything (the buffers are clean).
 @@ -82,27 +87,10 @@ int ext4_sync_file(struct file *file, st
  	if (ext4_should_journal_data(inode))
  		return ext4_force_commit(inode->i_sb);

 -	if (!journal)
 -		ret = sync_mapping_buffers(inode->i_mapping);
 -
 -	if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
 -		goto out;
 -
 -	/*
 -	 * The VFS has written the file data.  If the inode is unaltered
 -	 * then we need not start a commit.
 -	 */
 -	if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) {
 -		struct writeback_control wbc = {
 -			.sync_mode = WB_SYNC_ALL,
 -			.nr_to_write = 0, /* sys_fsync did this */
 -		};
 -		err = sync_inode(inode, &wbc);
 -		if (ret == 0)
 -			ret = err;
 -	}
 -out:
 -	if (journal && (journal->j_flags & JBD2_BARRIER))
 +	commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
 +	if (jbd2_log_start_commit(journal, commit_tid))
 +		jbd2_log_wait_commit(journal, commit_tid);
 +	else if (journal->j_flags & JBD2_BARRIER)
  		blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
  	return ret;
  }
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -1025,6 +1025,8 @@ static int ext4_ind_get_blocks(handle_t
  		goto cleanup;

  	set_buffer_new(bh_result);
 +
 +	ext4_update_inode_fsync_trans(handle, inode, 1);
  got_it:
  	map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key));
  	if (count > blocks_to_boundary)
 @@ -4794,6 +4796,7 @@ struct inode *ext4_iget(struct super_blo
  	struct ext4_inode *raw_inode;
  	struct ext4_inode_info *ei;
  	struct inode *inode;
 +	journal_t *journal = EXT4_SB(sb)->s_journal;
  	long ret;
  	int block;

 @@ -4858,6 +4861,31 @@ struct inode *ext4_iget(struct super_blo
  		ei->i_data[block] = raw_inode->i_block[block];
  	INIT_LIST_HEAD(&ei->i_orphan);

 +	/*
 +	 * Set transaction id's of transactions that have to be committed
 +	 * to finish f[data]sync. We set them to currently running transaction
 +	 * as we cannot be sure that the inode or some of its metadata isn't
 +	 * part of the transaction - the inode could have been reclaimed and
 +	 * now it is reread from disk.
 +	 */
 +	if (journal) {
 +		transaction_t *transaction;
 +		tid_t tid;
 +
 +		spin_lock(&journal->j_state_lock);
 +		if (journal->j_running_transaction)
 +			transaction = journal->j_running_transaction;
 +		else
 +			transaction = journal->j_committing_transaction;
 +		if (transaction)
 +			tid = transaction->t_tid;
 +		else
 +			tid = journal->j_commit_sequence;
 +		spin_unlock(&journal->j_state_lock);
 +		ei->i_sync_tid = tid;
 +		ei->i_datasync_tid = tid;
 +	}
 +
  	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
  		ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
  		if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
 @@ -5112,6 +5140,7 @@ static int ext4_do_update_inode(handle_t
  		err = rc;
  	ei->i_state &= ~EXT4_STATE_NEW;

 +	ext4_update_inode_fsync_trans(handle, inode, 0);
  out_brelse:
  	brelse(bh);
  	ext4_std_error(inode->i_sb, err);
 --- a/fs/ext4/super.c
 +++ b/fs/ext4/super.c
 @@ -706,6 +706,8 @@ static struct inode *ext4_alloc_inode(st
  	spin_lock_init(&(ei->i_block_reservation_lock));
  	INIT_LIST_HEAD(&ei->i_aio_dio_complete_list);
  	ei->cur_aio_dio = NULL;
 +	ei->i_sync_tid = 0;
 +	ei->i_datasync_tid = 0;

  	return &ei->vfs_inode;
  }
 --- a/fs/jbd2/journal.c
 +++ b/fs/jbd2/journal.c
 @@ -78,6 +78,7 @@ EXPORT_SYMBOL(jbd2_journal_errno);
  EXPORT_SYMBOL(jbd2_journal_ack_err);
  EXPORT_SYMBOL(jbd2_journal_clear_err);
  EXPORT_SYMBOL(jbd2_log_wait_commit);
 +EXPORT_SYMBOL(jbd2_log_start_commit);
  EXPORT_SYMBOL(jbd2_journal_start_commit);
  EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
  EXPORT_SYMBOL(jbd2_journal_wipe);


 From linux@linux.site Thu Dec 10 21:25:58 2009
 Message-Id: <20091211052558.272572522@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:45 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  Akira Fujita <a-fujita@rs.jp.nec.com>,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [33/34] ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0029-ext4-Fix-insufficient-checks-in-EXT4_IOC_MOVE_EXT.patch
 Content-Length: 2732
 Lines: 94

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit 4a58579b9e4e2a35d57e6c9c8483e52f6f1b7fd6)

 This patch fixes three problems in the handling of the
 EXT4_IOC_MOVE_EXT ioctl:

 1. The current EXT4_IOC_MOVE_EXT code only checks read access on the
 original and donor files, which allows illegal write access to the
 donor file, since the donor file is overwritten with the original
 file's data.  To fix this, change the access mode checks for the
 original (r->r/w) and donor (r->w) files.

 2.  Disallow donor files that have the setuid or setgid bit set.

 3.  Call mnt_want_write() and mnt_drop_write() before and after
 ext4_move_extents() to get write access to the mount.
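
 The first two checks can be sketched in userspace as below.  MODE_READ
 and MODE_WRITE are stand-ins for FMODE_READ and FMODE_WRITE, and the
 function name is hypothetical; the mount write access from item 3 is
 not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

#define MODE_READ   0x1u   /* stand-in for FMODE_READ */
#define MODE_WRITE  0x2u   /* stand-in for FMODE_WRITE */

/*
 * orig_fmode/donor_fmode: how each file was opened.
 * donor_perm: permission bits of the donor inode.
 */
static int check_move_ext(unsigned int orig_fmode, unsigned int donor_fmode,
			  unsigned int donor_perm)
{
	/* This sketch models only the two open-mode bits above. */
	assert(((orig_fmode | donor_fmode) & ~(MODE_READ | MODE_WRITE)) == 0);

	/* The original file is both read and rewritten: needs r/w. */
	if (!(orig_fmode & MODE_READ) || !(orig_fmode & MODE_WRITE))
		return -EBADF;
	/* The donor file is overwritten: needs write access. */
	if (!(donor_fmode & MODE_WRITE))
		return -EBADF;
	/* Refuse setuid/setgid donor files. */
	if (donor_perm & (S_ISUID | S_ISGID))
		return -EINVAL;
	return 0;
}
```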

 Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/ioctl.c       |   30 ++++++++++++++++++------------
  fs/ext4/move_extent.c |    7 +++++++
  2 files changed, 25 insertions(+), 12 deletions(-)

 --- a/fs/ext4/ioctl.c
 +++ b/fs/ext4/ioctl.c
 @@ -221,32 +221,38 @@ setversion_out:
  		struct file *donor_filp;
  		int err;

 +		if (!(filp->f_mode & FMODE_READ) ||
 +		    !(filp->f_mode & FMODE_WRITE))
 +			return -EBADF;
 +
  		if (copy_from_user(&me,
  			(struct move_extent __user *)arg, sizeof(me)))
  			return -EFAULT;
 +		me.moved_len = 0;

  		donor_filp = fget(me.donor_fd);
  		if (!donor_filp)
  			return -EBADF;

 -		if (!capable(CAP_DAC_OVERRIDE)) {
 -			if ((current->real_cred->fsuid != inode->i_uid) ||
 -				!(inode->i_mode & S_IRUSR) ||
 -				!(donor_filp->f_dentry->d_inode->i_mode &
 -				S_IRUSR)) {
 -				fput(donor_filp);
 -				return -EACCES;
 -			}
 +		if (!(donor_filp->f_mode & FMODE_WRITE)) {
 +			err = -EBADF;
 +			goto mext_out;
  		}

 -		me.moved_len = 0;
 +		err = mnt_want_write(filp->f_path.mnt);
 +		if (err)
 +			goto mext_out;
 +
  		err = ext4_move_extents(filp, donor_filp, me.orig_start,
  					me.donor_start, me.len, &me.moved_len);
 -		fput(donor_filp);
 +		mnt_drop_write(filp->f_path.mnt);
 +		if (me.moved_len > 0)
 +			file_remove_suid(donor_filp);

  		if (copy_to_user((struct move_extent *)arg, &me, sizeof(me)))
 -			return -EFAULT;
 -
 +			err = -EFAULT;
 +mext_out:
 +		fput(donor_filp);
  		return err;
  	}

 --- a/fs/ext4/move_extent.c
 +++ b/fs/ext4/move_extent.c
 @@ -957,6 +957,13 @@ mext_check_arguments(struct inode *orig_
  		return -EINVAL;
  	}

 +	if (donor_inode->i_mode & (S_ISUID|S_ISGID)) {
 +		ext4_debug("ext4 move extent: suid or sgid is set"
 +			   " to donor file [ino:orig %lu, donor %lu]\n",
 +			   orig_inode->i_ino, donor_inode->i_ino);
 +		return -EINVAL;
 +	}
 +
  	/* Ext4 move extent does not support swapfile */
  	if (IS_SWAPFILE(orig_inode) || IS_SWAPFILE(donor_inode)) {
  		ext4_debug("ext4 move extent: The argument files should "


 From linux@linux.site Thu Dec 10 21:25:59 2009
 Message-Id: <20091211052558.863762484@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:46 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk,
  "Theodore Tso" <tytso@mit.edu>,
  Greg Kroah-Hartman <gregkh@suse.de>
 Subject: [34/34] ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)
 References: <20091211052312.805428372@linux.site>
 Content-Disposition: inline; filename=0030-ext4-Fix-potential-fiemap-deadlock-mmap_sem-vs.-i_da.patch
 Content-Length: 5029
 Lines: 115

 2.6.32-stable review patch.  If anyone has any objections, please let us know.

 ------------------

 (cherry picked from commit fab3a549e204172236779f502eccb4f9bf0dc87d)

 Fix the following potential circular locking dependency between
 mm->mmap_sem and ei->i_data_sem:

     =======================================================
     [ INFO: possible circular locking dependency detected ]
     2.6.32-04115-gec044c5 #37
     -------------------------------------------------------
     ureadahead/1855 is trying to acquire lock:
      (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac

     but task is already holding lock:
      (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159

     which lock already depends on the new lock.

     the existing dependency chain (in reverse order) is:

     -> #1 (&ei->i_data_sem){++++..}:
            [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f
            [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
            [<ffffffff81516633>] down_read+0x51/0x84
            [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5
            [<ffffffff811a3453>] ext4_get_block+0xab/0xef
            [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d
            [<ffffffff81155360>] mpage_readpages+0xd0/0x114
            [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f
            [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc
            [<ffffffff810f86f2>] ra_submit+0x21/0x25
            [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c
            [<ffffffff81107b97>] __do_fault+0x55/0x3a2
            [<ffffffff81109db0>] handle_mm_fault+0x327/0x734
            [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa
            [<ffffffff81518205>] page_fault+0x25/0x30
            [<ffffffff812a34d8>] clear_user+0x38/0x3c
            [<ffffffff81167e16>] padzero+0x20/0x31
            [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed
            [<ffffffff81130e95>] search_binary_handler+0xc2/0x259
            [<ffffffff81166d64>] load_script+0x1b8/0x1cc
            [<ffffffff81130e95>] search_binary_handler+0xc2/0x259
            [<ffffffff8113255f>] do_execve+0x1ce/0x2cf
            [<ffffffff81027494>] sys_execve+0x43/0x5a
            [<ffffffff8102918a>] stub_execve+0x6a/0xc0

     -> #0 (&mm->mmap_sem){++++++}:
            [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f
            [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
            [<ffffffff81107251>] might_fault+0x89/0xac
            [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda
            [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157
            [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1
            [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159
            [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6
            [<ffffffff811392ca>] sys_ioctl+0x56/0x79
            [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b

     other info that might help us debug this:

     1 lock held by ureadahead/1855:
      #0:  (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159

     stack backtrace:
     Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37
     Call Trace:
      [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7
      [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f
      [<ffffffff8102f229>] ? sched_clock+0x9/0xd
      [<ffffffff81099e7e>] lock_acquire+0xdc/0x102
      [<ffffffff81107224>] ? might_fault+0x5c/0xac
      [<ffffffff81107251>] might_fault+0x89/0xac
      [<ffffffff81107224>] ? might_fault+0x5c/0xac
      [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c
      [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda
      [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157
      [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157
      [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1
      [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159
      [<ffffffff81107224>] ? might_fault+0x5c/0xac
      [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6
      [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95
      [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b
      [<ffffffff811392ca>] sys_ioctl+0x56/0x79
      [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b
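
 In short: the page fault path takes mmap_sem and then i_data_sem, while
 fiemap held i_data_sem across a callback that copies to user space and
 can therefore fault and take mmap_sem.  The fix holds i_data_sem only
 around the extent lookup.  A toy model of that narrowing, with a plain
 flag standing in for the semaphore and all names invented for the
 sketch:

```c
#include <assert.h>

struct walker {
	int data_sem_held;    /* stands in for i_data_sem being read-held */
	int callbacks_run;
};

typedef void (*extent_cb)(struct walker *w);

/* Walk nr_extents extents, locking only around each lookup. */
static void walk_space(struct walker *w, int nr_extents, extent_cb cb)
{
	for (int i = 0; i < nr_extents; i++) {
		w->data_sem_held = 1;   /* down_read() before the lookup */
		/* ... the extent tree lookup would happen here ... */
		w->data_sem_held = 0;   /* up_read() right after it */
		cb(w);                  /* callback runs without the lock */
	}
}

/* A callback that may fault on user memory must not run under the lock. */
static void fill_user_cb(struct walker *w)
{
	assert(!w->data_sem_held);
	w->callbacks_run++;
}
```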

 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
 ---
  fs/ext4/extents.c |    4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -1761,7 +1761,9 @@ int ext4_ext_walk_space(struct inode *in
  	while (block < last && block != EXT_MAX_BLOCK) {
  		num = last - block;
  		/* find extent for this block */
 +		down_read(&EXT4_I(inode)->i_data_sem);
  		path = ext4_ext_find_extent(inode, block, path);
 +		up_read(&EXT4_I(inode)->i_data_sem);
  		if (IS_ERR(path)) {
  			err = PTR_ERR(path);
  			path = NULL;
 @@ -3730,10 +3732,8 @@ int ext4_fiemap(struct inode *inode, str
  		 * Walk the extent tree gathering extent information.
  		 * ext4_ext_fiemap_cb will push extents back to user.
  		 */
 -		down_read(&EXT4_I(inode)->i_data_sem);
  		error = ext4_ext_walk_space(inode, start_blk, len_blks,
  					  ext4_ext_fiemap_cb, fieinfo);
 -		up_read(&EXT4_I(inode)->i_data_sem);
  	}

  	return error;


 From linux@linux.site Thu Dec 10 21:25:40 2009
 Message-Id: <20091211052312.805428372@linux.site>
 User-Agent: quilt/0.47-14.9
 Date: Thu, 10 Dec 2009 21:23:12 -0800
 From: Greg KH <gregkh@suse.de>
 To: linux-kernel@vger.kernel.org,
  stable@kernel.org
 Cc: stable-review@kernel.org,
  torvalds@linux-foundation.org,
  akpm@linux-foundation.org,
  alan@lxorguk.ukuu.org.uk
 Subject: [00/34] 2.6.32.1-stable review
 Content-Length: 2372
 Lines: 51

 This is the start of the stable review cycle for the 2.6.32.1 release.
 There are 34 patches in this series; all will be posted as a response to
 this one.  If anyone has any issues with these being applied, please let
 us know.  If anyone is a maintainer of the proper subsystem, and wants
 to add a Signed-off-by: line to the patch, please respond with it.

 As was done with the 2.6.31.8-rc1 release, this is not all of the
 patches in the -stable queue, just a huge chunk of ext4 patches here,
 and a few scsi ones, which should all get out sooner rather than later.
 So note that there will be more 2.6.32-stable releases coming, this is
 just the first in the series.

 Responses should be made by Sunday, Dec 13 04:00:00 UTC 2009
 Anything received after that time might be too late.

 The whole patch series can be found in one patch at:
 	kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.32.1-rc1.gz
 and the diffstat can be found below.

 thanks,

 greg k-h

  Documentation/filesystems/ext4.txt   |   10 +-
  Makefile                             |    2 +-
  drivers/scsi/hosts.c                 |   13 ++-
  drivers/scsi/lpfc/lpfc_init.c        |    2 +-
  drivers/scsi/megaraid/megaraid_sas.c |    8 +-
  drivers/scsi/qla2xxx/qla_attr.c      |    3 +-
  drivers/scsi/scsi_lib_dma.c          |    4 +-
  fs/ext4/balloc.c                     |    8 +-
  fs/ext4/block_validity.c             |    2 +-
  fs/ext4/ext4.h                       |    8 +
  fs/ext4/ext4_jbd2.h                  |   21 +++-
  fs/ext4/extents.c                    |   22 ++-
  fs/ext4/fsync.c                      |   54 +++----
  fs/ext4/inode.c                      |   81 +++++++---
  fs/ext4/ioctl.c                      |   29 +++--
  fs/ext4/mballoc.c                    |   40 ++++-
  fs/ext4/migrate.c                    |    4 +-
  fs/ext4/move_extent.c                |  278 ++++++++++++++++------------------
  fs/ext4/namei.c                      |   38 ++---
  fs/ext4/resize.c                     |    2 +-
  fs/ext4/super.c                      |   40 ++++--
  fs/ext4/xattr.c                      |    7 +-
  fs/jbd2/commit.c                     |    4 +
  fs/jbd2/journal.c                    |    5 +
  include/linux/sched.h                |   13 ++-
  include/scsi/osd_protocol.h          |    1 +
  include/scsi/scsi_host.h             |   16 ++-
  27 files changed, 424 insertions(+), 291 deletions(-)