| From linux@linux.site Thu Dec 10 20:27:25 2009 |
| Message-Id: <20091211042724.642198428@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:39 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [01/90] ext4: Fix memory leak fix when mounting an ext4 filesystem |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0001-ext4-Fix-memory-leak-fix-when-mounting-an-ext4-files.patch |
| Content-Length: 2371 |
| Lines: 69 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 024eab4d5bf7e3168a2b71038b3e04e6b1f376ed) |
| |
| The allocation of the ext4_group_info array was moved to a new |
| function ext4_mb_add_group_info() in commit 5f21b0e6 so that online |
| resize would use a common (and correct) codepath. Unfortunately, the |
| call to the new ext4_mb_add_group_info() function was added without |
| removing the code which originally allocated the array. This caused a |
| memory leak each time an ext4 filesystem was mounted. |
| |
| The fix is simple; remove the code that did the original allocation, |
| since it is no longer needed. |
| |
| Reported-by: Catalin Marinas <catalin.marinas@arm.com> |
| Tested-by: Catalin Marinas <catalin.marinas@arm.com> |
| Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/mballoc.c | 19 ------------------- |
| 1 file changed, 19 deletions(-) |
| |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -2571,13 +2571,11 @@ static int ext4_mb_init_backend(struct s |
| { |
| ext4_group_t ngroups = ext4_get_groups_count(sb); |
| ext4_group_t i; |
| - int metalen; |
| struct ext4_sb_info *sbi = EXT4_SB(sb); |
| struct ext4_super_block *es = sbi->s_es; |
| int num_meta_group_infos; |
| int num_meta_group_infos_max; |
| int array_size; |
| - struct ext4_group_info **meta_group_info; |
| struct ext4_group_desc *desc; |
| |
| /* This is the number of blocks used by GDT */ |
| @@ -2622,22 +2620,6 @@ static int ext4_mb_init_backend(struct s |
| goto err_freesgi; |
| } |
| EXT4_I(sbi->s_buddy_cache)->i_disksize = 0; |
| - |
| - metalen = sizeof(*meta_group_info) << EXT4_DESC_PER_BLOCK_BITS(sb); |
| - for (i = 0; i < num_meta_group_infos; i++) { |
| - if ((i + 1) == num_meta_group_infos) |
| - metalen = sizeof(*meta_group_info) * |
| - (ngroups - |
| - (i << EXT4_DESC_PER_BLOCK_BITS(sb))); |
| - meta_group_info = kmalloc(metalen, GFP_KERNEL); |
| - if (meta_group_info == NULL) { |
| - printk(KERN_ERR "EXT4-fs: can't allocate mem for a " |
| - "buddy group\n"); |
| - goto err_freemeta; |
| - } |
| - sbi->s_group_info[i] = meta_group_info; |
| - } |
| - |
| for (i = 0; i < ngroups; i++) { |
| desc = ext4_get_group_desc(sb, i, NULL); |
| if (desc == NULL) { |
| @@ -2655,7 +2637,6 @@ err_freebuddy: |
| while (i-- > 0) |
| kfree(ext4_get_group_info(sb, i)); |
| i = num_meta_group_infos; |
| -err_freemeta: |
| while (i-- > 0) |
| kfree(sbi->s_group_info[i]); |
| iput(sbi->s_buddy_cache); |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:25 2009 |
| Message-Id: <20091211042725.199040442@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:40 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Eric Sesterhenn <eric.sesterhenn@lsexperts.de>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [02/90] ext4: Avoid null pointer dereference when decoding EROFS w/o a journal |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0002-ext4-Avoid-null-pointer-dereference-when-decoding-ER.patch |
| Content-Length: 817 |
| Lines: 25 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 78f1ddbb498283c2445c11b0dfa666424c301803) |
| |
| We need to check to make sure a journal is present before checking the |
| journal flags in ext4_decode_error(). |
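The guard described above can be sketched in user space as follows. This is only an illustration: `demo_journal`, `demo_sb_info`, `DEMO_ABORT` and `demo_decode_erofs()` are hypothetical stand-ins, not the kernel's real structures or flag values. The point is that `&&` short-circuits, so `j_flags` is never read when no journal is attached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value; stands in for JBD2_ABORT. */
#define DEMO_ABORT 0x002UL

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct demo_journal { unsigned long j_flags; };
struct demo_sb_info { struct demo_journal *s_journal; };

/*
 * Models the patched -EROFS branch: the journal pointer is tested
 * first, and j_flags is only dereferenced when that test succeeds.
 */
const char *demo_decode_erofs(const struct demo_sb_info *sbi)
{
	if (!sbi || (sbi->s_journal &&
		     (sbi->s_journal->j_flags & DEMO_ABORT)))
		return "Journal has aborted";
	return "Readonly filesystem";
}
```

With a NULL `s_journal` the function falls through to "Readonly filesystem" instead of dereferencing the pointer.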
| |
| Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/super.c | 3 ++- |
| 1 file changed, 2 insertions(+), 1 deletion(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -344,7 +344,8 @@ static const char *ext4_decode_error(str |
| errstr = "Out of memory"; |
| break; |
| case -EROFS: |
| - if (!sb || EXT4_SB(sb)->s_journal->j_flags & JBD2_ABORT) |
| + if (!sb || (EXT4_SB(sb)->s_journal && |
| + EXT4_SB(sb)->s_journal->j_flags & JBD2_ABORT)) |
| errstr = "Journal has aborted"; |
| else |
| errstr = "Readonly filesystem"; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:26 2009 |
| Message-Id: <20091211042725.802559277@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:41 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Jan Kara <jack@suse.cz>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [03/90] jbd2: Fail to load a journal if it is too short |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0003-jbd2-Fail-to-load-a-journal-if-it-is-too-short.patch |
| Content-Length: 861 |
| Lines: 28 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit f6f50e28f0cb8d7bcdfaacc83129f005dede11b1) |
| |
| Due to on-disk corruption, a journal can turn out to be too short. Fail |
| to load it in that case so that we don't oops somewhere later. |
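A minimal sketch of the length test, assuming a minimum of 1024 blocks as a stand-in for JBD2_MIN_JOURNAL_BLOCKS. The helper name and the 64-bit types are illustrative choices, not taken from the patch; the comparison itself mirrors the one added in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, standing in for JBD2_MIN_JOURNAL_BLOCKS. */
#define DEMO_MIN_JOURNAL_BLOCKS 1024u

/*
 * Returns 1 when blocks first..last (inclusive) hold at least the
 * minimum number of journal blocks, 0 otherwise. This is the inverse
 * of the "first + MIN > last + 1" rejection test in the patch.
 */
int demo_journal_long_enough(uint64_t first, uint64_t last)
{
	return first + DEMO_MIN_JOURNAL_BLOCKS <= last + 1;
}
```

A caller would refuse to load the journal and return an error whenever this helper returns 0.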
| |
| Signed-off-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/jbd2/journal.c | 6 ++++++ |
| 1 file changed, 6 insertions(+) |
| |
| --- a/fs/jbd2/journal.c |
| +++ b/fs/jbd2/journal.c |
| @@ -1187,6 +1187,12 @@ static int journal_reset(journal_t *jour |
| |
| first = be32_to_cpu(sb->s_first); |
| last = be32_to_cpu(sb->s_maxlen); |
| + if (first + JBD2_MIN_JOURNAL_BLOCKS > last + 1) { |
| + printk(KERN_ERR "JBD: Journal too short (blocks %llu-%llu).\n", |
| + first, last); |
| + journal_fail_superblock(journal); |
| + return -EINVAL; |
| + } |
| |
| journal->j_first = first; |
| journal->j_last = last; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:26 2009 |
| Message-Id: <20091211042726.354254054@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:42 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Alex Zhuravlev (Tomas)" <alex.zhuravlev@sun.com>, |
| Andreas Dilger <adilger@sun.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [04/90] jbd2: round commit timer up to avoid uncommitted transaction |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0004-jbd2-round-commit-timer-up-to-avoid-uncommitted-tran.patch |
| Content-Length: 1099 |
| Lines: 27 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit b1f485f20eb9b02cc7d2009556287f3939d480cc) |
| |
| Fix jiffy rounding in the jbd commit timer setup code. Rounding down |
| could cause the timer to be fired before the corresponding transaction |
| has expired. That transaction can stay uncommitted forever if no |
| new transaction is created or explicit sync/umount happens. |
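The effect of the rounding direction can be illustrated with a simplified model. The helpers below are hypothetical and round to a plain multiple of `hz`; the kernel's round_jiffies() and round_jiffies_up() are more involved, so treat this purely as a sketch of why rounding down can fire the timer too early.

```c
#include <assert.h>

/* Simplified model: round a tick count down to a multiple of hz. */
unsigned long demo_round_down(unsigned long j, unsigned long hz)
{
	assert(hz > 0);
	return j - (j % hz);
}

/* Simplified model: round a tick count up to a multiple of hz. */
unsigned long demo_round_up(unsigned long j, unsigned long hz)
{
	unsigned long rem;

	assert(hz > 0);
	rem = j % hz;
	return rem ? j + (hz - rem) : j;
}

/* The timer is useless if it fires before the transaction expires. */
int demo_fires_early(unsigned long timer_expires, unsigned long t_expires)
{
	return timer_expires < t_expires;
}
```

With `hz = 1000` and a transaction expiring at tick 1010, rounding down arms the timer for tick 1000, before the expiry, while rounding up arms it for tick 2000.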
| |
| Signed-off-by: Alex Zhuravlev (Tomas) <alex.zhuravlev@sun.com> |
| Signed-off-by: Andreas Dilger <adilger@sun.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/jbd2/transaction.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/jbd2/transaction.c |
| +++ b/fs/jbd2/transaction.c |
| @@ -57,7 +57,7 @@ jbd2_get_transaction(journal_t *journal, |
| INIT_LIST_HEAD(&transaction->t_private_list); |
| |
| /* Set up the commit timer for the new transaction. */ |
| - journal->j_commit_timer.expires = round_jiffies(transaction->t_expires); |
| + journal->j_commit_timer.expires = round_jiffies_up(transaction->t_expires); |
| add_timer(&journal->j_commit_timer); |
| |
| J_ASSERT(journal->j_running_transaction == NULL); |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:27 2009 |
| Message-Id: <20091211042726.960452135@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:43 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Peng Tao <bergwolf@gmail.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [05/90] ext4: fix journal ref count in move_extent_par_page |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0005-ext4-fix-journal-ref-count-in-move_extent_par_page.patch |
| Content-Length: 920 |
| Lines: 27 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 91cc219ad963731191247c5f2db4118be2bc341a) |
| |
| move_extent_par_page calls a_ops->write_begin() to increase the journal |
| handle's reference count. However, if either mext_replace_branches() |
| or ext4_get_block fails, the increased reference count isn't |
| decreased. This will cause a later attempt to umount the fs to hang |
| forever. The patch addresses the issue by calling ext4_journal_stop() |
| if page is not NULL (which means a_ops->write_end() isn't invoked). |
| |
| Signed-off-by: Peng Tao <bergwolf@gmail.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 1 + |
| 1 file changed, 1 insertion(+) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -871,6 +871,7 @@ out: |
| if (PageLocked(page)) |
| unlock_page(page); |
| page_cache_release(page); |
| + ext4_journal_stop(handle); |
| } |
| out2: |
| ext4_journal_stop(handle); |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:28 2009 |
| Message-Id: <20091211042727.486958503@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:44 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [06/90] ext4: Fix bugs in mballoc's stream allocation mode |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0006-ext4-Fix-bugs-in-mballoc-s-stream-allocation-mode.patch |
| Content-Length: 3374 |
| Lines: 99 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 4ba74d00a20256e22f159cb288ff34b587608917) |
| |
| The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all |
| screwed up. These fields were being set unconditionally, even when |
| stream allocation had not taken place, and they were being used when |
| the file was smaller than s_mb_stream_request, which is when the |
| allocator should _not_ be doing stream allocation. |
| |
| Fix this by determining once, in ext4_mb_group_or_file(), whether or |
| not stream allocation should take place, and setting a flag which |
| gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found(). |
| This simplifies the code and ensures that we are consistently using |
| (or not using) the stream allocation logic. |
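The "decide once, then test a flag" pattern can be sketched like this. The struct, flag values and function names are hypothetical and only model the idea; they are not the mballoc data structures.

```c
#include <assert.h>

/* Hypothetical flag bits, standing in for EXT4_MB_HINT_DATA and
 * EXT4_MB_STREAM_ALLOC. */
#define DEMO_HINT_DATA    0x1u
#define DEMO_STREAM_ALLOC 0x2u

/* Hypothetical, reduced allocation request. */
struct demo_request {
	unsigned int flags;
	unsigned long size_blocks;
};

/* Single decision point: mark large data allocations as streaming. */
void demo_group_or_file(struct demo_request *rq, unsigned long stream_request)
{
	if (!(rq->flags & DEMO_HINT_DATA))
		return;
	if (rq->size_blocks >= stream_request)
		rq->flags |= DEMO_STREAM_ALLOC;
}

/* Later stages only consult the flag and never redo the size test. */
int demo_uses_stream_goal(const struct demo_request *rq)
{
	return (rq->flags & DEMO_STREAM_ALLOC) != 0;
}
```

Because both consumers test the same flag, they cannot disagree about whether a given request is a stream allocation.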
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 2 ++ |
| fs/ext4/mballoc.c | 23 ++++++++++------------- |
| 2 files changed, 12 insertions(+), 13 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -88,6 +88,8 @@ typedef unsigned int ext4_group_t; |
| #define EXT4_MB_HINT_TRY_GOAL 512 |
| /* blocks already pre-reserved by delayed allocation */ |
| #define EXT4_MB_DELALLOC_RESERVED 1024 |
| +/* We are doing stream allocation */ |
| +#define EXT4_MB_STREAM_ALLOC 2048 |
| |
| |
| struct ext4_allocation_request { |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -1360,7 +1360,7 @@ static void ext4_mb_use_best_found(struc |
| ac->alloc_semp = e4b->alloc_semp; |
| e4b->alloc_semp = NULL; |
| /* store last allocated for subsequent stream allocation */ |
| - if ((ac->ac_flags & EXT4_MB_HINT_DATA)) { |
| + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) { |
| spin_lock(&sbi->s_md_lock); |
| sbi->s_mb_last_group = ac->ac_f_ex.fe_group; |
| sbi->s_mb_last_start = ac->ac_f_ex.fe_start; |
| @@ -1938,7 +1938,6 @@ ext4_mb_regular_allocator(struct ext4_al |
| struct ext4_sb_info *sbi; |
| struct super_block *sb; |
| struct ext4_buddy e4b; |
| - loff_t size, isize; |
| |
| sb = ac->ac_sb; |
| sbi = EXT4_SB(sb); |
| @@ -1974,20 +1973,16 @@ ext4_mb_regular_allocator(struct ext4_al |
| } |
| |
| bsbits = ac->ac_sb->s_blocksize_bits; |
| - /* if stream allocation is enabled, use global goal */ |
| - size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; |
| - isize = i_size_read(ac->ac_inode) >> bsbits; |
| - if (size < isize) |
| - size = isize; |
| |
| - if (size < sbi->s_mb_stream_request && |
| - (ac->ac_flags & EXT4_MB_HINT_DATA)) { |
| + /* if stream allocation is enabled, use global goal */ |
| + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) { |
| /* TBD: may be hot point */ |
| spin_lock(&sbi->s_md_lock); |
| ac->ac_g_ex.fe_group = sbi->s_mb_last_group; |
| ac->ac_g_ex.fe_start = sbi->s_mb_last_start; |
| spin_unlock(&sbi->s_md_lock); |
| } |
| + |
| /* Let's just scan groups to find more-less suitable blocks */ |
| cr = ac->ac_2order ? 0 : 1; |
| /* |
| @@ -4155,16 +4150,18 @@ static void ext4_mb_group_or_file(struct |
| if (!(ac->ac_flags & EXT4_MB_HINT_DATA)) |
| return; |
| |
| + if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY)) |
| + return; |
| + |
| size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; |
| isize = i_size_read(ac->ac_inode) >> bsbits; |
| size = max(size, isize); |
| |
| /* don't use group allocation for large files */ |
| - if (size >= sbi->s_mb_stream_request) |
| - return; |
| - |
| - if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY)) |
| + if (size >= sbi->s_mb_stream_request) { |
| + ac->ac_flags |= EXT4_MB_STREAM_ALLOC; |
| return; |
| + } |
| |
| BUG_ON(ac->ac_lg != NULL); |
| /* |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:28 2009 |
| Message-Id: <20091211042728.124212000@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:45 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [07/90] ext4: Avoid group preallocation for closed files |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0007-ext4-Avoid-group-preallocation-for-closed-files.patch |
| Content-Length: 3603 |
| Lines: 103 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 50797481a7bdee548589506d7d7b48b08bc14dcd) |
| |
| Currently the group preallocation code tries to find a large (512 |
| block) free chunk from which to do per-cpu group allocation for small |
| files. The problem with this scheme is that it leaves the filesystem |
| horribly fragmented. In the worst case, if the filesystem is unmounted |
| and remounted (after a system shutdown, for example) we forget the |
| fact that we were using a particular (now partially filled) 512 block |
| extent. So the next time we try to allocate space for a small file, |
| we will find *another* completely free 512 block chunk to allocate |
| small files. Given that there are 32,768 blocks in a block group, |
| after 64 iterations of "mount, write one 4k file in a directory, |
| unmount", the block group will have 64 files, each separated by 511 |
| blocks, and the block group will no longer have any completely free |
| 512 block chunks for group preallocation space. |
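The numbers in this paragraph follow from simple division. The sketch below reproduces them, assuming a 4k block size so that a 4k file occupies exactly one block; the helper names are made up for illustration.

```c
#include <assert.h>

/* How many preallocation chunks fit in one block group. */
unsigned int demo_chunks_per_group(unsigned int blocks_per_group,
				   unsigned int chunk_blocks)
{
	assert(chunk_blocks > 0);
	return blocks_per_group / chunk_blocks;
}

/* Free blocks left in a chunk after one small file lands in it. */
unsigned int demo_gap_blocks(unsigned int chunk_blocks,
			     unsigned int file_blocks)
{
	assert(file_blocks <= chunk_blocks);
	return chunk_blocks - file_blocks;
}
```

Here 32,768 / 512 gives the 64 iterations, and 512 - 1 gives the 511-block gap between files.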
| |
| So if we try to allocate blocks for a file that has been closed, such |
| that we know the final size of the file, and the filesystem is not |
| busy, avoid using group preallocation. |
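The "not busy" test relies on the saturating contention counter added in the diff. A user-space approximation using C11 atomics might look like the following. `demo_add_unless()` is a hand-written stand-in for the kernel's atomic_add_unless(), and the two constants copy the values of EXT4_MAX_CONTENTION and EXT4_CONTENTION_THRESHOLD from the patch; everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_MAX_CONTENTION       8
#define DEMO_CONTENTION_THRESHOLD 2

/* Add a to *v unless *v equals u; returns 1 if the add happened. */
int demo_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return 1;
	}
	return 0;
}

/* Record one lock attempt: decay on success, bump on contention. */
void demo_note_lock_attempt(atomic_int *busy, int got_lock_immediately)
{
	if (got_lock_immediately)
		demo_add_unless(busy, -1, 0);
	else
		demo_add_unless(busy, 1, DEMO_MAX_CONTENTION);
}

/* Busy means the counter has risen above the threshold. */
int demo_fs_is_busy(atomic_int *busy)
{
	return atomic_load(busy) > DEMO_CONTENTION_THRESHOLD;
}
```

The counter is clamped to the range 0..8, so a burst of contention is forgotten after a handful of uncontended lock acquisitions.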
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 30 +++++++++++++++++++++++++++++- |
| fs/ext4/mballoc.c | 10 +++++++++- |
| 2 files changed, 38 insertions(+), 2 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -952,6 +952,7 @@ struct ext4_sb_info { |
| atomic_t s_mb_lost_chunks; |
| atomic_t s_mb_preallocated; |
| atomic_t s_mb_discarded; |
| + atomic_t s_lock_busy; |
| |
| /* locality groups */ |
| struct ext4_locality_group *s_locality_groups; |
| @@ -1593,15 +1594,42 @@ struct ext4_group_info { |
| #define EXT4_MB_GRP_NEED_INIT(grp) \ |
| (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state))) |
| |
| +#define EXT4_MAX_CONTENTION 8 |
| +#define EXT4_CONTENTION_THRESHOLD 2 |
| + |
| static inline spinlock_t *ext4_group_lock_ptr(struct super_block *sb, |
| ext4_group_t group) |
| { |
| return bgl_lock_ptr(EXT4_SB(sb)->s_blockgroup_lock, group); |
| } |
| |
| +/* |
| + * Returns true if the filesystem is busy enough that attempts to |
| + * access the block group locks has run into contention. |
| + */ |
| +static inline int ext4_fs_is_busy(struct ext4_sb_info *sbi) |
| +{ |
| + return (atomic_read(&sbi->s_lock_busy) > EXT4_CONTENTION_THRESHOLD); |
| +} |
| + |
| static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group) |
| { |
| - spin_lock(ext4_group_lock_ptr(sb, group)); |
| + spinlock_t *lock = ext4_group_lock_ptr(sb, group); |
| + if (spin_trylock(lock)) |
| + /* |
| + * We're able to grab the lock right away, so drop the |
| + * lock contention counter. |
| + */ |
| + atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, -1, 0); |
| + else { |
| + /* |
| + * The lock is busy, so bump the contention counter, |
| + * and then wait on the spin lock. |
| + */ |
| + atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, 1, |
| + EXT4_MAX_CONTENTION); |
| + spin_lock(lock); |
| + } |
| } |
| |
| static inline void ext4_unlock_group(struct super_block *sb, |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -4154,9 +4154,17 @@ static void ext4_mb_group_or_file(struct |
| return; |
| |
| size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; |
| - isize = i_size_read(ac->ac_inode) >> bsbits; |
| + isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1) |
| + >> bsbits; |
| size = max(size, isize); |
| |
| + if ((size == isize) && |
| + !ext4_fs_is_busy(sbi) && |
| + (atomic_read(&ac->ac_inode->i_writecount) == 0)) { |
| + ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC; |
| + return; |
| + } |
| + |
| /* don't use group allocation for large files */ |
| if (size >= sbi->s_mb_stream_request) { |
| ac->ac_flags |= EXT4_MB_STREAM_ALLOC; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:29 2009 |
| Message-Id: <20091211042728.672704124@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:46 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Jan Kara <jack@suse.cz>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [08/90] jbd2: Annotate transaction start also for jbd2_journal_restart() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0008-jbd2-Annotate-transaction-start-also-for-jbd2_journa.patch |
| Content-Length: 1349 |
| Lines: 43 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 9599b0e597d810be9b8f759ea6e9619c4f983c5e) |
| |
| lockdep annotation for a transaction start has been at the end of |
| jbd2_journal_start(). But a transaction is also started from |
| jbd2_journal_restart(). Move the lockdep annotation to start_this_handle() |
| which covers both cases. |
| |
| Signed-off-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/jbd2/transaction.c | 5 +++-- |
| 1 file changed, 3 insertions(+), 2 deletions(-) |
| |
| --- a/fs/jbd2/transaction.c |
| +++ b/fs/jbd2/transaction.c |
| @@ -238,6 +238,8 @@ repeat_locked: |
| __jbd2_log_space_left(journal)); |
| spin_unlock(&transaction->t_handle_lock); |
| spin_unlock(&journal->j_state_lock); |
| + |
| + lock_map_acquire(&handle->h_lockdep_map); |
| out: |
| if (unlikely(new_transaction)) /* It's usually NULL */ |
| kfree(new_transaction); |
| @@ -303,8 +305,6 @@ handle_t *jbd2_journal_start(journal_t * |
| handle = ERR_PTR(err); |
| goto out; |
| } |
| - |
| - lock_map_acquire(&handle->h_lockdep_map); |
| out: |
| return handle; |
| } |
| @@ -426,6 +426,7 @@ int jbd2_journal_restart(handle_t *handl |
| __jbd2_log_start_commit(journal, transaction->t_tid); |
| spin_unlock(&journal->j_state_lock); |
| |
| + lock_map_release(&handle->h_lockdep_map); |
| handle->h_buffer_credits = nblocks; |
| ret = start_this_handle(journal, handle); |
| return ret; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:29 2009 |
| Message-Id: <20091211042729.262525249@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:47 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Jan Kara <jack@suse.cz>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [09/90] ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0009-ext4-Fix-possible-deadlock-between-ext4_truncate-and.patch |
| Content-Length: 4792 |
| Lines: 132 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| During truncate we are sometimes forced to start a new transaction as |
| the amount of blocks to be journaled is both quite large and hard to |
| predict. So far we restarted a transaction while holding i_data_sem |
| and that violates lock ordering because i_data_sem ranks below a |
| transaction start (and it can lead to a real deadlock with |
| ext4_get_blocks() mapping blocks in some page while having a |
| transaction open). |
| |
| (cherry picked from commit 487caeef9fc08c0565e082c40a8aaf58dad92bbb) |
| |
| We fix the problem by dropping the i_data_sem before restarting the |
| transaction and reacquiring it afterwards. It's slightly subtle that |
| this works: |
| |
| 1) By the time ext4_truncate() is called, all the page cache for the |
| truncated part of the file is dropped so get_block() should not be |
| called on it (we only have to invalidate extent cache after we |
| reacquire i_data_sem because some extent from not-truncated part could |
| extend also into the part we are going to truncate). |
| |
| 2) Writes, migrate or defrag hold i_mutex so they are stopped for all |
| the time of the truncate. |
| |
| This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>. |
| |
| Signed-off-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 1 + |
| fs/ext4/extents.c | 15 ++++++++++++--- |
| fs/ext4/inode.c | 23 +++++++++++++++++++---- |
| 3 files changed, 32 insertions(+), 7 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -1370,6 +1370,7 @@ extern int ext4_change_inode_journal_fla |
| extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *); |
| extern int ext4_can_truncate(struct inode *inode); |
| extern void ext4_truncate(struct inode *); |
| +extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int nblocks); |
| extern void ext4_set_inode_flags(struct inode *); |
| extern void ext4_get_inode_flags(struct ext4_inode_info *); |
| extern int ext4_alloc_da_blocks(struct inode *inode); |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -93,7 +93,9 @@ static void ext4_idx_store_pblock(struct |
| ix->ei_leaf_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) & 0xffff); |
| } |
| |
| -static int ext4_ext_journal_restart(handle_t *handle, int needed) |
| +static int ext4_ext_truncate_extend_restart(handle_t *handle, |
| + struct inode *inode, |
| + int needed) |
| { |
| int err; |
| |
| @@ -104,7 +106,14 @@ static int ext4_ext_journal_restart(hand |
| err = ext4_journal_extend(handle, needed); |
| if (err <= 0) |
| return err; |
| - return ext4_journal_restart(handle, needed); |
| + err = ext4_truncate_restart_trans(handle, inode, needed); |
| + /* |
| + * We have dropped i_data_sem so someone might have cached again |
| + * an extent we are going to truncate. |
| + */ |
| + ext4_ext_invalidate_cache(inode); |
| + |
| + return err; |
| } |
| |
| /* |
| @@ -2138,7 +2147,7 @@ ext4_ext_rm_leaf(handle_t *handle, struc |
| } |
| credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb); |
| |
| - err = ext4_ext_journal_restart(handle, credits); |
| + err = ext4_ext_truncate_extend_restart(handle, inode, credits); |
| if (err) |
| goto out; |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -192,11 +192,24 @@ static int try_to_extend_transaction(han |
| * so before we call here everything must be consistently dirtied against |
| * this transaction. |
| */ |
| -static int ext4_journal_test_restart(handle_t *handle, struct inode *inode) |
| + int ext4_truncate_restart_trans(handle_t *handle, struct inode *inode, |
| + int nblocks) |
| { |
| + int ret; |
| + |
| + /* |
| + * Drop i_data_sem to avoid deadlock with ext4_get_blocks At this |
| + * moment, get_block can be called only for blocks inside i_size since |
| + * page cache has been already dropped and writes are blocked by |
| + * i_mutex. So we can safely drop the i_data_sem here. |
| + */ |
| BUG_ON(EXT4_JOURNAL(inode) == NULL); |
| jbd_debug(2, "restarting handle %p\n", handle); |
| - return ext4_journal_restart(handle, blocks_for_truncate(inode)); |
| + up_write(&EXT4_I(inode)->i_data_sem); |
| + ret = ext4_journal_restart(handle, blocks_for_truncate(inode)); |
| + down_write(&EXT4_I(inode)->i_data_sem); |
| + |
| + return ret; |
| } |
| |
| /* |
| @@ -3659,7 +3672,8 @@ static void ext4_clear_blocks(handle_t * |
| ext4_handle_dirty_metadata(handle, inode, bh); |
| } |
| ext4_mark_inode_dirty(handle, inode); |
| - ext4_journal_test_restart(handle, inode); |
| + ext4_truncate_restart_trans(handle, inode, |
| + blocks_for_truncate(inode)); |
| if (bh) { |
| BUFFER_TRACE(bh, "retaking write access"); |
| ext4_journal_get_write_access(handle, bh); |
| @@ -3870,7 +3884,8 @@ static void ext4_free_branches(handle_t |
| return; |
| if (try_to_extend_transaction(handle, inode)) { |
| ext4_mark_inode_dirty(handle, inode); |
| - ext4_journal_test_restart(handle, inode); |
| + ext4_truncate_restart_trans(handle, inode, |
| + blocks_for_truncate(inode)); |
| } |
| |
| ext4_free_blocks(handle, inode, nr, 1, 1); |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:30 2009 |
| Message-Id: <20091211042730.000017969@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:48 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Eric Sandeen <sandeen@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [10/90] ext4: reject too-large filesystems on 32-bit kernels |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0010-ext4-reject-too-large-filesystems-on-32-bit-kernels.patch |
| Content-Length: 1508 |
| Lines: 45 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit bf43d84b185e2ff54598f8c58a5a8e63148b6e90) |
| |
| ext4 will happily mount a > 16T filesystem on a 32-bit box, but |
| this is not safe; writes to the block device will wrap past 16T |
| and the page cache can't index past 16T (2^32 index * 4k pages). |
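The 16T figure is just the product of the index range and the page size. A small sketch, with a hypothetical helper name and assuming a 32-bit page index and 4k (2^12 byte) pages:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest byte range addressable by a page cache whose index has
 * index_bits bits and whose pages are (1 << page_shift) bytes.
 */
uint64_t demo_max_page_cache_bytes(unsigned int index_bits,
				   unsigned int page_shift)
{
	assert(index_bits + page_shift < 64);
	return (uint64_t)1 << (index_bits + page_shift);
}
```

For 32 index bits and a page shift of 12 this yields 2^44 bytes, which is 16 TiB.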
| |
| Adding another test to the existing "too many sectors" test |
| should do the trick. |
| |
| Add a comment, a relevant return value, and fix the reference |
| to the CONFIG_LBD(AF) option as well. |
| |
| Signed-off-by: Eric Sandeen <sandeen@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/super.c | 13 ++++++++++--- |
| 1 file changed, 10 insertions(+), 3 deletions(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -2550,12 +2550,19 @@ static int ext4_fill_super(struct super_ |
| goto failed_mount; |
| } |
| |
| - if (ext4_blocks_count(es) > |
| - (sector_t)(~0ULL) >> (sb->s_blocksize_bits - 9)) { |
| + /* |
| + * Test whether we have more sectors than will fit in sector_t, |
| + * and whether the max offset is addressable by the page cache. |
| + */ |
| + if ((ext4_blocks_count(es) > |
| + (sector_t)(~0ULL) >> (sb->s_blocksize_bits - 9)) || |
| + (ext4_blocks_count(es) > |
| + (pgoff_t)(~0ULL) >> (PAGE_CACHE_SHIFT - sb->s_blocksize_bits))) { |
| ext4_msg(sb, KERN_ERR, "filesystem" |
| - " too large to mount safely"); |
| + " too large to mount safely on this system"); |
| if (sizeof(sector_t) < 8) |
| ext4_msg(sb, KERN_WARNING, "CONFIG_LBDAF not enabled"); |
| + ret = -EFBIG; |
| goto failed_mount; |
| } |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:31 2009 |
| Message-Id: <20091211042730.719283784@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:49 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Eric Sandeen <sandeen@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu> |
| Subject: [11/90] ext4: Add feature set check helper for mount & remount paths |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0011-ext4-Add-feature-set-check-helper-for-mount-remount-.patch |
| Content-Length: 5384 |
| Lines: 157 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit a13fb1a4533f26c1e2b0204d5283b696689645af) |
| |
| A user reported that although his root ext4 filesystem was mounting |
| fine, other filesystems would not mount, with the: |
| |
| "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF" |
| |
| error on his 32-bit box built without CONFIG_LBDAF. This is because |
| the test at mount time for this situation was not being re-checked |
| on remount, and the normal boot process makes an ro->rw transition, |
| so this was being missed. |
| |
| Refactor to make a common helper function to test the filesystem |
| features against the type of mount request (RO vs. RW) so that we |
| stay consistent. |
| |
| Addresses Red-Hat-Bugzilla: #517650 |
| |
| Signed-off-by: Eric Sandeen <sandeen@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| --- |
| fs/ext4/super.c | 91 ++++++++++++++++++++++++++++++-------------------------- |
| 1 file changed, 49 insertions(+), 42 deletions(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -2254,6 +2254,49 @@ static struct kobj_type ext4_ktype = { |
| .release = ext4_sb_release, |
| }; |
| |
| +/* |
| + * Check whether this filesystem can be mounted based on |
| + * the features present and the RDONLY/RDWR mount requested. |
| + * Returns 1 if this filesystem can be mounted as requested, |
| + * 0 if it cannot be. |
| + */ |
| +static int ext4_feature_set_ok(struct super_block *sb, int readonly) |
| +{ |
| + if (EXT4_HAS_INCOMPAT_FEATURE(sb, ~EXT4_FEATURE_INCOMPAT_SUPP)) { |
| + ext4_msg(sb, KERN_ERR, |
| + "Couldn't mount because of " |
| + "unsupported optional features (%x)", |
| + (le32_to_cpu(EXT4_SB(sb)->s_es->s_feature_incompat) & |
| + ~EXT4_FEATURE_INCOMPAT_SUPP)); |
| + return 0; |
| + } |
| + |
| + if (readonly) |
| + return 1; |
| + |
| + /* Check that feature set is OK for a read-write mount */ |
| + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, ~EXT4_FEATURE_RO_COMPAT_SUPP)) { |
| + ext4_msg(sb, KERN_ERR, "couldn't mount RDWR because of " |
| + "unsupported optional features (%x)", |
| + (le32_to_cpu(EXT4_SB(sb)->s_es->s_feature_ro_compat) & |
| + ~EXT4_FEATURE_RO_COMPAT_SUPP)); |
| + return 0; |
| + } |
| + /* |
| + * Large file size enabled file system can only be mounted |
| + * read-write on 32-bit systems if kernel is built with CONFIG_LBDAF |
| + */ |
| + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_HUGE_FILE)) { |
| + if (sizeof(blkcnt_t) < sizeof(u64)) { |
| + ext4_msg(sb, KERN_ERR, "Filesystem with huge files " |
| + "cannot be mounted RDWR without " |
| + "CONFIG_LBDAF"); |
| + return 0; |
| + } |
| + } |
| + return 1; |
| +} |
| + |
| static int ext4_fill_super(struct super_block *sb, void *data, int silent) |
| __releases(kernel_lock) |
| __acquires(kernel_lock) |
| @@ -2275,7 +2318,6 @@ static int ext4_fill_super(struct super_ |
| unsigned int db_count; |
| unsigned int i; |
| int needs_recovery, has_huge_files; |
| - int features; |
| __u64 blocks_count; |
| int err; |
| unsigned int journal_ioprio = DEFAULT_JOURNAL_IOPRIO; |
| @@ -2402,39 +2444,9 @@ static int ext4_fill_super(struct super_ |
| * previously didn't change the revision level when setting the flags, |
| * so there is a chance incompat flags are set on a rev 0 filesystem. |
| */ |
| - features = EXT4_HAS_INCOMPAT_FEATURE(sb, ~EXT4_FEATURE_INCOMPAT_SUPP); |
| - if (features) { |
| - ext4_msg(sb, KERN_ERR, |
| - "Couldn't mount because of " |
| - "unsupported optional features (%x)", |
| - (le32_to_cpu(EXT4_SB(sb)->s_es->s_feature_incompat) & |
| - ~EXT4_FEATURE_INCOMPAT_SUPP)); |
| + if (!ext4_feature_set_ok(sb, (sb->s_flags & MS_RDONLY))) |
| goto failed_mount; |
| - } |
| - features = EXT4_HAS_RO_COMPAT_FEATURE(sb, ~EXT4_FEATURE_RO_COMPAT_SUPP); |
| - if (!(sb->s_flags & MS_RDONLY) && features) { |
| - ext4_msg(sb, KERN_ERR, |
| - "Couldn't mount RDWR because of " |
| - "unsupported optional features (%x)", |
| - (le32_to_cpu(EXT4_SB(sb)->s_es->s_feature_ro_compat) & |
| - ~EXT4_FEATURE_RO_COMPAT_SUPP)); |
| - goto failed_mount; |
| - } |
| - has_huge_files = EXT4_HAS_RO_COMPAT_FEATURE(sb, |
| - EXT4_FEATURE_RO_COMPAT_HUGE_FILE); |
| - if (has_huge_files) { |
| - /* |
| - * Large file size enabled file system can only be |
| - * mount if kernel is build with CONFIG_LBDAF |
| - */ |
| - if (sizeof(root->i_blocks) < sizeof(u64) && |
| - !(sb->s_flags & MS_RDONLY)) { |
| - ext4_msg(sb, KERN_ERR, "Filesystem with huge " |
| - "files cannot be mounted read-write " |
| - "without CONFIG_LBDAF"); |
| - goto failed_mount; |
| - } |
| - } |
| + |
| blocksize = BLOCK_SIZE << le32_to_cpu(es->s_log_block_size); |
| |
| if (blocksize < EXT4_MIN_BLOCK_SIZE || |
| @@ -2470,6 +2482,8 @@ static int ext4_fill_super(struct super_ |
| } |
| } |
| |
| + has_huge_files = EXT4_HAS_RO_COMPAT_FEATURE(sb, |
| + EXT4_FEATURE_RO_COMPAT_HUGE_FILE); |
| sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(sb->s_blocksize_bits, |
| has_huge_files); |
| sb->s_maxbytes = ext4_max_size(sb->s_blocksize_bits, has_huge_files); |
| @@ -3485,18 +3499,11 @@ static int ext4_remount(struct super_blo |
| if (sbi->s_journal) |
| ext4_mark_recovery_complete(sb, es); |
| } else { |
| - int ret; |
| - if ((ret = EXT4_HAS_RO_COMPAT_FEATURE(sb, |
| - ~EXT4_FEATURE_RO_COMPAT_SUPP))) { |
| - ext4_msg(sb, KERN_WARNING, "couldn't " |
| - "remount RDWR because of unsupported " |
| - "optional features (%x)", |
| - (le32_to_cpu(sbi->s_es->s_feature_ro_compat) & |
| - ~EXT4_FEATURE_RO_COMPAT_SUPP)); |
| + /* Make sure we can mount this feature set readwrite */ |
| + if (!ext4_feature_set_ok(sb, 0)) { |
| err = -EROFS; |
| goto restore_opts; |
| } |
| - |
| /* |
| * Make sure the group descriptor checksums |
| * are sane. If they aren't, refuse to remount r/w. |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:32 2009 |
| Message-Id: <20091211042731.668869144@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:50 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu> |
| Subject: [12/90] ext4: Add missing unlock_new_inode() call in extent migration code |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0012-ext4-Add-missing-unlock_new_inode-call-in-extent-mig.patch |
| Content-Length: 1785 |
| Lines: 46 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit a8526e84ac758ac6da45cf273aa1538a6a7aa3de) |
| |
| We need to unlock the new inode before iput. This patch fixes the |
| following warning when calling chattr +e to migrate a file to use |
| extents. It also fixes problems when e4defrag attempts to |
| defragment an inode. |
| |
| [ 470.400044] ------------[ cut here ]------------ |
| [ 470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a() |
| [ 470.400072] Hardware name: N/A |
| ..... |
| ... |
| [ 470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4 |
| [ 470.400359] Call Trace: |
| [ 470.400372] [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f |
| [ 470.400385] [<ffffffff81037798>] warn_slowpath_null+0xf/0x11 |
| [ 470.400395] [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a |
| [ 470.400405] [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd |
| [ 470.400413] [<ffffffff810b7083>] iput+0x61/0x65 |
| [ 470.400455] [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4] |
| [ 470.400492] [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4] |
| [ 470.400507] [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82 |
| [ 470.400517] [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9 |
| [ 470.400527] [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf |
| [ 470.400537] [<ffffffff810b2087>] sys_ioctl+0x51/0x74 |
| [ 470.400549] [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b |
| [ 470.400557] ---[ end trace ab85723542352dac ]--- |
| |
| Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| --- |
| fs/ext4/migrate.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/migrate.c |
| +++ b/fs/ext4/migrate.c |
| @@ -618,7 +618,7 @@ err_out: |
| tmp_inode->i_nlink = 0; |
| |
| ext4_journal_stop(handle); |
| - |
| + unlock_new_inode(tmp_inode); |
| iput(tmp_inode); |
| |
| return retval; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:33 2009 |
| Message-Id: <20091211042732.645012343@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:51 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu> |
| Subject: [13/90] ext4: Allow rename to create more than EXT4_LINK_MAX subdirectories |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0013-ext4-Allow-rename-to-create-more-than-EXT4_LINK_MAX-.patch |
| Content-Length: 705 |
| Lines: 23 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 2c94eb86c66e1eaaa1e7d8a2120f4fad5e7e7736) |
| |
| Use EXT4_DIR_LINK_MAX so that rename() can move a directory into a |
| new parent directory without running into the EXT4_LINK_MAX limit. |
| |
| Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| --- |
| fs/ext4/namei.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/namei.c |
| +++ b/fs/ext4/namei.c |
| @@ -2413,7 +2413,7 @@ static int ext4_rename(struct inode *old |
| goto end_rename; |
| retval = -EMLINK; |
| if (!new_inode && new_dir != old_dir && |
| - new_dir->i_nlink >= EXT4_LINK_MAX) |
| + EXT4_DIR_LINK_MAX(new_dir)) |
| goto end_rename; |
| } |
| if (!new_bh) { |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:34 2009 |
| Message-Id: <20091211042733.492375175@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:52 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu> |
| Subject: [14/90] ext4: Limit number of links that can be created by ext4_link() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0014-ext4-Limit-number-of-links-that-can-be-created-by-ex.patch |
| Content-Length: 633 |
| Lines: 23 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit b05ab1dc3795e6f997fb0d34f38fce5012533c3e) |
| |
| In ext4_link we need to check using EXT4_LINK_MAX, and not |
| EXT4_DIR_LINK_MAX(), since ext4_link() is creating hard links of |
| regular files, and not directories. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| --- |
| fs/ext4/namei.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/namei.c |
| +++ b/fs/ext4/namei.c |
| @@ -2310,7 +2310,7 @@ static int ext4_link(struct dentry *old_ |
| struct inode *inode = old_dentry->d_inode; |
| int err, retries = 0; |
| |
| - if (EXT4_DIR_LINK_MAX(inode)) |
| + if (inode->i_nlink >= EXT4_LINK_MAX) |
| return -EMLINK; |
| |
| /* |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:35 2009 |
| Message-Id: <20091211042734.391001793@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:53 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu> |
| Subject: [15/90] ext4: Restore wbc->range_start in ext4_da_writepages() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0015-ext4-Restore-wbc-range_start-in-ext4_da_writepages.patch |
| Content-Length: 1228 |
| Lines: 35 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit de89de6e0cf4b1eb13f27137cf2aa40d287aabdf) |
| |
| To solve a lock inversion problem, we implement part of the |
| range_cyclic algorithm in ext4_da_writepages(). (See commit 2acf2c26 |
| for more details.) |
| |
| As part of that change wbc->range_start was modified by ext4's |
| writepages function, which causes its callers to get confused since |
| they aren't expecting the filesystem to modify it. The simplest fix |
| is to save and restore wbc->range_start in ext4_da_writepages. |
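| |
| The save-and-restore idea can be sketched in isolation as follows. |
| This is a simplified illustration under stated assumptions, not the |
| kernel code: struct wbc_sketch, writepages_sketch() and |
| last_scanned_start are hypothetical stand-ins for the writeback |
| control structure and the writepages function. |

```c
#include <assert.h>

/* Hypothetical stand-in for the caller-owned writeback control structure. */
struct wbc_sketch {
	long long range_start;
	long long range_end;
};

/* Records the start offset the function used internally (for inspection). */
static long long last_scanned_start;

static int writepages_sketch(struct wbc_sketch *wbc)
{
	/* Remember the caller's value before touching it. */
	long long saved_start = wbc->range_start;

	/* Internal cycling temporarily rewrites the range start. */
	wbc->range_start = wbc->range_end / 2;
	last_scanned_start = wbc->range_start;

	/* Put the caller's value back so the change is not visible outside. */
	wbc->range_start = saved_start;
	return 0;
}
```

| The caller's range_start is unchanged on return, even though the |
| function used a different start offset internally. |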
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| --- |
| fs/ext4/inode.c | 2 ++ |
| 1 file changed, 2 insertions(+) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -2750,6 +2750,7 @@ static int ext4_da_writepages(struct add |
| long pages_skipped; |
| int range_cyclic, cycled = 1, io_done = 0; |
| int needed_blocks, ret = 0, nr_to_writebump = 0; |
| + loff_t range_start = wbc->range_start; |
| struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb); |
| |
| trace_ext4_da_writepages(inode, wbc); |
| @@ -2918,6 +2919,7 @@ out_writepages: |
| if (!no_nrwrite_index_update) |
| wbc->no_nrwrite_index_update = 0; |
| wbc->nr_to_write -= nr_to_writebump; |
| + wbc->range_start = range_start; |
| trace_ext4_da_writepages_result(inode, wbc, ret, pages_written); |
| return ret; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:36 2009 |
| Message-Id: <20091211042735.238833324@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:54 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Christoph Hellwig <hch@lst.de>, |
| Eric Sandeen <sandeen@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu> |
| Subject: [16/90] ext4: fix cache flush in ext4_sync_file |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0016-ext4-fix-cache-flush-in-ext4_sync_file.patch |
| Content-Length: 1069 |
| Lines: 31 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 5f3481e9a80c240f169b36ea886e2325b9aeb745) |
| |
| We need to flush the write cache unconditionally in ->fsync, otherwise |
| writes into already allocated blocks can get lost. Writes into fully |
| allocated files are very common when using disk images for |
| virtualization, and without this fix can easily lose data after |
| an fdatasync, which is the typical implementation for a cache flush on |
| the virtual drive. |
| |
| Signed-off-by: Christoph Hellwig <hch@lst.de> |
| Acked-by: Eric Sandeen <sandeen@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| --- |
| fs/ext4/fsync.c | 4 ++-- |
| 1 file changed, 2 insertions(+), 2 deletions(-) |
| |
| --- a/fs/ext4/fsync.c |
| +++ b/fs/ext4/fsync.c |
| @@ -92,9 +92,9 @@ int ext4_sync_file(struct file *file, st |
| .nr_to_write = 0, /* sys_fsync did this */ |
| }; |
| ret = sync_inode(inode, &wbc); |
| - if (journal && (journal->j_flags & JBD2_BARRIER)) |
| - blkdev_issue_flush(inode->i_sb->s_bdev, NULL); |
| } |
| out: |
| + if (journal && (journal->j_flags & JBD2_BARRIER)) |
| + blkdev_issue_flush(inode->i_sb->s_bdev, NULL); |
| return ret; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:36 2009 |
| Message-Id: <20091211042736.230715168@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:55 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [17/90] ext4: Fix wrong comparisons in mext_check_arguments() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0017-ext4-Fix-wrong-comparisons-in-mext_check_arguments.patch |
| Content-Length: 3782 |
| Lines: 97 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 70d5d3dcea47c16058d2b093c29e07fdf61b56ad) |
| |
| The mext_check_arguments() function in move_extent.c has wrong |
| comparisons. orig_start, which is passed from user space, is in |
| block units, but the inode's i_size is in bytes, so the checks do |
| not work correctly. This faulty check lets 'len' overflow, which |
| then hits BUG_ON() in ext4_move_extents(). The patch fixes this issue. |
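| |
| The unit conversion behind the fix can be sketched on its own. The |
| helper names below are hypothetical and the code is a user-space |
| illustration, not the kernel function. It rounds a byte size up to |
| whole blocks before comparing against a block-unit start offset. |
| Unlike the patch, which tests orig_start + *len, this sketch |
| compares *len against the remaining blocks so the sum cannot wrap. |

```c
#include <assert.h>
#include <stdint.h>

/* Round a file size in bytes up to a count of 2^blkbits-byte blocks. */
static uint64_t size_to_blocks(uint64_t size_bytes, unsigned blkbits)
{
	uint64_t blocksize = (uint64_t)1 << blkbits;

	assert(blkbits < 32);
	return (size_bytes + blocksize - 1) >> blkbits;
}

/*
 * Check a block-unit range against a byte-unit file size.
 * Returns -1 if start lies at or beyond the end of the file,
 * otherwise 0 after trimming *len so the range ends within the file.
 */
static int clamp_block_range(uint64_t size_bytes, unsigned blkbits,
			     uint64_t start, uint64_t *len)
{
	uint64_t blocks = size_to_blocks(size_bytes, blkbits);

	if (start >= blocks)
		return -1;
	if (*len > blocks - start)	/* overflow-safe form of start + len > blocks */
		*len = blocks - start;
	return 0;
}
```

| For a 10000-byte file with 4k blocks the file spans 3 blocks, so a |
| request starting at block 1 with length 5 is trimmed to length 2. |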
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Reviewed-by: Greg Freemyer <greg.freemyer@gmail.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 46 +++++++++++++++++++++++++++------------------- |
| 1 file changed, 27 insertions(+), 19 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -898,6 +898,10 @@ mext_check_arguments(struct inode *orig_ |
| struct inode *donor_inode, __u64 orig_start, |
| __u64 donor_start, __u64 *len, __u64 moved_len) |
| { |
| + ext4_lblk_t orig_blocks, donor_blocks; |
| + unsigned int blkbits = orig_inode->i_blkbits; |
| + unsigned int blocksize = 1 << blkbits; |
| + |
| /* Regular file check */ |
| if (!S_ISREG(orig_inode->i_mode) || !S_ISREG(donor_inode->i_mode)) { |
| ext4_debug("ext4 move extent: The argument files should be " |
| @@ -972,43 +976,47 @@ mext_check_arguments(struct inode *orig_ |
| } |
| |
| if (orig_inode->i_size > donor_inode->i_size) { |
| - if (orig_start >= donor_inode->i_size) { |
| + donor_blocks = (donor_inode->i_size + blocksize - 1) >> blkbits; |
| + /* TODO: eliminate this artificial restriction */ |
| + if (orig_start >= donor_blocks) { |
| ext4_debug("ext4 move extent: orig start offset " |
| - "[%llu] should be less than donor file size " |
| - "[%lld] [ino:orig %lu, donor_inode %lu]\n", |
| - orig_start, donor_inode->i_size, |
| + "[%llu] should be less than donor file blocks " |
| + "[%u] [ino:orig %lu, donor %lu]\n", |
| + orig_start, donor_blocks, |
| orig_inode->i_ino, donor_inode->i_ino); |
| return -EINVAL; |
| } |
| |
| - if (orig_start + *len > donor_inode->i_size) { |
| + /* TODO: eliminate this artificial restriction */ |
| + if (orig_start + *len > donor_blocks) { |
| ext4_debug("ext4 move extent: End offset [%llu] should " |
| - "be less than donor file size [%lld]." |
| - "So adjust length from %llu to %lld " |
| + "be less than donor file blocks [%u]." |
| + "So adjust length from %llu to %llu " |
| "[ino:orig %lu, donor %lu]\n", |
| - orig_start + *len, donor_inode->i_size, |
| - *len, donor_inode->i_size - orig_start, |
| + orig_start + *len, donor_blocks, |
| + *len, donor_blocks - orig_start, |
| orig_inode->i_ino, donor_inode->i_ino); |
| - *len = donor_inode->i_size - orig_start; |
| + *len = donor_blocks - orig_start; |
| } |
| } else { |
| - if (orig_start >= orig_inode->i_size) { |
| + orig_blocks = (orig_inode->i_size + blocksize - 1) >> blkbits; |
| + if (orig_start >= orig_blocks) { |
| ext4_debug("ext4 move extent: start offset [%llu] " |
| - "should be less than original file size " |
| - "[%lld] [inode:orig %lu, donor %lu]\n", |
| - orig_start, orig_inode->i_size, |
| + "should be less than original file blocks " |
| + "[%u] [ino:orig %lu, donor %lu]\n", |
| + orig_start, orig_blocks, |
| orig_inode->i_ino, donor_inode->i_ino); |
| return -EINVAL; |
| } |
| |
| - if (orig_start + *len > orig_inode->i_size) { |
| + if (orig_start + *len > orig_blocks) { |
| ext4_debug("ext4 move extent: Adjust length " |
| - "from %llu to %lld. Because it should be " |
| - "less than original file size " |
| + "from %llu to %llu. Because it should be " |
| + "less than original file blocks " |
| "[ino:orig %lu, donor %lu]\n", |
| - *len, orig_inode->i_size - orig_start, |
| + *len, orig_blocks - orig_start, |
| orig_inode->i_ino, donor_inode->i_ino); |
| - *len = orig_inode->i_size - orig_start; |
| + *len = orig_blocks - orig_start; |
| } |
| } |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:37 2009 |
| Message-Id: <20091211042737.028935695@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:56 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [18/90] ext4: Remove unneeded BUG_ON() in ext4_move_extents() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0018-ext4-Remove-unneeded-BUG_ON-in-ext4_move_extents.patch |
| Content-Length: 923 |
| Lines: 28 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit daea696dbac0e33af3cfe304efbfb8d74e0effe6) |
| |
| The ext4_move_extents() function checks with BUG_ON() whether the |
| exchanged block count matches the requested block count. But if the |
| target range (orig_start + len) includes sparse block(s), 'moved_len' |
| (exchanged block count) does not agree with 'len' (requested block |
| count), since sparse blocks are not counted in 'moved_len'. This causes |
| us to hit the BUG_ON(), even though the function succeeded. |
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 3 --- |
| 1 file changed, 3 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -1322,8 +1322,5 @@ out2: |
| if (ret) |
| return ret; |
| |
| - /* All of the specified blocks must be exchanged in succeed */ |
| - BUG_ON(*moved_len != len); |
| - |
| return 0; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:38 2009 |
| Message-Id: <20091211042737.946304901@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:57 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [19/90] ext4: Return exchanged blocks count to user space in failure |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0019-ext4-Return-exchanged-blocks-count-to-user-space-in-.patch |
| Content-Length: 771 |
| Lines: 29 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 8d6669133d8cdbb7cbe0e1f0f3744e7802a84afe) |
| |
| Return the exchanged block count (moved_len) to user space even |
| if ext4_move_extents() failed partway through. |
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ioctl.c | 7 +++---- |
| 1 file changed, 3 insertions(+), 4 deletions(-) |
| |
| --- a/fs/ext4/ioctl.c |
| +++ b/fs/ext4/ioctl.c |
| @@ -243,10 +243,9 @@ setversion_out: |
| me.donor_start, me.len, &me.moved_len); |
| fput(donor_filp); |
| |
| - if (!err) |
| - if (copy_to_user((struct move_extent *)arg, |
| - &me, sizeof(me))) |
| - return -EFAULT; |
| + if (copy_to_user((struct move_extent *)arg, &me, sizeof(me))) |
| + return -EFAULT; |
| + |
| return err; |
| } |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:39 2009 |
| Message-Id: <20091211042738.873362487@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:58 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [20/90] ext4: Take page lock before looking at attached buffer_heads flags |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0020-ext4-Take-page-lock-before-looking-at-attached-buffe.patch |
| Content-Length: 1236 |
| Lines: 39 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit a827eaffff07c7d58a4cb32158cbeb4849f4e33a) |
| |
| In order to check whether the buffer_heads are mapped we need to hold |
| the page lock. Otherwise reclaim can clean up the attached buffer_heads. |
| |
| Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 13 +++++++++++-- |
| 1 file changed, 11 insertions(+), 2 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -5298,12 +5298,21 @@ int ext4_page_mkwrite(struct vm_area_str |
| else |
| len = PAGE_CACHE_SIZE; |
| |
| + lock_page(page); |
| + /* |
| + * return if we have all the buffers mapped. This avoid |
| + * the need to call write_begin/write_end which does a |
| + * journal_start/journal_stop which can block and take |
| + * long time |
| + */ |
| if (page_has_buffers(page)) { |
| - /* return if we have all the buffers mapped */ |
| if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL, |
| - ext4_bh_unmapped)) |
| + ext4_bh_unmapped)) { |
| + unlock_page(page); |
| goto out_unlock; |
| + } |
| } |
| + unlock_page(page); |
| /* |
| * OK, we need to fill the hole... Do write_begin write_end |
| * to do block allocation/reservation.We are not holding |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:40 2009 |
| Message-Id: <20091211042739.750528035@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:59 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [21/90] ext4: print more sysadmin-friendly message in check_block_validity() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0021-ext4-print-more-sysadmin-friendly-message-in-check_b.patch |
| Content-Length: 2085 |
| Lines: 60 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 80e42468d65475e92651e62175bb7807773321d0) |
| |
| Drop the WARN_ON(1), as the stack trace is not appropriate, since it is |
| triggered by file system corruption, and it misleads users into |
| thinking there is a kernel bug. In addition, change the message |
| displayed by ext4_error() to make it clear that this is a file system |
| corruption problem. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 16 ++++++++-------- |
| 1 file changed, 8 insertions(+), 8 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1122,16 +1122,15 @@ static void ext4_da_update_reserve_space |
| ext4_discard_preallocations(inode); |
| } |
| |
| -static int check_block_validity(struct inode *inode, sector_t logical, |
| - sector_t phys, int len) |
| +static int check_block_validity(struct inode *inode, const char *msg, |
| + sector_t logical, sector_t phys, int len) |
| { |
| if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), phys, len)) { |
| - ext4_error(inode->i_sb, "check_block_validity", |
| + ext4_error(inode->i_sb, msg, |
| "inode #%lu logical block %llu mapped to %llu " |
| "(size %d)", inode->i_ino, |
| (unsigned long long) logical, |
| (unsigned long long) phys, len); |
| - WARN_ON(1); |
| return -EIO; |
| } |
| return 0; |
| @@ -1183,8 +1182,8 @@ int ext4_get_blocks(handle_t *handle, st |
| up_read((&EXT4_I(inode)->i_data_sem)); |
| |
| if (retval > 0 && buffer_mapped(bh)) { |
| - int ret = check_block_validity(inode, block, |
| - bh->b_blocknr, retval); |
| + int ret = check_block_validity(inode, "file system corruption", |
| + block, bh->b_blocknr, retval); |
| if (ret != 0) |
| return ret; |
| } |
| @@ -1265,8 +1264,9 @@ int ext4_get_blocks(handle_t *handle, st |
| |
| up_write((&EXT4_I(inode)->i_data_sem)); |
| if (retval > 0 && buffer_mapped(bh)) { |
| - int ret = check_block_validity(inode, block, |
| - bh->b_blocknr, retval); |
| + int ret = check_block_validity(inode, "file system " |
| + "corruption after allocation", |
| + block, bh->b_blocknr, retval); |
| if (ret != 0) |
| return ret; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:41 2009 |
| Message-Id: <20091211042740.547415555@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:00 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [22/90] ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0022-ext4-Use-bforget-in-no-journal-mode-for-ext4_journal.patch |
| Content-Length: 1230 |
| Lines: 41 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit c7acb4c16646943180bd221c167a077e0a084f9c) |
| |
| When ext4 is using a journal, a metadata block which is deallocated |
| must be passed into the journal layer so it can be dropped from the |
| current transaction and/or revoked. This is done by calling the |
| functions ext4_journal_forget() and ext4_journal_revoke(), which call |
| jbd2_journal_forget(), and jbd2_journal_revoke(), respectively. |
| |
| Since jbd2_journal_forget() and jbd2_journal_revoke() call |
| bforget(), if ext4 is not using a journal, ext4_journal_forget() and |
| ext4_journal_revoke() must call bforget() to avoid a dirty metadata |
| block overwriting a block after it has been reallocated and reused for |
| another inode's data block. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4_jbd2.c | 4 ++-- |
| 1 file changed, 2 insertions(+), 2 deletions(-) |
| |
| --- a/fs/ext4/ext4_jbd2.c |
| +++ b/fs/ext4/ext4_jbd2.c |
| @@ -44,7 +44,7 @@ int __ext4_journal_forget(const char *wh |
| handle, err); |
| } |
| else |
| - brelse(bh); |
| + bforget(bh); |
| return err; |
| } |
| |
| @@ -60,7 +60,7 @@ int __ext4_journal_revoke(const char *wh |
| handle, err); |
| } |
| else |
| - brelse(bh); |
| + bforget(bh); |
| return err; |
| } |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:41 2009 |
| Message-Id: <20091211042741.225776673@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:01 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [23/90] ext4: Assure that metadata blocks are written during fsync in no journal mode |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0023-ext4-Assure-that-metadata-blocks-are-written-during-.patch |
| Content-Length: 1900 |
| Lines: 63 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit fe188c0e084bdf3038dc0ac963c21d764f53f7da) |
| |
| When there is no journal present, we must attach buffer heads |
| associated with extent tree and indirect blocks to the inode's |
| mapping->private_list via mark_buffer_dirty_inode() so that |
| ext4_sync_file() --- which is called to service fsync() and |
| fdatasync() system calls --- can write out the inode's metadata blocks |
| by calling sync_mapping_buffers(). |
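|
| The mechanism can be sketched in user space as follows. This is an |
| illustration under simplifying assumptions, not the kernel's data |
| structures: the toy_* types and functions are hypothetical, the list |
| is a plain singly linked list, and "writing" a buffer just clears its |
| dirty flag. |
|
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bh {
    bool dirty;
    bool linked;                 /* already on some inode's list? */
    struct toy_bh *next_assoc;   /* next buffer on that list */
};

struct toy_inode {
    struct toy_bh *private_list; /* metadata buffers tied to this inode */
};

/* Like mark_buffer_dirty(): dirty, but no inode knows about the buffer. */
void toy_mark_dirty(struct toy_bh *bh)
{
    bh->dirty = true;
}

/* Like mark_buffer_dirty_inode(): dirty and linked to the inode. */
void toy_mark_dirty_inode(struct toy_bh *bh, struct toy_inode *inode)
{
    assert(bh != NULL && inode != NULL);
    bh->dirty = true;
    if (!bh->linked) {
        bh->next_assoc = inode->private_list;
        inode->private_list = bh;
        bh->linked = true;
    }
}

/*
 * Like sync_mapping_buffers(): write out every buffer on the inode's
 * list. Returns the number of buffers written.
 */
int toy_sync_mapping_buffers(struct toy_inode *inode)
{
    int written = 0;

    while (inode->private_list != NULL) {
        struct toy_bh *bh = inode->private_list;

        inode->private_list = bh->next_assoc;
        bh->next_assoc = NULL;
        bh->linked = false;
        if (bh->dirty) {
            bh->dirty = false;   /* pretend the block was written */
            written++;
        }
    }
    return written;
}
```
|
| A buffer dirtied with toy_mark_dirty() alone is invisible to the |
| per-inode sync, which is the gap this patch closes for the no-journal |
| case. |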
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4_jbd2.c | 5 ++++- |
| fs/ext4/fsync.c | 9 +++++++-- |
| 2 files changed, 11 insertions(+), 3 deletions(-) |
| |
| --- a/fs/ext4/ext4_jbd2.c |
| +++ b/fs/ext4/ext4_jbd2.c |
| @@ -89,7 +89,10 @@ int __ext4_handle_dirty_metadata(const c |
| ext4_journal_abort_handle(where, __func__, bh, |
| handle, err); |
| } else { |
| - mark_buffer_dirty(bh); |
| + if (inode && bh) |
| + mark_buffer_dirty_inode(bh, inode); |
| + else |
| + mark_buffer_dirty(bh); |
| if (inode && inode_needs_sync(inode)) { |
| sync_dirty_buffer(bh); |
| if (buffer_req(bh) && !buffer_uptodate(bh)) { |
| --- a/fs/ext4/fsync.c |
| +++ b/fs/ext4/fsync.c |
| @@ -50,7 +50,7 @@ int ext4_sync_file(struct file *file, st |
| { |
| struct inode *inode = dentry->d_inode; |
| journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; |
| - int ret = 0; |
| + int err, ret = 0; |
| |
| J_ASSERT(ext4_journal_current_handle() == NULL); |
| |
| @@ -79,6 +79,9 @@ int ext4_sync_file(struct file *file, st |
| goto out; |
| } |
| |
| + if (!journal) |
| + ret = sync_mapping_buffers(inode->i_mapping); |
| + |
| if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) |
| goto out; |
| |
| @@ -91,7 +94,9 @@ int ext4_sync_file(struct file *file, st |
| .sync_mode = WB_SYNC_ALL, |
| .nr_to_write = 0, /* sys_fsync did this */ |
| }; |
| - ret = sync_inode(inode, &wbc); |
| + err = sync_inode(inode, &wbc); |
| + if (ret == 0) |
| + ret = err; |
| } |
| out: |
| if (journal && (journal->j_flags & JBD2_BARRIER)) |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:42 2009 |
| Message-Id: <20091211042741.985036844@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:02 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Frank Mayhar <fmayhar@google.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [24/90] ext4: Make non-journal fsync work properly |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0024-ext4-Make-non-journal-fsync-work-properly.patch |
| Content-Length: 3529 |
| Lines: 113 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 91ac6f43317c0bf99969665f98016548011dfa38) |
| |
| Teach ext4_write_inode() and ext4_do_update_inode() about non-journal |
| mode: If we're not using a journal, ext4_write_inode() now calls |
| ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc()) |
| with a new "do_sync" parameter. If that parameter is nonzero _and_ we're |
| not using a journal, ext4_do_update_inode() calls sync_dirty_buffer() |
| instead of ext4_handle_dirty_metadata(). |
| |
| This problem was found in power-fail testing, checking the amount of |
| loss of files and blocks after a power failure when using fsync() and |
| when not using fsync(). It turned out that using fsync() was actually |
| worse than not doing so, possibly because it increased the likelihood |
| that the inodes would remain unflushed and would therefore be lost at |
| the power failure. |
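|
| The branch described in the first paragraph reduces to a small |
| decision table. The sketch below restates it with hypothetical toy_* |
| names; it only mirrors the condition in the patched |
| ext4_do_update_inode() and does no I/O. |
|
```c
#include <assert.h>
#include <stdbool.h>

enum toy_action {
    TOY_SYNC_DIRTY_BUFFER,      /* sync_dirty_buffer(bh) */
    TOY_HANDLE_DIRTY_METADATA   /* ext4_handle_dirty_metadata(...) */
};

/* Which call the patched code makes for a given mode and do_sync. */
enum toy_action toy_update_inode_action(bool has_journal, int do_sync)
{
    if (!has_journal && do_sync)
        return TOY_SYNC_DIRTY_BUFFER;
    return TOY_HANDLE_DIRTY_METADATA;
}

/* Usage: all four combinations of journal presence and do_sync. */
void toy_check_all(void)
{
    assert(toy_update_inode_action(false, 1) == TOY_SYNC_DIRTY_BUFFER);
    assert(toy_update_inode_action(false, 0) == TOY_HANDLE_DIRTY_METADATA);
    assert(toy_update_inode_action(true, 1) == TOY_HANDLE_DIRTY_METADATA);
    assert(toy_update_inode_action(true, 0) == TOY_HANDLE_DIRTY_METADATA);
}
```
|
| Only the no-journal, do_sync case takes the synchronous write path; |
| ext4_mark_iloc_dirty() passes 0 and so keeps its previous behaviour. |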
| |
| Signed-off-by: Frank Mayhar <fmayhar@google.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 54 ++++++++++++++++++++++++++++++++++++++++-------------- |
| 1 file changed, 40 insertions(+), 14 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -4550,7 +4550,8 @@ static int ext4_inode_blocks_set(handle_ |
| */ |
| static int ext4_do_update_inode(handle_t *handle, |
| struct inode *inode, |
| - struct ext4_iloc *iloc) |
| + struct ext4_iloc *iloc, |
| + int do_sync) |
| { |
| struct ext4_inode *raw_inode = ext4_raw_inode(iloc); |
| struct ext4_inode_info *ei = EXT4_I(inode); |
| @@ -4652,10 +4653,22 @@ static int ext4_do_update_inode(handle_t |
| raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize); |
| } |
| |
| - BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); |
| - rc = ext4_handle_dirty_metadata(handle, inode, bh); |
| - if (!err) |
| - err = rc; |
| + /* |
| + * If we're not using a journal and we were called from |
| + * ext4_write_inode() to sync the inode (making do_sync true), |
| + * we can just use sync_dirty_buffer() directly to do our dirty |
| + * work. Testing s_journal here is a bit redundant but it's |
| + * worth it to avoid potential future trouble. |
| + */ |
| + if (EXT4_SB(inode->i_sb)->s_journal == NULL && do_sync) { |
| + BUFFER_TRACE(bh, "call sync_dirty_buffer"); |
| + sync_dirty_buffer(bh); |
| + } else { |
| + BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); |
| + rc = ext4_handle_dirty_metadata(handle, inode, bh); |
| + if (!err) |
| + err = rc; |
| + } |
| ei->i_state &= ~EXT4_STATE_NEW; |
| |
| out_brelse: |
| @@ -4701,19 +4714,32 @@ out_brelse: |
| */ |
| int ext4_write_inode(struct inode *inode, int wait) |
| { |
| + int err; |
| + |
| if (current->flags & PF_MEMALLOC) |
| return 0; |
| |
| - if (ext4_journal_current_handle()) { |
| - jbd_debug(1, "called recursively, non-PF_MEMALLOC!\n"); |
| - dump_stack(); |
| - return -EIO; |
| - } |
| + if (EXT4_SB(inode->i_sb)->s_journal) { |
| + if (ext4_journal_current_handle()) { |
| + jbd_debug(1, "called recursively, non-PF_MEMALLOC!\n"); |
| + dump_stack(); |
| + return -EIO; |
| + } |
| |
| - if (!wait) |
| - return 0; |
| + if (!wait) |
| + return 0; |
| + |
| + err = ext4_force_commit(inode->i_sb); |
| + } else { |
| + struct ext4_iloc iloc; |
| |
| - return ext4_force_commit(inode->i_sb); |
| + err = ext4_get_inode_loc(inode, &iloc); |
| + if (err) |
| + return err; |
| + err = ext4_do_update_inode(EXT4_NOJOURNAL_HANDLE, |
| + inode, &iloc, wait); |
| + } |
| + return err; |
| } |
| |
| /* |
| @@ -5007,7 +5033,7 @@ int ext4_mark_iloc_dirty(handle_t *handl |
| get_bh(iloc->bh); |
| |
| /* ext4_do_update_inode() does jbd2_journal_dirty_metadata */ |
| - err = ext4_do_update_inode(handle, inode, iloc); |
| + err = ext4_do_update_inode(handle, inode, iloc, 0); |
| put_bh(iloc->bh); |
| return err; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:44 2009 |
| Message-Id: <20091211042742.991422660@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:03 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [25/90] ext4: move ext4_mb_init_group() function earlier in the mballoc.c |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0025-ext4-move-ext4_mb_init_group-function-earlier-in-the.patch |
| Content-Length: 5462 |
| Lines: 211 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit b6a758ec3af3ec236dbfdcf6a06b84ac8f94957e) |
| |
| This moves the function around so that it can be called from |
| ext4_mb_load_buddy(). |
| |
| Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/mballoc.c | 182 +++++++++++++++++++++++++++--------------------------- |
| 1 file changed, 91 insertions(+), 91 deletions(-) |
| |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -908,6 +908,97 @@ out: |
| return err; |
| } |
| |
| +static noinline_for_stack |
| +int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) |
| +{ |
| + |
| + int ret = 0; |
| + void *bitmap; |
| + int blocks_per_page; |
| + int block, pnum, poff; |
| + int num_grp_locked = 0; |
| + struct ext4_group_info *this_grp; |
| + struct ext4_sb_info *sbi = EXT4_SB(sb); |
| + struct inode *inode = sbi->s_buddy_cache; |
| + struct page *page = NULL, *bitmap_page = NULL; |
| + |
| + mb_debug("init group %lu\n", group); |
| + blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize; |
| + this_grp = ext4_get_group_info(sb, group); |
| + /* |
| + * This ensures we don't add group |
| + * to this buddy cache via resize |
| + */ |
| + num_grp_locked = ext4_mb_get_buddy_cache_lock(sb, group); |
| + if (!EXT4_MB_GRP_NEED_INIT(this_grp)) { |
| + /* |
| + * somebody initialized the group |
| + * return without doing anything |
| + */ |
| + ret = 0; |
| + goto err; |
| + } |
| + /* |
| + * the buddy cache inode stores the block bitmap |
| + * and buddy information in consecutive blocks. |
| + * So for each group we need two blocks. |
| + */ |
| + block = group * 2; |
| + pnum = block / blocks_per_page; |
| + poff = block % blocks_per_page; |
| + page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); |
| + if (page) { |
| + BUG_ON(page->mapping != inode->i_mapping); |
| + ret = ext4_mb_init_cache(page, NULL); |
| + if (ret) { |
| + unlock_page(page); |
| + goto err; |
| + } |
| + unlock_page(page); |
| + } |
| + if (page == NULL || !PageUptodate(page)) { |
| + ret = -EIO; |
| + goto err; |
| + } |
| + mark_page_accessed(page); |
| + bitmap_page = page; |
| + bitmap = page_address(page) + (poff * sb->s_blocksize); |
| + |
| + /* init buddy cache */ |
| + block++; |
| + pnum = block / blocks_per_page; |
| + poff = block % blocks_per_page; |
| + page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); |
| + if (page == bitmap_page) { |
| + /* |
| + * If both the bitmap and buddy are in |
| + * the same page we don't need to force |
| + * init the buddy |
| + */ |
| + unlock_page(page); |
| + } else if (page) { |
| + BUG_ON(page->mapping != inode->i_mapping); |
| + ret = ext4_mb_init_cache(page, bitmap); |
| + if (ret) { |
| + unlock_page(page); |
| + goto err; |
| + } |
| + unlock_page(page); |
| + } |
| + if (page == NULL || !PageUptodate(page)) { |
| + ret = -EIO; |
| + goto err; |
| + } |
| + mark_page_accessed(page); |
| +err: |
| + ext4_mb_put_buddy_cache_lock(sb, group, num_grp_locked); |
| + if (bitmap_page) |
| + page_cache_release(bitmap_page); |
| + if (page) |
| + page_cache_release(page); |
| + return ret; |
| +} |
| + |
| static noinline_for_stack int |
| ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, |
| struct ext4_buddy *e4b) |
| @@ -1837,97 +1928,6 @@ void ext4_mb_put_buddy_cache_lock(struct |
| |
| } |
| |
| -static noinline_for_stack |
| -int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) |
| -{ |
| - |
| - int ret; |
| - void *bitmap; |
| - int blocks_per_page; |
| - int block, pnum, poff; |
| - int num_grp_locked = 0; |
| - struct ext4_group_info *this_grp; |
| - struct ext4_sb_info *sbi = EXT4_SB(sb); |
| - struct inode *inode = sbi->s_buddy_cache; |
| - struct page *page = NULL, *bitmap_page = NULL; |
| - |
| - mb_debug("init group %lu\n", group); |
| - blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize; |
| - this_grp = ext4_get_group_info(sb, group); |
| - /* |
| - * This ensures we don't add group |
| - * to this buddy cache via resize |
| - */ |
| - num_grp_locked = ext4_mb_get_buddy_cache_lock(sb, group); |
| - if (!EXT4_MB_GRP_NEED_INIT(this_grp)) { |
| - /* |
| - * somebody initialized the group |
| - * return without doing anything |
| - */ |
| - ret = 0; |
| - goto err; |
| - } |
| - /* |
| - * the buddy cache inode stores the block bitmap |
| - * and buddy information in consecutive blocks. |
| - * So for each group we need two blocks. |
| - */ |
| - block = group * 2; |
| - pnum = block / blocks_per_page; |
| - poff = block % blocks_per_page; |
| - page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); |
| - if (page) { |
| - BUG_ON(page->mapping != inode->i_mapping); |
| - ret = ext4_mb_init_cache(page, NULL); |
| - if (ret) { |
| - unlock_page(page); |
| - goto err; |
| - } |
| - unlock_page(page); |
| - } |
| - if (page == NULL || !PageUptodate(page)) { |
| - ret = -EIO; |
| - goto err; |
| - } |
| - mark_page_accessed(page); |
| - bitmap_page = page; |
| - bitmap = page_address(page) + (poff * sb->s_blocksize); |
| - |
| - /* init buddy cache */ |
| - block++; |
| - pnum = block / blocks_per_page; |
| - poff = block % blocks_per_page; |
| - page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS); |
| - if (page == bitmap_page) { |
| - /* |
| - * If both the bitmap and buddy are in |
| - * the same page we don't need to force |
| - * init the buddy |
| - */ |
| - unlock_page(page); |
| - } else if (page) { |
| - BUG_ON(page->mapping != inode->i_mapping); |
| - ret = ext4_mb_init_cache(page, bitmap); |
| - if (ret) { |
| - unlock_page(page); |
| - goto err; |
| - } |
| - unlock_page(page); |
| - } |
| - if (page == NULL || !PageUptodate(page)) { |
| - ret = -EIO; |
| - goto err; |
| - } |
| - mark_page_accessed(page); |
| -err: |
| - ext4_mb_put_buddy_cache_lock(sb, group, num_grp_locked); |
| - if (bitmap_page) |
| - page_cache_release(bitmap_page); |
| - if (page) |
| - page_cache_release(page); |
| - return ret; |
| -} |
| - |
| static noinline_for_stack int |
| ext4_mb_regular_allocator(struct ext4_allocation_context *ac) |
| { |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:45 2009 |
| Message-Id: <20091211042744.228047197@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:04 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [26/90] ext4: check for need init flag in ext4_mb_load_buddy |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0026-ext4-check-for-need-init-flag-in-ext4_mb_load_buddy.patch |
| Content-Length: 2180 |
| Lines: 75 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit f41c0750538667b87a19c93952e5d42fcc069bd7) |
| |
| We should check for the need init flag with the group's alloc_sem |
| held, to make sure that while we are loading the buddy cache and |
| holding a reference to it, a file system resize can't add new blocks |
| to the same group. |
| |
| The patch also drops the need init flag check in |
| ext4_mb_regular_allocator() because doing the check without holding |
| alloc_sem is racy. |
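|
| The "check under the lock, drop it, initialize, retry" shape of the |
| change can be sketched as below. This is a single-threaded toy model: |
| the lock is only a reader counter, the toy_* names are hypothetical, |
| and it assumes the init helper takes its own locks and therefore must |
| be called with the read lock released. |
|
```c
#include <assert.h>
#include <stdbool.h>

struct toy_group {
    bool need_init;   /* models EXT4_MB_GRP_NEED_INIT() */
    int  readers;     /* models holders of alloc_sem for read */
    int  init_calls;
};

/* Models ext4_mb_init_group(): must not run under the read lock. */
int toy_init_group(struct toy_group *g)
{
    assert(g->readers == 0);
    g->need_init = false;
    g->init_calls++;
    return 0;
}

/*
 * Models the patched ext4_mb_load_buddy(): the flag is tested only while
 * the read lock is held. On success the lock stays held by the caller.
 */
int toy_load_buddy(struct toy_group *g)
{
repeat_load_buddy:
    g->readers++;                     /* down_read() */

    if (g->need_init) {
        g->readers--;                 /* up_read() before initializing */
        int ret = toy_init_group(g);
        if (ret)
            return ret;
        goto repeat_load_buddy;       /* re-check with the lock held */
    }

    /* Flag seen clear under the lock: safe to load the buddy pages. */
    return 0;
}
```
|
| The retry matters because the flag may change between dropping the |
| lock and taking it again; the old check in |
| ext4_mb_regular_allocator() had no lock to make the answer stable. |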
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/mballoc.c | 39 ++++++++++++++++++--------------------- |
| 1 file changed, 18 insertions(+), 21 deletions(-) |
| |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -1032,8 +1032,26 @@ ext4_mb_load_buddy(struct super_block *s |
| * groups mapped by the page is blocked |
| * till we are done with allocation |
| */ |
| +repeat_load_buddy: |
| down_read(e4b->alloc_semp); |
| |
| + if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) { |
| + /* we need to check for group need init flag |
| + * with alloc_semp held so that we can be sure |
| + * that new blocks didn't get added to the group |
| + * when we are loading the buddy cache |
| + */ |
| + up_read(e4b->alloc_semp); |
| + /* |
| + * we need full data about the group |
| + * to make a good selection |
| + */ |
| + ret = ext4_mb_init_group(sb, group); |
| + if (ret) |
| + return ret; |
| + goto repeat_load_buddy; |
| + } |
| + |
| /* |
| * the buddy cache inode stores the block bitmap |
| * and buddy information in consecutive blocks. |
| @@ -2010,27 +2028,6 @@ repeat: |
| if (grp->bb_free == 0) |
| continue; |
| |
| - /* |
| - * if the group is already init we check whether it is |
| - * a good group and if not we don't load the buddy |
| - */ |
| - if (EXT4_MB_GRP_NEED_INIT(grp)) { |
| - /* |
| - * we need full data about the group |
| - * to make a good selection |
| - */ |
| - err = ext4_mb_init_group(sb, group); |
| - if (err) |
| - goto out; |
| - } |
| - |
| - /* |
| - * If the particular group doesn't satisfy our |
| - * criteria we continue with the next group |
| - */ |
| - if (!ext4_mb_good_group(ac, group, cr)) |
| - continue; |
| - |
| err = ext4_mb_load_buddy(sb, group, &e4b); |
| if (err) |
| goto out; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:46 2009 |
| Message-Id: <20091211042745.672220552@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:05 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [27/90] ext4: Don't update superblock write time when filesystem is read-only |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0027-ext4-Don-t-update-superblock-write-time-when-filesys.patch |
| Content-Length: 1535 |
| Lines: 37 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 71290b368ad5e1e0b0b300c9d5638490a9fd1a2d) |
| |
| This avoids updating the superblock write time when we are mounting |
| the root file system read/only but we need to replay the journal; at |
| that point, for people who are east of GMT and who make their clock |
| tick in localtime for Windows bug-for-bug compatibility, the clock is |
| set in the future, and this will cause e2fsck to complain and force a |
| full file system check. |
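|
| A small model of the scenario, with made-up timestamps. The toy_* |
| names are hypothetical, and toy_fsck_complains() assumes the simplest |
| possible rule (write time later than the current time), which is a |
| simplification of what e2fsck really checks. |
|
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_super {
    bool     read_only;  /* models sb->s_flags & MS_RDONLY */
    uint32_t wtime;      /* models es->s_wtime */
};

/* Models the patched ext4_commit_super(): skip s_wtime when read-only. */
void toy_commit_super(struct toy_super *sb, uint32_t now)
{
    assert(sb != NULL);
    if (!sb->read_only)
        sb->wtime = now;
    /* ... the rest of the superblock would be written here ... */
}

/* Simplified check: is the last write time in the future? */
bool toy_fsck_complains(uint32_t wtime, uint32_t utc_now)
{
    return wtime > utc_now;
}
```
|
| With the hardware clock two hours ahead of UTC during early boot, a |
| read-only root that replays its journal keeps its old write time, so |
| the later check against the corrected clock stays quiet. |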
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/super.c | 13 ++++++++++++- |
| 1 file changed, 12 insertions(+), 1 deletion(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -3230,7 +3230,18 @@ static int ext4_commit_super(struct supe |
| clear_buffer_write_io_error(sbh); |
| set_buffer_uptodate(sbh); |
| } |
| - es->s_wtime = cpu_to_le32(get_seconds()); |
| + /* |
| + * If the file system is mounted read-only, don't update the |
| + * superblock write time. This avoids updating the superblock |
| + * write time when we are mounting the root file system |
| + * read/only but we need to replay the journal; at that point, |
| + * for people who are east of GMT and who make their clock |
| + * tick in localtime for Windows bug-for-bug compatibility, |
| + * the clock is set in the future, and this will cause e2fsck |
| + * to complain and force a full file system check. |
| + */ |
| + if (!(sb->s_flags & MS_RDONLY)) |
| + es->s_wtime = cpu_to_le32(get_seconds()); |
| es->s_kbytes_written = |
| cpu_to_le64(EXT4_SB(sb)->s_kbytes_written + |
| ((part_stat_read(sb->s_bdev->bd_part, sectors[1]) - |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:47 2009 |
| Message-Id: <20091211042746.862451041@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:06 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Andreas Schlick <schlick@lavabit.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [28/90] ext4: Always set dx_node's fake_dirent explicitly. |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0028-ext4-Always-set-dx_node-s-fake_dirent-explicitly.patch |
| Content-Length: 1031 |
| Lines: 28 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 1f7bebb9e911d870fa8f997ddff838e82b5715ea) |
| |
| When ext4_dx_add_entry() has to split an index node, it has to ensure that |
| name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck |
| won't recognise it as an intermediate htree node and consider the htree to |
| be corrupted. |
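|
| The difference between the old and new initialization can be shown |
| with a toy copy of the structure. The field layout below is assumed |
| to match struct fake_dirent; byte order and ext4_rec_len_to_disk() |
| are ignored, and the toy_* names are hypothetical. |
|
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct toy_fake_dirent {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Old code: name_len and file_type keep whatever the buffer held. */
void toy_init_old(struct toy_fake_dirent *f, uint16_t rec_len)
{
    f->rec_len = rec_len;
    f->inode = 0;
}

/* New code: zero the whole entry first, then set rec_len. */
void toy_init_new(struct toy_fake_dirent *f, uint16_t rec_len)
{
    memset(f, 0, sizeof(*f));
    f->rec_len = rec_len;
}

/* Usage: start from a buffer full of stale bytes, as a reused block. */
void toy_demo(void)
{
    struct toy_fake_dirent f;

    memset(&f, 0xAB, sizeof(f));
    toy_init_old(&f, 4096);
    assert(f.name_len == 0xAB);      /* stale value survives */

    memset(&f, 0xAB, sizeof(f));
    toy_init_new(&f, 4096);
    assert(f.name_len == 0 && f.file_type == 0 && f.inode == 0);
    assert(f.rec_len == 4096);
}
```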
| |
| Signed-off-by: Andreas Schlick <schlick@lavabit.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/namei.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/namei.c |
| +++ b/fs/ext4/namei.c |
| @@ -1590,9 +1590,9 @@ static int ext4_dx_add_entry(handle_t *h |
| goto cleanup; |
| node2 = (struct dx_node *)(bh2->b_data); |
| entries2 = node2->entries; |
| + memset(&node2->fake, 0, sizeof(struct fake_dirent)); |
| node2->fake.rec_len = ext4_rec_len_to_disk(sb->s_blocksize, |
| sb->s_blocksize); |
| - node2->fake.inode = 0; |
| BUFFER_TRACE(frame->bh, "get_write_access"); |
| err = ext4_journal_get_write_access(handle, frame->bh); |
| if (err) |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:48 2009 |
| Message-Id: <20091211042747.846949412@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:07 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [29/90] ext4: Fix initalization of s_flex_groups |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0029-ext4-Fix-initalization-of-s_flex_groups.patch |
| Content-Length: 1510 |
| Lines: 40 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 7ad9bb651fc2036ea94bed94da76a4b08959a911) |
| |
| The s_flex_groups array should have been initialized using atomic_add |
| to sum up the free counts from the block groups that make up a |
| flex_bg. By using atomic_set, each s_flex_groups entry ended up |
| holding only the counts of the last block group in its flex_bg. |
| |
| The impact of this bug is that the block and inode allocation |
| algorithms might not pick the best flex_bg for new allocation. |
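|
| A user-space sketch of the two behaviours, using C11 atomics in place |
| of the kernel's atomic_t. The group counts are made-up numbers, the |
| toy_* names are hypothetical, and the counters are assumed to start |
| at zero. |
|
```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_GROUPS          8
#define TOY_GROUPS_PER_FLEX 4

/* Free block count of each block group (illustrative values). */
static const int toy_free_blocks[TOY_GROUPS] = {
    10, 20, 30, 40, 1, 2, 3, 4
};

/* Fill per-flex_bg counters: store (old code) or accumulate (fix). */
void toy_fill_flex_info(atomic_int *flex, int use_add)
{
    for (int i = 0; i < TOY_GROUPS; i++) {
        int flex_group = i / TOY_GROUPS_PER_FLEX;

        if (use_add)
            atomic_fetch_add(&flex[flex_group], toy_free_blocks[i]);
        else
            atomic_store(&flex[flex_group], toy_free_blocks[i]);
    }
}

/* Usage: compare the totals produced by the two variants. */
void toy_demo(void)
{
    atomic_int buggy[2] = { 0, 0 };
    atomic_int fixed[2] = { 0, 0 };

    toy_fill_flex_info(buggy, 0);
    toy_fill_flex_info(fixed, 1);

    /* Store keeps only the last group's value in each flex_bg. */
    assert(atomic_load(&buggy[0]) == 40 && atomic_load(&buggy[1]) == 4);
    /* Add yields the sum over the flex_bg. */
    assert(atomic_load(&fixed[0]) == 100 && atomic_load(&fixed[1]) == 10);
}
```
|
| Note that the kernel's atomic_add() takes the value first and the |
| counter second, the reverse of atomic_set(), which is why the |
| arguments swap places in the diff below. |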
| |
| Thanks to Damien Guibouret for pointing out this problem! |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/super.c | 12 ++++++------ |
| 1 file changed, 6 insertions(+), 6 deletions(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -1696,12 +1696,12 @@ static int ext4_fill_flex_info(struct su |
| gdp = ext4_get_group_desc(sb, i, NULL); |
| |
| flex_group = ext4_flex_group(sbi, i); |
| - atomic_set(&sbi->s_flex_groups[flex_group].free_inodes, |
| - ext4_free_inodes_count(sb, gdp)); |
| - atomic_set(&sbi->s_flex_groups[flex_group].free_blocks, |
| - ext4_free_blks_count(sb, gdp)); |
| - atomic_set(&sbi->s_flex_groups[flex_group].used_dirs, |
| - ext4_used_dirs_count(sb, gdp)); |
| + atomic_add(ext4_free_inodes_count(sb, gdp), |
| + &sbi->s_flex_groups[flex_group].free_inodes); |
| + atomic_add(ext4_free_blks_count(sb, gdp), |
| + &sbi->s_flex_groups[flex_group].free_blocks); |
| + atomic_add(ext4_used_dirs_count(sb, gdp), |
| + &sbi->s_flex_groups[flex_group].used_dirs); |
| } |
| |
| return 1; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:49 2009 |
| Message-Id: <20091211042748.760455091@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:08 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [30/90] ext4: Fix include/trace/events/ext4.h to work with Systemtap |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0030-ext4-Fix-include-trace-events-ext4.h-to-work-with-Sy.patch |
| Content-Length: 1379 |
| Lines: 46 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 3661d28615ea580c1db02a972fd4d3898df1cb01) |
| |
| Using relative pathnames in #include statements interacts badly with |
| SystemTap, since the fs/ext4/*.h header files are not packaged up as |
| part of a distribution kernel's header files. Since SystemTap doesn't |
| use TP_fast_assign(), we can use a blind structure definition and then |
| make sure the needed header files are included before the ext4 source |
| files #include the trace/events/ext4.h header file. |
| |
| https://bugzilla.redhat.com/show_bug.cgi?id=512478 |
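|
| The "blind structure definition" is an ordinary C forward |
| declaration. The sketch below shows why it is enough for prototypes; |
| toy_alloc_context and toy_trace_alloc() are hypothetical names, not |
| the real ext4 types or tracepoints. |
|
```c
#include <assert.h>
#include <stddef.h>

/* "Blind" declaration: the name is known, the layout is not. */
struct toy_alloc_context;

/* A prototype may take a pointer to the incomplete type. */
int toy_trace_alloc(const struct toy_alloc_context *ac);

/*
 * Only code that dereferences the pointer needs the full definition.
 * In ext4 that would come from a private header such as mballoc.h,
 * included by the .c file before the trace header.
 */
struct toy_alloc_context {
    int goal_len;
    int found_len;
};

int toy_trace_alloc(const struct toy_alloc_context *ac)
{
    return ac != NULL ? ac->found_len - ac->goal_len : 0;
}

/* Usage: the caller sees the full type, the prototype never did. */
void toy_demo(void)
{
    struct toy_alloc_context ac = { 8, 5 };

    assert(toy_trace_alloc(&ac) == -3);
    assert(toy_trace_alloc(NULL) == 0);
}
```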
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/super.c | 1 + |
| include/trace/events/ext4.h | 6 ++++-- |
| 2 files changed, 5 insertions(+), 2 deletions(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -45,6 +45,7 @@ |
| #include "ext4_jbd2.h" |
| #include "xattr.h" |
| #include "acl.h" |
| +#include "mballoc.h" |
| |
| #define CREATE_TRACE_POINTS |
| #include <trace/events/ext4.h> |
| --- a/include/trace/events/ext4.h |
| +++ b/include/trace/events/ext4.h |
| @@ -5,10 +5,12 @@ |
| #define _TRACE_EXT4_H |
| |
| #include <linux/writeback.h> |
| -#include "../../../fs/ext4/ext4.h" |
| -#include "../../../fs/ext4/mballoc.h" |
| #include <linux/tracepoint.h> |
| |
| +struct ext4_allocation_context; |
| +struct ext4_allocation_request; |
| +struct ext4_prealloc_space; |
| + |
| TRACE_EVENT(ext4_free_inode, |
| TP_PROTO(struct inode *inode), |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:49 2009 |
| Message-Id: <20091211042749.363249380@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:09 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.co.jp>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [31/90] ext4: Fix small typo for move_extent_per_page() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0031-ext4-Fix-small-typo-for-move_extent_per_page.patch |
| Content-Length: 1155 |
| Lines: 33 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 44fc48f7048ab9657b524938a832fec4e0acea98) |
| |
| This function moves extents one page at a time, so change its name |
| from move_extent_par_page() to move_extent_per_page(). |
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 4 ++-- |
| 1 file changed, 2 insertions(+), 2 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -740,7 +740,7 @@ out: |
| * on success, or a negative error value on failure. |
| */ |
| static int |
| -move_extent_par_page(struct file *o_filp, struct inode *donor_inode, |
| +move_extent_per_page(struct file *o_filp, struct inode *donor_inode, |
| pgoff_t orig_page_offset, int data_offset_in_page, |
| int block_len_in_page, int uninit) |
| { |
| @@ -1267,7 +1267,7 @@ ext4_move_extents(struct file *o_filp, s |
| while (orig_page_offset <= seq_end_page) { |
| |
| /* Swap original branches with new branches */ |
| - ret = move_extent_par_page(o_filp, donor_inode, |
| + ret = move_extent_per_page(o_filp, donor_inode, |
| orig_page_offset, |
| data_offset_in_page, |
| block_len_in_page, uninit); |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:50 2009 |
| Message-Id: <20091211042749.975038594@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:10 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [32/90] ext4: Replace get_ext_path macro with an inline funciton |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0032-ext4-Replace-get_ext_path-macro-with-an-inline-funci.patch |
| Content-Length: 4445 |
| Lines: 142 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit e8505970af46658ece2545e9bc1fe594998fdcdf) |
| |
| Replace get_ext_path macro with an inline function, |
| since this macro looks like a function call but its arguments |
| get modified. Ted pointed this out, thanks. |
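|
| A user-space sketch of the function form. The toy_err_ptr helpers |
| imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() and |
| toy_find_extent() is a hypothetical stand-in for |
| ext4_ext_find_extent(); none of this is the kernel implementation. |
|
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_path { int depth; };

/* Minimal imitation of the kernel's error-pointer helpers. */
static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-4095;
}
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }

static struct toy_path toy_the_path = { 1 };

/* Stand-in lookup: fails with -EIO for negative block numbers. */
struct toy_path *toy_find_extent(long lblock, struct toy_path *path)
{
    (void)path;
    return lblock < 0 ? toy_err_ptr(-EIO) : &toy_the_path;
}

/*
 * Function form of get_ext_path(): the output pointer is passed by
 * address, so the call site shows that it is written, and the status
 * comes back as the return value.
 */
int toy_get_ext_path(long lblock, struct toy_path **path)
{
    int ret = 0;

    *path = toy_find_extent(lblock, *path);
    if (toy_is_err(*path)) {
        ret = (int)toy_ptr_err(*path);
        *path = NULL;
    }
    return ret;
}

/* Usage: success leaves a valid path, failure leaves NULL. */
void toy_demo(void)
{
    struct toy_path *p = NULL;

    assert(toy_get_ext_path(5, &p) == 0 && p != NULL);
    assert(toy_get_ext_path(-1, &p) == -EIO && p == NULL);
}
```
|
| With the macro, path and ret were assigned behind a call-like |
| syntax; here the &path argument and the returned value make both |
| writes explicit. |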
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 55 ++++++++++++++++++++++++++++++-------------------- |
| 1 file changed, 34 insertions(+), 21 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -19,14 +19,29 @@ |
| #include "ext4_extents.h" |
| #include "ext4.h" |
| |
| -#define get_ext_path(path, inode, block, ret) \ |
| - do { \ |
| - path = ext4_ext_find_extent(inode, block, path); \ |
| - if (IS_ERR(path)) { \ |
| - ret = PTR_ERR(path); \ |
| - path = NULL; \ |
| - } \ |
| - } while (0) |
| +/** |
| + * get_ext_path - Find an extent path for designated logical block number. |
| + * |
| + * @inode: an inode which is searched |
| + * @lblock: logical block number to find an extent path |
| + * @path: pointer to an extent path pointer (for output) |
| + * |
| + * ext4_ext_find_extent wrapper. Return 0 on success, or a negative error value |
| + * on failure. |
| + */ |
| +static inline int |
| +get_ext_path(struct inode *inode, ext4_lblk_t lblock, |
| + struct ext4_ext_path **path) |
| +{ |
| + int ret = 0; |
| + |
| + *path = ext4_ext_find_extent(inode, lblock, *path); |
| + if (IS_ERR(*path)) { |
| + ret = PTR_ERR(*path); |
| + *path = NULL; |
| + } |
| + return ret; |
| +} |
| |
| /** |
| * copy_extent_status - Copy the extent's initialization status |
| @@ -283,7 +298,7 @@ mext_insert_across_blocks(handle_t *hand |
| } |
| |
| if (new_flag) { |
| - get_ext_path(orig_path, orig_inode, eblock, err); |
| + err = get_ext_path(orig_inode, eblock, &orig_path); |
| if (orig_path == NULL) |
| goto out; |
| |
| @@ -293,8 +308,8 @@ mext_insert_across_blocks(handle_t *hand |
| } |
| |
| if (end_flag) { |
| - get_ext_path(orig_path, orig_inode, |
| - le32_to_cpu(end_ext->ee_block) - 1, err); |
| + err = get_ext_path(orig_inode, |
| + le32_to_cpu(end_ext->ee_block) - 1, &orig_path); |
| if (orig_path == NULL) |
| goto out; |
| |
| @@ -631,12 +646,12 @@ mext_replace_branches(handle_t *handle, |
| mext_double_down_write(orig_inode, donor_inode); |
| |
| /* Get the original extent for the block "orig_off" */ |
| - get_ext_path(orig_path, orig_inode, orig_off, err); |
| + err = get_ext_path(orig_inode, orig_off, &orig_path); |
| if (orig_path == NULL) |
| goto out; |
| |
| /* Get the donor extent for the head */ |
| - get_ext_path(donor_path, donor_inode, donor_off, err); |
| + err = get_ext_path(donor_inode, donor_off, &donor_path); |
| if (donor_path == NULL) |
| goto out; |
| depth = ext_depth(orig_inode); |
| @@ -678,7 +693,7 @@ mext_replace_branches(handle_t *handle, |
| |
| if (orig_path) |
| ext4_ext_drop_refs(orig_path); |
| - get_ext_path(orig_path, orig_inode, orig_off, err); |
| + err = get_ext_path(orig_inode, orig_off, &orig_path); |
| if (orig_path == NULL) |
| goto out; |
| depth = ext_depth(orig_inode); |
| @@ -692,8 +707,7 @@ mext_replace_branches(handle_t *handle, |
| |
| if (donor_path) |
| ext4_ext_drop_refs(donor_path); |
| - get_ext_path(donor_path, donor_inode, |
| - donor_off, err); |
| + err = get_ext_path(donor_inode, donor_off, &donor_path); |
| if (donor_path == NULL) |
| goto out; |
| depth = ext_depth(donor_inode); |
| @@ -1154,12 +1168,12 @@ ext4_move_extents(struct file *o_filp, s |
| if (file_end < block_end) |
| len -= block_end - file_end; |
| |
| - get_ext_path(orig_path, orig_inode, block_start, ret); |
| + ret = get_ext_path(orig_inode, block_start, &orig_path); |
| if (orig_path == NULL) |
| goto out2; |
| |
| /* Get path structure to check the hole */ |
| - get_ext_path(holecheck_path, orig_inode, block_start, ret); |
| + ret = get_ext_path(orig_inode, block_start, &holecheck_path); |
| if (holecheck_path == NULL) |
| goto out; |
| |
| @@ -1289,8 +1303,7 @@ ext4_move_extents(struct file *o_filp, s |
| /* Decrease buffer counter */ |
| if (holecheck_path) |
| ext4_ext_drop_refs(holecheck_path); |
| - get_ext_path(holecheck_path, orig_inode, |
| - seq_start, ret); |
| + ret = get_ext_path(orig_inode, seq_start, &holecheck_path); |
| if (holecheck_path == NULL) |
| break; |
| depth = holecheck_path->p_depth; |
| @@ -1298,7 +1311,7 @@ ext4_move_extents(struct file *o_filp, s |
| /* Decrease buffer counter */ |
| if (orig_path) |
| ext4_ext_drop_refs(orig_path); |
| - get_ext_path(orig_path, orig_inode, seq_start, ret); |
| + ret = get_ext_path(orig_inode, seq_start, &orig_path); |
| if (orig_path == NULL) |
| break; |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:51 2009 |
| Message-Id: <20091211042750.648840028@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:11 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [33/90] ext4: Replace BUG_ON() with ext4_error() in move_extents.c |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0033-ext4-Replace-BUG_ON-with-ext4_error-in-move_extents..patch |
| Content-Length: 10385 |
| Lines: 352 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 2147b1a6a48e28399120ca51d4a91840a278611f) |
| |
| Replace BUG_ON calls with calls to ext4_error() so that an error |
| message is printed when EXT4_IOC_MOVE_EXT fails for some reason. |
| This will help with debugging. |
| Ted pointed this out, thanks. |
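The general pattern of trading a fatal assertion for a logged error and an error return can be sketched in userspace as follows. This is only an illustration under stated assumptions: check_moved_len() is a hypothetical helper, and fprintf() stands in for ext4_error(), whose kernel-side behaviour (marking the filesystem as having errors, honouring the errors= mount option) is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical check: instead of aborting when more blocks were moved
 * than requested, report the condition and return -EIO so the caller
 * can unwind its locks and path structures.
 */
static inline int check_moved_len(unsigned long long moved,
				  unsigned long long requested)
{
	if (moved > requested) {
		fprintf(stderr,
			"replaced too many blocks: sum of replaced: %llu "
			"requested: %llu\n", moved, requested);
		return -EIO;
	}
	return 0;
}

/* Usage: callers test the return value and jump to their cleanup path. */
static inline void example_usage(void)
{
	assert(check_moved_len(4, 8) == 0);
	assert(check_moved_len(9, 8) == -EIO);
}
```

The design trade-off is that every caller must now propagate the error, which is why the patch changes several void functions to return int.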
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 149 ++++++++++++++++++++++++++++++++++++-------------- |
| 1 file changed, 109 insertions(+), 40 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -128,6 +128,31 @@ mext_next_extent(struct inode *inode, st |
| } |
| |
| /** |
| + * mext_check_null_inode - NULL check for two inodes |
| + * |
| + * If inode1 or inode2 is NULL, return -EIO. Otherwise, return 0. |
| + */ |
| +static int |
| +mext_check_null_inode(struct inode *inode1, struct inode *inode2, |
| + const char *function) |
| +{ |
| + int ret = 0; |
| + |
| + if (inode1 == NULL) { |
| + ext4_error(inode2->i_sb, function, |
| + "Both inodes should not be NULL: " |
| + "inode1 NULL inode2 %lu", inode2->i_ino); |
| + ret = -EIO; |
| + } else if (inode2 == NULL) { |
| + ext4_error(inode1->i_sb, function, |
| + "Both inodes should not be NULL: " |
| + "inode1 %lu inode2 NULL", inode1->i_ino); |
| + ret = -EIO; |
| + } |
| + return ret; |
| +} |
| + |
| +/** |
| * mext_double_down_read - Acquire two inodes' read semaphore |
| * |
| * @orig_inode: original inode structure |
| @@ -139,8 +164,6 @@ mext_double_down_read(struct inode *orig |
| { |
| struct inode *first = orig_inode, *second = donor_inode; |
| |
| - BUG_ON(orig_inode == NULL || donor_inode == NULL); |
| - |
| /* |
| * Use the inode number to provide the stable locking order instead |
| * of its address, because the C language doesn't guarantee you can |
| @@ -167,8 +190,6 @@ mext_double_down_write(struct inode *ori |
| { |
| struct inode *first = orig_inode, *second = donor_inode; |
| |
| - BUG_ON(orig_inode == NULL || donor_inode == NULL); |
| - |
| /* |
| * Use the inode number to provide the stable locking order instead |
| * of its address, because the C language doesn't guarantee you can |
| @@ -193,8 +214,6 @@ mext_double_down_write(struct inode *ori |
| static void |
| mext_double_up_read(struct inode *orig_inode, struct inode *donor_inode) |
| { |
| - BUG_ON(orig_inode == NULL || donor_inode == NULL); |
| - |
| up_read(&EXT4_I(orig_inode)->i_data_sem); |
| up_read(&EXT4_I(donor_inode)->i_data_sem); |
| } |
| @@ -209,8 +228,6 @@ mext_double_up_read(struct inode *orig_i |
| static void |
| mext_double_up_write(struct inode *orig_inode, struct inode *donor_inode) |
| { |
| - BUG_ON(orig_inode == NULL || donor_inode == NULL); |
| - |
| up_write(&EXT4_I(orig_inode)->i_data_sem); |
| up_write(&EXT4_I(donor_inode)->i_data_sem); |
| } |
| @@ -534,7 +551,15 @@ mext_leaf_block(handle_t *handle, struct |
| * oext |-----------| |
| * new_ext |-------| |
| */ |
| - BUG_ON(le32_to_cpu(oext->ee_block) + oext_alen - 1 < new_ext_end); |
| + if (le32_to_cpu(oext->ee_block) + oext_alen - 1 < new_ext_end) { |
| + ext4_error(orig_inode->i_sb, __func__, |
| + "new_ext_end(%u) should be less than or equal to " |
| + "oext->ee_block(%u) + oext_alen(%d) - 1", |
| + new_ext_end, le32_to_cpu(oext->ee_block), |
| + oext_alen); |
| + ret = -EIO; |
| + goto out; |
| + } |
| |
| /* |
| * Case: new_ext is smaller than original extent |
| @@ -558,6 +583,7 @@ mext_leaf_block(handle_t *handle, struct |
| |
| ret = mext_insert_extents(handle, orig_inode, orig_path, o_start, |
| o_end, &start_ext, &new_ext, &end_ext); |
| +out: |
| return ret; |
| } |
| |
| @@ -668,7 +694,20 @@ mext_replace_branches(handle_t *handle, |
| /* Loop for the donor extents */ |
| while (1) { |
| /* The extent for donor must be found. */ |
| - BUG_ON(!dext || donor_off != le32_to_cpu(tmp_dext.ee_block)); |
| + if (!dext) { |
| + ext4_error(donor_inode->i_sb, __func__, |
| + "The extent for donor must be found"); |
| + err = -EIO; |
| + goto out; |
| + } else if (donor_off != le32_to_cpu(tmp_dext.ee_block)) { |
| + ext4_error(donor_inode->i_sb, __func__, |
| + "Donor offset(%u) and the first block of donor " |
| + "extent(%u) should be equal", |
| + donor_off, |
| + le32_to_cpu(tmp_dext.ee_block)); |
| + err = -EIO; |
| + goto out; |
| + } |
| |
| /* Set donor extent to orig extent */ |
| err = mext_leaf_block(handle, orig_inode, |
| @@ -1050,18 +1089,23 @@ mext_check_arguments(struct inode *orig_ |
| * @inode1: the inode structure |
| * @inode2: the inode structure |
| * |
| - * Lock two inodes' i_mutex by i_ino order. This function is moved from |
| - * fs/inode.c. |
| + * Lock two inodes' i_mutex by i_ino order. |
| + * If inode1 or inode2 is NULL, return -EIO. Otherwise, return 0. |
| */ |
| -static void |
| +static int |
| mext_inode_double_lock(struct inode *inode1, struct inode *inode2) |
| { |
| - if (inode1 == NULL || inode2 == NULL || inode1 == inode2) { |
| - if (inode1) |
| - mutex_lock(&inode1->i_mutex); |
| - else if (inode2) |
| - mutex_lock(&inode2->i_mutex); |
| - return; |
| + int ret = 0; |
| + |
| + BUG_ON(inode1 == NULL && inode2 == NULL); |
| + |
| + ret = mext_check_null_inode(inode1, inode2, __func__); |
| + if (ret < 0) |
| + goto out; |
| + |
| + if (inode1 == inode2) { |
| + mutex_lock(&inode1->i_mutex); |
| + goto out; |
| } |
| |
| if (inode1->i_ino < inode2->i_ino) { |
| @@ -1071,6 +1115,9 @@ mext_inode_double_lock(struct inode *ino |
| mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT); |
| mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD); |
| } |
| + |
| +out: |
| + return ret; |
| } |
| |
| /** |
| @@ -1079,17 +1126,28 @@ mext_inode_double_lock(struct inode *ino |
| * @inode1: the inode that is released first |
| * @inode2: the inode that is released second |
| * |
| - * This function is moved from fs/inode.c. |
| + * If inode1 or inode2 is NULL, return -EIO. Otherwise, return 0. |
| */ |
| |
| -static void |
| +static int |
| mext_inode_double_unlock(struct inode *inode1, struct inode *inode2) |
| { |
| + int ret = 0; |
| + |
| + BUG_ON(inode1 == NULL && inode2 == NULL); |
| + |
| + ret = mext_check_null_inode(inode1, inode2, __func__); |
| + if (ret < 0) |
| + goto out; |
| + |
| if (inode1) |
| mutex_unlock(&inode1->i_mutex); |
| |
| if (inode2 && inode2 != inode1) |
| mutex_unlock(&inode2->i_mutex); |
| + |
| +out: |
| + return ret; |
| } |
| |
| /** |
| @@ -1146,21 +1204,23 @@ ext4_move_extents(struct file *o_filp, s |
| ext4_lblk_t block_end, seq_start, add_blocks, file_end, seq_blocks = 0; |
| ext4_lblk_t rest_blocks; |
| pgoff_t orig_page_offset = 0, seq_end_page; |
| - int ret, depth, last_extent = 0; |
| + int ret1, ret2, depth, last_extent = 0; |
| int blocks_per_page = PAGE_CACHE_SIZE >> orig_inode->i_blkbits; |
| int data_offset_in_page; |
| int block_len_in_page; |
| int uninit; |
| |
| /* protect orig and donor against a truncate */ |
| - mext_inode_double_lock(orig_inode, donor_inode); |
| + ret1 = mext_inode_double_lock(orig_inode, donor_inode); |
| + if (ret1 < 0) |
| + return ret1; |
| |
| mext_double_down_read(orig_inode, donor_inode); |
| /* Check the filesystem environment whether move_extent can be done */ |
| - ret = mext_check_arguments(orig_inode, donor_inode, orig_start, |
| + ret1 = mext_check_arguments(orig_inode, donor_inode, orig_start, |
| donor_start, &len, *moved_len); |
| mext_double_up_read(orig_inode, donor_inode); |
| - if (ret) |
| + if (ret1) |
| goto out2; |
| |
| file_end = (i_size_read(orig_inode) - 1) >> orig_inode->i_blkbits; |
| @@ -1168,19 +1228,19 @@ ext4_move_extents(struct file *o_filp, s |
| if (file_end < block_end) |
| len -= block_end - file_end; |
| |
| - ret = get_ext_path(orig_inode, block_start, &orig_path); |
| + ret1 = get_ext_path(orig_inode, block_start, &orig_path); |
| if (orig_path == NULL) |
| goto out2; |
| |
| /* Get path structure to check the hole */ |
| - ret = get_ext_path(orig_inode, block_start, &holecheck_path); |
| + ret1 = get_ext_path(orig_inode, block_start, &holecheck_path); |
| if (holecheck_path == NULL) |
| goto out; |
| |
| depth = ext_depth(orig_inode); |
| ext_cur = holecheck_path[depth].p_ext; |
| if (ext_cur == NULL) { |
| - ret = -EINVAL; |
| + ret1 = -EINVAL; |
| goto out; |
| } |
| |
| @@ -1193,13 +1253,13 @@ ext4_move_extents(struct file *o_filp, s |
| last_extent = mext_next_extent(orig_inode, |
| holecheck_path, &ext_cur); |
| if (last_extent < 0) { |
| - ret = last_extent; |
| + ret1 = last_extent; |
| goto out; |
| } |
| last_extent = mext_next_extent(orig_inode, orig_path, |
| &ext_dummy); |
| if (last_extent < 0) { |
| - ret = last_extent; |
| + ret1 = last_extent; |
| goto out; |
| } |
| } |
| @@ -1209,7 +1269,7 @@ ext4_move_extents(struct file *o_filp, s |
| if (le32_to_cpu(ext_cur->ee_block) > block_end) { |
| ext4_debug("ext4 move extent: The specified range of file " |
| "may be the hole\n"); |
| - ret = -EINVAL; |
| + ret1 = -EINVAL; |
| goto out; |
| } |
| |
| @@ -1229,7 +1289,7 @@ ext4_move_extents(struct file *o_filp, s |
| last_extent = mext_next_extent(orig_inode, holecheck_path, |
| &ext_cur); |
| if (last_extent < 0) { |
| - ret = last_extent; |
| + ret1 = last_extent; |
| break; |
| } |
| add_blocks = ext4_ext_get_actual_len(ext_cur); |
| @@ -1281,16 +1341,23 @@ ext4_move_extents(struct file *o_filp, s |
| while (orig_page_offset <= seq_end_page) { |
| |
| /* Swap original branches with new branches */ |
| - ret = move_extent_per_page(o_filp, donor_inode, |
| + ret1 = move_extent_per_page(o_filp, donor_inode, |
| orig_page_offset, |
| data_offset_in_page, |
| block_len_in_page, uninit); |
| - if (ret < 0) |
| + if (ret1 < 0) |
| goto out; |
| orig_page_offset++; |
| /* Count how many blocks we have exchanged */ |
| *moved_len += block_len_in_page; |
| - BUG_ON(*moved_len > len); |
| + if (*moved_len > len) { |
| + ext4_error(orig_inode->i_sb, __func__, |
| + "We replaced blocks too much! " |
| + "sum of replaced: %llu requested: %llu", |
| + *moved_len, len); |
| + ret1 = -EIO; |
| + goto out; |
| + } |
| |
| data_offset_in_page = 0; |
| rest_blocks -= block_len_in_page; |
| @@ -1303,7 +1370,7 @@ ext4_move_extents(struct file *o_filp, s |
| /* Decrease buffer counter */ |
| if (holecheck_path) |
| ext4_ext_drop_refs(holecheck_path); |
| - ret = get_ext_path(orig_inode, seq_start, &holecheck_path); |
| + ret1 = get_ext_path(orig_inode, seq_start, &holecheck_path); |
| if (holecheck_path == NULL) |
| break; |
| depth = holecheck_path->p_depth; |
| @@ -1311,7 +1378,7 @@ ext4_move_extents(struct file *o_filp, s |
| /* Decrease buffer counter */ |
| if (orig_path) |
| ext4_ext_drop_refs(orig_path); |
| - ret = get_ext_path(orig_inode, seq_start, &orig_path); |
| + ret1 = get_ext_path(orig_inode, seq_start, &orig_path); |
| if (orig_path == NULL) |
| break; |
| |
| @@ -1330,10 +1397,12 @@ out: |
| kfree(holecheck_path); |
| } |
| out2: |
| - mext_inode_double_unlock(orig_inode, donor_inode); |
| + ret2 = mext_inode_double_unlock(orig_inode, donor_inode); |
| |
| - if (ret) |
| - return ret; |
| + if (ret1) |
| + return ret1; |
| + else if (ret2) |
| + return ret2; |
| |
| return 0; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:51 2009 |
| Message-Id: <20091211042751.274203312@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:12 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [34/90] ext4: Add null extent check to ext_get_path |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0034-ext4-Add-null-extent-check-to-ext_get_path.patch |
| Content-Length: 4115 |
| Lines: 142 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 347fa6f1c7cb5df2b38d3c9167cfe242ce0cd1da) |
| |
| The path structure returned by ext4_ext_find_extent() may point to |
| a null extent, because the structure of the extent tree can change |
| while data blocks are being exchanged in ext4_move_extents(). |
| As a solution, this patch adds a null extent check to |
| get_ext_path() and makes the callers test its return value |
| instead of the path pointer. |
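A small userspace sketch of why testing the path pointer alone is not enough: the lookup can succeed while the leaf entry carries no extent. The types and the helper path_status() below are hypothetical simplifications, not the kernel's struct ext4_ext_path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct extent_model { unsigned int first_block; };
struct path_entry { struct extent_model *p_ext; };

/*
 * Hypothetical status check: a non-NULL path whose leaf entry has no
 * extent is reported as -ENODATA, so callers can simply test the
 * return value.
 */
static inline int path_status(const struct path_entry *path, int depth)
{
	if (path[depth].p_ext == NULL)
		return -ENODATA;
	return 0;
}

/* Usage: both paths are non-NULL, but only one has a usable extent. */
static inline void example_usage(void)
{
	struct extent_model ext = { 10 };
	struct path_entry with_extent[1] = { { &ext } };
	struct path_entry without_extent[1] = { { NULL } };

	assert(path_status(with_extent, 0) == 0);
	assert(path_status(without_extent, 0) == -ENODATA);
}
```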
| |
| Reported-by: Peng Tao <bergwolf@gmail.com> |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 34 ++++++++++++++++------------------ |
| 1 file changed, 16 insertions(+), 18 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -39,7 +39,9 @@ get_ext_path(struct inode *inode, ext4_l |
| if (IS_ERR(*path)) { |
| ret = PTR_ERR(*path); |
| *path = NULL; |
| - } |
| + } else if ((*path)[ext_depth(inode)].p_ext == NULL) |
| + ret = -ENODATA; |
| + |
| return ret; |
| } |
| |
| @@ -316,7 +318,7 @@ mext_insert_across_blocks(handle_t *hand |
| |
| if (new_flag) { |
| err = get_ext_path(orig_inode, eblock, &orig_path); |
| - if (orig_path == NULL) |
| + if (err) |
| goto out; |
| |
| if (ext4_ext_insert_extent(handle, orig_inode, |
| @@ -327,7 +329,7 @@ mext_insert_across_blocks(handle_t *hand |
| if (end_flag) { |
| err = get_ext_path(orig_inode, |
| le32_to_cpu(end_ext->ee_block) - 1, &orig_path); |
| - if (orig_path == NULL) |
| + if (err) |
| goto out; |
| |
| if (ext4_ext_insert_extent(handle, orig_inode, |
| @@ -673,12 +675,12 @@ mext_replace_branches(handle_t *handle, |
| |
| /* Get the original extent for the block "orig_off" */ |
| err = get_ext_path(orig_inode, orig_off, &orig_path); |
| - if (orig_path == NULL) |
| + if (err) |
| goto out; |
| |
| /* Get the donor extent for the head */ |
| err = get_ext_path(donor_inode, donor_off, &donor_path); |
| - if (donor_path == NULL) |
| + if (err) |
| goto out; |
| depth = ext_depth(orig_inode); |
| oext = orig_path[depth].p_ext; |
| @@ -733,7 +735,7 @@ mext_replace_branches(handle_t *handle, |
| if (orig_path) |
| ext4_ext_drop_refs(orig_path); |
| err = get_ext_path(orig_inode, orig_off, &orig_path); |
| - if (orig_path == NULL) |
| + if (err) |
| goto out; |
| depth = ext_depth(orig_inode); |
| oext = orig_path[depth].p_ext; |
| @@ -747,7 +749,7 @@ mext_replace_branches(handle_t *handle, |
| if (donor_path) |
| ext4_ext_drop_refs(donor_path); |
| err = get_ext_path(donor_inode, donor_off, &donor_path); |
| - if (donor_path == NULL) |
| + if (err) |
| goto out; |
| depth = ext_depth(donor_inode); |
| dext = donor_path[depth].p_ext; |
| @@ -1221,7 +1223,7 @@ ext4_move_extents(struct file *o_filp, s |
| donor_start, &len, *moved_len); |
| mext_double_up_read(orig_inode, donor_inode); |
| if (ret1) |
| - goto out2; |
| + goto out; |
| |
| file_end = (i_size_read(orig_inode) - 1) >> orig_inode->i_blkbits; |
| block_end = block_start + len - 1; |
| @@ -1229,20 +1231,16 @@ ext4_move_extents(struct file *o_filp, s |
| len -= block_end - file_end; |
| |
| ret1 = get_ext_path(orig_inode, block_start, &orig_path); |
| - if (orig_path == NULL) |
| - goto out2; |
| + if (ret1) |
| + goto out; |
| |
| /* Get path structure to check the hole */ |
| ret1 = get_ext_path(orig_inode, block_start, &holecheck_path); |
| - if (holecheck_path == NULL) |
| + if (ret1) |
| goto out; |
| |
| depth = ext_depth(orig_inode); |
| ext_cur = holecheck_path[depth].p_ext; |
| - if (ext_cur == NULL) { |
| - ret1 = -EINVAL; |
| - goto out; |
| - } |
| |
| /* |
| * Get proper extent whose ee_block is beyond block_start |
| @@ -1371,7 +1369,7 @@ ext4_move_extents(struct file *o_filp, s |
| if (holecheck_path) |
| ext4_ext_drop_refs(holecheck_path); |
| ret1 = get_ext_path(orig_inode, seq_start, &holecheck_path); |
| - if (holecheck_path == NULL) |
| + if (ret1) |
| break; |
| depth = holecheck_path->p_depth; |
| |
| @@ -1379,7 +1377,7 @@ ext4_move_extents(struct file *o_filp, s |
| if (orig_path) |
| ext4_ext_drop_refs(orig_path); |
| ret1 = get_ext_path(orig_inode, seq_start, &orig_path); |
| - if (orig_path == NULL) |
| + if (ret1) |
| break; |
| |
| ext_cur = holecheck_path[depth].p_ext; |
| @@ -1396,7 +1394,7 @@ out: |
| ext4_ext_drop_refs(holecheck_path); |
| kfree(holecheck_path); |
| } |
| -out2: |
| + |
| ret2 = mext_inode_double_unlock(orig_inode, donor_inode); |
| |
| if (ret1) |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:52 2009 |
| Message-Id: <20091211042751.864763427@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:13 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [35/90] ext4: Fix different block exchange issue in EXT4_IOC_MOVE_EXT |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0035-ext4-Fix-different-block-exchange-issue-in-EXT4_IOC_.patch |
| Content-Length: 3955 |
| Lines: 122 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit c40ce3c9ea97425a12d7e44031a98fe50add6fc1) |
| |
| If the logical block offset of the original file passed to |
| EXT4_IOC_MOVE_EXT differs from the donor file's, |
| a calculation error occurs in mext_calc_swap_extents(), |
| and the wrong blocks are exchanged between the original and donor files. |
| As a result, we hit ext4_error() in check_block_validity(). |
| To detect the logical offset difference in EXT4_IOC_MOVE_EXT, |
| add checks to mext_calc_swap_extents() and handle it as an error, |
| since data exchange must be done between the same blocks in EXT4_IOC_MOVE_EXT. |
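The coverage test that the patch adds can be sketched in isolation. This is a simplified model, assuming host-endian fields and a non-zero extent length; struct extent_model and extent_covers() are hypothetical names, whereas the kernel code uses le32_to_cpu() and ext4_ext_get_actual_len() on struct ext4_extent.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct extent_model {
	uint32_t first_block;	/* first logical block covered */
	uint16_t len;		/* number of blocks, assumed > 0 */
};

/*
 * Return 0 if "off" lies within [first_block, first_block + len - 1],
 * otherwise -ENODATA, mirroring the shape of the comparison in the patch.
 */
static inline int extent_covers(const struct extent_model *ex, uint32_t off)
{
	if (off < ex->first_block ||
	    ex->first_block + ex->len - 1 < off)
		return -ENODATA;
	return 0;
}

/* Usage: an extent covering blocks 100..107. */
static inline void example_usage(void)
{
	struct extent_model ex = { 100, 8 };

	assert(extent_covers(&ex, 100) == 0);
	assert(extent_covers(&ex, 107) == 0);
	assert(extent_covers(&ex, 99) == -ENODATA);
	assert(extent_covers(&ex, 108) == -ENODATA);
}
```

In the patch the same test is applied to both the original and the donor extent with the same offset, so a mismatch on either side is rejected before any blocks are swapped.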
| |
| Reported-by: Peng Tao <bergwolf@gmail.com> |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 46 +++++++++++++++++++++++++++++++++++++--------- |
| 1 file changed, 37 insertions(+), 9 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -597,8 +597,10 @@ out: |
| * @orig_off: block offset of original inode |
| * @donor_off: block offset of donor inode |
| * @max_count: the maximun length of extents |
| + * |
| + * Return 0 on success, or a negative error value on failure. |
| */ |
| -static void |
| +static int |
| mext_calc_swap_extents(struct ext4_extent *tmp_dext, |
| struct ext4_extent *tmp_oext, |
| ext4_lblk_t orig_off, ext4_lblk_t donor_off, |
| @@ -607,6 +609,19 @@ mext_calc_swap_extents(struct ext4_exten |
| ext4_lblk_t diff, orig_diff; |
| struct ext4_extent dext_old, oext_old; |
| |
| + BUG_ON(orig_off != donor_off); |
| + |
| + /* original and donor extents have to cover the same block offset */ |
| + if (orig_off < le32_to_cpu(tmp_oext->ee_block) || |
| + le32_to_cpu(tmp_oext->ee_block) + |
| + ext4_ext_get_actual_len(tmp_oext) - 1 < orig_off) |
| + return -ENODATA; |
| + |
| + if (orig_off < le32_to_cpu(tmp_dext->ee_block) || |
| + le32_to_cpu(tmp_dext->ee_block) + |
| + ext4_ext_get_actual_len(tmp_dext) - 1 < orig_off) |
| + return -ENODATA; |
| + |
| dext_old = *tmp_dext; |
| oext_old = *tmp_oext; |
| |
| @@ -634,6 +649,8 @@ mext_calc_swap_extents(struct ext4_exten |
| |
| copy_extent_status(&oext_old, tmp_dext); |
| copy_extent_status(&dext_old, tmp_oext); |
| + |
| + return 0; |
| } |
| |
| /** |
| @@ -690,8 +707,10 @@ mext_replace_branches(handle_t *handle, |
| dext = donor_path[depth].p_ext; |
| tmp_dext = *dext; |
| |
| - mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off, |
| + err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off, |
| donor_off, count); |
| + if (err) |
| + goto out; |
| |
| /* Loop for the donor extents */ |
| while (1) { |
| @@ -760,9 +779,10 @@ mext_replace_branches(handle_t *handle, |
| } |
| tmp_dext = *dext; |
| |
| - mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off, |
| - donor_off, |
| - count - replaced_count); |
| + err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off, |
| + donor_off, count - replaced_count); |
| + if (err) |
| + goto out; |
| } |
| |
| out: |
| @@ -1243,11 +1263,15 @@ ext4_move_extents(struct file *o_filp, s |
| ext_cur = holecheck_path[depth].p_ext; |
| |
| /* |
| - * Get proper extent whose ee_block is beyond block_start |
| - * if block_start was within the hole. |
| + * Get proper starting location of block replacement if block_start was |
| + * within the hole. |
| */ |
| if (le32_to_cpu(ext_cur->ee_block) + |
| ext4_ext_get_actual_len(ext_cur) - 1 < block_start) { |
| + /* |
| + * The hole exists between extents or the tail of |
| + * original file. |
| + */ |
| last_extent = mext_next_extent(orig_inode, |
| holecheck_path, &ext_cur); |
| if (last_extent < 0) { |
| @@ -1260,8 +1284,12 @@ ext4_move_extents(struct file *o_filp, s |
| ret1 = last_extent; |
| goto out; |
| } |
| - } |
| - seq_start = block_start; |
| + seq_start = le32_to_cpu(ext_cur->ee_block); |
| + } else if (le32_to_cpu(ext_cur->ee_block) > block_start) |
| + /* The hole exists at the beginning of original file. */ |
| + seq_start = le32_to_cpu(ext_cur->ee_block); |
| + else |
| + seq_start = block_start; |
| |
| /* No blocks within the specified range. */ |
| if (le32_to_cpu(ext_cur->ee_block) > block_end) { |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:52 2009 |
| Message-Id: <20091211042752.421711582@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:14 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Eric Sandeen <sandeen@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [36/90] ext4: limit block allocations for indirect-block files to < 2^32 |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0036-ext4-limit-block-allocations-for-indirect-block-file.patch |
| Content-Length: 5868 |
| Lines: 174 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit fb0a387dcdcd21aab1b09ee7fd80b7c979bdbbfd) |
| |
| Today, the ext4 allocator will happily allocate blocks past |
| 2^32 for indirect-block files, which results in the block |
| numbers getting truncated, and corruption ensues. |
| |
| This patch limits such allocations to < 2^32, and adds |
| BUG_ONs if we do get blocks larger than that. |
| |
| This should address RH Bug 519471, ext4 bitmap allocator |
| must limit blocks to < 2^32 |
| |
| * ext4_find_goal() is modified to choose a goal < UINT_MAX, |
| so that our starting point is in an acceptable range. |
| |
| * ext4_xattr_block_set() is modified such that the goal block |
| is < UINT_MAX, as above. |
| |
| * ext4_mb_regular_allocator() is modified so that the group |
| search does not continue into groups which are too high |
| |
| * ext4_mb_use_preallocated() has a check that we don't use |
| preallocated space which is too far out |
| |
| * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs |
| |
| No attempt has been made to limit inode locations to < 2^32, |
| so we may wind up with blocks far from their inodes. Doing |
| this much already will lead to some odd ENOSPC issues when the |
| "lower 32" gets full, and further restricting inodes could |
| make that even weirder. |
| |
| For high inodes, choosing a goal of the original, % UINT_MAX, |
| may be a bit odd, but then we're in an odd situation anyway, |
| and I don't know of a better heuristic. |
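The truncation problem and the two limits described above can be sketched in userspace. The helpers below are hypothetical and only mirror the arithmetic: a 64-bit physical block number stored in a 32-bit slot loses its high bits, the goal is masked as in the patch, and the number of groups usable by non-extent files is capped. The figure of 32768 blocks per group is an assumption corresponding to a 4 KiB block size.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCK_FILE_PHYS 0xFFFFFFFFULL

/* Indirect-block files keep block numbers in 32-bit slots. */
static inline uint32_t store_in_indirect_slot(uint64_t pblk)
{
	return (uint32_t)pblk;	/* high 32 bits are silently dropped */
}

/* Keep an allocation goal within the 32-bit range by masking it. */
static inline uint64_t clamp_goal(uint64_t goal)
{
	return goal & MAX_BLOCK_FILE_PHYS;
}

/* Number of block groups that lie within the 32-bit limit. */
static inline uint32_t blockfile_groups(uint32_t groups_count,
					uint32_t blocks_per_group)
{
	uint64_t limit = MAX_BLOCK_FILE_PHYS / blocks_per_group;

	return groups_count < limit ? groups_count : (uint32_t)limit;
}

/* Usage: block 2^32 + 5 would be recorded as block 5 without the limit. */
static inline void example_usage(void)
{
	assert(store_in_indirect_slot(0x100000005ULL) == 5);
	assert(clamp_goal(0x100000005ULL) == 5);
	assert(blockfile_groups(200000, 32768) == 131071);
	assert(blockfile_groups(1000, 32768) == 1000);
}
```

Masking the goal only moves the starting point of the search; the cap on the number of groups is what keeps the allocator from wandering into groups whose blocks cannot be represented.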
| |
| Signed-off-by: Eric Sandeen <sandeen@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 4 ++++ |
| fs/ext4/inode.c | 11 ++++++++++- |
| fs/ext4/mballoc.c | 9 +++++++++ |
| fs/ext4/super.c | 2 ++ |
| fs/ext4/xattr.c | 15 +++++++++++++-- |
| 5 files changed, 38 insertions(+), 3 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -388,6 +388,9 @@ struct ext4_mount_options { |
| #endif |
| }; |
| |
| +/* Max physical block we can address w/o extents */ |
| +#define EXT4_MAX_BLOCK_FILE_PHYS 0xFFFFFFFF |
| + |
| /* |
| * Structure of an inode on the disk |
| */ |
| @@ -843,6 +846,7 @@ struct ext4_sb_info { |
| unsigned long s_gdb_count; /* Number of group descriptor blocks */ |
| unsigned long s_desc_per_block; /* Number of group descriptors per block */ |
| ext4_group_t s_groups_count; /* Number of groups in the fs */ |
| + ext4_group_t s_blockfile_groups;/* Groups acceptable for non-extent files */ |
| unsigned long s_overhead_last; /* Last calculated overhead */ |
| unsigned long s_blocks_last; /* Last seen block count */ |
| loff_t s_bitmap_maxbytes; /* max bytes for bitmap files */ |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -564,15 +564,21 @@ static ext4_fsblk_t ext4_find_near(struc |
| * |
| * Normally this function find the preferred place for block allocation, |
| * returns it. |
| + * Because this is only used for non-extent files, we limit the block nr |
| + * to 32 bits. |
| */ |
| static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block, |
| Indirect *partial) |
| { |
| + ext4_fsblk_t goal; |
| + |
| /* |
| * XXX need to get goal block from mballoc's data structures |
| */ |
| |
| - return ext4_find_near(inode, partial); |
| + goal = ext4_find_near(inode, partial); |
| + goal = goal & EXT4_MAX_BLOCK_FILE_PHYS; |
| + return goal; |
| } |
| |
| /** |
| @@ -653,6 +659,8 @@ static int ext4_alloc_blocks(handle_t *h |
| if (*err) |
| goto failed_out; |
| |
| + BUG_ON(current_block + count > EXT4_MAX_BLOCK_FILE_PHYS); |
| + |
| target -= count; |
| /* allocate blocks for indirect blocks */ |
| while (index < indirect_blks && count) { |
| @@ -687,6 +695,7 @@ static int ext4_alloc_blocks(handle_t *h |
| ar.flags = EXT4_MB_HINT_DATA; |
| |
| current_block = ext4_mb_new_blocks(handle, &ar, err); |
| + BUG_ON(current_block + ar.len > EXT4_MAX_BLOCK_FILE_PHYS); |
| |
| if (*err && (target == blks)) { |
| /* |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -1960,6 +1960,10 @@ ext4_mb_regular_allocator(struct ext4_al |
| sb = ac->ac_sb; |
| sbi = EXT4_SB(sb); |
| ngroups = ext4_get_groups_count(sb); |
| + /* non-extent files are limited to low blocks/groups */ |
| + if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL)) |
| + ngroups = sbi->s_blockfile_groups; |
| + |
| BUG_ON(ac->ac_status == AC_STATUS_FOUND); |
| |
| /* first, try the goal */ |
| @@ -3355,6 +3359,11 @@ ext4_mb_use_preallocated(struct ext4_all |
| ac->ac_o_ex.fe_logical >= pa->pa_lstart + pa->pa_len) |
| continue; |
| |
| + /* non-extent files can't have physical blocks past 2^32 */ |
| + if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL) && |
| + pa->pa_pstart + pa->pa_len > EXT4_MAX_BLOCK_FILE_PHYS) |
| + continue; |
| + |
| /* found preallocated blocks, use them */ |
| spin_lock(&pa->pa_lock); |
| if (pa->pa_deleted == 0 && pa->pa_free) { |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -2618,6 +2618,8 @@ static int ext4_fill_super(struct super_ |
| goto failed_mount; |
| } |
| sbi->s_groups_count = blocks_count; |
| + sbi->s_blockfile_groups = min_t(ext4_group_t, sbi->s_groups_count, |
| + (EXT4_MAX_BLOCK_FILE_PHYS / EXT4_BLOCKS_PER_GROUP(sb))); |
| db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) / |
| EXT4_DESC_PER_BLOCK(sb); |
| sbi->s_group_desc = kmalloc(db_count * sizeof(struct buffer_head *), |
| --- a/fs/ext4/xattr.c |
| +++ b/fs/ext4/xattr.c |
| @@ -810,12 +810,23 @@ inserted: |
| get_bh(new_bh); |
| } else { |
| /* We need to allocate a new block */ |
| - ext4_fsblk_t goal = ext4_group_first_block_no(sb, |
| + ext4_fsblk_t goal, block; |
| + |
| + goal = ext4_group_first_block_no(sb, |
| EXT4_I(inode)->i_block_group); |
| - ext4_fsblk_t block = ext4_new_meta_blocks(handle, inode, |
| + |
| + /* non-extent files can't have physical blocks past 2^32 */ |
| + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) |
| + goal = goal & EXT4_MAX_BLOCK_FILE_PHYS; |
| + |
| + block = ext4_new_meta_blocks(handle, inode, |
| goal, NULL, &error); |
| if (error) |
| goto cleanup; |
| + |
| + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) |
| + BUG_ON(block > EXT4_MAX_BLOCK_FILE_PHYS); |
| + |
| ea_idebug(inode, "creating block %d", block); |
| |
| new_bh = sb_getblk(sb, block); |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:53 2009 |
| Message-Id: <20091211042753.004509392@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:15 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [37/90] ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0037-ext4-store-EXT4_EXT_MIGRATE-in-i_state-instead-of-i_.patch |
| Content-Length: 4204 |
| Lines: 103 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 1b9c12f44c1eb614fd3b8822bfe8f1f5d8e53737) |
| |
| EXT4_EXT_MIGRATE is only intended to be used as an in-memory flag, |
| and the hex value assigned to it collides with FS_DIRECTIO_FL (which |
| is also stored in i_flags). There's no reason for the |
| EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use |
| i_state instead. |
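
The collision described above can be shown with a small user-space sketch. It is not kernel code: `struct demo_inode` and the `demo_*` helpers are hypothetical stand-ins for the ext4 in-memory inode, and `DEMO_DIRECTIO_FL` simply reuses the hex value of the removed define, which is what the commit message says collides.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DIRECTIO_FL       0x00100000u /* same bit as the old migrate flag */
#define DEMO_OLD_EXT_MIGRATE   0x00100000u /* value removed from ext4.h by this patch */
#define DEMO_STATE_EXT_MIGRATE 0x00000020u /* value added to ext4.h by this patch */

/* Hypothetical stand-in for the in-memory inode. */
struct demo_inode {
    uint32_t i_flags; /* persistent flags, copied to the on-disk inode */
    uint32_t i_state; /* in-memory state bits only */
};

/* Old scheme: clearing the migrate marker also clears the direct I/O bit. */
static void demo_old_clear_migrate(struct demo_inode *ei)
{
    assert(ei);
    ei->i_flags &= ~DEMO_OLD_EXT_MIGRATE;
}

/* New scheme: the marker lives in i_state, so i_flags is never touched. */
static void demo_new_set_migrate(struct demo_inode *ei)
{
    assert(ei);
    ei->i_state |= DEMO_STATE_EXT_MIGRATE;
}

static void demo_new_clear_migrate(struct demo_inode *ei)
{
    assert(ei);
    ei->i_state &= ~DEMO_STATE_EXT_MIGRATE;
}

static int demo_directio_set(const struct demo_inode *ei)
{
    assert(ei);
    return (ei->i_flags & DEMO_DIRECTIO_FL) != 0;
}
```

With an inode that starts with the direct I/O bit set, `demo_old_clear_migrate()` leaves `demo_directio_set()` returning 0, while a set/clear pair through the `demo_new_*` helpers leaves the bit intact.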
| |
| Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 2 +- |
| fs/ext4/inode.c | 6 ++---- |
| fs/ext4/migrate.c | 20 ++++++++++---------- |
| 3 files changed, 13 insertions(+), 15 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -253,7 +253,6 @@ struct flex_groups { |
| #define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ |
| #define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */ |
| #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */ |
| -#define EXT4_EXT_MIGRATE 0x00100000 /* Inode is migrating */ |
| #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ |
| |
| #define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */ |
| @@ -291,6 +290,7 @@ static inline __u32 ext4_mask_flags(umod |
| #define EXT4_STATE_XATTR 0x00000004 /* has in-inode xattrs */ |
| #define EXT4_STATE_NO_EXPAND 0x00000008 /* No space for expansion */ |
| #define EXT4_STATE_DA_ALLOC_CLOSE 0x00000010 /* Alloc DA blks on close */ |
| +#define EXT4_STATE_EXT_MIGRATE 0x00000020 /* Inode is migrating */ |
| |
| /* Used to pass group descriptor data when online resize is done */ |
| struct ext4_new_group_input { |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1256,8 +1256,7 @@ int ext4_get_blocks(handle_t *handle, st |
| * i_data's format changing. Force the migrate |
| * to fail by clearing migrate flags |
| */ |
| - EXT4_I(inode)->i_flags = EXT4_I(inode)->i_flags & |
| - ~EXT4_EXT_MIGRATE; |
| + EXT4_I(inode)->i_state &= ~EXT4_STATE_EXT_MIGRATE; |
| } |
| } |
| |
| @@ -4608,8 +4607,7 @@ static int ext4_do_update_inode(handle_t |
| if (ext4_inode_blocks_set(handle, raw_inode, ei)) |
| goto out_brelse; |
| raw_inode->i_dtime = cpu_to_le32(ei->i_dtime); |
| - /* clear the migrate flag in the raw_inode */ |
| - raw_inode->i_flags = cpu_to_le32(ei->i_flags & ~EXT4_EXT_MIGRATE); |
| + raw_inode->i_flags = cpu_to_le32(ei->i_flags); |
| if (EXT4_SB(inode->i_sb)->s_es->s_creator_os != |
| cpu_to_le32(EXT4_OS_HURD)) |
| raw_inode->i_file_acl_high = |
| --- a/fs/ext4/migrate.c |
| +++ b/fs/ext4/migrate.c |
| @@ -353,17 +353,16 @@ static int ext4_ext_swap_inode_data(hand |
| |
| down_write(&EXT4_I(inode)->i_data_sem); |
| /* |
| - * if EXT4_EXT_MIGRATE is cleared a block allocation |
| + * if EXT4_STATE_EXT_MIGRATE is cleared a block allocation |
| * happened after we started the migrate. We need to |
| * fail the migrate |
| */ |
| - if (!(EXT4_I(inode)->i_flags & EXT4_EXT_MIGRATE)) { |
| + if (!(EXT4_I(inode)->i_state & EXT4_STATE_EXT_MIGRATE)) { |
| retval = -EAGAIN; |
| up_write(&EXT4_I(inode)->i_data_sem); |
| goto err_out; |
| } else |
| - EXT4_I(inode)->i_flags = EXT4_I(inode)->i_flags & |
| - ~EXT4_EXT_MIGRATE; |
| + EXT4_I(inode)->i_state &= ~EXT4_STATE_EXT_MIGRATE; |
| /* |
| * We have the extent map build with the tmp inode. |
| * Now copy the i_data across |
| @@ -517,14 +516,15 @@ int ext4_ext_migrate(struct inode *inode |
| * when we add extents we extent the journal |
| */ |
| /* |
| - * Even though we take i_mutex we can still cause block allocation |
| - * via mmap write to holes. If we have allocated new blocks we fail |
| - * migrate. New block allocation will clear EXT4_EXT_MIGRATE flag. |
| - * The flag is updated with i_data_sem held to prevent racing with |
| - * block allocation. |
| + * Even though we take i_mutex we can still cause block |
| + * allocation via mmap write to holes. If we have allocated |
| + * new blocks we fail migrate. New block allocation will |
| + * clear EXT4_STATE_EXT_MIGRATE flag. The flag is updated |
| + * with i_data_sem held to prevent racing with block |
| + * allocation. |
| */ |
| down_read((&EXT4_I(inode)->i_data_sem)); |
| - EXT4_I(inode)->i_flags = EXT4_I(inode)->i_flags | EXT4_EXT_MIGRATE; |
| + EXT4_I(inode)->i_state |= EXT4_STATE_EXT_MIGRATE; |
| up_read((&EXT4_I(inode)->i_data_sem)); |
| |
| handle = ext4_journal_start(inode, 1); |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:54 2009 |
| Message-Id: <20091211042753.567755833@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:16 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [38/90] ext4: Fix the alloc on close after a truncate hueristic |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0038-ext4-Fix-the-alloc-on-close-after-a-truncate-huerist.patch |
| Content-Length: 1235 |
| Lines: 33 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 5534fb5bb35a62a94e0bd1fa2421f7fb6e894f10) |
| |
| In an attempt to avoid doing an unneeded flush after opening a |
| (previously non-existent) file with O_CREAT|O_TRUNC, the code only |
| triggered the heuristic if ei->i_disksize was non-zero. Turns out that |
| the VFS doesn't call ->truncate() if the file doesn't exist, and |
| ei->i_disksize is always zero even if the file previously existed. So |
| remove the test, since it isn't necessary and in fact disabled the |
| heuristic. |
| |
| Thanks to Clemens Eisserer for reporting that he was seeing problems |
| with files written using kwrite and eclipse after sudden crashes |
| caused by a buggy Intel video driver. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 3 +-- |
| 1 file changed, 1 insertion(+), 2 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -3983,8 +3983,7 @@ void ext4_truncate(struct inode *inode) |
| if (!ext4_can_truncate(inode)) |
| return; |
| |
| - if (ei->i_disksize && inode->i_size == 0 && |
| - !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC)) |
| + if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC)) |
| ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE; |
| |
| if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) { |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:54 2009 |
| Message-Id: <20091211042754.189709948@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:17 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [39/90] ext4: Fix hueristic which avoids group preallocation for closed files |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0039-ext4-Fix-hueristic-which-avoids-group-preallocation-.patch |
| Content-Length: 1048 |
| Lines: 32 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 71780577306fd1e76c7a92e3b308db624d03adb9) |
| |
| The heuristic was designed to avoid using locality group preallocation |
| when writing the last segment of a closed file. Fix it by deferring |
| the assignment of size to the maximum of size and isize until after we |
| check whether size == isize. |
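
Why the ordering matters can be seen in a reduced sketch. This is a simplified model, not the mballoc code: the `demo_*` names are hypothetical, sizes are in blocks, and the other conditions of the real check (filesystem busy state, open writers) are left out.

```c
#include <assert.h>

/* File size in blocks, rounded up, as in the isize computation of the hunk. */
static unsigned long long demo_isize_blocks(unsigned long long i_size,
                                            unsigned int bsbits)
{
    assert(bsbits < 32);
    return (i_size + (1ULL << bsbits) - 1) >> bsbits;
}

static unsigned long long demo_max(unsigned long long a, unsigned long long b)
{
    return a > b ? a : b;
}

/* Pre-patch order: size is raised to isize before the comparison. */
static int demo_last_segment_old(unsigned long long size,
                                 unsigned long long isize)
{
    size = demo_max(size, isize);
    return size == isize;
}

/* Post-patch order: compare first; the max() is applied only afterwards. */
static int demo_last_segment_new(unsigned long long size,
                                 unsigned long long isize)
{
    return size == isize;
}
```

In this model the old order reports "last segment" for any request that ends at or before the end of the file, whereas the new order reports it only when the request ends exactly at the end of the file.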
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/mballoc.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -4162,7 +4162,6 @@ static void ext4_mb_group_or_file(struct |
| size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; |
| isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1) |
| >> bsbits; |
| - size = max(size, isize); |
| |
| if ((size == isize) && |
| !ext4_fs_is_busy(sbi) && |
| @@ -4172,6 +4171,7 @@ static void ext4_mb_group_or_file(struct |
| } |
| |
| /* don't use group allocation for large files */ |
| + size = max(size, isize); |
| if (size >= sbi->s_mb_stream_request) { |
| ac->ac_flags |= EXT4_MB_STREAM_ALLOC; |
| return; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:55 2009 |
| Message-Id: <20091211042754.655932661@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:18 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [40/90] ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0040-ext4-Adjust-ext4_da_writepages-to-write-out-larger-c.patch |
| Content-Length: 11442 |
| Lines: 340 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 55138e0bc29c0751e2152df9ad35deea542f29b3) |
| |
| Work around problems in the writeback code to force out writebacks in |
| larger chunks than just 4 MB, which is just too small. This also works |
| around limitations in the ext4 block allocator, which can't allocate |
| more than 2048 blocks at a time. So we need to defeat the round-robin |
| characteristics of the writeback code and try to write out as many |
| blocks as possible in one inode before allowing the writeback code to |
| move on to another inode. We add a new per-filesystem tunable, |
| max_writeback_mb_bump, which caps this to a default of 128 MB per |
| inode. |
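
As a rough illustration of the arithmetic behind the tunable, the sketch below converts a cap in megabytes into a cap in pages and applies it to a writeback request. It follows the shape of the ext4_da_writepages() hunk in this patch, but the `demo_*` functions and their parameters are hypothetical and run in user space.

```c
#include <assert.h>

/* Convert a cap in megabytes to a cap in pages: mb << (20 - page_shift). */
static unsigned int demo_max_pages(unsigned int mb_bump,
                                   unsigned int page_shift)
{
    assert(page_shift <= 20);
    return mb_bump << (20 - page_shift);
}

/*
 * Clamp the desired page count to max_pages and raise nr_to_write to it
 * if it is smaller. Returns the new nr_to_write; *bump records how much
 * was added so the caller can subtract it again afterwards.
 */
static long demo_bump_nr_to_write(long nr_to_write, long desired,
                                  unsigned int max_pages, long *bump)
{
    assert(bump);
    *bump = 0;
    if (desired > (long)max_pages)
        desired = (long)max_pages;
    if (nr_to_write < desired) {
        *bump = desired - nr_to_write;
        nr_to_write = desired;
    }
    return nr_to_write;
}
```

With the default of 128 and 4 KB pages (a page shift of 12), the cap works out to 32768 pages; a request for 1024 pages, which is the 4 MB figure mentioned above, multiplied by 8 becomes 8192 pages and stays under that cap.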
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 17 ++++++ |
| fs/ext4/inode.c | 121 +++++++++++++++++++++++++++++++++----------- |
| fs/ext4/super.c | 3 + |
| include/trace/events/ext4.h | 54 +++++++++++++++++-- |
| 4 files changed, 161 insertions(+), 34 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -114,6 +114,22 @@ struct ext4_allocation_request { |
| }; |
| |
| /* |
| + * Delayed allocation stuff |
| + */ |
| + |
| +struct mpage_da_data { |
| + struct inode *inode; |
| + sector_t b_blocknr; /* start block number of extent */ |
| + size_t b_size; /* size of extent */ |
| + unsigned long b_state; /* state of the extent */ |
| + unsigned long first_page, next_page; /* extent of pages */ |
| + struct writeback_control *wbc; |
| + int io_done; |
| + int pages_written; |
| + int retval; |
| +}; |
| + |
| +/* |
| * Special inodes numbers |
| */ |
| #define EXT4_BAD_INO 1 /* Bad blocks inode */ |
| @@ -929,6 +945,7 @@ struct ext4_sb_info { |
| unsigned int s_mb_stats; |
| unsigned int s_mb_order2_reqs; |
| unsigned int s_mb_group_prealloc; |
| + unsigned int s_max_writeback_mb_bump; |
| /* where last allocation was done - for stream allocation */ |
| unsigned long s_mb_last_group; |
| unsigned long s_mb_last_start; |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1146,6 +1146,64 @@ static int check_block_validity(struct i |
| } |
| |
| /* |
| + * Return the number of dirty pages in the given inode starting at |
| + * page frame idx. |
| + */ |
| +static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx, |
| + unsigned int max_pages) |
| +{ |
| + struct address_space *mapping = inode->i_mapping; |
| + pgoff_t index; |
| + struct pagevec pvec; |
| + pgoff_t num = 0; |
| + int i, nr_pages, done = 0; |
| + |
| + if (max_pages == 0) |
| + return 0; |
| + pagevec_init(&pvec, 0); |
| + while (!done) { |
| + index = idx; |
| + nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, |
| + PAGECACHE_TAG_DIRTY, |
| + (pgoff_t)PAGEVEC_SIZE); |
| + if (nr_pages == 0) |
| + break; |
| + for (i = 0; i < nr_pages; i++) { |
| + struct page *page = pvec.pages[i]; |
| + struct buffer_head *bh, *head; |
| + |
| + lock_page(page); |
| + if (unlikely(page->mapping != mapping) || |
| + !PageDirty(page) || |
| + PageWriteback(page) || |
| + page->index != idx) { |
| + done = 1; |
| + unlock_page(page); |
| + break; |
| + } |
| + head = page_buffers(page); |
| + bh = head; |
| + do { |
| + if (!buffer_delay(bh) && |
| + !buffer_unwritten(bh)) { |
| + done = 1; |
| + break; |
| + } |
| + } while ((bh = bh->b_this_page) != head); |
| + unlock_page(page); |
| + if (done) |
| + break; |
| + idx++; |
| + num++; |
| + if (num >= max_pages) |
| + break; |
| + } |
| + pagevec_release(&pvec); |
| + } |
| + return num; |
| +} |
| + |
| +/* |
| * The ext4_get_blocks() function tries to look up the requested blocks, |
| * and returns if the blocks are already mapped. |
| * |
| @@ -1881,22 +1939,6 @@ static void ext4_da_page_release_reserva |
| } |
| |
| /* |
| - * Delayed allocation stuff |
| - */ |
| - |
| -struct mpage_da_data { |
| - struct inode *inode; |
| - sector_t b_blocknr; /* start block number of extent */ |
| - size_t b_size; /* size of extent */ |
| - unsigned long b_state; /* state of the extent */ |
| - unsigned long first_page, next_page; /* extent of pages */ |
| - struct writeback_control *wbc; |
| - int io_done; |
| - int pages_written; |
| - int retval; |
| -}; |
| - |
| -/* |
| * mpage_da_submit_io - walks through extent of pages and try to write |
| * them with writepage() call back |
| * |
| @@ -2756,8 +2798,10 @@ static int ext4_da_writepages(struct add |
| int no_nrwrite_index_update; |
| int pages_written = 0; |
| long pages_skipped; |
| + unsigned int max_pages; |
| int range_cyclic, cycled = 1, io_done = 0; |
| - int needed_blocks, ret = 0, nr_to_writebump = 0; |
| + int needed_blocks, ret = 0; |
| + long desired_nr_to_write, nr_to_writebump = 0; |
| loff_t range_start = wbc->range_start; |
| struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb); |
| |
| @@ -2784,16 +2828,6 @@ static int ext4_da_writepages(struct add |
| if (unlikely(sbi->s_mount_flags & EXT4_MF_FS_ABORTED)) |
| return -EROFS; |
| |
| - /* |
| - * Make sure nr_to_write is >= sbi->s_mb_stream_request |
| - * This make sure small files blocks are allocated in |
| - * single attempt. This ensure that small files |
| - * get less fragmented. |
| - */ |
| - if (wbc->nr_to_write < sbi->s_mb_stream_request) { |
| - nr_to_writebump = sbi->s_mb_stream_request - wbc->nr_to_write; |
| - wbc->nr_to_write = sbi->s_mb_stream_request; |
| - } |
| if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) |
| range_whole = 1; |
| |
| @@ -2808,6 +2842,36 @@ static int ext4_da_writepages(struct add |
| } else |
| index = wbc->range_start >> PAGE_CACHE_SHIFT; |
| |
| + /* |
| + * This works around two forms of stupidity. The first is in |
| + * the writeback code, which caps the maximum number of pages |
| + * written to be 1024 pages. This is wrong on multiple |
| + * levels; different architectues have a different page size, |
| + * which changes the maximum amount of data which gets |
| + * written. Secondly, 4 megabytes is way too small. XFS |
| + * forces this value to be 16 megabytes by multiplying |
| + * nr_to_write parameter by four, and then relies on its |
| + * allocator to allocate larger extents to make them |
| + * contiguous. Unfortunately this brings us to the second |
| + * stupidity, which is that ext4's mballoc code only allocates |
| + * at most 2048 blocks. So we force contiguous writes up to |
| + * the number of dirty blocks in the inode, or |
| + * sbi->max_writeback_mb_bump whichever is smaller. |
| + */ |
| + max_pages = sbi->s_max_writeback_mb_bump << (20 - PAGE_CACHE_SHIFT); |
| + if (!range_cyclic && range_whole) |
| + desired_nr_to_write = wbc->nr_to_write * 8; |
| + else |
| + desired_nr_to_write = ext4_num_dirty_pages(inode, index, |
| + max_pages); |
| + if (desired_nr_to_write > max_pages) |
| + desired_nr_to_write = max_pages; |
| + |
| + if (wbc->nr_to_write < desired_nr_to_write) { |
| + nr_to_writebump = desired_nr_to_write - wbc->nr_to_write; |
| + wbc->nr_to_write = desired_nr_to_write; |
| + } |
| + |
| mpd.wbc = wbc; |
| mpd.inode = mapping->host; |
| |
| @@ -2926,7 +2990,8 @@ retry: |
| out_writepages: |
| if (!no_nrwrite_index_update) |
| wbc->no_nrwrite_index_update = 0; |
| - wbc->nr_to_write -= nr_to_writebump; |
| + if (wbc->nr_to_write > nr_to_writebump) |
| + wbc->nr_to_write -= nr_to_writebump; |
| wbc->range_start = range_start; |
| trace_ext4_da_writepages_result(inode, wbc, ret, pages_written); |
| return ret; |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -2199,6 +2199,7 @@ EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb |
| EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs); |
| EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request); |
| EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc); |
| +EXT4_RW_ATTR_SBI_UI(max_writeback_mb_bump, s_max_writeback_mb_bump); |
| |
| static struct attribute *ext4_attrs[] = { |
| ATTR_LIST(delayed_allocation_blocks), |
| @@ -2212,6 +2213,7 @@ static struct attribute *ext4_attrs[] = |
| ATTR_LIST(mb_order2_req), |
| ATTR_LIST(mb_stream_req), |
| ATTR_LIST(mb_group_prealloc), |
| + ATTR_LIST(max_writeback_mb_bump), |
| NULL, |
| }; |
| |
| @@ -2681,6 +2683,7 @@ static int ext4_fill_super(struct super_ |
| } |
| |
| sbi->s_stripe = ext4_get_stripe_size(sbi); |
| + sbi->s_max_writeback_mb_bump = 128; |
| |
| /* |
| * set up enough so that it can read an inode |
| --- a/include/trace/events/ext4.h |
| +++ b/include/trace/events/ext4.h |
| @@ -231,6 +231,7 @@ TRACE_EVENT(ext4_da_writepages, |
| __field( char, for_reclaim ) |
| __field( char, for_writepages ) |
| __field( char, range_cyclic ) |
| + __field( pgoff_t, writeback_index ) |
| ), |
| |
| TP_fast_assign( |
| @@ -245,14 +246,51 @@ TRACE_EVENT(ext4_da_writepages, |
| __entry->for_reclaim = wbc->for_reclaim; |
| __entry->for_writepages = wbc->for_writepages; |
| __entry->range_cyclic = wbc->range_cyclic; |
| + __entry->writeback_index = inode->i_mapping->writeback_index; |
| ), |
| |
| - TP_printk("dev %s ino %lu nr_t_write %ld pages_skipped %ld range_start %llu range_end %llu nonblocking %d for_kupdate %d for_reclaim %d for_writepages %d range_cyclic %d", |
| - jbd2_dev_to_name(__entry->dev), __entry->ino, __entry->nr_to_write, |
| + TP_printk("dev %s ino %lu nr_to_write %ld pages_skipped %ld range_start %llu range_end %llu nonblocking %d for_kupdate %d for_reclaim %d for_writepages %d range_cyclic %d writeback_index %lu", |
| + jbd2_dev_to_name(__entry->dev), |
| + (unsigned long) __entry->ino, __entry->nr_to_write, |
| __entry->pages_skipped, __entry->range_start, |
| __entry->range_end, __entry->nonblocking, |
| __entry->for_kupdate, __entry->for_reclaim, |
| - __entry->for_writepages, __entry->range_cyclic) |
| + __entry->for_writepages, __entry->range_cyclic, |
| + (unsigned long) __entry->writeback_index) |
| +); |
| + |
| +TRACE_EVENT(ext4_da_write_pages, |
| + TP_PROTO(struct inode *inode, struct mpage_da_data *mpd), |
| + |
| + TP_ARGS(inode, mpd), |
| + |
| + TP_STRUCT__entry( |
| + __field( dev_t, dev ) |
| + __field( ino_t, ino ) |
| + __field( __u64, b_blocknr ) |
| + __field( __u32, b_size ) |
| + __field( __u32, b_state ) |
| + __field( unsigned long, first_page ) |
| + __field( int, io_done ) |
| + __field( int, pages_written ) |
| + ), |
| + |
| + TP_fast_assign( |
| + __entry->dev = inode->i_sb->s_dev; |
| + __entry->ino = inode->i_ino; |
| + __entry->b_blocknr = mpd->b_blocknr; |
| + __entry->b_size = mpd->b_size; |
| + __entry->b_state = mpd->b_state; |
| + __entry->first_page = mpd->first_page; |
| + __entry->io_done = mpd->io_done; |
| + __entry->pages_written = mpd->pages_written; |
| + ), |
| + |
| + TP_printk("dev %s ino %lu b_blocknr %llu b_size %u b_state 0x%04x first_page %lu io_done %d pages_written %d", |
| + jbd2_dev_to_name(__entry->dev), (unsigned long) __entry->ino, |
| + __entry->b_blocknr, __entry->b_size, |
| + __entry->b_state, __entry->first_page, |
| + __entry->io_done, __entry->pages_written) |
| ); |
| |
| TRACE_EVENT(ext4_da_writepages_result, |
| @@ -270,6 +308,7 @@ TRACE_EVENT(ext4_da_writepages_result, |
| __field( char, encountered_congestion ) |
| __field( char, more_io ) |
| __field( char, no_nrwrite_index_update ) |
| + __field( pgoff_t, writeback_index ) |
| ), |
| |
| TP_fast_assign( |
| @@ -281,13 +320,16 @@ TRACE_EVENT(ext4_da_writepages_result, |
| __entry->encountered_congestion = wbc->encountered_congestion; |
| __entry->more_io = wbc->more_io; |
| __entry->no_nrwrite_index_update = wbc->no_nrwrite_index_update; |
| + __entry->writeback_index = inode->i_mapping->writeback_index; |
| ), |
| |
| - TP_printk("dev %s ino %lu ret %d pages_written %d pages_skipped %ld congestion %d more_io %d no_nrwrite_index_update %d", |
| - jbd2_dev_to_name(__entry->dev), __entry->ino, __entry->ret, |
| + TP_printk("dev %s ino %lu ret %d pages_written %d pages_skipped %ld congestion %d more_io %d no_nrwrite_index_update %d writeback_index %lu", |
| + jbd2_dev_to_name(__entry->dev), |
| + (unsigned long) __entry->ino, __entry->ret, |
| __entry->pages_written, __entry->pages_skipped, |
| __entry->encountered_congestion, __entry->more_io, |
| - __entry->no_nrwrite_index_update) |
| + __entry->no_nrwrite_index_update, |
| + (unsigned long) __entry->writeback_index) |
| ); |
| |
| TRACE_EVENT(ext4_da_write_begin, |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:55 2009 |
| Message-Id: <20091211042755.220900196@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:19 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Mingming Cao <cmm@us.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [41/90] ext4: release reserved quota when block reservation for delalloc retry |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0041-ext4-release-reserved-quota-when-block-reservation-f.patch |
| Content-Length: 953 |
| Lines: 31 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 9f0ccfd8e07d61b413e6536ffa02fbf60d2e20d8) |
| |
| ext4_da_reserve_space() can reserve quota blocks multiple times if |
| ext4_claim_free_blocks() fails and we retry the allocation. We should |
| release the quota reservation before restarting. |
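
The accounting problem can be modelled with a toy retry loop. This is only a sketch under stated assumptions: `struct demo_quota`, `demo_reserve_space()`, and `DEMO_MAX_RETRIES` are hypothetical, the quota reservation is assumed to happen on every pass through the retry label, and the number of failed claims is passed in rather than derived from real free-space state.

```c
#include <assert.h>

#define DEMO_MAX_RETRIES 3 /* illustrative only; not the kernel's retry policy */

struct demo_quota {
    long reserved; /* blocks currently reserved against quota */
};

/*
 * claim_failures: how many times the block claim fails before succeeding.
 * release_before_retry: 1 models the patched order, 0 the original order.
 * Returns 0 on success, -1 when the retries are exhausted.
 */
static int demo_reserve_space(struct demo_quota *q, long total,
                              int claim_failures, int release_before_retry)
{
    int retries = 0;

    assert(q && total > 0);
    for (;;) {
        q->reserved += total;          /* quota reservation on every pass */
        if (retries >= claim_failures)
            return 0;                  /* block claim succeeded */
        if (release_before_retry)
            q->reserved -= total;      /* patched: undo before any exit path */
        if (retries < DEMO_MAX_RETRIES) {
            retries++;
            continue;                  /* models "goto repeat" */
        }
        if (!release_before_retry)
            q->reserved -= total;      /* original: undo only when giving up */
        return -1;                     /* stands in for -ENOSPC */
    }
}
```

In this model, two failed claims followed by a success leave three reservations outstanding with the original order and exactly one with the patched order.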
| |
| Bug found by Jan Kara. |
| |
| Signed-off-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1855,11 +1855,11 @@ repeat: |
| |
| if (ext4_claim_free_blocks(sbi, total)) { |
| spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); |
| + vfs_dq_release_reservation_block(inode, total); |
| if (ext4_should_retry_alloc(inode->i_sb, &retries)) { |
| yield(); |
| goto repeat; |
| } |
| - vfs_dq_release_reservation_block(inode, total); |
| return -ENOSPC; |
| } |
| EXT4_I(inode)->i_reserved_data_blocks += nrblocks; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:56 2009 |
| Message-Id: <20091211042755.790514342@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:20 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [42/90] ext4: Split uninitialized extents for direct I/O |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0042-ext4-Split-uninitialized-extents-for-direct-I-O.patch |
| Content-Length: 21012 |
| Lines: 650 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 0031462b5b392f90d17f1d75abb795883c44e969) |
| |
| When writing into an uninitialized extent via direct I/O, and the direct |
| I/O doesn't exactly cover the uninitialized extent, split the extent |
| into uninitialized and initialized extents before submitting the I/O. |
| This avoids needing to deal with an ENOSPC error in the end_io |
| callback that gets used for direct I/O. |
| |
| When the IO is complete, the written extent will be marked as initialized. |
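
The split-then-convert idea can be sketched on plain block ranges. This is a simplified model rather than the extent-tree code in the patch: `struct demo_extent` and the `demo_*` functions are hypothetical, there is no journaling or tree rebalancing, and the write is assumed to lie entirely inside one uninitialized extent.

```c
#include <assert.h>

struct demo_extent {
    unsigned int start; /* first logical block */
    unsigned int len;   /* number of blocks */
    int uninit;         /* 1 = uninitialized (allocated but unwritten) */
};

/*
 * Split the uninitialized extent ex around the write [wstart, wstart + wlen).
 * Stores 1 to 3 pieces in out[], all still uninitialized, and sets *target
 * to the index of the piece that the write covers. Returns the piece count.
 */
static int demo_split_for_dio(struct demo_extent ex, unsigned int wstart,
                              unsigned int wlen, struct demo_extent out[3],
                              int *target)
{
    unsigned int ex_end = ex.start + ex.len;
    unsigned int wend = wstart + wlen;
    int n = 0;

    assert(ex.uninit && wlen > 0 && target);
    assert(wstart >= ex.start && wend <= ex_end);

    if (wstart > ex.start)
        out[n++] = (struct demo_extent){ ex.start, wstart - ex.start, 1 };
    *target = n;
    out[n++] = (struct demo_extent){ wstart, wlen, 1 };
    if (wend < ex_end)
        out[n++] = (struct demo_extent){ wend, ex_end - wend, 1 };
    return n;
}

/* Completion step: only a flag flips, so no new space is needed here. */
static void demo_end_io(struct demo_extent *written)
{
    assert(written);
    written->uninit = 0;
}
```

Because all the pieces exist before the I/O is submitted, the completion step in this model cannot run out of space, which is the property the commit message is after.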
| |
| Signed-off-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 22 ++ |
| fs/ext4/ext4_extents.h | 7 |
| fs/ext4/extents.c | 423 ++++++++++++++++++++++++++++++++++++++++++++----- |
| fs/ext4/inode.c | 3 |
| fs/ext4/migrate.c | 2 |
| fs/ext4/move_extent.c | 4 |
| 6 files changed, 419 insertions(+), 42 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -113,6 +113,15 @@ struct ext4_allocation_request { |
| unsigned int flags; |
| }; |
| |
| +typedef struct ext4_io_end { |
| + struct inode *inode; /* file being written to */ |
| + unsigned int flag; /* sync IO or AIO */ |
| + int error; /* I/O error code */ |
| + ext4_lblk_t offset; /* offset in the file */ |
| + size_t size; /* size of the extent */ |
| + struct work_struct work; /* data work queue */ |
| +} ext4_io_end_t; |
| + |
| /* |
| * Delayed allocation stuff |
| */ |
| @@ -348,7 +357,16 @@ struct ext4_new_group_data { |
| /* Call ext4_da_update_reserve_space() after successfully |
| allocating the blocks */ |
| #define EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE 0x0008 |
| - |
| + /* caller is from the direct IO path, request to creation of an |
| + unitialized extents if not allocated, split the uninitialized |
| + extent if blocks has been preallocated already*/ |
| +#define EXT4_GET_BLOCKS_DIO 0x0010 |
| +#define EXT4_GET_BLOCKS_CONVERT 0x0020 |
| +#define EXT4_GET_BLOCKS_DIO_CREATE_EXT (EXT4_GET_BLOCKS_DIO|\ |
| + EXT4_GET_BLOCKS_CREATE_UNINIT_EXT) |
| + /* Convert extent to initialized after direct IO complete */ |
| +#define EXT4_GET_BLOCKS_DIO_CONVERT_EXT (EXT4_GET_BLOCKS_CONVERT|\ |
| + EXT4_GET_BLOCKS_DIO_CREATE_EXT) |
| |
| /* |
| * ioctl commands |
| @@ -1702,6 +1720,8 @@ extern void ext4_ext_init(struct super_b |
| extern void ext4_ext_release(struct super_block *); |
| extern long ext4_fallocate(struct inode *inode, int mode, loff_t offset, |
| loff_t len); |
| +extern int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset, |
| + loff_t len); |
| extern int ext4_get_blocks(handle_t *handle, struct inode *inode, |
| sector_t block, unsigned int max_blocks, |
| struct buffer_head *bh, int flags); |
| --- a/fs/ext4/ext4_extents.h |
| +++ b/fs/ext4/ext4_extents.h |
| @@ -220,6 +220,11 @@ static inline int ext4_ext_get_actual_le |
| (le16_to_cpu(ext->ee_len) - EXT_INIT_MAX_LEN)); |
| } |
| |
| +static inline void ext4_ext_mark_initialized(struct ext4_extent *ext) |
| +{ |
| + ext->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ext)); |
| +} |
| + |
| extern int ext4_ext_calc_metadata_amount(struct inode *inode, int blocks); |
| extern ext4_fsblk_t ext_pblock(struct ext4_extent *ex); |
| extern ext4_fsblk_t idx_pblock(struct ext4_extent_idx *); |
| @@ -235,7 +240,7 @@ extern int ext4_ext_try_to_merge(struct |
| struct ext4_ext_path *path, |
| struct ext4_extent *); |
| extern unsigned int ext4_ext_check_overlap(struct inode *, struct ext4_extent *, struct ext4_ext_path *); |
| -extern int ext4_ext_insert_extent(handle_t *, struct inode *, struct ext4_ext_path *, struct ext4_extent *); |
| +extern int ext4_ext_insert_extent(handle_t *, struct inode *, struct ext4_ext_path *, struct ext4_extent *, int); |
| extern int ext4_ext_walk_space(struct inode *, ext4_lblk_t, ext4_lblk_t, |
| ext_prepare_callback, void *); |
| extern struct ext4_ext_path *ext4_ext_find_extent(struct inode *, ext4_lblk_t, |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -710,7 +710,7 @@ err: |
| * insert new index [@logical;@ptr] into the block at @curp; |
| * check where to insert: before @curp or after @curp |
| */ |
| -static int ext4_ext_insert_index(handle_t *handle, struct inode *inode, |
| +int ext4_ext_insert_index(handle_t *handle, struct inode *inode, |
| struct ext4_ext_path *curp, |
| int logical, ext4_fsblk_t ptr) |
| { |
| @@ -1572,7 +1572,7 @@ out: |
| */ |
| int ext4_ext_insert_extent(handle_t *handle, struct inode *inode, |
| struct ext4_ext_path *path, |
| - struct ext4_extent *newext) |
| + struct ext4_extent *newext, int flag) |
| { |
| struct ext4_extent_header *eh; |
| struct ext4_extent *ex, *fex; |
| @@ -1588,7 +1588,8 @@ int ext4_ext_insert_extent(handle_t *han |
| BUG_ON(path[depth].p_hdr == NULL); |
| |
| /* try to insert block into found extent and return */ |
| - if (ex && ext4_can_extents_be_merged(inode, ex, newext)) { |
| + if (ex && (flag != EXT4_GET_BLOCKS_DIO_CREATE_EXT) |
| + && ext4_can_extents_be_merged(inode, ex, newext)) { |
| ext_debug("append %d block to %d:%d (from %llu)\n", |
| ext4_ext_get_actual_len(newext), |
| le32_to_cpu(ex->ee_block), |
| @@ -1703,7 +1704,8 @@ has_space: |
| |
| merge: |
| /* try to merge extents to the right */ |
| - ext4_ext_try_to_merge(inode, path, nearex); |
| + if (flag != EXT4_GET_BLOCKS_DIO_CREATE_EXT) |
| + ext4_ext_try_to_merge(inode, path, nearex); |
| |
| /* try to merge extents to the left */ |
| |
| @@ -2470,7 +2472,6 @@ static int ext4_ext_zeroout(struct inode |
| } |
| |
| #define EXT4_EXT_ZERO_LEN 7 |
| - |
| /* |
| * This function is called by ext4_ext_get_blocks() if someone tries to write |
| * to an uninitialized extent. It may result in splitting the uninitialized |
| @@ -2563,7 +2564,8 @@ static int ext4_ext_convert_to_initializ |
| ex3->ee_block = cpu_to_le32(iblock); |
| ext4_ext_store_pblock(ex3, newblock); |
| ex3->ee_len = cpu_to_le16(allocated); |
| - err = ext4_ext_insert_extent(handle, inode, path, ex3); |
| + err = ext4_ext_insert_extent(handle, inode, path, |
| + ex3, 0); |
| if (err == -ENOSPC) { |
| err = ext4_ext_zeroout(inode, &orig_ex); |
| if (err) |
| @@ -2619,7 +2621,7 @@ static int ext4_ext_convert_to_initializ |
| ext4_ext_store_pblock(ex3, newblock + max_blocks); |
| ex3->ee_len = cpu_to_le16(allocated - max_blocks); |
| ext4_ext_mark_uninitialized(ex3); |
| - err = ext4_ext_insert_extent(handle, inode, path, ex3); |
| + err = ext4_ext_insert_extent(handle, inode, path, ex3, 0); |
| if (err == -ENOSPC) { |
| err = ext4_ext_zeroout(inode, &orig_ex); |
| if (err) |
| @@ -2737,7 +2739,7 @@ static int ext4_ext_convert_to_initializ |
| err = ext4_ext_dirty(handle, inode, path + depth); |
| goto out; |
| insert: |
| - err = ext4_ext_insert_extent(handle, inode, path, &newex); |
| + err = ext4_ext_insert_extent(handle, inode, path, &newex, 0); |
| if (err == -ENOSPC) { |
| err = ext4_ext_zeroout(inode, &orig_ex); |
| if (err) |
| @@ -2764,6 +2766,320 @@ fix_extent_len: |
| } |
| |
| /* |
| + * This function is called by ext4_ext_get_blocks() from |
| + * ext4_get_block_dio_write() when a direct IO (DIO) write targets |
| + * an uninitialized extent. |
| + * |
| + * Writing to an uninitialized extent may result in splitting the uninitialized |
| + * extent into multiple initialized/uninitialized extents (up to three). |
| + * There are three possibilities: |
| + * a> There is no split required: Entire extent should be uninitialized |
| + * b> Splits in two extents: Write is happening at either end of the extent |
| + * c> Splits in three extents: Someone is writing in the middle of the extent |
| + * |
| + * One or more index blocks may be needed if the extent tree grows after |
| + * the uninitialized extent is split. To prevent ENOSPC from occurring at IO |
| + * completion, we need to split the uninitialized extent before DIO submits |
| + * the IO. The uninitialized extent passed in at this time will be split |
| + * into at most three uninitialized extents. After the IO completes, the part |
| + * being filled will be converted to initialized by the end_io callback function |
| + * via ext4_convert_unwritten_extents(). |
| + */ |
| +static int ext4_split_unwritten_extents(handle_t *handle, |
| + struct inode *inode, |
| + struct ext4_ext_path *path, |
| + ext4_lblk_t iblock, |
| + unsigned int max_blocks, |
| + int flags) |
| +{ |
| + struct ext4_extent *ex, newex, orig_ex; |
| + struct ext4_extent *ex1 = NULL; |
| + struct ext4_extent *ex2 = NULL; |
| + struct ext4_extent *ex3 = NULL; |
| + struct ext4_extent_header *eh; |
| + ext4_lblk_t ee_block; |
| + unsigned int allocated, ee_len, depth; |
| + ext4_fsblk_t newblock; |
| + int err = 0; |
| + int ret = 0; |
| + |
| + ext_debug("ext4_split_unwritten_extents: inode %lu," |
| + "iblock %llu, max_blocks %u\n", inode->i_ino, |
| + (unsigned long long)iblock, max_blocks); |
| + depth = ext_depth(inode); |
| + eh = path[depth].p_hdr; |
| + ex = path[depth].p_ext; |
| + ee_block = le32_to_cpu(ex->ee_block); |
| + ee_len = ext4_ext_get_actual_len(ex); |
| + allocated = ee_len - (iblock - ee_block); |
| + newblock = iblock - ee_block + ext_pblock(ex); |
| + ex2 = ex; |
| + orig_ex.ee_block = ex->ee_block; |
| + orig_ex.ee_len = cpu_to_le16(ee_len); |
| + ext4_ext_store_pblock(&orig_ex, ext_pblock(ex)); |
| + |
| + /* |
| + * if the entire uninitialized extent length is less than |
| + * the size of the extent to write, there is no need to split |
| + * the uninitialized extent |
| + */ |
| + if (allocated <= max_blocks) |
| + return ret; |
| + |
| + err = ext4_ext_get_access(handle, inode, path + depth); |
| + if (err) |
| + goto out; |
| + /* ex1: ee_block to iblock - 1 : uninitialized */ |
| + if (iblock > ee_block) { |
| + ex1 = ex; |
| + ex1->ee_len = cpu_to_le16(iblock - ee_block); |
| + ext4_ext_mark_uninitialized(ex1); |
| + ex2 = &newex; |
| + } |
| + /* |
| + * for sanity, update the length of the ex2 extent before |
| + * we insert ex3, if ex1 is NULL. This is to avoid temporary |
| + * overlap of blocks. |
| + */ |
| + if (!ex1 && allocated > max_blocks) |
| + ex2->ee_len = cpu_to_le16(max_blocks); |
| + /* ex3: to ee_block + ee_len : uninitialised */ |
| + if (allocated > max_blocks) { |
| + unsigned int newdepth; |
| + ex3 = &newex; |
| + ex3->ee_block = cpu_to_le32(iblock + max_blocks); |
| + ext4_ext_store_pblock(ex3, newblock + max_blocks); |
| + ex3->ee_len = cpu_to_le16(allocated - max_blocks); |
| + ext4_ext_mark_uninitialized(ex3); |
| + err = ext4_ext_insert_extent(handle, inode, path, ex3, flags); |
| + if (err == -ENOSPC) { |
| + err = ext4_ext_zeroout(inode, &orig_ex); |
| + if (err) |
| + goto fix_extent_len; |
| + /* update the extent length and mark as initialized */ |
| + ex->ee_block = orig_ex.ee_block; |
| + ex->ee_len = orig_ex.ee_len; |
| + ext4_ext_store_pblock(ex, ext_pblock(&orig_ex)); |
| + ext4_ext_dirty(handle, inode, path + depth); |
| + /* zeroed the full extent */ |
| + /* blocks available from iblock */ |
| + return allocated; |
| + |
| + } else if (err) |
| + goto fix_extent_len; |
| + /* |
| + * The depth, and hence eh & ex might change |
| + * as part of the insert above. |
| + */ |
| + newdepth = ext_depth(inode); |
| + /* |
| + * update the extent length after successful insert of the |
| + * split extent |
| + */ |
| + orig_ex.ee_len = cpu_to_le16(ee_len - |
| + ext4_ext_get_actual_len(ex3)); |
| + depth = newdepth; |
| + ext4_ext_drop_refs(path); |
| + path = ext4_ext_find_extent(inode, iblock, path); |
| + if (IS_ERR(path)) { |
| + err = PTR_ERR(path); |
| + goto out; |
| + } |
| + eh = path[depth].p_hdr; |
| + ex = path[depth].p_ext; |
| + if (ex2 != &newex) |
| + ex2 = ex; |
| + |
| + err = ext4_ext_get_access(handle, inode, path + depth); |
| + if (err) |
| + goto out; |
| + |
| + allocated = max_blocks; |
| + } |
| + /* |
| + * If there was a change of depth as part of the |
| + * insertion of ex3 above, we need to update the length |
| + * of the ex1 extent again here |
| + */ |
| + if (ex1 && ex1 != ex) { |
| + ex1 = ex; |
| + ex1->ee_len = cpu_to_le16(iblock - ee_block); |
| + ext4_ext_mark_uninitialized(ex1); |
| + ex2 = &newex; |
| + } |
| + /* |
| + * ex2: iblock to iblock + maxblocks-1 : to be direct IO written, |
| + * uninitialised still. |
| + */ |
| + ex2->ee_block = cpu_to_le32(iblock); |
| + ext4_ext_store_pblock(ex2, newblock); |
| + ex2->ee_len = cpu_to_le16(allocated); |
| + ext4_ext_mark_uninitialized(ex2); |
| + if (ex2 != ex) |
| + goto insert; |
| + /* Mark modified extent as dirty */ |
| + err = ext4_ext_dirty(handle, inode, path + depth); |
| + ext_debug("out here\n"); |
| + goto out; |
| +insert: |
| + err = ext4_ext_insert_extent(handle, inode, path, &newex, flags); |
| + if (err == -ENOSPC) { |
| + err = ext4_ext_zeroout(inode, &orig_ex); |
| + if (err) |
| + goto fix_extent_len; |
| + /* update the extent length and mark as initialized */ |
| + ex->ee_block = orig_ex.ee_block; |
| + ex->ee_len = orig_ex.ee_len; |
| + ext4_ext_store_pblock(ex, ext_pblock(&orig_ex)); |
| + ext4_ext_dirty(handle, inode, path + depth); |
| + /* zero out the first half */ |
| + return allocated; |
| + } else if (err) |
| + goto fix_extent_len; |
| +out: |
| + ext4_ext_show_leaf(inode, path); |
| + return err ? err : allocated; |
| + |
| +fix_extent_len: |
| + ex->ee_block = orig_ex.ee_block; |
| + ex->ee_len = orig_ex.ee_len; |
| + ext4_ext_store_pblock(ex, ext_pblock(&orig_ex)); |
| + ext4_ext_mark_uninitialized(ex); |
| + ext4_ext_dirty(handle, inode, path + depth); |
| + return err; |
| +} |
| +static int ext4_convert_unwritten_extents_dio(handle_t *handle, |
| + struct inode *inode, |
| + struct ext4_ext_path *path) |
| +{ |
| + struct ext4_extent *ex; |
| + struct ext4_extent_header *eh; |
| + int depth; |
| + int err = 0; |
| + int ret = 0; |
| + |
| + depth = ext_depth(inode); |
| + eh = path[depth].p_hdr; |
| + ex = path[depth].p_ext; |
| + |
| + err = ext4_ext_get_access(handle, inode, path + depth); |
| + if (err) |
| + goto out; |
| + /* first mark the extent as initialized */ |
| + ext4_ext_mark_initialized(ex); |
| + |
| + /* |
| + * We have to see if it can be merged with the extent |
| + * on the left. |
| + */ |
| + if (ex > EXT_FIRST_EXTENT(eh)) { |
| + /* |
| + * To merge left, pass "ex - 1" to try_to_merge(), |
| + * since it merges towards right _only_. |
| + */ |
| + ret = ext4_ext_try_to_merge(inode, path, ex - 1); |
| + if (ret) { |
| + err = ext4_ext_correct_indexes(handle, inode, path); |
| + if (err) |
| + goto out; |
| + depth = ext_depth(inode); |
| + ex--; |
| + } |
| + } |
| + /* |
| + * Try to Merge towards right. |
| + */ |
| + ret = ext4_ext_try_to_merge(inode, path, ex); |
| + if (ret) { |
| + err = ext4_ext_correct_indexes(handle, inode, path); |
| + if (err) |
| + goto out; |
| + depth = ext_depth(inode); |
| + } |
| + /* Mark modified extent as dirty */ |
| + err = ext4_ext_dirty(handle, inode, path + depth); |
| +out: |
| + ext4_ext_show_leaf(inode, path); |
| + return err; |
| +} |
| + |
| +static int |
| +ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode, |
| + ext4_lblk_t iblock, unsigned int max_blocks, |
| + struct ext4_ext_path *path, int flags, |
| + unsigned int allocated, struct buffer_head *bh_result, |
| + ext4_fsblk_t newblock) |
| +{ |
| + int ret = 0; |
| + int err = 0; |
| + |
| + ext_debug("ext4_ext_handle_uninitialized_extents: inode %lu, logical" |
| + "block %llu, max_blocks %u, flags %d, allocated %u", |
| + inode->i_ino, (unsigned long long)iblock, max_blocks, |
| + flags, allocated); |
| + ext4_ext_show_leaf(inode, path); |
| + |
| + /* DIO get_block() before submit the IO, split the extent */ |
| + if (flags == EXT4_GET_BLOCKS_DIO_CREATE_EXT) { |
| + ret = ext4_split_unwritten_extents(handle, |
| + inode, path, iblock, |
| + max_blocks, flags); |
| + goto out; |
| + } |
| + /* DIO end_io complete, convert the filled extent to written */ |
| + if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) { |
| + ret = ext4_convert_unwritten_extents_dio(handle, inode, |
| + path); |
| + goto out2; |
| + } |
| + /* buffered IO case */ |
| + /* |
| + * repeat fallocate creation request |
| + * we already have an unwritten extent |
| + */ |
| + if (flags & EXT4_GET_BLOCKS_UNINIT_EXT) |
| + goto map_out; |
| + |
| + /* buffered READ or buffered write_begin() lookup */ |
| + if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) { |
| + /* |
| + * We have blocks reserved already. We |
| + * return allocated blocks so that delalloc |
| + * won't do block reservation for us. But |
| + * the buffer head will be unmapped so that |
| + * a read from the block returns 0s. |
| + */ |
| + set_buffer_unwritten(bh_result); |
| + goto out1; |
| + } |
| + |
| + /* buffered write, writepage time, convert*/ |
| + ret = ext4_ext_convert_to_initialized(handle, inode, |
| + path, iblock, |
| + max_blocks); |
| +out: |
| + if (ret <= 0) { |
| + err = ret; |
| + goto out2; |
| + } else |
| + allocated = ret; |
| + set_buffer_new(bh_result); |
| +map_out: |
| + set_buffer_mapped(bh_result); |
| +out1: |
| + if (allocated > max_blocks) |
| + allocated = max_blocks; |
| + ext4_ext_show_leaf(inode, path); |
| + bh_result->b_bdev = inode->i_sb->s_bdev; |
| + bh_result->b_blocknr = newblock; |
| +out2: |
| + if (path) { |
| + ext4_ext_drop_refs(path); |
| + kfree(path); |
| + } |
| + return err ? err : allocated; |
| +} |
| +/* |
| * Block allocation/map/preallocation routine for extents based files |
| * |
| * |
| @@ -2868,33 +3184,10 @@ int ext4_ext_get_blocks(handle_t *handle |
| EXT4_EXT_CACHE_EXTENT); |
| goto out; |
| } |
| - if (flags & EXT4_GET_BLOCKS_UNINIT_EXT) |
| - goto out; |
| - if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) { |
| - if (allocated > max_blocks) |
| - allocated = max_blocks; |
| - /* |
| - * We have blocks reserved already. We |
| - * return allocated blocks so that delalloc |
| - * won't do block reservation for us. But |
| - * the buffer head will be unmapped so that |
| - * a read from the block returns 0s. |
| - */ |
| - set_buffer_unwritten(bh_result); |
| - bh_result->b_bdev = inode->i_sb->s_bdev; |
| - bh_result->b_blocknr = newblock; |
| - goto out2; |
| - } |
| - |
| - ret = ext4_ext_convert_to_initialized(handle, inode, |
| - path, iblock, |
| - max_blocks); |
| - if (ret <= 0) { |
| - err = ret; |
| - goto out2; |
| - } else |
| - allocated = ret; |
| - goto outnew; |
| + ret = ext4_ext_handle_uninitialized_extents(handle, |
| + inode, iblock, max_blocks, path, |
| + flags, allocated, bh_result, newblock); |
| + return ret; |
| } |
| } |
| |
| @@ -2967,7 +3260,7 @@ int ext4_ext_get_blocks(handle_t *handle |
| newex.ee_len = cpu_to_le16(ar.len); |
| if (flags & EXT4_GET_BLOCKS_UNINIT_EXT) /* Mark uninitialized */ |
| ext4_ext_mark_uninitialized(&newex); |
| - err = ext4_ext_insert_extent(handle, inode, path, &newex); |
| + err = ext4_ext_insert_extent(handle, inode, path, &newex, flags); |
| if (err) { |
| /* free data blocks we just allocated */ |
| /* not a good idea to call discard here directly, |
| @@ -2981,7 +3274,6 @@ int ext4_ext_get_blocks(handle_t *handle |
| /* previous routine could use block we allocated */ |
| newblock = ext_pblock(&newex); |
| allocated = ext4_ext_get_actual_len(&newex); |
| -outnew: |
| set_buffer_new(bh_result); |
| |
| /* Cache only when it is _not_ an uninitialized extent */ |
| @@ -3180,6 +3472,63 @@ retry: |
| } |
| |
| /* |
| + * This function converts a range of blocks to written extents. |
| + * The caller of this function will pass the start offset and the size. |
| + * All unwritten extents within this range will be converted to |
| + * written extents. |
| + * |
| + * This function is called from the direct IO end io call back |
| + * function, to convert the fallocated extents after IO is completed. |
| + */ |
| +int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset, |
| + loff_t len) |
| +{ |
| + handle_t *handle; |
| + ext4_lblk_t block; |
| + unsigned int max_blocks; |
| + int ret = 0; |
| + int ret2 = 0; |
| + struct buffer_head map_bh; |
| + unsigned int credits, blkbits = inode->i_blkbits; |
| + |
| + block = offset >> blkbits; |
| + /* |
| + * We can't just convert len to max_blocks: with blocksize = 4096, |
| + * offset = 3072 and len = 2048 the range spans two blocks, not one. |
| + */ |
| + max_blocks = (EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits) |
| + - block; |
| + /* |
| + * credits to insert 1 extent into extent tree |
| + */ |
| + credits = ext4_chunk_trans_blocks(inode, max_blocks); |
| + while (ret >= 0 && ret < max_blocks) { |
| + block = block + ret; |
| + max_blocks = max_blocks - ret; |
| + handle = ext4_journal_start(inode, credits); |
| + if (IS_ERR(handle)) { |
| + ret = PTR_ERR(handle); |
| + break; |
| + } |
| + map_bh.b_state = 0; |
| + ret = ext4_get_blocks(handle, inode, block, |
| + max_blocks, &map_bh, |
| + EXT4_GET_BLOCKS_DIO_CONVERT_EXT); |
| + if (ret <= 0) { |
| + WARN_ON(ret <= 0); |
| + printk(KERN_ERR "%s: ext4_ext_get_blocks " |
| + "returned error inode#%lu, block=%u, " |
| + "max_blocks=%u", __func__, |
| + inode->i_ino, block, max_blocks); |
| + } |
| + ext4_mark_inode_dirty(handle, inode); |
| + ret2 = ext4_journal_stop(handle); |
| + if (ret <= 0 || ret2 ) |
| + break; |
| + } |
| + return ret > 0 ? ret2 : ret; |
| +} |
| +/* |
| * Callback function called for each extent to gather FIEMAP information. |
| */ |
| static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path, |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1234,6 +1234,9 @@ int ext4_get_blocks(handle_t *handle, st |
| clear_buffer_mapped(bh); |
| clear_buffer_unwritten(bh); |
| |
| + ext_debug("ext4_get_blocks(): inode %lu, flag %d, max_blocks %u," |
| + "logical block %lu\n", inode->i_ino, flags, max_blocks, |
| + (unsigned long)block); |
| /* |
| * Try to see if we can get the block without requesting a new |
| * file system block. |
| --- a/fs/ext4/migrate.c |
| +++ b/fs/ext4/migrate.c |
| @@ -75,7 +75,7 @@ static int finish_range(handle_t *handle |
| goto err_out; |
| } |
| } |
| - retval = ext4_ext_insert_extent(handle, inode, path, &newext); |
| + retval = ext4_ext_insert_extent(handle, inode, path, &newext, 0); |
| err_out: |
| if (path) { |
| ext4_ext_drop_refs(path); |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -322,7 +322,7 @@ mext_insert_across_blocks(handle_t *hand |
| goto out; |
| |
| if (ext4_ext_insert_extent(handle, orig_inode, |
| - orig_path, new_ext)) |
| + orig_path, new_ext, 0)) |
| goto out; |
| } |
| |
| @@ -333,7 +333,7 @@ mext_insert_across_blocks(handle_t *hand |
| goto out; |
| |
| if (ext4_ext_insert_extent(handle, orig_inode, |
| - orig_path, end_ext)) |
| + orig_path, end_ext, 0)) |
| goto out; |
| } |
| out: |
| |
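Reviewer note (not part of the patch): the block-count comment in ext4_convert_unwritten_extents() is terse, so here is a minimal userspace sketch of the same arithmetic. The helper names block_align() and blocks_in_range() are hypothetical stand-ins for illustration; only the formula mirrors the patch, and the kernel's EXT4_BLOCK_ALIGN() is assumed to round up to the next block boundary.

```c
#include <assert.h>

/* Hypothetical stand-in for EXT4_BLOCK_ALIGN(): round size up to a
 * multiple of the block size (1 << blkbits). */
static unsigned long long block_align(unsigned long long size,
				      unsigned int blkbits)
{
	unsigned long long mask = (1ULL << blkbits) - 1;

	assert(blkbits < 63);
	return (size + mask) & ~mask;
}

/* Number of blocks touched by the byte range [offset, offset + len),
 * computed the same way the patch computes max_blocks. */
static unsigned int blocks_in_range(unsigned long long offset,
				    unsigned long long len,
				    unsigned int blkbits)
{
	unsigned long long first = offset >> blkbits;
	unsigned long long end = block_align(offset + len, blkbits) >> blkbits;

	return (unsigned int)(end - first);
}
```

With blkbits = 12, offset = 3072 and len = 2048, the naive len >> blkbits yields 0, while blocks_in_range() yields 2 because the range crosses the boundary between block 0 and block 1.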
| |
| From linux@linux.site Thu Dec 10 20:27:56 2009 |
| Message-Id: <20091211042756.339680773@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:21 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [43/90] ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0043-ext4-Use-end_io-callback-to-avoid-direct-I-O-fallbac.patch |
| Content-Length: 9561 |
| Lines: 301 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 4c0425ff68b1b87b802ffeda7b6a46ff7da7241c) |
| |
| Currently the DIO VFS code passes create = 0 when writing to the |
| middle of a file. It does this to avoid block allocation for holes, so |
| as not to expose stale data when there is a parallel buffered read |
| (which does not hold the i_mutex lock). Direct I/O writes into holes |
| fall back to buffered IO for this reason. |
| |
| Since preallocated extents are treated as holes when doing a |
| get_block() lookup (the buffer is not mapped), direct IO over fallocated |
| ranges also falls back to buffered IO. Thus ext4 silently falls |
| back to buffered IO in both of these cases, which is undesirable. |
| |
| To fix this, this patch creates uninitialized extents when a direct I/O |
| write lands in a hole in a sparse file, and registers an end_io callback which |
| converts the uninitialized extent to an initialized extent after the |
| I/O is completed. |
| |
| Signed-off-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 3 |
| fs/ext4/inode.c | 197 +++++++++++++++++++++++++++++++++++++++++++++++++++++++- |
| fs/ext4/super.c | 11 +++ |
| 3 files changed, 210 insertions(+), 1 deletion(-) |
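Reviewer note (not part of the patch): the two-phase scheme described above can be modelled in userspace as below. This is a toy sketch under stated assumptions, not the kernel code: every name (toy_file, toy_dio_submit, toy_end_io, toy_read) is hypothetical, one byte stands in for a block's data, and a per-block state array stands in for the extent tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBLOCKS 8

enum blk_state { BLK_HOLE = 0, BLK_UNWRITTEN, BLK_WRITTEN };

struct toy_file {
	enum blk_state state[NBLOCKS];	/* toy replacement for extents */
	char data[NBLOCKS];		/* one byte of payload per block */
};

/* Phase 1 (get_block time): blocks in a hole are allocated but kept
 * unwritten while the data is being transferred. */
static void toy_dio_submit(struct toy_file *f, size_t first, size_t n,
			   char byte)
{
	for (size_t i = first; i < first + n && i < NBLOCKS; i++) {
		if (f->state[i] == BLK_HOLE)
			f->state[i] = BLK_UNWRITTEN;
		f->data[i] = byte;
	}
}

/* Phase 2 (end_io time): the range that was written is converted. */
static void toy_end_io(struct toy_file *f, size_t first, size_t n)
{
	for (size_t i = first; i < first + n && i < NBLOCKS; i++)
		if (f->state[i] == BLK_UNWRITTEN)
			f->state[i] = BLK_WRITTEN;
}

/* A parallel buffered read sees zeros until the conversion happens. */
static char toy_read(const struct toy_file *f, size_t i)
{
	assert(i < NBLOCKS);
	return f->state[i] == BLK_WRITTEN ? f->data[i] : 0;
}
```

The point of the ordering is that a reader racing with the write never observes stale block contents: until toy_end_io() runs, toy_read() returns 0 for the affected blocks.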
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -1002,6 +1002,9 @@ struct ext4_sb_info { |
| |
| unsigned int s_log_groups_per_flex; |
| struct flex_groups *s_flex_groups; |
| + |
| + /* workqueue for dio unwritten */ |
| + struct workqueue_struct *dio_unwritten_wq; |
| }; |
| |
| static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb) |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -37,6 +37,7 @@ |
| #include <linux/namei.h> |
| #include <linux/uio.h> |
| #include <linux/bio.h> |
| +#include <linux/workqueue.h> |
| |
| #include "ext4_jbd2.h" |
| #include "xattr.h" |
| @@ -3350,6 +3351,8 @@ static int ext4_releasepage(struct page |
| } |
| |
| /* |
| + * O_DIRECT for ext3 (or indirect map) based files |
| + * |
| * If the O_DIRECT write will extend the file then add this inode to the |
| * orphan list. So recovery will truncate it back to the original size |
| * if the machine crashes during the write. |
| @@ -3358,7 +3361,7 @@ static int ext4_releasepage(struct page |
| * crashes then stale disk data _may_ be exposed inside the file. But current |
| * VFS code falls back into buffered path in that case so we are safe. |
| */ |
| -static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb, |
| +static ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb, |
| const struct iovec *iov, loff_t offset, |
| unsigned long nr_segs) |
| { |
| @@ -3432,6 +3435,198 @@ out: |
| return ret; |
| } |
| |
| +/* Maximum number of blocks we map for direct IO at once. */ |
| + |
| +static int ext4_get_block_dio_write(struct inode *inode, sector_t iblock, |
| + struct buffer_head *bh_result, int create) |
| +{ |
| + handle_t *handle = NULL; |
| + int ret = 0; |
| + unsigned max_blocks = bh_result->b_size >> inode->i_blkbits; |
| + int dio_credits; |
| + |
| + /* |
| + * DIO VFS code passes create = 0 flag for write to |
| + * the middle of file. It does this to avoid block |
| + * allocation for holes, to prevent expose stale data |
| + * out when there is parallel buffered read (which does |
| + * not hold the i_mutex lock) while direct IO write has |
| + * not completed. DIO request on holes finally falls back |
| + * to buffered IO for this reason. |
| + * |
| + * For ext4 extent based file, since we support fallocate, |
| + * new allocated extent as uninitialized, for holes, we |
| + * could fallocate blocks for holes, thus parallel |
| + * buffered IO read will zero out the page when read on |
| + * a hole while parallel DIO write to the hole has not completed. |
| + * |
| + * when we come here, we know it's a direct IO write to |
| + * the middle of file (<i_size) |
| + * so it's safe to override the create flag from VFS. |
| + */ |
| + create = EXT4_GET_BLOCKS_DIO_CREATE_EXT; |
| + |
| + if (max_blocks > DIO_MAX_BLOCKS) |
| + max_blocks = DIO_MAX_BLOCKS; |
| + dio_credits = ext4_chunk_trans_blocks(inode, max_blocks); |
| + handle = ext4_journal_start(inode, dio_credits); |
| + if (IS_ERR(handle)) { |
| + ret = PTR_ERR(handle); |
| + goto out; |
| + } |
| + ret = ext4_get_blocks(handle, inode, iblock, max_blocks, bh_result, |
| + create); |
| + if (ret > 0) { |
| + bh_result->b_size = (ret << inode->i_blkbits); |
| + ret = 0; |
| + } |
| + ext4_journal_stop(handle); |
| +out: |
| + return ret; |
| +} |
| + |
| +#define DIO_AIO 0x1 |
| + |
| +static void ext4_free_io_end(ext4_io_end_t *io) |
| +{ |
| + kfree(io); |
| +} |
| + |
| +/* |
| + * IO write completion for unwritten extents. |
| + * |
| + * check a range of space and convert unwritten extents to written. |
| + */ |
| +static void ext4_end_dio_unwritten(struct work_struct *work) |
| +{ |
| + ext4_io_end_t *io = container_of(work, ext4_io_end_t, work); |
| + struct inode *inode = io->inode; |
| + loff_t offset = io->offset; |
| + size_t size = io->size; |
| + int ret = 0; |
| + int aio = io->flag & DIO_AIO; |
| + |
| + if (aio) |
| + mutex_lock(&inode->i_mutex); |
| + if (offset + size <= i_size_read(inode)) |
| + ret = ext4_convert_unwritten_extents(inode, offset, size); |
| + |
| + if (ret < 0) |
| + printk(KERN_EMERG "%s: failed to convert unwritten " |
| + "extents to written extents, error is %d\n", |
| + __func__, ret); |
| + |
| + ext4_free_io_end(io); |
| + if (aio) |
| + mutex_unlock(&inode->i_mutex); |
| +} |
| + |
| +static ext4_io_end_t *ext4_init_io_end (struct inode *inode, unsigned int flag) |
| +{ |
| + ext4_io_end_t *io = NULL; |
| + |
| + io = kmalloc(sizeof(*io), GFP_NOFS); |
| + |
| + if (io) { |
| + io->inode = inode; |
| + io->flag = flag; |
| + io->offset = 0; |
| + io->size = 0; |
| + io->error = 0; |
| + INIT_WORK(&io->work, ext4_end_dio_unwritten); |
| + } |
| + |
| + return io; |
| +} |
| + |
| +static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset, |
| + ssize_t size, void *private) |
| +{ |
| + ext4_io_end_t *io_end = iocb->private; |
| + struct workqueue_struct *wq; |
| + |
| + /* if not hole or unwritten extents, just simple return */ |
| + if (!io_end || !size || !iocb->private) |
| + return; |
| + io_end->offset = offset; |
| + io_end->size = size; |
| + wq = EXT4_SB(io_end->inode->i_sb)->dio_unwritten_wq; |
| + |
| + /* We need to convert unwritten extents to written */ |
| + queue_work(wq, &io_end->work); |
| + |
| + if (is_sync_kiocb(iocb)) |
| + flush_workqueue(wq); |
| + |
| + iocb->private = NULL; |
| +} |
| +/* |
| + * For ext4 extent files, ext4 will do direct IO writes to holes, |
| + * preallocated extents, and writes that extend the file, with no need to |
| + * fall back to buffered IO. |
| + * |
| + * For holes, we fallocate those blocks and mark them as uninitialized. |
| + * If those blocks were preallocated, we make sure they are split, but |
| + * still keep the range to write as uninitialized. |
| + * |
| + * When the end_io callback function is called at the last IO completion, |
| + * those extents will be converted to written extents. |
| + * |
| + * If the O_DIRECT write will extend the file then add this inode to the |
| + * orphan list. So recovery will truncate it back to the original size |
| + * if the machine crashes during the write. |
| + * |
| + */ |
| +static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb, |
| + const struct iovec *iov, loff_t offset, |
| + unsigned long nr_segs) |
| +{ |
| + struct file *file = iocb->ki_filp; |
| + struct inode *inode = file->f_mapping->host; |
| + ssize_t ret; |
| + size_t count = iov_length(iov, nr_segs); |
| + |
| + loff_t final_size = offset + count; |
| + if (rw == WRITE && final_size <= inode->i_size) { |
| + /* |
| + * For DIO writes to holes, we fallocate blocks. |
| + * The fallocated extent for a hole is marked as uninitialized |
| + * to prevent a parallel buffered read from exposing the stale data |
| + * before DIO completes the data IO. |
| + * as for previously fallocated extents, ext4 get_block |
| + * will just simply mark the buffer mapped but still |
| + * keep the extents uninitialized. |
| + * |
| + * At the end of IO, the ext4 end_io callback function |
| + * will convert those unwritten extents to written, |
| + * |
| + */ |
| + iocb->private = ext4_init_io_end(inode, !is_sync_kiocb(iocb)); |
| + if (!iocb->private) |
| + return -ENOMEM; |
| + ret = blockdev_direct_IO(rw, iocb, inode, |
| + inode->i_sb->s_bdev, iov, |
| + offset, nr_segs, |
| + ext4_get_block_dio_write, |
| + ext4_end_io_dio); |
| + return ret; |
| + } |
| + return ext4_ind_direct_IO(rw, iocb, iov, offset, nr_segs); |
| +} |
| + |
| +static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb, |
| + const struct iovec *iov, loff_t offset, |
| + unsigned long nr_segs) |
| +{ |
| + struct file *file = iocb->ki_filp; |
| + struct inode *inode = file->f_mapping->host; |
| + |
| + if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) |
| + return ext4_ext_direct_IO(rw, iocb, iov, offset, nr_segs); |
| + |
| + return ext4_ind_direct_IO(rw, iocb, iov, offset, nr_segs); |
| +} |
| + |
| /* |
| * Pages can be marked dirty completely asynchronously from ext4's journalling |
| * activity. By filemap_sync_pte(), try_to_unmap_one(), etc. We cannot do |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -580,6 +580,9 @@ static void ext4_put_super(struct super_ |
| struct ext4_super_block *es = sbi->s_es; |
| int i, err; |
| |
| + flush_workqueue(sbi->dio_unwritten_wq); |
| + destroy_workqueue(sbi->dio_unwritten_wq); |
| + |
| lock_super(sb); |
| lock_kernel(); |
| if (sb->s_dirt) |
| @@ -2809,6 +2812,12 @@ no_journal: |
| clear_opt(sbi->s_mount_opt, NOBH); |
| } |
| } |
| + EXT4_SB(sb)->dio_unwritten_wq = create_workqueue("ext4-dio-unwritten"); |
| + if (!EXT4_SB(sb)->dio_unwritten_wq) { |
| + printk(KERN_ERR "EXT4-fs: failed to create DIO workqueue\n"); |
| + goto failed_mount_wq; |
| + } |
| + |
| /* |
| * The jbd2_journal_load will have done any necessary log recovery, |
| * so we can safely mount the rest of the filesystem now. |
| @@ -2921,6 +2930,8 @@ cantfind_ext4: |
| |
| failed_mount4: |
| ext4_msg(sb, KERN_ERR, "mount failed"); |
| + destroy_workqueue(EXT4_SB(sb)->dio_unwritten_wq); |
| +failed_mount_wq: |
| ext4_release_system_zone(sb); |
| if (sbi->s_journal) { |
| jbd2_journal_destroy(sbi->s_journal); |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:57 2009 |
| Message-Id: <20091211042756.989318573@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:22 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Mingming Cao <cmm@us.ibm.com>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [44/90] ext4: async direct IO for holes and fallocate support |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0044-ext4-async-direct-IO-for-holes-and-fallocate-support.patch |
| Content-Length: 15796 |
| Lines: 475 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 8d5d02e6b176565c77ff03604908b1453a22044d) |
| |
| For async direct IO that covers holes or fallocated ranges, the end_io |
| callback function now queues the conversion work on a workqueue but |
| doesn't flush the work right away, as that might take too long. |
| |
| But when fsync is called after all the data I/O has completed, the user |
| expects the metadata to be updated as well before fsync returns. |
| |
| Thus we need to flush the conversion work when fsync() is called. |
| This patch keeps track of a list of completed async direct IOs that |
| have work queued on the workqueue. When fsync() is called, it will go |
| through the list and do the conversion. |
| |
| Signed-off-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 9 +- |
| fs/ext4/extents.c | 19 ++++ |
| fs/ext4/fsync.c | 5 + |
| fs/ext4/inode.c | 231 +++++++++++++++++++++++++++++++++++++++++++++--------- |
| fs/ext4/super.c | 8 + |
| 5 files changed, 233 insertions(+), 39 deletions(-) |
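Reviewer note (not part of the patch): the bookkeeping described above, a per-inode list of completed async DIOs that fsync() drains, can be sketched in userspace as follows. All names here (toy_io_end, toy_inode, toy_aio_complete, toy_fsync_flush) are hypothetical; the sketch assumes a plain singly linked FIFO with no locking and does not claim to match how flush_aio_dio_completed_IO() is implemented.

```c
#include <assert.h>
#include <stddef.h>

struct toy_io_end {
	struct toy_io_end *next;
	long long offset;
	long long size;
	int needs_convert;	/* plays the role of DIO_AIO_UNWRITTEN */
	int converted;
};

struct toy_inode {
	struct toy_io_end *head;	/* completed, not yet converted */
	struct toy_io_end *tail;
};

/* AIO completion: remember the finished IO instead of converting now. */
static void toy_aio_complete(struct toy_inode *ino, struct toy_io_end *io)
{
	io->next = NULL;
	if (ino->tail)
		ino->tail->next = io;
	else
		ino->head = io;
	ino->tail = io;
}

/* fsync: walk the list in completion order and convert what is
 * flagged. Returns the number of conversions performed. */
static int toy_fsync_flush(struct toy_inode *ino)
{
	int converted = 0;

	while (ino->head) {
		struct toy_io_end *io = ino->head;

		ino->head = io->next;
		if (!ino->head)
			ino->tail = NULL;
		if (io->needs_convert) {
			io->converted = 1;
			converted++;
		}
	}
	return converted;
}
```

Flagging only the IOs that actually hit unwritten extents keeps an fsync() after plain overwrites cheap: the list is still drained, but no conversion work is done for those entries.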
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -113,7 +113,9 @@ struct ext4_allocation_request { |
| unsigned int flags; |
| }; |
| |
| +#define DIO_AIO_UNWRITTEN 0x1 |
| typedef struct ext4_io_end { |
| + struct list_head list; /* per-file finished AIO list */ |
| struct inode *inode; /* file being written to */ |
| unsigned int flag; /* sync IO or AIO */ |
| int error; /* I/O error code */ |
| @@ -692,6 +694,11 @@ struct ext4_inode_info { |
| __u16 i_extra_isize; |
| |
| spinlock_t i_block_reservation_lock; |
| + |
| + /* completed async DIOs that might need unwritten extents handling */ |
| + struct list_head i_aio_dio_complete_list; |
| + /* current io_end structure for async DIO write*/ |
| + ext4_io_end_t *cur_aio_dio; |
| }; |
| |
| /* |
| @@ -1424,7 +1431,7 @@ extern int ext4_block_truncate_page(hand |
| struct address_space *mapping, loff_t from); |
| extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf); |
| extern qsize_t ext4_get_reserved_space(struct inode *inode); |
| - |
| +extern int flush_aio_dio_completed_IO(struct inode *inode); |
| /* ioctl.c */ |
| extern long ext4_ioctl(struct file *, unsigned int, unsigned long); |
| extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long); |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -3012,6 +3012,7 @@ ext4_ext_handle_uninitialized_extents(ha |
| { |
| int ret = 0; |
| int err = 0; |
| + ext4_io_end_t *io = EXT4_I(inode)->cur_aio_dio; |
| |
| ext_debug("ext4_ext_handle_uninitialized_extents: inode %lu, logical" |
| "block %llu, max_blocks %u, flags %d, allocated %u", |
| @@ -3024,6 +3025,9 @@ ext4_ext_handle_uninitialized_extents(ha |
| ret = ext4_split_unwritten_extents(handle, |
| inode, path, iblock, |
| max_blocks, flags); |
| + /* flag the io_end struct that we need convert when IO done */ |
| + if (io) |
| + io->flag = DIO_AIO_UNWRITTEN; |
| goto out; |
| } |
| /* DIO end_io complete, convert the filled extent to written */ |
| @@ -3109,6 +3113,7 @@ int ext4_ext_get_blocks(handle_t *handle |
| int err = 0, depth, ret, cache_type; |
| unsigned int allocated = 0; |
| struct ext4_allocation_request ar; |
| + ext4_io_end_t *io = EXT4_I(inode)->cur_aio_dio; |
| |
| __clear_bit(BH_New, &bh_result->b_state); |
| ext_debug("blocks %u/%u requested for inode %u\n", |
| @@ -3258,8 +3263,20 @@ int ext4_ext_get_blocks(handle_t *handle |
| /* try to insert new extent into found leaf and return */ |
| ext4_ext_store_pblock(&newex, newblock); |
| newex.ee_len = cpu_to_le16(ar.len); |
| - if (flags & EXT4_GET_BLOCKS_UNINIT_EXT) /* Mark uninitialized */ |
| + /* Mark uninitialized */ |
| + if (flags & EXT4_GET_BLOCKS_UNINIT_EXT){ |
| ext4_ext_mark_uninitialized(&newex); |
| + /* |
| + * io_end structure was created for every async |
| + * direct IO write to the middle of the file. |
| + * To avoid unecessary convertion for every aio dio rewrite |
| + * to the mid of file, here we flag the IO that is really |
| + * need the convertion. |
| + * |
| + */ |
| + if (io && flags == EXT4_GET_BLOCKS_DIO_CREATE_EXT) |
| + io->flag = DIO_AIO_UNWRITTEN; |
| + } |
| err = ext4_ext_insert_extent(handle, inode, path, &newex, flags); |
| if (err) { |
| /* free data blocks we just allocated */ |
| --- a/fs/ext4/fsync.c |
| +++ b/fs/ext4/fsync.c |
| @@ -44,6 +44,8 @@ |
| * |
| * What we do is just kick off a commit and wait on it. This will snapshot the |
| * inode to disk. |
| + * |
| + * i_mutex lock is held when entering and exiting this function |
| */ |
| |
| int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) |
| @@ -56,6 +58,9 @@ int ext4_sync_file(struct file *file, st |
| |
| trace_ext4_sync_file(file, dentry, datasync); |
| |
| + ret = flush_aio_dio_completed_IO(inode); |
| + if (ret < 0) |
| + goto out; |
| /* |
| * data=writeback: |
| * The caller's filemap_fdatawrite()/wait will sync the data. |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -3445,6 +3445,8 @@ static int ext4_get_block_dio_write(stru |
| unsigned max_blocks = bh_result->b_size >> inode->i_blkbits; |
| int dio_credits; |
| |
| + ext4_debug("ext4_get_block_dio_write: inode %lu, create flag %d\n", |
| + inode->i_ino, create); |
| /* |
| * DIO VFS code passes create = 0 flag for write to |
| * the middle of file. It does this to avoid block |
| @@ -3485,55 +3487,152 @@ out: |
| return ret; |
| } |
| |
| -#define DIO_AIO 0x1 |
| - |
| static void ext4_free_io_end(ext4_io_end_t *io) |
| { |
| + BUG_ON(!io); |
| + iput(io->inode); |
| kfree(io); |
| } |
| +static void dump_aio_dio_list(struct inode * inode) |
| +{ |
| +#ifdef EXT4_DEBUG |
| + struct list_head *cur, *before, *after; |
| + ext4_io_end_t *io, *io0, *io1; |
| + |
| + if (list_empty(&EXT4_I(inode)->i_aio_dio_complete_list)){ |
| + ext4_debug("inode %lu aio dio list is empty\n", inode->i_ino); |
| + return; |
| + } |
| + |
| + ext4_debug("Dump inode %lu aio_dio_completed_IO list \n", inode->i_ino); |
| + list_for_each_entry(io, &EXT4_I(inode)->i_aio_dio_complete_list, list){ |
| + cur = &io->list; |
| + before = cur->prev; |
| + io0 = container_of(before, ext4_io_end_t, list); |
| + after = cur->next; |
| + io1 = container_of(after, ext4_io_end_t, list); |
| + |
| + ext4_debug("io 0x%p from inode %lu,prev 0x%p,next 0x%p\n", |
| + io, inode->i_ino, io0, io1); |
| + } |
| +#endif |
| +} |
| |
| /* |
| - * IO write completion for unwritten extents. |
| - * |
| * check a range of space and convert unwritten extents to written. |
| */ |
| -static void ext4_end_dio_unwritten(struct work_struct *work) |
| +static int ext4_end_aio_dio_nolock(ext4_io_end_t *io) |
| { |
| - ext4_io_end_t *io = container_of(work, ext4_io_end_t, work); |
| struct inode *inode = io->inode; |
| loff_t offset = io->offset; |
| size_t size = io->size; |
| int ret = 0; |
| - int aio = io->flag & DIO_AIO; |
| |
| - if (aio) |
| - mutex_lock(&inode->i_mutex); |
| + ext4_debug("end_aio_dio_onlock: io 0x%p from inode %lu,list->next 0x%p," |
| + "list->prev 0x%p\n", |
| + io, inode->i_ino, io->list.next, io->list.prev); |
| + |
| + if (list_empty(&io->list)) |
| + return ret; |
| + |
| + if (io->flag != DIO_AIO_UNWRITTEN) |
| + return ret; |
| + |
| if (offset + size <= i_size_read(inode)) |
| ret = ext4_convert_unwritten_extents(inode, offset, size); |
| |
| - if (ret < 0) |
| + if (ret < 0) { |
| printk(KERN_EMERG "%s: failed to convert unwritten" |
| - "extents to written extents, error is %d\n", |
| - __func__, ret); |
| + "extents to written extents, error is %d" |
| + " io is still on inode %lu aio dio list\n", |
| + __func__, ret, inode->i_ino); |
| + return ret; |
| + } |
| + |
| + /* clear the DIO AIO unwritten flag */ |
| + io->flag = 0; |
| + return ret; |
| +} |
| +/* |
| + * work on completed aio dio IO, to convert unwritten extents to extents |
| + */ |
| +static void ext4_end_aio_dio_work(struct work_struct *work) |
| +{ |
| + ext4_io_end_t *io = container_of(work, ext4_io_end_t, work); |
| + struct inode *inode = io->inode; |
| + int ret = 0; |
| |
| - ext4_free_io_end(io); |
| - if (aio) |
| - mutex_unlock(&inode->i_mutex); |
| + mutex_lock(&inode->i_mutex); |
| + ret = ext4_end_aio_dio_nolock(io); |
| + if (ret >= 0) { |
| + if (!list_empty(&io->list)) |
| + list_del_init(&io->list); |
| + ext4_free_io_end(io); |
| + } |
| + mutex_unlock(&inode->i_mutex); |
| } |
| +/* |
| + * This function is called from ext4_sync_file(). |
| + * |
| + * When AIO DIO IO is completed, the work to convert unwritten |
| + * extents to written is queued on workqueue but may not get immediately |
| + * scheduled. When fsync is called, we need to ensure the |
| + * conversion is complete before fsync returns. |
| + * The inode keeps track of a list of completed AIO from DIO path |
| + * that might needs to do the conversion. This function walks through |
| + * the list and convert the related unwritten extents to written. |
| + */ |
| +int flush_aio_dio_completed_IO(struct inode *inode) |
| +{ |
| + ext4_io_end_t *io; |
| + int ret = 0; |
| + int ret2 = 0; |
| + |
| + if (list_empty(&EXT4_I(inode)->i_aio_dio_complete_list)) |
| + return ret; |
| |
| -static ext4_io_end_t *ext4_init_io_end (struct inode *inode, unsigned int flag) |
| + dump_aio_dio_list(inode); |
| + while (!list_empty(&EXT4_I(inode)->i_aio_dio_complete_list)){ |
| + io = list_entry(EXT4_I(inode)->i_aio_dio_complete_list.next, |
| + ext4_io_end_t, list); |
| + /* |
| + * Calling ext4_end_aio_dio_nolock() to convert completed |
| + * IO to written. |
| + * |
| + * When ext4_sync_file() is called, run_queue() may already |
| + * about to flush the work corresponding to this io structure. |
| + * It will be upset if it founds the io structure related |
| + * to the work-to-be schedule is freed. |
| + * |
| + * Thus we need to keep the io structure still valid here after |
| + * convertion finished. The io structure has a flag to |
| + * avoid double converting from both fsync and background work |
| + * queue work. |
| + */ |
| + ret = ext4_end_aio_dio_nolock(io); |
| + if (ret < 0) |
| + ret2 = ret; |
| + else |
| + list_del_init(&io->list); |
| + } |
| + return (ret2 < 0) ? ret2 : 0; |
| +} |
| + |
| +static ext4_io_end_t *ext4_init_io_end (struct inode *inode) |
| { |
| ext4_io_end_t *io = NULL; |
| |
| io = kmalloc(sizeof(*io), GFP_NOFS); |
| |
| if (io) { |
| + igrab(inode); |
| io->inode = inode; |
| - io->flag = flag; |
| + io->flag = 0; |
| io->offset = 0; |
| io->size = 0; |
| io->error = 0; |
| - INIT_WORK(&io->work, ext4_end_dio_unwritten); |
| + INIT_WORK(&io->work, ext4_end_aio_dio_work); |
| + INIT_LIST_HEAD(&io->list); |
| } |
| |
| return io; |
| @@ -3545,19 +3644,31 @@ static void ext4_end_io_dio(struct kiocb |
| ext4_io_end_t *io_end = iocb->private; |
| struct workqueue_struct *wq; |
| |
| - /* if not hole or unwritten extents, just simple return */ |
| - if (!io_end || !size || !iocb->private) |
| + ext_debug("ext4_end_io_dio(): io_end 0x%p" |
| + "for inode %lu, iocb 0x%p, offset %llu, size %llu\n", |
| + iocb->private, io_end->inode->i_ino, iocb, offset, |
| + size); |
| + /* if not async direct IO or dio with 0 bytes write, just return */ |
| + if (!io_end || !size) |
| + return; |
| + |
| + /* if not aio dio with unwritten extents, just free io and return */ |
| + if (io_end->flag != DIO_AIO_UNWRITTEN){ |
| + ext4_free_io_end(io_end); |
| + iocb->private = NULL; |
| return; |
| + } |
| + |
| io_end->offset = offset; |
| io_end->size = size; |
| wq = EXT4_SB(io_end->inode->i_sb)->dio_unwritten_wq; |
| |
| - /* We need to convert unwritten extents to written */ |
| + /* queue the work to convert unwritten extents to written */ |
| queue_work(wq, &io_end->work); |
| |
| - if (is_sync_kiocb(iocb)) |
| - flush_workqueue(wq); |
| - |
| + /* Add the io_end to per-inode completed aio dio list*/ |
| + list_add_tail(&io_end->list, |
| + &EXT4_I(io_end->inode)->i_aio_dio_complete_list); |
| iocb->private = NULL; |
| } |
| /* |
| @@ -3569,8 +3680,10 @@ static void ext4_end_io_dio(struct kiocb |
| * If those blocks were preallocated, we mark sure they are splited, but |
| * still keep the range to write as unintialized. |
| * |
| - * When end_io call back function called at the last IO complete time, |
| - * those extents will be converted to written extents. |
| + * The unwrritten extents will be converted to written when DIO is completed. |
| + * For async direct IO, since the IO may still pending when return, we |
| + * set up an end_io call back function, which will do the convertion |
| + * when async direct IO completed. |
| * |
| * If the O_DIRECT write will extend the file then add this inode to the |
| * orphan list. So recovery will truncate it back to the original size |
| @@ -3589,28 +3702,76 @@ static ssize_t ext4_ext_direct_IO(int rw |
| loff_t final_size = offset + count; |
| if (rw == WRITE && final_size <= inode->i_size) { |
| /* |
| - * For DIO we fallocate blocks for holes, we fallocate blocks |
| - * The fallocated extent for hole is marked as uninitialized |
| + * We could direct write to holes and fallocate. |
| + * |
| + * Allocated blocks to fill the hole are marked as uninitialized |
| * to prevent paralel buffered read to expose the stale data |
| * before DIO complete the data IO. |
| - * as for previously fallocated extents, ext4 get_block |
| + * |
| + * As to previously fallocated extents, ext4 get_block |
| * will just simply mark the buffer mapped but still |
| * keep the extents uninitialized. |
| * |
| - * At the end of IO, the ext4 end_io callback function |
| - * will convert those unwritten extents to written, |
| - * |
| + * for non AIO case, we will convert those unwritten extents |
| + * to written after return back from blockdev_direct_IO. |
| + * |
| + * for async DIO, the conversion needs to be defered when |
| + * the IO is completed. The ext4 end_io callback function |
| + * will be called to take care of the conversion work. |
| + * Here for async case, we allocate an io_end structure to |
| + * hook to the iocb. |
| */ |
| - iocb->private = ext4_init_io_end(inode, !is_sync_kiocb(iocb)); |
| - if (!iocb->private) |
| - return -ENOMEM; |
| + iocb->private = NULL; |
| + EXT4_I(inode)->cur_aio_dio = NULL; |
| + if (!is_sync_kiocb(iocb)) { |
| + iocb->private = ext4_init_io_end(inode); |
| + if (!iocb->private) |
| + return -ENOMEM; |
| + /* |
| + * we save the io structure for current async |
| + * direct IO, so that later ext4_get_blocks() |
| + * could flag the io structure whether there |
| + * is a unwritten extents needs to be converted |
| + * when IO is completed. |
| + */ |
| + EXT4_I(inode)->cur_aio_dio = iocb->private; |
| + } |
| + |
| ret = blockdev_direct_IO(rw, iocb, inode, |
| inode->i_sb->s_bdev, iov, |
| offset, nr_segs, |
| ext4_get_block_dio_write, |
| ext4_end_io_dio); |
| + if (iocb->private) |
| + EXT4_I(inode)->cur_aio_dio = NULL; |
| + /* |
| + * The io_end structure takes a reference to the inode, |
| + * that structure needs to be destroyed and the |
| + * reference to the inode need to be dropped, when IO is |
| + * complete, even with 0 byte write, or failed. |
| + * |
| + * In the successful AIO DIO case, the io_end structure will be |
| + * desctroyed and the reference to the inode will be dropped |
| + * after the end_io call back function is called. |
| + * |
| + * In the case there is 0 byte write, or error case, since |
| + * VFS direct IO won't invoke the end_io call back function, |
| + * we need to free the end_io structure here. |
| + */ |
| + if (ret != -EIOCBQUEUED && ret <= 0 && iocb->private) { |
| + ext4_free_io_end(iocb->private); |
| + iocb->private = NULL; |
| + } else if (ret > 0) |
| + /* |
| + * for non AIO case, since the IO is already |
| + * completed, we could do the convertion right here |
| + */ |
| + ret = ext4_convert_unwritten_extents(inode, |
| + offset, ret); |
| return ret; |
| } |
| + |
| + /* for write the the end of file case, we fall back to old way */ |
| return ext4_ind_direct_IO(rw, iocb, iov, offset, nr_segs); |
| } |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -687,6 +687,8 @@ static struct inode *ext4_alloc_inode(st |
| ei->i_allocated_meta_blocks = 0; |
| ei->i_delalloc_reserved_flag = 0; |
| spin_lock_init(&(ei->i_block_reservation_lock)); |
| + INIT_LIST_HEAD(&ei->i_aio_dio_complete_list); |
| + ei->cur_aio_dio = NULL; |
| |
| return &ei->vfs_inode; |
| } |
| @@ -3383,11 +3385,13 @@ static int ext4_sync_fs(struct super_blo |
| { |
| int ret = 0; |
| tid_t target; |
| + struct ext4_sb_info *sbi = EXT4_SB(sb); |
| |
| trace_ext4_sync_fs(sb, wait); |
| - if (jbd2_journal_start_commit(EXT4_SB(sb)->s_journal, &target)) { |
| + flush_workqueue(sbi->dio_unwritten_wq); |
| + if (jbd2_journal_start_commit(sbi->s_journal, &target)) { |
| if (wait) |
| - jbd2_log_wait_commit(EXT4_SB(sb)->s_journal, target); |
| + jbd2_log_wait_commit(sbi->s_journal, target); |
| } |
| return ret; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:58 2009 |
| Message-Id: <20091211042757.491467570@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:23 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [45/90] ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0045-ext4-EXT4_IOC_MOVE_EXT-Check-for-different-original-.patch |
| Content-Length: 1529 |
| Lines: 45 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit f3ce8064b388ccf420012c5a4907aae4f13fe9d0) |
| |
| Move the check to make sure the original and donor inodes are |
| different earlier, to avoid a potential deadlock by trying to lock the |
| same inode twice. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 16 ++++++++-------- |
| 1 file changed, 8 insertions(+), 8 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -1001,14 +1001,6 @@ mext_check_arguments(struct inode *orig_ |
| return -EINVAL; |
| } |
| |
| - /* orig and donor should be different file */ |
| - if (orig_inode->i_ino == donor_inode->i_ino) { |
| - ext4_debug("ext4 move extent: The argument files should not " |
| - "be same file [ino:orig %lu, donor %lu]\n", |
| - orig_inode->i_ino, donor_inode->i_ino); |
| - return -EINVAL; |
| - } |
| - |
| /* Ext4 move extent supports only extent based file */ |
| if (!(EXT4_I(orig_inode)->i_flags & EXT4_EXTENTS_FL)) { |
| ext4_debug("ext4 move extent: orig file is not extents " |
| @@ -1232,6 +1224,14 @@ ext4_move_extents(struct file *o_filp, s |
| int block_len_in_page; |
| int uninit; |
| |
| + /* orig and donor should be different file */ |
| + if (orig_inode->i_ino == donor_inode->i_ino) { |
| + ext4_debug("ext4 move extent: The argument files should not " |
| + "be same file [ino:orig %lu, donor %lu]\n", |
| + orig_inode->i_ino, donor_inode->i_ino); |
| + return -EINVAL; |
| + } |
| + |
| /* protect orig and donor against a truncate */ |
| ret1 = mext_inode_double_lock(orig_inode, donor_inode); |
| if (ret1 < 0) |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:58 2009 |
| Message-Id: <20091211042758.055563900@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:24 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Frank Mayhar <fmayhar@google.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [46/90] ext4: Avoid updating the inode table bh twice in no journal mode |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0046-ext4-Avoid-updating-the-inode-table-bh-twice-in-no-j.patch |
| Content-Length: 2805 |
| Lines: 84 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 830156c79b0a99ddf0f62496bcf4de640f9f52cd) |
| |
| This is a cleanup of commit 91ac6f4. Since ext4_mark_inode_dirty() |
| has already called ext4_mark_iloc_dirty(), which in turn calls |
| ext4_do_update_inode(), it's not necessary to have ext4_write_inode() |
| call ext4_do_update_inode() in no journal mode. Indeed, it would be |
| duplicated work. |
| |
| Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: Frank Mayhar <fmayhar@google.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 37 ++++++++++++++++--------------------- |
| 1 file changed, 16 insertions(+), 21 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -4981,8 +4981,7 @@ static int ext4_inode_blocks_set(handle_ |
| */ |
| static int ext4_do_update_inode(handle_t *handle, |
| struct inode *inode, |
| - struct ext4_iloc *iloc, |
| - int do_sync) |
| + struct ext4_iloc *iloc) |
| { |
| struct ext4_inode *raw_inode = ext4_raw_inode(iloc); |
| struct ext4_inode_info *ei = EXT4_I(inode); |
| @@ -5083,22 +5082,10 @@ static int ext4_do_update_inode(handle_t |
| raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize); |
| } |
| |
| - /* |
| - * If we're not using a journal and we were called from |
| - * ext4_write_inode() to sync the inode (making do_sync true), |
| - * we can just use sync_dirty_buffer() directly to do our dirty |
| - * work. Testing s_journal here is a bit redundant but it's |
| - * worth it to avoid potential future trouble. |
| - */ |
| - if (EXT4_SB(inode->i_sb)->s_journal == NULL && do_sync) { |
| - BUFFER_TRACE(bh, "call sync_dirty_buffer"); |
| - sync_dirty_buffer(bh); |
| - } else { |
| - BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); |
| - rc = ext4_handle_dirty_metadata(handle, inode, bh); |
| - if (!err) |
| - err = rc; |
| - } |
| + BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); |
| + rc = ext4_handle_dirty_metadata(handle, inode, bh); |
| + if (!err) |
| + err = rc; |
| ei->i_state &= ~EXT4_STATE_NEW; |
| |
| out_brelse: |
| @@ -5166,8 +5153,16 @@ int ext4_write_inode(struct inode *inode |
| err = ext4_get_inode_loc(inode, &iloc); |
| if (err) |
| return err; |
| - err = ext4_do_update_inode(EXT4_NOJOURNAL_HANDLE, |
| - inode, &iloc, wait); |
| + if (wait) |
| + sync_dirty_buffer(iloc.bh); |
| + if (buffer_req(iloc.bh) && !buffer_uptodate(iloc.bh)) { |
| + ext4_error(inode->i_sb, __func__, |
| + "IO error syncing inode, " |
| + "inode=%lu, block=%llu", |
| + inode->i_ino, |
| + (unsigned long long)iloc.bh->b_blocknr); |
| + err = -EIO; |
| + } |
| } |
| return err; |
| } |
| @@ -5463,7 +5458,7 @@ int ext4_mark_iloc_dirty(handle_t *handl |
| get_bh(iloc->bh); |
| |
| /* ext4_do_update_inode() does jbd2_journal_dirty_metadata */ |
| - err = ext4_do_update_inode(handle, inode, iloc, 0); |
| + err = ext4_do_update_inode(handle, inode, iloc); |
| put_bh(iloc->bh); |
| return err; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:59 2009 |
| Message-Id: <20091211042758.666491884@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:25 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Curt Wohlgemuth <curtw@google.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [47/90] ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0047-ext4-Make-sure-ext4_dirty_inode-updates-the-inode-in.patch |
| Content-Length: 1452 |
| Lines: 48 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit f3dc272fd5e2ae08244796bb39e7e1ce4b25d3b3) |
| |
| This patch fixes a problem where ext4_dirty_inode() was not calling |
| ext4_mark_inode_dirty() if the current_handle is not valid, which is |
| the case in no journal mode. |
| |
| It also removes a test for non-matching transaction which can never |
| happen. |
| |
| Signed-off-by: Curt Wohlgemuth <curtw@google.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 19 ++++--------------- |
| 1 file changed, 4 insertions(+), 15 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -5605,24 +5605,13 @@ void ext4_dirty_inode(struct inode *inod |
| handle_t *current_handle = ext4_journal_current_handle(); |
| handle_t *handle; |
| |
| - if (!ext4_handle_valid(current_handle)) { |
| - ext4_mark_inode_dirty(current_handle, inode); |
| - return; |
| - } |
| - |
| handle = ext4_journal_start(inode, 2); |
| if (IS_ERR(handle)) |
| goto out; |
| - if (current_handle && |
| - current_handle->h_transaction != handle->h_transaction) { |
| - /* This task has a transaction open against a different fs */ |
| - printk(KERN_EMERG "%s: transactions do not match!\n", |
| - __func__); |
| - } else { |
| - jbd_debug(5, "marking dirty. outer handle=%p\n", |
| - current_handle); |
| - ext4_mark_inode_dirty(handle, inode); |
| - } |
| + |
| + jbd_debug(5, "marking dirty. outer handle=%p\n", current_handle); |
| + ext4_mark_inode_dirty(handle, inode); |
| + |
| ext4_journal_stop(handle); |
| out: |
| return; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:59 2009 |
| Message-Id: <20091211042759.257547227@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:26 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Curt Wohlgemuth <curtw@google.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [48/90] ext4: Handle nested ext4_journal_start/stop calls without a journal |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0048-ext4-Handle-nested-ext4_journal_start-stop-calls-wit.patch |
| Content-Length: 3068 |
| Lines: 110 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit d3d1faf6a74496ea4435fd057c6a2cad49f3e523) |
| |
| This patch fixes a problem with handling nested calls to |
| ext4_journal_start/ext4_journal_stop, when there is no journal present. |
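
Editorial illustration, not part of the patch: the idea in the diff below is that, without a journal, the "handle" is a small integer nesting count stored in the pointer rather than a real handle_t address. The user-space sketch here models that under stated assumptions: the global journal_info stands in for current->journal_info, assert() stands in for BUG_ON(), and every name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the 4096 limit introduced by the patch. */
#define NOJOURNAL_MAX_REF_COUNT 4096UL

typedef struct handle handle_t;   /* opaque; never dereferenced here */

/* Hypothetical stand-in for current->journal_info. */
static handle_t *journal_info;

/* Enter one more nesting level: bump the count kept in the pointer. */
static handle_t *get_nojournal(void)
{
    uintptr_t ref = (uintptr_t)journal_info;

    assert(ref < NOJOURNAL_MAX_REF_COUNT);
    journal_info = (handle_t *)(ref + 1);
    return journal_info;
}

/* Leave one nesting level: the stored count drops by one. */
static void put_nojournal(handle_t *handle)
{
    uintptr_t ref = (uintptr_t)handle;

    assert(ref != 0);
    journal_info = (handle_t *)(ref - 1);
}

/* Values below the limit are counts, not real journal handles. */
static int handle_valid(const handle_t *handle)
{
    return (uintptr_t)handle >= NOJOURNAL_MAX_REF_COUNT;
}
```

With the old single sentinel value, the inner stop of a nested start/stop pair reset the per-task state to NULL while the outer caller still held its handle; with a count, the state only returns to zero after the outermost stop.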
| |
| Signed-off-by: Curt Wohlgemuth <curtw@google.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4_jbd2.h | 6 ++++-- |
| fs/ext4/namei.c | 3 ++- |
| fs/ext4/super.c | 42 ++++++++++++++++++++++++++++++++---------- |
| 3 files changed, 38 insertions(+), 13 deletions(-) |
| |
| --- a/fs/ext4/ext4_jbd2.h |
| +++ b/fs/ext4/ext4_jbd2.h |
| @@ -161,11 +161,13 @@ int __ext4_handle_dirty_metadata(const c |
| handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks); |
| int __ext4_journal_stop(const char *where, handle_t *handle); |
| |
| -#define EXT4_NOJOURNAL_HANDLE ((handle_t *) 0x1) |
| +#define EXT4_NOJOURNAL_MAX_REF_COUNT ((unsigned long) 4096) |
| |
| +/* Note: Do not use this for NULL handles. This is only to determine if |
| + * a properly allocated handle is using a journal or not. */ |
| static inline int ext4_handle_valid(handle_t *handle) |
| { |
| - if (handle == EXT4_NOJOURNAL_HANDLE) |
| + if ((unsigned long)handle < EXT4_NOJOURNAL_MAX_REF_COUNT) |
| return 0; |
| return 1; |
| } |
| --- a/fs/ext4/namei.c |
| +++ b/fs/ext4/namei.c |
| @@ -2068,7 +2068,8 @@ int ext4_orphan_del(handle_t *handle, st |
| struct ext4_iloc iloc; |
| int err = 0; |
| |
| - if (!ext4_handle_valid(handle)) |
| + /* ext4_handle_valid() assumes a valid handle_t pointer */ |
| + if (handle && !ext4_handle_valid(handle)) |
| return 0; |
| |
| mutex_lock(&EXT4_SB(inode->i_sb)->s_orphan_lock); |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -189,6 +189,36 @@ void ext4_itable_unused_set(struct super |
| bg->bg_itable_unused_hi = cpu_to_le16(count >> 16); |
| } |
| |
| + |
| +/* Just increment the non-pointer handle value */ |
| +static handle_t *ext4_get_nojournal(void) |
| +{ |
| + handle_t *handle = current->journal_info; |
| + unsigned long ref_cnt = (unsigned long)handle; |
| + |
| + BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT); |
| + |
| + ref_cnt++; |
| + handle = (handle_t *)ref_cnt; |
| + |
| + current->journal_info = handle; |
| + return handle; |
| +} |
| + |
| + |
| +/* Decrement the non-pointer handle value */ |
| +static void ext4_put_nojournal(handle_t *handle) |
| +{ |
| + unsigned long ref_cnt = (unsigned long)handle; |
| + |
| + BUG_ON(ref_cnt == 0); |
| + |
| + ref_cnt--; |
| + handle = (handle_t *)ref_cnt; |
| + |
| + current->journal_info = handle; |
| +} |
| + |
| /* |
| * Wrappers for jbd2_journal_start/end. |
| * |
| @@ -215,11 +245,7 @@ handle_t *ext4_journal_start_sb(struct s |
| } |
| return jbd2_journal_start(journal, nblocks); |
| } |
| - /* |
| - * We're not journaling, return the appropriate indication. |
| - */ |
| - current->journal_info = EXT4_NOJOURNAL_HANDLE; |
| - return current->journal_info; |
| + return ext4_get_nojournal(); |
| } |
| |
| /* |
| @@ -235,11 +261,7 @@ int __ext4_journal_stop(const char *wher |
| int rc; |
| |
| if (!ext4_handle_valid(handle)) { |
| - /* |
| - * Do this here since we don't call jbd2_journal_stop() in |
| - * no-journal mode. |
| - */ |
| - current->journal_info = NULL; |
| + ext4_put_nojournal(handle); |
| return 0; |
| } |
| sb = handle->h_transaction->t_journal->j_private; |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:00 2009 |
| Message-Id: <20091211042759.784398525@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:27 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [49/90] ext4: Fix time encoding with extra epoch bits |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0049-ext4-Fix-time-encoding-with-extra-epoch-bits.patch |
| Content-Length: 1431 |
| Lines: 37 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit c1fccc0696bcaff6008c11865091f5ec4b0937ab) |
| |
| "Looking at ext4.h, I think the setting of extra time fields forgets to |
| mask the epoch bits so the epoch part overwrites nsec part. The second |
| change is only for coherency (2 -> EXT4_EPOCH_BITS)." |
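
Editorial illustration, not part of the patch: a user-space sketch of the packing problem described above, assuming the layout used in ext4.h (low 2 bits hold the epoch, upper 30 bits hold nanoseconds). The macro and function names are hypothetical local stand-ins for EXT4_EPOCH_BITS, EXT4_EPOCH_MASK and EXT4_NSEC_MASK.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)   /* low 2 bits   */
#define NSEC_MASK  (~0u << EPOCH_BITS)        /* upper 30 bits */

/* Fixed encoding: only the low EPOCH_BITS of (sec >> 32) are kept. */
static uint32_t encode_extra(int64_t sec, uint32_t nsec)
{
    assert(nsec < 1000000000u);   /* fits in 30 bits */
    return (uint32_t)(((uint64_t)sec >> 32) & EPOCH_MASK) |
           ((nsec << EPOCH_BITS) & NSEC_MASK);
}

/* Pre-fix encoding: high bits of (sec >> 32) spill into the nsec field. */
static uint32_t encode_extra_unmasked(int64_t sec, uint32_t nsec)
{
    return (uint32_t)((uint64_t)sec >> 32) |
           ((nsec << EPOCH_BITS) & NSEC_MASK);
}

/* Recover the nanoseconds from the packed field. */
static uint32_t decode_nsec(uint32_t extra)
{
    return (extra & NSEC_MASK) >> EPOCH_BITS;
}
```

For a timestamp before 1970 (negative seconds) the upper word is all ones, so the unmasked variant overwrites every nanosecond bit, while the masked variant round-trips the nanoseconds unchanged.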
| |
| Thanks to Damien Guibouret for pointing out this problem. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 6 +++--- |
| 1 file changed, 3 insertions(+), 3 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -522,8 +522,8 @@ struct move_extent { |
| static inline __le32 ext4_encode_extra_time(struct timespec *time) |
| { |
| return cpu_to_le32((sizeof(time->tv_sec) > 4 ? |
| - time->tv_sec >> 32 : 0) | |
| - ((time->tv_nsec << 2) & EXT4_NSEC_MASK)); |
| + (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) | |
| + ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK)); |
| } |
| |
| static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra) |
| @@ -531,7 +531,7 @@ static inline void ext4_decode_extra_tim |
| if (sizeof(time->tv_sec) > 4) |
| time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) |
| << 32; |
| - time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> 2; |
| + time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS; |
| } |
| |
| #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \ |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:00 2009 |
| Message-Id: <20091211042800.341947927@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:28 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [50/90] ext4: fix a BUG_ON crash by checking that page has buffers attached to it |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0050-ext4-fix-a-BUG_ON-crash-by-checking-that-page-has-bu.patch |
| Content-Length: 1530 |
| Lines: 53 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 1f94533d9cd75f6d2826018d54a971b9cc085992) |
| |
| In ext4_num_dirty_pages() we were calling page_buffers() before |
| checking to see if the page actually had buffers attached to it; this |
| would cause a BUG check crash in the inline function page_buffers(). |
| |
| Thanks to Markus Trippelsdorf for reporting this bug. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 22 +++++++++++----------- |
| 1 file changed, 11 insertions(+), 11 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1147,8 +1147,8 @@ static int check_block_validity(struct i |
| } |
| |
| /* |
| - * Return the number of dirty pages in the given inode starting at |
| - * page frame idx. |
| + * Return the number of contiguous dirty pages in a given inode |
| + * starting at page frame idx. |
| */ |
| static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx, |
| unsigned int max_pages) |
| @@ -1182,15 +1182,15 @@ static pgoff_t ext4_num_dirty_pages(stru |
| unlock_page(page); |
| break; |
| } |
| - head = page_buffers(page); |
| - bh = head; |
| - do { |
| - if (!buffer_delay(bh) && |
| - !buffer_unwritten(bh)) { |
| - done = 1; |
| - break; |
| - } |
| - } while ((bh = bh->b_this_page) != head); |
| + if (page_has_buffers(page)) { |
| + bh = head = page_buffers(page); |
| + do { |
| + if (!buffer_delay(bh) && |
| + !buffer_unwritten(bh)) |
| + done = 1; |
| + bh = bh->b_this_page; |
| + } while (!done && (bh != head)); |
| + } |
| unlock_page(page); |
| if (done) |
| break; |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:01 2009 |
| Message-Id: <20091211042800.901979632@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:29 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Eric Sandeen <sandeen@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [51/90] ext4: retry failed direct IO allocations |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0051-ext4-retry-failed-direct-IO-allocations.patch |
| Content-Length: 1203 |
| Lines: 44 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit fbbf69456619de5d251cb9f1df609069178c62d5) |
| |
| On a 256M filesystem, doing this in a loop: |
| |
| xfs_io -F -f -d -c 'pwrite 0 64m' test |
| rm -f test |
| |
| eventually leads to ENOSPC. (the xfs_io command does a |
| 64m direct IO write to the file "test") |
| |
| As with other block allocation callers, it looks like we need to |
| potentially retry the allocations on the initial ENOSPC. |
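The retry pattern can be sketched in user-space C as follows. This is a rough model under stated assumptions: `alloc_with_retry()`, `should_retry()`, `flaky_alloc()` and `MAX_RETRIES` are hypothetical names, and `should_retry()` only counts attempts, whereas the real ext4_should_retry_alloc() does more than count.

```c
#include <assert.h>
#include <errno.h>

#define MAX_RETRIES 3	/* hypothetical bound, chosen for the sketch */

/* Hypothetical allocator callback: returns 0 or a negative errno. */
typedef int (*alloc_fn)(void *ctx);

/* Simplified model of a retry policy: allow a bounded number of retries. */
static int should_retry(int *retries)
{
	return (*retries)++ < MAX_RETRIES;
}

/* Call fn again while it reports ENOSPC and the policy allows it. */
static int alloc_with_retry(alloc_fn fn, void *ctx)
{
	int retries = 0;
	int ret;

	do {
		ret = fn(ctx);
	} while (ret == -ENOSPC && should_retry(&retries));

	return ret;
}

/* Example callback: fails with ENOSPC until the counter reaches zero. */
static int flaky_alloc(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENOSPC;
	}
	return 0;
}
```

With two pending failures the call succeeds on the third attempt; with more pending failures than the bound allows, the ENOSPC is passed back to the caller.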
| |
| Signed-off-by: Eric Sandeen <sandeen@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 4 ++++ |
| 1 file changed, 4 insertions(+) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -3372,6 +3372,7 @@ static ssize_t ext4_ind_direct_IO(int rw |
| ssize_t ret; |
| int orphan = 0; |
| size_t count = iov_length(iov, nr_segs); |
| + int retries = 0; |
| |
| if (rw == WRITE) { |
| loff_t final_size = offset + count; |
| @@ -3394,9 +3395,12 @@ static ssize_t ext4_ind_direct_IO(int rw |
| } |
| } |
| |
| +retry: |
| ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, |
| offset, nr_segs, |
| ext4_get_block, NULL); |
| + if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) |
| + goto retry; |
| |
| if (orphan) { |
| int err; |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:01 2009 |
| Message-Id: <20091211042801.408749880@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:30 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [52/90] ext4: discard preallocation when restarting a transaction during truncate |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0052-ext4-discard-preallocation-when-restarting-a-transac.patch |
| Content-Length: 1276 |
| Lines: 35 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit fa5d11133b07053270e18fa9c18560e66e79217e) |
| |
| When restart a transaction during a truncate operation, we drop and |
| reacquire i_data_sem. After reacquiring i_data_sem, we need to |
| discard any inode-based preallocation that might have been grabbed |
| while we released i_data_sem (for example, if pdflush is allocating |
| blocks and racing against the truncate). |
| |
| Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 3 ++- |
| 1 file changed, 2 insertions(+), 1 deletion(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -193,7 +193,7 @@ static int try_to_extend_transaction(han |
| * so before we call here everything must be consistently dirtied against |
| * this transaction. |
| */ |
| - int ext4_truncate_restart_trans(handle_t *handle, struct inode *inode, |
| +int ext4_truncate_restart_trans(handle_t *handle, struct inode *inode, |
| int nblocks) |
| { |
| int ret; |
| @@ -209,6 +209,7 @@ static int try_to_extend_transaction(han |
| up_write(&EXT4_I(inode)->i_data_sem); |
| ret = ext4_journal_restart(handle, blocks_for_truncate(inode)); |
| down_write(&EXT4_I(inode)->i_data_sem); |
| + ext4_discard_preallocations(inode); |
| |
| return ret; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:02 2009 |
| Message-Id: <20091211042802.010530397@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:31 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Mingming Cao <cmm@us.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [53/90] ext4: fix ext4_ext_direct_IO()s return value after converting uninit extents |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0053-ext4-fix-ext4_ext_direct_IO-s-return-value-after-con.patch |
| Content-Length: 1858 |
| Lines: 55 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 109f55651954def97fa41ee71c464d268c512ab0) |
| |
| After a direct I/O request covering an uninitialized extent (i.e., |
| created using the fallocate system call) or a hole in a file, ext4 |
| will convert the uninitialized extent so it is marked as initialized |
| by calling ext4_convert_unwritten_extents(). This function returns |
| zero on success. |
| |
| This return value was getting returned by ext4_direct_IO(); however |
| the file system's direct_IO function is supposed to return the number |
| of bytes read or written on a success. By returning zero, it confused |
| the direct I/O code into falling back to buffered I/O unnecessarily. |
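The return-value convention can be shown with a small user-space sketch. `finish_direct_io()` and the two `convert_*` callbacks are hypothetical; they only model "keep the byte count unless the follow-up step fails".

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical post-I/O step: 0 on success, negative errno on failure. */
typedef int (*convert_fn)(void);

/*
 * ret is the byte count (or error) from the I/O itself. The conversion
 * result replaces it only when the conversion fails.
 */
static ssize_t finish_direct_io(ssize_t ret, convert_fn convert)
{
	if (ret > 0) {
		int err = convert();

		if (err < 0)
			ret = err;	/* report the failure */
		/* on success, ret still holds the byte count */
	}
	return ret;
}

static int convert_ok(void)
{
	return 0;
}

static int convert_fail(void)
{
	return -EIO;
}
```

Assigning the conversion result directly to ret would turn a successful 4096-byte write into a return value of 0, which is the confusion described above.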
| |
| Signed-off-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/extents.c | 1 + |
| fs/ext4/inode.c | 10 +++++++--- |
| 2 files changed, 8 insertions(+), 3 deletions(-) |
| |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -3496,6 +3496,7 @@ retry: |
| * |
| * This function is called from the direct IO end io call back |
| * function, to convert the fallocated extents after IO is completed. |
| + * Returns 0 on success. |
| */ |
| int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset, |
| loff_t len) |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -3766,13 +3766,17 @@ static ssize_t ext4_ext_direct_IO(int rw |
| if (ret != -EIOCBQUEUED && ret <= 0 && iocb->private) { |
| ext4_free_io_end(iocb->private); |
| iocb->private = NULL; |
| - } else if (ret > 0) |
| + } else if (ret > 0) { |
| + int err; |
| /* |
| * for non AIO case, since the IO is already |
| * completed, we could do the convertion right here |
| */ |
| - ret = ext4_convert_unwritten_extents(inode, |
| - offset, ret); |
| + err = ext4_convert_unwritten_extents(inode, |
| + offset, ret); |
| + if (err < 0) |
| + ret = err; |
| + } |
| return ret; |
| } |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:03 2009 |
| Message-Id: <20091211042802.562626729@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:32 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Mingming Cao <cmm@us.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [54/90] ext4: skip conversion of uninit extents after direct IO if there isnt any |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0054-ext4-skip-conversion-of-uninit-extents-after-direct-.patch |
| Content-Length: 3314 |
| Lines: 92 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 5f5249507e4b5c4fc0f9c93f33d133d8c95f47e1) |
| |
| At the end of a direct I/O operation, ext4_ext_direct_IO() always called |
| ext4_convert_unwritten_extents(), regardless of whether there were any |
| unwritten extents involved in the I/O or not. |
| |
| This commit adds a state flag so that ext4_ext_direct_IO() only calls |
| ext4_convert_unwritten_extents() when necessary. |
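A minimal user-space sketch of the set/test/clear pattern, assuming a hypothetical `struct inode_state` and a counter standing in for the conversion call (the flag value mirrors the 0x40 used by EXT4_STATE_DIO_UNWRITTEN in this patch):

```c
#include <assert.h>

#define STATE_DIO_UNWRITTEN 0x40u	/* same bit value as in the patch */

/* Hypothetical stand-in for the per-inode state word. */
struct inode_state {
	unsigned int flags;
	int conversions;	/* counts how often conversion ran */
};

/* Block-mapping side: remember that unwritten extents were involved. */
static void note_unwritten(struct inode_state *s)
{
	s->flags |= STATE_DIO_UNWRITTEN;
}

/* I/O completion side: convert only if flagged, then clear the flag. */
static void io_done(struct inode_state *s)
{
	if (s->flags & STATE_DIO_UNWRITTEN) {
		s->conversions++;	/* stands in for the conversion call */
		s->flags &= ~STATE_DIO_UNWRITTEN;
	}
}
```

Clearing the flag after use matters as much as setting it; otherwise every later direct I/O on the same inode would convert again.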
| |
| Signed-off-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 1 + |
| fs/ext4/extents.c | 22 +++++++++++++++++----- |
| fs/ext4/inode.c | 4 +++- |
| 3 files changed, 21 insertions(+), 6 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -318,6 +318,7 @@ static inline __u32 ext4_mask_flags(umod |
| #define EXT4_STATE_NO_EXPAND 0x00000008 /* No space for expansion */ |
| #define EXT4_STATE_DA_ALLOC_CLOSE 0x00000010 /* Alloc DA blks on close */ |
| #define EXT4_STATE_EXT_MIGRATE 0x00000020 /* Inode is migrating */ |
| +#define EXT4_STATE_DIO_UNWRITTEN 0x00000040 /* need convert on dio done*/ |
| |
| /* Used to pass group descriptor data when online resize is done */ |
| struct ext4_new_group_input { |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -3025,12 +3025,18 @@ ext4_ext_handle_uninitialized_extents(ha |
| ret = ext4_split_unwritten_extents(handle, |
| inode, path, iblock, |
| max_blocks, flags); |
| - /* flag the io_end struct that we need convert when IO done */ |
| + /* |
| + * Flag the inode(non aio case) or end_io struct (aio case) |
| + * that this IO needs to convertion to written when IO is |
| + * completed |
| + */ |
| if (io) |
| io->flag = DIO_AIO_UNWRITTEN; |
| + else |
| + EXT4_I(inode)->i_state |= EXT4_STATE_DIO_UNWRITTEN; |
| goto out; |
| } |
| - /* DIO end_io complete, convert the filled extent to written */ |
| + /* async DIO end_io complete, convert the filled extent to written */ |
| if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) { |
| ret = ext4_convert_unwritten_extents_dio(handle, inode, |
| path); |
| @@ -3272,10 +3278,16 @@ int ext4_ext_get_blocks(handle_t *handle |
| * To avoid unecessary convertion for every aio dio rewrite |
| * to the mid of file, here we flag the IO that is really |
| * need the convertion. |
| - * |
| + * For non asycn direct IO case, flag the inode state |
| + * that we need to perform convertion when IO is done. |
| */ |
| - if (io && flags == EXT4_GET_BLOCKS_DIO_CREATE_EXT) |
| - io->flag = DIO_AIO_UNWRITTEN; |
| + if (flags == EXT4_GET_BLOCKS_DIO_CREATE_EXT) { |
| + if (io) |
| + io->flag = DIO_AIO_UNWRITTEN; |
| + else |
| + EXT4_I(inode)->i_state |= |
| + EXT4_STATE_DIO_UNWRITTEN;; |
| + } |
| } |
| err = ext4_ext_insert_extent(handle, inode, path, &newex, flags); |
| if (err) { |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -3766,7 +3766,8 @@ static ssize_t ext4_ext_direct_IO(int rw |
| if (ret != -EIOCBQUEUED && ret <= 0 && iocb->private) { |
| ext4_free_io_end(iocb->private); |
| iocb->private = NULL; |
| - } else if (ret > 0) { |
| + } else if (ret > 0 && (EXT4_I(inode)->i_state & |
| + EXT4_STATE_DIO_UNWRITTEN)) { |
| int err; |
| /* |
| * for non AIO case, since the IO is already |
| @@ -3776,6 +3777,7 @@ static ssize_t ext4_ext_direct_IO(int rw |
| offset, ret); |
| if (err < 0) |
| ret = err; |
| + EXT4_I(inode)->i_state &= ~EXT4_STATE_DIO_UNWRITTEN; |
| } |
| return ret; |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:03 2009 |
| Message-Id: <20091211042803.182022052@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:33 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Mingming Cao <cmm@us.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [55/90] ext4: code clean up for dio fallocate handling |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0055-ext4-code-clean-up-for-dio-fallocate-handling.patch |
| Content-Length: 1564 |
| Lines: 48 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 4b70df181611012a3556f017b57dfcef7e1d279f) |
| |
| The ext4_debug() call in ext4_end_io_dio() should be moved after the |
| check to make sure that io_end is non-NULL. |
| |
| The comment above ext4_get_block_dio_write() ("Maximum number of |
| blocks...") is a duplicate; the original and correct comment is above |
| the #define DIO_MAX_BLOCKS up above. |
| |
| Based on review comments from Curt Wohlgemuth. |
| |
| Signed-off-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 9 ++++----- |
| 1 file changed, 4 insertions(+), 5 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -3440,8 +3440,6 @@ out: |
| return ret; |
| } |
| |
| -/* Maximum number of blocks we map for direct IO at once. */ |
| - |
| static int ext4_get_block_dio_write(struct inode *inode, sector_t iblock, |
| struct buffer_head *bh_result, int create) |
| { |
| @@ -3649,13 +3647,14 @@ static void ext4_end_io_dio(struct kiocb |
| ext4_io_end_t *io_end = iocb->private; |
| struct workqueue_struct *wq; |
| |
| + /* if not async direct IO or dio with 0 bytes write, just return */ |
| + if (!io_end || !size) |
| + return; |
| + |
| ext_debug("ext4_end_io_dio(): io_end 0x%p" |
| "for inode %lu, iocb 0x%p, offset %llu, size %llu\n", |
| iocb->private, io_end->inode->i_ino, iocb, offset, |
| size); |
| - /* if not async direct IO or dio with 0 bytes write, just return */ |
| - if (!io_end || !size) |
| - return; |
| |
| /* if not aio dio with unwritten extents, just free io and return */ |
| if (io_end->flag != DIO_AIO_UNWRITTEN){ |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:04 2009 |
| Message-Id: <20091211042803.809439347@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:34 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Mingming Cao <cmm@us.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [56/90] ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0056-ext4-Fix-return-value-of-ext4_split_unwritten_extent.patch |
| Content-Length: 2272 |
| Lines: 58 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit ba230c3f6dc88ec008806adb27b12088486d508e) |
| |
| To prepare for a direct I/O write, we need to split the unwritten |
| extents before submitting the I/O. When no extents needed to be |
| split, ext4_split_unwritten_extents() was incorrectly returning 0 |
| instead of the size of uninitialized extents. This bug caused the |
| wrong return value to be sent back to the VFS code when called from |
| the async IO path, leading to an unnecessary fallback to buffered IO. |
| |
| This bug also hid the fact that the check to see whether or not a |
| split would be necessary was incorrect; we can only skip splitting the |
| extent if the write completely covers the uninitialized extent. |
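The corrected condition can be sketched as a pure function. `blocks_without_split()` is a hypothetical helper; it assumes that "allocated" means the number of blocks from the start of the write to the end of the extent, which is how the sketch computes it.

```c
#include <assert.h>

/*
 * ee_block/ee_len describe the uninitialized extent, iblock is the
 * first block of the write and max_blocks its length. Returns the
 * number of blocks covered when no split is needed, or 0 when the
 * extent has to be split.
 */
static unsigned int blocks_without_split(unsigned int ee_block,
					 unsigned int ee_len,
					 unsigned int iblock,
					 unsigned int max_blocks)
{
	unsigned int allocated;

	/* the write must start inside the extent */
	assert(iblock >= ee_block && iblock - ee_block < ee_len);

	/* blocks from the start of the write to the end of the extent */
	allocated = ee_len - (iblock - ee_block);

	if (iblock == ee_block && allocated <= max_blocks)
		return allocated;

	return 0;	/* write starts mid-extent or ends early: split */
}
```

The third case in the checks below (a write starting four blocks into an eight-block extent) is the one the old "allocated <= max_blocks" test alone would have wrongly treated as "no split needed".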
| |
| Signed-off-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/extents.c | 13 +++++++------ |
| 1 file changed, 7 insertions(+), 6 deletions(-) |
| |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -2784,6 +2784,8 @@ fix_extent_len: |
| * into three uninitialized extent(at most). After IO complete, the part |
| * being filled will be convert to initialized by the end_io callback function |
| * via ext4_convert_unwritten_extents(). |
| + * |
| + * Returns the size of uninitialized extent to be written on success. |
| */ |
| static int ext4_split_unwritten_extents(handle_t *handle, |
| struct inode *inode, |
| @@ -2801,7 +2803,6 @@ static int ext4_split_unwritten_extents( |
| unsigned int allocated, ee_len, depth; |
| ext4_fsblk_t newblock; |
| int err = 0; |
| - int ret = 0; |
| |
| ext_debug("ext4_split_unwritten_extents: inode %lu," |
| "iblock %llu, max_blocks %u\n", inode->i_ino, |
| @@ -2819,12 +2820,12 @@ static int ext4_split_unwritten_extents( |
| ext4_ext_store_pblock(&orig_ex, ext_pblock(ex)); |
| |
| /* |
| - * if the entire unintialized extent length less than |
| - * the size of extent to write, there is no need to split |
| - * uninitialized extent |
| + * If the uninitialized extent begins at the same logical |
| + * block where the write begins, and the write completely |
| + * covers the extent, then we don't need to split it. |
| */ |
| - if (allocated <= max_blocks) |
| - return ret; |
| + if ((iblock == ee_block) && (allocated <= max_blocks)) |
| + return allocated; |
| |
| err = ext4_ext_get_access(handle, inode, path + depth); |
| if (err) |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:04 2009 |
| Message-Id: <20091211042804.362034552@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:35 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Curt Wohlgemuth <curtw@google.com>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [57/90] ext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPC |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0057-ext4-fix-potential-buffer-head-leak-when-add_dirent_.patch |
| Content-Length: 3833 |
| Lines: 118 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 2de770a406b06dfc619faabbf5d85c835ed3f2e1) |
| |
| Previously add_dirent_to_buf() did not free its passed-in buffer head |
| in the case of ENOSPC, since in some cases the caller still needed it. |
| However, this led to potential buffer head leaks since not all callers |
| dealt with this correctly. Fix this by simplifying the freeing |
| convention; now add_dirent_to_buf() *never* frees the passed-in buffer |
| head, and leaves that to the responsibility of its caller. This makes |
| things cleaner and makes it easier to prove that the code is neither |
| leaking buffer heads nor calling brelse() one time too many. |
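The "callee never frees, caller always frees" convention can be modelled in a few lines of user-space C. `struct buf`, `buf_put()`, `add_entry()` and `add_entry_and_release()` are hypothetical stand-ins for the buffer head, brelse(), add_dirent_to_buf() and one of its callers.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a reference-counted buffer head. */
struct buf {
	int refs;
	int free_slots;
};

/* Models brelse(): drop one reference. */
static void buf_put(struct buf *b)
{
	b->refs--;
}

/* Callee: never drops the caller's reference, whatever it returns. */
static int add_entry(struct buf *b)
{
	if (b->free_slots == 0)
		return -ENOSPC;
	b->free_slots--;
	return 0;
}

/* Caller: releases exactly once on every path. */
static int add_entry_and_release(struct buf *b)
{
	int ret = add_entry(b);

	buf_put(b);
	return ret;
}
```

Because the callee has no release path at all, the reference count after the call is the same for success and for ENOSPC, which is what makes the code easy to audit.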
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Cc: Curt Wohlgemuth <curtw@google.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/namei.c | 30 ++++++++++++------------------ |
| 1 file changed, 12 insertions(+), 18 deletions(-) |
| |
| --- a/fs/ext4/namei.c |
| +++ b/fs/ext4/namei.c |
| @@ -1292,9 +1292,6 @@ errout: |
| * add_dirent_to_buf will attempt search the directory block for |
| * space. It will return -ENOSPC if no space is available, and -EIO |
| * and -EEXIST if directory entry already exists. |
| - * |
| - * NOTE! bh is NOT released in the case where ENOSPC is returned. In |
| - * all other cases bh is released. |
| */ |
| static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry, |
| struct inode *inode, struct ext4_dir_entry_2 *de, |
| @@ -1315,14 +1312,10 @@ static int add_dirent_to_buf(handle_t *h |
| top = bh->b_data + blocksize - reclen; |
| while ((char *) de <= top) { |
| if (!ext4_check_dir_entry("ext4_add_entry", dir, de, |
| - bh, offset)) { |
| - brelse(bh); |
| + bh, offset)) |
| return -EIO; |
| - } |
| - if (ext4_match(namelen, name, de)) { |
| - brelse(bh); |
| + if (ext4_match(namelen, name, de)) |
| return -EEXIST; |
| - } |
| nlen = EXT4_DIR_REC_LEN(de->name_len); |
| rlen = ext4_rec_len_from_disk(de->rec_len, blocksize); |
| if ((de->inode? rlen - nlen: rlen) >= reclen) |
| @@ -1337,7 +1330,6 @@ static int add_dirent_to_buf(handle_t *h |
| err = ext4_journal_get_write_access(handle, bh); |
| if (err) { |
| ext4_std_error(dir->i_sb, err); |
| - brelse(bh); |
| return err; |
| } |
| |
| @@ -1377,7 +1369,6 @@ static int add_dirent_to_buf(handle_t *h |
| err = ext4_handle_dirty_metadata(handle, dir, bh); |
| if (err) |
| ext4_std_error(dir->i_sb, err); |
| - brelse(bh); |
| return 0; |
| } |
| |
| @@ -1471,7 +1462,9 @@ static int make_indexed_dir(handle_t *ha |
| if (!(de)) |
| return retval; |
| |
| - return add_dirent_to_buf(handle, dentry, inode, de, bh); |
| + retval = add_dirent_to_buf(handle, dentry, inode, de, bh); |
| + brelse(bh); |
| + return retval; |
| } |
| |
| /* |
| @@ -1514,8 +1507,10 @@ static int ext4_add_entry(handle_t *hand |
| if(!bh) |
| return retval; |
| retval = add_dirent_to_buf(handle, dentry, inode, NULL, bh); |
| - if (retval != -ENOSPC) |
| + if (retval != -ENOSPC) { |
| + brelse(bh); |
| return retval; |
| + } |
| |
| if (blocks == 1 && !dx_fallback && |
| EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_DIR_INDEX)) |
| @@ -1528,7 +1523,9 @@ static int ext4_add_entry(handle_t *hand |
| de = (struct ext4_dir_entry_2 *) bh->b_data; |
| de->inode = 0; |
| de->rec_len = ext4_rec_len_to_disk(blocksize, blocksize); |
| - return add_dirent_to_buf(handle, dentry, inode, de, bh); |
| + retval = add_dirent_to_buf(handle, dentry, inode, de, bh); |
| + brelse(bh); |
| + return retval; |
| } |
| |
| /* |
| @@ -1561,10 +1558,8 @@ static int ext4_dx_add_entry(handle_t *h |
| goto journal_error; |
| |
| err = add_dirent_to_buf(handle, dentry, inode, NULL, bh); |
| - if (err != -ENOSPC) { |
| - bh = NULL; |
| + if (err != -ENOSPC) |
| goto cleanup; |
| - } |
| |
| /* Block full, should compress but for now just split */ |
| dxtrace(printk(KERN_DEBUG "using %u of %u node entries\n", |
| @@ -1657,7 +1652,6 @@ static int ext4_dx_add_entry(handle_t *h |
| if (!de) |
| goto cleanup; |
| err = add_dirent_to_buf(handle, dentry, inode, de, bh); |
| - bh = NULL; |
| goto cleanup; |
| |
| journal_error: |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:05 2009 |
| Message-Id: <20091211042804.949254622@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:36 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [58/90] ext4: avoid divide by zero when trying to mount a corrupted file system |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0058-ext4-avoid-divide-by-zero-when-trying-to-mount-a-cor.patch |
| Content-Length: 1267 |
| Lines: 39 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 503358ae01b70ce6909d19dd01287093f6b6271c) |
| |
| If s_log_groups_per_flex is greater than 31, then groups_per_flex will |
| overflow and cause a divide by zero error. This can cause a kernel |
| BUG if such a file system is mounted. |
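A user-space sketch of guarding the shift is given below. Note that it is not the same shape as the patch: the patch computes the shift and then rejects results smaller than 2, while this sketch validates the exponent first, since shifting a 32-bit value by 32 or more is undefined in standard C. `groups_per_flex_from_log()` is a hypothetical helper.

```c
#include <assert.h>

/*
 * Return 2^log_groups, or 0 when the on-disk value is unusable
 * (zero means "no flex groups"; anything above 31 cannot be
 * represented in 32 bits).
 */
static unsigned int groups_per_flex_from_log(unsigned int log_groups)
{
	if (log_groups == 0 || log_groups > 31)
		return 0;

	return 1u << log_groups;
}
```

A caller would treat a return value of 0 as "flex groups disabled" and skip any later division by the group count.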
| |
| Thanks to Nageswara R Sastry for analyzing the failure and providing |
| an initial patch. |
| |
| http://bugzilla.kernel.org/show_bug.cgi?id=14287 |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/super.c | 8 ++++---- |
| 1 file changed, 4 insertions(+), 4 deletions(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -1695,14 +1695,14 @@ static int ext4_fill_flex_info(struct su |
| size_t size; |
| int i; |
| |
| - if (!sbi->s_es->s_log_groups_per_flex) { |
| + sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex; |
| + groups_per_flex = 1 << sbi->s_log_groups_per_flex; |
| + |
| + if (groups_per_flex < 2) { |
| sbi->s_log_groups_per_flex = 0; |
| return 1; |
| } |
| |
| - sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex; |
| - groups_per_flex = 1 << sbi->s_log_groups_per_flex; |
| - |
| /* We allocate both existing and potentially added groups */ |
| flex_group_count = ((sbi->s_groups_count + groups_per_flex - 1) + |
| ((le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) + 1) << |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:05 2009 |
| Message-Id: <20091211042805.444446227@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:37 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [59/90] ext4: fix the returned block count if EXT4_IOC_MOVE_EXT fails |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0059-ext4-fix-the-returned-block-count-if-EXT4_IOC_MOVE_E.patch |
| Content-Length: 10970 |
| Lines: 349 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit f868a48d06f8886cb0367568a12367fa4f21ea0d) |
| |
| If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were |
| exchanged before the failure should be returned to the userspace |
| caller. Unfortunately, currently if the block size is not the same as |
| the page size, the block count that is returned is the |
| page-aligned block count instead of the actual block count. This |
| commit addresses this bug. |
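The calling convention this patch moves to (return the count of blocks done, report the error through a pointer) can be sketched as follows. `replace_blocks()`, `step_fn` and `fail_at_block_5()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-block step: 0 on success, negative errno on failure. */
typedef int (*step_fn)(unsigned int block);

/*
 * Process up to count blocks starting at from. Returns how many
 * blocks were handled before stopping; *err holds 0 or the error
 * that stopped the loop.
 */
static unsigned int replace_blocks(unsigned int from, unsigned int count,
				   step_fn step, int *err)
{
	unsigned int replaced = 0;

	*err = 0;
	while (replaced < count) {
		*err = step(from + replaced);
		if (*err)
			break;
		replaced++;
	}
	return replaced;
}

/* Example step that fails on block 5. */
static int fail_at_block_5(unsigned int block)
{
	return block == 5 ? -EIO : 0;
}
```

Splitting the two results this way lets the caller report partial progress to user space even when the operation as a whole fails.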
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 139 ++++++++++++++++++++++++++------------------------ |
| 1 file changed, 73 insertions(+), 66 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -661,6 +661,7 @@ mext_calc_swap_extents(struct ext4_exten |
| * @donor_inode: donor inode |
| * @from: block offset of orig_inode |
| * @count: block count to be replaced |
| + * @err: pointer to save return value |
| * |
| * Replace original inode extents and donor inode extents page by page. |
| * We implement this replacement in the following three steps: |
| @@ -671,19 +672,18 @@ mext_calc_swap_extents(struct ext4_exten |
| * 3. Change the block information of donor inode to point at the saved |
| * original inode blocks in the dummy extents. |
| * |
| - * Return 0 on success, or a negative error value on failure. |
| + * Return replaced block count. |
| */ |
| static int |
| mext_replace_branches(handle_t *handle, struct inode *orig_inode, |
| struct inode *donor_inode, ext4_lblk_t from, |
| - ext4_lblk_t count) |
| + ext4_lblk_t count, int *err) |
| { |
| struct ext4_ext_path *orig_path = NULL; |
| struct ext4_ext_path *donor_path = NULL; |
| struct ext4_extent *oext, *dext; |
| struct ext4_extent tmp_dext, tmp_oext; |
| ext4_lblk_t orig_off = from, donor_off = from; |
| - int err = 0; |
| int depth; |
| int replaced_count = 0; |
| int dext_alen; |
| @@ -691,13 +691,13 @@ mext_replace_branches(handle_t *handle, |
| mext_double_down_write(orig_inode, donor_inode); |
| |
| /* Get the original extent for the block "orig_off" */ |
| - err = get_ext_path(orig_inode, orig_off, &orig_path); |
| - if (err) |
| + *err = get_ext_path(orig_inode, orig_off, &orig_path); |
| + if (*err) |
| goto out; |
| |
| /* Get the donor extent for the head */ |
| - err = get_ext_path(donor_inode, donor_off, &donor_path); |
| - if (err) |
| + *err = get_ext_path(donor_inode, donor_off, &donor_path); |
| + if (*err) |
| goto out; |
| depth = ext_depth(orig_inode); |
| oext = orig_path[depth].p_ext; |
| @@ -707,9 +707,9 @@ mext_replace_branches(handle_t *handle, |
| dext = donor_path[depth].p_ext; |
| tmp_dext = *dext; |
| |
| - err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off, |
| + *err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off, |
| donor_off, count); |
| - if (err) |
| + if (*err) |
| goto out; |
| |
| /* Loop for the donor extents */ |
| @@ -718,7 +718,7 @@ mext_replace_branches(handle_t *handle, |
| if (!dext) { |
| ext4_error(donor_inode->i_sb, __func__, |
| "The extent for donor must be found"); |
| - err = -EIO; |
| + *err = -EIO; |
| goto out; |
| } else if (donor_off != le32_to_cpu(tmp_dext.ee_block)) { |
| ext4_error(donor_inode->i_sb, __func__, |
| @@ -726,20 +726,20 @@ mext_replace_branches(handle_t *handle, |
| "extent(%u) should be equal", |
| donor_off, |
| le32_to_cpu(tmp_dext.ee_block)); |
| - err = -EIO; |
| + *err = -EIO; |
| goto out; |
| } |
| |
| /* Set donor extent to orig extent */ |
| - err = mext_leaf_block(handle, orig_inode, |
| + *err = mext_leaf_block(handle, orig_inode, |
| orig_path, &tmp_dext, &orig_off); |
| - if (err < 0) |
| + if (*err) |
| goto out; |
| |
| /* Set orig extent to donor extent */ |
| - err = mext_leaf_block(handle, donor_inode, |
| + *err = mext_leaf_block(handle, donor_inode, |
| donor_path, &tmp_oext, &donor_off); |
| - if (err < 0) |
| + if (*err) |
| goto out; |
| |
| dext_alen = ext4_ext_get_actual_len(&tmp_dext); |
| @@ -753,35 +753,25 @@ mext_replace_branches(handle_t *handle, |
| |
| if (orig_path) |
| ext4_ext_drop_refs(orig_path); |
| - err = get_ext_path(orig_inode, orig_off, &orig_path); |
| - if (err) |
| + *err = get_ext_path(orig_inode, orig_off, &orig_path); |
| + if (*err) |
| goto out; |
| depth = ext_depth(orig_inode); |
| oext = orig_path[depth].p_ext; |
| - if (le32_to_cpu(oext->ee_block) + |
| - ext4_ext_get_actual_len(oext) <= orig_off) { |
| - err = 0; |
| - goto out; |
| - } |
| tmp_oext = *oext; |
| |
| if (donor_path) |
| ext4_ext_drop_refs(donor_path); |
| - err = get_ext_path(donor_inode, donor_off, &donor_path); |
| - if (err) |
| + *err = get_ext_path(donor_inode, donor_off, &donor_path); |
| + if (*err) |
| goto out; |
| depth = ext_depth(donor_inode); |
| dext = donor_path[depth].p_ext; |
| - if (le32_to_cpu(dext->ee_block) + |
| - ext4_ext_get_actual_len(dext) <= donor_off) { |
| - err = 0; |
| - goto out; |
| - } |
| tmp_dext = *dext; |
| |
| - err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off, |
| + *err = mext_calc_swap_extents(&tmp_dext, &tmp_oext, orig_off, |
| donor_off, count - replaced_count); |
| - if (err) |
| + if (*err) |
| goto out; |
| } |
| |
| @@ -796,7 +786,7 @@ out: |
| } |
| |
| mext_double_up_write(orig_inode, donor_inode); |
| - return err; |
| + return replaced_count; |
| } |
| |
| /** |
| @@ -808,16 +798,17 @@ out: |
| * @data_offset_in_page: block index where data swapping starts |
| * @block_len_in_page: the number of blocks to be swapped |
| * @uninit: orig extent is uninitialized or not |
| + * @err: pointer to save return value |
| * |
| * Save the data in original inode blocks and replace original inode extents |
| * with donor inode extents by calling mext_replace_branches(). |
| - * Finally, write out the saved data in new original inode blocks. Return 0 |
| - * on success, or a negative error value on failure. |
| + * Finally, write out the saved data in new original inode blocks. Return |
| + * replaced block count. |
| */ |
| static int |
| move_extent_per_page(struct file *o_filp, struct inode *donor_inode, |
| pgoff_t orig_page_offset, int data_offset_in_page, |
| - int block_len_in_page, int uninit) |
| + int block_len_in_page, int uninit, int *err) |
| { |
| struct inode *orig_inode = o_filp->f_dentry->d_inode; |
| struct address_space *mapping = orig_inode->i_mapping; |
| @@ -829,9 +820,11 @@ move_extent_per_page(struct file *o_filp |
| long long offs = orig_page_offset << PAGE_CACHE_SHIFT; |
| unsigned long blocksize = orig_inode->i_sb->s_blocksize; |
| unsigned int w_flags = 0; |
| - unsigned int tmp_data_len, data_len; |
| + unsigned int tmp_data_size, data_size, replaced_size; |
| void *fsdata; |
| - int ret, i, jblocks; |
| + int i, jblocks; |
| + int err2 = 0; |
| + int replaced_count = 0; |
| int blocks_per_page = PAGE_CACHE_SIZE >> orig_inode->i_blkbits; |
| |
| /* |
| @@ -841,8 +834,8 @@ move_extent_per_page(struct file *o_filp |
| jblocks = ext4_writepage_trans_blocks(orig_inode) * 2; |
| handle = ext4_journal_start(orig_inode, jblocks); |
| if (IS_ERR(handle)) { |
| - ret = PTR_ERR(handle); |
| - return ret; |
| + *err = PTR_ERR(handle); |
| + return 0; |
| } |
| |
| if (segment_eq(get_fs(), KERNEL_DS)) |
| @@ -858,9 +851,9 @@ move_extent_per_page(struct file *o_filp |
| * Just swap data blocks between orig and donor. |
| */ |
| if (uninit) { |
| - ret = mext_replace_branches(handle, orig_inode, |
| - donor_inode, orig_blk_offset, |
| - block_len_in_page); |
| + replaced_count = mext_replace_branches(handle, orig_inode, |
| + donor_inode, orig_blk_offset, |
| + block_len_in_page, err); |
| |
| /* Clear the inode cache not to refer to the old data */ |
| ext4_ext_invalidate_cache(orig_inode); |
| @@ -870,27 +863,28 @@ move_extent_per_page(struct file *o_filp |
| |
| offs = (long long)orig_blk_offset << orig_inode->i_blkbits; |
| |
| - /* Calculate data_len */ |
| + /* Calculate data_size */ |
| if ((orig_blk_offset + block_len_in_page - 1) == |
| ((orig_inode->i_size - 1) >> orig_inode->i_blkbits)) { |
| /* Replace the last block */ |
| - tmp_data_len = orig_inode->i_size & (blocksize - 1); |
| + tmp_data_size = orig_inode->i_size & (blocksize - 1); |
| /* |
| - * If data_len equal zero, it shows data_len is multiples of |
| + * If data_size equal zero, it shows data_size is multiples of |
| * blocksize. So we set appropriate value. |
| */ |
| - if (tmp_data_len == 0) |
| - tmp_data_len = blocksize; |
| + if (tmp_data_size == 0) |
| + tmp_data_size = blocksize; |
| |
| - data_len = tmp_data_len + |
| + data_size = tmp_data_size + |
| ((block_len_in_page - 1) << orig_inode->i_blkbits); |
| - } else { |
| - data_len = block_len_in_page << orig_inode->i_blkbits; |
| - } |
| + } else |
| + data_size = block_len_in_page << orig_inode->i_blkbits; |
| + |
| + replaced_size = data_size; |
| |
| - ret = a_ops->write_begin(o_filp, mapping, offs, data_len, w_flags, |
| + *err = a_ops->write_begin(o_filp, mapping, offs, data_size, w_flags, |
| &page, &fsdata); |
| - if (unlikely(ret < 0)) |
| + if (unlikely(*err < 0)) |
| goto out; |
| |
| if (!PageUptodate(page)) { |
| @@ -911,10 +905,17 @@ move_extent_per_page(struct file *o_filp |
| /* Release old bh and drop refs */ |
| try_to_release_page(page, 0); |
| |
| - ret = mext_replace_branches(handle, orig_inode, donor_inode, |
| - orig_blk_offset, block_len_in_page); |
| - if (ret < 0) |
| - goto out; |
| + replaced_count = mext_replace_branches(handle, orig_inode, donor_inode, |
| + orig_blk_offset, block_len_in_page, |
| + &err2); |
| + if (err2) { |
| + if (replaced_count) { |
| + block_len_in_page = replaced_count; |
| + replaced_size = |
| + block_len_in_page << orig_inode->i_blkbits; |
| + } else |
| + goto out; |
| + } |
| |
| /* Clear the inode cache not to refer to the old data */ |
| ext4_ext_invalidate_cache(orig_inode); |
| @@ -928,16 +929,16 @@ move_extent_per_page(struct file *o_filp |
| bh = bh->b_this_page; |
| |
| for (i = 0; i < block_len_in_page; i++) { |
| - ret = ext4_get_block(orig_inode, |
| + *err = ext4_get_block(orig_inode, |
| (sector_t)(orig_blk_offset + i), bh, 0); |
| - if (ret < 0) |
| + if (*err < 0) |
| goto out; |
| |
| if (bh->b_this_page != NULL) |
| bh = bh->b_this_page; |
| } |
| |
| - ret = a_ops->write_end(o_filp, mapping, offs, data_len, data_len, |
| + *err = a_ops->write_end(o_filp, mapping, offs, data_size, replaced_size, |
| page, fsdata); |
| page = NULL; |
| |
| @@ -951,7 +952,10 @@ out: |
| out2: |
| ext4_journal_stop(handle); |
| |
| - return ret < 0 ? ret : 0; |
| + if (err2) |
| + *err = err2; |
| + |
| + return replaced_count; |
| } |
| |
| /** |
| @@ -1367,15 +1371,17 @@ ext4_move_extents(struct file *o_filp, s |
| while (orig_page_offset <= seq_end_page) { |
| |
| /* Swap original branches with new branches */ |
| - ret1 = move_extent_per_page(o_filp, donor_inode, |
| + block_len_in_page = move_extent_per_page( |
| + o_filp, donor_inode, |
| orig_page_offset, |
| data_offset_in_page, |
| - block_len_in_page, uninit); |
| - if (ret1 < 0) |
| - goto out; |
| - orig_page_offset++; |
| + block_len_in_page, uninit, |
| + &ret1); |
| + |
| /* Count how many blocks we have exchanged */ |
| *moved_len += block_len_in_page; |
| + if (ret1 < 0) |
| + goto out; |
| if (*moved_len > len) { |
| ext4_error(orig_inode->i_sb, __func__, |
| "We replaced blocks too much! " |
| @@ -1385,6 +1391,7 @@ ext4_move_extents(struct file *o_filp, s |
| goto out; |
| } |
| |
| + orig_page_offset++; |
| data_offset_in_page = 0; |
| rest_blocks -= block_len_in_page; |
| if (rest_blocks > blocks_per_page) |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:06 2009 |
| Message-Id: <20091211042805.985279951@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:38 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [60/90] ext4: fix lock order problem in ext4_move_extents() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0060-ext4-fix-lock-order-problem-in-ext4_move_extents.patch |
| Content-Length: 10372 |
| Lines: 310 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit fc04cb49a898c372a22b21fffc47f299d8710801) |
| |
| ext4_move_extents() checks the logical block contiguity of the |
| original file with ext4_ext_find_extent() and mext_next_extent(). |
| Therefore the extent that the ext4_ext_path structure points to |
| must not change between those two calls. |
| |
| But in the current implementation, there is no i_data_sem protection |
| between ext4_ext_find_extent() and mext_next_extent(), so the extent |
| that the ext4_ext_path structure points to may be overwritten by |
| delalloc. As a result, ext4_move_extents() can exchange the wrong |
| blocks between the original and donor files. I changed where |
| i_data_sem is acquired and released to solve this problem. |
| |
| Moreover, I changed move_extent_per_page() to start the transaction |
| first, and then acquire i_data_sem. Without this change, a deadlock |
| is possible between mmap() and ext4_move_extents(): |
| |
| * NOTE: "A", "B" and "C" are different processes |
| |
| A-1: ext4_move_extents() acquires i_data_sem of two inodes. |
| |
| B: do_page_fault() starts the transaction (T), |
| and then tries to acquire i_data_sem. |
| But process "A" is already holding it, so "B" is kept waiting. |
| |
| C: While "A" and "B" are running, kjournald2 tries to commit |
| transaction (T), but it is still being updated, so kjournald2 |
| waits for it. |
| |
| A-2: Calls ext4_journal_start() while holding i_data_sem, |
| but transaction (T) is locked. |
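
The scenario above is an order inversion between "start a transaction" and "take i_data_sem". A minimal userspace sketch of how such an inversion can be detected follows. It assumes the open transaction can be modelled as a lock; the names `lock_class`, `order_graph` and `order_graph_acquire` are hypothetical and are not kernel identifiers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock classes for this sketch; not kernel identifiers. */
enum lock_class { LK_JOURNAL, LK_DATA_SEM, LK_MAX };

/* seen[a][b] != 0 means: some path acquired b while already holding a. */
struct order_graph {
	int seen[LK_MAX][LK_MAX];
};

void order_graph_init(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/*
 * Record "held, then wanted". Returns 1 if the opposite order was
 * recorded earlier (a potential ABBA deadlock), 0 otherwise.
 */
int order_graph_acquire(struct order_graph *g, enum lock_class held,
			enum lock_class wanted)
{
	g->seen[held][wanted] = 1;
	return g->seen[wanted][held];
}
```

In this model the page fault path records (LK_JOURNAL, LK_DATA_SEM). The old move-extent path records (LK_DATA_SEM, LK_JOURNAL) and is flagged; the reordered path records the same pair as the page fault path and is not.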
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 117 ++++++++++++++++++++++---------------------------- |
| 1 file changed, 53 insertions(+), 64 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -77,12 +77,14 @@ static int |
| mext_next_extent(struct inode *inode, struct ext4_ext_path *path, |
| struct ext4_extent **extent) |
| { |
| + struct ext4_extent_header *eh; |
| int ppos, leaf_ppos = path->p_depth; |
| |
| ppos = leaf_ppos; |
| if (EXT_LAST_EXTENT(path[ppos].p_hdr) > path[ppos].p_ext) { |
| /* leaf block */ |
| *extent = ++path[ppos].p_ext; |
| + path[ppos].p_block = ext_pblock(path[ppos].p_ext); |
| return 0; |
| } |
| |
| @@ -119,9 +121,18 @@ mext_next_extent(struct inode *inode, st |
| ext_block_hdr(path[cur_ppos+1].p_bh); |
| } |
| |
| + path[leaf_ppos].p_ext = *extent = NULL; |
| + |
| + eh = path[leaf_ppos].p_hdr; |
| + if (le16_to_cpu(eh->eh_entries) == 0) |
| + /* empty leaf is found */ |
| + return -ENODATA; |
| + |
| /* leaf block */ |
| path[leaf_ppos].p_ext = *extent = |
| EXT_FIRST_EXTENT(path[leaf_ppos].p_hdr); |
| + path[leaf_ppos].p_block = |
| + ext_pblock(path[leaf_ppos].p_ext); |
| return 0; |
| } |
| } |
| @@ -155,40 +166,15 @@ mext_check_null_inode(struct inode *inod |
| } |
| |
| /** |
| - * mext_double_down_read - Acquire two inodes' read semaphore |
| - * |
| - * @orig_inode: original inode structure |
| - * @donor_inode: donor inode structure |
| - * Acquire read semaphore of the two inodes (orig and donor) by i_ino order. |
| - */ |
| -static void |
| -mext_double_down_read(struct inode *orig_inode, struct inode *donor_inode) |
| -{ |
| - struct inode *first = orig_inode, *second = donor_inode; |
| - |
| - /* |
| - * Use the inode number to provide the stable locking order instead |
| - * of its address, because the C language doesn't guarantee you can |
| - * compare pointers that don't come from the same array. |
| - */ |
| - if (donor_inode->i_ino < orig_inode->i_ino) { |
| - first = donor_inode; |
| - second = orig_inode; |
| - } |
| - |
| - down_read(&EXT4_I(first)->i_data_sem); |
| - down_read(&EXT4_I(second)->i_data_sem); |
| -} |
| - |
| -/** |
| - * mext_double_down_write - Acquire two inodes' write semaphore |
| + * double_down_write_data_sem - Acquire two inodes' write lock of i_data_sem |
| * |
| * @orig_inode: original inode structure |
| * @donor_inode: donor inode structure |
| - * Acquire write semaphore of the two inodes (orig and donor) by i_ino order. |
| + * Acquire write lock of i_data_sem of the two inodes (orig and donor) by |
| + * i_ino order. |
| */ |
| static void |
| -mext_double_down_write(struct inode *orig_inode, struct inode *donor_inode) |
| +double_down_write_data_sem(struct inode *orig_inode, struct inode *donor_inode) |
| { |
| struct inode *first = orig_inode, *second = donor_inode; |
| |
| @@ -207,28 +193,14 @@ mext_double_down_write(struct inode *ori |
| } |
| |
| /** |
| - * mext_double_up_read - Release two inodes' read semaphore |
| + * double_up_write_data_sem - Release two inodes' write lock of i_data_sem |
| * |
| * @orig_inode: original inode structure to be released its lock first |
| * @donor_inode: donor inode structure to be released its lock second |
| - * Release read semaphore of two inodes (orig and donor). |
| + * Release write lock of i_data_sem of two inodes (orig and donor). |
| */ |
| static void |
| -mext_double_up_read(struct inode *orig_inode, struct inode *donor_inode) |
| -{ |
| - up_read(&EXT4_I(orig_inode)->i_data_sem); |
| - up_read(&EXT4_I(donor_inode)->i_data_sem); |
| -} |
| - |
| -/** |
| - * mext_double_up_write - Release two inodes' write semaphore |
| - * |
| - * @orig_inode: original inode structure to be released its lock first |
| - * @donor_inode: donor inode structure to be released its lock second |
| - * Release write semaphore of two inodes (orig and donor). |
| - */ |
| -static void |
| -mext_double_up_write(struct inode *orig_inode, struct inode *donor_inode) |
| +double_up_write_data_sem(struct inode *orig_inode, struct inode *donor_inode) |
| { |
| up_write(&EXT4_I(orig_inode)->i_data_sem); |
| up_write(&EXT4_I(donor_inode)->i_data_sem); |
| @@ -688,8 +660,6 @@ mext_replace_branches(handle_t *handle, |
| int replaced_count = 0; |
| int dext_alen; |
| |
| - mext_double_down_write(orig_inode, donor_inode); |
| - |
| /* Get the original extent for the block "orig_off" */ |
| *err = get_ext_path(orig_inode, orig_off, &orig_path); |
| if (*err) |
| @@ -785,7 +755,6 @@ out: |
| kfree(donor_path); |
| } |
| |
| - mext_double_up_write(orig_inode, donor_inode); |
| return replaced_count; |
| } |
| |
| @@ -851,6 +820,11 @@ move_extent_per_page(struct file *o_filp |
| * Just swap data blocks between orig and donor. |
| */ |
| if (uninit) { |
| + /* |
| + * Protect extent trees against block allocations |
| + * via delalloc |
| + */ |
| + double_down_write_data_sem(orig_inode, donor_inode); |
| replaced_count = mext_replace_branches(handle, orig_inode, |
| donor_inode, orig_blk_offset, |
| block_len_in_page, err); |
| @@ -858,6 +832,7 @@ move_extent_per_page(struct file *o_filp |
| /* Clear the inode cache not to refer to the old data */ |
| ext4_ext_invalidate_cache(orig_inode); |
| ext4_ext_invalidate_cache(donor_inode); |
| + double_up_write_data_sem(orig_inode, donor_inode); |
| goto out2; |
| } |
| |
| @@ -905,6 +880,8 @@ move_extent_per_page(struct file *o_filp |
| /* Release old bh and drop refs */ |
| try_to_release_page(page, 0); |
| |
| + /* Protect extent trees against block allocations via delalloc */ |
| + double_down_write_data_sem(orig_inode, donor_inode); |
| replaced_count = mext_replace_branches(handle, orig_inode, donor_inode, |
| orig_blk_offset, block_len_in_page, |
| &err2); |
| @@ -913,14 +890,18 @@ move_extent_per_page(struct file *o_filp |
| block_len_in_page = replaced_count; |
| replaced_size = |
| block_len_in_page << orig_inode->i_blkbits; |
| - } else |
| + } else { |
| + double_up_write_data_sem(orig_inode, donor_inode); |
| goto out; |
| + } |
| } |
| |
| /* Clear the inode cache not to refer to the old data */ |
| ext4_ext_invalidate_cache(orig_inode); |
| ext4_ext_invalidate_cache(donor_inode); |
| |
| + double_up_write_data_sem(orig_inode, donor_inode); |
| + |
| if (!page_has_buffers(page)) |
| create_empty_buffers(page, 1 << orig_inode->i_blkbits, 0); |
| |
| @@ -1236,16 +1217,16 @@ ext4_move_extents(struct file *o_filp, s |
| return -EINVAL; |
| } |
| |
| - /* protect orig and donor against a truncate */ |
| + /* Protect orig and donor inodes against a truncate */ |
| ret1 = mext_inode_double_lock(orig_inode, donor_inode); |
| if (ret1 < 0) |
| return ret1; |
| |
| - mext_double_down_read(orig_inode, donor_inode); |
| + /* Protect extent tree against block allocations via delalloc */ |
| + double_down_write_data_sem(orig_inode, donor_inode); |
| /* Check the filesystem environment whether move_extent can be done */ |
| ret1 = mext_check_arguments(orig_inode, donor_inode, orig_start, |
| donor_start, &len, *moved_len); |
| - mext_double_up_read(orig_inode, donor_inode); |
| if (ret1) |
| goto out; |
| |
| @@ -1308,6 +1289,10 @@ ext4_move_extents(struct file *o_filp, s |
| ext4_ext_get_actual_len(ext_cur), block_end + 1) - |
| max(le32_to_cpu(ext_cur->ee_block), block_start); |
| |
| + /* Discard preallocations of two inodes */ |
| + ext4_discard_preallocations(orig_inode); |
| + ext4_discard_preallocations(donor_inode); |
| + |
| while (!last_extent && le32_to_cpu(ext_cur->ee_block) <= block_end) { |
| seq_blocks += add_blocks; |
| |
| @@ -1359,14 +1344,14 @@ ext4_move_extents(struct file *o_filp, s |
| seq_start = le32_to_cpu(ext_cur->ee_block); |
| rest_blocks = seq_blocks; |
| |
| - /* Discard preallocations of two inodes */ |
| - down_write(&EXT4_I(orig_inode)->i_data_sem); |
| - ext4_discard_preallocations(orig_inode); |
| - up_write(&EXT4_I(orig_inode)->i_data_sem); |
| - |
| - down_write(&EXT4_I(donor_inode)->i_data_sem); |
| - ext4_discard_preallocations(donor_inode); |
| - up_write(&EXT4_I(donor_inode)->i_data_sem); |
| + /* |
| + * Up semaphore to avoid following problems: |
| + * a. transaction deadlock among ext4_journal_start, |
| + * ->write_begin via pagefault, and jbd2_journal_commit |
| + * b. racing with ->readpage, ->write_begin, and ext4_get_block |
| + * in move_extent_per_page |
| + */ |
| + double_up_write_data_sem(orig_inode, donor_inode); |
| |
| while (orig_page_offset <= seq_end_page) { |
| |
| @@ -1381,14 +1366,14 @@ ext4_move_extents(struct file *o_filp, s |
| /* Count how many blocks we have exchanged */ |
| *moved_len += block_len_in_page; |
| if (ret1 < 0) |
| - goto out; |
| + break; |
| if (*moved_len > len) { |
| ext4_error(orig_inode->i_sb, __func__, |
| "We replaced blocks too much! " |
| "sum of replaced: %llu requested: %llu", |
| *moved_len, len); |
| ret1 = -EIO; |
| - goto out; |
| + break; |
| } |
| |
| orig_page_offset++; |
| @@ -1400,6 +1385,10 @@ ext4_move_extents(struct file *o_filp, s |
| block_len_in_page = rest_blocks; |
| } |
| |
| + double_down_write_data_sem(orig_inode, donor_inode); |
| + if (ret1 < 0) |
| + break; |
| + |
| /* Decrease buffer counter */ |
| if (holecheck_path) |
| ext4_ext_drop_refs(holecheck_path); |
| @@ -1429,7 +1418,7 @@ out: |
| ext4_ext_drop_refs(holecheck_path); |
| kfree(holecheck_path); |
| } |
| - |
| + double_up_write_data_sem(orig_inode, donor_inode); |
| ret2 = mext_inode_double_unlock(orig_inode, donor_inode); |
| |
| if (ret1) |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:07 2009 |
| Message-Id: <20091211042806.581977969@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:39 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [61/90] ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXT |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0061-ext4-fix-possible-recursive-locking-warning-in-EXT4_.patch |
| Content-Length: 1075 |
| Lines: 32 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 49bd22bc4d603a2a4fc2a6a60e156cbea52eb494) |
| |
| If CONFIG_PROVE_LOCKING is enabled, double_down_write_data_sem() |
| will trigger a false-positive warning of a recursive lock. Since we |
| take i_data_sem for the two inodes ordered by their inode numbers, |
| this isn't a problem. Use of down_write_nested() will notify the lock |
| dependency checker machinery that there is no problem here. |
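
Why ordering by inode number is safe can be sketched in userspace C. This is an illustration only, with a hypothetical `toy_inode` type and a flag in place of a real semaphore: whichever way the caller passes the two inodes, the one with the smaller number is always locked first, so two callers can never each hold one lock while waiting for the other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode: only the number and a lock flag. */
struct toy_inode {
	unsigned long ino;
	int locked;
};

/* Records the order in which locks were taken, for inspection. */
struct lock_log {
	unsigned long order[2];
	size_t n;
};

void toy_lock(struct toy_inode *inode, struct lock_log *log)
{
	assert(!inode->locked);		/* taking it twice would self-deadlock */
	inode->locked = 1;
	log->order[log->n++] = inode->ino;
}

/* Lock both inodes, always taking the smaller inode number first. */
void toy_double_lock(struct toy_inode *a, struct toy_inode *b,
		     struct lock_log *log)
{
	struct toy_inode *first = a, *second = b;

	if (b->ino < a->ino) {
		first = b;
		second = a;
	}
	toy_lock(first, log);
	toy_lock(second, log);
}
```

The lock checker cannot infer this ordering rule by itself, since both locks belong to the same lock class; that is why the nested annotation is needed on the second acquisition.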
| |
| This problem was reported by Brian Rogers: |
| |
| http://marc.info/?l=linux-ext4&m=125115356928011&w=1 |
| |
| Reported-by: Brian Rogers <brian@xyzw.org> |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -189,7 +189,7 @@ double_down_write_data_sem(struct inode |
| } |
| |
| down_write(&EXT4_I(first)->i_data_sem); |
| - down_write(&EXT4_I(second)->i_data_sem); |
| + down_write_nested(&EXT4_I(second)->i_data_sem, SINGLE_DEPTH_NESTING); |
| } |
| |
| /** |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:07 2009 |
| Message-Id: <20091211042807.176333510@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:40 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [62/90] ext4: plug a buffer_head leak in an error path of ext4_iget() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0062-ext4-plug-a-buffer_head-leak-in-an-error-path-of-ext.patch |
| Content-Length: 2427 |
| Lines: 82 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 567f3e9a70d71e5c9be03701b8578be77857293b) |
| |
| One of the invalid error paths in ext4_iget() forgot to brelse() the |
| inode buffer head. Fix it by adding a brelse() in the common error |
| return path, which also simplifies the function. |
| |
| Thanks to Andi Kleen <ak@linux.intel.com> for reporting the problem. |
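
The pattern used by the fix (initialise the pointer to NULL, release it once at a shared error label) can be sketched in userspace C. The helpers `toy_get`, `toy_release` and `toy_load` are hypothetical; the sketch relies on the release helper accepting NULL, as brelse() does.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts live buffers so a caller can check that nothing leaked. */
int live_buffers;

void *toy_get(void)
{
	void *p = malloc(16);

	if (p)
		live_buffers++;
	return p;
}

/* Like brelse(): releasing a NULL pointer is a harmless no-op. */
void toy_release(void *p)
{
	if (!p)
		return;
	live_buffers--;
	free(p);
}

/* fail_at: 0 fails before the buffer exists, 1 and 2 fail after, 3 succeeds. */
int toy_load(int fail_at, void **out)
{
	void *buf = NULL;	/* must be NULL so the early exit is safe */
	int ret;

	if (fail_at == 0) {
		ret = -1;
		goto bad;
	}
	buf = toy_get();
	if (!buf) {
		ret = -2;
		goto bad;
	}
	if (fail_at == 1 || fail_at == 2) {
		ret = -3;
		goto bad;
	}
	*out = buf;
	return 0;

bad:
	toy_release(buf);	/* the single release covers every error path */
	return ret;
}
```

With one release at the label, adding a new error path later cannot reintroduce the leak as long as it jumps to the same label.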
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 11 +++-------- |
| 1 file changed, 3 insertions(+), 8 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -4771,7 +4771,6 @@ struct inode *ext4_iget(struct super_blo |
| struct ext4_iloc iloc; |
| struct ext4_inode *raw_inode; |
| struct ext4_inode_info *ei; |
| - struct buffer_head *bh; |
| struct inode *inode; |
| long ret; |
| int block; |
| @@ -4783,11 +4782,11 @@ struct inode *ext4_iget(struct super_blo |
| return inode; |
| |
| ei = EXT4_I(inode); |
| + iloc.bh = 0; |
| |
| ret = __ext4_get_inode_loc(inode, &iloc, 0); |
| if (ret < 0) |
| goto bad_inode; |
| - bh = iloc.bh; |
| raw_inode = ext4_raw_inode(&iloc); |
| inode->i_mode = le16_to_cpu(raw_inode->i_mode); |
| inode->i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low); |
| @@ -4810,7 +4809,6 @@ struct inode *ext4_iget(struct super_blo |
| if (inode->i_mode == 0 || |
| !(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ORPHAN_FS)) { |
| /* this inode is deleted */ |
| - brelse(bh); |
| ret = -ESTALE; |
| goto bad_inode; |
| } |
| @@ -4842,7 +4840,6 @@ struct inode *ext4_iget(struct super_blo |
| ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize); |
| if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize > |
| EXT4_INODE_SIZE(inode->i_sb)) { |
| - brelse(bh); |
| ret = -EIO; |
| goto bad_inode; |
| } |
| @@ -4895,10 +4892,8 @@ struct inode *ext4_iget(struct super_blo |
| /* Validate block references which are part of inode */ |
| ret = ext4_check_inode_blockref(inode); |
| } |
| - if (ret) { |
| - brelse(bh); |
| + if (ret) |
| goto bad_inode; |
| - } |
| |
| if (S_ISREG(inode->i_mode)) { |
| inode->i_op = &ext4_file_inode_operations; |
| @@ -4926,7 +4921,6 @@ struct inode *ext4_iget(struct super_blo |
| init_special_inode(inode, inode->i_mode, |
| new_decode_dev(le32_to_cpu(raw_inode->i_block[1]))); |
| } else { |
| - brelse(bh); |
| ret = -EIO; |
| ext4_error(inode->i_sb, __func__, |
| "bogus i_mode (%o) for inode=%lu", |
| @@ -4939,6 +4933,7 @@ struct inode *ext4_iget(struct super_blo |
| return inode; |
| |
| bad_inode: |
| + brelse(iloc.bh); |
| iget_failed(inode); |
| return ERR_PTR(ret); |
| } |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:08 2009 |
| Message-Id: <20091211042807.711256423@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:41 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [63/90] ext4: make sure directory and symlink blocks are revoked |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0063-ext4-make-sure-directory-and-symlink-blocks-are-revo.patch |
| Content-Length: 2052 |
| Lines: 58 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 50689696867d95b38d9c7be640a311494a04fb86) |
| |
| When an inode gets unlinked, the functions ext4_clear_blocks() and |
| ext4_remove_blocks() call ext4_forget() for all the buffer heads |
| corresponding to the deleted inode's data blocks. If the inode is a |
| directory or a symlink, the is_metadata parameter must be non-zero so |
| ext4_forget() will revoke them via jbd2_journal_revoke(). Otherwise, |
| if these blocks are reused for a data file, and the system crashes |
| before a journal checkpoint, the journal replay could end up |
| corrupting these data blocks. |
| |
| Thanks to Curt Wohlgemuth for pointing out potential problems in this |
| area. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/extents.c | 2 +- |
| fs/ext4/inode.c | 6 ++++-- |
| 2 files changed, 5 insertions(+), 3 deletions(-) |
| |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -2055,7 +2055,7 @@ static int ext4_remove_blocks(handle_t * |
| ext_debug("free last %u blocks starting %llu\n", num, start); |
| for (i = 0; i < num; i++) { |
| bh = sb_find_get_block(inode->i_sb, start + i); |
| - ext4_forget(handle, 0, inode, bh, start + i); |
| + ext4_forget(handle, metadata, inode, bh, start + i); |
| } |
| ext4_free_blocks(handle, inode, start, num, metadata); |
| } else if (from == le32_to_cpu(ex->ee_block) |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -4110,6 +4110,8 @@ static void ext4_clear_blocks(handle_t * |
| __le32 *last) |
| { |
| __le32 *p; |
| + int is_metadata = S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode); |
| + |
| if (try_to_extend_transaction(handle, inode)) { |
| if (bh) { |
| BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); |
| @@ -4140,11 +4142,11 @@ static void ext4_clear_blocks(handle_t * |
| |
| *p = 0; |
| tbh = sb_find_get_block(inode->i_sb, nr); |
| - ext4_forget(handle, 0, inode, tbh, nr); |
| + ext4_forget(handle, is_metadata, inode, tbh, nr); |
| } |
| } |
| |
| - ext4_free_blocks(handle, inode, block_to_free, count, 0); |
| + ext4_free_blocks(handle, inode, block_to_free, count, is_metadata); |
| } |
| |
| /** |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:08 2009 |
| Message-Id: <20091211042808.337529149@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:42 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Julia Lawall <julia@diku.dk>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [64/90] ext4: fix i_flags access in ext4_da_writepages_trans_blocks() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0064-ext4-fix-i_flags-access-in-ext4_da_writepages_trans_.patch |
| Content-Length: 846 |
| Lines: 25 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 30c6e07a92ea4cb87160d32ffa9bce172576ae4c) |
| |
| We need to be testing the i_flags field in the ext4 specific portion |
| of the inode, instead of the (confusingly aliased) i_flags field in |
| the generic struct inode. |
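
The aliasing can be reproduced in a small userspace sketch. All `toy_` names and the flag value are hypothetical stand-ins: the generic structure is embedded in the filesystem-specific one, and both carry a field called i_flags, so testing the wrong one compiles cleanly but reads unrelated bits.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EXTENTS_FL 0x00080000u	/* flag value chosen for illustration */

/* Generic inode: has its own, unrelated i_flags. */
struct toy_vfs_inode {
	unsigned int i_flags;
};

/* Filesystem-specific inode info that embeds the generic inode. */
struct toy_fs_inode_info {
	unsigned int i_flags;		/* filesystem-specific flags */
	struct toy_vfs_inode vfs_inode;
};

/* Recover the containing structure from the embedded generic inode. */
struct toy_fs_inode_info *TOY_I(struct toy_vfs_inode *inode)
{
	return (struct toy_fs_inode_info *)
		((char *)inode - offsetof(struct toy_fs_inode_info, vfs_inode));
}

/* Buggy test: looks at the generic field. */
int uses_extents_wrong(struct toy_vfs_inode *inode)
{
	return (inode->i_flags & TOY_EXTENTS_FL) != 0;
}

/* Correct test: looks at the filesystem-specific field. */
int uses_extents(struct toy_vfs_inode *inode)
{
	return (TOY_I(inode)->i_flags & TOY_EXTENTS_FL) != 0;
}
```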
| |
| Signed-off-by: Julia Lawall <julia@diku.dk> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -2785,7 +2785,7 @@ static int ext4_da_writepages_trans_bloc |
| * number of contiguous block. So we will limit |
| * number of contiguous block to a sane value |
| */ |
| - if (!(inode->i_flags & EXT4_EXTENTS_FL) && |
| + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) && |
| (max_blocks > EXT4_MAX_TRANS_DATA)) |
| max_blocks = EXT4_MAX_TRANS_DATA; |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:09 2009 |
| Message-Id: <20091211042808.870915761@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:43 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Eric Sandeen <sandeen@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [65/90] ext4: journal all modifications in ext4_xattr_set_handle |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0065-ext4-journal-all-modifications-in-ext4_xattr_set_han.patch |
| Content-Length: 1254 |
| Lines: 39 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 86ebfd08a1930ccedb8eac0aeb1ed4b8b6a41dbc) |
| |
| ext4_xattr_set_handle() was zeroing out an inode outside |
| of journaling constraints; this is one of the accesses that |
| was causing the crc errors in journal replay as seen in |
| kernel.org bugzilla #14354. |
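
The rule being restored here is "obtain write access first, modify afterwards". A toy userspace model of that rule follows; `toy_buf`, `toy_get_write_access` and `toy_zero` are hypothetical names and do not model the real journal, only the ordering constraint.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a journaled buffer; not the kernel's buffer_head. */
struct toy_buf {
	unsigned char data[32];
	int write_access;	/* set once the journal knows about the buffer */
};

/* Models the "get write access" step that must precede any change. */
int toy_get_write_access(struct toy_buf *b)
{
	b->write_access = 1;
	return 0;
}

/* Refuses to touch the buffer unless write access was obtained first. */
int toy_zero(struct toy_buf *b)
{
	if (!b->write_access)
		return -1;
	memset(b->data, 0, sizeof(b->data));
	return 0;
}
```

In this model, zeroing before write access fails and leaves the data untouched, which is the situation the patch avoids by moving the write-access call ahead of the memset().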
| |
| Reviewed-by: Andreas Dilger <adilger@sun.com> |
| Signed-off-by: Eric Sandeen <sandeen@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/xattr.c | 7 ++++--- |
| 1 file changed, 4 insertions(+), 3 deletions(-) |
| |
| --- a/fs/ext4/xattr.c |
| +++ b/fs/ext4/xattr.c |
| @@ -988,6 +988,10 @@ ext4_xattr_set_handle(handle_t *handle, |
| if (error) |
| goto cleanup; |
| |
| + error = ext4_journal_get_write_access(handle, is.iloc.bh); |
| + if (error) |
| + goto cleanup; |
| + |
| if (EXT4_I(inode)->i_state & EXT4_STATE_NEW) { |
| struct ext4_inode *raw_inode = ext4_raw_inode(&is.iloc); |
| memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size); |
| @@ -1013,9 +1017,6 @@ ext4_xattr_set_handle(handle_t *handle, |
| if (flags & XATTR_CREATE) |
| goto cleanup; |
| } |
| - error = ext4_journal_get_write_access(handle, is.iloc.bh); |
| - if (error) |
| - goto cleanup; |
| if (!value) { |
| if (!is.s.not_found) |
| error = ext4_xattr_ibody_set(handle, inode, &i, &is); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:09 2009 |
| Message-Id: <20091211042809.446063479@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:44 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [66/90] ext4: dont update the superblock in ext4_statfs() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0066-ext4-don-t-update-the-superblock-in-ext4_statfs.patch |
| Content-Length: 1341 |
| Lines: 31 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 3f8fb9490efbd300887470a2a880a64e04dcc3f5) |
| |
| commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs() |
| to update the on-disk superblock counters, but modified this buffer |
| directly without any journaling of the change. This is one of the |
| accesses that was causing the crc errors in journal replay as seen in |
| kernel.org bugzilla #14354. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/super.c | 2 -- |
| 1 file changed, 2 deletions(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -3693,13 +3693,11 @@ static int ext4_statfs(struct dentry *de |
| buf->f_blocks = ext4_blocks_count(es) - sbi->s_overhead_last; |
| buf->f_bfree = percpu_counter_sum_positive(&sbi->s_freeblocks_counter) - |
| percpu_counter_sum_positive(&sbi->s_dirtyblocks_counter); |
| - ext4_free_blocks_count_set(es, buf->f_bfree); |
| buf->f_bavail = buf->f_bfree - ext4_r_blocks_count(es); |
| if (buf->f_bfree < ext4_r_blocks_count(es)) |
| buf->f_bavail = 0; |
| buf->f_files = le32_to_cpu(es->s_inodes_count); |
| buf->f_ffree = percpu_counter_sum_positive(&sbi->s_freeinodes_counter); |
| - es->s_free_inodes_count = cpu_to_le32(buf->f_ffree); |
| buf->f_namelen = EXT4_NAME_LEN; |
| fsid = le64_to_cpup((void *)es->s_uuid) ^ |
| le64_to_cpup((void *)es->s_uuid + sizeof(u64)); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:10 2009 |
| Message-Id: <20091211042810.021726276@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:45 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [67/90] ext4: fix uninit block bitmap initialization when s_first_meta_bg is non-zero |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0067-ext4-fix-uninit-block-bitmap-initialization-when-s_m.patch |
| Content-Length: 875 |
| Lines: 29 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 8dadb198cb70ef811916668fe67eeec82e8858dd) |
| |
| The number of old-style block group descriptor blocks is |
| s_first_meta_bg when the meta_bg feature flag is set. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/balloc.c | 8 +++++++- |
| 1 file changed, 7 insertions(+), 1 deletion(-) |
| |
| --- a/fs/ext4/balloc.c |
| +++ b/fs/ext4/balloc.c |
| @@ -761,7 +761,13 @@ static unsigned long ext4_bg_num_gdb_met |
| static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb, |
| ext4_group_t group) |
| { |
| - return ext4_bg_has_super(sb, group) ? EXT4_SB(sb)->s_gdb_count : 0; |
| + if (!ext4_bg_has_super(sb, group)) |
| + return 0; |
| + |
| + if (EXT4_HAS_INCOMPAT_FEATURE(sb,EXT4_FEATURE_INCOMPAT_META_BG)) |
| + return le32_to_cpu(EXT4_SB(sb)->s_es->s_first_meta_bg); |
| + else |
| + return EXT4_SB(sb)->s_gdb_count; |
| } |
| |
| /** |
| |
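As a rough userspace illustration of the rule in the balloc.c patch above (not kernel code), the three cases can be modeled as follows. The `toy_sb` struct, its fields, and `toy_num_gdb_nometa()` are hypothetical stand-ins; in particular, the result of `ext4_bg_has_super()` is passed in as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the superblock fields involved. */
struct toy_sb {
	bool     has_meta_bg;      /* models the INCOMPAT_META_BG feature flag */
	uint32_t s_first_meta_bg;  /* value of the on-disk s_first_meta_bg field */
	uint32_t s_gdb_count;      /* total number of group descriptor blocks */
};

/* Number of old-style descriptor blocks stored in a group. */
static unsigned long toy_num_gdb_nometa(const struct toy_sb *sb, bool has_super)
{
	if (!has_super)
		return 0;                     /* no superblock backup, no descriptors */
	if (sb->has_meta_bg)
		return sb->s_first_meta_bg;   /* only the old-style part of the table */
	return sb->s_gdb_count;               /* whole table is old-style */
}

static void demo_num_gdb(void)
{
	struct toy_sb sb = { true, 2, 10 };

	assert(toy_num_gdb_nometa(&sb, false) == 0);
	assert(toy_num_gdb_nometa(&sb, true) == 2);
	sb.has_meta_bg = false;
	assert(toy_num_gdb_nometa(&sb, true) == 10);
}
```

Before the fix, the second case would have returned the full descriptor count (10 in this toy setup) even though only the first 2 blocks are laid out the old way.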
| |
| From linux@linux.site Thu Dec 10 20:28:11 2009 |
| Message-Id: <20091211042810.591847517@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:46 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [68/90] ext4: fix block validity checks so they work correctly with meta_bg |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0068-ext4-fix-block-validity-checks-so-they-work-correctl.patch |
| Content-Length: 1411 |
| Lines: 39 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 1032988c71f3f85483b2b4319684d1205a704c02) |
| |
| The block validity checks used by ext4_data_block_valid() weren't |
| correctly written to check file systems with the meta_bg feature. Fix |
| this. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/block_validity.c | 2 +- |
| fs/ext4/inode.c | 5 +---- |
| 2 files changed, 2 insertions(+), 5 deletions(-) |
| |
| --- a/fs/ext4/block_validity.c |
| +++ b/fs/ext4/block_validity.c |
| @@ -160,7 +160,7 @@ int ext4_setup_system_zone(struct super_ |
| if (ext4_bg_has_super(sb, i) && |
| ((i < 5) || ((i % flex_size) == 0))) |
| add_system_zone(sbi, ext4_group_first_block_no(sb, i), |
| - sbi->s_gdb_count + 1); |
| + ext4_bg_num_gdb(sb, i) + 1); |
| gdp = ext4_get_group_desc(sb, i, NULL); |
| ret = add_system_zone(sbi, ext4_block_bitmap(sb, gdp), 1); |
| if (ret) |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -4873,10 +4873,7 @@ struct inode *ext4_iget(struct super_blo |
| |
| ret = 0; |
| if (ei->i_file_acl && |
| - ((ei->i_file_acl < |
| - (le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block) + |
| - EXT4_SB(sb)->s_gdb_count)) || |
| - (ei->i_file_acl >= ext4_blocks_count(EXT4_SB(sb)->s_es)))) { |
| + !ext4_data_block_valid(EXT4_SB(sb), ei->i_file_acl, 1)) { |
| ext4_error(sb, __func__, |
| "bad extended attribute block %llu in inode #%lu", |
| ei->i_file_acl, inode->i_ino); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:11 2009 |
| Message-Id: <20091211042811.145411136@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:47 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Jan Kara <jack@suse.cz>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [69/90] ext4: avoid issuing unnecessary barriers |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0069-ext4-avoid-issuing-unnecessary-barriers.patch |
| Content-Length: 1115 |
| Lines: 37 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 6b17d902fdd241adfa4ce780df20547b28bf5801) |
| |
| We don't need to issue an I/O barrier on an error or if we force commit |
| because we are doing data journaling. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Cc: Jan Kara <jack@suse.cz> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/fsync.c | 8 +++----- |
| 1 file changed, 3 insertions(+), 5 deletions(-) |
| |
| --- a/fs/ext4/fsync.c |
| +++ b/fs/ext4/fsync.c |
| @@ -60,7 +60,7 @@ int ext4_sync_file(struct file *file, st |
| |
| ret = flush_aio_dio_completed_IO(inode); |
| if (ret < 0) |
| - goto out; |
| + return ret; |
| /* |
| * data=writeback: |
| * The caller's filemap_fdatawrite()/wait will sync the data. |
| @@ -79,10 +79,8 @@ int ext4_sync_file(struct file *file, st |
| * (they were dirtied by commit). But that's OK - the blocks are |
| * safe in-journal, which is all fsync() needs to ensure. |
| */ |
| - if (ext4_should_journal_data(inode)) { |
| - ret = ext4_force_commit(inode->i_sb); |
| - goto out; |
| - } |
| + if (ext4_should_journal_data(inode)) |
| + return ext4_force_commit(inode->i_sb); |
| |
| if (!journal) |
| ret = sync_mapping_buffers(inode->i_mapping); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:12 2009 |
| Message-Id: <20091211042811.707301090@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:48 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Jan Kara <jack@suse.cz>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [70/90] ext4: fix error handling in ext4_ind_get_blocks() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0070-ext4-fix-error-handling-in-ext4_ind_get_blocks.patch |
| Content-Length: 733 |
| Lines: 25 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 2bba702d4f88d7b010ec37e2527b552588404ae7) |
| |
| When an error happened in ext4_splice_branch we failed to notice that |
| in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem |
| by checking for error properly. |
| |
| Signed-off-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1022,7 +1022,7 @@ static int ext4_ind_get_blocks(handle_t |
| if (!err) |
| err = ext4_splice_branch(handle, inode, iblock, |
| partial, indirect_blks, count); |
| - else |
| + if (err) |
| goto cleanup; |
| |
| set_buffer_new(bh_result); |
| |
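The control-flow difference fixed by the one-line change above can be seen in a small userspace model. This is only a sketch: `step_first()` and `step_splice()` are hypothetical stand-ins that return whatever result they are told to, and "reaching the mapping" stands for falling through to `set_buffer_new()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: return 0 on success or a negative error value. */
static int step_first(int result)  { return result; }
static int step_splice(int result) { return result; }

/* Old shape: the "else" only catches a failure of the earlier step. */
static bool old_reaches_mapping(int first_res, int splice_res)
{
	int err = step_first(first_res);

	if (!err)
		err = step_splice(splice_res);
	else
		goto cleanup;

	return true;    /* reached even when the splice step failed */
cleanup:
	return false;
}

/* New shape: err is tested again after the splice step. */
static bool new_reaches_mapping(int first_res, int splice_res)
{
	int err = step_first(first_res);

	if (!err)
		err = step_splice(splice_res);
	if (err)
		goto cleanup;

	return true;
cleanup:
	return false;
}

static void demo_error_check(void)
{
	assert(old_reaches_mapping(0, -5));    /* splice failure goes unnoticed */
	assert(!new_reaches_mapping(0, -5));   /* splice failure is caught */
	assert(!old_reaches_mapping(-5, 0));
	assert(!new_reaches_mapping(-5, 0));
	assert(new_reaches_mapping(0, 0));
}
```

Both versions behave the same when the earlier step fails or when everything succeeds; they differ only when the splice step itself returns an error.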
| |
| From linux@linux.site Thu Dec 10 20:28:12 2009 |
| Message-Id: <20091211042812.322370572@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:49 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Eric Sandeen <sandeen@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [71/90] ext4: make trim/discard optional (and off by default) |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0071-ext4-make-trim-discard-optional-and-off-by-default.patch |
| Content-Length: 4275 |
| Lines: 124 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 5328e635315734d42080de9a5a1ee87bf4cae0a4) |
| |
| It is anticipated that when sb_issue_discard starts doing |
| real work on trim-capable devices, we may see issues. Make |
| this mount-time optional, and default it to off until we know |
| that things are working out OK. |
| |
| Signed-off-by: Eric Sandeen <sandeen@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| Documentation/filesystems/ext4.txt | 6 ++++++ |
| fs/ext4/ext4.h | 1 + |
| fs/ext4/mballoc.c | 21 +++++++++++++-------- |
| fs/ext4/super.c | 14 +++++++++++++- |
| 4 files changed, 33 insertions(+), 9 deletions(-) |
| |
| --- a/Documentation/filesystems/ext4.txt |
| +++ b/Documentation/filesystems/ext4.txt |
| @@ -338,6 +338,12 @@ noauto_da_alloc replacing existing file |
| system crashes before the delayed allocation |
| blocks are forced to disk. |
| |
| +discard Controls whether ext4 should issue discard/TRIM |
| +nodiscard(*) commands to the underlying block device when |
| + blocks are freed. This is useful for SSD devices |
| + and sparse/thinly-provisioned LUNs, but it is off |
| + by default until sufficient testing has been done. |
| + |
| Data Mode |
| ========= |
| There are 3 different data modes: |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -747,6 +747,7 @@ struct ext4_inode_info { |
| #define EXT4_MOUNT_DELALLOC 0x8000000 /* Delalloc support */ |
| #define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000 /* Abort on file data write */ |
| #define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000 /* Block validity checking */ |
| +#define EXT4_MOUNT_DISCARD 0x40000000 /* Issue DISCARD requests */ |
| |
| #define clear_opt(o, opt) o &= ~EXT4_MOUNT_##opt |
| #define set_opt(o, opt) o |= EXT4_MOUNT_##opt |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -2810,7 +2810,6 @@ static void release_blocks_on_commit(jou |
| struct ext4_group_info *db; |
| int err, count = 0, count2 = 0; |
| struct ext4_free_data *entry; |
| - ext4_fsblk_t discard_block; |
| struct list_head *l, *ltmp; |
| |
| list_for_each_safe(l, ltmp, &txn->t_private_list) { |
| @@ -2840,13 +2839,19 @@ static void release_blocks_on_commit(jou |
| page_cache_release(e4b.bd_bitmap_page); |
| } |
| ext4_unlock_group(sb, entry->group); |
| - discard_block = (ext4_fsblk_t) entry->group * EXT4_BLOCKS_PER_GROUP(sb) |
| - + entry->start_blk |
| - + le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block); |
| - trace_ext4_discard_blocks(sb, (unsigned long long)discard_block, |
| - entry->count); |
| - sb_issue_discard(sb, discard_block, entry->count); |
| - |
| + if (test_opt(sb, DISCARD)) { |
| + ext4_fsblk_t discard_block; |
| + struct ext4_super_block *es = EXT4_SB(sb)->s_es; |
| + |
| + discard_block = (ext4_fsblk_t)entry->group * |
| + EXT4_BLOCKS_PER_GROUP(sb) |
| + + entry->start_blk |
| + + le32_to_cpu(es->s_first_data_block); |
| + trace_ext4_discard_blocks(sb, |
| + (unsigned long long)discard_block, |
| + entry->count); |
| + sb_issue_discard(sb, discard_block, entry->count); |
| + } |
| kmem_cache_free(ext4_free_ext_cachep, entry); |
| ext4_mb_release_desc(&e4b); |
| } |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -906,6 +906,9 @@ static int ext4_show_options(struct seq_ |
| if (test_opt(sb, NO_AUTO_DA_ALLOC)) |
| seq_puts(seq, ",noauto_da_alloc"); |
| |
| + if (test_opt(sb, DISCARD)) |
| + seq_puts(seq, ",discard"); |
| + |
| ext4_show_quota_options(seq, sb); |
| |
| return 0; |
| @@ -1086,7 +1089,8 @@ enum { |
| Opt_usrquota, Opt_grpquota, Opt_i_version, |
| Opt_stripe, Opt_delalloc, Opt_nodelalloc, |
| Opt_block_validity, Opt_noblock_validity, |
| - Opt_inode_readahead_blks, Opt_journal_ioprio |
| + Opt_inode_readahead_blks, Opt_journal_ioprio, |
| + Opt_discard, Opt_nodiscard, |
| }; |
| |
| static const match_table_t tokens = { |
| @@ -1152,6 +1156,8 @@ static const match_table_t tokens = { |
| {Opt_auto_da_alloc, "auto_da_alloc=%u"}, |
| {Opt_auto_da_alloc, "auto_da_alloc"}, |
| {Opt_noauto_da_alloc, "noauto_da_alloc"}, |
| + {Opt_discard, "discard"}, |
| + {Opt_nodiscard, "nodiscard"}, |
| {Opt_err, NULL}, |
| }; |
| |
| @@ -1580,6 +1586,12 @@ set_qf_format: |
| else |
| set_opt(sbi->s_mount_opt,NO_AUTO_DA_ALLOC); |
| break; |
| + case Opt_discard: |
| + set_opt(sbi->s_mount_opt, DISCARD); |
| + break; |
| + case Opt_nodiscard: |
| + clear_opt(sbi->s_mount_opt, DISCARD); |
| + break; |
| default: |
| ext4_msg(sb, KERN_ERR, |
| "Unrecognized mount option \"%s\" " |
| |
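A minimal userspace sketch of the two pieces touched by the discard patch above: the mount-option bit and the block-number arithmetic. The `toy_` macros take the option word directly (the kernel's `test_opt()` takes a superblock), and `toy_discard_block()` is a hypothetical helper; only the bit value 0x40000000 and the formula are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MOUNT_DISCARD 0x40000000u  /* same bit as EXT4_MOUNT_DISCARD */

#define toy_set_opt(o, opt)   ((o) |= TOY_MOUNT_##opt)
#define toy_clear_opt(o, opt) ((o) &= ~TOY_MOUNT_##opt)
#define toy_test_opt(o, opt)  (((o) & TOY_MOUNT_##opt) != 0)

/* First filesystem block of a freed extent, same arithmetic as the patch. */
static uint64_t toy_discard_block(uint32_t group, uint32_t blocks_per_group,
				  uint32_t start_blk, uint32_t first_data_block)
{
	return (uint64_t)group * blocks_per_group + start_blk + first_data_block;
}

static void demo_discard(void)
{
	uint32_t mount_opt = 0;                 /* default: discard is off */

	assert(!toy_test_opt(mount_opt, DISCARD));
	toy_set_opt(mount_opt, DISCARD);        /* models "-o discard" */
	assert(toy_test_opt(mount_opt, DISCARD));
	toy_clear_opt(mount_opt, DISCARD);      /* models "-o nodiscard" */
	assert(!toy_test_opt(mount_opt, DISCARD));

	/* group 2, 32768 blocks per group, offset 10, first data block 0 */
	assert(toy_discard_block(2, 32768, 10, 0) == 65546);
}
```

The cast to a 64-bit type before the multiplication matters: with 32-bit operands the product could wrap on large filesystems.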
| |
| From linux@linux.site Thu Dec 10 20:28:13 2009 |
| Message-Id: <20091211042812.915030291@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:50 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Eric Sandeen <sandeen@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [72/90] ext4: make "norecovery" an alias for "noload" |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0072-ext4-make-norecovery-an-alias-for-noload.patch |
| Content-Length: 1856 |
| Lines: 53 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit e3bb52ae2bb9573e84c17b8e3560378d13a5c798) |
| |
| Users on the linux-ext4 list recently complained about differences |
| across filesystems w.r.t. how to mount without a journal replay. |
| |
| In the discussion it was noted that xfs's "norecovery" option is |
| perhaps more descriptively accurate than "noload," so let's make |
| that an alias for ext4. |
| |
| Also show this status in /proc/mounts |
| |
| Signed-off-by: Eric Sandeen <sandeen@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| Documentation/filesystems/ext4.txt | 4 ++-- |
| fs/ext4/super.c | 4 ++++ |
| 2 files changed, 6 insertions(+), 2 deletions(-) |
| |
| --- a/Documentation/filesystems/ext4.txt |
| +++ b/Documentation/filesystems/ext4.txt |
| @@ -153,8 +153,8 @@ journal_dev=devnum When the external jou |
| identified through its new major/minor numbers encoded |
| in devnum. |
| |
| -noload Don't load the journal on mounting. Note that |
| - if the filesystem was not unmounted cleanly, |
| +norecovery Don't load the journal on mounting. Note that |
| +noload if the filesystem was not unmounted cleanly, |
| skipping the journal replay will lead to the |
| filesystem containing inconsistencies that can |
| lead to any number of problems. |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -909,6 +909,9 @@ static int ext4_show_options(struct seq_ |
| if (test_opt(sb, DISCARD)) |
| seq_puts(seq, ",discard"); |
| |
| + if (test_opt(sb, NOLOAD)) |
| + seq_puts(seq, ",norecovery"); |
| + |
| ext4_show_quota_options(seq, sb); |
| |
| return 0; |
| @@ -1115,6 +1118,7 @@ static const match_table_t tokens = { |
| {Opt_acl, "acl"}, |
| {Opt_noacl, "noacl"}, |
| {Opt_noload, "noload"}, |
| + {Opt_noload, "norecovery"}, |
| {Opt_nobh, "nobh"}, |
| {Opt_bh, "bh"}, |
| {Opt_commit, "commit=%u"}, |
| |
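The alias in the patch above works because two table rows carry the same token. A simplified userspace sketch of that idea follows; `toy_match()` does exact string comparison only, whereas the kernel's token matching also handles patterns such as "commit=%u", and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_opt { TOY_OPT_ERR = 0, TOY_OPT_NOLOAD, TOY_OPT_DISCARD };

struct toy_token {
	enum toy_opt token;
	const char *pattern;
};

/* Two rows share TOY_OPT_NOLOAD, so both spellings select the same case. */
static const struct toy_token toy_tokens[] = {
	{ TOY_OPT_NOLOAD,  "noload" },
	{ TOY_OPT_NOLOAD,  "norecovery" },
	{ TOY_OPT_DISCARD, "discard" },
	{ TOY_OPT_ERR,     NULL },          /* terminator */
};

static enum toy_opt toy_match(const char *s)
{
	const struct toy_token *t;

	for (t = toy_tokens; t->pattern != NULL; t++)
		if (strcmp(s, t->pattern) == 0)
			return t->token;
	return TOY_OPT_ERR;
}

static void demo_alias(void)
{
	assert(toy_match("noload") == TOY_OPT_NOLOAD);
	assert(toy_match("norecovery") == TOY_OPT_NOLOAD);
	assert(toy_match("bogus") == TOY_OPT_ERR);
}
```

Because the alias resolves to the existing token, no new case is needed in the option parser; the only other change in the patch is reporting the flag in the options output.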
| |
| From linux@linux.site Thu Dec 10 20:28:13 2009 |
| Message-Id: <20091211042813.423360988@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:51 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [73/90] ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0073-ext4-Fix-double-free-of-blocks-with-EXT4_IOC_MOVE_EX.patch |
| Content-Length: 2565 |
| Lines: 75 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 94d7c16cbbbd0e03841fcf272bcaf0620ad39618) |
| |
| At the beginning of ext4_move_extents(), we call |
| ext4_discard_preallocations() to discard inode PAs of orig and donor |
| inodes. But in the following case, blocks can be double freed, so |
| move ext4_discard_preallocations() to the end of ext4_move_extents(). |
| |
| 1. Discard inode PAs of orig and donor inodes with |
| ext4_discard_preallocations() in ext4_move_extents(). |
| |
| orig : [ DATA1 ] |
| donor: [ DATA2 ] |
| |
| 2. While data blocks are being exchanged between the orig and donor |
| inodes, new inode PAs are created for orig by another process's block |
| allocation (since there are semaphore gaps in ext4_move_extents()), |
| and the new inode PAs are partially used (2-1). |
| |
| 2-1 Create new inode PAs to orig inode |
| orig : [ DATA1 | used PA1 | free PA1 ] |
| donor: [ DATA2 ] |
| |
| 3. The donor inode, which now holds the old orig inode's blocks, is |
| deleted after EXT4_IOC_MOVE_EXT has finished (3-1, 3-2), so the block |
| bitmap bits corresponding to the old orig inode's blocks are freed. |
| |
| 3-1 After EXT4_IOC_MOVE_EXT finished |
| orig : [ DATA2 | free PA1 ] |
| donor: [ DATA1 | used PA1 ] |
| |
| 3-2 Delete donor inode |
| orig : [ DATA2 | free PA1 ] |
| donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ] |
| |
| 4. The double free of blocks occurs when close() is called on the |
| orig inode, because ext4_discard_preallocations() for the orig inode |
| frees both used PA1 and free PA1, even though used PA1 was already |
| freed in step 3. |
| |
| 4-1 Double free of blocks occurs |
| orig : [ DATA2 | FREE SPACE(free PA1) ] |
| donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ] |
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 9 +++++---- |
| 1 file changed, 5 insertions(+), 4 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -1289,10 +1289,6 @@ ext4_move_extents(struct file *o_filp, s |
| ext4_ext_get_actual_len(ext_cur), block_end + 1) - |
| max(le32_to_cpu(ext_cur->ee_block), block_start); |
| |
| - /* Discard preallocations of two inodes */ |
| - ext4_discard_preallocations(orig_inode); |
| - ext4_discard_preallocations(donor_inode); |
| - |
| while (!last_extent && le32_to_cpu(ext_cur->ee_block) <= block_end) { |
| seq_blocks += add_blocks; |
| |
| @@ -1410,6 +1406,11 @@ ext4_move_extents(struct file *o_filp, s |
| |
| } |
| out: |
| + if (*moved_len) { |
| + ext4_discard_preallocations(orig_inode); |
| + ext4_discard_preallocations(donor_inode); |
| + } |
| + |
| if (orig_path) { |
| ext4_ext_drop_refs(orig_path); |
| kfree(orig_path); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:14 2009 |
| Message-Id: <20091211042814.022299856@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:52 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Kazuya Mio <k-mio@sx.jp.nec.com>, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [74/90] ext4: initialize moved_len before calling ext4_move_extents() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0074-ext4-initialize-moved_len-before-calling-ext4_move_e.patch |
| Content-Length: 2445 |
| Lines: 72 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 446aaa6e7e993b38a6f21c6acfa68f3f1af3dbe3) |
| |
| The move_extent.moved_len field is used to pass back the number of |
| exchanged blocks to user space. Currently the caller must clear this |
| field; but we spend more code space checking for this requirement than |
| simply zeroing the field ourselves, so let's just make life easier for |
| everyone all around. |
| |
| Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ioctl.c | 1 + |
| fs/ext4/move_extent.c | 14 +++----------- |
| 2 files changed, 4 insertions(+), 11 deletions(-) |
| |
| --- a/fs/ext4/ioctl.c |
| +++ b/fs/ext4/ioctl.c |
| @@ -239,6 +239,7 @@ setversion_out: |
| } |
| } |
| |
| + me.moved_len = 0; |
| err = ext4_move_extents(filp, donor_filp, me.orig_start, |
| me.donor_start, me.len, &me.moved_len); |
| fput(donor_filp); |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -947,7 +947,6 @@ out2: |
| * @orig_start: logical start offset in block for orig |
| * @donor_start: logical start offset in block for donor |
| * @len: the number of blocks to be moved |
| - * @moved_len: moved block length |
| * |
| * Check the arguments of ext4_move_extents() whether the files can be |
| * exchanged with each other. |
| @@ -955,8 +954,8 @@ out2: |
| */ |
| static int |
| mext_check_arguments(struct inode *orig_inode, |
| - struct inode *donor_inode, __u64 orig_start, |
| - __u64 donor_start, __u64 *len, __u64 moved_len) |
| + struct inode *donor_inode, __u64 orig_start, |
| + __u64 donor_start, __u64 *len) |
| { |
| ext4_lblk_t orig_blocks, donor_blocks; |
| unsigned int blkbits = orig_inode->i_blkbits; |
| @@ -1010,13 +1009,6 @@ mext_check_arguments(struct inode *orig_ |
| return -EINVAL; |
| } |
| |
| - if (moved_len) { |
| - ext4_debug("ext4 move extent: moved_len should be 0 " |
| - "[ino:orig %lu, donor %lu]\n", orig_inode->i_ino, |
| - donor_inode->i_ino); |
| - return -EINVAL; |
| - } |
| - |
| if ((orig_start > MAX_DEFRAG_SIZE) || |
| (donor_start > MAX_DEFRAG_SIZE) || |
| (*len > MAX_DEFRAG_SIZE) || |
| @@ -1226,7 +1218,7 @@ ext4_move_extents(struct file *o_filp, s |
| double_down_write_data_sem(orig_inode, donor_inode); |
| /* Check the filesystem environment whether move_extent can be done */ |
| ret1 = mext_check_arguments(orig_inode, donor_inode, orig_start, |
| - donor_start, &len, *moved_len); |
| + donor_start, &len); |
| if (ret1) |
| goto out; |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:15 2009 |
| Message-Id: <20091211042814.628012070@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:53 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [75/90] ext4: move_extent_per_page() cleanup |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0075-ext4-move_extent_per_page-cleanup.patch |
| Content-Length: 2733 |
| Lines: 87 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit ac48b0a1d068887141581bea8285de5fcab182b0) |
| |
| Integrate duplicate lines (acquire/release semaphore and invalidate |
| extent cache in move_extent_per_page()) into mext_replace_branches(), |
| to reduce source and object code size. |
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/move_extent.c | 30 +++++++++--------------------- |
| 1 file changed, 9 insertions(+), 21 deletions(-) |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -660,6 +660,9 @@ mext_replace_branches(handle_t *handle, |
| int replaced_count = 0; |
| int dext_alen; |
| |
| + /* Protect extent trees against block allocations via delalloc */ |
| + double_down_write_data_sem(orig_inode, donor_inode); |
| + |
| /* Get the original extent for the block "orig_off" */ |
| *err = get_ext_path(orig_inode, orig_off, &orig_path); |
| if (*err) |
| @@ -755,6 +758,11 @@ out: |
| kfree(donor_path); |
| } |
| |
| + ext4_ext_invalidate_cache(orig_inode); |
| + ext4_ext_invalidate_cache(donor_inode); |
| + |
| + double_up_write_data_sem(orig_inode, donor_inode); |
| + |
| return replaced_count; |
| } |
| |
| @@ -820,19 +828,9 @@ move_extent_per_page(struct file *o_filp |
| * Just swap data blocks between orig and donor. |
| */ |
| if (uninit) { |
| - /* |
| - * Protect extent trees against block allocations |
| - * via delalloc |
| - */ |
| - double_down_write_data_sem(orig_inode, donor_inode); |
| replaced_count = mext_replace_branches(handle, orig_inode, |
| donor_inode, orig_blk_offset, |
| block_len_in_page, err); |
| - |
| - /* Clear the inode cache not to refer to the old data */ |
| - ext4_ext_invalidate_cache(orig_inode); |
| - ext4_ext_invalidate_cache(donor_inode); |
| - double_up_write_data_sem(orig_inode, donor_inode); |
| goto out2; |
| } |
| |
| @@ -880,8 +878,6 @@ move_extent_per_page(struct file *o_filp |
| /* Release old bh and drop refs */ |
| try_to_release_page(page, 0); |
| |
| - /* Protect extent trees against block allocations via delalloc */ |
| - double_down_write_data_sem(orig_inode, donor_inode); |
| replaced_count = mext_replace_branches(handle, orig_inode, donor_inode, |
| orig_blk_offset, block_len_in_page, |
| &err2); |
| @@ -890,18 +886,10 @@ move_extent_per_page(struct file *o_filp |
| block_len_in_page = replaced_count; |
| replaced_size = |
| block_len_in_page << orig_inode->i_blkbits; |
| - } else { |
| - double_up_write_data_sem(orig_inode, donor_inode); |
| + } else |
| goto out; |
| - } |
| } |
| |
| - /* Clear the inode cache not to refer to the old data */ |
| - ext4_ext_invalidate_cache(orig_inode); |
| - ext4_ext_invalidate_cache(donor_inode); |
| - |
| - double_up_write_data_sem(orig_inode, donor_inode); |
| - |
| if (!page_has_buffers(page)) |
| create_empty_buffers(page, 1 << orig_inode->i_blkbits, 0); |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:15 2009 |
| Message-Id: <20091211042815.186798204@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:54 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [76/90] jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0076-jbd2-Add-ENOMEM-checking-in-and-for-jbd2_journal_wri.patch |
| Content-Length: 1035 |
| Lines: 38 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit e6ec116b67f46e0e7808276476554727b2e6240b) |
| |
| OOM happens. |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/jbd2/commit.c | 4 ++++ |
| fs/jbd2/journal.c | 4 ++++ |
| 2 files changed, 8 insertions(+) |
| |
| --- a/fs/jbd2/commit.c |
| +++ b/fs/jbd2/commit.c |
| @@ -636,6 +636,10 @@ void jbd2_journal_commit_transaction(jou |
| JBUFFER_TRACE(jh, "ph3: write metadata"); |
| flags = jbd2_journal_write_metadata_buffer(commit_transaction, |
| jh, &new_jh, blocknr); |
| + if (flags < 0) { |
| + jbd2_journal_abort(journal, flags); |
| + continue; |
| + } |
| set_bit(BH_JWrite, &jh2bh(new_jh)->b_state); |
| wbuf[bufs++] = jh2bh(new_jh); |
| |
| --- a/fs/jbd2/journal.c |
| +++ b/fs/jbd2/journal.c |
| @@ -361,6 +361,10 @@ repeat: |
| |
| jbd_unlock_bh_state(bh_in); |
| tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS); |
| + if (!tmp) { |
| + jbd2_journal_put_journal_head(new_jh); |
| + return -ENOMEM; |
| + } |
| jbd_lock_bh_state(bh_in); |
| if (jh_in->b_frozen_data) { |
| jbd2_free(tmp, bh_in->b_size); |
| |
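The two hunks above pair up: the callee reports allocation failure as a negative return value, and the caller tests for it before using the result. A userspace sketch of that pairing, with hypothetical `toy_` names and a fake allocator that fails on demand:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static char toy_pool[64];

/* Fake allocator: returns NULL when asked to simulate memory exhaustion. */
static void *toy_alloc(int simulate_oom)
{
	return simulate_oom ? NULL : toy_pool;
}

/* Callee: returns a non-negative flag word, or -ENOMEM on failure. */
static int toy_write_metadata(int simulate_oom)
{
	void *tmp = toy_alloc(simulate_oom);

	if (!tmp)
		return -ENOMEM;
	return 0;
}

/* Caller: records the abort code and skips the buffer that failed. */
static int toy_commit(int nbufs, int fail_at, int *aborted)
{
	int queued = 0;
	int i;

	for (i = 0; i < nbufs; i++) {
		int flags = toy_write_metadata(i == fail_at);

		if (flags < 0) {
			*aborted = flags;   /* stands in for aborting the journal */
			continue;
		}
		queued++;
	}
	return queued;
}

static void demo_enomem(void)
{
	int aborted = 0;

	assert(toy_commit(3, 1, &aborted) == 2);
	assert(aborted == -ENOMEM);
}
```

Without the caller-side check, a negative value would be treated as a flag word and the loop would go on to use a buffer that was never set up.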
| |
| From linux@linux.site Thu Dec 10 20:28:16 2009 |
| Message-Id: <20091211042815.716499145@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:55 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Roel Kluin <roel.kluin@gmail.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [77/90] ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0077-ext4-Return-the-PTR_ERR-of-the-correct-pointer-in-se.patch |
| Content-Length: 595 |
| Lines: 21 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit c09eef305dd43846360944ad072f051f964fa383) |
| |
| Signed-off-by: Roel Kluin <roel.kluin@gmail.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/resize.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/resize.c |
| +++ b/fs/ext4/resize.c |
| @@ -247,7 +247,7 @@ static int setup_new_group_blocks(struct |
| goto exit_bh; |
| |
| if (IS_ERR(gdb = bclean(handle, sb, block))) { |
| - err = PTR_ERR(bh); |
| + err = PTR_ERR(gdb); |
| goto exit_bh; |
| } |
| ext4_handle_dirty_metadata(handle, NULL, gdb); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:16 2009 |
| Message-Id: <20091211042816.324947251@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:56 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Jan Kara <jack@suse.cz>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [78/90] ext4: Avoid data / filesystem corruption when write fails to copy data |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0078-ext4-Avoid-data-filesystem-corruption-when-write-fai.patch |
| Content-Length: 2923 |
| Lines: 84 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit b9a4207d5e911b938f73079a83cc2ae10524ec7f) |
| |
| When ext4_write_begin fails after allocating some blocks or |
| generic_perform_write fails to copy data to write, we truncate blocks |
| already instantiated beyond i_size. Although these blocks were never |
| inside i_size, we have to truncate the pagecache of these blocks so |
| that corresponding buffers get unmapped. Otherwise subsequent |
| __block_prepare_write (called because we are retrying the write) will |
| find the buffers mapped, not call ->get_block, and thus the page will |
| be backed by already freed blocks leading to filesystem and data |
| corruption. |
| |
| Signed-off-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 20 +++++++++++++++----- |
| 1 file changed, 15 insertions(+), 5 deletions(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1535,6 +1535,16 @@ static int do_journal_get_write_access(h |
| return ext4_journal_get_write_access(handle, bh); |
| } |
| |
| +/* |
| + * Truncate blocks that were not used by write. We have to truncate the |
| + * pagecache as well so that corresponding buffers get properly unmapped. |
| + */ |
| +static void ext4_truncate_failed_write(struct inode *inode) |
| +{ |
| + truncate_inode_pages(inode->i_mapping, inode->i_size); |
| + ext4_truncate(inode); |
| +} |
| + |
| static int ext4_write_begin(struct file *file, struct address_space *mapping, |
| loff_t pos, unsigned len, unsigned flags, |
| struct page **pagep, void **fsdata) |
| @@ -1600,7 +1610,7 @@ retry: |
| |
| ext4_journal_stop(handle); |
| if (pos + len > inode->i_size) { |
| - ext4_truncate(inode); |
| + ext4_truncate_failed_write(inode); |
| /* |
| * If truncate failed early the inode might |
| * still be on the orphan list; we need to |
| @@ -1710,7 +1720,7 @@ static int ext4_ordered_write_end(struct |
| ret = ret2; |
| |
| if (pos + len > inode->i_size) { |
| - ext4_truncate(inode); |
| + ext4_truncate_failed_write(inode); |
| /* |
| * If truncate failed early the inode might still be |
| * on the orphan list; we need to make sure the inode |
| @@ -1752,7 +1762,7 @@ static int ext4_writeback_write_end(stru |
| ret = ret2; |
| |
| if (pos + len > inode->i_size) { |
| - ext4_truncate(inode); |
| + ext4_truncate_failed_write(inode); |
| /* |
| * If truncate failed early the inode might still be |
| * on the orphan list; we need to make sure the inode |
| @@ -1815,7 +1825,7 @@ static int ext4_journalled_write_end(str |
| if (!ret) |
| ret = ret2; |
| if (pos + len > inode->i_size) { |
| - ext4_truncate(inode); |
| + ext4_truncate_failed_write(inode); |
| /* |
| * If truncate failed early the inode might still be |
| * on the orphan list; we need to make sure the inode |
| @@ -3087,7 +3097,7 @@ retry: |
| * i_size_read because we hold i_mutex. |
| */ |
| if (pos + len > inode->i_size) |
| - ext4_truncate(inode); |
| + ext4_truncate_failed_write(inode); |
| } |
| |
| if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:17 2009 |
| Message-Id: <20091211042816.881920653@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:57 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Josef Bacik <josef@redhat.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [79/90] ext4: wait for log to commit when umounting |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0079-ext4-wait-for-log-to-commit-when-umounting.patch |
| Content-Length: 1540 |
| Lines: 46 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit d4edac314e9ad0b21ba20ba8bc61b61f186f79e1) |
| |
| There is a potential race when a transaction is committing just as |
| the file system is being unmounted. EXT4_SB(sb)->s_group_info could |
| be freed in ext4_put_super() before the commit code calls the |
| callback that lets the mballoc code release the blocks freed in the |
| transaction, resulting in a panic when the callback accesses the |
| freed s_group_info. |
| |
| The fix is to wait for the transaction to finish committing before we |
| shut down the multiblock allocator. |
| |
| Signed-off-by: Josef Bacik <josef@redhat.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/super.c | 10 ++++++---- |
| 1 file changed, 6 insertions(+), 4 deletions(-) |
| |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -610,10 +610,6 @@ static void ext4_put_super(struct super_ |
| if (sb->s_dirt) |
| ext4_commit_super(sb, 1); |
| |
| - ext4_release_system_zone(sb); |
| - ext4_mb_release(sb); |
| - ext4_ext_release(sb); |
| - ext4_xattr_put_super(sb); |
| if (sbi->s_journal) { |
| err = jbd2_journal_destroy(sbi->s_journal); |
| sbi->s_journal = NULL; |
| @@ -621,6 +617,12 @@ static void ext4_put_super(struct super_ |
| ext4_abort(sb, __func__, |
| "Couldn't clean up the journal"); |
| } |
| + |
| + ext4_release_system_zone(sb); |
| + ext4_mb_release(sb); |
| + ext4_ext_release(sb); |
| + ext4_xattr_put_super(sb); |
| + |
| if (!(sb->s_flags & MS_RDONLY)) { |
| EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER); |
| es->s_state = cpu_to_le16(sbi->s_mount_state); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:17 2009 |
| Message-Id: <20091211042817.411285726@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:58 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Curt Wohlgemuth <curtw@google.com>, |
| "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [80/90] ext4: remove blocks from inode prealloc list on failure |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0080-ext4-remove-blocks-from-inode-prealloc-list-on-failu.patch |
| Content-Length: 1476 |
| Lines: 49 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit b844167edc7fcafda9623955c05e4c1b3c32ebc7) |
| |
| This fixes a leak of blocks in an inode prealloc list if device failures |
| cause ext4_mb_mark_diskspace_used() to fail. |
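
The accounting involved can be sketched in userspace C. This is a toy model, not the kernel code: the structs are cut down to the two fields that matter, and use_inode_pa() and fail_allocation() are hypothetical helpers that stand in for the allocation path. It assumes, as the comment in the patch implies, that taking blocks from an inode PA debits pa_free immediately.

```c
/* Toy userspace model; names mirror the patch, the structs are simplified. */
enum pa_type { MB_INODE_PA, MB_GROUP_PA };

struct prealloc_space {
	enum pa_type pa_type;
	int pa_free;            /* blocks still available in this preallocation */
};

struct alloc_context {
	struct prealloc_space *ac_pa;
	int fe_len;             /* length of the extent chosen for this request */
};

/* Hypothetical helper: using an inode PA debits pa_free right away (assumed). */
static void use_inode_pa(struct alloc_context *ac, int len)
{
	ac->fe_len = len;
	ac->ac_pa->pa_free -= len;
}

/* On failure, credit the blocks back to the inode PA. */
static void discard_allocated_blocks(struct alloc_context *ac)
{
	struct prealloc_space *pa = ac->ac_pa;

	if (pa && pa->pa_type == MB_INODE_PA)
		pa->pa_free += ac->fe_len;
}

/* Hypothetical failure path: discard first, then zero the length. */
static void fail_allocation(struct alloc_context *ac)
{
	discard_allocated_blocks(ac);
	ac->fe_len = 0;
}
```

The order in fail_allocation() matters: if the length were zeroed first, nothing would be credited back and the blocks would stay leaked.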
| |
| Signed-off-by: Curt Wohlgemuth <curtw@google.com> |
| Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/mballoc.c | 19 +++++++++++++++++++ |
| 1 file changed, 19 insertions(+) |
| |
| --- a/fs/ext4/mballoc.c |
| +++ b/fs/ext4/mballoc.c |
| @@ -3258,6 +3258,24 @@ static void ext4_mb_collect_stats(struct |
| } |
| |
| /* |
| + * Called on failure; free up any blocks from the inode PA for this |
| + * context. We don't need this for MB_GROUP_PA because we only change |
| + * pa_free in ext4_mb_release_context(), but on failure, we've already |
| + * zeroed out ac->ac_b_ex.fe_len, so group_pa->pa_free is not changed. |
| + */ |
| +static void ext4_discard_allocated_blocks(struct ext4_allocation_context *ac) |
| +{ |
| + struct ext4_prealloc_space *pa = ac->ac_pa; |
| + int len; |
| + |
| + if (pa && pa->pa_type == MB_INODE_PA) { |
| + len = ac->ac_b_ex.fe_len; |
| + pa->pa_free += len; |
| + } |
| + |
| +} |
| + |
| +/* |
| * use blocks preallocated to inode |
| */ |
| static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac, |
| @@ -4546,6 +4564,7 @@ repeat: |
| ac->ac_status = AC_STATUS_CONTINUE; |
| goto repeat; |
| } else if (*errp) { |
| + ext4_discard_allocated_blocks(ac); |
| ac->ac_b_ex.fe_len = 0; |
| ar->len = 0; |
| ext4_mb_show_ac(ac); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:18 2009 |
| Message-Id: <20091211042818.062493875@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:25:59 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Dmitry Monakhov <dmonakhov@openvz.org>, |
| Mingming Cao <cmm@us.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [81/90] ext4: ext4_get_reserved_space() must return bytes instead of blocks |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0081-ext4-ext4_get_reserved_space-must-return-bytes-inste.patch |
| Content-Length: 718 |
| Lines: 23 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 8aa6790f876e81f5a2211fe1711a5fe3fe2d7b20) |
| |
| Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> |
| Reviewed-by: Eric Sandeen <sandeen@redhat.com> |
| Acked-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1053,7 +1053,7 @@ qsize_t ext4_get_reserved_space(struct i |
| EXT4_I(inode)->i_reserved_meta_blocks; |
| spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); |
| |
| - return total; |
| + return (total << inode->i_blkbits); |
| } |
| /* |
| * Calculate the number of metadata blocks need to reserve |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:19 2009 |
| Message-Id: <20091211042818.571106799@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:00 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Dmitry Monakhov <dmonakhov@openvz.org>, |
| Mingming Cao <cmm@us.ibm.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [82/90] ext4: quota macros cleanup |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0082-ext4-quota-macros-cleanup.patch |
| Content-Length: 5167 |
| Lines: 138 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 5aca07eb7d8f14d90c740834d15ca15277f4820c) |
| |
| Currently all quota block reservation macros contain a hard-coded "2", |
| i.e. the MAXQUOTAS value. This is bad because in some places it is not |
| obvious what this digit represents. Introduce new macros with |
| self-descriptive names. |
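
A minimal sketch of the idea, reduced to plain constants. The per-quota-type cost below is a made-up number for illustration, and the macros drop the sb argument that the real ones take; only MAXQUOTAS (user and group) reflects the kernel's value.

```c
#define MAXQUOTAS 2                 /* user quota and group quota */

/* Hypothetical per-quota-type cost, chosen only for illustration. */
#define QUOTA_TRANS_BLOCKS 2

/* Before: the bare "2" does not say what it is counting. */
#define NEEDED_OLD (3 + 2 * QUOTA_TRANS_BLOCKS)

/* After: the multiplier is named, so the intent is visible at the call site. */
#define MAXQUOTAS_TRANS_BLOCKS (MAXQUOTAS * QUOTA_TRANS_BLOCKS)
#define NEEDED_NEW (3 + MAXQUOTAS_TRANS_BLOCKS)
```

Both expressions evaluate to the same number of credits; the change is purely about readability.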
| |
| Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> |
| Acked-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4_jbd2.h | 8 ++++++-- |
| fs/ext4/extents.c | 2 +- |
| fs/ext4/inode.c | 2 +- |
| fs/ext4/migrate.c | 4 ++-- |
| fs/ext4/namei.c | 8 ++++---- |
| 5 files changed, 14 insertions(+), 10 deletions(-) |
| |
| --- a/fs/ext4/ext4_jbd2.h |
| +++ b/fs/ext4/ext4_jbd2.h |
| @@ -49,7 +49,7 @@ |
| |
| #define EXT4_DATA_TRANS_BLOCKS(sb) (EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + \ |
| EXT4_XATTR_TRANS_BLOCKS - 2 + \ |
| - 2*EXT4_QUOTA_TRANS_BLOCKS(sb)) |
| + EXT4_MAXQUOTAS_TRANS_BLOCKS(sb)) |
| |
| /* |
| * Define the number of metadata blocks we need to account to modify data. |
| @@ -57,7 +57,7 @@ |
| * This include super block, inode block, quota blocks and xattr blocks |
| */ |
| #define EXT4_META_TRANS_BLOCKS(sb) (EXT4_XATTR_TRANS_BLOCKS + \ |
| - 2*EXT4_QUOTA_TRANS_BLOCKS(sb)) |
| + EXT4_MAXQUOTAS_TRANS_BLOCKS(sb)) |
| |
| /* Delete operations potentially hit one directory's namespace plus an |
| * entire inode, plus arbitrary amounts of bitmap/indirection data. Be |
| @@ -92,6 +92,7 @@ |
| * but inode, sb and group updates are done only once */ |
| #define EXT4_QUOTA_INIT_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_INIT_ALLOC*\ |
| (EXT4_SINGLEDATA_TRANS_BLOCKS(sb)-3)+3+DQUOT_INIT_REWRITE) : 0) |
| + |
| #define EXT4_QUOTA_DEL_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_DEL_ALLOC*\ |
| (EXT4_SINGLEDATA_TRANS_BLOCKS(sb)-3)+3+DQUOT_DEL_REWRITE) : 0) |
| #else |
| @@ -99,6 +100,9 @@ |
| #define EXT4_QUOTA_INIT_BLOCKS(sb) 0 |
| #define EXT4_QUOTA_DEL_BLOCKS(sb) 0 |
| #endif |
| +#define EXT4_MAXQUOTAS_TRANS_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_TRANS_BLOCKS(sb)) |
| +#define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb)) |
| +#define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb)) |
| |
| int |
| ext4_mark_iloc_dirty(handle_t *handle, |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -2147,7 +2147,7 @@ ext4_ext_rm_leaf(handle_t *handle, struc |
| correct_index = 1; |
| credits += (ext_depth(inode)) + 1; |
| } |
| - credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb); |
| + credits += EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb); |
| |
| err = ext4_ext_truncate_extend_restart(handle, inode, credits); |
| if (err) |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -5221,7 +5221,7 @@ int ext4_setattr(struct dentry *dentry, |
| |
| /* (user+group)*(old+new) structure, inode write (sb, |
| * inode block, ? - but truncate inode update has it) */ |
| - handle = ext4_journal_start(inode, 2*(EXT4_QUOTA_INIT_BLOCKS(inode->i_sb)+ |
| + handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+ |
| EXT4_QUOTA_DEL_BLOCKS(inode->i_sb))+3); |
| if (IS_ERR(handle)) { |
| error = PTR_ERR(handle); |
| --- a/fs/ext4/migrate.c |
| +++ b/fs/ext4/migrate.c |
| @@ -238,7 +238,7 @@ static int extend_credit_for_blkdel(hand |
| * So allocate a credit of 3. We may update |
| * quota (user and group). |
| */ |
| - needed = 3 + 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb); |
| + needed = 3 + EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb); |
| |
| if (ext4_journal_extend(handle, needed) != 0) |
| retval = ext4_journal_restart(handle, needed); |
| @@ -477,7 +477,7 @@ int ext4_ext_migrate(struct inode *inode |
| handle = ext4_journal_start(inode, |
| EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + |
| EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 + |
| - 2 * EXT4_QUOTA_INIT_BLOCKS(inode->i_sb) |
| + EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb) |
| + 1); |
| if (IS_ERR(handle)) { |
| retval = PTR_ERR(handle); |
| --- a/fs/ext4/namei.c |
| +++ b/fs/ext4/namei.c |
| @@ -1769,7 +1769,7 @@ static int ext4_create(struct inode *dir |
| retry: |
| handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) + |
| EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 + |
| - 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb)); |
| + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)); |
| if (IS_ERR(handle)) |
| return PTR_ERR(handle); |
| |
| @@ -1803,7 +1803,7 @@ static int ext4_mknod(struct inode *dir, |
| retry: |
| handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) + |
| EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 + |
| - 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb)); |
| + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)); |
| if (IS_ERR(handle)) |
| return PTR_ERR(handle); |
| |
| @@ -1840,7 +1840,7 @@ static int ext4_mkdir(struct inode *dir, |
| retry: |
| handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) + |
| EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 + |
| - 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb)); |
| + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)); |
| if (IS_ERR(handle)) |
| return PTR_ERR(handle); |
| |
| @@ -2253,7 +2253,7 @@ static int ext4_symlink(struct inode *di |
| retry: |
| handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) + |
| EXT4_INDEX_EXTRA_TRANS_BLOCKS + 5 + |
| - 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb)); |
| + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb)); |
| if (IS_ERR(handle)) |
| return PTR_ERR(handle); |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:19 2009 |
| Message-Id: <20091211042819.212643394@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:01 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Dmitry Monakhov <dmonakhov@openvz.org>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [83/90] ext4: fix incorrect block reservation on quota transfer. |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0083-ext4-fix-incorrect-block-reservation-on-quota-transf.patch |
| Content-Length: 1036 |
| Lines: 27 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 194074acacebc169ded90a4657193f5180015051) |
| |
| Inside a ->setattr() call both ATTR_UID and ATTR_GID may be valid. |
| This means that we may end up transferring all quotas, so we have |
| to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in the case of |
| QUOTA_INIT_BLOCKS. |
| |
| Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> |
| Reviewed-by: Mingming Cao <cmm@us.ibm.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/inode.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -5222,7 +5222,7 @@ int ext4_setattr(struct dentry *dentry, |
| /* (user+group)*(old+new) structure, inode write (sb, |
| * inode block, ? - but truncate inode update has it) */ |
| handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+ |
| - EXT4_QUOTA_DEL_BLOCKS(inode->i_sb))+3); |
| + EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb))+3); |
| if (IS_ERR(handle)) { |
| error = PTR_ERR(handle); |
| goto err_out; |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:20 2009 |
| Message-Id: <20091211042819.790485160@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:02 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Jan Kara <jack@suse.cz>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [84/90] ext4: Wait for proper transaction commit on fsync |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0084-ext4-Wait-for-proper-transaction-commit-on-fsync.patch |
| Content-Length: 7849 |
| Lines: 252 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit b436b9bef84de6893e86346d8fbf7104bc520645) |
| |
| We cannot rely on buffer dirty bits during fsync because pdflush can come |
| before fsync is called and clear dirty bits without forcing a transaction |
| commit. Instead, we track which transaction last changed the inode and |
| which transaction last changed its allocation, and force the relevant |
| one to disk on fsync. |
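
The bookkeeping can be sketched as a small userspace model. The field names follow the patch, but struct inode_sync_state and the two helpers are simplified stand-ins: the real code reads the tid from the journal handle and then calls jbd2_log_start_commit() and jbd2_log_wait_commit(), which this sketch does not attempt to model.

```c
typedef unsigned int tid_t;

/* Simplified stand-in for the two fields added to struct ext4_inode_info. */
struct inode_sync_state {
	tid_t i_sync_tid;       /* last transaction that touched the inode */
	tid_t i_datasync_tid;   /* last transaction fdatasync has to wait for */
};

/* Record the transaction; datasync != 0 means fdatasync must see it too. */
static void update_fsync_trans(struct inode_sync_state *ei, tid_t tid,
			       int datasync)
{
	ei->i_sync_tid = tid;
	if (datasync)
		ei->i_datasync_tid = tid;
}

/* Pick the transaction that fsync (0) or fdatasync (1) has to commit. */
static tid_t tid_to_commit(const struct inode_sync_state *ei, int datasync)
{
	return datasync ? ei->i_datasync_tid : ei->i_sync_tid;
}
```

For example, if transaction 7 changes the block allocation and transaction 9 later changes only other inode metadata, fdatasync waits for 7 while fsync waits for 9.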
| |
| Signed-off-by: Jan Kara <jack@suse.cz> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ext4.h | 7 +++++++ |
| fs/ext4/ext4_jbd2.h | 13 +++++++++++++ |
| fs/ext4/extents.c | 14 ++++++++++++-- |
| fs/ext4/fsync.c | 46 +++++++++++++++++----------------------------- |
| fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++ |
| fs/ext4/super.c | 2 ++ |
| fs/jbd2/journal.c | 1 + |
| 7 files changed, 81 insertions(+), 31 deletions(-) |
| |
| --- a/fs/ext4/ext4.h |
| +++ b/fs/ext4/ext4.h |
| @@ -700,6 +700,13 @@ struct ext4_inode_info { |
| struct list_head i_aio_dio_complete_list; |
| /* current io_end structure for async DIO write*/ |
| ext4_io_end_t *cur_aio_dio; |
| + |
| + /* |
| + * Transactions that contain inode's metadata needed to complete |
| + * fsync and fdatasync, respectively. |
| + */ |
| + tid_t i_sync_tid; |
| + tid_t i_datasync_tid; |
| }; |
| |
| /* |
| --- a/fs/ext4/ext4_jbd2.h |
| +++ b/fs/ext4/ext4_jbd2.h |
| @@ -258,6 +258,19 @@ static inline int ext4_jbd2_file_inode(h |
| return 0; |
| } |
| |
| +static inline void ext4_update_inode_fsync_trans(handle_t *handle, |
| + struct inode *inode, |
| + int datasync) |
| +{ |
| + struct ext4_inode_info *ei = EXT4_I(inode); |
| + |
| + if (ext4_handle_valid(handle)) { |
| + ei->i_sync_tid = handle->h_transaction->t_tid; |
| + if (datasync) |
| + ei->i_datasync_tid = handle->h_transaction->t_tid; |
| + } |
| +} |
| + |
| /* super.c */ |
| int ext4_force_commit(struct super_block *sb); |
| |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -3041,6 +3041,8 @@ ext4_ext_handle_uninitialized_extents(ha |
| if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) { |
| ret = ext4_convert_unwritten_extents_dio(handle, inode, |
| path); |
| + if (ret >= 0) |
| + ext4_update_inode_fsync_trans(handle, inode, 1); |
| goto out2; |
| } |
| /* buffered IO case */ |
| @@ -3068,6 +3070,8 @@ ext4_ext_handle_uninitialized_extents(ha |
| ret = ext4_ext_convert_to_initialized(handle, inode, |
| path, iblock, |
| max_blocks); |
| + if (ret >= 0) |
| + ext4_update_inode_fsync_trans(handle, inode, 1); |
| out: |
| if (ret <= 0) { |
| err = ret; |
| @@ -3306,10 +3310,16 @@ int ext4_ext_get_blocks(handle_t *handle |
| allocated = ext4_ext_get_actual_len(&newex); |
| set_buffer_new(bh_result); |
| |
| - /* Cache only when it is _not_ an uninitialized extent */ |
| - if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) |
| + /* |
| + * Cache the extent and update transaction to commit on fdatasync only |
| + * when it is _not_ an uninitialized extent. |
| + */ |
| + if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) { |
| ext4_ext_put_in_cache(inode, iblock, allocated, newblock, |
| EXT4_EXT_CACHE_EXTENT); |
| + ext4_update_inode_fsync_trans(handle, inode, 1); |
| + } else |
| + ext4_update_inode_fsync_trans(handle, inode, 0); |
| out: |
| if (allocated > max_blocks) |
| allocated = max_blocks; |
| --- a/fs/ext4/fsync.c |
| +++ b/fs/ext4/fsync.c |
| @@ -51,25 +51,30 @@ |
| int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) |
| { |
| struct inode *inode = dentry->d_inode; |
| + struct ext4_inode_info *ei = EXT4_I(inode); |
| journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; |
| - int err, ret = 0; |
| + int ret; |
| + tid_t commit_tid; |
| |
| J_ASSERT(ext4_journal_current_handle() == NULL); |
| |
| trace_ext4_sync_file(file, dentry, datasync); |
| |
| + if (inode->i_sb->s_flags & MS_RDONLY) |
| + return 0; |
| + |
| ret = flush_aio_dio_completed_IO(inode); |
| if (ret < 0) |
| return ret; |
| + |
| + if (!journal) |
| + return simple_fsync(file, dentry, datasync); |
| + |
| /* |
| - * data=writeback: |
| + * data=writeback,ordered: |
| * The caller's filemap_fdatawrite()/wait will sync the data. |
| - * sync_inode() will sync the metadata |
| - * |
| - * data=ordered: |
| - * The caller's filemap_fdatawrite() will write the data and |
| - * sync_inode() will write the inode if it is dirty. Then the caller's |
| - * filemap_fdatawait() will wait on the pages. |
| + * Metadata is in the journal, we wait for proper transaction to |
| + * commit here. |
| * |
| * data=journal: |
| * filemap_fdatawrite won't do anything (the buffers are clean). |
| @@ -82,27 +87,10 @@ int ext4_sync_file(struct file *file, st |
| if (ext4_should_journal_data(inode)) |
| return ext4_force_commit(inode->i_sb); |
| |
| - if (!journal) |
| - ret = sync_mapping_buffers(inode->i_mapping); |
| - |
| - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) |
| - goto out; |
| - |
| - /* |
| - * The VFS has written the file data. If the inode is unaltered |
| - * then we need not start a commit. |
| - */ |
| - if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) { |
| - struct writeback_control wbc = { |
| - .sync_mode = WB_SYNC_ALL, |
| - .nr_to_write = 0, /* sys_fsync did this */ |
| - }; |
| - err = sync_inode(inode, &wbc); |
| - if (ret == 0) |
| - ret = err; |
| - } |
| -out: |
| - if (journal && (journal->j_flags & JBD2_BARRIER)) |
| + commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; |
| + if (jbd2_log_start_commit(journal, commit_tid)) |
| + jbd2_log_wait_commit(journal, commit_tid); |
| + else if (journal->j_flags & JBD2_BARRIER) |
| blkdev_issue_flush(inode->i_sb->s_bdev, NULL); |
| return ret; |
| } |
| --- a/fs/ext4/inode.c |
| +++ b/fs/ext4/inode.c |
| @@ -1026,6 +1026,8 @@ static int ext4_ind_get_blocks(handle_t |
| goto cleanup; |
| |
| set_buffer_new(bh_result); |
| + |
| + ext4_update_inode_fsync_trans(handle, inode, 1); |
| got_it: |
| map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key)); |
| if (count > blocks_to_boundary) |
| @@ -4784,6 +4786,7 @@ struct inode *ext4_iget(struct super_blo |
| struct ext4_inode *raw_inode; |
| struct ext4_inode_info *ei; |
| struct inode *inode; |
| + journal_t *journal = EXT4_SB(sb)->s_journal; |
| long ret; |
| int block; |
| |
| @@ -4848,6 +4851,31 @@ struct inode *ext4_iget(struct super_blo |
| ei->i_data[block] = raw_inode->i_block[block]; |
| INIT_LIST_HEAD(&ei->i_orphan); |
| |
| + /* |
| + * Set transaction id's of transactions that have to be committed |
| + * to finish f[data]sync. We set them to currently running transaction |
| + * as we cannot be sure that the inode or some of its metadata isn't |
| + * part of the transaction - the inode could have been reclaimed and |
| + * now it is reread from disk. |
| + */ |
| + if (journal) { |
| + transaction_t *transaction; |
| + tid_t tid; |
| + |
| + spin_lock(&journal->j_state_lock); |
| + if (journal->j_running_transaction) |
| + transaction = journal->j_running_transaction; |
| + else |
| + transaction = journal->j_committing_transaction; |
| + if (transaction) |
| + tid = transaction->t_tid; |
| + else |
| + tid = journal->j_commit_sequence; |
| + spin_unlock(&journal->j_state_lock); |
| + ei->i_sync_tid = tid; |
| + ei->i_datasync_tid = tid; |
| + } |
| + |
| if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) { |
| ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize); |
| if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize > |
| @@ -5102,6 +5130,7 @@ static int ext4_do_update_inode(handle_t |
| err = rc; |
| ei->i_state &= ~EXT4_STATE_NEW; |
| |
| + ext4_update_inode_fsync_trans(handle, inode, 0); |
| out_brelse: |
| brelse(bh); |
| ext4_std_error(inode->i_sb, err); |
| --- a/fs/ext4/super.c |
| +++ b/fs/ext4/super.c |
| @@ -713,6 +713,8 @@ static struct inode *ext4_alloc_inode(st |
| spin_lock_init(&(ei->i_block_reservation_lock)); |
| INIT_LIST_HEAD(&ei->i_aio_dio_complete_list); |
| ei->cur_aio_dio = NULL; |
| + ei->i_sync_tid = 0; |
| + ei->i_datasync_tid = 0; |
| |
| return &ei->vfs_inode; |
| } |
| --- a/fs/jbd2/journal.c |
| +++ b/fs/jbd2/journal.c |
| @@ -78,6 +78,7 @@ EXPORT_SYMBOL(jbd2_journal_errno); |
| EXPORT_SYMBOL(jbd2_journal_ack_err); |
| EXPORT_SYMBOL(jbd2_journal_clear_err); |
| EXPORT_SYMBOL(jbd2_log_wait_commit); |
| +EXPORT_SYMBOL(jbd2_log_start_commit); |
| EXPORT_SYMBOL(jbd2_journal_start_commit); |
| EXPORT_SYMBOL(jbd2_journal_force_commit_nested); |
| EXPORT_SYMBOL(jbd2_journal_wipe); |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:20 2009 |
| Message-Id: <20091211042820.350577113@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:03 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Akira Fujita <a-fujita@rs.jp.nec.com>, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [85/90] ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=0085-ext4-Fix-insufficient-checks-in-EXT4_IOC_MOVE_EXT.patch |
| Content-Length: 2732 |
| Lines: 94 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit 4a58579b9e4e2a35d57e6c9c8483e52f6f1b7fd6) |
| |
| This patch fixes three problems in the handling of the |
| EXT4_IOC_MOVE_EXT ioctl: |
| |
| 1. The current EXT4_IOC_MOVE_EXT code only checks for read access on the |
| original and donor files, which allows illegal write access to the |
| donor file, since the donor file is overwritten with the original |
| file's data. To fix this, change the access mode checks for the |
| original (r->r/w) and donor (r->w) files. |
| |
| 2. Disallow the use of donor files that have the setuid or setgid bit set. |
| |
| 3. Call mnt_want_write() before and mnt_drop_write() after |
| ext4_move_extents() to get write access to the mount. |
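
The first two checks can be sketched as a single userspace function. This is an illustration only: move_ext_check() and the MEXT_* constants are hypothetical names, and in the patch itself the checks are split between the ioctl handler and mext_check_arguments(). The mount write access from item 3 is not modelled.

```c
#include <errno.h>

/* Hypothetical stand-ins for FMODE_READ/FMODE_WRITE and S_ISUID/S_ISGID. */
enum {
	MEXT_FMODE_READ  = 0x1,
	MEXT_FMODE_WRITE = 0x2,
	MEXT_ISUID       = 04000,
	MEXT_ISGID       = 02000
};

/*
 * Returns 0 if the move may proceed, or a negative errno value:
 *  - the original file must be open for both reading and writing,
 *  - the donor file must be open for writing,
 *  - the donor inode must not have setuid or setgid set.
 */
static int move_ext_check(unsigned int orig_fmode, unsigned int donor_fmode,
			  unsigned int donor_imode)
{
	if (!(orig_fmode & MEXT_FMODE_READ) ||
	    !(orig_fmode & MEXT_FMODE_WRITE))
		return -EBADF;

	if (!(donor_fmode & MEXT_FMODE_WRITE))
		return -EBADF;

	if (donor_imode & (MEXT_ISUID | MEXT_ISGID))
		return -EINVAL;

	return 0;
}
```

A donor opened read-only is rejected with -EBADF, and a setuid donor is rejected with -EINVAL even when both descriptors have the right access modes.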
| |
| Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/ioctl.c | 30 ++++++++++++++++++------------ |
| fs/ext4/move_extent.c | 7 +++++++ |
| 2 files changed, 25 insertions(+), 12 deletions(-) |
| |
| --- a/fs/ext4/ioctl.c |
| +++ b/fs/ext4/ioctl.c |
| @@ -221,32 +221,38 @@ setversion_out: |
| struct file *donor_filp; |
| int err; |
| |
| + if (!(filp->f_mode & FMODE_READ) || |
| + !(filp->f_mode & FMODE_WRITE)) |
| + return -EBADF; |
| + |
| if (copy_from_user(&me, |
| (struct move_extent __user *)arg, sizeof(me))) |
| return -EFAULT; |
| + me.moved_len = 0; |
| |
| donor_filp = fget(me.donor_fd); |
| if (!donor_filp) |
| return -EBADF; |
| |
| - if (!capable(CAP_DAC_OVERRIDE)) { |
| - if ((current->real_cred->fsuid != inode->i_uid) || |
| - !(inode->i_mode & S_IRUSR) || |
| - !(donor_filp->f_dentry->d_inode->i_mode & |
| - S_IRUSR)) { |
| - fput(donor_filp); |
| - return -EACCES; |
| - } |
| + if (!(donor_filp->f_mode & FMODE_WRITE)) { |
| + err = -EBADF; |
| + goto mext_out; |
| } |
| |
| - me.moved_len = 0; |
| + err = mnt_want_write(filp->f_path.mnt); |
| + if (err) |
| + goto mext_out; |
| + |
| err = ext4_move_extents(filp, donor_filp, me.orig_start, |
| me.donor_start, me.len, &me.moved_len); |
| - fput(donor_filp); |
| + mnt_drop_write(filp->f_path.mnt); |
| + if (me.moved_len > 0) |
| + file_remove_suid(donor_filp); |
| |
| if (copy_to_user((struct move_extent *)arg, &me, sizeof(me))) |
| - return -EFAULT; |
| - |
| + err = -EFAULT; |
| +mext_out: |
| + fput(donor_filp); |
| return err; |
| } |
| |
| --- a/fs/ext4/move_extent.c |
| +++ b/fs/ext4/move_extent.c |
| @@ -957,6 +957,13 @@ mext_check_arguments(struct inode *orig_ |
| return -EINVAL; |
| } |
| |
| + if (donor_inode->i_mode & (S_ISUID|S_ISGID)) { |
| + ext4_debug("ext4 move extent: suid or sgid is set" |
| + " to donor file [ino:orig %lu, donor %lu]\n", |
| + orig_inode->i_ino, donor_inode->i_ino); |
| + return -EINVAL; |
| + } |
| + |
| /* Ext4 move extent does not support swapfile */ |
| if (IS_SWAPFILE(orig_inode) || IS_SWAPFILE(donor_inode)) { |
| ext4_debug("ext4 move extent: The argument files should " |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:21 2009 |
| Message-Id: <20091211042820.904178854@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:04 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| James Bottomley <James.Bottomley@suse.de>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [86/90] SCSI: megaraid_sas: fix 64 bit sense pointer truncation |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=scsi-megaraid_sas-fix-64-bit-sense-pointer-truncation.patch |
| Content-Length: 1456 |
| Lines: 47 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| From: Yang, Bo <Bo.Yang@lsi.com> |
| |
| commit 7b2519afa1abd1b9f63aa1e90879307842422dae upstream. |
| |
| The current sense pointer is cast to a u32 pointer, which can truncate |
| the stored value on 64-bit systems. Fix by using unsigned long instead. |
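
A minimal userspace sketch of the truncation, assuming a 64-bit handle. The names demo_store_narrow() and demo_store_wide() are hypothetical, and uint64_t stands in for unsigned long on a 64-bit kernel; the driver itself stores through sense_ptr.

```c
#include <assert.h>
#include <stdint.h>

/* Models "*sense_ptr = handle" when sense_ptr is a u32 pointer:
 * the destination slot can hold only the low 32 bits. */
static uint64_t demo_store_narrow(uint64_t handle)
{
    uint32_t slot = (uint32_t)handle;   /* upper 32 bits are discarded */
    return slot;
}

/* Models the same store through a pointer as wide as the handle. */
static uint64_t demo_store_wide(uint64_t handle)
{
    uint64_t slot = handle;             /* full value preserved */
    return slot;
}
```

Any handle that fits in 32 bits survives both stores, which is why the problem shows up only for addresses above 4 GB.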
| |
| Signed-off-by: Bo Yang <bo.yang@lsi.com> |
| Signed-off-by: James Bottomley <James.Bottomley@suse.de> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| |
| --- |
| drivers/scsi/megaraid/megaraid_sas.c | 8 ++++---- |
| 1 file changed, 4 insertions(+), 4 deletions(-) |
| |
| --- a/drivers/scsi/megaraid/megaraid_sas.c |
| +++ b/drivers/scsi/megaraid/megaraid_sas.c |
| @@ -3032,7 +3032,7 @@ megasas_mgmt_fw_ioctl(struct megasas_ins |
| int error = 0, i; |
| void *sense = NULL; |
| dma_addr_t sense_handle; |
| - u32 *sense_ptr; |
| + unsigned long *sense_ptr; |
| |
| memset(kbuff_arr, 0, sizeof(kbuff_arr)); |
| |
| @@ -3109,7 +3109,7 @@ megasas_mgmt_fw_ioctl(struct megasas_ins |
| } |
| |
| sense_ptr = |
| - (u32 *) ((unsigned long)cmd->frame + ioc->sense_off); |
| + (unsigned long *) ((unsigned long)cmd->frame + ioc->sense_off); |
| *sense_ptr = sense_handle; |
| } |
| |
| @@ -3140,8 +3140,8 @@ megasas_mgmt_fw_ioctl(struct megasas_ins |
| * sense_ptr points to the location that has the user |
| * sense buffer address |
| */ |
| - sense_ptr = (u32 *) ((unsigned long)ioc->frame.raw + |
| - ioc->sense_off); |
| + sense_ptr = (unsigned long *) ((unsigned long)ioc->frame.raw + |
| + ioc->sense_off); |
| |
| if (copy_to_user((void __user *)((unsigned long)(*sense_ptr)), |
| sense, ioc->sense_len)) { |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:22 2009 |
| Message-Id: <20091211042821.450487768@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:05 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Martin Michlmayr <tbm@cyrius.com>, |
| Boaz Harrosh <bharrosh@panasas.com>, |
| James Bottomley <James.Bottomley@suse.de>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [87/90] SCSI: osd_protocol.h: Add missing #include |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=scsi-osd_protocol.h-add-missing-include.patch |
| Content-Length: 783 |
| Lines: 28 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| From: Martin Michlmayr <tbm@cyrius.com> |
| |
| commit 0899638688f223fd9e9fee60d662665e11693d12 upstream. |
| |
| include/scsi/osd_protocol.h uses ALIGN() without an #include |
| <linux/kernel.h>, leading to: |
| | include/scsi/osd_protocol.h:362: error: implicit declaration of function 'ALIGN' |
| |
| Signed-off-by: Martin Michlmayr <tbm@cyrius.com> |
| Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> |
| Signed-off-by: James Bottomley <James.Bottomley@suse.de> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| |
| --- |
| include/scsi/osd_protocol.h | 1 + |
| 1 file changed, 1 insertion(+) |
| |
| --- a/include/scsi/osd_protocol.h |
| +++ b/include/scsi/osd_protocol.h |
| @@ -17,6 +17,7 @@ |
| #define __OSD_PROTOCOL_H__ |
| |
| #include <linux/types.h> |
| +#include <linux/kernel.h> |
| #include <asm/unaligned.h> |
| #include <scsi/scsi.h> |
| |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:22 2009 |
| Message-Id: <20091211042822.204320413@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:06 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| James Smart <james.smart@emulex.com>, |
| James Bottomley <James.Bottomley@suse.de>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [88/90] SCSI: scsi_lib_dma: fix bug with dma maps on nested scsi objects |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=scsi-scsi_lib_dma-fix-bug-with-dma-maps-on-nested-scsi-objects.patch |
| Content-Length: 5210 |
| Lines: 149 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| From: James Bottomley <James.Bottomley@suse.de> |
| |
| commit d139b9bd0e52dda14fd13412e7096e68b56d0076 upstream. |
| |
| Some of our virtual SCSI hosts don't have a proper bus parent at the |
| top, which can be a problem for doing DMA on them. |
| |
| This patch makes the host device cache a pointer to the physical bus |
| device and provides an extra API for setting it (the normal API picks |
| it up from the parent). This patch also modifies the qla2xxx and lpfc |
| vport logic to use the new DMA host setting API. |
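
The shape of the new API can be sketched in userspace as follows. All demo_* names are hypothetical; only the pattern (a cached dma_dev pointer, plus a simple wrapper that passes the parent as the DMA device) is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for struct device / Scsi_Host. */
struct demo_device { const char *name; };
struct demo_host {
    struct demo_device *parent;   /* models shost_gendev.parent */
    struct demo_device *dma_dev;  /* physical bus device used for DMA */
};

/* Models scsi_add_host_with_dma(): the caller names the DMA device. */
static int demo_add_host_with_dma(struct demo_host *h,
                                  struct demo_device *dev,
                                  struct demo_device *dma_dev)
{
    if (!h)
        return -1;
    if (!h->parent)
        h->parent = dev;
    h->dma_dev = dma_dev;
    return 0;
}

/* Models scsi_add_host(): the parent doubles as the DMA device. */
static int demo_add_host(struct demo_host *h, struct demo_device *dev)
{
    return demo_add_host_with_dma(h, dev, dev);
}

/* Returns 1 if a virtual port ends up doing DMA on the physical device. */
static int demo_vport_uses_physical_dma(void)
{
    struct demo_device pci = { "pci" }, vport = { "vport" };
    struct demo_host h = { NULL, NULL };

    if (demo_add_host_with_dma(&h, &vport, &pci))
        return 0;
    return h.parent == &vport && h.dma_dev == &pci;
}

/* Returns 1 if the plain call keeps parent and DMA device identical. */
static int demo_plain_host_uses_parent(void)
{
    struct demo_device pci = { "pci" };
    struct demo_host h = { NULL, NULL };

    if (demo_add_host(&h, &pci))
        return 0;
    return h.parent == &pci && h.dma_dev == &pci;
}
```

Existing callers keep the old behaviour through the wrapper, so only drivers with virtual ports need to change.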
| |
| Acked-By: James Smart <james.smart@emulex.com> |
| Signed-off-by: James Bottomley <James.Bottomley@suse.de> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| |
| --- |
| drivers/scsi/hosts.c | 13 ++++++++++--- |
| drivers/scsi/lpfc/lpfc_init.c | 2 +- |
| drivers/scsi/qla2xxx/qla_attr.c | 3 ++- |
| drivers/scsi/scsi_lib_dma.c | 4 ++-- |
| include/scsi/scsi_host.h | 16 +++++++++++++++- |
| 5 files changed, 30 insertions(+), 8 deletions(-) |
| |
| --- a/drivers/scsi/hosts.c |
| +++ b/drivers/scsi/hosts.c |
| @@ -180,14 +180,20 @@ void scsi_remove_host(struct Scsi_Host * |
| EXPORT_SYMBOL(scsi_remove_host); |
| |
| /** |
| - * scsi_add_host - add a scsi host |
| + * scsi_add_host_with_dma - add a scsi host with dma device |
| * @shost: scsi host pointer to add |
| * @dev: a struct device of type scsi class |
| + * @dma_dev: dma device for the host |
| + * |
| + * Note: You rarely need to worry about this unless you're in a |
| + * virtualised host environments, so use the simpler scsi_add_host() |
| + * function instead. |
| * |
| * Return value: |
| * 0 on success / != 0 for error |
| **/ |
| -int scsi_add_host(struct Scsi_Host *shost, struct device *dev) |
| +int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, |
| + struct device *dma_dev) |
| { |
| struct scsi_host_template *sht = shost->hostt; |
| int error = -EINVAL; |
| @@ -207,6 +213,7 @@ int scsi_add_host(struct Scsi_Host *shos |
| |
| if (!shost->shost_gendev.parent) |
| shost->shost_gendev.parent = dev ? dev : &platform_bus; |
| + shost->dma_dev = dma_dev; |
| |
| error = device_add(&shost->shost_gendev); |
| if (error) |
| @@ -262,7 +269,7 @@ int scsi_add_host(struct Scsi_Host *shos |
| fail: |
| return error; |
| } |
| -EXPORT_SYMBOL(scsi_add_host); |
| +EXPORT_SYMBOL(scsi_add_host_with_dma); |
| |
| static void scsi_host_dev_release(struct device *dev) |
| { |
| --- a/drivers/scsi/lpfc/lpfc_init.c |
| +++ b/drivers/scsi/lpfc/lpfc_init.c |
| @@ -2384,7 +2384,7 @@ lpfc_create_port(struct lpfc_hba *phba, |
| vport->els_tmofunc.function = lpfc_els_timeout; |
| vport->els_tmofunc.data = (unsigned long)vport; |
| |
| - error = scsi_add_host(shost, dev); |
| + error = scsi_add_host_with_dma(shost, dev, &phba->pcidev->dev); |
| if (error) |
| goto out_put_shost; |
| |
| --- a/drivers/scsi/qla2xxx/qla_attr.c |
| +++ b/drivers/scsi/qla2xxx/qla_attr.c |
| @@ -1654,7 +1654,8 @@ qla24xx_vport_create(struct fc_vport *fc |
| fc_vport_set_state(fc_vport, FC_VPORT_LINKDOWN); |
| } |
| |
| - if (scsi_add_host(vha->host, &fc_vport->dev)) { |
| + if (scsi_add_host_with_dma(vha->host, &fc_vport->dev, |
| + &ha->pdev->dev)) { |
| DEBUG15(printk("scsi(%ld): scsi_add_host failure for VP[%d].\n", |
| vha->host_no, vha->vp_idx)); |
| goto vport_create_failed_2; |
| --- a/drivers/scsi/scsi_lib_dma.c |
| +++ b/drivers/scsi/scsi_lib_dma.c |
| @@ -23,7 +23,7 @@ int scsi_dma_map(struct scsi_cmnd *cmd) |
| int nseg = 0; |
| |
| if (scsi_sg_count(cmd)) { |
| - struct device *dev = cmd->device->host->shost_gendev.parent; |
| + struct device *dev = cmd->device->host->dma_dev; |
| |
| nseg = dma_map_sg(dev, scsi_sglist(cmd), scsi_sg_count(cmd), |
| cmd->sc_data_direction); |
| @@ -41,7 +41,7 @@ EXPORT_SYMBOL(scsi_dma_map); |
| void scsi_dma_unmap(struct scsi_cmnd *cmd) |
| { |
| if (scsi_sg_count(cmd)) { |
| - struct device *dev = cmd->device->host->shost_gendev.parent; |
| + struct device *dev = cmd->device->host->dma_dev; |
| |
| dma_unmap_sg(dev, scsi_sglist(cmd), scsi_sg_count(cmd), |
| cmd->sc_data_direction); |
| --- a/include/scsi/scsi_host.h |
| +++ b/include/scsi/scsi_host.h |
| @@ -677,6 +677,12 @@ struct Scsi_Host { |
| void *shost_data; |
| |
| /* |
| + * Points to the physical bus device we'd use to do DMA |
| + * Needed just in case we have virtual hosts. |
| + */ |
| + struct device *dma_dev; |
| + |
| + /* |
| * We should ensure that this is aligned, both for better performance |
| * and also because some compilers (m68k) don't automatically force |
| * alignment to a long boundary. |
| @@ -720,7 +726,9 @@ extern int scsi_queue_work(struct Scsi_H |
| extern void scsi_flush_work(struct Scsi_Host *); |
| |
| extern struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *, int); |
| -extern int __must_check scsi_add_host(struct Scsi_Host *, struct device *); |
| +extern int __must_check scsi_add_host_with_dma(struct Scsi_Host *, |
| + struct device *, |
| + struct device *); |
| extern void scsi_scan_host(struct Scsi_Host *); |
| extern void scsi_rescan_device(struct device *); |
| extern void scsi_remove_host(struct Scsi_Host *); |
| @@ -731,6 +739,12 @@ extern const char *scsi_host_state_name( |
| |
| extern u64 scsi_calculate_bounce_limit(struct Scsi_Host *); |
| |
| +static inline int __must_check scsi_add_host(struct Scsi_Host *host, |
| + struct device *dev) |
| +{ |
| + return scsi_add_host_with_dma(host, dev, dev); |
| +} |
| + |
| static inline struct device *scsi_get_device(struct Scsi_Host *shost) |
| { |
| return shost->shost_gendev.parent; |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:23 2009 |
| Message-Id: <20091211042822.857844494@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:07 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| Sebastian Andrzej Siewior <sebastian@breakpoint.cc>, |
| Oleg Nesterov <oleg@redhat.com>, |
| Roland McGrath <roland@redhat.com>, |
| Kyle McMartin <kyle@mcmartin.ca>, |
| Thomas Gleixner <tglx@linutronix.de>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [89/90] signal: Fix alternate signal stack check |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=signal-fix-alternate-signal-stack-check.patch |
| Content-Length: 2919 |
| Lines: 83 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> |
| |
| commit 2a855dd01bc1539111adb7233f587c5c468732ac upstream. |
| |
| All architectures in the kernel increment/decrement the stack pointer |
| before storing values on the stack. |
| |
| On architectures where the stack grows down, sas_ss_sp == sp is not |
| on the alternate signal stack, while sas_ss_sp + sas_ss_size == sp is |
| on the alternate signal stack. |
| |
| On architectures where the stack grows up, sas_ss_sp == sp is on |
| the alternate signal stack, while sas_ss_sp + sas_ss_size == sp is not |
| on the alternate signal stack. |
| |
| The current implementation fails for architectures where the |
| stack grows down in the corner case where sas_ss_sp == sp. This was |
| reported as Debian bug #544905 on AMD64. |
| Simplified test case: http://download.breakpoint.cc/tc-sig-stack.c |
| |
| The test case creates the following stack scenario: |
| 0xn0300 stack top |
| 0xn0200 alt stack pointer top (when switching to alt stack) |
| 0xn01ff alt stack end |
| 0xn0100 alt stack start == stack pointer |
| |
| When the signal is sent, the stack pointer points to the base |
| address of the alt stack, and the kernel erroneously decides that it |
| has already switched to the alternate stack because of the current |
| check for "sp - sas_ss_sp < sas_ss_size". |
| |
| On parisc (stack grows up) the scenario would be: |
| 0xn0200 stack pointer |
| 0xn01ff alt stack end |
| 0xn0100 alt stack start = alt stack pointer base |
| (when switching to alt stack) |
| 0xn0000 stack base |
| |
| This is handled correctly by the current implementation. |
| |
| [ tglx: Modified for archs which have the stack grow up (parisc) which |
| would fail with the correct implementation for stack grows |
| down. Added a check for sp >= current->sas_ss_sp which is |
| strictly not necessary but makes the code symmetric for both |
| variants ] |
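
The corner cases described above can be checked with a small userspace model. The demo_* functions are hypothetical; they take the alt stack base and size as plain arguments in place of current->sas_ss_sp and current->sas_ss_size.

```c
#include <assert.h>
#include <stdbool.h>

/* Old check, used regardless of stack direction. */
static bool demo_on_sig_stack_old(unsigned long base, unsigned long size,
                                  unsigned long sp)
{
    return sp - base < size;
}

/* New check for stacks that grow down: base is outside, base + size inside. */
static bool demo_on_sig_stack_growsdown(unsigned long base, unsigned long size,
                                        unsigned long sp)
{
    return sp > base && sp - base <= size;
}

/* New check for stacks that grow up: base is inside, base + size outside. */
static bool demo_on_sig_stack_growsup(unsigned long base, unsigned long size,
                                      unsigned long sp)
{
    return sp >= base && sp - base < size;
}
```

With base 0x100 and size 0x100 (the last two digits of the layout in the test case), the old check reports sp == 0x100 as being on the alt stack, while the grows-down variant does not and instead accepts sp == 0x200.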
| |
| Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> |
| Cc: Oleg Nesterov <oleg@redhat.com> |
| Cc: Roland McGrath <roland@redhat.com> |
| Cc: Kyle McMartin <kyle@mcmartin.ca> |
| LKML-Reference: <20091025143758.GA6653@Chamillionaire.breakpoint.cc> |
| Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| |
| --- |
| include/linux/sched.h | 13 ++++++++++--- |
| 1 file changed, 10 insertions(+), 3 deletions(-) |
| |
| --- a/include/linux/sched.h |
| +++ b/include/linux/sched.h |
| @@ -1999,11 +1999,18 @@ static inline int is_si_special(const st |
| return info <= SEND_SIG_FORCED; |
| } |
| |
| -/* True if we are on the alternate signal stack. */ |
| - |
| +/* |
| + * True if we are on the alternate signal stack. |
| + */ |
| static inline int on_sig_stack(unsigned long sp) |
| { |
| - return (sp - current->sas_ss_sp < current->sas_ss_size); |
| +#ifdef CONFIG_STACK_GROWSUP |
| + return sp >= current->sas_ss_sp && |
| + sp - current->sas_ss_sp < current->sas_ss_size; |
| +#else |
| + return sp > current->sas_ss_sp && |
| + sp - current->sas_ss_sp <= current->sas_ss_size; |
| +#endif |
| } |
| |
| static inline int sas_ss_flags(unsigned long sp) |
| |
| |
| From linux@linux.site Thu Dec 10 20:28:24 2009 |
| Message-Id: <20091211042823.450484017@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:26:08 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk, |
| "Theodore Tso" <tytso@mit.edu>, |
| Greg Kroah-Hartman <gregkh@suse.de> |
| Subject: [90/90] ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) |
| References: <20091211042438.970725457@linux.site> |
| Content-Disposition: inline; filename=ext4-fix-potential-fiemap-deadlock-mmap_sem-vs.-i_data_sem.patch |
| Content-Length: 5029 |
| Lines: 115 |
| |
| 2.6.31-stable review patch. If anyone has any objections, please let us know. |
| |
| ------------------ |
| (cherry picked from commit fab3a549e204172236779f502eccb4f9bf0dc87d) |
| |
| Fix the following potential circular locking dependency between |
| mm->mmap_sem and ei->i_data_sem: |
| |
| ======================================================= |
| [ INFO: possible circular locking dependency detected ] |
| 2.6.32-04115-gec044c5 #37 |
| ------------------------------------------------------- |
| ureadahead/1855 is trying to acquire lock: |
| (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac |
| |
| but task is already holding lock: |
| (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 |
| |
| which lock already depends on the new lock. |
| |
| the existing dependency chain (in reverse order) is: |
| |
| -> #1 (&ei->i_data_sem){++++..}: |
| [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f |
| [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 |
| [<ffffffff81516633>] down_read+0x51/0x84 |
| [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 |
| [<ffffffff811a3453>] ext4_get_block+0xab/0xef |
| [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d |
| [<ffffffff81155360>] mpage_readpages+0xd0/0x114 |
| [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f |
| [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc |
| [<ffffffff810f86f2>] ra_submit+0x21/0x25 |
| [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c |
| [<ffffffff81107b97>] __do_fault+0x55/0x3a2 |
| [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 |
| [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa |
| [<ffffffff81518205>] page_fault+0x25/0x30 |
| [<ffffffff812a34d8>] clear_user+0x38/0x3c |
| [<ffffffff81167e16>] padzero+0x20/0x31 |
| [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed |
| [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 |
| [<ffffffff81166d64>] load_script+0x1b8/0x1cc |
| [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 |
| [<ffffffff8113255f>] do_execve+0x1ce/0x2cf |
| [<ffffffff81027494>] sys_execve+0x43/0x5a |
| [<ffffffff8102918a>] stub_execve+0x6a/0xc0 |
| |
| -> #0 (&mm->mmap_sem){++++++}: |
| [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f |
| [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 |
| [<ffffffff81107251>] might_fault+0x89/0xac |
| [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda |
| [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 |
| [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 |
| [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 |
| [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 |
| [<ffffffff811392ca>] sys_ioctl+0x56/0x79 |
| [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b |
| |
| other info that might help us debug this: |
| |
| 1 lock held by ureadahead/1855: |
| #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 |
| |
| stack backtrace: |
| Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 |
| Call Trace: |
| [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 |
| [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f |
| [<ffffffff8102f229>] ? sched_clock+0x9/0xd |
| [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 |
| [<ffffffff81107224>] ? might_fault+0x5c/0xac |
| [<ffffffff81107251>] might_fault+0x89/0xac |
| [<ffffffff81107224>] ? might_fault+0x5c/0xac |
| [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c |
| [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda |
| [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 |
| [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 |
| [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 |
| [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 |
| [<ffffffff81107224>] ? might_fault+0x5c/0xac |
| [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 |
| [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 |
| [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b |
| [<ffffffff811392ca>] sys_ioctl+0x56/0x79 |
| [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b |
| |
| Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| fs/ext4/extents.c | 4 ++-- |
| 1 file changed, 2 insertions(+), 2 deletions(-) |
| |
| --- a/fs/ext4/extents.c |
| +++ b/fs/ext4/extents.c |
| @@ -1742,7 +1742,9 @@ int ext4_ext_walk_space(struct inode *in |
| while (block < last && block != EXT_MAX_BLOCK) { |
| num = last - block; |
| /* find extent for this block */ |
| + down_read(&EXT4_I(inode)->i_data_sem); |
| path = ext4_ext_find_extent(inode, block, path); |
| + up_read(&EXT4_I(inode)->i_data_sem); |
| if (IS_ERR(path)) { |
| err = PTR_ERR(path); |
| path = NULL; |
| @@ -3707,10 +3709,8 @@ int ext4_fiemap(struct inode *inode, str |
| * Walk the extent tree gathering extent information. |
| * ext4_ext_fiemap_cb will push extents back to user. |
| */ |
| - down_read(&EXT4_I(inode)->i_data_sem); |
| error = ext4_ext_walk_space(inode, start_blk, len_blks, |
| ext4_ext_fiemap_cb, fieinfo); |
| - up_read(&EXT4_I(inode)->i_data_sem); |
| } |
| |
| return error; |
| |
| |
| From linux@linux.site Thu Dec 10 20:27:24 2009 |
| Message-Id: <20091211042438.970725457@linux.site> |
| User-Agent: quilt/0.47-14.9 |
| Date: Thu, 10 Dec 2009 20:24:38 -0800 |
| From: Greg KH <gregkh@suse.de> |
| To: linux-kernel@vger.kernel.org, |
| stable@kernel.org |
| Cc: stable-review@kernel.org, |
| torvalds@linux-foundation.org, |
| akpm@linux-foundation.org, |
| alan@lxorguk.ukuu.org.uk |
| Subject: [00/90] 2.6.31.8-stable review |
| Content-Length: 2517 |
| Lines: 53 |
| |
| This is the start of the stable review cycle for the 2.6.31.8 release. |
| There are 90 patches in this series; all will be posted as responses |
| to this one. If anyone has any issues with these being applied, please |
| let us know. If anyone is a maintainer of the proper subsystem, and |
| wants to add a Signed-off-by: line to the patch, please respond with it. |
| |
| Yes, there are still more patches to be queued up for the .31-stable |
| tree, but as I just queued up 86 ext4 patches, I figured I would add 4 |
| more to make it a nice even 90 and push it out for everyone to enjoy |
| while I work on getting the rest out after this. |
| |
| Responses should be made by Sunday, Dec 13 04:00:00 UTC 2009 |
| Anything received after that time might be too late. |
| |
| The whole patch series can be found in one patch at: |
| kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.31.8-rc1.gz |
| and the diffstat can be found below. |
| |
| thanks, |
| |
| greg k-h |
| |
| Documentation/filesystems/ext4.txt | 10 +- |
| drivers/scsi/hosts.c | 13 +- |
| drivers/scsi/lpfc/lpfc_init.c | 2 +- |
| drivers/scsi/megaraid/megaraid_sas.c | 8 +- |
| drivers/scsi/qla2xxx/qla_attr.c | 3 +- |
| drivers/scsi/scsi_lib_dma.c | 4 +- |
| fs/ext4/balloc.c | 8 +- |
| fs/ext4/block_validity.c | 2 +- |
| fs/ext4/ext4.h | 105 +++++- |
| fs/ext4/ext4_extents.h | 7 +- |
| fs/ext4/ext4_jbd2.c | 9 +- |
| fs/ext4/ext4_jbd2.h | 27 ++- |
| fs/ext4/extents.c | 493 +++++++++++++++++++++--- |
| fs/ext4/fsync.c | 54 ++-- |
| fs/ext4/inode.c | 705 +++++++++++++++++++++++++++++----- |
| fs/ext4/ioctl.c | 32 +- |
| fs/ext4/mballoc.c | 322 ++++++++-------- |
| fs/ext4/migrate.c | 28 +- |
| fs/ext4/move_extent.c | 572 ++++++++++++++++------------ |
| fs/ext4/namei.c | 47 +-- |
| fs/ext4/resize.c | 2 +- |
| fs/ext4/super.c | 239 ++++++++---- |
| fs/ext4/xattr.c | 22 +- |
| fs/jbd2/commit.c | 4 + |
| fs/jbd2/journal.c | 11 + |
| fs/jbd2/transaction.c | 7 +- |
| include/linux/sched.h | 13 +- |
| include/scsi/osd_protocol.h | 1 + |
| include/scsi/scsi_host.h | 16 +- |
| include/trace/events/ext4.h | 60 +++- |
| 30 files changed, 2082 insertions(+), 744 deletions(-) |
| |