| From foo@baz Mon Apr 9 17:09:24 CEST 2018 |
| From: Tang Junhui <tang.junhui@zte.com.cn> |
| Date: Mon, 8 Jan 2018 12:21:21 -0800 |
| Subject: bcache: segregate flash only volume write streams |
| |
| From: Tang Junhui <tang.junhui@zte.com.cn> |
| |
| |
| [ Upstream commit 4eca1cb28d8b0574ca4f1f48e9331c5f852d43b9 ] |
| |
| When a cache set holds both flash only volumes and cached devices, and |
| many tasks write to them in writeback mode, the write IOs may land in |
| the same bucket, as below: |
| | cached data | flash data | cached data | cached data| flash data| |
| After writeback of the cached devices completes, the bucket looks like |
| this: |
| | free | flash data | free | free | flash data | |
| |
| There is a lot of free space in this bucket, but because data from the |
| flash only volumes still exists, the bucket cannot be reclaimed, which |
| wastes bucket space. |
| |
| This patch segregates flash only volume write streams from those of |
| cached devices, so that data from flash only volumes and data from |
| cached devices are stored in different buckets. |
| |
| Compared to the v1 patch, this patch does not add an additional open |
| bucket list. It segregates flash only volume write streams from cached |
| devices on a best-effort basis: sectors of flash only volumes may still |
| be mixed with dirty sectors of cached devices, but the number is very |
| small. |
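
The selection order described above can be illustrated with a minimal userspace sketch. It is not the kernel code: `struct sketch_bucket` and `sketch_pick_bucket()` are hypothetical names invented for this illustration, and each bucket's volume class is reduced to a single `flash_only` flag. The sketch skips buckets of the other class first, then prefers a sequential match, then a bucket last used by the same task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an open bucket (not the kernel struct). */
struct sketch_bucket {
	bool flash_only;                /* last write came from a flash only volume */
	unsigned long long last_offset; /* where the last write to this bucket ended */
	unsigned int write_point;       /* identifies the task that last wrote here */
};

/*
 * Return the index of the bucket to use, or -1 if no bucket of the right
 * class matches. Buckets are scanned from the end of the array to the start,
 * loosely mirroring a reverse list walk.
 */
int sketch_pick_bucket(const struct sketch_bucket *b, size_t n,
		       bool flash_only, unsigned long long offset,
		       unsigned int write_point)
{
	int task_match = -1;

	assert(b != NULL || n == 0);

	for (size_t i = n; i-- > 0; ) {
		/* 1. Never consider a bucket that holds the other class of data. */
		if (b[i].flash_only != flash_only)
			continue;
		/* 2. Prefer a bucket whose last write is sequential with this one. */
		if (b[i].last_offset == offset)
			return (int)i;
		/* 3. Otherwise remember a bucket last used by the same task. */
		if (b[i].write_point == write_point)
			task_match = (int)i;
	}
	return task_match;
}
```

Returning -1 is a simplification of this sketch. The real allocator still has to choose some open bucket when nothing matches, which is presumably where the small amount of residual mixing mentioned above comes from.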
| |
| [mlyle: fixed commit log formatting, permissions, line endings] |
| |
| Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> |
| Reviewed-by: Michael Lyle <mlyle@lyle.org> |
| Signed-off-by: Michael Lyle <mlyle@lyle.org> |
| Signed-off-by: Jens Axboe <axboe@kernel.dk> |
| Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| --- |
| drivers/md/bcache/alloc.c | 19 ++++++++++++++----- |
| 1 file changed, 14 insertions(+), 5 deletions(-) |
| |
| --- a/drivers/md/bcache/alloc.c |
| +++ b/drivers/md/bcache/alloc.c |
| @@ -512,15 +512,21 @@ struct open_bucket { |
| |
| /* |
| * We keep multiple buckets open for writes, and try to segregate different |
| - * write streams for better cache utilization: first we look for a bucket where |
| - * the last write to it was sequential with the current write, and failing that |
| - * we look for a bucket that was last used by the same task. |
| + * write streams for better cache utilization: first we try to segregate flash |
| + * only volume write streams from cached devices, secondly we look for a bucket |
| + * where the last write to it was sequential with the current write, and |
| + * failing that we look for a bucket that was last used by the same task. |
| * |
| * The ideas is if you've got multiple tasks pulling data into the cache at the |
| * same time, you'll get better cache utilization if you try to segregate their |
| * data and preserve locality. |
| * |
| - * For example, say you've starting Firefox at the same time you're copying a |
| + * For example, dirty sectors of flash only volume is not reclaimable, if their |
| + * dirty sectors mixed with dirty sectors of cached device, such buckets will |
| + * be marked as dirty and won't be reclaimed, though the dirty data of cached |
| + * device have been written back to backend device. |
| + * |
| + * And say you've starting Firefox at the same time you're copying a |
| * bunch of files. Firefox will likely end up being fairly hot and stay in the |
| * cache awhile, but the data you copied might not be; if you wrote all that |
| * data to the same buckets it'd get invalidated at the same time. |
| @@ -537,7 +543,10 @@ static struct open_bucket *pick_data_buc |
| struct open_bucket *ret, *ret_task = NULL; |
| |
| list_for_each_entry_reverse(ret, &c->data_buckets, list) |
| - if (!bkey_cmp(&ret->key, search)) |
| + if (UUID_FLASH_ONLY(&c->uuids[KEY_INODE(&ret->key)]) != |
| + UUID_FLASH_ONLY(&c->uuids[KEY_INODE(search)])) |
| + continue; |
| + else if (!bkey_cmp(&ret->key, search)) |
| goto found; |
| else if (ret->last_write_point == write_point) |
| ret_task = ret; |