tux3: Inode number extrapolation heuristics
The idea is to make directory order similar to inode tree order,
so traversing a directory in physical order accesses the inode
table roughly in linear order.
On create, the inode number goal is chosen by extrapolating from
directory entry position in the directory file. This will tend to
preserve linear order even when a directory entry for a new inode
is created in a position previously emptied by a delete. The
exact correspondence is affected by name length, so that longer
names will create bigger gaps in the inode table. We want to
leave some gaps in the inode table anyway, so this is unlikely to
be a serious problem. Name length will not affect the relative
order of inodes.
There is some attempt to cluster non-directory inodes tightly
together, using a global next inode variable, set to one more
than the last chosen inode number. This is used as the inode
number goal, unless it is far away from the extrapolated goal,
in which case the latter is preferred
These heuristics do not include sensitivity to congestion and do
not attempt to spread directories out in a young volume. Actual
file size and directory population may be far different from the
assumptions used in the extrapolation. This may cause excessive
searching, more fragmentation and less temporal correlation of
block updates. These deficiencies will be addressed later.
The base for extrapolation is the parent inode. This may turn
out to be a bad choice due to congestion in an aged volume. It
is planned to be able to override that initial choice by a goal
attribute.
The usage sb->nextinum variable is racy and should be addressed
but is unlikely to cause problems because of the distance check
above.
Signed-off-by: Daniel Phillips <d.phillips@partner.samsung.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
diff --git a/fs/tux3/dir.c b/fs/tux3/dir.c
index f69fe56..1c0cfb3 100644
--- a/fs/tux3/dir.c
+++ b/fs/tux3/dir.c
@@ -25,6 +25,10 @@
#include "tux3.h"
#include "kcompat.h"
+#ifndef trace
+#define trace trace_off
+#endif
+
#define TUX_DIR_ALIGN sizeof(inum_t)
#define TUX_DIR_HEAD (offsetof(tux_dirent, name))
#define TUX_REC_LEN(name_len) ALIGN((name_len) + TUX_DIR_HEAD, TUX_DIR_ALIGN)
@@ -203,6 +207,8 @@
{
struct sb *sb = tux_sb(dir->i_sb);
inum_t inum = tux_inode(inode)->inum;
+ const char *name = (const char *)qstr->name;
+ unsigned len = qstr->len;
struct buffer_head *buffer;
tux_dirent *entry;
loff_t i_size, where;
@@ -210,14 +216,26 @@
/* Holding dir->i_mutex, so no i_size_read() */
i_size = dir->i_size;
- where = tux_alloc_entry(dir, (const char *)qstr->name, qstr->len,
- &i_size, &buffer);
+ where = tux_alloc_entry(dir, name, len, &i_size, &buffer);
if (where < 0)
return where;
entry = bufdata(buffer) + (where & sb->blockmask);
if (inum == TUX_INVALID_INO) {
- err = tux_assign_inum(inode, sb->nextinum);
+ enum { guess_filesize = 1 << 13, guess_dirsize = 50 * guess_filesize };
+ enum { guess_dirent_size = 24, cluster = 32 };
+ enum { file_factor = guess_filesize / guess_dirent_size };
+ enum { dir_factor = guess_dirsize / guess_dirent_size };
+
+ int is_dir = S_ISDIR(inode->i_mode);
+ unsigned factor = is_dir ? dir_factor : file_factor;
+ inum_t next = sb->nextinum; /* FIXME: racy */
+ inum_t base = max(tux_inode(dir)->inum + 1, (inum_t)TUX_NORMAL_INO);
+ inum_t guess = base + ((factor * where) >> sb->blockbits);
+ inum_t goal = (is_dir || abs64(next - guess) > cluster) ? guess : next;
+ trace("'%.*s' base = 0x%Lx, guess = 0x%Lx, goal = 0x%Lx", len, name, base, guess, goal);
+
+ err = tux_assign_inum(inode, goal);
if (err)
goto error;
inum = tux_inode(inode)->inum;