f2fs_io: fix to keep output order of do_read() for forward compatibility

Some scripts rely on the output order of do_read(), so keep the existing
fields first and append the newly added ones (IO time, mlock time) after
them to keep forward compatibility.

e.g.
f2fs_io read 128 0 $((2*1024)) buffered 1 0 /mnt/f2fs/file

Before:
Read 1073741824 bytes IO time = 153715 us mlock time = 0 us, BW = 6985 MB/s print 0 bytes:
00000000 :

After:
Read 1073741824 bytes total_time = 155166 us, BW = 6920 MB/s, IO time = 155166 us, mlock time = 0 us, print 0 bytes:
00000000 :
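
For scripts that consume this summary line, looking fields up by label
rather than by position makes them less sensitive to ordering changes
like this one. A minimal sketch in C, assuming the "<label> = <number>"
layout shown in the examples above; parse_field() is a hypothetical
helper for illustration and is not part of f2fs_io:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of f2fs_io): locate the first
 * "<label> = <number>" pair in a summary line and store the number
 * in *out. Returns 0 on success, -1 if the label or number is missing.
 */
int parse_field(const char *line, const char *label, uint64_t *out)
{
	const char *p = strstr(line, label);

	if (!p)
		return -1;

	/* Skip past the label, then expect " = <unsigned integer>". */
	p += strlen(label);
	return sscanf(p, " = %" SCNu64, out) == 1 ? 0 : -1;
}
```

Calling parse_field(line, "IO time", &v) on either the "Before" or the
"After" line above finds the IO time value, while a lookup of
"total_time" fails on the "Before" line because that field is absent
there.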

Fixes: 615036f ("f2fs_io: calculate IO bandwidth vs. mlock latency")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff --git a/tools/f2fs_io/f2fs_io.c b/tools/f2fs_io/f2fs_io.c
index 231da47..326b3bd 100644
--- a/tools/f2fs_io/f2fs_io.c
+++ b/tools/f2fs_io/f2fs_io.c
@@ -1065,10 +1065,13 @@
 		}
 		io_time_end = get_current_us();
 	}
-	printf("Read %"PRIu64" bytes IO time = %"PRIu64" us mlock time = %"PRIu64" us, BW = %.Lf MB/s print %u bytes:\n",
-		read_cnt, io_time_end - io_time_start,
+	printf("Read %"PRIu64" bytes total_time = %"PRIu64" us, BW = %.Lf MB/s, "
+		"IO time = %"PRIu64" us, mlock time = %"PRIu64" us, print %u bytes:\n",
+		read_cnt, get_current_us() - io_time_start,
+		((long double)read_cnt / (get_current_us() - io_time_start)),
+		io_time_end - io_time_start,
 		mlock_time_end - mlock_time_start,
-		((long double)read_cnt / (io_time_end - io_time_start)), print_bytes);
+		print_bytes);
 	printf("%08"PRIx64" : ", offset);
 	for (i = 1; i <= print_bytes; i++) {
 		printf("%02x", print_buf[i - 1]);