{
  "commit": "d5efdcc2b03f0a76553b77fde57c675fa5f03872",
  "tree": "cfb2b7eeccd921301e2df8ddb6336dc681409186",
  "parents": [
    "3952719ec60626ede42ddc3e93f3537e5e3c96c0"
  ],
  "author": {
    "name": "Dave Chinner",
    "email": "dchinner@redhat.com",
    "time": "Wed Nov 27 11:36:08 2024 +1100"
  },
  "committer": {
    "name": "Dave Chinner",
    "email": "david@fromorbit.com",
    "time": "Wed Nov 27 11:36:08 2024 +1100"
  },
  "message": "fstests: check-parallel\n\nRuns tests in parallel runner threads. Each runner thread has its\nown set of tests to run, and runs a separate instance of check\nto run those tests.\n\ncheck-parallel sets up loop devices, mount points, results\ndirectories, etc. for each instance and divides the tests up between\nthe runner threads.\n\nIt currently hard codes the XFS and generic test lists, and then\ngives each check invocation an explicit list of tests to run. It\nalso passes through exclusions so that test exclude filtering is\nstill done by check.\n\nThis is far from ideal, but I didn\u0027t want to have to embark on a\nmajor refactoring of check to be able to run stuff in parallel.\nIt was quite the challenge just to get all the tests and test\ninfrastructure up to the point where they can run reliably in\nparallel.\n\nHence I\u0027ve left the actual factoring of test selection and setup\nout of the patchset for the moment. The plan is to factor both the\ntest setup and the test list runner loop out of check and share them\nbetween check and check-parallel, hence not requiring check-parallel\nto run check directly. That is future work, however.\n\nWith the current test runner setup, it is not uncommon to see \u003e5000%\nCPU usage, 150-200kiops and 4-5GB/s of disk bandwidth being used\nwhen running 64 runners. This is a serious stress load as it is\nconstantly mounting and unmounting dozens of filesystems, creating\nand destroying devices, dropping caches, running sync, running CPU\nhot plug, running page cache migration, etc.\n\nThe massive amount of IO that load generates causes qemu hosts to\nabort (i.e. crash) because they run out of vm map segments. Hence\nbumping up the max_map_count on the host like so:\n\necho 1048576 \u003e /proc/sys/vm/max_map_count\n\nis necessary.\n\nThere is no significant memory pressure to speak of from running the\ntests like this. 
I\u0027ve seen a maximum of about 50GB of RAM used when\nrunning tests like this, so running on a 64p/64GB VM the additional\nconcurrency doesn\u0027t really stress memory capacity like it does CPU\nand IO.\n\nAll the runners are executed in private mount namespaces. This is\nto prevent ephemeral mount namespace clones from taking a reference\nto every mounted filesystem in the machine and so causing random\n\"device busy after unmount\" failures in the tests that are running\nconcurrently with the mount namespace setup and teardown.\n\nA typical `pstree -N mnt` looks like:\n\n$ pstree -N mnt\n[4026531841]\nbash\nbash───pstree\n[0]\nsudo───sudo───check-parallel─┬─check-parallel───nsexec───check───311─┬─cut\n                             │                                       └─md5sum\n                             ├─check-parallel───nsexec───check───750─┬─750───sleep\n                             │                                       └─750.fsstress───4*[750.fsstress───{750.fsstress}]\n                             ├─check-parallel───nsexec───check───013───013───sed\n                             ├─check-parallel───nsexec───check───251───cp\n                             ├─check-parallel───nsexec───check───467───open_by_handle\n                             ├─check-parallel───nsexec───check───650─┬─650───sleep\n                             │                                       └─650.fsstress─┬─61*[650.fsstress───{650.fsstress}]\n                             │                                                      └─2*[650.fsstress]\n                             ├─check-parallel───nsexec───check───707\n                             ├─check-parallel───nsexec───check───705\n                             ├─check-parallel───nsexec───check───416\n                             ├─check-parallel───nsexec───check───477───2*[open_by_handle]\n                             ├─check-parallel───nsexec───check───140───140\n                             
├─check-parallel───nsexec───check───562\n                             ├─check-parallel───nsexec───check───415───xfs_io───{xfs_io}\n                             ├─check-parallel───nsexec───check───291\n                             ├─check-parallel───nsexec───check───017\n                             ├─check-parallel───nsexec───check───016\n                             ├─check-parallel───nsexec───check───168───2*[168───168]\n                             ├─check-parallel───nsexec───check───672───2*[672───672]\n                             ├─check-parallel───nsexec───check───170─┬─170───170───170\n                             │                                       └─170───170\n                             ├─check-parallel───nsexec───check───531───122*[t_open_tmpfiles]\n                             ├─check-parallel───nsexec───check───387\n                             ├─check-parallel───nsexec───check───748\n                             ├─check-parallel───nsexec───check───388─┬─388.fsstress───4*[388.fsstress───{388.fsstress}]\n                             │                                       └─sleep\n                             ├─check-parallel───nsexec───check───328───328\n                             ├─check-parallel───nsexec───check───352\n                             ├─check-parallel───nsexec───check───042\n                             ├─check-parallel───nsexec───check───426───open_by_handle\n                             ├─check-parallel───nsexec───check───756───2*[open_by_handle]\n                             ├─check-parallel───nsexec───check───227\n                             ├─check-parallel───nsexec───check───208───aio-dio-invalid───2*[aio-dio-invalid]\n                             ├─check-parallel───nsexec───check───746───cp\n                             ├─check-parallel───nsexec───check───187───187\n                             ├─check-parallel───nsexec───check───027───8*[027]\n                             
├─check-parallel───nsexec───check───045───xfs_io───{xfs_io}\n                             ├─check-parallel───nsexec───check───044\n                             ├─check-parallel───nsexec───check───204\n                             ├─check-parallel───nsexec───check───186───186\n                             ├─check-parallel───nsexec───check───449\n                             ├─check-parallel───nsexec───check───231───su───fsx\n                             ├─check-parallel───nsexec───check───509\n                             ├─check-parallel───nsexec───check───127───5*[127───fsx]\n                             ├─check-parallel───nsexec───check───047\n                             ├─check-parallel───nsexec───check───043\n                             ├─check-parallel───nsexec───check───475───pkill\n                             ├─check-parallel───nsexec───check───299─┬─fio─┬─4*[fio]\n                             │                                       │     ├─2*[fio───4*[{fio}]]\n                             │                                       │     └─{fio}\n                             │                                       └─pgrep\n                             ├─check-parallel───nsexec───check───551───aio-dio-write-v\n                             ├─check-parallel───nsexec───check───323───aio-last-ref-he───100*[{aio-last-ref-he}]\n                             ├─check-parallel───nsexec───check───648───sleep\n                             ├─check-parallel───nsexec───check───046\n                             ├─check-parallel───nsexec───check───753─┬─753.fsstress───4*[753.fsstress]\n                             │                                       └─pkill\n                             ├─check-parallel───nsexec───check───507───507\n                             ├─check-parallel───nsexec───check───629─┬─3*[629───xfs_io───{xfs_io}]\n                             │                                       └─5*[629]\n                             
├─check-parallel───nsexec───check───073───umount\n                             ├─check-parallel───nsexec───check───615───615\n                             ├─check-parallel───nsexec───check───176───punch-alternati\n                             ├─check-parallel───nsexec───check───294\n                             ├─check-parallel───nsexec───check───236───236\n                             ├─check-parallel───nsexec───check───165─┬─165─┬─165─┬─cut\n                             │                                       │     │     └─xfs_io───{xfs_io}\n                             │                                       │     └─165───grep\n                             │                                       └─165\n                             ├─check-parallel───nsexec───check───259───sync\n                             ├─check-parallel───nsexec───check───442───442.fsstress───4*[442.fsstress───{442.fsstress}]\n                             ├─check-parallel───nsexec───check───558───255*[558]\n                             ├─check-parallel───nsexec───check───358───358───358\n                             ├─check-parallel───nsexec───check───169───169\n                             └─check-parallel───nsexec───check───297─┬─297.fsstress─┬─284*[297.fsstress───{297.fsstress}]\n                                                                     │              └─716*[297.fsstress]\n                                                                     └─sleep\n\nA typical test run looks like:\n\n$ time sudo ./check-parallel /mnt/xfs -s xfs -x dump\nRunner 63 Failures:  xfs/170\nRunner 36 Failures:  xfs/050\nRunner 30 Failures:  xfs/273\nRunner 29 Failures:  generic/135\nRunner 25 Failures:  generic/603\nTests run: 1140\nFailure count: 5\n\nTen slowest tests - runtime in seconds:\nxfs/013 454\ngeneric/707 414\ngeneric/017 398\ngeneric/387 395\ngeneric/748 390\nxfs/140 351\ngeneric/562 351\ngeneric/705 347\ngeneric/251 344\nxfs/016 343\n\nCleanup on Aisle 5?\n\ntotal 0\ncrw-------. 
1 root root 10, 236 Nov 27 09:27 control\nlrwxrwxrwx. 1 root root       7 Nov 27 09:27 fast -\u003e ../dm-0\n/dev/mapper/fast  1.4T  192G  1.2T  14% /mnt/xfs\n\nreal    9m29.056s\nuser    0m0.005s\nsys     0m0.022s\n$\n\nYeah, that runtime is real - under 10 minutes for a full XFS auto\ngroup test run. When running this normally (i.e. via check) on this\nmachine, it usually takes just under 4 hours to run the same set\nof tests, i.e. I can run ./check-parallel roughly 25 times on this\nmachine in the time it takes to run ./check once.\n\nSigned-off-by: Dave Chinner \u003cdchinner@redhat.com\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "8131f4e2ee1614f567bd9c841e0748aaeb7c303b",
      "old_mode": 33261,
      "old_path": "check",
      "new_id": "607d2456e6a1fef8a4179660ef749ed735ce064c",
      "new_mode": 33261,
      "new_path": "check"
    },
    {
      "type": "add",
      "old_id": "0000000000000000000000000000000000000000",
      "old_mode": 0,
      "old_path": "/dev/null",
      "new_id": "c85437252a532054457431bdad9142060b683142",
      "new_mode": 33261,
      "new_path": "check-parallel"
    }
  ]
}
