| ======== | 
 | zsmalloc | 
 | ======== | 
 |  | 
 | This allocator is designed for use with zram. Thus, the allocator is | 
 | supposed to work well under low memory conditions. In particular, it | 
 | never attempts higher order page allocation which is very likely to | 
 | fail under memory pressure. On the other hand, if it used only single | 
 | (0-order) pages, it would suffer from very high fragmentation -- | 
 | any object of size PAGE_SIZE/2 or larger would occupy an entire page. | 
 | This was one of the major issues with its predecessor (xvmalloc). | 
 |  | 
 | To overcome these issues, zsmalloc allocates a bunch of 0-order pages | 
 | and links them together using various 'struct page' fields. These linked | 
 | pages act as a single higher-order page i.e. an object can span 0-order | 
 | page boundaries. The code refers to these linked pages as a single entity | 
 | called zspage. | 
 |  | 
 | For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE, | 
 | since this satisfies the requirements of all its current users (in the | 
 | worst case, a page is incompressible and is thus stored "as-is", i.e. in | 
 | uncompressed form). Allocation requests larger than this size fail | 
 | (see zs_malloc()). | 
 |  | 
 | Additionally, zs_malloc() does not return a dereferenceable pointer. | 
 | Instead, it returns an opaque handle (unsigned long) which encodes the actual | 
 | location of the allocated object. The reason for this indirection is that | 
 | zsmalloc does not keep zspages permanently mapped since that would cause | 
 | issues on 32-bit systems where the VA region for kernel space mappings | 
 | is very small. So, using the allocated memory should be done through the | 
 | proper handle-based APIs. | 
 |  | 
 | stat | 
 | ==== | 
 |  | 
 | With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via | 
 | ``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output:: | 
 |  | 
 |  # cat /sys/kernel/debug/zsmalloc/zram0/classes | 
 |  | 
 |  class  size       10%       20%       30%       40%       50%       60%       70%       80%       90%       99%      100% obj_allocated   obj_used pages_used pages_per_zspage freeable | 
 |     ... | 
 |     ... | 
 |     30   512         0        12         4         1         0         1         0         0         1         0       414          3464       3346        433                1       14 | 
 |     31   528         2         7         2         2         1         0         1         0         0         2       117          4154       3793        536                4       44 | 
 |     32   544         6         3         4         1         2         1         0         0         0         1       260          4170       3965        556                2       26 | 
 |     ... | 
 |     ... | 
 |  | 
 |  | 
 | class | 
 | 	index | 
 | size | 
 | 	the size of the objects a zspage of this class stores | 
 | 10% | 
 | 	the number of zspages with usage ratio less than 10% (see below) | 
 | 20% | 
 | 	the number of zspages with usage ratio between 10% and 20% | 
 | 30% | 
 | 	the number of zspages with usage ratio between 20% and 30% | 
 | 40% | 
 | 	the number of zspages with usage ratio between 30% and 40% | 
 | 50% | 
 | 	the number of zspages with usage ratio between 40% and 50% | 
 | 60% | 
 | 	the number of zspages with usage ratio between 50% and 60% | 
 | 70% | 
 | 	the number of zspages with usage ratio between 60% and 70% | 
 | 80% | 
 | 	the number of zspages with usage ratio between 70% and 80% | 
 | 90% | 
 | 	the number of zspages with usage ratio between 80% and 90% | 
 | 99% | 
 | 	the number of zspages with usage ratio between 90% and 99% | 
 | 100% | 
 | 	the number of zspages with usage ratio 100% | 
 | obj_allocated | 
 | 	the number of objects allocated | 
 | obj_used | 
 | 	the number of objects allocated to the user | 
 | pages_used | 
 | 	the number of pages allocated for the class | 
 | pages_per_zspage | 
 | 	the number of 0-order pages to make a zspage | 
 | freeable | 
 | 	the approximate number of pages class compaction can free | 
 |  | 
 | Each zspage maintains an inuse counter which keeps track of the number of | 
 | objects stored in the zspage.  The inuse counter determines the zspage's | 
 | "fullness group" which is calculated as the ratio of the "inuse" objects to | 
 | the total number of objects the zspage can hold (objs_per_zspage). The | 
 | closer the inuse counter is to objs_per_zspage, the better. | 
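The mapping from the inuse counter to the stat columns listed above can be sketched as follows. This is a minimal illustration, not the kernel code: the name ``usage_column``, the integer rounding, the choice to treat each range as inclusive of its lower bound, and the handling of empty zspages are all assumptions.

```python
# Column labels as they appear in the classes stat output above.
COLUMNS = ["10%", "20%", "30%", "40%", "50%",
           "60%", "70%", "80%", "90%", "99%"]


def usage_column(inuse, objs_per_zspage):
    """Return the stat column a zspage falls into, or None if it is empty."""
    if not 0 <= inuse <= objs_per_zspage:
        raise ValueError("inuse must be between 0 and objs_per_zspage")
    if inuse == 0:
        # Assumption: an empty zspage is not counted in any column.
        return None
    if inuse == objs_per_zspage:
        return "100%"
    # Usage ratio in whole percent; 0..99 for a partially filled zspage.
    ratio = 100 * inuse // objs_per_zspage
    return COLUMNS[ratio // 10]


print(usage_column(4, 5))  # 90%
```

A zspage holding 4 of 5 possible objects has a usage ratio of 80% and is therefore counted in the 90% column under these assumptions.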
 |  | 
 | Internals | 
 | ========= | 
 |  | 
 | zsmalloc has 255 size classes, each of which can hold a number of zspages. | 
 | Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages. | 
 | The optimal zspage chain size for each size class is calculated during the | 
 | creation of the zsmalloc pool (see calculate_zspage_chain_size()). | 
 |  | 
 | As an optimization, zsmalloc merges size classes that have similar | 
 | characteristics in terms of the number of pages per zspage and the number | 
 | of objects that each zspage can store. | 
 |  | 
 | For instance, consider the following size classes:: | 
 |  | 
 |   class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable | 
 |   ... | 
 |      94  1536        0    ....       0             0          0          0                3        0 | 
 |     100  1632        0    ....       0             0          0          0                2        0 | 
 |   ... | 
 |  | 
 |  | 
 | Size classes #95-99 are merged with size class #100. This means that when we | 
 | need to store an object of size, say, 1568 bytes, we end up using size class | 
 | #100 instead of size class #96. Size class #100 is meant for objects of size | 
 | 1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes. | 
 |  | 
 | Size class #100 consists of zspages with 2 physical pages each, which can | 
 | hold a total of 5 objects. If we need to store 13 objects of size 1568, we | 
 | end up allocating three zspages, or 6 physical pages. | 
 |  | 
 | However, if we take a closer look at size class #96 (which is meant for | 
 | objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we | 
 | find that the optimal zspage configuration for this class is a chain | 
 | of 5 physical pages:: | 
 |  | 
 |     pages per zspage      wasted bytes     used% | 
 |            1                  960           76 | 
 |            2                  352           95 | 
 |            3                 1312           89 | 
 |            4                  704           95 | 
 |            5                   96           99 | 
 |  | 
 | This means that a class #96 configuration with 5 physical pages can store 13 | 
 | objects of size 1568 in a single zspage, using a total of 5 physical pages. | 
 | This is more efficient than the class #100 configuration, which would use 6 | 
 | physical pages to store the same number of objects. | 
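The arithmetic behind the table and the page counts above can be reproduced with a short sketch. It assumes a PAGE_SIZE of 4096 bytes; the helper names ``chain_waste_table`` and ``pages_needed`` are hypothetical and do not exist in the kernel.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def chain_waste_table(class_size, max_chain):
    """Rows of (pages per zspage, objects, wasted bytes, used %)."""
    rows = []
    for pages in range(1, max_chain + 1):
        total = pages * PAGE_SIZE
        objs = total // class_size
        wasted = total % class_size
        used_pct = 100 * (total - wasted) // total
        rows.append((pages, objs, wasted, used_pct))
    return rows


def pages_needed(num_objects, class_size, pages_per_zspage):
    """Physical pages required to store num_objects in whole zspages."""
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // class_size
    zspages = -(-num_objects // objs_per_zspage)  # ceiling division
    return zspages * pages_per_zspage


for row in chain_waste_table(1568, 5):
    print(row)

print(pages_needed(13, 1632, 2))  # 6
print(pages_needed(13, 1568, 5))  # 5
```

The first call to ``pages_needed`` models storing the objects in class #100 (2 pages per zspage), the second models class #96 with a 5-page chain.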
 |  | 
 | As the zspage chain size for class #96 increases, its key characteristics | 
 | such as pages per zspage and objects per zspage also change. This leads to | 
 | fewer class mergers, resulting in a more compact grouping of classes, which | 
 | reduces memory wastage. | 
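The merging rule described in this section can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the kernel implementation: it assumes a PAGE_SIZE of 4096 bytes, class sizes of ``32 + 16 * index`` (which reproduces the sizes in the sample output, e.g. #94 = 1536, #100 = 1632, #254 = 4096), a chain size chosen purely by minimal wasted bytes, and merging only between neighbouring classes. All names are hypothetical.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages
MIN_CLASS_SIZE = 32     # assumption: object size of class #0
CLASS_SIZE_STEP = 16    # assumption: size difference between classes
NUM_CLASSES = 255       # from the text above


def class_size(index):
    return MIN_CLASS_SIZE + index * CLASS_SIZE_STEP


def chain_size(size, max_chain):
    """Shortest chain (1..max_chain pages) with the fewest wasted bytes."""
    return min(range(1, max_chain + 1),
               key=lambda pages: (pages * PAGE_SIZE) % size)


def merge_classes(max_chain):
    """Map every class index to the class that actually serves it."""
    serving = {}
    prev_key = prev_index = None
    # Walk from the largest class down; a class joins its larger
    # neighbour when pages per zspage and objects per zspage both match.
    for index in reversed(range(NUM_CLASSES)):
        size = class_size(index)
        pages = chain_size(size, max_chain)
        key = (pages, pages * PAGE_SIZE // size)
        if key != prev_key:
            prev_key, prev_index = key, index
        serving[index] = prev_index
    return serving


serving = merge_classes(max_chain=4)
print(serving[96], class_size(serving[96]))  # 100 1632
```

With a maximum chain of 4 pages, the model sends a 1568-byte object (class #96) to class #100, matching the example above.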
 |  | 
 | Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`:: | 
 |  | 
 |   class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable | 
 |  | 
 |   ... | 
 |     202  3264         0   ..         0             0          0          0                4        0 | 
 |     254  4096         0   ..         0             0          0          0                1        0 | 
 |   ... | 
 |  | 
 | Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages | 
 | per zspage. Any object larger than 3264 bytes is considered huge and belongs | 
 | to size class #254, which stores each object in its own physical page (objects | 
 | in huge classes do not share pages). | 
 |  | 
 | Increasing the zspage chain size also results in a higher watermark | 
 | for the huge size class and fewer huge classes overall. This allows for more | 
 | efficient storage of large objects. | 
 |  | 
 | For a zspage chain size of 8, the huge class watermark becomes 3632 bytes:: | 
 |  | 
 |   class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable | 
 |  | 
 |   ... | 
 |     202  3264         0   ..         0             0          0          0                4        0 | 
 |     211  3408         0   ..         0             0          0          0                5        0 | 
 |     217  3504         0   ..         0             0          0          0                6        0 | 
 |     222  3584         0   ..         0             0          0          0                7        0 | 
 |     225  3632         0   ..         0             0          0          0                8        0 | 
 |     254  4096         0   ..         0             0          0          0                1        0 | 
 |   ... | 
 |  | 
 | For a zspage chain size of 16, the huge class watermark becomes 3840 bytes:: | 
 |  | 
 |   class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable | 
 |  | 
 |   ... | 
 |     202  3264         0   ..         0             0          0          0                4        0 | 
 |     206  3328         0   ..         0             0          0          0               13        0 | 
 |     207  3344         0   ..         0             0          0          0                9        0 | 
 |     208  3360         0   ..         0             0          0          0               14        0 | 
 |     211  3408         0   ..         0             0          0          0                5        0 | 
 |     212  3424         0   ..         0             0          0          0               16        0 | 
 |     214  3456         0   ..         0             0          0          0               11        0 | 
 |     217  3504         0   ..         0             0          0          0                6        0 | 
 |     219  3536         0   ..         0             0          0          0               13        0 | 
 |     222  3584         0   ..         0             0          0          0                7        0 | 
 |     223  3600         0   ..         0             0          0          0               15        0 | 
 |     225  3632         0   ..         0             0          0          0                8        0 | 
 |     228  3680         0   ..         0             0          0          0                9        0 | 
 |     230  3712         0   ..         0             0          0          0               10        0 | 
 |     232  3744         0   ..         0             0          0          0               11        0 | 
 |     234  3776         0   ..         0             0          0          0               12        0 | 
 |     235  3792         0   ..         0             0          0          0               13        0 | 
 |     236  3808         0   ..         0             0          0          0               14        0 | 
 |     238  3840         0   ..         0             0          0          0               15        0 | 
 |     254  4096         0   ..         0             0          0          0                1        0 | 
 |   ... | 
 |  | 
 | Overall, the combined effect of the zspage chain size on zsmalloc pool configuration:: | 
 |  | 
 |   pages per zspage   number of size classes (clusters)   huge size class watermark | 
 |          4                        69                               3264 | 
 |          5                        86                               3408 | 
 |          6                        93                               3504 | 
 |          7                       112                               3584 | 
 |          8                       123                               3632 | 
 |          9                       140                               3680 | 
 |         10                       143                               3712 | 
 |         11                       159                               3744 | 
 |         12                       164                               3776 | 
 |         13                       180                               3792 | 
 |         14                       183                               3808 | 
 |         15                       188                               3840 | 
 |         16                       191                               3840 | 
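The watermark column of the table above can be reproduced with a small model. As a sketch only, it assumes a PAGE_SIZE of 4096 bytes, class sizes of ``32 + 16 * index`` (consistent with the sample outputs), a chain size chosen by minimal wasted bytes, and a "huge" class defined as one that stores a single object in a single page. The names are hypothetical.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages
MIN_CLASS_SIZE = 32     # assumption: object size of class #0
CLASS_SIZE_STEP = 16    # assumption: size difference between classes
NUM_CLASSES = 255       # from the text above


def chain_size(size, max_chain):
    """Shortest chain (1..max_chain pages) with the fewest wasted bytes."""
    return min(range(1, max_chain + 1),
               key=lambda pages: (pages * PAGE_SIZE) % size)


def huge_watermark(max_chain):
    """Largest class size that is not served one object per page."""
    for index in reversed(range(NUM_CLASSES)):
        size = MIN_CLASS_SIZE + index * CLASS_SIZE_STEP
        pages = chain_size(size, max_chain)
        objs = pages * PAGE_SIZE // size
        if not (pages == 1 and objs == 1):
            return size
    return None


for max_chain in (4, 8, 16):
    print(max_chain, huge_watermark(max_chain))
```

Objects larger than the returned size end up one per page in this model, which is why a longer chain leaves fewer sizes in the huge category.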
 |  | 
 |  | 
 | A synthetic test | 
 | ---------------- | 
 |  | 
 | zram used as build artifact storage (Linux kernel compilation). | 
 |  | 
 | * `CONFIG_ZSMALLOC_CHAIN_SIZE=4` | 
 |  | 
 |   zsmalloc classes stats:: | 
 |  | 
 |     class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable | 
 |  | 
 |     ... | 
 |     Total              13   ..        51        413836     412973     159955                         3 | 
 |  | 
 |   zram mm_stat:: | 
 |  | 
 |     1691783168 628083717 655175680        0 655175680       60        0    34048    34049 | 
 |  | 
 |  | 
 | * `CONFIG_ZSMALLOC_CHAIN_SIZE=8` | 
 |  | 
 |   zsmalloc classes stats:: | 
 |  | 
 |     class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable | 
 |  | 
 |     ... | 
 |     Total              18   ..        87        414852     412978     156666                         0 | 
 |  | 
 |   zram mm_stat:: | 
 |  | 
 |     1691803648 627793930 641703936        0 641703936       60        0    33591    33591 | 
 |  | 
 | Using larger zspage chains may result in using fewer physical pages, as seen | 
 | in the example where the number of physical pages used decreased from 159955 | 
 | to 156666. At the same time, the maximum zsmalloc pool memory usage went down | 
 | from 655175680 to 641703936 bytes. | 
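The savings quoted above can be cross-checked against the two mm_stat lines. The sketch assumes a PAGE_SIZE of 4096 bytes and that the fifth mm_stat field is the maximum pool memory usage in bytes, following the column order in the zram documentation; the helper name ``mem_used_max`` is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# mm_stat lines copied from the two test runs above.
MM_STAT_CHAIN_4 = "1691783168 628083717 655175680 0 655175680 60 0 34048 34049"
MM_STAT_CHAIN_8 = "1691803648 627793930 641703936 0 641703936 60 0 33591 33591"

# pages_used totals from the two classes stat outputs above.
PAGES_USED = {4: 159955, 8: 156666}


def mem_used_max(mm_stat_line):
    """Fifth field of mm_stat (assumed: maximum pool memory, in bytes)."""
    return int(mm_stat_line.split()[4])


saved_bytes = mem_used_max(MM_STAT_CHAIN_4) - mem_used_max(MM_STAT_CHAIN_8)
saved_pages = PAGES_USED[4] - PAGES_USED[8]
saved_percent = 100 * saved_pages / PAGES_USED[4]

print(saved_pages, saved_bytes, round(saved_percent, 2))  # 3289 13471744 2.06
```

The byte difference equals the page difference multiplied by PAGE_SIZE, so the two statistics agree: the larger chain size saved 3289 pages, roughly 2% of the pool, in this test.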
 |  | 
 | However, this advantage may be offset by the potential for increased system | 
 | memory pressure (as some zspages have larger chain sizes) in cases where there | 
 | is heavy internal fragmentation and zspool compaction is unable to relocate | 
 | objects and release zspages. In these cases, it is recommended to decrease | 
 | the limit on the size of the zspage chains (as specified by the | 
 | CONFIG_ZSMALLOC_CHAIN_SIZE option). | 
 |  | 
 | Functions | 
 | ========= | 
 |  | 
 | .. kernel-doc:: mm/zsmalloc.c |