| From: Johannes Weiner <hannes@cmpxchg.org> |
| Subject: mm: kill frontswap |
| Date: Mon, 17 Jul 2023 12:02:27 -0400 |
| |
| The only user of frontswap is zswap, and has been for a long time. Have |
| swap call into zswap directly and remove the indirection. |
| |
| [hannes@cmpxchg.org: remove obsolete comment, per Yosry] |
| Link: https://lkml.kernel.org/r/20230719142832.GA932528@cmpxchg.org |
| [fengwei.yin@intel.com: don't warn if a non-swapcache folio is passed to zswap_load] |
| Link: https://lkml.kernel.org/r/20230810095652.3905184-1-fengwei.yin@intel.com |
| Link: https://lkml.kernel.org/r/20230717160227.GA867137@cmpxchg.org |
| Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> |
| Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> |
| Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
| Acked-by: Nhat Pham <nphamcs@gmail.com> |
| Acked-by: Yosry Ahmed <yosryahmed@google.com> |
| Acked-by: Christoph Hellwig <hch@lst.de> |
| Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
| Cc: Matthew Wilcox (Oracle) <willy@infradead.org> |
| Cc: Vitaly Wool <vitaly.wool@konsulko.com> |
| Cc: Vlastimil Babka <vbabka@suse.cz> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| Documentation/admin-guide/mm/zswap.rst | 14 |
| Documentation/mm/frontswap.rst | 264 ----------- |
| Documentation/mm/index.rst | 1 |
| Documentation/translations/zh_CN/mm/frontswap.rst | 196 -------- |
| Documentation/translations/zh_CN/mm/index.rst | 1 |
| MAINTAINERS | 7 |
| fs/proc/meminfo.c | 1 |
| include/linux/frontswap.h | 91 --- |
| include/linux/swap.h | 9 |
| include/linux/swapfile.h | 5 |
| include/linux/zswap.h | 37 + |
| mm/Kconfig | 4 |
| mm/Makefile | 1 |
| mm/frontswap.c | 283 ------------ |
| mm/page_io.c | 6 |
| mm/swapfile.c | 33 - |
| mm/zswap.c | 159 ++---- |
| 17 files changed, 121 insertions(+), 991 deletions(-) |
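The shape of the change is easy to see in a toy model: after this patch, swapout calls zswap directly and falls back to the block-I/O path when zswap rejects a page, with no frontswap hook in between. The names below mirror the kernel's zswap_store(), but the bodies are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for the real zswap entry point; the body is invented. */
static bool pool_full;

static bool zswap_store(int type, long offset, const char *data)
{
    (void)type; (void)offset; (void)data;
    return !pool_full;          /* reject when the compressed pool is full */
}

/* After this patch, swapout calls zswap directly: on rejection the
 * page simply falls through to the normal swap-device write, with no
 * frontswap_store() indirection in between. */
static const char *swap_writepage(int type, long offset, const char *data)
{
    if (zswap_store(type, offset, data))
        return "zswap";         /* compressed in RAM, disk write avoided */
    return "disk";              /* normal swap device write */
}
```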
| |
| --- a/Documentation/admin-guide/mm/zswap.rst~mm-kill-frontswap |
| +++ a/Documentation/admin-guide/mm/zswap.rst |
| @@ -49,7 +49,7 @@ compressed pool. |
| Design |
| ====== |
| |
| -Zswap receives pages for compression through the Frontswap API and is able to |
| +Zswap receives pages for compression from the swap subsystem and is able to |
| evict pages from its own compressed pool on an LRU basis and write them back to |
| the backing swap device in the case that the compressed pool is full. |
| |
| @@ -70,19 +70,19 @@ means the compression ratio will always |
| zbud pages). The zsmalloc type zpool has a more complex compressed page |
| storage method, and it can achieve greater storage densities. |
| |
| -When a swap page is passed from frontswap to zswap, zswap maintains a mapping |
| +When a swap page is passed from swapout to zswap, zswap maintains a mapping |
| of the swap entry, a combination of the swap type and swap offset, to the zpool |
| handle that references that compressed swap page. This mapping is achieved |
| with a red-black tree per swap type. The swap offset is the search key for the |
| tree nodes. |
| |
| -During a page fault on a PTE that is a swap entry, frontswap calls the zswap |
| -load function to decompress the page into the page allocated by the page fault |
| -handler. |
| +During a page fault on a PTE that is a swap entry, the swapin code calls the |
| +zswap load function to decompress the page into the page allocated by the page |
| +fault handler. |
| |
| Once there are no PTEs referencing a swap page stored in zswap (i.e. the count |
| -in the swap_map goes to 0) the swap code calls the zswap invalidate function, |
| -via frontswap, to free the compressed entry. |
| +in the swap_map goes to 0) the swap code calls the zswap invalidate function |
| +to free the compressed entry. |
| |
| Zswap seeks to be simple in its policies. Sysfs attributes allow for one user |
| controlled policy: |
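The mapping described in the design section above, one tree per swap type keyed by swap offset and resolving to a zpool handle, can be sketched as follows. A sorted array with binary search stands in for the kernel's red-black tree; it gives the same O(log n) lookup for illustration purposes:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of zswap's per-swap-type mapping from swap offset to
 * zpool handle.  The kernel uses a red-black tree; a sorted array
 * with binary search is a simple stand-in. */
struct zswap_entry { long offset; void *handle; };

struct zswap_tree {
    struct zswap_entry entries[64];
    int nr;
};

static void zswap_insert(struct zswap_tree *t, long offset, void *handle)
{
    int i = t->nr++;

    /* keep entries sorted by offset, the search key */
    while (i > 0 && t->entries[i - 1].offset > offset) {
        t->entries[i] = t->entries[i - 1];
        i--;
    }
    t->entries[i].offset = offset;
    t->entries[i].handle = handle;
}

static void *zswap_lookup(struct zswap_tree *t, long offset)
{
    int lo = 0, hi = t->nr - 1;

    while (lo <= hi) {
        int mid = (lo + hi) / 2;

        if (t->entries[mid].offset == offset)
            return t->entries[mid].handle;
        if (t->entries[mid].offset < offset)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return NULL;                /* not in zswap: read the swap device */
}
```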
| --- a/Documentation/mm/frontswap.rst |
| +++ /dev/null |
| @@ -1,264 +0,0 @@ |
| -========= |
| -Frontswap |
| -========= |
| - |
| -Frontswap provides a "transcendent memory" interface for swap pages. |
| -In some environments, dramatic performance savings may be obtained because |
| -swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk. |
| - |
| -.. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/ |
| - |
| -Frontswap is so named because it can be thought of as the opposite of |
| -a "backing" store for a swap device. The storage is assumed to be |
| -a synchronous concurrency-safe page-oriented "pseudo-RAM device" conforming |
| -to the requirements of transcendent memory (such as Xen's "tmem", or |
| -in-kernel compressed memory, aka "zcache", or future RAM-like devices); |
| -this pseudo-RAM device is not directly accessible or addressable by the |
| -kernel and is of unknown and possibly time-varying size. The driver |
| -links itself to frontswap by calling frontswap_register_ops to set the |
| -frontswap_ops funcs appropriately and the functions it provides must |
| -conform to certain policies as follows: |
| - |
| -An "init" prepares the device to receive frontswap pages associated |
| -with the specified swap device number (aka "type"). A "store" will |
| -copy the page to transcendent memory and associate it with the type and |
| -offset associated with the page. A "load" will copy the page, if found, |
| -from transcendent memory into kernel memory, but will NOT remove the page |
| -from transcendent memory. An "invalidate_page" will remove the page |
| -from transcendent memory and an "invalidate_area" will remove ALL pages |
| -associated with the swap type (e.g., like swapoff) and notify the "device" |
| -to refuse further stores with that swap type. |
| - |
| -Once a page is successfully stored, a matching load on the page will normally |
| -succeed. So when the kernel finds itself in a situation where it needs |
| -to swap out a page, it first attempts to use frontswap. If the store returns |
| -success, the data has been successfully saved to transcendent memory and |
| -a disk write and, if the data is later read back, a disk read are avoided. |
| -If a store returns failure, transcendent memory has rejected the data, and the |
| -page can be written to swap as usual. |
| - |
| -Note that if a page is stored and the page already exists in transcendent memory |
| -(a "duplicate" store), either the store succeeds and the data is overwritten, |
| -or the store fails AND the page is invalidated. This ensures stale data may |
| -never be obtained from frontswap. |
| - |
| -If properly configured, monitoring of frontswap is done via debugfs in |
| -the `/sys/kernel/debug/frontswap` directory. The effectiveness of |
| -frontswap can be measured (across all swap devices) with: |
| - |
| -``failed_stores`` |
| - how many store attempts have failed |
| - |
| -``loads`` |
| - how many loads were attempted (all should succeed) |
| - |
| -``succ_stores`` |
| - how many store attempts have succeeded |
| - |
| -``invalidates`` |
| - how many invalidates were attempted |
| - |
| -A backend implementation may provide additional metrics. |
| - |
| -FAQ |
| -=== |
| - |
| -* Where's the value? |
| - |
| -When a workload starts swapping, performance falls through the floor. |
| -Frontswap significantly increases performance in many such workloads by |
| -providing a clean, dynamic interface to read and write swap pages to |
| -"transcendent memory" that is otherwise not directly addressable to the kernel. |
| -This interface is ideal when data is transformed to a different form |
| -and size (such as with compression) or secretly moved (as might be |
| -useful for write-balancing for some RAM-like devices). Swap pages (and |
| -evicted page-cache pages) are a great use for this kind of slower-than-RAM- |
| -but-much-faster-than-disk "pseudo-RAM device". |
| - |
| -Frontswap, with a fairly small impact on the kernel, provides a huge |
| -amount of flexibility for more dynamic RAM |
| -utilization in various system configurations: |
| - |
| -In the single kernel case, aka "zcache", pages are compressed and |
| -stored in local memory, thus increasing the total anonymous pages |
| -that can be safely kept in RAM. Zcache essentially trades off CPU |
| -cycles used in compression/decompression for better memory utilization. |
| -Benchmarks have shown little or no impact when memory pressure is |
| -low while providing a significant performance improvement (25%+) |
| -on some workloads under high memory pressure. |
| - |
| -"RAMster" builds on zcache by adding "peer-to-peer" transcendent memory |
| -support for clustered systems. Frontswap pages are locally compressed |
| -as in zcache, but then "remotified" to another system's RAM. This |
| -allows RAM to be dynamically load-balanced back-and-forth as needed, |
| -i.e. when system A is overcommitted, it can swap to system B, and |
| -vice versa. RAMster can also be configured as a memory server so |
| -many servers in a cluster can swap, dynamically as needed, to a single |
| -server configured with a large amount of RAM... without pre-configuring |
| -how much of the RAM is available for each of the clients! |
| - |
| -In the virtual case, the whole point of virtualization is to statistically |
| -multiplex physical resources across the varying demands of multiple |
| -virtual machines. This is really hard to do with RAM and efforts to do |
| -it well with no kernel changes have essentially failed (except in some |
| -well-publicized special-case workloads). |
| -Specifically, the Xen Transcendent Memory backend allows otherwise |
| -"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple |
| -virtual machines, but the pages can be compressed and deduplicated to |
| -optimize RAM utilization. And when guest OS's are induced to surrender |
| -underutilized RAM (e.g. with "selfballooning"), sudden unexpected |
| -memory pressure may result in swapping; frontswap allows those pages |
| -to be swapped to and from hypervisor RAM (if overall host system memory |
| -conditions allow), thus mitigating the potentially awful performance impact |
| -of unplanned swapping. |
| - |
| -A KVM implementation is underway and has been RFC'ed to lkml. And, |
| -using frontswap, investigation is also underway on the use of NVM as |
| -a memory extension technology. |
| - |
| -* Sure there may be performance advantages in some situations, but |
| - what's the space/time overhead of frontswap? |
| - |
| -If CONFIG_FRONTSWAP is disabled, every frontswap hook compiles into |
| -nothingness and the only overhead is a few extra bytes per swapon'ed |
| -swap device. If CONFIG_FRONTSWAP is enabled but no frontswap "backend" |
| -registers, there is one extra global variable compared to zero for |
| -every swap page read or written. If CONFIG_FRONTSWAP is enabled |
| -AND a frontswap backend registers AND the backend fails every "store" |
| -request (i.e. provides no memory despite claiming it might), |
| -CPU overhead is still negligible -- and since every frontswap fail |
| -precedes a swap page write-to-disk, the system is highly likely |
| -to be I/O bound and using a small fraction of a percent of a CPU |
| -will be irrelevant anyway. |
| - |
| -As for space, if CONFIG_FRONTSWAP is enabled AND a frontswap backend |
| -registers, one bit is allocated for every swap page for every swap |
| -device that is swapon'd. This is added to the EIGHT bits (which |
| -was sixteen until about 2.6.34) that the kernel already allocates |
| -for every swap page for every swap device that is swapon'd. (Hugh |
| -Dickins has observed that frontswap could probably steal one of |
| -the existing eight bits, but let's worry about that minor optimization |
| -later.) For very large swap disks (which are rare) on a standard |
| -4K pagesize, this is 1MB per 32GB swap. |
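The 1MB-per-32GB figure follows directly from one frontswap_map bit per 4K page; a quick arithmetic check:

```c
#include <assert.h>
#include <stdint.h>

/* One frontswap_map bit per swap page: a 32GB swap device with 4K
 * pages needs 32G/4K = 8M bits = 1MB of bitmap. */
static uint64_t frontswap_map_bytes(uint64_t swap_bytes, uint64_t page_size)
{
    uint64_t pages = swap_bytes / page_size;

    return pages / 8;           /* 8 bits per byte */
}
```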
| - |
| -When swap pages are stored in transcendent memory instead of written |
| -out to disk, there is a side effect that this may create more memory |
| -pressure that can potentially outweigh the other advantages. A |
| -backend, such as zcache, must implement policies to carefully (but |
| -dynamically) manage memory limits to ensure this doesn't happen. |
| - |
| -* OK, how about a quick overview of what this frontswap patch does |
| - in terms that a kernel hacker can grok? |
| - |
| -Let's assume that a frontswap "backend" has registered during |
| -kernel initialization; this registration indicates that this |
| -frontswap backend has access to some "memory" that is not directly |
| -accessible by the kernel. Exactly how much memory it provides is |
| -entirely dynamic and random. |
| - |
| -Whenever a swap-device is swapon'd frontswap_init() is called, |
| -passing the swap device number (aka "type") as a parameter. |
| -This notifies frontswap to expect attempts to "store" swap pages |
| -associated with that number. |
| - |
| -Whenever the swap subsystem is readying a page to write to a swap |
| -device (cf. swap_writepage()), frontswap_store is called. Frontswap |
| -consults with the frontswap backend and if the backend says it does NOT |
| -have room, frontswap_store returns -1 and the kernel swaps the page |
| -to the swap device as normal. Note that the response from the frontswap |
| -backend is unpredictable to the kernel; it may choose to never accept a |
| -page, it could accept every ninth page, or it might accept every |
| -page. But if the backend does accept a page, the data from the page |
| -has already been copied and associated with the type and offset, |
| -and the backend guarantees the persistence of the data. In this case, |
| -frontswap sets a bit in the "frontswap_map" for the swap device |
| -corresponding to the page offset on the swap device to which it would |
| -otherwise have written the data. |
| - |
| -When the swap subsystem needs to swap-in a page (swap_readpage()), |
| -it first calls frontswap_load() which checks the frontswap_map to |
| -see if the page was earlier accepted by the frontswap backend. If |
| -it was, the page of data is filled from the frontswap backend and |
| -the swap-in is complete. If not, the normal swap-in code is |
| -executed to obtain the page of data from the real swap device. |
| - |
| -So every time the frontswap backend accepts a page, a swap device write |
| -and (potentially) a swap device read are replaced by a "frontswap backend |
| -store" and (possibly) a "frontswap backend load", which are presumably much |
| -faster. |
| - |
| -* Can't frontswap be configured as a "special" swap device that is |
| - just higher priority than any real swap device (e.g. like zswap, |
| - or maybe swap-over-nbd/NFS)? |
| - |
| -No. First, the existing swap subsystem doesn't allow for any kind of |
| -swap hierarchy. Perhaps it could be rewritten to accommodate a hierarchy, |
| -but this would require fairly drastic changes. Even if it were |
| -rewritten, the existing swap subsystem uses the block I/O layer which |
| -assumes a swap device is fixed size and any page in it is linearly |
| -addressable. Frontswap barely touches the existing swap subsystem, |
| -and works around the constraints of the block I/O subsystem to provide |
| -a great deal of flexibility and dynamicity. |
| - |
| -For example, the acceptance of any swap page by the frontswap backend is |
| -entirely unpredictable. This is critical to the definition of frontswap |
| -backends because it grants completely dynamic discretion to the |
| -backend. In zcache, one cannot know a priori how compressible a page is. |
| -"Poorly" compressible pages can be rejected, and "poorly" can itself be |
| -defined dynamically depending on current memory constraints. |
| - |
| -Further, frontswap is entirely synchronous whereas a real swap |
| -device is, by definition, asynchronous and uses block I/O. The |
| -block I/O layer is not only unnecessary, but may perform "optimizations" |
| -that are inappropriate for a RAM-oriented device including delaying |
| -the write of some pages for a significant amount of time. Synchrony is |
| -required to ensure the dynamicity of the backend and to avoid thorny race |
| -conditions that would unnecessarily and greatly complicate frontswap |
| -and/or the block I/O subsystem. That said, only the initial "store" |
| -and "load" operations need be synchronous. A separate asynchronous thread |
| -is free to manipulate the pages stored by frontswap. For example, |
| -the "remotification" thread in RAMster uses standard asynchronous |
| -kernel sockets to move compressed frontswap pages to a remote machine. |
| -Similarly, a KVM guest-side implementation could do in-guest compression |
| -and use "batched" hypercalls. |
| - |
| -In a virtualized environment, the dynamicity allows the hypervisor |
| -(or host OS) to do "intelligent overcommit". For example, it can |
| -choose to accept pages only until host-swapping might be imminent, |
| -then force guests to do their own swapping. |
| - |
| -There is a downside to the transcendent memory specifications for |
| -frontswap: Since any "store" might fail, there must always be a real |
| -slot on a real swap device to swap the page. Thus frontswap must be |
| -implemented as a "shadow" to every swapon'd device with the potential |
| -capability of holding every page that the swap device might have held |
| -and the possibility that it might hold no pages at all. This means |
| -that frontswap cannot contain more pages than the total of swapon'd |
| -swap devices. For example, if NO swap device is configured on some |
| -installation, frontswap is useless. Swapless portable devices |
| -can still use frontswap but a backend for such devices must configure |
| -some kind of "ghost" swap device and ensure that it is never used. |
| - |
| -* Why this weird definition about "duplicate stores"? If a page |
| - has been previously successfully stored, can't it always be |
| - successfully overwritten? |
| - |
| -Nearly always it can, but no, sometimes it cannot. Consider an example |
| -where data is compressed and the original 4K page has been compressed |
| -to 1K. Now an attempt is made to overwrite the page with data that |
| -is non-compressible and so would take the entire 4K. But the backend |
| -has no more space. In this case, the store must be rejected. Whenever |
| -frontswap rejects a store that would overwrite, it also must invalidate |
| -the old data and ensure that it is no longer accessible. Since the |
| -swap subsystem then writes the new data to the real swap device, |
| -this is the correct course of action to ensure coherency. |
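The "duplicate store" rule described above can be sketched with a toy backend slot (the struct and function names are invented for illustration): a store over an existing entry either succeeds and overwrites, or fails and invalidates the old copy, so stale data is never loadable.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy backend slot demonstrating the duplicate-store rule. */
struct slot { bool present; char data[8]; };

static int backend_store(struct slot *s, const char *data, bool has_room)
{
    if (!has_room) {
        s->present = false;     /* reject AND invalidate the old copy */
        return -1;
    }
    strncpy(s->data, data, sizeof(s->data) - 1);
    s->data[sizeof(s->data) - 1] = '\0';
    s->present = true;
    return 0;
}

static const char *backend_load(const struct slot *s)
{
    return s->present ? s->data : NULL; /* miss: read the swap device */
}
```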
| - |
| -* Why does the frontswap patch create the new include file swapfile.h? |
| - |
| -The frontswap code depends on some swap-subsystem-internal data |
| -structures that have, over the years, moved back and forth between |
| -static and global. This seemed a reasonable compromise: Define |
| -them as global but declare them in a new include file that isn't |
| -included by the large number of source files that include swap.h. |
| - |
| -Dan Magenheimer, last updated April 9, 2012 |
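The store/load flow the removed document describes, setting a frontswap_map bit on an accepted store and testing it on swap-in, can be sketched as follows (a minimal bitmap stand-in; sizes and names are invented):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Toy frontswap_map: one bit per page offset, mirroring the bitmap
 * the removed frontswap core kept per swap device. */
#define MAP_BITS 1024
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static unsigned long fs_map[MAP_BITS / BITS_PER_LONG];

static void map_set(long off)
{
    fs_map[off / BITS_PER_LONG] |= 1UL << (off % BITS_PER_LONG);
}

static bool map_test(long off)
{
    return fs_map[off / BITS_PER_LONG] & (1UL << (off % BITS_PER_LONG));
}

/* Swap-in consults the bitmap first, so pages that went straight to
 * disk never trigger a pointless backend lookup. */
static const char *swap_readpage(long off)
{
    return map_test(off) ? "frontswap load" : "disk read";
}
```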
| --- a/Documentation/mm/index.rst~mm-kill-frontswap |
| +++ a/Documentation/mm/index.rst |
| @@ -44,7 +44,6 @@ above structured documentation, or delet |
| balance |
| damon/index |
| free_page_reporting |
| - frontswap |
| hmm |
| hwpoison |
| hugetlbfs_reserv |
| --- a/Documentation/translations/zh_CN/mm/frontswap.rst |
| +++ /dev/null |
| @@ -1,196 +0,0 @@ |
| -[196 lines removed: the zh_CN translation of Documentation/mm/frontswap.rst, |
| - translated by Yanteng Si <siyanteng@loongson.cn>, deleted along with the |
| - English original above] |
| --- a/Documentation/translations/zh_CN/mm/index.rst~mm-kill-frontswap |
| +++ a/Documentation/translations/zh_CN/mm/index.rst |
| @@ -42,7 +42,6 @@ ç»æåçææ¡£ä¸ï¼å¦æå®å·²ç»å |
| damon/index |
| free_page_reporting |
| ksm |
| - frontswap |
| hmm |
| hwpoison |
| hugetlbfs_reserv |
| --- a/fs/proc/meminfo.c~mm-kill-frontswap |
| +++ a/fs/proc/meminfo.c |
| @@ -17,6 +17,7 @@ |
| #ifdef CONFIG_CMA |
| #include <linux/cma.h> |
| #endif |
| +#include <linux/zswap.h> |
| #include <asm/page.h> |
| #include "internal.h" |
| |
| --- a/include/linux/frontswap.h |
| +++ /dev/null |
| @@ -1,91 +0,0 @@ |
| -/* SPDX-License-Identifier: GPL-2.0 */ |
| -#ifndef _LINUX_FRONTSWAP_H |
| -#define _LINUX_FRONTSWAP_H |
| - |
| -#include <linux/swap.h> |
| -#include <linux/mm.h> |
| -#include <linux/bitops.h> |
| -#include <linux/jump_label.h> |
| - |
| -struct frontswap_ops { |
| - void (*init)(unsigned); /* this swap type was just swapon'ed */ |
| - int (*store)(unsigned, pgoff_t, struct page *); /* store a page */ |
| - int (*load)(unsigned, pgoff_t, struct page *, bool *); /* load a page */ |
| - void (*invalidate_page)(unsigned, pgoff_t); /* page no longer needed */ |
| - void (*invalidate_area)(unsigned); /* swap type just swapoff'ed */ |
| -}; |
| - |
| -int frontswap_register_ops(const struct frontswap_ops *ops); |
| - |
| -extern void frontswap_init(unsigned type, unsigned long *map); |
| -extern int __frontswap_store(struct page *page); |
| -extern int __frontswap_load(struct page *page); |
| -extern void __frontswap_invalidate_page(unsigned, pgoff_t); |
| -extern void __frontswap_invalidate_area(unsigned); |
| - |
| -#ifdef CONFIG_FRONTSWAP |
| -extern struct static_key_false frontswap_enabled_key; |
| - |
| -static inline bool frontswap_enabled(void) |
| -{ |
| - return static_branch_unlikely(&frontswap_enabled_key); |
| -} |
| - |
| -static inline void frontswap_map_set(struct swap_info_struct *p, |
| - unsigned long *map) |
| -{ |
| - p->frontswap_map = map; |
| -} |
| - |
| -static inline unsigned long *frontswap_map_get(struct swap_info_struct *p) |
| -{ |
| - return p->frontswap_map; |
| -} |
| -#else |
| -/* all inline routines become no-ops and all externs are ignored */ |
| - |
| -static inline bool frontswap_enabled(void) |
| -{ |
| - return false; |
| -} |
| - |
| -static inline void frontswap_map_set(struct swap_info_struct *p, |
| - unsigned long *map) |
| -{ |
| -} |
| - |
| -static inline unsigned long *frontswap_map_get(struct swap_info_struct *p) |
| -{ |
| - return NULL; |
| -} |
| -#endif |
| - |
| -static inline int frontswap_store(struct page *page) |
| -{ |
| - if (frontswap_enabled()) |
| - return __frontswap_store(page); |
| - |
| - return -1; |
| -} |
| - |
| -static inline int frontswap_load(struct page *page) |
| -{ |
| - if (frontswap_enabled()) |
| - return __frontswap_load(page); |
| - |
| - return -1; |
| -} |
| - |
| -static inline void frontswap_invalidate_page(unsigned type, pgoff_t offset) |
| -{ |
| - if (frontswap_enabled()) |
| - __frontswap_invalidate_page(type, offset); |
| -} |
| - |
| -static inline void frontswap_invalidate_area(unsigned type) |
| -{ |
| - if (frontswap_enabled()) |
| - __frontswap_invalidate_area(type); |
| -} |
| - |
| -#endif /* _LINUX_FRONTSWAP_H */ |
| --- a/include/linux/swapfile.h~mm-kill-frontswap |
| +++ a/include/linux/swapfile.h |
| @@ -2,11 +2,6 @@ |
| #ifndef _LINUX_SWAPFILE_H |
| #define _LINUX_SWAPFILE_H |
| |
| -/* |
| - * these were static in swapfile.c but frontswap.c needs them and we don't |
| - * want to expose them to the dozens of source files that include swap.h |
| - */ |
| -extern struct swap_info_struct *swap_info[]; |
| extern unsigned long generic_max_swapfile_size(void); |
| unsigned long arch_max_swapfile_size(void); |
| |
| --- a/include/linux/swap.h~mm-kill-frontswap |
| +++ a/include/linux/swap.h |
| @@ -302,10 +302,6 @@ struct swap_info_struct { |
| struct file *swap_file; /* seldom referenced */ |
| unsigned int old_block_size; /* seldom referenced */ |
| struct completion comp; /* seldom referenced */ |
| -#ifdef CONFIG_FRONTSWAP |
| - unsigned long *frontswap_map; /* frontswap in-use, one bit per page */ |
| - atomic_t frontswap_pages; /* frontswap pages in-use counter */ |
| -#endif |
| spinlock_t lock; /* |
| * protect map scan related fields like |
| * swap_map, lowest_bit, highest_bit, |
| @@ -630,11 +626,6 @@ static inline int mem_cgroup_swappiness( |
| } |
| #endif |
| |
| -#ifdef CONFIG_ZSWAP |
| -extern u64 zswap_pool_total_size; |
| -extern atomic_t zswap_stored_pages; |
| -#endif |
| - |
| #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP) |
| void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp); |
| static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp) |
| --- /dev/null |
| +++ a/include/linux/zswap.h |
| @@ -0,0 +1,37 @@ |
| +/* SPDX-License-Identifier: GPL-2.0 */ |
| +#ifndef _LINUX_ZSWAP_H |
| +#define _LINUX_ZSWAP_H |
| + |
| +#include <linux/types.h> |
| +#include <linux/mm_types.h> |
| + |
| +extern u64 zswap_pool_total_size; |
| +extern atomic_t zswap_stored_pages; |
| + |
| +#ifdef CONFIG_ZSWAP |
| + |
| +bool zswap_store(struct page *page); |
| +bool zswap_load(struct page *page); |
| +void zswap_invalidate(int type, pgoff_t offset); |
| +void zswap_swapon(int type); |
| +void zswap_swapoff(int type); |
| + |
| +#else |
| + |
| +static inline bool zswap_store(struct page *page) |
| +{ |
| + return false; |
| +} |
| + |
| +static inline bool zswap_load(struct page *page) |
| +{ |
| + return false; |
| +} |
| + |
| +static inline void zswap_invalidate(int type, pgoff_t offset) {} |
| +static inline void zswap_swapon(int type) {} |
| +static inline void zswap_swapoff(int type) {} |
| + |
| +#endif |
| + |
| +#endif /* _LINUX_ZSWAP_H */ |
| --- a/MAINTAINERS~mm-kill-frontswap |
| +++ a/MAINTAINERS |
| @@ -8404,13 +8404,6 @@ F: Documentation/power/freezing-of-tasks |
| F: include/linux/freezer.h |
| F: kernel/freezer.c |
| |
| -FRONTSWAP API |
| -M: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
| -L: linux-kernel@vger.kernel.org |
| -S: Maintained |
| -F: include/linux/frontswap.h |
| -F: mm/frontswap.c |
| - |
| FS-CACHE: LOCAL CACHING FOR NETWORK FILESYSTEMS |
| M: David Howells <dhowells@redhat.com> |
| L: linux-cachefs@redhat.com (moderated for non-subscribers) |
| --- a/mm/frontswap.c |
| +++ /dev/null |
| @@ -1,283 +0,0 @@ |
| -// SPDX-License-Identifier: GPL-2.0-only |
| -/* |
| - * Frontswap frontend |
| - * |
| - * This code provides the generic "frontend" layer to call a matching |
| - * "backend" driver implementation of frontswap. See |
| - * Documentation/mm/frontswap.rst for more information. |
| - * |
| - * Copyright (C) 2009-2012 Oracle Corp. All rights reserved. |
| - * Author: Dan Magenheimer |
| - */ |
| - |
| -#include <linux/mman.h> |
| -#include <linux/swap.h> |
| -#include <linux/swapops.h> |
| -#include <linux/security.h> |
| -#include <linux/module.h> |
| -#include <linux/debugfs.h> |
| -#include <linux/frontswap.h> |
| -#include <linux/swapfile.h> |
| - |
| -DEFINE_STATIC_KEY_FALSE(frontswap_enabled_key); |
| - |
| -/* |
| - * frontswap_ops are added by frontswap_register_ops, and provide the |
| - * frontswap "backend" implementation functions. Multiple implementations |
| - * may be registered, but implementations can never deregister. This |
| - * is a simple singly-linked list of all registered implementations. |
| - */ |
| -static const struct frontswap_ops *frontswap_ops __read_mostly; |
| - |
| -#ifdef CONFIG_DEBUG_FS |
| -/* |
| - * Counters available via /sys/kernel/debug/frontswap (if debugfs is |
| - * properly configured). These are for information only so are not protected |
| - * against increment races. |
| - */ |
| -static u64 frontswap_loads; |
| -static u64 frontswap_succ_stores; |
| -static u64 frontswap_failed_stores; |
| -static u64 frontswap_invalidates; |
| - |
| -static inline void inc_frontswap_loads(void) |
| -{ |
| - data_race(frontswap_loads++); |
| -} |
| -static inline void inc_frontswap_succ_stores(void) |
| -{ |
| - data_race(frontswap_succ_stores++); |
| -} |
| -static inline void inc_frontswap_failed_stores(void) |
| -{ |
| - data_race(frontswap_failed_stores++); |
| -} |
| -static inline void inc_frontswap_invalidates(void) |
| -{ |
| - data_race(frontswap_invalidates++); |
| -} |
| -#else |
| -static inline void inc_frontswap_loads(void) { } |
| -static inline void inc_frontswap_succ_stores(void) { } |
| -static inline void inc_frontswap_failed_stores(void) { } |
| -static inline void inc_frontswap_invalidates(void) { } |
| -#endif |
| - |
| -/* |
| - * Due to the asynchronous nature of the backends loading potentially |
| - * _after_ the swap system has been activated, we have chokepoints |
| - * on all frontswap functions to not call the backend until the backend |
| - * has registered. |
| - * |
| - * This would not guards us against the user deciding to call swapoff right as |
| - * we are calling the backend to initialize (so swapon is in action). |
| - * Fortunately for us, the swapon_mutex has been taken by the callee so we are |
| - * OK. The other scenario where calls to frontswap_store (called via |
| - * swap_writepage) is racing with frontswap_invalidate_area (called via |
| - * swapoff) is again guarded by the swap subsystem. |
| - * |
| - * While no backend is registered all calls to frontswap_[store|load| |
| - * invalidate_area|invalidate_page] are ignored or fail. |
| - * |
| - * The time between the backend being registered and the swap file system |
| - * calling the backend (via the frontswap_* functions) is indeterminate as |
| - * frontswap_ops is not atomic_t (or a value guarded by a spinlock). |
| - * That is OK as we are comfortable missing some of these calls to the newly |
| - * registered backend. |
| - * |
| - * Obviously the opposite (unloading the backend) must be done after all |
| - * the frontswap_[store|load|invalidate_area|invalidate_page] start |
| - * ignoring or failing the requests. However, there is currently no way |
| - * to unload a backend once it is registered. |
| - */ |
| - |
| -/* |
| - * Register operations for frontswap |
| - */ |
| -int frontswap_register_ops(const struct frontswap_ops *ops) |
| -{ |
| - if (frontswap_ops) |
| - return -EINVAL; |
| - |
| - frontswap_ops = ops; |
| - static_branch_inc(&frontswap_enabled_key); |
| - return 0; |
| -} |
| - |
| -/* |
| - * Called when a swap device is swapon'd. |
| - */ |
| -void frontswap_init(unsigned type, unsigned long *map) |
| -{ |
| - struct swap_info_struct *sis = swap_info[type]; |
| - |
| - VM_BUG_ON(sis == NULL); |
| - |
| - /* |
| - * p->frontswap is a bitmap that we MUST have to figure out which page |
| - * has gone in frontswap. Without it there is no point of continuing. |
| - */ |
| - if (WARN_ON(!map)) |
| - return; |
| - /* |
| - * Irregardless of whether the frontswap backend has been loaded |
| - * before this function or it will be later, we _MUST_ have the |
| - * p->frontswap set to something valid to work properly. |
| - */ |
| - frontswap_map_set(sis, map); |
| - |
| - if (!frontswap_enabled()) |
| - return; |
| - frontswap_ops->init(type); |
| -} |
| - |
| -static bool __frontswap_test(struct swap_info_struct *sis, |
| - pgoff_t offset) |
| -{ |
| - if (sis->frontswap_map) |
| - return test_bit(offset, sis->frontswap_map); |
| - return false; |
| -} |
| - |
| -static inline void __frontswap_set(struct swap_info_struct *sis, |
| - pgoff_t offset) |
| -{ |
| - set_bit(offset, sis->frontswap_map); |
| - atomic_inc(&sis->frontswap_pages); |
| -} |
| - |
| -static inline void __frontswap_clear(struct swap_info_struct *sis, |
| - pgoff_t offset) |
| -{ |
| - clear_bit(offset, sis->frontswap_map); |
| - atomic_dec(&sis->frontswap_pages); |
| -} |
| - |
| -/* |
| - * "Store" data from a page to frontswap and associate it with the page's |
| - * swaptype and offset. Page must be locked and in the swap cache. |
| - * If frontswap already contains a page with matching swaptype and |
| - * offset, the frontswap implementation may either overwrite the data and |
| - * return success or invalidate the page from frontswap and return failure. |
| - */ |
| -int __frontswap_store(struct page *page) |
| -{ |
| - int ret = -1; |
| - swp_entry_t entry = { .val = page_private(page), }; |
| - int type = swp_type(entry); |
| - struct swap_info_struct *sis = swap_info[type]; |
| - pgoff_t offset = swp_offset(entry); |
| - |
| - VM_BUG_ON(!frontswap_ops); |
| - VM_BUG_ON(!PageLocked(page)); |
| - VM_BUG_ON(sis == NULL); |
| - |
| - /* |
| - * If a dup, we must remove the old page first; we can't leave the |
| - * old page no matter if the store of the new page succeeds or fails, |
| - * and we can't rely on the new page replacing the old page as we may |
| - * not store to the same implementation that contains the old page. |
| - */ |
| - if (__frontswap_test(sis, offset)) { |
| - __frontswap_clear(sis, offset); |
| - frontswap_ops->invalidate_page(type, offset); |
| - } |
| - |
| - ret = frontswap_ops->store(type, offset, page); |
| - if (ret == 0) { |
| - __frontswap_set(sis, offset); |
| - inc_frontswap_succ_stores(); |
| - } else { |
| - inc_frontswap_failed_stores(); |
| - } |
| - |
| - return ret; |
| -} |
| - |
| -/* |
| - * "Get" data from frontswap associated with swaptype and offset that were |
| - * specified when the data was put to frontswap and use it to fill the |
| - * specified page with data. Page must be locked and in the swap cache. |
| - */ |
| -int __frontswap_load(struct page *page) |
| -{ |
| - int ret = -1; |
| - swp_entry_t entry = { .val = page_private(page), }; |
| - int type = swp_type(entry); |
| - struct swap_info_struct *sis = swap_info[type]; |
| - pgoff_t offset = swp_offset(entry); |
| - bool exclusive = false; |
| - |
| - VM_BUG_ON(!frontswap_ops); |
| - VM_BUG_ON(!PageLocked(page)); |
| - VM_BUG_ON(sis == NULL); |
| - |
| - if (!__frontswap_test(sis, offset)) |
| - return -1; |
| - |
| - /* Try loading from each implementation, until one succeeds. */ |
| - ret = frontswap_ops->load(type, offset, page, &exclusive); |
| - if (ret == 0) { |
| - inc_frontswap_loads(); |
| - if (exclusive) { |
| - SetPageDirty(page); |
| - __frontswap_clear(sis, offset); |
| - } |
| - } |
| - return ret; |
| -} |
| - |
| -/* |
| - * Invalidate any data from frontswap associated with the specified swaptype |
| - * and offset so that a subsequent "get" will fail. |
| - */ |
| -void __frontswap_invalidate_page(unsigned type, pgoff_t offset) |
| -{ |
| - struct swap_info_struct *sis = swap_info[type]; |
| - |
| - VM_BUG_ON(!frontswap_ops); |
| - VM_BUG_ON(sis == NULL); |
| - |
| - if (!__frontswap_test(sis, offset)) |
| - return; |
| - |
| - frontswap_ops->invalidate_page(type, offset); |
| - __frontswap_clear(sis, offset); |
| - inc_frontswap_invalidates(); |
| -} |
| - |
| -/* |
| - * Invalidate all data from frontswap associated with all offsets for the |
| - * specified swaptype. |
| - */ |
| -void __frontswap_invalidate_area(unsigned type) |
| -{ |
| - struct swap_info_struct *sis = swap_info[type]; |
| - |
| - VM_BUG_ON(!frontswap_ops); |
| - VM_BUG_ON(sis == NULL); |
| - |
| - if (sis->frontswap_map == NULL) |
| - return; |
| - |
| - frontswap_ops->invalidate_area(type); |
| - atomic_set(&sis->frontswap_pages, 0); |
| - bitmap_zero(sis->frontswap_map, sis->max); |
| -} |
| - |
| -static int __init init_frontswap(void) |
| -{ |
| -#ifdef CONFIG_DEBUG_FS |
| - struct dentry *root = debugfs_create_dir("frontswap", NULL); |
| - if (root == NULL) |
| - return -ENXIO; |
| - debugfs_create_u64("loads", 0444, root, &frontswap_loads); |
| - debugfs_create_u64("succ_stores", 0444, root, &frontswap_succ_stores); |
| - debugfs_create_u64("failed_stores", 0444, root, |
| - &frontswap_failed_stores); |
| - debugfs_create_u64("invalidates", 0444, root, &frontswap_invalidates); |
| -#endif |
| - return 0; |
| -} |
| - |
| -module_init(init_frontswap); |
| --- a/mm/Kconfig~mm-kill-frontswap |
| +++ a/mm/Kconfig |
| @@ -25,7 +25,6 @@ menuconfig SWAP |
| config ZSWAP |
| bool "Compressed cache for swap pages" |
| depends on SWAP |
| - select FRONTSWAP |
| select CRYPTO |
| select ZPOOL |
| help |
| @@ -873,9 +872,6 @@ config USE_PERCPU_NUMA_NODE_ID |
| config HAVE_SETUP_PER_CPU_AREA |
| bool |
| |
| -config FRONTSWAP |
| - bool |
| - |
| config CMA |
| bool "Contiguous Memory Allocator" |
| depends on MMU |
| --- a/mm/Makefile~mm-kill-frontswap |
| +++ a/mm/Makefile |
| @@ -72,7 +72,6 @@ ifdef CONFIG_MMU |
| endif |
| |
| obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o swap_slots.o |
| -obj-$(CONFIG_FRONTSWAP) += frontswap.o |
| obj-$(CONFIG_ZSWAP) += zswap.o |
| obj-$(CONFIG_HAS_DMA) += dmapool.o |
| obj-$(CONFIG_HUGETLBFS) += hugetlb.o |
| --- a/mm/page_io.c~mm-kill-frontswap |
| +++ a/mm/page_io.c |
| @@ -19,12 +19,12 @@ |
| #include <linux/bio.h> |
| #include <linux/swapops.h> |
| #include <linux/writeback.h> |
| -#include <linux/frontswap.h> |
| #include <linux/blkdev.h> |
| #include <linux/psi.h> |
| #include <linux/uio.h> |
| #include <linux/sched/task.h> |
| #include <linux/delayacct.h> |
| +#include <linux/zswap.h> |
| #include "swap.h" |
| |
| static void __end_swap_bio_write(struct bio *bio) |
| @@ -195,7 +195,7 @@ int swap_writepage(struct page *page, st |
| folio_unlock(folio); |
| return ret; |
| } |
| - if (frontswap_store(&folio->page) == 0) { |
| + if (zswap_store(&folio->page)) { |
| folio_start_writeback(folio); |
| folio_unlock(folio); |
| folio_end_writeback(folio); |
| @@ -512,7 +512,7 @@ void swap_readpage(struct page *page, bo |
| } |
| delayacct_swapin_start(); |
| |
| - if (frontswap_load(page) == 0) { |
| + if (zswap_load(page)) { |
| SetPageUptodate(page); |
| unlock_page(page); |
| } else if (data_race(sis->flags & SWP_FS_OPS)) { |
| --- a/mm/swapfile.c~mm-kill-frontswap |
| +++ a/mm/swapfile.c |
| @@ -35,13 +35,13 @@ |
| #include <linux/memcontrol.h> |
| #include <linux/poll.h> |
| #include <linux/oom.h> |
| -#include <linux/frontswap.h> |
| #include <linux/swapfile.h> |
| #include <linux/export.h> |
| #include <linux/swap_slots.h> |
| #include <linux/sort.h> |
| #include <linux/completion.h> |
| #include <linux/suspend.h> |
| +#include <linux/zswap.h> |
| |
| #include <asm/tlbflush.h> |
| #include <linux/swapops.h> |
| @@ -95,7 +95,7 @@ static PLIST_HEAD(swap_active_head); |
| static struct plist_head *swap_avail_heads; |
| static DEFINE_SPINLOCK(swap_avail_lock); |
| |
| -struct swap_info_struct *swap_info[MAX_SWAPFILES]; |
| +static struct swap_info_struct *swap_info[MAX_SWAPFILES]; |
| |
| static DEFINE_MUTEX(swapon_mutex); |
| |
| @@ -744,7 +744,7 @@ static void swap_range_free(struct swap_ |
| swap_slot_free_notify = NULL; |
| while (offset <= end) { |
| arch_swap_invalidate_page(si->type, offset); |
| - frontswap_invalidate_page(si->type, offset); |
| + zswap_invalidate(si->type, offset); |
| if (swap_slot_free_notify) |
| swap_slot_free_notify(si->bdev, offset); |
| offset++; |
| @@ -2343,11 +2343,10 @@ static void _enable_swap_info(struct swa |
| |
| static void enable_swap_info(struct swap_info_struct *p, int prio, |
| unsigned char *swap_map, |
| - struct swap_cluster_info *cluster_info, |
| - unsigned long *frontswap_map) |
| + struct swap_cluster_info *cluster_info) |
| { |
| - if (IS_ENABLED(CONFIG_FRONTSWAP)) |
| - frontswap_init(p->type, frontswap_map); |
| + zswap_swapon(p->type); |
| + |
| spin_lock(&swap_lock); |
| spin_lock(&p->lock); |
| setup_swap_info(p, prio, swap_map, cluster_info); |
| @@ -2390,7 +2389,6 @@ SYSCALL_DEFINE1(swapoff, const char __us |
| struct swap_info_struct *p = NULL; |
| unsigned char *swap_map; |
| struct swap_cluster_info *cluster_info; |
| - unsigned long *frontswap_map; |
| struct file *swap_file, *victim; |
| struct address_space *mapping; |
| struct inode *inode; |
| @@ -2515,12 +2513,10 @@ SYSCALL_DEFINE1(swapoff, const char __us |
| p->swap_map = NULL; |
| cluster_info = p->cluster_info; |
| p->cluster_info = NULL; |
| - frontswap_map = frontswap_map_get(p); |
| spin_unlock(&p->lock); |
| spin_unlock(&swap_lock); |
| arch_swap_invalidate_area(p->type); |
| - frontswap_invalidate_area(p->type); |
| - frontswap_map_set(p, NULL); |
| + zswap_swapoff(p->type); |
| mutex_unlock(&swapon_mutex); |
| free_percpu(p->percpu_cluster); |
| p->percpu_cluster = NULL; |
| @@ -2528,7 +2524,6 @@ SYSCALL_DEFINE1(swapoff, const char __us |
| p->cluster_next_cpu = NULL; |
| vfree(swap_map); |
| kvfree(cluster_info); |
| - kvfree(frontswap_map); |
| /* Destroy swap account information */ |
| swap_cgroup_swapoff(p->type); |
| exit_swap_address_space(p->type); |
| @@ -2995,7 +2990,6 @@ SYSCALL_DEFINE2(swapon, const char __use |
| unsigned long maxpages; |
| unsigned char *swap_map = NULL; |
| struct swap_cluster_info *cluster_info = NULL; |
| - unsigned long *frontswap_map = NULL; |
| struct page *page = NULL; |
| struct inode *inode = NULL; |
| bool inced_nr_rotate_swap = false; |
| @@ -3135,11 +3129,6 @@ SYSCALL_DEFINE2(swapon, const char __use |
| error = nr_extents; |
| goto bad_swap_unlock_inode; |
| } |
| - /* frontswap enabled? set up bit-per-page map for frontswap */ |
| - if (IS_ENABLED(CONFIG_FRONTSWAP)) |
| - frontswap_map = kvcalloc(BITS_TO_LONGS(maxpages), |
| - sizeof(long), |
| - GFP_KERNEL); |
| |
| if ((swap_flags & SWAP_FLAG_DISCARD) && |
| p->bdev && bdev_max_discard_sectors(p->bdev)) { |
| @@ -3192,16 +3181,15 @@ SYSCALL_DEFINE2(swapon, const char __use |
| if (swap_flags & SWAP_FLAG_PREFER) |
| prio = |
| (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT; |
| - enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map); |
| + enable_swap_info(p, prio, swap_map, cluster_info); |
| |
| - pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n", |
| + pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s\n", |
| p->pages<<(PAGE_SHIFT-10), name->name, p->prio, |
| nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10), |
| (p->flags & SWP_SOLIDSTATE) ? "SS" : "", |
| (p->flags & SWP_DISCARDABLE) ? "D" : "", |
| (p->flags & SWP_AREA_DISCARD) ? "s" : "", |
| - (p->flags & SWP_PAGE_DISCARD) ? "c" : "", |
| - (frontswap_map) ? "FS" : ""); |
| + (p->flags & SWP_PAGE_DISCARD) ? "c" : ""); |
| |
| mutex_unlock(&swapon_mutex); |
| atomic_inc(&proc_poll_event); |
| @@ -3231,7 +3219,6 @@ bad_swap: |
| spin_unlock(&swap_lock); |
| vfree(swap_map); |
| kvfree(cluster_info); |
| - kvfree(frontswap_map); |
| if (inced_nr_rotate_swap) |
| atomic_dec(&nr_rotate_swap); |
| if (swap_file) |
| --- a/mm/zswap.c~mm-kill-frontswap |
| +++ a/mm/zswap.c |
| @@ -2,7 +2,7 @@ |
| /* |
| * zswap.c - zswap driver file |
| * |
| - * zswap is a backend for frontswap that takes pages that are in the process |
| + * zswap is a cache that takes pages that are in the process |
| * of being swapped out and attempts to compress and store them in a |
| * RAM-based memory pool. This can result in a significant I/O reduction on |
| * the swap device and, in the case where decompressing from RAM is faster |
| @@ -20,7 +20,6 @@ |
| #include <linux/spinlock.h> |
| #include <linux/types.h> |
| #include <linux/atomic.h> |
| -#include <linux/frontswap.h> |
| #include <linux/rbtree.h> |
| #include <linux/swap.h> |
| #include <linux/crypto.h> |
| @@ -28,7 +27,7 @@ |
| #include <linux/mempool.h> |
| #include <linux/zpool.h> |
| #include <crypto/acompress.h> |
| - |
| +#include <linux/zswap.h> |
| #include <linux/mm_types.h> |
| #include <linux/page-flags.h> |
| #include <linux/swapops.h> |
| @@ -1084,7 +1083,7 @@ static int zswap_get_swap_cache_page(swp |
| * |
| * This can be thought of as a "resumed writeback" of the page |
| * to the swap device. We are basically resuming the same swap |
| - * writeback path that was intercepted with the frontswap_store() |
| + * writeback path that was intercepted with the zswap_store() |
| * in the first place. After the page has been decompressed into |
| * the swap cache, the compressed version stored by zswap can be |
| * freed. |
| @@ -1224,13 +1223,11 @@ static void zswap_fill_page(void *ptr, u |
| memset_l(page, value, PAGE_SIZE / sizeof(unsigned long)); |
| } |
| |
| -/********************************* |
| -* frontswap hooks |
| -**********************************/ |
| -/* attempts to compress and store an single page */ |
| -static int zswap_frontswap_store(unsigned type, pgoff_t offset, |
| - struct page *page) |
| +bool zswap_store(struct page *page) |
| { |
| + swp_entry_t swp = { .val = page_private(page), }; |
| + int type = swp_type(swp); |
| + pgoff_t offset = swp_offset(swp); |
| struct zswap_tree *tree = zswap_trees[type]; |
| struct zswap_entry *entry, *dupentry; |
| struct scatterlist input, output; |
| @@ -1238,23 +1235,22 @@ static int zswap_frontswap_store(unsigne |
| struct obj_cgroup *objcg = NULL; |
| struct zswap_pool *pool; |
| struct zpool *zpool; |
| - int ret; |
| unsigned int dlen = PAGE_SIZE; |
| unsigned long handle, value; |
| char *buf; |
| u8 *src, *dst; |
| gfp_t gfp; |
| + int ret; |
| + |
| + VM_WARN_ON_ONCE(!PageLocked(page)); |
| + VM_WARN_ON_ONCE(!PageSwapCache(page)); |
| |
| /* THP isn't supported */ |
| - if (PageTransHuge(page)) { |
| - ret = -EINVAL; |
| - goto reject; |
| - } |
| + if (PageTransHuge(page)) |
| + return false; |
| |
| - if (!zswap_enabled || !tree) { |
| - ret = -ENODEV; |
| - goto reject; |
| - } |
| + if (!zswap_enabled || !tree) |
| + return false; |
| |
| /* |
| * XXX: zswap reclaim does not work with cgroups yet. Without a |
| @@ -1262,10 +1258,8 @@ static int zswap_frontswap_store(unsigne |
| * local cgroup limits. |
| */ |
| objcg = get_obj_cgroup_from_page(page); |
| - if (objcg && !obj_cgroup_may_zswap(objcg)) { |
| - ret = -ENOMEM; |
| + if (objcg && !obj_cgroup_may_zswap(objcg)) |
| goto reject; |
| - } |
| |
| /* reclaim space if needed */ |
| if (zswap_is_full()) { |
| @@ -1275,10 +1269,9 @@ static int zswap_frontswap_store(unsigne |
| } |
| |
| if (zswap_pool_reached_full) { |
| - if (!zswap_can_accept()) { |
| - ret = -ENOMEM; |
| + if (!zswap_can_accept()) |
| goto shrink; |
| - } else |
| + else |
| zswap_pool_reached_full = false; |
| } |
| |
| @@ -1286,7 +1279,6 @@ static int zswap_frontswap_store(unsigne |
| entry = zswap_entry_cache_alloc(GFP_KERNEL); |
| if (!entry) { |
| zswap_reject_kmemcache_fail++; |
| - ret = -ENOMEM; |
| goto reject; |
| } |
| |
| @@ -1303,17 +1295,13 @@ static int zswap_frontswap_store(unsigne |
| kunmap_atomic(src); |
| } |
| |
| - if (!zswap_non_same_filled_pages_enabled) { |
| - ret = -EINVAL; |
| + if (!zswap_non_same_filled_pages_enabled) |
| goto freepage; |
| - } |
| |
| /* if entry is successfully added, it keeps the reference */ |
| entry->pool = zswap_pool_current_get(); |
| - if (!entry->pool) { |
| - ret = -EINVAL; |
| + if (!entry->pool) |
| goto freepage; |
| - } |
| |
| /* compress */ |
| acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); |
| @@ -1333,19 +1321,17 @@ static int zswap_frontswap_store(unsigne |
| * synchronous in fact. |
| * Theoretically, acomp supports users send multiple acomp requests in one |
| * acomp instance, then get those requests done simultaneously. but in this |
| - * case, frontswap actually does store and load page by page, there is no |
| + * case, zswap actually does store and load page by page, there is no |
| * existing method to send the second page before the first page is done |
| - * in one thread doing frontswap. |
| + * in one thread doing zswap. |
| * but in different threads running on different cpu, we have different |
| * acomp instance, so multiple threads can do (de)compression in parallel. |
| */ |
| ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait); |
| dlen = acomp_ctx->req->dlen; |
| |
| - if (ret) { |
| - ret = -EINVAL; |
| + if (ret) |
| goto put_dstmem; |
| - } |
| |
| /* store */ |
| zpool = zswap_find_zpool(entry); |
| @@ -1381,15 +1367,12 @@ insert_entry: |
| |
| /* map */ |
| spin_lock(&tree->lock); |
| - do { |
| - ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry); |
| - if (ret == -EEXIST) { |
| - zswap_duplicate_entry++; |
| - /* remove from rbtree */ |
| - zswap_rb_erase(&tree->rbroot, dupentry); |
| - zswap_entry_put(tree, dupentry); |
| - } |
| - } while (ret == -EEXIST); |
| + while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) { |
| + zswap_duplicate_entry++; |
| + /* remove from rbtree */ |
| + zswap_rb_erase(&tree->rbroot, dupentry); |
| + zswap_entry_put(tree, dupentry); |
| + } |
| if (entry->length) { |
| spin_lock(&entry->pool->lru_lock); |
| list_add(&entry->lru, &entry->pool->lru); |
| @@ -1402,7 +1385,7 @@ insert_entry: |
| zswap_update_total_size(); |
| count_vm_event(ZSWPOUT); |
| |
| - return 0; |
| + return true; |
| |
| put_dstmem: |
| mutex_unlock(acomp_ctx->mutex); |
| @@ -1412,23 +1395,20 @@ freepage: |
| reject: |
| if (objcg) |
| obj_cgroup_put(objcg); |
| - return ret; |
| + return false; |
| |
| shrink: |
| pool = zswap_pool_last_get(); |
| if (pool) |
| queue_work(shrink_wq, &pool->shrink_work); |
| - ret = -ENOMEM; |
| goto reject; |
| } |
| |
| -/* |
| - * returns 0 if the page was successfully decompressed |
| - * return -1 on entry not found or error |
| -*/ |
| -static int zswap_frontswap_load(unsigned type, pgoff_t offset, |
| - struct page *page, bool *exclusive) |
| +bool zswap_load(struct page *page) |
| { |
| + swp_entry_t swp = { .val = page_private(page), }; |
| + int type = swp_type(swp); |
| + pgoff_t offset = swp_offset(swp); |
| struct zswap_tree *tree = zswap_trees[type]; |
| struct zswap_entry *entry; |
| struct scatterlist input, output; |
| @@ -1436,15 +1416,16 @@ static int zswap_frontswap_load(unsigned |
| u8 *src, *dst, *tmp; |
| struct zpool *zpool; |
| unsigned int dlen; |
| - int ret; |
| + bool ret; |
| + |
| + VM_WARN_ON_ONCE(!PageLocked(page)); |
| |
| /* find */ |
| spin_lock(&tree->lock); |
| entry = zswap_entry_find_get(&tree->rbroot, offset); |
| if (!entry) { |
| - /* entry was written back */ |
| spin_unlock(&tree->lock); |
| - return -1; |
| + return false; |
| } |
| spin_unlock(&tree->lock); |
| |
| @@ -1452,7 +1433,7 @@ static int zswap_frontswap_load(unsigned |
| dst = kmap_atomic(page); |
| zswap_fill_page(dst, entry->value); |
| kunmap_atomic(dst); |
| - ret = 0; |
| + ret = true; |
| goto stats; |
| } |
| |
| @@ -1460,7 +1441,7 @@ static int zswap_frontswap_load(unsigned |
| if (!zpool_can_sleep_mapped(zpool)) { |
| tmp = kmalloc(entry->length, GFP_KERNEL); |
| if (!tmp) { |
| - ret = -ENOMEM; |
| + ret = false; |
| goto freeentry; |
| } |
| } |
| @@ -1481,7 +1462,8 @@ static int zswap_frontswap_load(unsigned |
| sg_init_table(&output, 1); |
| sg_set_page(&output, page, PAGE_SIZE, 0); |
| acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen); |
| - ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait); |
| + if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait)) |
| + WARN_ON(1); |
| mutex_unlock(acomp_ctx->mutex); |
| |
| if (zpool_can_sleep_mapped(zpool)) |
| @@ -1489,16 +1471,16 @@ static int zswap_frontswap_load(unsigned |
| else |
| kfree(tmp); |
| |
| - BUG_ON(ret); |
| + ret = true; |
| stats: |
| count_vm_event(ZSWPIN); |
| if (entry->objcg) |
| count_objcg_event(entry->objcg, ZSWPIN); |
| freeentry: |
| spin_lock(&tree->lock); |
| - if (!ret && zswap_exclusive_loads_enabled) { |
| + if (ret && zswap_exclusive_loads_enabled) { |
| zswap_invalidate_entry(tree, entry); |
| - *exclusive = true; |
| + SetPageDirty(page); |
| } else if (entry->length) { |
| spin_lock(&entry->pool->lru_lock); |
| list_move(&entry->lru, &entry->pool->lru); |
| @@ -1510,8 +1492,7 @@ freeentry: |
| return ret; |
| } |
| |
| -/* frees an entry in zswap */ |
| -static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset) |
| +void zswap_invalidate(int type, pgoff_t offset) |
| { |
| struct zswap_tree *tree = zswap_trees[type]; |
| struct zswap_entry *entry; |
| @@ -1528,8 +1509,22 @@ static void zswap_frontswap_invalidate_p |
| spin_unlock(&tree->lock); |
| } |
| |
| -/* frees all zswap entries for the given swap type */ |
| -static void zswap_frontswap_invalidate_area(unsigned type) |
| +void zswap_swapon(int type) |
| +{ |
| + struct zswap_tree *tree; |
| + |
| + tree = kzalloc(sizeof(*tree), GFP_KERNEL); |
| + if (!tree) { |
| + pr_err("alloc failed, zswap disabled for swap type %d\n", type); |
| + return; |
| + } |
| + |
| + tree->rbroot = RB_ROOT; |
| + spin_lock_init(&tree->lock); |
| + zswap_trees[type] = tree; |
| +} |
| + |
| +void zswap_swapoff(int type) |
| { |
| struct zswap_tree *tree = zswap_trees[type]; |
| struct zswap_entry *entry, *n; |
| @@ -1547,29 +1542,6 @@ static void zswap_frontswap_invalidate_a |
| zswap_trees[type] = NULL; |
| } |
| |
| -static void zswap_frontswap_init(unsigned type) |
| -{ |
| - struct zswap_tree *tree; |
| - |
| - tree = kzalloc(sizeof(*tree), GFP_KERNEL); |
| - if (!tree) { |
| - pr_err("alloc failed, zswap disabled for swap type %d\n", type); |
| - return; |
| - } |
| - |
| - tree->rbroot = RB_ROOT; |
| - spin_lock_init(&tree->lock); |
| - zswap_trees[type] = tree; |
| -} |
| - |
| -static const struct frontswap_ops zswap_frontswap_ops = { |
| - .store = zswap_frontswap_store, |
| - .load = zswap_frontswap_load, |
| - .invalidate_page = zswap_frontswap_invalidate_page, |
| - .invalidate_area = zswap_frontswap_invalidate_area, |
| - .init = zswap_frontswap_init |
| -}; |
| - |
| /********************************* |
| * debugfs functions |
| **********************************/ |
| @@ -1658,16 +1630,11 @@ static int zswap_setup(void) |
| if (!shrink_wq) |
| goto fallback_fail; |
| |
| - ret = frontswap_register_ops(&zswap_frontswap_ops); |
| - if (ret) |
| - goto destroy_wq; |
| if (zswap_debugfs_init()) |
| pr_warn("debugfs initialization failed\n"); |
| zswap_init_state = ZSWAP_INIT_SUCCEED; |
| return 0; |
| |
| -destroy_wq: |
| - destroy_workqueue(shrink_wq); |
| fallback_fail: |
| if (pool) |
| zswap_pool_destroy(pool); |
| _ |