From kernel/suspend.c:
* BIG FAT WARNING *********************************************************
*
* If you have unsupported (*) devices using DMA...
* ...say goodbye to your data.
*
* If you touch anything on disk between suspend and resume...
* ...kiss your data goodbye.
*
* If your disk driver does not support suspend... (IDE does)
* ...you'd better find out how to get along
* without your data.
*
* (*) pm interface support is needed to make it safe.
You need to append resume=/dev/your_swap_partition to the kernel command
line. Then you suspend by echo 4 > /proc/acpi/sleep.
[Notice: the rest of this document is pretty outdated (see the date!). It
should be safe to use swsusp on ext3/reiserfs these days.]
Article about goals and implementation of Software Suspend for Linux
Author: Gábor Kuti
Last revised: 2002-04-08
Idea and goals to achieve
Many laptops nowadays have a suspend button. It saves the state of the
machine to a filesystem or to a partition and switches to standby mode. When
the machine is later resumed, the saved state is loaded back into RAM and the
machine can continue its work. This has two real benefits. First, we save the
time the machine takes to go down and later boot up; energy is also costly
when running from batteries. The other gain is that we don't have to
interrupt our programs, so processes that are calculating something for a long
time don't need to be written to be interruptible.
On desktop machines the power saving function isn't as important as it is on
laptops, but we may really benefit from the second one. The number of desktop
machines supporting a suspend function in their APM is going up, but there
are (and will be for a long time) machines that don't support APM of any
kind. On the other hand, it is reported that with APM's suspend some IRQs
(e.g. the ATA disk IRQ) are lost, which is annoying for the user until the
Linux kernel resets the device.
So I started thinking about implementing Software Suspend, which doesn't need
any APM support and - since it uses almost only high-level routines - is
supposed to be architecture-independent code.
Using the code
The code is experimental right now - testers and extra eyes are welcome. To
compile this support into the kernel, you need to enable CONFIG_EXPERIMENTAL
and then CONFIG_SOFTWARE_SUSPEND in the General Setup menu. It cannot be
built as a module, and I don't think that will ever be needed.
You have three ways to use this code. The first one, if you've compiled in
sysrq support, is to press Sysrq-D to request suspend. The second way
is with a patched SysVinit (my patch is against 2.76 and available at my
home page): you can call 'swsusp' or 'shutdown -z <time>'. The third way is to
echo 4 > /proc/acpi/sleep.
Either way it saves the state of the machine into active swaps and then
reboots. You must explicitly specify the swap partition to resume from with
the ``resume='' kernel option. If a signature is found, it loads and restores
the saved state. If the option ``noresume'' is specified as a boot parameter,
it skips the resuming.
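As an illustration only, the handling of the two boot options described above
could be sketched in userspace C like this. The struct and function names are
made up for this sketch and are not taken from the swsusp code; the real
kernel parses its command line differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result of scanning a kernel command line. */
struct resume_opts {
    char device[64];   /* value given to resume=, empty if absent */
    int  noresume;     /* 1 if "noresume" was given */
};

/* Scan a space-separated command line for resume=<dev> and noresume. */
static void parse_resume_opts(const char *cmdline, struct resume_opts *out)
{
    char buf[512];
    char *tok;

    assert(cmdline != NULL && out != NULL);
    out->device[0] = '\0';
    out->noresume = 0;

    /* strtok() modifies its input, so work on a bounded copy. */
    strncpy(buf, cmdline, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (strcmp(tok, "noresume") == 0) {
            out->noresume = 1;
        } else if (strncmp(tok, "resume=", 7) == 0) {
            strncpy(out->device, tok + 7, sizeof(out->device) - 1);
            out->device[sizeof(out->device) - 1] = '\0';
        }
    }
}

/* Resume is attempted only if a device is named and noresume is absent. */
static int should_try_resume(const struct resume_opts *o)
{
    return o->device[0] != '\0' && !o->noresume;
}
```

With a command line such as "root=/dev/hda1 resume=/dev/hda2" the sketch
reports that a resume should be tried from /dev/hda2; adding "noresume"
makes it skip the resume. The device names here are only examples.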
Warning! Look at section ``Things to implement'' to see what isn't yet
implemented. Also I strongly suggest that you list all active swaps in
/etc/fstab. Firstly because then you don't have to specify anything to resume,
and secondly because if you have more than one swap area you can't decide
which one has the 'root' signature.
While the system is suspended you should not touch any of the hardware!
About the code
Goals reached
The code can be downloaded from
http://falcon.sch.bme.hu/~seasons/linux/. It mainly works, but there are still
some XXXs, TODOs and FIXMEs in the code, which do not seem too important. It
should work all right except for the problems listed in ``Things to
implement''. Notes about the code are really welcome.
How the code works
When suspending is triggered it immediately wakes up the bdflush process.
Bdflush checks whether we have anything in our run queue tq_bdflush. Since we
queued up the function do_software_suspend, it is called. Here we shrink
everything including dcache, inodes, buffers and memory (here mainly processes
are swapped out). We count how many pages we need to duplicate (we have to be
atomic!) and then create an appropriately sized page directory. It will point
to the original and the new (copied) address of each page. We get the free
pages with __get_free_pages(), but since that changes state we have to be able
to track it later, so it also sets a bit in the page's flags (a new Nosave
flag). We duplicate the pages and then mark them as used (so atomicity is
ensured). After this we write out the image to swaps, do another sync and the
machine may reboot. We also save registers to the stack.
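The count-then-copy step above can be sketched in userspace C roughly as
follows. This is a simplified model and not the swsusp code: struct page,
struct pagedir_entry and all function names are invented for the sketch, and
malloc() stands in for __get_free_pages().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for the kernel's per-page bookkeeping. */
struct page {
    unsigned char data[SKETCH_PAGE_SIZE];
    int in_use;   /* page holds state that must be saved */
    int nosave;   /* page was allocated for the copy itself */
};

/* One directory entry: where the page lives and where its copy is. */
struct pagedir_entry {
    struct page *orig;
    struct page *copy;
};

/* Pass 1: count the pages that need duplicating. */
static size_t count_pages_to_copy(const struct page *mem, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (mem[i].in_use && !mem[i].nosave)
            count++;
    return count;
}

/* Release a directory and the first nr copies it points to. */
static void free_snapshot(struct pagedir_entry *dir, size_t nr)
{
    size_t i;

    for (i = 0; i < nr; i++)
        free(dir[i].copy);
    free(dir);
}

/*
 * Pass 2: build the directory and duplicate every counted page.
 * Returns NULL on allocation failure; *nr_out receives the entry count.
 */
static struct pagedir_entry *snapshot(struct page *mem, size_t n,
                                      size_t *nr_out)
{
    size_t i, k = 0, nr = count_pages_to_copy(mem, n);
    struct pagedir_entry *dir = calloc(nr ? nr : 1, sizeof(*dir));

    assert(nr_out != NULL);
    if (dir == NULL)
        return NULL;

    for (i = 0; i < n && k < nr; i++) {
        if (!mem[i].in_use || mem[i].nosave)
            continue;
        dir[k].copy = malloc(sizeof(struct page));
        if (dir[k].copy == NULL) {
            free_snapshot(dir, k);
            return NULL;
        }
        dir[k].orig = &mem[i];
        memcpy(dir[k].copy->data, mem[i].data, SKETCH_PAGE_SIZE);
        dir[k].copy->in_use = 1;
        dir[k].copy->nosave = 1;  /* never snapshot the snapshot */
        k++;
    }
    *nr_out = k;
    return dir;
}
```

Counting first means the directory can be sized before any copy is made, and
the nosave mark keeps pages allocated for the copy out of the set to be saved.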
On resume an ``inverse'' method is executed. The image, if it exists, is
loaded; loading is triggered by the ``resume='' kernel option. We change our
task to bdflush (this is needed because otherwise init oopses when it is woken
up later) and then the pages are copied back to their original locations. We
restore registers, free previously allocated memory, and activate the memory
context and task information. Here we should restore hardware state, but even
without this the machine is restored and processes continue to work. I think
hardware state should be restored by some list (using notify_chain) and
probably by some userland program (run-parts?) for users' pleasure. Check out
my patch at the same location for the sysvinit patch.
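The copy-back step of resuming might look like the following userspace
sketch. It assumes the image has already been loaded into memory as a list of
(original address, saved copy) pairs; struct saved_page and restore_pages()
are names invented here, not part of the swsusp code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical directory entry loaded from the image. */
struct saved_page {
    unsigned char *orig;        /* where the page must end up */
    const unsigned char *copy;  /* saved contents from the image */
};

/* Copy every saved page back to its original location. */
static size_t restore_pages(const struct saved_page *dir, size_t nr)
{
    size_t i;

    assert(dir != NULL || nr == 0);
    for (i = 0; i < nr; i++)
        memcpy(dir[i].orig, dir[i].copy, SKETCH_PAGE_SIZE);
    return nr;  /* number of pages restored */
}
```

In the real kernel this has to happen without the copy loop overwriting its
own code or data, which the sketch does not model.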
WARNINGS!
- It does not like pcmcia cards. And this is logical: pcmcia cards need cardmgr
to be initialized. They are not initialized during a singleuser boot, but the
"resumed" kernel does expect them to be initialized. That leads to armageddon.
You should eject any pcmcia cards before suspending.
Things to implement
- SMP support. I've done SMP support, but since I don't have access to such a
machine I cannot test it. SMP people, please test it. .. Tested it,
doesn't work. Had no time to figure out why. There is some mess with
interrupts AFAIK..
- We should only make a copy of data related to kernel segment, since any
process data won't be changed.
- Hardware state restoring. Now there's support for notifying via the notify
chain; event handlers are welcome. Some devices may have microcode loaded
into them. We should have event handlers for them as well.
- We should support other architectures (There are really only some arch
related functions..)
- We should also restore original state of swaps if the ``noresume'' kernel
option is specified.. Or do we need such a feature to save state for some
other time? Do we need some kind of ``several saved states''? (Linux-HA
people?). There's been some discussion about checkpointing on linux-future.
- Should make more sanity checks. Or are these enough?
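To make the ``Hardware state restoring'' item above more concrete, here is a
minimal userspace model of a notify chain with one event handler. It is not
the kernel's notifier API: the enum, the structs and the function names are
all invented for this sketch, and the ``device'' is a dummy that just
remembers whether its microcode is loaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events sent down the chain. */
enum pm_event { EVENT_SUSPEND, EVENT_RESUME };

/* One registered handler; handlers form a singly linked list. */
struct notifier {
    int (*call)(enum pm_event ev, void *data);
    void *data;
    struct notifier *next;
};

/* Add a handler at the head of the chain. */
static void chain_register(struct notifier **head, struct notifier *n)
{
    assert(head != NULL && n != NULL);
    n->next = *head;
    *head = n;
}

/* Call every handler; stop at the first one that returns non-zero. */
static int chain_call(struct notifier *head, enum pm_event ev)
{
    struct notifier *n;

    for (n = head; n != NULL; n = n->next) {
        int rc = n->call(ev, n->data);

        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Example handler: a dummy device that loses its microcode on suspend. */
struct fake_device {
    int microcode_loaded;
};

static int fake_device_event(enum pm_event ev, void *data)
{
    struct fake_device *dev = data;

    /* Suspend drops the microcode; resume "reloads" it. */
    dev->microcode_loaded = (ev == EVENT_RESUME);
    return 0;
}
```

A driver would register one such handler at initialization, and the resume
path would call the chain with the resume event after the pages are restored.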
Not so important ideas for implementing
- If a real time process is running then don't suspend the machine.
- Support for power.conf file as in Solaris, autoshutdown, special
devicetypes support, maybe in sysctl.
- Introduce timeout for SMP locking. But first locking ought to work :O
- Pre-detect if we don't have enough swap space or free it instead of
calling panic.
- Support for adding/removing hardware while suspended?
- We should not free pages at the beginning so aggressively, most of them
go there anyway..
- If X is active while suspending, then calling svgatextmode on resume
corrupts the virtual console of X.. (This may have been fixed, AFAIK.)
Drivers we support
- IDE disks are okay
- vesafb
Drivers that need support
- pc_keyb -- perhaps we can wait for vojtech's input patches
- do IDE cdroms need some kind of support?
- IDE CD-RW -- how to deal with that?
FAQ:
Q: well, suspending a server is IMHO a really stupid thing,
but... (Diego Zuccato):
A: You bought a new UPS for your server. How do you install it without
bringing the machine down? Suspend to disk, rearrange power cables,
resume.
You have your server on UPS. Power died, and UPS is indicating 30
seconds to failure. What do you do? Suspend to disk.
Ethernet card in your server died. You want to replace it. Your
server is not hotplug capable. What do you do? Suspend to disk,
replace ethernet card, resume. If you are fast your users will not
even see broken connections.
If you have any other ideas, tell me!
Contacting the author
If you have any questions, or a patch that solves any of the problems above
or others you have detected, please contact me at seasons@falcon.sch.bme.hu.
I might be slow to answer, sorry about that.