
  +---------------------------------------------------------------------------+
  |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
  |                                                                           |
  | Copyright (C) 1992,1993                                                   |
  |                       W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
  |                       Australia.  E-mail apm233m@vaxc.cc.monash.edu.au    |
  |                                                                           |
  |    This program is free software; you can redistribute it and/or modify   |
  |    it under the terms of the GNU General Public License version 2 as      |
  |    published by the Free Software Foundation.                             |
  |                                                                           |
  |    This program is distributed in the hope that it will be useful,        |
  |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
  |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
  |    GNU General Public License for more details.                           |
  |                                                                           |
  |    You should have received a copy of the GNU General Public License      |
  |    along with this program; if not, write to the Free Software            |
  |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
  |                                                                           |
  +---------------------------------------------------------------------------+


 wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
 which is my 80387 emulator for djgpp (gcc under MS-DOS); wm-emu387 was
 in turn based upon emu387 which was written by DJ Delorie for djgpp.
 The interface to the Linux kernel is based upon the original Linux
 math emulator by Linus Torvalds.

 My target FPU for wm-FPU-emu is that described in the Intel486
 Programmer's Reference Manual (1992 edition). Numerous facets of the
 functioning of the FPU are not well covered in the Reference Manual;
 in the absence of clear details I have made guesses about the most
 reasonable behaviour. Recently, this situation has improved because
 I now have some access to the results produced by a real 80486 FPU.

 wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.
 See "Limitations" later in this file for a partial list of the
 differences.  I believe that the missing features are never used by
 normal C or FORTRAN programs.


 Please report bugs, etc., to me at:
        apm233m@vaxc.cc.monash.edu.au


 --Bill Metzenthen
   May 1993


 ----------------------- Internals of wm-FPU-emu -----------------------

 Numeric algorithms:
 (1) Add, subtract, and multiply. Nothing remarkable in these.
 (2) Divide has been tuned to get reasonable performance. The algorithm
     is not the obvious one which most people seem to use, but is designed
     to take advantage of the characteristics of the 80386. I expect that
     it has been invented many times before I discovered it, but I have not
     seen it. It is based upon one of those ideas which one carries around
     for years without ever bothering to check it out.
 (3) The sqrt function has been tuned to get good performance. It is based
     upon Newton's classic method. Performance was improved by capitalizing
     upon the properties of Newton's method, and the code is once again
     structured taking account of the 80386 characteristics.
 (4) The trig, log, and exp functions are based in each case upon quasi-
     "optimal" polynomial approximations. My definition of "optimal" was
     based upon getting good accuracy with reasonable speed.
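 The two tuned methods in (3) and (4) can be illustrated with a short C
 sketch. This is not the emulator's code, which works on 64-bit
 significands and uses its own "optimal" coefficients; the sketch uses
 plain doubles, an assumed (hypothetical) pre-reduced argument range of
 [1, 4) for sqrt, and ordinary Taylor coefficients for sin, purely to show
 the shape of Newton's iteration and of a polynomial evaluated by
 Horner's rule.

```c
#include <assert.h>

/* Newton's method for sqrt(a): x <- (x + a/x) / 2.  Assumes the argument
   has already been reduced to [1, 4) (reduction step not shown).  The
   start (1 + a)/2 is never below sqrt(a) and at most 25% above it; each
   step roughly doubles the number of correct bits, so six fixed steps
   are ample for double precision. */
static double newton_sqrt(double a) {
    assert(a >= 1.0 && a < 4.0);
    double x = (1.0 + a) / 2.0;
    for (int i = 0; i < 6; i++)
        x = (x + a / x) / 2.0;
    return x;
}

/* sin(x) for |x| <= pi/4 as a degree-11 odd polynomial, evaluated with
   Horner's rule in z = x*x.  The coefficients are plain Taylor terms
   (-1)^k / (2k+1)!, NOT minimax ("optimal") values; the truncation error
   at pi/4 is below about 1e-11. */
static double poly_sin(double x) {
    static const double c[] = {
        1.0, -1.0 / 6, 1.0 / 120, -1.0 / 5040, 1.0 / 362880, -1.0 / 39916800
    };
    double z = x * x, p = c[5];
    for (int k = 4; k >= 0; k--)
        p = p * z + c[k];
    return x * p;
}
```

 A minimax polynomial of the same degree would spread the error more
 evenly over the interval than these Taylor terms do, which is the kind
 of accuracy/speed trade-off described in (4).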

 The code of the emulator is complicated slightly by the need to
 account for a limited form of re-entrancy. Normally, the emulator will
 emulate each FPU instruction to completion without interruption.
 However, it may happen that when the emulator is accessing the user
 memory space, swapping may be needed. In this case the emulator may be
 temporarily suspended while disk I/O takes place. During this time
 another process may use the emulator, thereby changing some static
 variables (e.g. FPU_st0_ptr). The code which accesses user memory
 is confined to five files:
     fpu_entry.c
     reg_ld_str.c
     load_store.c
     get_address.c
     errors.c
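 The re-entrancy problem can be modelled in a few lines of C. In this
 toy sketch all names (fpu_state, cur_state, read_user() and so on) are
 hypothetical stand-ins loosely modelled on statics such as FPU_st0_ptr,
 and a callback simulates another process running the emulator while the
 first one sleeps on a page-in. The remedy shown, re-establishing the
 static pointers after every user-memory access rather than computing
 them beforehand, is an assumption for illustration and not a
 description of the emulator's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process FPU state: an 8-entry register stack plus TOP. */
typedef struct {
    double regs[8];
    int top;
} fpu_state;

/* Hypothetical emulator statics, loosely modelled on FPU_st0_ptr. */
static fpu_state *cur_state;  /* state of the process being emulated */
static double *st0_ptr;       /* cached pointer to ST(0) inside *cur_state */

/* Stand-in for a user-memory read.  If the page is swapped out the caller
   sleeps during disk I/O; 'meanwhile' simulates another process running
   the emulator (and overwriting the statics) in that window. */
static double read_user(const double *addr, void (*meanwhile)(void)) {
    if (meanwhile != NULL)
        meanwhile();
    return *addr;
}

/* FLD-like push of a value loaded from user memory.  The statics are set
   up again AFTER the read: computing st0_ptr before read_user() and using
   it afterwards could write into another process's registers. */
static void emulate_load(fpu_state *self, const double *user_addr,
                         void (*meanwhile)(void)) {
    assert(self != NULL && user_addr != NULL);
    cur_state = self;
    double v = read_user(user_addr, meanwhile);  /* may sleep */
    cur_state = self;                            /* re-establish statics */
    cur_state->top = (cur_state->top + 7) & 7;   /* push: TOP - 1, mod 8 */
    st0_ptr = &cur_state->regs[cur_state->top];
    *st0_ptr = v;
}

/* Hypothetical second process that uses the emulator while we sleep. */
static fpu_state other_state;
static void other_process(void) {
    cur_state = &other_state;
    st0_ptr = &other_state.regs[other_state.top];
}
```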

 ----------------------- Limitations of wm-FPU-emu -----------------------

 There are a number of differences between the current wm-FPU-emu
 (version beta 1.4) and the 80486 FPU (apart from bugs). Some of the
 more important differences are listed below:

 All internal computations are performed at 64 bit or higher precision
 and the results are rounded as required by the precision-control (PC)
 bits of the FPU control word. Under the crt0 version for Linux current
 at March 1993, the PC bits specify 53 bits of precision.

 The precision flag (PE of the FPU status word) and the Roundup flag
 (C1 of the status word) are now partially implemented. Does anyone
 write code which uses these features?
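 A sketch of the rounding step described in the two paragraphs above,
 assuming a normalized 64-bit significand. The PC field (bits 8-9 of the
 x87 control word: 00 = 24 bits, 10 = 53 bits, 11 = 64 bits, 01
 reserved) selects how many bits are kept; round-to-nearest-even is
 assumed, PE is reported when any discarded bit is non-zero, and C1 when
 the significand was rounded up. The type and function names are
 hypothetical, and exponent range checks, denormals and the other
 rounding modes are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Precision-control (PC) field, bits 8-9 of the x87 control word. */
static int pc_bits(uint16_t control_word) {
    switch ((control_word >> 8) & 3) {
    case 0:  return 24;   /* single precision */
    case 2:  return 53;   /* double precision */
    case 3:  return 64;   /* extended precision */
    default: return -1;   /* 01 is reserved */
    }
}

typedef struct {
    uint64_t sig;  /* rounded significand, still left-aligned at bit 63 */
    int exp_inc;   /* 1 if rounding carried out (1.11..1 -> 10.00..0) */
    int pe;        /* precision (inexact) flag */
    int c1;        /* "rounded up" indication */
} rounded_sig;

/* Round a normalized significand (bit 63 set) to n bits, nearest-even. */
static rounded_sig round_to_pc(uint64_t sig, int n) {
    rounded_sig r = { sig, 0, 0, 0 };
    assert((sig >> 63) == 1 && n >= 1 && n <= 64);
    if (n == 64)
        return r;                          /* nothing is discarded */
    int drop = 64 - n;
    uint64_t rem  = sig & (((uint64_t)1 << drop) - 1);
    uint64_t half = (uint64_t)1 << (drop - 1);
    uint64_t kept = sig >> drop;           /* top n bits */
    if (rem != 0)
        r.pe = 1;                          /* result is inexact */
    if (rem > half || (rem == half && (kept & 1))) {
        r.c1 = 1;
        kept++;
        if (kept >> n) {                   /* carried into bit n */
            kept >>= 1;
            r.exp_inc = 1;
        }
    }
    r.sig = kept << drop;
    return r;
}
```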

 The functions which load/store the FPU state are partially implemented,
 but the implementation should be sufficient for handling FPU errors,
 etc., in 32-bit protected mode.

 The implementation of the exception mechanism is flawed for unmasked
 interrupts.

 Detection of certain conditions, such as denormal operands, is not yet
 complete.

 ----------------------- Performance of wm-FPU-emu -----------------------

 Speed.
 -----

 The speed of floating point computation with the emulator will depend
 upon the instruction mix. Relative performance is best for the
 instructions which require the most computation. The simple
 instructions are adversely affected by the FPU instruction trap overhead.


 Timing: Some simple timing tests have been made on the emulator functions.
 The times include load/store instructions. All times are in microseconds
 measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
 MS-DOS; the next two columns are for emulators running with the djgpp
 MS-DOS extender. The final column is for wm-FPU-emu in Linux 0.97,
 using libm4.0 (hard).

 function      Turbo C        djgpp 1.06        WM-emu387     wm-FPU-emu

    +          60.5           154.8              76.5          139.4
    -          61.1-65.5      157.3-160.8        76.2-79.5     142.9-144.7
    *          71.0           190.8              79.6          146.6
    /          61.2-75.0      261.4-266.9        75.3-91.6     142.2-158.1

  sin()        310.8          4692.0            319.0          398.5
  cos()        284.4          4855.2            308.0          388.7
  tan()        495.0          8807.1            394.9          504.7
  atan()       328.9          4866.4            601.1          419.5-491.9

  sqrt()       128.7          crashed           145.2          227.0
  log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
  exp()        479.1          6619.2            469.1          850.8
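 A minimal sketch of a timing loop of this kind, assuming the standard C
 clock() function; the function names are hypothetical and this is not
 the harness used to produce the table. The operands are read from, and
 the result written to, volatile variables so that each iteration
 includes a load and a store.

```c
#include <assert.h>
#include <time.h>

/* Mean time per call of op(a, b), in microseconds.  Includes the volatile
   loads and store, the call through the pointer and the loop overhead. */
static double us_per_call(double (*op)(double, double), long iterations) {
    assert(iterations > 0);
    volatile double a = 1.2345, b = 6.789, result = 0.0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        result = op(a, b);
    clock_t stop = clock();
    (void)result;
    return (double)(stop - start) * 1e6 / CLOCKS_PER_SEC / (double)iterations;
}

/* Hypothetical operations to time. */
static double op_add(double x, double y) { return x + y; }
static double op_div(double x, double y) { return x / y; }
```

 On a machine with a working FPU this measures the hardware itself; the
 emulator is only exercised when FPU instructions trap.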


 The performance under Linux is improved by the use of look-ahead code.
 The following results show the improvement which is obtained under
 Linux due to the look-ahead code. Also given are the times for the
 original Linux emulator with the 4.1 'soft' lib.

  [ Linus' note: I changed look-ahead to be the default under linux, as
    there was no reason not to use it after I had edited it to be
    disabled during tracing ]

  function    wm-FPU-emu with   original with
              look-ahead        'soft' lib
    +         106.4             190.2
    -         108.6-111.6      192.4-216.2
    *         113.4             193.1
    /         108.8-124.4      700.1-706.2

  sin()       390.5            2642.0
  cos()       381.5            2767.4
  tan()       496.5            3153.3
  atan()      367.2-435.5     2439.4-3396.8

  sqrt()      195.1            4732.5
  log()       358.0-387.5     3359.2-3390.3
  exp()       619.3            4046.4


 These figures are now somewhat out-of-date. The emulator has become
 progressively slower for most functions as more of the 80486 features
 have been implemented.
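 Look-ahead, as far as the name and Linus' note suggest, means that after
 a trap the emulator keeps emulating the following instructions for as
 long as they are also FPU instructions, so a run of FPU code pays the
 trap overhead once; it is turned off while the process is traced so that
 single-stepping still stops after each instruction. The C sketch below
 models only that control flow, under stated assumptions: instructions
 are pre-decoded records (the hypothetical insn type), prefixes and ModR/M
 bytes are ignored, and emulate_one() is a placeholder. The one hardware
 fact it relies on is that x87 instructions begin with an escape opcode
 in the range 0xD8 to 0xDF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x87 instructions start with an escape opcode in 0xD8..0xDF. */
static int is_fpu_escape(uint8_t b) {
    return (b & 0xF8) == 0xD8;
}

/* Hypothetical pre-decoded instruction: only the first opcode byte. */
typedef struct {
    uint8_t first_byte;
} insn;

static unsigned long emulated;            /* instructions emulated so far */
static void emulate_one(const insn *i) {  /* placeholder for real emulation */
    (void)i;
    emulated++;
}

/* Handle the trap raised by code[ip].  Without look-ahead each FPU
   instruction costs one trap; with it, consecutive FPU instructions are
   emulated within the same trap, unless the process is being traced.
   Returns the index of the first instruction left for the CPU. */
static size_t handle_fpu_trap(const insn *code, size_t n, size_t ip,
                              int traced) {
    assert(ip < n && is_fpu_escape(code[ip].first_byte));
    do {
        emulate_one(&code[ip]);
        ip++;
    } while (!traced && ip < n && is_fpu_escape(code[ip].first_byte));
    return ip;
}
```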


 ----------------------- Accuracy of wm-FPU-emu -----------------------


 Accuracy: The following table gives the accuracy of the sqrt(), trig,
 exp, and log functions. Each function was tested at about 400 points. Ideal
 results would be 64 bits. The reduced accuracy of cos() and tan() for
 arguments greater than pi/4 can be thought of as being due to the
 precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
 accurate to 64 bits can result in a relative accuracy in cos() of about
 64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given
 in the last column.


 Function      Tested x range            Worst result (bits)         Turbo C

 sqrt(x)       1 .. 2                    64.1                         63.2
 atan(x)       1e-10 .. 200              62.6                         62.8
 cos(x)        0 .. pi/2-(1e-10)         63.2 (x <= pi/4)             62.4
                                         35.2 (x = pi/2-(1e-10))      31.9
 sin(x)        1e-10 .. pi/2             63.0                         62.8
 tan(x)        1e-10 .. pi/2-(1e-10)     62.4 (x <= pi/4)             62.1
                                         35.2 (x = pi/2-(1e-10))      31.9
 exp(x)        0 .. 1                    63.1                         62.9
 log(x)        1+1e-6 .. 2               62.4                         62.1
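 One way to arrive at the 31-bit estimate for cos() quoted before the
 table, assuming "accurate to 64 bits" means a relative error of about
 2^-64 in the argument x:

```latex
\begin{aligned}
x &= \tfrac{\pi}{2} - \delta, \qquad \delta = 10^{-10}, \qquad |\Delta x| \approx 2^{-64}\,x \\
\Delta(\cos x) &\approx -\sin x\,\Delta x \approx -\Delta x \qquad (\sin x \approx 1) \\
\left|\frac{\Delta(\cos x)}{\cos x}\right| &\approx \frac{2^{-64}\,x}{\cos x} \\
\text{bits} = -\log_2\left|\frac{\Delta(\cos x)}{\cos x}\right| &\approx 64 + \log_2(\cos x) - \log_2 x \\
&\approx 64 + \log_2\!\left(10^{-10}\right) - \log_2\tfrac{\pi}{2} \approx 64 - 33.2 - 0.65 \approx 30
\end{aligned}
```

 The log2(x) term is only about 0.65 bit, so dropping it gives the
 64 + log2(cos(x)) of the text. For tan(x) the relative error is
 dx / (sin(x) cos(x)), which leads to the same estimate; this matches
 cos() and tan() sharing the same reduced figures in the table.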


 As of version 1.3 of the emulator, the accuracy of the basic
 arithmetic has been improved (by a small fraction of a bit). Care has
 been taken to ensure full accuracy of the rounding of the basic
 arithmetic functions (+, -, *, /, and fsqrt), and they all now produce
 results which are exact to the 64th bit (unless there are any bugs
 left). To ensure this, it was necessary to effectively retain up to
 about 128 bits of precision in intermediate results. The emulator now passes the
 "paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24
 bit precision numbers) when precision control is set to 24, 53 or 64
 bits, and for 'double' variables (53 bit precision numbers) when
 precision control is set to 53 bits (a properly performing FPU cannot
 pass the 'paranoia' tests for 'double' variables when precision
 control is set to 64 bits).
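 The need for roughly 128 bits is easiest to see for multiplication: the
 product of two 64-bit significands has up to 128 bits, and deciding how
 to round it, including recognising an exact tie, depends on every bit
 of the low half. Below is a minimal C sketch under those assumptions
 (normalized significands, round-to-nearest-even, hypothetical function
 names). It builds the 128-bit product from 32-bit halves in the spirit
 of the 80386's 32 x 32 -> 64 bit multiply; it is not the emulator's own
 code.

```c
#include <assert.h>
#include <stdint.h>

/* 64 x 64 -> 128-bit product from four 32 x 32 -> 64 partial products. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Multiply two normalized significands (bit 63 set, value sig / 2^63 in
   [1, 2)) and round the full 128-bit product to 64 bits, nearest-even.
   The result represents sig / 2^63 * 2^(*exp_inc). */
static uint64_t mul_sig(uint64_t a, uint64_t b, int *exp_inc, int *inexact) {
    const uint64_t half = (uint64_t)1 << 63;
    uint64_t hi, lo, sig, rem;
    assert((a & half) && (b & half));
    mul64x64(a, b, &hi, &lo);
    if (hi & half) {            /* product in [2, 4) */
        sig = hi;
        rem = lo;
        *exp_inc = 1;
    } else {                    /* product in [1, 2): shift left one bit */
        sig = (hi << 1) | (lo >> 63);
        rem = lo << 1;
        *exp_inc = 0;
    }
    *inexact = rem != 0;        /* needs every one of the low 64 bits */
    if (rem > half || (rem == half && (sig & 1))) {
        if (++sig == 0) {       /* carry out of bit 63 */
            sig = half;
            *exp_inc += 1;
        }
    }
    return sig;
}
```

 For example, 1.5 * (1 + 2^-63) lies exactly halfway between two
 representable values, and only the complete low half of the product
 shows that it is a tie rather than slightly above or below one.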

 ------------------------- Contributors -------------------------------

 A number of people have contributed to the development of the
 emulator, often by just reporting bugs, sometimes with a suggested
 fix, and a few kind people have provided me with access in one way or
 another to an 80486 machine. Contributors include (to those people whom
 I have forgotten, please excuse me):

 Linus Torvalds
 Tommy.Thorn@daimi.aau.dk
 Andrew.Tridgell@anu.edu.au
 Nick Holloway alfie@dcs.warwick.ac.uk
 Hermano Moura moura@dcs.gla.ac.uk
 Jon Jagger J.Jagger@scp.ac.uk
 Lennart Benschop
 Brian Gallew geek+@CMU.EDU
 Thomas Staniszewski ts3v+@andrew.cmu.edu
 Martin Howell mph@plasma.apana.org.au
 M Saggaf alsaggaf@athena.mit.edu
 Peter Barker PETER@socpsy.sci.fau.edu
 tom@vlsivie.tuwien.ac.at
 Dan Russel russed@rpi.edu
 Daniel Carosone danielce@ee.mu.oz.au
 cae@jpmorgan.com
 Hamish Coleman t933093@minyos.xx.rmit.oz.au

 ...and numerous others who responded to my request for help with
 a real 80486.