| +---------------------------------------------------------------------------+ |
| | wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors. | |
| | | |
| | Copyright (C) 1992,1993 | |
| | W. Metzenthen, 22 Parker St, Ormond, Vic 3163, | |
| | Australia. E-mail apm233m@vaxc.cc.monash.edu.au | |
| | | |
| | This program is free software; you can redistribute it and/or modify | |
| | it under the terms of the GNU General Public License version 2 as | |
| | published by the Free Software Foundation. | |
| | | |
| | This program is distributed in the hope that it will be useful, | |
| | but WITHOUT ANY WARRANTY; without even the implied warranty of | |
| | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
| | GNU General Public License for more details. | |
| | | |
| | You should have received a copy of the GNU General Public License | |
| | along with this program; if not, write to the Free Software | |
| | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | |
| | | |
| +---------------------------------------------------------------------------+ |
| |
| |
| |
| wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387 |
| which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was |
| in turn based upon emu387 which was written by DJ Delorie for djgpp. |
| The interface to the Linux kernel is based upon the original Linux |
| math emulator by Linus Torvalds. |
| |
| My target FPU for wm-FPU-emu is that described in the Intel486 |
| Programmer's Reference Manual (1992 edition). Numerous facets of the |
| functioning of the FPU are not well covered in the Reference Manual; |
| in the absence of clear details I have made guesses about the most |
| reasonable behaviour. Recently, this situation has improved because |
| I now have some access to the results produced by a real 80486 FPU. |
| |
| wm-FPU-emu does not implement all of the behaviour of the 80486 FPU. |
| See "Limitations" later in this file for a partial list of the |
| differences. I believe that the missing features are never used by |
| normal C or FORTRAN programs. |
| |
| |
| Please report bugs etc. to me at: |
| apm233m@vaxc.cc.monash.edu.au |
| |
| |
| --Bill Metzenthen |
| May 1993 |
| |
| |
| ----------------------- Internals of wm-FPU-emu ----------------------- |
| |
| Numeric algorithms: |
| (1) Add, subtract, and multiply. Nothing remarkable in these. |
| (2) Divide has been tuned to get reasonable performance. The algorithm |
| is not the obvious one which most people seem to use, but is designed |
| to take advantage of the characteristics of the 80386. I expect that |
| it has been invented many times before I discovered it, but I have not |
| seen it. It is based upon one of those ideas which one carries around |
| for years without ever bothering to check it out. |
| (3) The sqrt function has been tuned to get good performance. It is based |
| upon Newton's classic method. Performance was improved by capitalizing |
| upon the properties of Newton's method, and the code is once again |
| structured to take account of the characteristics of the 80386. |
| (4) The trig, log, and exp functions are based in each case upon quasi- |
| "optimal" polynomial approximations. My definition of "optimal" was |
| based upon getting good accuracy with reasonable speed. |
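
For readers unfamiliar with item (3), the following is a minimal sketch of
Newton's method for a square root. It is not the emulator's code, which
works on the 64 bit significand in 80386 assembler; the function name
newton_sqrt, the use of C doubles, the first guess and the iteration count
are all assumptions made for illustration. One well-known property of the
method is that the number of correct bits roughly doubles with each step, so
a small fixed number of steps is enough; whether this is the property the
emulator exploits is not stated above.

```c
#include <assert.h>

/* Hypothetical helper, not part of wm-FPU-emu: approximate sqrt(a) for
 * a in [1, 4) by Newton's method applied to f(y) = y*y - a. */
double newton_sqrt(double a)
{
    int i;
    double y;

    assert(a >= 1.0 && a < 4.0);    /* assumed range after exponent scaling */

    y = 0.5 * (1.0 + a);            /* first guess, at most 25% too large */
    for (i = 0; i < 6; i++)
        y = 0.5 * (y + a / y);      /* Newton step: average y and a/y */
    return y;
}
```

Calling newton_sqrt(2.0) gives a value whose square agrees with 2.0 to
within a few units in the last place of a double.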
| |
| The code of the emulator is complicated slightly by the need to |
| account for a limited form of re-entrancy. Normally, the emulator will |
| emulate each FPU instruction to completion without interruption. |
| However, swapping may be needed while the emulator is accessing the |
| user memory space. In this case the emulator may be |
| temporarily suspended while disk I/O takes place. During this time |
| another process may use the emulator, thereby changing some static |
| variables (e.g. FPU_st0_ptr). The code which accesses user memory |
| is confined to five files: |
| fpu_entry.c |
| reg_ld_str.c |
| load_store.c |
| get_address.c |
| errors.c |
| |
| ----------------------- Limitations of wm-FPU-emu ----------------------- |
| |
| There are a number of differences between the current wm-FPU-emu |
| (version beta 1.4) and the 80486 FPU (apart from bugs). Some of the |
| more important differences are listed below: |
| |
| All internal computations are performed at 64 bit or higher precision |
| and rounded etc as required by the PC bits of the FPU control word. |
| Under the crt0 version for Linux current at March 1993, the FPU PC |
| bits specify 53 bits precision. |
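
As an illustration of the PC bits mentioned above, here is a small sketch
which decodes them from a control word value. The PC field occupies bits 8
and 9 of the x87 control word; the function name pc_precision_bits is
hypothetical and does not appear in the emulator.

```c
/* Hypothetical helper, not part of wm-FPU-emu: return the significand
 * width in bits selected by the precision-control (PC) field, which is
 * bits 8-9 of the x87 control word. */
int pc_precision_bits(unsigned short control_word)
{
    switch ((control_word >> 8) & 0x3) {
    case 0:  return 24;     /* PC = 00: single precision */
    case 2:  return 53;     /* PC = 10: double precision */
    case 3:  return 64;     /* PC = 11: extended precision */
    default: return -1;     /* PC = 01 is a reserved encoding */
    }
}
```

For example, a control word of 0x027F has PC = 10 and so selects 53 bits,
while 0x037F (the value loaded by FINIT) selects 64 bits.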
| |
| The precision flag (PE of the FPU status word) and the Roundup flag |
| (C1 of the status word) are now partially implemented. Does anyone |
| write code which uses these features? |
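
For reference, code which used these two flags might look something like
the sketch below. PE is bit 5 and C1 is bit 9 of the x87 status word; the
macro and function names are hypothetical and are not taken from the
emulator. Note that PE is a sticky flag, so a caller would normally clear
the exception flags before the operation of interest.

```c
#define SW_PE 0x0020    /* precision (inexact result) flag, bit 5 */
#define SW_C1 0x0200    /* condition code C1, bit 9 */

/* Hypothetical helper, not part of wm-FPU-emu: classify the rounding
 * recorded in a status word value.
 * Returns -1 if PE is clear (no inexact result recorded),
 *          1 if the inexact result was rounded up (C1 set),
 *          0 if it was not rounded up (C1 clear). */
int rounding_direction(unsigned short status_word)
{
    if (!(status_word & SW_PE))
        return -1;
    return (status_word & SW_C1) ? 1 : 0;
}
```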
| |
| The functions which load/store the FPU state are partially implemented, |
| but the implementation should be sufficient for handling FPU errors etc |
| in 32 bit protected mode. |
| |
| The implementation of the exception mechanism is flawed for unmasked |
| interrupts. |
| |
| Detection of certain conditions, such as denormal operands, is not yet |
| complete. |
| |
| ----------------------- Performance of wm-FPU-emu ----------------------- |
| |
| Speed. |
| ----- |
| |
| The speed of floating point computation with the emulator will depend |
| upon instruction mix. Relative performance is best for the instructions |
| which require most computation. The simple instructions are adversely |
| affected by the fpu instruction trap overhead. |
| |
| |
| Timing: Some simple timing tests have been made on the emulator functions. |
| The times include load/store instructions. All times are in microseconds, |
| measured on a 33MHz 386 with 64k cache. The Turbo C tests were run under |
| ms-dos; the next two columns are for emulators running with the djgpp |
| ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97, |
| using libm4.0 (hard). |
| |
| function  Turbo C        djgpp 1.06        WM-emu387      wm-FPU-emu |
| |
| +         60.5           154.8             76.5           139.4 |
| -         61.1-65.5      157.3-160.8       76.2-79.5      142.9-144.7 |
| *         71.0           190.8             79.6           146.6 |
| /         61.2-75.0      261.4-266.9       75.3-91.6      142.2-158.1 |
| |
| sin()     310.8          4692.0            319.0          398.5 |
| cos()     284.4          4855.2            308.0          388.7 |
| tan()     495.0          8807.1            394.9          504.7 |
| atan()    328.9          4866.4            601.1          419.5-491.9 |
| |
| sqrt()    128.7          crashed           145.2          227.0 |
| log()     413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1 |
| exp()     479.1          6619.2            469.1          850.8 |
| |
| |
| The performance under Linux is improved by the use of look-ahead code. |
| The following results show the improvement which is obtained under |
| Linux due to the look-ahead code. Also given are the times for the |
| original Linux emulator with the 4.1 'soft' lib. |
| |
| [ Linus' note: I changed look-ahead to be the default under linux, as |
| there was no reason not to use it after I had edited it to be |
| disabled during tracing ] |
| |
|           wm-FPU-emu with   original with |
|           look-ahead        'soft' lib |
| +         106.4             190.2 |
| -         108.6-111.6       192.4-216.2 |
| *         113.4             193.1 |
| /         108.8-124.4       700.1-706.2 |
| |
| sin()     390.5             2642.0 |
| cos()     381.5             2767.4 |
| tan()     496.5             3153.3 |
| atan()    367.2-435.5       2439.4-3396.8 |
| |
| sqrt()    195.1             4732.5 |
| log()     358.0-387.5       3359.2-3390.3 |
| exp()     619.3             4046.4 |
| |
| |
| These figures are now somewhat out-of-date. The emulator has become |
| progressively slower for most functions as more of the 80486 features |
| have been implemented. |
| |
| |
| ----------------------- Accuracy of wm-FPU-emu ----------------------- |
| |
| |
| Accuracy: The following table gives the accuracy of the sqrt(), trig |
| and log functions. Each function was tested at about 400 points. Ideal |
| results would be 64 bits. The reduced accuracy of cos() and tan() for |
| arguments greater than pi/4 can be thought of as being due to the |
| precision of the argument x; e.g. an argument of pi/2-(1e-10) which is |
| accurate to 64 bits can result in a relative accuracy in cos() of about |
| 64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given |
| in the last column. |
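
The figure of 31 bits quoted above can be checked by hand. The following is
a worked version of the arithmetic, using only the identity
cos(pi/2 - d) = sin(d) and the small-angle approximation sin(d) ~ d:

```latex
\begin{align*}
\cos\!\left(\tfrac{\pi}{2} - 10^{-10}\right)
  &= \sin\!\left(10^{-10}\right) \approx 10^{-10} \\
64 + \log_2\!\left(10^{-10}\right)
  &= 64 - 10\,\log_2 10 \approx 64 - 33.2 = 30.8 \approx 31 \text{ bits}
\end{align*}
```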
| |
| |
| Function  Tested x range           Worst result (bits)         Turbo C |
| |
| sqrt(x)   1 .. 2                   64.1                        63.2 |
| atan(x)   1e-10 .. 200             62.6                        62.8 |
| cos(x)    0 .. pi/2-(1e-10)        63.2 (x <= pi/4)            62.4 |
|                                    35.2 (x = pi/2-(1e-10))     31.9 |
| sin(x)    1e-10 .. pi/2            63.0                        62.8 |
| tan(x)    1e-10 .. pi/2-(1e-10)    62.4 (x <= pi/4)            62.1 |
|                                    35.2 (x = pi/2-(1e-10))     31.9 |
| exp(x)    0 .. 1                   63.1                        62.9 |
| log(x)    1+1e-6 .. 2              62.4                        62.1 |
| |
| |
| As of version 1.3 of the emulator, the accuracy of the basic |
| arithmetic has been improved (by a small fraction of a bit). Care has |
| been taken to ensure full accuracy of the rounding of the basic |
| arithmetic functions (+, -, *, / and fsqrt), and they all now produce |
| results which are exact to the 64th bit (unless there are any bugs |
| left). To ensure this, it was necessary to make effective use of |
| up to about 128 bits of precision. The emulator now passes the |
| "paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24 |
| bit precision numbers) when precision control is set to 24, 53 or 64 |
| bits, and for 'double' variables (53 bit precision numbers) when |
| precision control is set to 53 bits (a properly performing FPU cannot |
| pass the 'paranoia' tests for 'double' variables when precision |
| control is set to 64 bits). |
| |
| ------------------------- Contributors ------------------------------- |
| |
| A number of people have contributed to the development of the |
| emulator, often by just reporting bugs, sometimes with a suggested |
| fix, and a few kind people have provided me with access in one way or |
| another to an 80486 machine. Contributors include (to those people whom |
| I have forgotten, please excuse me): |
| |
| Linus Torvalds |
| Tommy.Thorn@daimi.aau.dk |
| Andrew.Tridgell@anu.edu.au |
| Nick Holloway alfie@dcs.warwick.ac.uk |
| Hermano Moura moura@dcs.gla.ac.uk |
| Jon Jagger J.Jagger@scp.ac.uk |
| Lennart Benschop |
| Brian Gallew geek+@CMU.EDU |
| Thomas Staniszewski ts3v+@andrew.cmu.edu |
| Martin Howell mph@plasma.apana.org.au |
| M Saggaf alsaggaf@athena.mit.edu |
| Peter Barker PETER@socpsy.sci.fau.edu |
| tom@vlsivie.tuwien.ac.at |
| Dan Russel russed@rpi.edu |
| Daniel Carosone danielce@ee.mu.oz.au |
| cae@jpmorgan.com |
| Hamish Coleman t933093@minyos.xx.rmit.oz.au |
| |
| ...and numerous others who responded to my request for help with |
| a real 80486. |
| |