7-1-2004
GAME | 32-bit performance | 64-bit performance | %speedup with 64-bit | 64-bit with gcc 3.4.0 | %speedup with gcc 3.4.0 |
5-26-2004
The purpose of this experiment is to quantify the performance difference between 32-bit and 64-bit builds of xmame on a native 64-bit X86-64 (aka AMD64) Linux OS. This is in response to the frequently-updated mame32 benchmarks and the X86-64 xmame mailing list thread.
Background: Provided one has a native amd64 OS, a set of 64-bit and 32-bit libraries, and a compiler that allows 32-bit and 64-bit compilation, it is possible to run 64-bit programs on the same machine as 32-bit programs. It is not possible to mix 32-bit assembly code within a 64-bit program, so assembly CPU emulators and the MIPS dynamic recompiler are not being tested here. For this experiment I chose Linux for the 64-bit OS due to its current level of maturity compared to Windows, and I can't stand Windows anyway.
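When both kinds of binaries live on the same machine, it is easy to benchmark the wrong one by accident. The following is a minimal sketch, not part of the original test setup, of one way to check which kind a given executable is. It relies only on the ELF file format: the fifth byte of the header (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects. The function names are my own, and any file paths passed on the command line are whatever your two builds happen to be called.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# e_ident[EI_CLASS] values defined by the ELF format.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Return '32-bit' or '64-bit' given the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]
    if ei_class not in ELF_CLASSES:
        raise ValueError(f"unknown EI_CLASS value {ei_class}")
    return ELF_CLASSES[ei_class]


def elf_class_of_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


if __name__ == "__main__":
    # Usage: python elfclass.py <binary> [<binary> ...]
    for path in sys.argv[1:]:
        print(path, elf_class_of_file(path))
```

Running it against both builds before a benchmark session should report one file as 32-bit and the other as 64-bit.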
GAME | 32-bit performance (-O2) | 64-bit performance (-O2) | %speedup with 64-bit |
crusnusa | 49.654994 | 56.156599 | 13.09 % |
dkong | 1265.035100 | 1306.420889 | 3.27 % |
ga2 | 257.016928 | 230.427070 | -10.35 % |
kinst2 | 5.872204 | 6.341450 | 7.99 % |
kof2000 | 368.906005 | 382.610154 | 3.71 % |
mk2 | 144.338047 | 144.263163 | -0.05 % |
mk | 241.290272 | 246.329293 | 2.09 % |
mslugx | 350.864648 | 358.762169 | 2.25 % |
pacman | 1488.463367 | 1541.355730 | 3.55 % |
pitfight | 368.660985 | 395.256995 | 7.21 % |
punchout | 590.176837 | 613.401611 | 3.94 % |
rastan | 657.029360 | 698.528849 | 6.32 % |
samsho | 390.677070 | 403.090300 | 3.18 % |
soldivid | 350.878510 | crash | undefined % |
souledgb | 49.573759 | 59.177503 | 19.37 % |
ssf2t | 365.694318 | 379.838434 | 3.87 % |
stunrun | 181.899974 | crash | undefined % |
tempest | 319.210055 | 324.382548 | 1.62 % |
umk3 | 143.769653 | 143.247326 | -0.36 % |
wargods | 52.316578 | 59.153097 | 13.07 % |
xmen | 409.296365 | 447.504120 | 9.33 % |
The best aspect of X86-64 is that the extra registers offset the bloat of the 64-bit extensions, and these numbers demonstrate that only a few games are consistently worse under 64-bit xmame. This makes X86-64 one of the few (only?) 64-bit ISAs where 32-bit code is generally slower. (On MIPS, 32-bit is preferred for speed.) I consistently see mk2 and umk3 run slower on 64-bit xmame by a small margin, and ga2 is consistently slower by about 10%. It would be interesting to see what these drivers do that makes them slower in 64-bit mode, and of course, to avoid writing code that way.
Older games generally benefit least from X86-64, with a difference under 10%. Newer games benefit more, generally 8-20%. It's almost like having a "free overclock" relative to a traditional 32-bit PC.
gcc-3.3.3 does not have good K8 pipeline knowledge, nor does it have new features like gcc-3.4.0's -funit-at-a-time (implied by -O2?) or -fweb (implied by -O3?). It's probable that gcc 3.4.0 would boost the above scores by a noticeable percentage. Some have speculated that 3.4.0 produces executables 10-15% faster. This is worth testing!
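For reference, the %speedup column in the table above is consistent with the usual relative-difference formula, (64-bit / 32-bit - 1) x 100. The sketch below assumes that formula (it is inferred from the numbers, not stated anywhere) and recomputes three rows from the table; the function name is my own.

```python
def speedup_pct(perf_32: float, perf_64: float) -> float:
    """Relative gain of the 64-bit build over the 32-bit build, in percent."""
    return (perf_64 / perf_32 - 1.0) * 100.0


if __name__ == "__main__":
    # (32-bit, 64-bit) values copied from the table above.
    rows = {
        "crusnusa": (49.654994, 56.156599),
        "ga2": (257.016928, 230.427070),
        "souledgb": (49.573759, 59.177503),
    }
    for game, (p32, p64) in rows.items():
        print(f"{game:10s} {speedup_pct(p32, p64):7.2f} %")
```

A negative result, as for ga2, means the 64-bit build was slower.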
GAME | 32-bit performance (-O3) | 64-bit performance (-O3) | %speedup with 64-bit |
crusnusa | 49.086755 | 58.936718 | 20.07 % |
dkong | 1140.764264 | 1415.447801 | 24.08 % |
ga2 | 257.936726 | 234.239977 | -9.19 % |
kinst2 | 5.839336 | 6.366226 | 9.02 % |
kof2000 | 366.505529 | 406.741893 | 10.98 % |
mk2 | 146.685502 | 145.081501 | -1.09 % |
mk | 242.281070 | 248.512936 | 2.57 % |
mslugx | 345.958596 | 371.871160 | 7.49 % |
pacman | 1451.273492 | 1629.660340 | 12.29 % |
pitfight | 370.377723 | 396.107374 | 6.95 % |
punchout | 606.254791 | 643.604846 | 6.16 % |
rastan | 664.283351 | 733.623237 | 10.44 % |
samsho | 390.650759 | 420.713386 | 7.70 % |
soldivid | 344.246640 | crash | undefined % |
souledgb | 51.166532 | 61.042728 | 19.30 % |
ssf2t | 361.827581 | 390.738864 | 7.99 % |
stunrun | 181.875532 | crash | undefined % |
tempest | 272.897737 | 340.184756 | 24.66 % |
umk3 | 145.880191 | 143.936515 | -1.33 % |
wargods | 52.638714 | 60.428721 | 14.80 % |
xmen | 420.078615 | 472.652894 | 12.52 % |
-O3 is not universally better than my -O2 + options settings in 32-bit xmame, but it is a universal win in 64-bit xmame. The result is a somewhat larger 64-bit speedup percentage in some games, like kof2000, which, unlike pacman and dkong, do not tend to vary much between successive runs.
This tests gcc 3.4.0's alleged 10-15% speedup from -march=k8 and other enhancements. As the table below shows, that figure is never reached.
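The -O2 versus -O3 claim can be checked directly against the two tables above. The sketch below assumes the first table holds the -O2 results and the second the -O3 results (which matches the 64-bit gcc 3.3.3 columns quoted in the compiler comparison); the data layout and function name are my own. Only the 19 games that ran in 64-bit mode are included.

```python
# game: (32-bit -O2, 32-bit -O3, 64-bit -O2, 64-bit -O3), copied from the tables.
RESULTS = {
    "crusnusa": (49.654994, 49.086755, 56.156599, 58.936718),
    "dkong": (1265.035100, 1140.764264, 1306.420889, 1415.447801),
    "ga2": (257.016928, 257.936726, 230.427070, 234.239977),
    "kinst2": (5.872204, 5.839336, 6.341450, 6.366226),
    "kof2000": (368.906005, 366.505529, 382.610154, 406.741893),
    "mk2": (144.338047, 146.685502, 144.263163, 145.081501),
    "mk": (241.290272, 242.281070, 246.329293, 248.512936),
    "mslugx": (350.864648, 345.958596, 358.762169, 371.871160),
    "pacman": (1488.463367, 1451.273492, 1541.355730, 1629.660340),
    "pitfight": (368.660985, 370.377723, 395.256995, 396.107374),
    "punchout": (590.176837, 606.254791, 613.401611, 643.604846),
    "rastan": (657.029360, 664.283351, 698.528849, 733.623237),
    "samsho": (390.677070, 390.650759, 403.090300, 420.713386),
    "souledgb": (49.573759, 51.166532, 59.177503, 61.042728),
    "ssf2t": (365.694318, 361.827581, 379.838434, 390.738864),
    "tempest": (319.210055, 272.897737, 324.382548, 340.184756),
    "umk3": (143.769653, 145.880191, 143.247326, 143.936515),
    "wargods": (52.316578, 52.638714, 59.153097, 60.428721),
    "xmen": (409.296365, 420.078615, 447.504120, 472.652894),
}


def o3_wins(results, o2_col, o3_col):
    """Return the games whose -O3 score beats their -O2 score."""
    return [g for g, row in results.items() if row[o3_col] > row[o2_col]]


if __name__ == "__main__":
    total = len(RESULTS)
    print(f"32-bit: -O3 faster in {len(o3_wins(RESULTS, 0, 1))} of {total} games")
    print(f"64-bit: -O3 faster in {len(o3_wins(RESULTS, 2, 3))} of {total} games")
```

Some of the 64-bit margins are tiny (kinst2 and pitfight, for example), so given the run-to-run variation a "win" there is best read loosely.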
GAME | 64-bit gcc 3.3.3 -O2 | 64-bit gcc 3.4.0 -O2 | %speedup with 3.4.0 | 64-bit gcc 3.3.3 -O3 | 64-bit gcc 3.4.0 -O3 | %speedup with 3.4.0 |
crusnusa | 56.156599 | 60.270855 | 7.33 % | 58.936718 | 61.050343 | 3.59 % |
dkong | 1306.420889 | 1295.098234 | -0.87 % | 1415.447801 | 1345.679675 | -4.93 % |
ga2 | 230.427070 | 235.620888 | 2.25 % | 234.239977 | 235.603967 | 0.58 % |
kinst2 | 6.341450 | 6.408674 | 1.06 % | 6.366226 | 6.887890 | 8.19 % |
kof2000 | 382.610154 | 408.004194 | 6.64 % | 406.741893 | 391.631754 | -3.71 % |
mk2 | 144.263163 | 145.184173 | 0.64 % | 145.081501 | 148.429203 | 2.31 % |
mk | 246.329293 | 246.219163 | -0.04 % | 248.512936 | 250.966075 | 0.99 % |
mslugx | 358.762169 | 369.534169 | 3.00 % | 371.871160 | 381.372748 | 2.56 % |
pacman | 1541.355730 | 1556.940991 | 1.01 % | 1629.660340 | 1753.823203 | 7.62 % |
pitfight | 395.256995 | 404.457804 | 2.33 % | 396.107374 | 419.641312 | 5.94 % |
punchout | 613.401611 | 618.810579 | 0.88 % | 643.604846 | 647.053232 | 0.54 % |
rastan | 698.528849 | 716.890742 | 2.63 % | 733.623237 | 740.113673 | 0.88 % |
samsho | 403.090300 | 421.498746 | 4.57 % | 420.713386 | 432.719346 | 2.85 % |
soldivid | crash | crash | undefined % | crash | crash | undefined % |
souledgb | 59.177503 | 59.680260 | 0.85 % | 61.042728 | 59.501684 | -2.52 % |
ssf2t | 379.838434 | 388.248751 | 2.21 % | 390.738864 | 392.914541 | 0.56 % |
stunrun | crash | crash | undefined % | crash | crash | undefined % |
tempest | 324.382548 | 346.557169 | 6.84 % | 340.184756 | 338.134105 | -0.60 % |
umk3 | 143.247326 | 144.629490 | 0.96 % | 143.936515 | 147.008495 | 2.13 % |
wargods | 59.153097 | 61.810082 | 4.49 % | 60.428721 | 62.926341 | 4.13 % |
xmen | 447.504120 | 457.206814 | 2.17 % | 472.652894 | 475.177150 | 0.53 % |
With gcc 3.4.0 I never see more than an 8.19% improvement in any game, and generally less than 5%. This deflates the idea of getting 10-15% from -march=k8 and the new options (-funit-at-a-time should be included in -O2 by default). However, I do see a general improvement from 3.4.0, and for most games -O3 is a win. The exception is souledgb, which surprisingly runs fastest with gcc 3.3.3 -O3. dkong behaves similarly, but that game has a large margin of error between successive runs.
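The figures in this summary can be recomputed from the %speedup columns of the compiler comparison table. This is only a sketch: the 5% threshold mirrors the "generally less than 5%" remark, the two crashed games are left out, and the helper name is my own.

```python
# %speedup with gcc 3.4.0, copied from the table in game order
# (crusnusa ... xmen), with soldivid and stunrun omitted.
SPEEDUP_O2 = [7.33, -0.87, 2.25, 1.06, 6.64, 0.64, -0.04, 3.00, 1.01, 2.33,
              0.88, 2.63, 4.57, 0.85, 2.21, 6.84, 0.96, 4.49, 2.17]
SPEEDUP_O3 = [3.59, -4.93, 0.58, 8.19, -3.71, 2.31, 0.99, 2.56, 7.62, 5.94,
              0.54, 0.88, 2.85, -2.52, 0.56, -0.60, 2.13, 4.13, 0.53]


def summarize(values, threshold=5.0):
    """Best and worst change, and how many games fall below the threshold."""
    return {
        "best": max(values),
        "worst": min(values),
        "below_threshold": sum(v < threshold for v in values),
        "count": len(values),
    }


if __name__ == "__main__":
    for label, values in (("-O2", SPEEDUP_O2), ("-O3", SPEEDUP_O3)):
        s = summarize(values)
        print(f"{label}: best {s['best']:.2f} %, worst {s['worst']:.2f} %, "
              f"{s['below_threshold']} of {s['count']} games under 5 %")
```

Neither column contains a value at or above 10%, which is the basis for saying the 10-15% figure is never reached.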