xmame-0.81.1 on the K6-3+, compiled with gcc-3.3.3. Here we compare the preset flags with -march=i586 against -O2 with -march=i586 and against the preset flags with -march=k6. The -march=k6 setting used to trigger an internal compiler error (ICE) with gcc-3.2.1, but appears to work OK now.
GAME | PRESET | -O2 | -O2 with k6 arch |
pacman | 82.570656 | 82.151757 | 87.929299 |
tempest | 16.707578 | 16.489034 | 16.886475 |
samsho | 38.472931 | 37.568450 | 40.814139 |
ssf2t | 35.134646 | 34.969959 | 37.506007 |
xmen | 43.463369 | 42.061477 | 46.157284 |
mslugx | 36.002283 | 34.936075 | 38.027312 |
mk2 | 18.114097 | 18.099119 | 19.483827 |
Here we see a modest gain from using the preset flags (listed below) over -O2, but combining the preset flags with the K6-specific architecture optimization improves scores a good deal more: about 1% on tempest and roughly 6% to 8% on the other games. Clearly, optimizing for a specific pipeline rather than a generic "i586" is important when you want every ounce of speed from gcc-3.3.3.
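To put numbers on that improvement, here is a minimal Python sketch that works out the percentage gain of the k6 column over the preset column. The figures are copied from the table above; the names RESULTS, percent_gain and k6_gains are mine, made up for illustration, and are not part of xmame or its makefile.

```python
# Frame rates copied from the K6-3+ / gcc-3.3.3 table above.
# Tuple order follows the table: PRESET, -O2, and the k6 column
# (headed "-O2 with k6 arch", described in the text as the preset
# flags with -march=k6).
RESULTS = {
    "pacman":  (82.570656, 82.151757, 87.929299),
    "tempest": (16.707578, 16.489034, 16.886475),
    "samsho":  (38.472931, 37.568450, 40.814139),
    "ssf2t":   (35.134646, 34.969959, 37.506007),
    "xmen":    (43.463369, 42.061477, 46.157284),
    "mslugx":  (36.002283, 34.936075, 38.027312),
    "mk2":     (18.114097, 18.099119, 19.483827),
}


def percent_gain(baseline, candidate):
    """Return how much faster candidate is than baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def k6_gains(results):
    """Map each game to the gain of the k6 column over the PRESET column."""
    return {
        game: percent_gain(preset, k6)
        for game, (preset, _o2, k6) in results.items()
    }


if __name__ == "__main__":
    for game, gain in k6_gains(RESULTS).items():
        print(f"{game:8s} {gain:+.1f}%")
```

Swapping the indices in k6_gains gives the preset versus -O2 comparison in the same way.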
SYSTEM #1 SGI Octane w/gcc
SYSTEM #2 SGI Octane w/MIPSpro compiled version
SYSTEM #3 SGI O2 w/gcc
SYSTEM #4 SGI O2 w/MIPSpro compiled version
SYSTEM #5 AMD K6-3+
SYSTEM #6 AMD K6-2
SGI gcc builds:
CFLAGS = -O2 -Wall -Wno-unused -march=r5k -mabi=n32 -fomit-frame-pointer -fstrict-aliasing -fstrength-reduce -ffast-math
K6 builds:
CFLAGS = -O2 -Wall -Wno-unused -march=i586 -fomit-frame-pointer -fstrict-aliasing -fstrength-reduce -ffast-math -pipe
GAME | R10K 225 mipspro | R10K 225 gcc | R5K 300 mipspro | R5K 300 gcc | K6-3+ 450 | K6-2 500 |
pacman | 137.203003 | 117.940765 | 67.869871 | 64.536163 | 84.598798 | 47.205285 |
tempest | 39.997060 | 30.914372 | 17.670171 | 15.100367 | 16.651686 | 11.597895 |
samsho | 36.453655 | 27.079296 | 21.709560 | 18.922046 | 40.909001 | 29.152977 |
ssf2t | 34.426368 | 29.350826 | 18.807077 | 17.036615 | 36.107175 | 24.225178 |
xmen | 41.291096 | 32.940774 | 24.565539 | 22.719772 | 43.611827 | 30.895375 |
mslugx | 31.018900 | 22.625514 | 18.109691 | 16.165538 | 37.525928 | 26.111407 |
mk2 | 36.961467 | 30.826279 | 22.559168 | 19.232059 | 35.976701 | 24.091125 |
crusnusa | ??-4.932447?? | ??-6.897904?? | ???? | ???? | 7.152422 | 5.583496 |
kinst2 | 6.805552 | COREDUMP! | 5.547025 | ??-5.04973?? | 11.892078 | 11.128569 |
The comparison above illustrates the difference between a MIPSpro compile and a gcc 3.2.2 compile on the SGI machines, set alongside a slightly newer gcc compile on Linux. Now let's test things more fairly.
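As a rough way to size up the MIPSpro versus gcc gap in the table above, the sketch below divides the MIPSpro score by the gcc score for each game on both SGI machines. Only the seven games with clean numbers in every SGI column are included; crusnusa and kinst2 are left out because of the negative, missing and coredump entries. The names ROWS and mipspro_speedup are hypothetical, chosen just for this example.

```python
# Scores copied from the six-system table above, SGI columns only.
# Tuple order: R10K MIPSpro, R10K gcc, R5K MIPSpro, R5K gcc.
ROWS = {
    "pacman":  (137.203003, 117.940765, 67.869871, 64.536163),
    "tempest": (39.997060, 30.914372, 17.670171, 15.100367),
    "samsho":  (36.453655, 27.079296, 21.709560, 18.922046),
    "ssf2t":   (34.426368, 29.350826, 18.807077, 17.036615),
    "xmen":    (41.291096, 32.940774, 24.565539, 22.719772),
    "mslugx":  (31.018900, 22.625514, 18.109691, 16.165538),
    "mk2":     (36.961467, 30.826279, 22.559168, 19.232059),
}


def mipspro_speedup(rows):
    """Map each game to (R10K ratio, R5K ratio) of MIPSpro over gcc."""
    return {
        game: (r10k_mips / r10k_gcc, r5k_mips / r5k_gcc)
        for game, (r10k_mips, r10k_gcc, r5k_mips, r5k_gcc) in rows.items()
    }


if __name__ == "__main__":
    for game, (r10k, r5k) in mipspro_speedup(ROWS).items():
        print(f"{game:8s} R10K x{r10k:.2f}  R5K x{r5k:.2f}")
```

A ratio above 1.0 means the MIPSpro build was faster than the gcc build on that machine.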
First we hold gcc versions constant and try different CFLAGS.
Tests performed on the K6-3+
gcc 2.95.3
PRESET:
CFLAGS = -O2 -Wall -Wno-unused -march=i586 -fomit-frame-pointer -fstrict-aliasing -fstrength-reduce -ffast-math -pipe
-O2:
CFLAGS = -O2 -Wall -Wno-unused -march=i586
-O3:
CFLAGS = -O3 -Wall -Wno-unused -march=i586
GAME | PRESET | -O2 | -O3 |
pacman | 84.598798 | 80.678981 | 80.058366 |
tempest | 16.651686 | 16.600993 | 16.581389 |
samsho | 40.909001 | 39.387096 | 39.637885 |
ssf2t | 36.107175 | 35.793585 | 36.161937 |
xmen | 43.611827 | 43.051415 | 42.871190 |
mslugx | 37.525928 | 36.294775 | 36.812401 |
mk2 | 35.976701 | 34.635191 | 34.669343 |
crusnusa | 7.152422 | 6.881578 | 6.067334 |
kinst2 | 11.892078 | 11.017038 | 11.604298 |
For gcc 2.95.3 the preset flags in makefile.unix are generally better than -O2 or -O3 alone. The only exception was ssf2t, where -O3 was about 0.05 fps better. -O3 helps newer games somewhat and hurts older games slightly, with early '90s games somewhere in between.
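The "only exception" claim is easy to check mechanically. The sketch below is just one way to do it: it lists every game where the preset column is not the fastest, together with the winning flag set and its margin. The data is copied from the gcc 2.95.3 table above; GCC_2953 and preset_exceptions are names I invented for the example.

```python
# Column order of the gcc 2.95.3 table above.
FLAGS = ("PRESET", "-O2", "-O3")

GCC_2953 = {
    "pacman":   (84.598798, 80.678981, 80.058366),
    "tempest":  (16.651686, 16.600993, 16.581389),
    "samsho":   (40.909001, 39.387096, 39.637885),
    "ssf2t":    (36.107175, 35.793585, 36.161937),
    "xmen":     (43.611827, 43.051415, 42.871190),
    "mslugx":   (37.525928, 36.294775, 36.812401),
    "mk2":      (35.976701, 34.635191, 34.669343),
    "crusnusa": (7.152422, 6.881578, 6.067334),
    "kinst2":   (11.892078, 11.017038, 11.604298),
}


def preset_exceptions(table):
    """Return {game: (winning flag set, margin)} where PRESET is not fastest."""
    exceptions = {}
    for game, scores in table.items():
        best = max(scores)
        if best > scores[0]:  # scores[0] is the PRESET column
            exceptions[game] = (FLAGS[scores.index(best)], best - scores[0])
    return exceptions


if __name__ == "__main__":
    for game, (flag, margin) in preset_exceptions(GCC_2953).items():
        print(f"{game}: {flag} wins by {margin:.3f}")
```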
gcc 3.2.2
GAME | PRESET | -O2 | -O3 |
pacman | 83.793083 | 84.312677 | 84.895467 |
tempest | 16.763369 | 17.252698 | 16.642994 |
samsho | 39.368409 | 39.609946 | 39.928090 |
ssf2t | 35.510030 | 35.624199 | 35.970064 |
xmen | 42.589371 | 43.566596 | 43.614736 |
mslugx | 35.831199 | 36.133992 | 36.989401 |
mk2 | 36.452126 | 36.239598 | 36.224637 |
crusnusa | 7.305772 | 6.981159 | 7.239103 |
kinst2 | 11.418256 | 10.902094 | 10.231836 |
Strange behavior! Here we see the newest games (mk2, crusnusa, kinst2) again benefiting from the preset CFLAGS, but -O3 generally has the best performance by a slim margin. However, -O3 miscompiles on the Octane (see below), so there is clearly a danger in using it on alternative architectures. This makes it hard to recommend a single set of CFLAGS for gcc 3.2.2. -O2 is a good compromise except for newer games.
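To see how the wins split between the three flag sets, a quick tally helps. This is only a sketch over the gcc 3.2.2 numbers copied from the table above; GCC_322 and best_flags are hypothetical names, and a win here just means the highest score in the row, however slim the margin.

```python
from collections import Counter

# Column order of the gcc 3.2.2 table above.
FLAGS = ("PRESET", "-O2", "-O3")

GCC_322 = {
    "pacman":   (83.793083, 84.312677, 84.895467),
    "tempest":  (16.763369, 17.252698, 16.642994),
    "samsho":   (39.368409, 39.609946, 39.928090),
    "ssf2t":    (35.510030, 35.624199, 35.970064),
    "xmen":     (42.589371, 43.566596, 43.614736),
    "mslugx":   (35.831199, 36.133992, 36.989401),
    "mk2":      (36.452126, 36.239598, 36.224637),
    "crusnusa": (7.305772, 6.981159, 7.239103),
    "kinst2":   (11.418256, 10.902094, 10.231836),
}


def best_flags(table):
    """Map each game to the flag set with the highest score in its row."""
    return {
        game: FLAGS[scores.index(max(scores))]
        for game, scores in table.items()
    }


if __name__ == "__main__":
    winners = best_flags(GCC_322)
    for flag, count in Counter(winners.values()).most_common():
        games = [g for g, f in winners.items() if f == flag]
        print(f"{flag:6s} {count}  {', '.join(games)}")
```

The tally does not weigh how large each win is, so it should be read next to the raw scores rather than instead of them.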
Tests performed on the Octane
PRESET:
CFLAGS = -O2 -Wall -Wno-unused -march=r5k -mabi=n32 -fomit-frame-pointer -fstrict-aliasing -fstrength-reduce -ffast-math
-O2:
CFLAGS = -O2 -Wall -Wno-unused -march=r5k -mabi=n32
-O3:
CFLAGS = -O3 -Wall -Wno-unused -march=r5k -mabi=n32
GAME | PRESET | -O2 | -O3 |
pacman | 117.940765 | 118.288844 | hang on black screen |
tempest | 30.914372 | 30.968461 | miscompile--vertex coords all wrong |
samsho | 27.079296 | 26.900544 | hang on BIOS mess |
ssf2t | 29.350826 | 29.360782 | hang on black |
xmen | 32.940774 | 32.970164 | hang on "bad" ROM screen |
mslugx | 22.625514 | 22.922950 | hang on BIOS mess |
mk2 | 30.826279 | 30.384780 | hang with corrupt screen |
crusnusa | negative number | ??-7.15?? | -6.80??? |
kinst2 | coredump | coredump! | hang |
First, let me note that kinst2 hangs on a black screen even with -O and no architecture flags! I have not gotten kinst2 to run correctly with gcc 3.2.2 on the Octane at all. Additionally, I get an internal compiler error using N64 mode with the preset CFLAGS, so as yet I cannot compare 64-bit performance on the Octane. (The libraries are installed.) Obviously -O3 kills us here: not one game produces a usable result. There isn't much difference between the preset CFLAGS and -O2; on the seven games with valid scores the two stay within about 1.5% of each other.