## x86-64
nbody 4.86 6.48 5.43 5.40
spectral-norm 2.51 5.04 2.56 2.51
spectral-norm (5500) 3.04 6.07 3.10 3.06
fannkuch-redux 1.47 3.27 1.54 1.14
fannkuch-redux (12) 19.3 43.9 20.1 14.8
4x4 f32 matrix:
inverse 1872 4629
multiply 876 2654
transpose 291 770
test mandel_naive ... bench: 948,074 ns/iter (+/- 2,595)
test mandel_simd4 ... bench: 295,489 ns/iter (+/- 1,388)
## aarch64
test inverse_naive ... bench: 7,349 ns/iter (+/- 755)
test inverse_simd4 ... bench: 2,215 ns/iter (+/- 221)
test multiply_naive ... bench: 3,375 ns/iter (+/- 98)
test multiply_simd4 ... bench: 1,233 ns/iter (+/- 7)
test transpose_naive ... bench: 504 ns/iter (+/- 132)
test transpose_simd4 ... bench: 300 ns/iter (+/- 8)
test mandel_naive ... bench: 3,116,045 ns/iter (+/- 12,387)
test mandel_simd4 ... bench: 953,249 ns/iter (+/- 3,220)
fannkuch-redux 5.11 9.45
fannkuch-redux (12) 65.3 130
spectral-norm 7.85 7.91
spectral-norm (5500) 26.3 26.6
nbody 29.5 61.8
## arm
test inverse_naive ... bench: 13,788 ns/iter (+/- 4,540)
test inverse_simd4 ... bench: 6,906 ns/iter (+/- 1,027)
test multiply_naive ... bench: 8,178 ns/iter (+/- 11,187)
test multiply_simd4 ... bench: 2,033 ns/iter (+/- 22)
test transpose_naive ... bench: 1,023 ns/iter (+/- 24)
test transpose_simd4 ... bench: 545 ns/iter (+/- 742)
test mandel_naive ... bench: 2,639,999 ns/iter (+/- 7,036)
test mandel_simd4 ... bench: 885,036 ns/iter (+/- 2,876)
fannkuch-redux 3.36 4.50
fannkuch-redux (12) 42.8 70
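
The `test ... bench:` lines above are easier to compare as naive/simd4 ratios. The following is a minimal sketch of that arithmetic, assuming the lines keep the `test NAME ... bench: N ns/iter (+/- M)` shape shown here. The helper names `parse_ns_per_iter` and `speedup` are hypothetical and not part of any benchmark tooling.

```rust
/// Pull the benchmark name and the ns/iter figure out of one line such as
/// `test inverse_naive ... bench: 13,788 ns/iter (+/- 4,540)`.
/// Returns None if the line does not have that shape.
fn parse_ns_per_iter(line: &str) -> Option<(String, u64)> {
    let rest = line.trim().strip_prefix("test ")?;
    let (name, tail) = rest.split_once(" ... bench:")?;
    let (number, _) = tail.trim_start().split_once(" ns/iter")?;
    // The figure uses thousands separators, so drop the commas before parsing.
    let ns: u64 = number.replace(',', "").parse().ok()?;
    Some((name.to_string(), ns))
}

/// Ratio of the naive time to the simd4 time (values above 1.0 mean simd4 is faster).
fn speedup(naive_line: &str, simd_line: &str) -> Option<f64> {
    let (_, naive) = parse_ns_per_iter(naive_line)?;
    let (_, simd) = parse_ns_per_iter(simd_line)?;
    Some(naive as f64 / simd as f64)
}

fn main() {
    // Two lines copied from the arm results above.
    let naive = "test inverse_naive ... bench: 13,788 ns/iter (+/- 4,540)";
    let simd = "test inverse_simd4 ... bench: 6,906 ns/iter (+/- 1,027)";
    if let Some(s) = speedup(naive, simd) {
        println!("inverse: {:.2}x", s);
    }
}
```

The sketch ignores the `(+/- ...)` spread, so ratios for noisy runs (for example `multiply_naive` on arm) should be read with caution.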