fftw runtime cpu detection
Eric Bavier
2018-04-05 22:13:29 UTC
Hello Guix,

I recently discovered that the FFTW library can do runtime cpu
detection. In order to do this, the package needs to be configured to
build SIMD "codelets", like how our 'fftw-avx' currently does. Then,
based on the instruction support detected at runtime, make those
kernels available to the fftw "planner" for execution.

I tested this on two systems: 1) system with sse2, and 2) system with
avx2. I configured the library with "--enable-sse2 --enable-avx
--enable-avx2", then ran the following on both systems:

1)
$ ./tests/bench --verbose=3 --verify 'ibcd11x7x6v10'
Planning ibcd11x7x6v10...
using plan_many_dft
estimate-planner time: 0.004355 s
using plan_many_dft
planner time: 0.035684 s
(dft-rank>=2/1
(dft-vrank>=1-x11/1
(dft-rank>=2/1
(dft-vrank>=1-x7/1
(dft-direct-6-x10 "n1bv_6_sse2"))
(dft-direct-7-x60 "n1bv_7_sse2")))
(dft-direct-11-x420 "n1bv_11_sse2"))
flops: 36800 add, 9700 mul, 26260 fma
estimated cost: 99057.699080, pcost = 115706.000000
ibcd11x7x6v10 4.33362e-16 7.27264e-16 8.46842e-16

2)
$ ./tests/bench --verbose=3 --verify 'ibcd11x7x6v10'
Planning ibcd11x7x6v10...
using plan_many_dft
estimate-planner time: 0.001485 s
using plan_many_dft
planner time: 0.025788 s
(dft-rank>=2/1
(dft-rank>=2/1
(dft-vrank>=1-x77/1
(dft-direct-6-x10 "n1bv_6_sse2"))
(dft-vrank>=1-x11/1
(dft-direct-7-x60 "n1bv_7_avx")))
(dft-direct-11-x420 "n1bv_11_avx"))
flops: 12280 add, 2810 mul, 6950 fma
estimated cost: 28996.283180, pcost = 40767.000000
ibcd11x7x6v10 2.24601e-07 3.90447e-07 2.42548e-07


The attached patch is a WIP.
--
Eric Bavier, Scientific Libraries, Cray Inc.
Chris Marusich
2018-04-06 07:54:19 UTC
Post by Eric Bavier
I recently discovered that the FFTW library can do runtime cpu
detection.
Cool! I'm not familiar with this library, but the patch seems pretty
reasonable to me.
Post by Eric Bavier
In order to do this, the package needs to be configured to build SIMD
"codelets", like how our 'fftw-avx' currently does. Then, based on
the instruction support detected at runtime, make those kernels
available to the fftw "planner" for execution.
So, if we choose the right configure flags at build time (for the given
architecture), then at runtime, the software will detect the CPU and
either perform better or make more features available. Is that right?

I see you added the "--enable-sse" configure flag, and it was not present
before. Why did you add it?
--
Chris
Eric Bavier
2018-04-06 15:08:46 UTC
Post by Chris Marusich
Post by Eric Bavier
I recently discovered that the FFTW library can do runtime cpu
detection.
Cool! I'm not familiar with this library, but the patch seems pretty
reasonable to me.
Thanks for looking at it.
Post by Chris Marusich
Post by Eric Bavier
In order to do this, the package needs to be configured to build SIMD
"codelets", like how our 'fftw-avx' currently does. Then, based on
the instruction support detected at runtime, make those kernels
available to the fftw "planner" for execution.
So, if we choose the right configure flags at build time (for the given
architecture), then at runtime, the software will detect the CPU and
either perform better or make more features available. Is that right?
That's the idea, yes. The simd kernels will execute if the cpu
supports them and the fftw planner finds they are faster in practice
than other kernels.
Post by Chris Marusich
I see you added the "--enable-sse" configure flag, and it was not present
before. Why did you add it?
In the documentation, I had seen it listed as the simd flag for
single-precision. But now I see a comment in configure.ac that says
the --enable-sse2 flag Does The Right Thing when --enable-float is
given, so it can be left out. Thanks for checking!
--
Eric Bavier, Scientific Libraries, Cray Inc.
Ludovic Courtès
2018-04-06 08:05:43 UTC
Hello Eric,
Post by Eric Bavier
I recently discovered that the FFTW library can do runtime cpu
detection. In order to do this, the package needs to be configured to
build SIMD "codelets", like how our 'fftw-avx' currently does. Then,
based on the instruction support detected at runtime, make those
kernels available to the fftw "planner" for execution.
That’s really good news! Thanks for testing it.

The patch LGTM. Can you confirm that the planner won’t ever try to use
the AVX2 codelets, for instance when running the test suite on an x86_64
box that lacks AVX2?

If that’s the case, I’d be in favor of pushing this patch to core-updates.

Thanks,
Ludo’.
Eric Bavier
2018-04-06 15:02:32 UTC
Post by Ludovic Courtès
Hello Eric,
Post by Eric Bavier
I recently discovered that the FFTW library can do runtime cpu
detection. In order to do this, the package needs to be configured to
build SIMD "codelets", like how our 'fftw-avx' currently does. Then,
based on the instruction support detected at runtime, make those
kernels available to the fftw "planner" for execution.
That’s really good news! Thanks for testing it.
The patch LGTM. Can you confirm that the planner won’t ever try to use
the AVX2 codelets, for instance when running the test suite on an x86_64
box that lacks AVX2?
Yes, I've successfully run 'make check' on an sse2-only machine where
'--enable-avx' and '--enable-avx2' were configured. I'll check on an
i686 machine tonight.
Post by Ludovic Courtès
If that’s the case, I’d be in favor of pushing this patch to core-updates.
Great. I'll do some more testing. Should I send a finalized patch to
guix-patches when it's ready?
--
Eric Bavier, Scientific Libraries, Cray Inc.
Ludovic Courtès
2018-04-06 15:09:21 UTC
Post by Eric Bavier
Post by Ludovic Courtès
Hello Eric,
Post by Eric Bavier
I recently discovered that the FFTW library can do runtime cpu
detection. In order to do this, the package needs to be configured to
build SIMD "codelets", like how our 'fftw-avx' currently does. Then,
based on the instruction support detected at runtime, make those
kernels available to the fftw "planner" for execution.
That’s really good news! Thanks for testing it.
The patch LGTM. Can you confirm that the planner won’t ever try to use
the AVX2 codelets, for instance when running the test suite on an x86_64
box that lacks AVX2?
Yes, I've successfully run 'make check' on an sse2-only machine where
'--enable-avx' and '--enable-avx2' were configured. I'll check on an
i686 machine tonight.
OK.
Post by Eric Bavier
Post by Ludovic Courtès
If that’s the case, I’d be in favor of pushing this patch to core-updates.
Great. I'll do some more testing. Should I send a finalized patch to
guix-patches when it's ready?
If Marius has no objections, I think you could push it directly to
core-updates.

Thank you,
Ludo’.
Marius Bakke
2018-04-06 18:37:42 UTC
Post by Ludovic Courtès
Post by Eric Bavier
If that’s the case, I’d be in favor of pushing this patch to core-updates.
Great. I'll do some more testing. Should I send a finalized patch to
guix-patches when it's ready?
If Marius has no objections, I think you could push it directly to
core-updates.
Sounds good to me. I just pushed a couple of full-rebuild commits to
fix bootstrap-tarballs, so the Big Rebuild is still some days off.
Eric Bavier
2018-04-17 21:29:48 UTC
Hello Guix,

On Fri, 06 Apr 2018 20:37:42 +0200
Post by Marius Bakke
Post by Ludovic Courtès
Post by Eric Bavier
If that’s the case, I’d be in favor of pushing this patch to core-updates.
Great. I'll do some more testing. Should I send a finalized patch to
guix-patches when it's ready?
If Marius has no objections, I think you could push it directly to
core-updates.
Sounds good to me. I just pushed a couple of full-rebuild commits to
fix bootstrap-tarballs, so the Big Rebuild is still some days off.
I just pushed commit 65bb22796f854cbc3eae053a80b1d64365dad376 to
core-updates. I built a good portion of fftwf's dependents to check
things out. The dependent audio libraries seemed to pass their tests
on the build machine (an avx cpu), which gives me confidence.

~Eric
Ludovic Courtès
2018-04-18 21:36:33 UTC
Post by Eric Bavier
Hello Guix,
On Fri, 06 Apr 2018 20:37:42 +0200
Post by Marius Bakke
Post by Ludovic Courtès
Post by Eric Bavier
Post by Ludovic Courtès
If that’s the case, I’d be in favor of pushing this patch to core-updates.
Great. I'll do some more testing. Should I send a finalized patch to
guix-patches when it's ready?
If Marius has no objections, I think you could push it directly to
core-updates.
Sounds good to me. I just pushed a couple of full-rebuild commits to
fix bootstrap-tarballs, so the Big Rebuild is still some days off.
I just pushed commit 65bb22796f854cbc3eae053a80b1d64365dad376 to
core-updates. I built a good portion of fftwf's dependents to check
things out. The dependent audio libraries seemed to pass their tests
on the build machine (an avx cpu), which gives me confidence.
Awesome, thank you!

Ludo’.
