From Newsgroup: comp.dsp
On 22.11.19 at 19:04, MatthewA wrote:
> Gosh, Steve... apologies for my naivety. The only implementation I've seen has been recursive. I'm a bit of a noob in this regard. I'm definitely fine with computing these frames with a high degree of latency. The idea here is that we trade immediacy for spectral accuracy because, in my use case, latency isn't a huge problem as long as it stays under a second or two. In my current implementation I use a low-priority thread and do it on the graphics processor, since it could be up to a few seconds of 44.1 kHz audio.
> The point of the original question is: how can I spread the computation of an FFT over multiple vector-based performance routine calls? (I'm not sure of the exact term for that.)
Ah, so the "real question" is: How can I efficiently compute an FFT on a parallel vector processor like a GPU?
Unfortunately, that question is nontrivial because, as you have
discovered, there is a lot of dependency between the input and output
values. Every input value of an FFT influences every output value,
which makes parallel processing hard.
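To make that dependency structure concrete, here is a minimal pure-Python sketch (not GPU code) of an iterative radix-2 FFT split into log2(N) stages. The names `bit_reverse`, `fft_stage` and `fft_in_stages` are hypothetical, and the input length is assumed to be a power of two. The all-to-all dependency only builds up across the stages: inside one stage the N/2 butterflies touch disjoint pairs of elements, so a stage could be one vectorized pass, and successive stages could be spread over successive processing calls.

```python
import cmath
import math


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_stage(a, stage):
    """Apply one radix-2 decimation-in-time butterfly stage to list a, in place.

    Each butterfly reads and writes only its own pair of elements, so the
    len(a) // 2 butterflies of one stage are independent of each other.
    """
    n = len(a)
    half = 1 << stage   # distance between the two inputs of a butterfly
    span = half << 1    # length of the sub-transforms merged in this stage
    for start in range(0, n, span):
        for k in range(half):
            w = cmath.exp(-2j * math.pi * k / span)  # twiddle factor
            top = a[start + k]
            bottom = a[start + k + half] * w
            a[start + k] = top + bottom
            a[start + k + half] = top - bottom


def fft_in_stages(x):
    """Radix-2 FFT computed as log2(len(x)) separate stage passes."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Bit-reversed input order lets every stage work in place.
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    # Each stage needs the previous one finished, but could run in its own call.
    for stage in range(bits):
        fft_stage(a, stage)
    return a
```

For example, the impulse `[1, 0, 0, 0]` yields a flat spectrum of four ones. The stage loop in `fft_in_stages` is the natural place to split the work: a hypothetical scheduler could run one `fft_stage` per callback and deliver the spectrum after the last one.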
You have several options, depending on your input length, that might
speed things up.
1) Do you really need the full FFT, or just a few individual bins? If
only a few bins are required, a direct evaluation of the DFT sum can be
quite efficient on the GPU: precompute the sines/cosines for those
frequency bins and take the dot product with the input directly.
A "true" FFT only pays off if you want the full transform. There is even a special algorithm for computing single DFT bins, the Goertzel algorithm.
2) Do you perhaps have multiple FFTs that you can process in parallel? In particular, a 2D FFT is computed by running 1D FFTs on every row and
then on every column, and all rows (then all columns) can be processed in parallel.
3) There are open-source libraries for FFTs on GPUs, e.g.
https://github.com/clMathLibraries/clFFT for OpenCL, based on some early examples by Apple. The code is not that straightforward and it requires multiple passes on the host compiler, so your mileage may vary.
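For the single-bin case in option 1, here is a minimal pure-Python sketch of both approaches: a direct dot product against a precomputed table of complex exponentials (cos minus j sin), and the Goertzel recurrence. The helper names `make_bin_table`, `dft_bin_direct` and `goertzel_bin` are hypothetical; the sketch assumes an integer bin index 0 <= k < N and the usual convention X[k] = sum over t of x[t] e^{-2 pi j k t / N}. Each bin is independent of the others, so many bins could be evaluated in parallel.

```python
import cmath
import math


def make_bin_table(n, k):
    """Precompute e^{-2*pi*j*k*t/n} = cos(...) - j*sin(...) for t = 0..n-1."""
    return [cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]


def dft_bin_direct(x, table):
    """One DFT bin as a plain dot product of the input with a precomputed table."""
    return sum(sample * w for sample, w in zip(x, table))


def goertzel_bin(x, k):
    """DFT bin X[k] via the Goertzel recurrence (one real multiply per sample)."""
    n = len(x)
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Combine the last two states; the e^{j*omega} factor makes the phase match the DFT.
    return cmath.exp(1j * omega) * s_prev - s_prev2
```

For an 8-sample cosine `x[t] = cos(2*pi*2*t/8)`, both functions return approximately 4 (that is, N/2) for bin 2. The direct version costs a complex multiply-add per sample but needs the table in memory; Goertzel only keeps two running states, which is why it is popular when just a handful of bins are needed.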
Christian