This is a list of things we should do but haven't done
yet.

1) Implement some sort of Prime Factor Algorithm (Temperton's?).
   (PFA is now used in the codelets.)

2) Try the Winograd blocks for the base cases. (We now use
   Rader's algorithm for prime size codelets.)

3) Try on-the-fly generation of twiddle factors, to save space
   and cache. (Done, but not yet enabled in the standard
   distribution.  The codelet generator can generate code that
   either loads or computes the twiddle factors, and the FFTW C
   code supports both ways.  However, we do not yet have enough
   experimental numbers to determine which way is faster.)

4) Since we now have ``strided wisdom'', it would be nice to take
   the stride into account when planning 1D transforms recursively.
   We should eliminate the planner table altogether, and just use
   the wisdom table for planning.

5) Implement parallel real-complex transforms.
