Figure 1: Coefficients and their waveforms.
G'MIC implements the Discrete Fast Fourier Transform and its inverse through filter commands -fft and -ifft. -fft runs the Fourier analysis transform which expresses images, indeed any arbitrary dataset, as sets of sinusoidial waves of various frequencies, phasings and intensities, these in the form of a coefficient field that comprises the spectral domain.
-ifft performs the inverse synthesis transform: given a field of coefficients, it calculates an array of waveforms that constructively reinforce or destructively interfere with each other to reproduce an image – or any other dataset – in the spatial domain.
Together, the commands switch representations of an image between the more familiar spatial domain, where cats look like cats and dogs look like dogs, and the spectral domain, where both creatures (and everything else) are encoded in a field of coefficients, each representing a particular waveform. In this less familiar domain, many frequency and detail oriented editing operations are quite straightforward, where they would be difficult or impossible to do in the spatial domain.
Both commands are defined over complex number fields and, consequently, operate pairwise on their selected images, interpreting each of the pair as real (ax,y) and imaginary (ibx,y) fields, which are then taken together to represent a single complex number field. However, for odd-numbered selections, both commands assume an implicit imaginary field for the last image, this populated with zeros.
Both commands adapt to the number of dimensions in the image, performing three dimensional transforms for images with more than one slice, two dimensional transforms for single slice images, and one dimensional transforms for single row or column images.
Each command may take one optional argument, a string consisting of axis identifiers 'x', 'y' and 'z'. If present, the argument restricts the transform to just those cited axes and a dataset nominally of a particular dimension may be evaluated as a collection of lower-dimensioned datasets. For example, with the argument 'xy', a pair of single channel images with dimensions 256,256,256,1, nominally a three dimensional complex number field, may be viewed as a sequence of 256 two dimensional complex number fields. Each z-increment indexes a two dimensional 'xy' dataset which is transformed independently of the others. Similarly a pair of 256,256,1,1 single channel images on the image list can be processed as 256 one dimensional datasets. 'x' indicates a row-by-row organization, 'y' indicates column-by-column.
Figure 2: A complex number with polar and rectangular notations.
Beginning in the late 18th century, Jean-Baptiste Joseph Fourier and others began developing the idea that a quantity varying over time could be expressed as a collection of sinusoidal waves. These, musicians may recognize, are the 'pure' tones that lack harmonics. Uniquely among waveforms, sinusoidals of equal frequency, but varying phases and amplitudes, may be added like vectors under the Parallelogram Rule, the results being other sinusoidals. The idea that some fairly wooly problems, such as heat transfer, could be re-cast into sets of sinusoidal waves, tractible and well-understood, keenly appealed to Fourier and others and the scheme developed throughout the 19th and 20th centuries, its themes recurring in other areas of study. Variations through time do not fundamentally differ from variations of intensity, color or any other phenomena, so the machinery of Fourier transforms readily fits with many toolkits, including image processing.
Initially, however, there were practical difficulties. Fourier analysis and synthesis transforms, initially implemented from their formal definitions, were expensive operations. To compute the transform of each point in the destination domain, every point in the source domain must be evaluated. Thus, for a 5×5 domain, each of the twenty five points call for twenty five evaluations, 25 × 25 = 625 in aggregate, and generally the number of requisite computations grows with the square of the size of the domain. Until about a half century ago, this limited the application of the transforms to small problems.
In the mid-twentieth century, James Cooley and John Tukey developed an algorithm which exploited recurrence patterns of complex roots of unity, enabling them to develop a divide-and-conquer technique that greatly reduced the number of operations for Fourier analysis and synthesis. Large-scale datasets involving hundreds of thousands of points could be transformed with modestly endowed computers. With this 'Fast Fourier Transform' it became straightforward to decompose a time varying signal into its frequency, or spectral components, or translate spectra into time-varying signals.
In both the spatial and spectral domains, the basic data units are complex numbers. In polar form, these have magnitude (m) and angular (θ) parts, their polar coordinates. In rectangular form, they have real (a) and imaginary (ib) parts, the vector projections onto the real and imaginary axes (Figure 2). By the Pythagorean Theorem, these forms are equivalent, but the polar form expresses rotation-related concepts better than its rectangular counterpart.
Both domains may harness any number of cardinal directions to convey ideas of measure along lines (one dimensional) surfaces (two dimensional) volumes (three dimensional), and, (in theory) upward to any number of higher dimensions. -fft and -ifft, however, do not proceed past three dimensional problems and operate in finite-sized discrete domains, this in tune with the discrete pixel structure of G'MIC images.
In finite discrete domains, such as the ones in which G'MIC's -fft and -ifft operate, the spectral and spatial domains have reference points, or origins, upon which we fix indivisible grids and along which we sample or set complex quantities. 'Discrete' means change in any cardinal direction occurs only by quantum leaps, conventionally of unit size. We distinguish points by counting from the origin the number of steps taken in each cardinal direction to reach their location, these becoming the point's coordinates. By convention, we call observed or set values at spectral domain points 'coefficients'.
Being finite, we can only go so far from domain origins before wrapping around and approaching them from 'the other side.' The one dimensional domains are circular, and embedded in surfaces. By extension, two dimensional domains are tori embedded in volumes, and their three dimensional counterparts are hyper-tori embedded in four dimensional spaces. These embedded domains are more easily visualized when unrolled, the circle cut and uncurled into a line segment, the torus into a rectangle and the hyper-torus into a cuboid, but connectedness remains, so as we wander off edges, we magically find ourselves stepping over door sills on the opposite side.
Figure 3: Figure 2 coefficient as a waveform
The spectral and spatial domains apply different meanings to points. In the spatial domain, points quantify conditions at discrete distances from the origin. In the spectral domain, they quantify conditions at particular oscillations. The spatial domain's origin represents zero distance; the spectral domain origin represents zero frequency.
Extra terminology graces the spectral domain. The origin is called the Zero (frequency) Pole, and the coefficient at the furthest remove in every cardinal direction from the Zero Pole is the Nyquist Pole. The Zero Pole quantifies a zero frequency, infinite wavelength signal corresponding to the spatial domain's net average value. The Nyquist Pole quantifies the highest oscillation that can be resolved under Fourier synthesis, given the size of that particular spectral domain. Ever larger spans of coefficients between the Zero and Nyquist Poles permit ever higher frequency characterizations, but at the cost of a larger amount of computation.
Dispite their visual dissimilarities, the spatial and spectral domains embody one another, the spectral coefficient and its spatial waveform encoding the same things. To see this, consider the complex quantity illustrated in Figure 2, 283.277 / 63.2°. We set this value to the spectral coefficient of one, this in a one dimensional spectral domain, leaving all other coefficients zero. Under the Fourier synthesis of this spectral domain to its spatial counterpart, this single coefficient emerges as a complex sinusoidal waveform (Figure 3). Defined on a complex field, it has real (a: blue) and imaginary (ib: red) parts, these plotted in a normalized way where a phase angle of 360° compares to one wavelength.
The coefficient's presence is still manifest. First, the peak-to-peak swing of the waveform is 283.277, exactly the magnitude of the coefficient. The angular argument of the coefficient shows up as a 63.2° offset counterclockwise from the origin, a phase shift, indexing a point where the real part is exactly one half of the magnitude of the coefficient and a vanished imaginary part (marked by the right hand green line). 90° clockwise from this point finds a data bit with a vanished real part and the imaginary part equal to one half of the coefficient's magnitude. It too has a phase shift of 63.2°, but measured from the 90° mark.
As noted earlier, this sinusoidal wave behaves like a vector, as does the coefficient, and this is a providential quality, as there is one other coefficient in this one dimensional spectral domain that is one step from the Zero Pole, but in the opposite direction. It too corresponds a frequency of one cycle. In the present example, its value was zero and so was of no consequence, but generally, the precise qualities of the one cycle spatial domain waveform arises from the vector sum of two coefficients, each a mirror of the other across the Zero Pole reflection axis.
Figure 4: Pairs of coefficients giving rise to waveforms of the same frequency mirror each other in the spectral domain.
As a general rule, for however many cardinal directions a spectral domain may have, we find that nearly all coefficients have mirror counterparts, or chirals, these found by counting the same number of steps from the Zero Pole, along each cardinal axis, but, for each axis, choosing the opposite direction. Figure 4 illustrates examples from the two dimensional spectral domain. Each turquoise coefficient is ten steps in the horizontal direction from the Zero Pole and three in the vertical direction, their displacements differing in sign. Each engender spatial waveforms so oriented as to exhibit three oscillations across the vertical span of the spatial domain and ten oscillations across the horizontal span, with each transverse to the line connecting the two coefficients. The aggregate waveform oscillating along this connecting line is the vector sum of the two component waveforms engendered by the turquoise coefficients. The two magenta coefficients establish similar circumstances for a pair of waveforms oscillating three horizontal and five vertical cycles, the waveform oscillating along their connecting line being the vector sum of these component waveforms. The overall spatial image is the interference pattern of the turquoise and magenta systems.
In the one dimensional case of Figures 2 and 3, the mirror image of coefficient 1 is N − 1, with N the number of coefficients comprising the spectral domain. Should we set both coefficients to the same complex value, say, the one depicted in Figure 2 (283.277 / 63.2°), and execute a Fourier synthesis, the two coefficients do not generate identical waveforms, but mirror images. The real part of one phase shifts counterclockwise 63.2º, its imaginary part trailing by 90°, the other's phase shifts clockwise 63.2º, its imaginary part leading by 90°. The first waveform, that of coefficient 1, mirrors the other, that of coefficient N − 1. These two waveforms, with identical frequencies and propagation directions, but with mirror phasings, combine vectorially to produce the composite one cycle frequency waveform. Both mirrored coefficients, at identical steps from the origin, work in concert to generate the one cycle waveform. With few exceptions, most other waveforms arise from similar coefficient pairings.
In every spectral domain, a few coefficients do not have mirror counterparts. These are the coefficients which sit at the intersections of cardinal axes, such as the Zero and Nyquist Poles. There are 2d such coefficients for a spectral domain of dimension d and along each cardinal axis, each engender either a zero or Nyquist frequency oscillation.
Figure 5: Waveforms courtesy of -srand 123456 -input 1024,1,1,1 -turbulence[-1] 16,3,2,0,0 -normalize 0,1
The real world, which we visit from time to time, offers more than pristine sinusoidals. Figure 4 presents a typical waveform of the wooly sort, this generated from G'MIC's -turbulence command. It could very well be a row or column from an image.
A typical question that we might ask about this waveform would be: "What are the simple sinusoidal waveforms which constitute this data set?" It is the question that Fourier analysis attempts to answer through a transform into the spectral domain. The magnitudes, phase angles, and the orderly distribution of coefficients in that domain by frequency say much about how strong long wavelength oscillations are compared to short wavelength ones.
Perhaps we have an image scanned from a magazine or newspaper and the highly regular halftone screen shows up in the spectral domain as a cluster of large coefficients around a particular frequency. Downscaling just those coefficients, leaving others alone, may minimize the halftone pattern without otherwise disturbing the image. Being vectors, such coefficients are computationally tractable, and, so altered, the modified coefficients can be synthesized into a new version of the image with few halftoning artifacts.
Before we perform such a transform, a remark is in order. The data from -turbulence, typical of images, has no imaginary component for with images such has no meaning. However, Fourier transforms are defined and necessarily operate on complex fields. Fortunately, no loss of generality occurs by augmenting our real-only data with fiat imaginary data, all set to zero. That is reflected in the red flat line in Figure 5. We can do this because the real line, where image data usually lives, is also a subset of the complex plane: Real numbers are just complex numbers with their imaginary parts set to zero.
Figure 6: Left and Right Spectral Space Extremes
coef. | a | ib | coef. | a | ib | ||
---|---|---|---|---|---|---|---|
0 | 0.639 | + | 0.000i | — | — | — | — |
1 | 0.018 | + | 0.043i | 1023 | 0.018 | − | 0.043i |
2 | 0.039 | − | 0.001i | 1022 | 0.039 | + | 0.001i |
3 | 0.022 | − | 0.050i | 1021 | 0.022 | + | 0.050i |
4 | 0.025 | + | 0.041i | 1020 | 0.025 | − | 0.041i |
5 | -0.029 | + | 0.022i | 1019 | -0.029 | − | 0.022i |
6 | 0.028 | + | 0.037i | 1018 | 0.028 | − | 0.037i |
7 | 0.054 | − | 0.022i | 1017 | 0.054 | + | 0.022i |
8 | -0.048 | + | 0.045i | 1016 | -0.048 | − | 0.045i |
9 | 0.067 | + | 0.015i | 1015 | 0.067 | − | 0.015i |
10 | 0.020 | − | 0.007i | 1014 | 0.020 | + | 0.007i |
11 | 0.024 | + | 0.060i | 1013 | 0.024 | − | 0.060i |
12 | 0.003 | + | 0.003i | 1012 | 0.003 | − | 0.003i |
13 | 0.034 | − | 0.010i | 1011 | 0.034 | + | 0.010i |
14 | 0.005 | + | 0.045i | 1010 | 0.005 | − | 0.045i |
15 | 0.014 | + | 0.030i | 1009 | 0.014 | − | 0.030i |
16 | -0.018 | − | 0.014i | 1008 | -0.018 | + | 0.014i |
17 | -0.008 | − | 0.002i | 1007 | -0.008 | + | 0.002i |
18 | 0.017 | + | 0.006i | 1006 | 0.017 | − | 0.006i |
19 | 0.021 | + | 0.033i | 1005 | 0.021 | − | 0.033i |
Table 1: Comparison of Mirrored Coefficients Around The Zero Pole
The spectral domain coefficients found through the Fourier analysis of 1024 discrete samples of -turbulence data (Figure 5) is partially depicted in Figure 6, which omits all but the first and last fifty coefficients, the elided 924 that ranged around the Nyquist Pole varied around zero by less than ± 0.001. This elision immediately tells us about an essential quality of the waveform; it is mostly a low-frequency signal, more of a bass note than a treble; an oboe, not a piccolo. The high frequency coefficients make a negligible contribution to the waveform.
In contrast, the large zero coefficient magnitude tells us that the waveform has an overall positive bias. This should not surprise us. Images from paint programs are mainly stored as unsigned integers and so reflect a positive average value. This is certainly the case with the waveform depicted in Figure 5, which ranges from zero to one with an average intensity of 0.32.
A less obvious trait is the mirror symmetry exhibited in the values of those coefficients other than the Zero Pole. Coefficient 1 is the complex conjugate of its chiral counterpart, coefficient N-1. That is, the real parts of the two coefficients are equal, the imaginary parts are equal in magnitude, but opposite in sign. Visually, the coefficient is the reflection of its chiral across the x axis: where it may be rotated by 32°, its chiral is rotated −32°, otherwise they are the same magnitude. In the same vein, coefficient 2 is the complex conjugate of N-2, 3 with N-3, and so on. Table 1 shows this for the first and last twenty coefficients of Figure 6.
The Cookbook article Coefficient Values: Part One, investigates this phenomenon in detail. Our purposes are fufilled by observing that the vector sum of complex conjugates finds the equal but opposite signed imaginary parts cancelling each other out, but the equal real parts reinforcing each other; the results are vectors with just real components.
This too should not surprise us. This is a matter of Fourier synthesis reproducing what we initially submitted to Fourier analysis: a data set with a fiat imaginary part set to zero. The chiral pairs in the spectral domain reflect this fiat data by becoming complex conjugates of one another. Had we been possessed of more interesting imaginary data, different results would have obtained.
-fft and -ifft almost never appear singly; they are the means to end and not ends in themselves. Usually they appear in pairs in the machinery of spectral editing scripts. Here are some examples:
— Garry Osgood