Median-cut

Figure 1: Some clusters with outliers.

If the premises underlying -colormap apply to a particular image, it should exhibit distinct hue clusters: colors grouping around particular median points with a minimum variance, only a small spread of values around the various median points. The key idea is that good clusters exhibit low variance.

The median-cut algorithm divides-and-conquers:

Let n represent the desired size of the palette, which is given as the first command parameter.
Rearrange the pixels from left to right, top to bottom, into a (width*height)×1 image. This image becomes the initial cluster or box₀. At the start of the algorithm, box₀ is the only image on the image list.
For the set { box₀, …, box_j } on the image list:
1. Find box_k with the largest variance. In gmic-speak, this amounts to examining iv (image variance), one of the metrics that G'MIC maintains for every image.
2. Note the color channel exhibiting the largest variance and sort the pixels of box_k into dark-to-light order, using that channel as the sort key.
3. Split box_k in two halves, box_kLand box_kR (the median cut). Implicit in this step, gmic recalculates image metrics for each half, including variance. This removes one, then adds two boxes to the list, a net gain of one.
If there are less than n boxes on the list, return to step 3.
Replace each of the n boxes with its average value and merge the boxes along the x axis to create one image. This constitutes the desired color palette of size n.

If the workings of this algorithm seem like black magic, recall that the signal characteristic of a cluster is low variance. That is, the pixels in a box more-or-less share the value of the median pixel. But step 3 splits boxes exhibiting high variance. Splitting necessarily makes child boxes less variant than parents for in sorting, dark hues go to one and light hues go to the other, and neither child ranges as widely as the parent. Since boxes with low variance aren't split the algorithm tends toward more boxes with lower variance.

Are the results perfect? Almost certainly not; that's why there is the k-mean algorithm.

Now with a broad appreciation of the median-cut algorithm in our kit, lets look at a few code-level features.

The Median-cut Algorithm in G'MIC

Illustration 1: Photo: © Dietmar Rabich, rabich.de, License: Creative Commons BY-SA 4.0, Source: Wikimedia Commons	A photograph by Dietmar Rabich of a window in Haus Ostoff, Dülmen, North Rhine-Westphalia, Germany. Opinions may vary, but most will agree that reds form one or more clusters, as would dark greys and off-whites. We invoke `-colormap 8,1` for an eight color palette.
	-resize {w*h},1,1,100%,-1

The first order of business is to make the initial box, realizing step 2: the pixels of the image are grouped into one row, the height here being extremely exaggerated for illustrative purposes.

-resize generally makes images larger or smaller but can rearrange as well as resize. The '‒1' final parameter disables any pixel interpolation and ensures an in-memory rearrangement of the pixel buffer.

–	_colormap :

The private command, -_colormap, implements the median-cut algorithm. It follows -colormap proper in the standard script library. Its leading underscore hides the command from the interpreter and is accessible only to other gmic commands.

  
  -repeat {$1-1}
    $!,{s}
    -repeat {$!-1}
     n=$>
     -repeat {$n,s}
       -sh[$n] $>
       -=.. {if(w>1,iv,-1)},$n,$>
       -rm.
     -done
    -done
    <see below>
  -done

As observed elsewhere, not all images in a gmic image stack are meant to be looked at. Some are just prosaic data structures, temporary scratch pads and such.

_colormap keeps track of variances in the growing list of boxes with a scratch pad image which we call a variance digest, created at the top of the outermost repeat loop. The inner nested pair of repeat loops, written in darker text, populates this scratch pad in preparation for steps 3a – 3c.

The notation $!,{s} creates an empty digest out of the aether. The width parameter $!, is one of many substitution symbols; the parser replaces this particular symbol with the current number of images on the list – a number not including the variance digest, of course, as it does not exist yet.

The height parameter, '{s}', is a preset math variable containing the channel count of the last image on the list. Altogether, this notation creates a one channel image with a width equal to the number of images on the list and a height equal to the number of channels, this particular datum taken from the last list item. In this case, the channel count is common to all boxes as each has been split off from box₀.

Each column of the variance digest represents a box on the image list, here for six boxes (the median-cut algorithm had run for five complete cycles when this snapshot was taken). The left column corresponds to image [0], the right corresponds to [5] – the sixth and last box. Each row on the list represents an image channel, R corresponds to row 0, G to row 1 and B to row 2. Each of the 18 pixels in this this tiny 6 × 3 image represents the variance of one channel in one image.

The purpose of the follow-on -repeat…-done loop, along with its inner loop, is to read the variances of all 18 channels in 6 boxes and transfer them to the variance digest. Follow-on code then reads the variance digest to determine which channel of which box exhibits the maximum variance.

The argument of the -repeat command, $!-1, sets up an iteration over all the boxes, less the variance digest itself. The inner -repeat loop constitutes a scan of the channels of the currently active box, ($n).

The assignment, n=$> may seem curious, but the substitution symbol $> is context sensitive, referencing the count only of the most proximate enclosing -repeat…-done loop. The assignment to $n fixes the count of the outer loop so it may be referenced within the inner. The assignment n=$> is a bit out of fashion. In modern releases, -repeat can take a variable label as a second argument which can be referenced within the loop to obtain the loop count. Thus -repeat {$!-1},n defines $n, the loop count, making n=$> unnecessary.

The assignment of a particular channel's variance to a pixel in the variance digest occurs in this inner loop. -sh[$n] $> temporarily shares channel $> from box $n on the image list, placing it after the variance digest.

The -set command (alias: -=) assigns the variance of channel $> of box $n, to the variance digest pixel at column $n, row $>, unless a particular exception ensues. Successive splitting may leave the current box, $n, with a width of one pixel. The fiat assignment of -1, an impossibly low variance, to the corresponding variance digest pixel ensures that this ultimately divided box cannot ever again be a candidate for splitting.

The previous paragraph's narrative condenses to:

-=.. {if(w>1,iv,-1)},$n,$>

which may seem manifestly terse to the beginning gmic programmer, and perhaps mysterious. -= is a command shorthand for -set and '..' is selection shorthand for [-2]. The value being set depends on a math-based if function, which resolves to one of two possible values, the variance of the current box, iv, or -1, depending on whether the width of the current box is greater than one. Once the assignment for variance has taken place, -rm. removes the shared channel aliased by -sh[$n] $>. Since it was shared, its removal does not involve the actual erasure of the channel, just the elimination of its alias on the image list.

A value has been set in the variance digest

 -repeat {$1-1}
    <see above>
    c={C}
    b=${-arg\ 1,$c}
    a=${-arg\ 2,$c}
    -rm.
    -shift[$b] 0,0,0,{-$a},2
    -sort[$b] +,x
    -shift[$b] 0,0,0,$a,2
    -split[$b] x,2
 -done

With the variance digest fully populated, _colormap can address the key step of the median-cut algorithm, 3a – 3c: find the box with the largest variance (a) sort its pixels by the channel exhibiting that variance (b) and then split the box into one with the dark pixels and the other with the light. (c)

Piece of cake.

The right hand side of the first assignment to variable c is an image feature substitution, though it bears a superficial resemblance to a math expression. The giveaway here is 'C', which is not a predefined math variable but the coordinate list of the lightest pixel in the image, a feature substitution. These substitutions have the form '{<selector>,<feature_identifier>}' The selector is an integer that picks a particular image on the list, counting from the beginning if zero or positive, or the end if negative. It may be omitted. In that case, the last image on the list is the implied selection. This last image is the celebrated 6 × 3 variance digest populated in the previously discussed block of code. The coordinate list in C, x, y, z, c addresses, respectively, the column, row, slice and channel holding the highest value element in the variance digest. See section 1.7 of the G'MIC Reference Guide for the other image feature substitutions.

Given the organization of the variance digest, the lightest element corresponds to the box containing the largest variance. Thus, the first assignment implements step 3a in the median-cut algorithm.

The next two assignments harness command substitution and variable substitution. Variable substitution is both simple and fundamental to G'MIC. Place the variable on the left hand side of an assignment, without the '$' signet, to set its value. Anywhere following, reference that value by applying the signet. The parser stops and throws an exception if it finds itself referencing a variable before assignment. Thus, the parser substitutes $c with the list x, y, z, c.

To pick out an individual element from that list, we use the -arg command. Though commonly used to dissect G'MIC command lines, these are just lists and -arg works with any list. In the initial assignment, -arg fetches the first element of list, x, selecting the box with the largest variance. A new variable, $b acquires this value. In the following assignment, -arg fetches the second element of list, y, selecting the channel in box $b with the largest variance. A new variable, $a, acquires this value. With $b and $a set, there is no further need for the variance digest; -rm. disposes it from the image list.

Though we have described it as if it were so, the assignments to $a and $b are not immediate but call for an execution of the subsidiary command line embedded in the curly braces – a command substitution. This runtime evaluation of G'MIC code is necessary because the data that shapes the course of the algorithm are not available until the script evaluates a particular image. Before hand, there is no juncture where we can simply write $a = <what, now?>. We don't know anything about <what, now?> when we are composing the script. Runtime evaluation enables us to establish <what, now?> once we have the specifics of particular images. But since this acquisition requires the script to be running, and we can't stop it to write assignment snippets on the fly, our recourse is to write the snippets beforehand in the cloistered environments of command substitution. Runtime evaluation is the principle reason the command substitution construct exists and is fundamental to G'MIC. It usually escapes our attention that G'MIC routinely runs -arg in a command substitution in order to parse the command line, another case where we don't know what the particulars are until the user runs the script. In practice, when the parser encounters a sequence of G'MIC commands delimited in a pair of curly braces, it re-invokes a copy of itself. That parser works entirely within the scope of the curly braces, but its behavior differs little from its parent parser. When it parses and completes the snippet, it returns just the result of the last command in a temporary status variable. It is the value of that variable that is assigned to $b and $a.

Then what of that '\'? It escapes the space character from its usual meaning, to wit: a command line item delimiter. Very early in the transformation of command lines, G'MIC decomposes the whole command line into an item list; at this stage, it hasn't established which items are commands or arguments. It just needs to know where items begin and end. Space characters furnish those markers. However, care must be taken with command substitution, which embeds a command line within a command line. The interpretation of that command must be deferred, hence escape characters to ensure that a space is taken to be a space, and not a delimiter. Without that escape character, the G'MIC command parser splits the command substitution block into two items, and later will have difficulty interpreting them, throwing an exception.

The -sort command bracketed by two -shift commands implement step 3b: sort pixels by the channel exhibiting the largest variance. In pixel sorting along spatial axes x, y, and z, the primary sort key is channel zero. The values of the others follow their channel zero counterpart to their new positions. If another channel has to be the primary sort key, then one applies -shift to temporarily move it to channel zero. In this maneuver, one should remember to set the last -shift parameter to 2, which selects the cyclic boundary policy. If channel zero should shift to channel one, then channel one shift to channel two and channel two cycles around to channel zero; no data are lost. The other policies lose data in interesting ways, but probably not in manners you'd like.

Finally, the last line implements step 3c, splitting box $b from dark to light in accordance to the channel identifier in $a. This obliterates the parent box from the image list, replacing it with two children, one with the darker values and the other with the lighter. In any case neither child exhibits as large a variance as the late parent. In net, the image list gains another box, and if the number equals that requested for the palette, the algorithm drops from the loop and commences its exit. Otherwise, it returns to the step 3a, searching for the next box with the greatest variance…

Before we close this walk through, it may be useful to step back and look at a schematic of how the algorithm regrouped the pixels of the Haus Ostoff window into prototype color swatches. We adopt a <number>,<letter-sequence> subscript which documents the cycle when the box split from its parent and a sequence of L(eft), R(ight) characters to indicate its location in the splitting tree: a kind of address which follows the branching from box₀ to the destination box. The number of these indicators directly shows the volatility of the box. Thus, Box_7,LRRR and its ancestors were subjected to 4 splits, while Box_2,RL led a quieter existence. It should come as no surprise that the pixel colors of Box_2,RL closely resemble the hues of the leaves growing around the window, a large area reflecting a particular red. Less certain groupings center on the dull greens of the porcelain fruit sitting on the bench in front of the window. Those colors do not separate very clearly from the shadows or some of the darker colors of the gravel path, evidenced by the volatility of the box that contains them.

The construction of the palette is trivial. The first line takes all the boxes that were on the image list at the exit of the processing loop and resizes them to one pixel, using average interpolation, so all of the various, or not so various pixels of a box blend to their median value. That constitutes the color swatches.

The second line appends the separate, one pixel square images together along the x dimension into one image, completing the palette of colors, as required.

  -resize 1,1,1,100%,2
  -append x