Gif Selfies and Colour Quantization

I'm working on a project with gifs because I really do like them very much. The beginning of this project is reading from my webcam, downscaling the image, and reducing to 256 colors so it can fit in the gif. So that's what this is about!

We'll start at a grayscale selfie. After helping to fix a bug in the webcam crate I was using (I don't contribute a lot so it was nice! i liked it), it was pretty easy to shove it into a gif. I used neam to downscale the image and gifed to do all the gif related things. Grayscale isn't why we're here though, let's get on with colour!

neam and gifed are two of my image-related projects. neam is used here as a library for downscaling images using Nearest Neighboor and gifed is for encoding gifs. that's all you really need to know about them to make it through :)

A short GIF of me holding a candle close to my face and waving. The candle provides the only light so it and I are all you can see. The apparent framerate is only 1 frame per second.

Colour, oh no!

I'll keep this breif 'cause I don't think I am particularly qualified to talk about it but there are images from it I want to show, so it gets a breif mention.

For awhile I thought the colour format was NV12, which is a kind-of-weird version of YUV420. This is a good explanation of NV12.

What I was getting instead was YUV422 in the UYUV format (explanation of yuv). Deriving RGB from this was not easy, but that's only because I thought it was NV12. Once I figured out what I was working with it got a lot easier. Here are a few images from working through it. I think they're rather nice.

A set of four images in a 2x2 grid showing the progress from "entirely wrong" to "almost correct". We start at a very magenta, nonsensical image. There is a severe screen-door, moire effect covering the entire image, but some semblance of shape can still be made out. The next image looks more correct. It's still highly magenta but now there are islands of darker meganta that seem to want to form objects. The third image fixes the magenta, but now we can see that things are very misaligned and the colour is all in the wrong place. There's patches of yellow and red and green and blue that don't align with any shape. We can see that these are pictures of me and I'm in a room. There's a book in front of half my face and the text is even readable! It's "Untethering The Web" which is a book in the Software For Artists series. In the final image the full frame is finally visible. The colours are still misaligned, but the luma is in the right spot so all the shapes are correct. In addition to the book you can see I'm holding a retractable dog leash and an old RCA vacuum tube box. The colours of these items can't be made out, but I can tell you that they're green, blue, and red. An object for reach RGB channel.

Colour Quantization

Before we get into the GIFs, we have to stop at colour quantization. It's the process of reducing the unique colour count of an image. We have to do this because the maximum size of a GIF palette is 256 colours. Which is a lot less than images typically have!

The most popular way to do this, I think, is to use an algorithm like k-means clustering. I didn't use k-means, so I won't talk about it. But it's nice to know what's out there. If you want to know more, the kmeans-colors crate readme is pretty good. Also this bit of Wikipedia.

colorsquash

I didn't use k-means. What I did is probably worse, but it was more fun! I use a thing I wrote called colorsquash.

Maybe the hardest part of quantization is picking the palette. What 256 colours out of the ~16.8 million would be good to represent this image?

I don't know, which ones are most common? Yeah, really.

colorsquash counts up how many times each color occurs and then sorts them most common to least. We don't want to just take the most common 256, that'd get a lot of very similar colours, so it uses a tolerance to exclude some of them.

It will only add another colour to the palette if every colour in the palette is no more than 1.3% similar. There was no method to choosing this number. I slowly adjusted it until it either stopped picking 6 colours, or stopped making the entire image barely different shades of the same colour. Those are the two extremes that, I guess, 1.3% sits between.

There's more work to do after we pick the palette; the image itself still has to be mapped to the new, very reduced colourspace. Initially, colorsquash looked through the entire palette for every pixel and chose the most similar colour. It was written to play with images from my DSLR which're roughly 6000x4000, or 24 million pixels. If it checked all 256 selected colours with all those pixels then it ends up running 6.1 billion calculations. That's too many!

Good, then, that a lot of the colours are the same. The most common colour may occur tens of thousands of times. That's a lot of wasted computation that doesn't need to happen; it only needs to compute the color difference once really.

So I traded space for time and allocated a Vec<u8> that was 256^3 in length and then stored the index of the selected colour at a location equal to red + green * 256 + blue * 256 * 256. I'm more than happy to take ~16MB of ram to get an 11x speedup. at least that's what I think it ended up being? I did all that awhile ago ^^;;

There's a problem

Okay okay I swear there was a point. Every frame from the webcam is decoded into RGB and then sent through colorsquash to reduce the color palette. Only one palette is used for the entire GIF. It's called the Global Color Table (and I'm mad you can't modify or swap it out on the fly, but whatever).

If I didn't use a Global Color Table I'd have to pass a colour table on every image and that's 768 bytes I don't want to spend. At 10fps it's 7.6KB. gif is already inefficient so I really don't need to help it use more data.

The palette is picked once on the first frame because it's an expensive operation. At my target framerate of 5fps we only get 200ms to do everything we need. That's a pretty good bit of time. "Pick a palette", however, is not all that needs to happen, so every frame after the first would only map the image (not picking a whole new palette).

CW for body horror? The area around my eyes and mouth is just a noisy black, so

Well thaaaaat's not right. I fought with this for awhile and produced a number of gif in the same style. It's kind of beautiful! Image bugs are my favorite. Here's one I liked quite a bit. ( a friend suggeted I try black lipstick after seeing this one :3 )

CW for body horror for the same reason.

a similarly broken gif as the last. I have my eyes closed facing the camera, head slowly angling downward. I think I was singing? The black noise appears and covers my eyes and mouth and nose.

I don't know if you can see. Can you? The first frame? It looks fine. Great! in fact. What on earth.

Remember how I said it wasn't picking the palette on every frame? Well that has come back to bite! When the colorspace is mapped into the really-kind-of-quite-large Vec, it only maps the colors it's seen already; it only assigns a palette-index to any colour seen in the first frame. The rest are left just as they started: index 0. I set index 0 as black myself so I could have the top/bottom bars because then the gif could be square (and i like squares)!

So I fixed it! For every unique color in every image, it finds the closest colour in the palette and adds it to the colour map. Maybe I'll do the entire 16.8 million colours on the first frame, it might be smart, but that's how it is right now. So here's my good face in glorious 256 colours.

My smiling face looking at the camera, waving, and looking back down at my integrated terminal in VS Code (unseen). It actually loops pretty well. The colours are finally correct. There is no kind of disturbing black noise and it looks like a regular, short-fps video. just in 256 colours.