SmallGif.com

An experiment in compact .GIF files

The "Graphics Interchange Format" was one of the first compressed colour image formats. Originally introduced in 1987, and updated in 1989.
The later specification clarified what was allowed, introduce a tranparent colour, and introduce time delays to control animated (multi-image) .GIF files.
It's still the most popular and supported format for animated images on the web. There are millions of .GIF images in use today.

Why

Many years ago I needed to read and write images for a project I was working on. There were a lot of images, and I wanted to use a compressed lossless format. I found public domain reader and writer apps, but quickly found some limitations of them. In particular with animated .GIFs and also with files that didn't quite adhere to the standard.
So I ended up writing my own versions. But back then we were in the dark days of the Unisys patents, so when the project finished, I put it all to one side, just using it for a few internal projects. Now that those patents have expired, it's time to put this code out into the big internet world.

The source code

Here is the source code it's royalty free, you just need to credit me. The reading code is quite stable, and has been tested with a wide range of random images from the internet. The writing code uses a direct lookup table, if you have enough memory, or a linear search if the malloc fails. The reset decision is via a function call, so can be replaced.
Here are the files: If you are building a multi file project which needs to access these functions from several files, then include the file from all the places that need it, put prefix all but one with "#define NO_FUNCTIONS" to avoid multiple function definitions.

To read a .GIF file

For most applications you should only need to call two functions: See the gifinfo.c example in the .tgz file.

To write a .GIF file

Most applications will normally call the following functions: See the gifgen.c example in the .tgz file.

The gif_file_t structure

The gif_file_t structure contains uNImages pointed to by psGifFile->psImage[0...uNImages-1]
Each of these images will have 1 byte per pixel in the pLines[image.height-1][image.width-1] array.
See the gif_file_t structure for more information,

Compressing a single image

Something which isn't immediately obvious, is that there are different ways of compressing the same data. The .GIF file format has a well defined de-compression technique, but there are options during the compression. The LZW technique was originally designed as a stream compressor, so traditional compressors usually only use the previous data as a guide. Because memory and computing power are much greater now, it's possible to use a multi pass approach that takes the later data into account as well. Using this technique it's possible to find a smaller compressed data set for a given image.
For a single image:
  1. Find the valid images modes
  2. Compress the various data sets, using variable reset positions to minimise the compressed data size
  3. Write the smallest set to the output file

Compressing multiple images

For multi image files, this much more difficult. Optional transparent coloured pixels give a lot more options.
If a given pixel in the the previous, and current image have the same colour, then that pixel can either be encoded as the colour, or as a transparent pixel.
For example lets assume we have two images, of 4 x 1 pixels, that use a global palette. Lets use numbers 1..4 for the colours, with T for transparent.
Images:
  1. 1234
  2. 3434
We can code image #2 as: Often using transparent gives better compression, as repeating values usually compress well. But in this case, using 3434, means that the data contains the same pattern twice, so it's likely to compresses better. There are also bit depth considerations, as 34TT needs 3 different colours, so 2 bits per pixel, where as 3434 only has 2 colours, so only needs 1 bit per pixel. So you can see, optional trans-cols pixels make finding the optimum set much more complicated.
Using the global palatte can save more bytes, but it's a trade off: Merging multiple image's colours into the global palette usually means that the bit depth for the individual images grows. So deciding which images to merge into the global palette usually isn't simple.
When there are more than 2 images, there can be a number of different image mode options. This gives different sets of images to get from one image to another, this makes it even more complicated.

The ultimate compressor

One day I'll get the optimised writer software in a reasonable state and publish it as well, but for now here is the overview:
  1. Decompress the source into a sequence of RGB images
  2. Scan the images to find valid sets of mode sequences (the modes 1,2,or 3 for each image)
  3. Go through the mode sets creating image sets to get from one image to the next, compress them, and try to optimise the optional-transcols used.
  4. Work out the palette bytes required, and recompress the cached data with more bits if the merge into the global palette grew the number of image bits.
  5. Walk though the mode sets, and find the minimal set
  6. Add a SmallGif.com signature to the image
  7. Write the smallest image set out
It sounds really easy, but steps #3 and #4 have too many options to brute force, so it's exceptionally difficult to find a good compromise for a large set of test images!

Take a look at some of the sample images to see what I'm experimenting with.


This page was lasted updated on Monday, 11-May-2020 20:10:28 BST