Sunday, 30 January 2011

Modifying and analyzing colors in PDF files using podofocolor

What is podofocolor?

Podofocolor is the newest addition to the podofo-tools package. It is a command-line tool to analyze and/or modify all colors in a PDF file. This can be done using predefined rules or based on a custom Lua script.

Basically, podofocolor opens a PDF file, goes through every page and vector graphics object (e.g. an XObject), and looks at every PDF command. Whenever it encounters a colorspace definition or a PDF command that sets a color for a following PDF operation like “draw a line” or “fill area with color”, an action can be performed. These actions are either predefined or can be defined by implementing a C++ interface or, more likely, by providing a Lua script. Predefined actions are “convert this color to grayscale” or “print color name to stdout”; however, more complicated actions can easily be created as well. As the “grayscale” action shows, the most powerful feature of the tool is replacing colors in a PDF file. Custom color conversion algorithms can be implemented in Lua and immediately applied to any PDF file.

How is it useful?

There are different use cases for such a tool, and I assume users will come up with even more. The usage scenarios that come to my mind fall into two areas: analyzing colors and modifying colors.

  • Analyzing colors

    • Find out which colorspaces or colors are used in a PDF

    • Verify that certain colors are not used in a PDF

    • Verify that only CMYK or ICC-based colors are used in a PDF

  • Modifying colors

    • Convert colorspace of a PDF (e.g. convert it to grayscale or CMYK)

    • Convert colors in a PDF to certain corporate colors

    • Split one PDF file into four different PDF files, where each file represents one component of the CMYK colors used in the PDF. As a result, you will receive one PDF containing only the cyan color channel, one containing only the yellow one, etc.
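
As a rough illustration of the last scenario, the sketch below uses the Lua converter described later in this post to keep only the cyan component of stroking colors. It is a sketch under several assumptions: the function names follow the set_stroking_color_* pattern explained below, the parameter lists of the gray and CMYK functions are assumed, and rgb_to_cmyk is a hypothetical helper using a naive conversion without any color management. Fill colors are not handled here.

```lua
-- Sketch only: keep the cyan channel of stroking colors.
-- Returning a table with four values is treated as a CMYK color.

-- Hypothetical helper: naive RGB -> CMYK conversion (no color management).
local function rgb_to_cmyk(r, g, b)
  local k = 1.0 - math.max(r, g, b)
  if k >= 1.0 then
    -- pure black: avoid dividing by zero
    return 0.0, 0.0, 0.0, 1.0
  end
  local c = (1.0 - r - k) / (1.0 - k)
  local m = (1.0 - g - k) / (1.0 - k)
  local y = (1.0 - b - k) / (1.0 - k)
  return c, m, y, k
end

-- Assumed signature: four components in the range 0.0 to 1.0.
function set_stroking_color_cmyk(c, m, y, k)
  return { c, 0.0, 0.0, 0.0 }
end

function set_stroking_color_rgb(r, g, b)
  -- only the first return value (cyan) is kept
  local c = rgb_to_cmyk(r, g, b)
  return { c, 0.0, 0.0, 0.0 }
end

-- Assumed signature: a single gray value. In this naive model a gray
-- value has no cyan component, so nothing is drawn on the cyan plate.
function set_stroking_color_gray(gray)
  return { 0.0, 0.0, 0.0, 0.0 }
end
```

Running the tool once per channel, each time with a script that keeps a different component, would give the four separate files.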


The usage of the command-line tool is simple:

./podofocolor [converter] input.pdf output.pdf

Several values can be used as the converter. The table below lists all converters that are currently available:

Converter   Description
dummy   This is an example implementation of a converter in C++, which will convert all colors in a PDF to RGB red.
grayscale   The grayscale converter changes all colors to their grayscale equivalents in a grayscale colorspace.
lua planfile   The Lua converter is the most powerful one. It takes a Lua file as another parameter. This Lua file provides the color conversion descriptions implemented as Lua functions.

For example, to convert the colors in a PDF file using the included example.lua file, you would use the following command:

./podofocolor lua example.lua input.pdf output.pdf

Writing your own converters

For the tool to be really useful, you will have to create your own converter. This can be done either by implementing the C++ interface IConverter or by creating a small and simple Lua script. If you consider creating a C++ implementation of the interface, the included Doxygen comments will be enough to get you started. Yes, it is that simple: the grayscale converter, for example, consists of only 44 lines of source code, and most other converters will be of similar size. We will therefore skip the C++ part and go straight to Lua.

Lua is a very simple, yet powerful, scripting language. To get started, it is best to download the example.lua file included in PoDoFo. It contains all the necessary function definitions, which you can adapt to your needs.

We will start with a short example: whenever podofocolor finds a definition of a stroking color on a PDF page (i.e. a color that is used when drawing lines or curves), it will call one function in the Lua script. The function called depends on the colorspace of the color definition. Currently, there are three different functions that can be called. set_stroking_color_gray will be called when a grayscale color is defined. Similarly, set_stroking_color_rgb or set_stroking_color_cmyk is called for RGB and CMYK colors.
The example below shows a rather simple implementation of set_stroking_color_rgb. The function receives the three parameters r, g, and b, which hold the values of the red, green, and blue color components. The values are in the range of 0.0 to 1.0, as is common in PDF files, where (0.0, 0.0, 0.0) is black and (1.0, 0.0, 0.0) is red. The function checks whether the passed color is black. If so, it returns a table with four values, which is interpreted as a CMYK color, and thereby converts any occurrence of RGB black to CMYK black. For all other colors, a table with three values is returned and the RGB color is left unchanged. Another option would have been to return a table with a single value and thereby convert the color to a gray value.

function set_stroking_color_rgb (r,g,b)
   -- convert all black rgb values to cmyk,
   -- leave others as they are
   if r == 0 and
      g == 0 and
      b == 0 then
      return { 0.0, 0.0, 0.0, 1.0 }
   end
   return { r,g,b }
end
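
The single-value option mentioned above could look roughly like the following sketch, which converts every RGB stroking color to gray. The weights are the common Rec. 601 luma coefficients and are an assumption made for illustration; they are not necessarily what the built-in grayscale converter uses.

```lua
-- Sketch only: convert every RGB stroking color to a gray value.
-- Returning a table with a single value is treated as a gray color.
function set_stroking_color_rgb(r, g, b)
  -- weighted sum of the components (Rec. 601 luma weights, assumed here)
  local gray = 0.299 * r + 0.587 * g + 0.114 * b
  return { gray }
end
```

Because the three weights sum to 1.0, the result stays within the 0.0 to 1.0 range used for PDF color components.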

Other functions in the script provide information about pages, objects, etc.


Currently, this tool does not convert images embedded in the PDF file. First, the focus of the tool is on modifying colors in PDF files; second, there are other tools that can modify colors in images and/or work with images embedded in PDF files. If there is demand for such a feature, it can easily be added. Podofoimgextract, another PoDoFo tool, is a good example of how easy it is to work with images using the PoDoFo API.


Podofocolor is currently available in SVN trunk. Instructions on how to get and build PoDoFo trunk can be found on our website. It works on all supported platforms, including Windows and Unix systems. We are interested in your feedback! Feel free to send a mail with your feedback, comments, or suggestions to our mailing list.