Mako Extractor

Mako Extractor is a variation on Mako Splitter. Whereas Mako Splitter divides the source file into a number of files containing the same number of pages, Mako Extractor allows for a bespoke division of the source document, expressed as one or more page ranges.

Mako Extractor v1.0.0.X

Usage:
   Makoextractor input.xxx [output.yyy] [parameter=setting] [parameter=setting] ...
 Where:
   input.xxx          source file from which to extract pages, where xxx is pdf, xps, pxl (PCL/XL) or pcl (PCL5).
   output.yyy         target file to write the output to, where yyy is pdf, xps, pxl or pcl.
                        If no output file is declared, <input>.pdf is assumed.
   parameter=setting  one or more settings, described below.

Parameters:
   pw=<password>      PDF password, if required to open the file.
   p=<page range>     Multiple page ranges separated by a comma, eg 1-10,5-30,92,600-720,40-50.
                      -- Do not include spaces
   m=yes|no           PDF only: Copy source document metadata (title, author etc.) to output files (yes)
                      -- Default: no
   n=yes|no           PDF only: Copy source document named destinations to output files (yes)
                      -- Default: no
   q=yes|no           PDF only: Use incremental output when writing PDF (yes)
                      -- Default: no
   f=yes|no           Create a folder to contain the output, named according to the output file name (yes)
                      -- Default: no
   s=yes|no           Use a single thread to write the output files (yes)
                      -- Default: no (use all available threads)
   d=yes|no           Use a deep copy of pages, ie copy bookmarks and form field metadata (yes). May negatively impact performance.
                      -- Default: no

How it works

The program creates a list of "jobs", each of which consists of a number of pages, determined by the page ranges. These are scheduled by a threadrunner, initialized with the number of available threads. Jobs are scheduled until they are exhausted.
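As a minimal sketch of the first step, the following shows one way the p= value could be turned into a list of jobs, one per range. It does not use the Mako API and is not the sample's actual code; PageRange, parsePageNumber and parsePageRanges are hypothetical names introduced for illustration. It assumes pages are numbered from 1, that ranges are inclusive and that overlapping ranges are allowed, as the usage example suggests.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical job description: one inclusive, 1-based page range.
struct PageRange {
    std::size_t first;
    std::size_t last;
};

// Parse a single page number, rejecting empty or non-numeric text
// (which also rejects spaces, as the usage text requires).
static std::size_t parsePageNumber(const std::string& text) {
    if (text.empty() || text.find_first_not_of("0123456789") != std::string::npos)
        throw std::invalid_argument("bad page number: '" + text + "'");
    const std::size_t value = std::stoul(text);
    if (value == 0)
        throw std::invalid_argument("page numbers start at 1");
    return value;
}

// Split a specification such as "1-10,5-30,92" into one PageRange per
// comma-separated item. A single page "92" becomes the range 92-92.
std::vector<PageRange> parsePageRanges(const std::string& spec) {
    std::vector<PageRange> ranges;
    std::size_t start = 0;
    while (start <= spec.size()) {
        std::size_t comma = spec.find(',', start);
        if (comma == std::string::npos) comma = spec.size();
        const std::string item = spec.substr(start, comma - start);
        const std::size_t dash = item.find('-');
        PageRange range{};
        if (dash == std::string::npos) {
            range.first = range.last = parsePageNumber(item);
        } else {
            range.first = parsePageNumber(item.substr(0, dash));
            range.last = parsePageNumber(item.substr(dash + 1));
        }
        if (range.first > range.last)
            throw std::invalid_argument("range is reversed: '" + item + "'");
        ranges.push_back(range);
        start = comma + 1;
    }
    return ranges;
}
```

Calling parsePageRanges("1-10,5-30,92,600-720,40-50") yields five ranges in the order given. Checking each range against the document's page count is left to the caller.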

Metadata (title, author etc.) can be copied to the output files with m=yes, and named destinations with n=yes. Both options apply to PDF only.

Incremental output

Another important distinction is that Mako Extractor supports PDF's incremental output (q=yes). Incremental output means that changes are appended to the end of the file and the cross-reference information is updated to point at the new objects, so that redundant content is ignored but remains in the file.

Incremental output can dramatically reduce the time it takes to process a file, but at the expense of increased file sizes. To make use of incremental output, it's necessary to avoid writing a completely new file. To extract a page range, Mako Extractor therefore removes all of the pages on either side of the specified range, then incrementally saves the file. This is quick, but not necessarily efficient in terms of file size.
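The "remove the pages either side" step can be sketched as a small index calculation. This is an illustration only, not the sample's code or a Mako API call; pagesToRemove is a hypothetical helper. It assumes the range is 1-based and inclusive, and that pages are deleted one at a time by 0-based index.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Return the 0-based indices of the pages to delete so that only the
// inclusive, 1-based range [first, last] remains. Indices are listed in
// descending order, so deleting them one at a time never shifts an index
// that is still waiting to be deleted.
std::vector<std::size_t> pagesToRemove(std::size_t pageCount,
                                       std::size_t first,
                                       std::size_t last) {
    if (first == 0 || first > last || last > pageCount)
        throw std::out_of_range("page range does not fit the document");

    std::vector<std::size_t> doomed;
    for (std::size_t page = pageCount; page > last; --page)
        doomed.push_back(page - 1);  // pages after the range
    for (std::size_t page = first - 1; page > 0; --page)
        doomed.push_back(page - 1);  // pages before the range
    return doomed;
}
```

For a 10-page document and the range 3-5, the helper returns the indices 9, 8, 7, 6, 5, 1, 0, leaving pages 3 to 5 in place.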

Multithreading

Mako is thread-safe, which is to say multiple instances of Mako can be run on separate threads without the risk of clashes over temp file storage, memory allocation etc.

This example makes use of multiple threads to write the output files. It creates a vector of jobs for each extraction task, then spawns multiple threads to run the jobs to completion.
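The general shape of that pattern can be sketched with the C++ standard library alone. This is not the sample's threadrunner and it makes no Mako calls; Job and runJobs are hypothetical names, and each job stands in for "write one output file". Passing a thread count of 1 would correspond to s=yes.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job type: any callable that performs one extraction.
using Job = std::function<void()>;

// Run every job exactly once, using at most threadCount worker threads.
// Workers claim jobs through a shared atomic counter until none remain.
// Jobs are assumed not to throw; an escaping exception would end the program.
void runJobs(const std::vector<Job>& jobs, unsigned threadCount) {
    if (jobs.empty()) return;
    if (threadCount == 0) threadCount = 1;  // hardware_concurrency() may report 0
    threadCount = static_cast<unsigned>(
        std::min<std::size_t>(threadCount, jobs.size()));

    std::atomic<std::size_t> next{0};
    auto worker = [&jobs, &next] {
        for (;;) {
            const std::size_t index = next.fetch_add(1);
            if (index >= jobs.size()) return;  // all jobs have been claimed
            jobs[index]();
        }
    };

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < threadCount; ++i)
        workers.emplace_back(worker);
    for (auto& thread : workers)
        thread.join();  // wait for every job to finish
}
```

A caller might build one job per page range and then call runJobs(jobs, std::thread::hardware_concurrency()). Using a shared counter rather than pre-assigning jobs to threads keeps all workers busy when the ranges differ greatly in size.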

Useful sample code

Threading pattern
Incremental output (PDF)
Metadata (PDF)
Named destinations (PDF)