(v13) Initial performance tuning recommendations
This page applies to Harlequin v13.1r0 and later, and to Harlequin Core but not Harlequin MultiRIP.
The only reliable way to identify the absolutely optimal configuration is to run timing tests with a variety of representative test files using RIP configurations that are the same as those that are to be used in production.
The details of that optimal configuration will vary by the contents of each job, depending on whether it is very image-heavy, meaning that bandwidth between main memory and the CPU is a deciding factor, and/or if it includes constructs that require a lot of processing power such as live PDF transparency. This means that you should average across a range of carefully selected files, with due consideration for the kinds of files that both your most challenging and least sophisticated customers are likely to be printing. If possible, you should also try to predict how that job mix will change over the next few years; in many markets that will probably be an increase in graphical complexity and in the use of variable data printing.
The following guidelines should give you a good starting point to base those tests around:
- Two key considerations are the number of RIPs to use, and the number of threads to allocate to each RIP. As a general rule, if you're using multiple RIPs on a server, the optimal thread count per RIP will be relatively low (that is, in the range of 1 to 4).
- The number of RIPs should be around the number of physical cores available minus one (for the OS), divided by the selected thread count per RIP. If your CPU(s) support Intel® Hyper-Threading Technology, this number can be increased by roughly 5-20%; it should not be doubled, because the impact of memory bandwidth and cache flushing is significant in that context.
- The memory allocated per RIP should be somewhere around (PhysicalRAM - 1GB)/RIPCount, where the 1GB set aside is for the operating system. As a guide, most of our OEM partners allocate between 3 and 16GB RAM per RIP, depending on job complexity and the dimensions of each piece of output in device pixels. It may be easiest to start by defining RIP count as (PhysicalRAM - 1GB)/5GB, and then calculating threads per RIP as floor((PhysicalCores-1)/RIPCount).
- If you're running a single RIP and there are no other processes on the same CPU that are processing significant volumes of data, we have found that setting MaxBandMemory at (or just below) the lesser of the size of the Level 2 cache on the CPU(s) and (the size of the Level 3 cache, divided by the number of RIPs running on each CPU) produces optimal results. If you're running multiple RIPs on the same CPU, or there are other processes (for example, post-processing rasters from the RIP) on that CPU, we have seen performance continue to rise even with significantly larger values of MaxBandMemory, because the CPU caches are constantly being flushed anyway. Band sizes should be set using MaxBandMemory (not BandHeight) so that band memory and height are set appropriately for a range of different page sizes. You may still set BandHeight to a negative number if it is important to ensure that the number of lines per band is an integer multiple of some value that is specified in your system.
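The cache-based starting point for MaxBandMemory described in the last bullet can be sketched as a small calculation. This is only an illustration: the function name and its parameters are hypothetical, the cache sizes must be looked up for your own CPU, and the result is in bytes, so it may need converting to whatever unit your RIP configuration expects for MaxBandMemory.

```python
def max_band_memory_start(l2_bytes: int, l3_bytes: int, rips_per_cpu: int = 1) -> int:
    """Return a starting value (in bytes) for band memory tests.

    Hypothetical helper: takes the lesser of the L2 cache size and the
    L3 cache size divided by the number of RIPs running on each CPU.
    """
    if rips_per_cpu < 1:
        raise ValueError("rips_per_cpu must be at least 1")
    return min(l2_bytes, l3_bytes // rips_per_cpu)


MIB = 1024 * 1024

# Example figures only (assumed, not taken from any particular CPU):
# 2 MiB of L2 and 32 MiB of L3, with a single RIP on the CPU.
single_rip = max_band_memory_start(2 * MIB, 32 * MIB, rips_per_cpu=1)
print(single_rip // MIB, "MiB")
```

Treat the value as an upper bound to test at or just below. With multiple RIPs or other busy processes on the same CPU, larger values are worth including in the test matrix as well.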
These guidelines should allow a first-pass selection of RIP and thread counts based on the installed RAM and core counts, or provide a starting point for planning hardware selection if you have a target RIP count in mind. Try to perform the same set of tests on multiple different configurations scattered around this starting point so that you can select the best result.
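As a worked illustration of the first-pass arithmetic in the guidelines above, the following sketch derives a RIP count, threads per RIP and memory per RIP from installed RAM and physical cores. The function and parameter names are hypothetical, and rounding the RIP count down to a whole number (with a minimum of one) is an assumption rather than something the guidelines specify.

```python
import math


def initial_sizing(physical_ram_gb: float, physical_cores: int,
                   ram_per_rip_gb: float = 5.0, os_reserve_gb: float = 1.0):
    """Return (rip_count, threads_per_rip, memory_per_rip_gb) as a starting point.

    Hypothetical helper following the rule of thumb:
      RIP count       = (PhysicalRAM - 1GB) / 5GB   (rounded down here, by assumption)
      threads per RIP = floor((PhysicalCores - 1) / RIPCount)
      memory per RIP  = (PhysicalRAM - 1GB) / RIPCount
    """
    usable_ram_gb = physical_ram_gb - os_reserve_gb
    rip_count = max(1, math.floor(usable_ram_gb / ram_per_rip_gb))
    threads_per_rip = max(1, (physical_cores - 1) // rip_count)
    memory_per_rip_gb = usable_ram_gb / rip_count
    return rip_count, threads_per_rip, memory_per_rip_gb


# Example: a server with 64 GB of RAM and 16 physical cores.
rips, threads, mem_gb = initial_sizing(64, 16)
print(rips, threads, mem_gb)  # 12 1 5.25
```

The output is only the centre of the test matrix: timing runs with neighbouring configurations (for example, fewer RIPs with more threads each) are still needed to find the best result for a given job mix.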
NOTE: All of these recommendations assume that there is no significant additional processing on the same server as HHR. If there are other processes (including those associated with the raster backend in the RIP) which take significant CPU cycles, or which use significant amounts of RAM, then the preceding recommendations must be adjusted accordingly.