One of the most often overlooked aspects of computer performance for image processing is the size of the processor cache. Sure, the clock speed is important, and so are the total amount of memory in the computer and the speed of the disk drives, but image processing algorithms crunch through huge amounts of data – millions and millions of pixels – and their speed of execution is bound more to memory access speed than to any other factor, assuming a reasonably well-built, mid-range or better computer.
What is the processor cache and how does it affect performance?
Image data is stored in main memory (RAM). The processor accesses the RAM through the memory bus, a communication channel that moves data back and forth between the processor and the memory itself. Typically, when operating on pixel data, an image processing algorithm – say a sharpening filter – needs to read a pixel's values, perform some arithmetic operation on them and store the resulting pixel back to memory. The process is then repeated with the next pixel, and so on, until the entire image – or the entire selection – is processed. Worse, the algorithm often needs neighboring pixels (say the ones above and below the current pixel, those to its left and right, plus the diagonals) in order to compute the new value of a given pixel, causing the number of memory accesses to literally explode: that's 8 extra read accesses per pixel for a small 3×3 convolution kernel, raising the total number of memory reads needed to process all 3 color planes of a 12MP image to roughly 325 million, or 900 million – close to 1 billion – for a 5×5 kernel running over the entire image.
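The arithmetic behind those figures can be sketched in a few lines. The pixel count and kernel sizes below are the ones from the example above; the helper function name is just an illustrative choice:

```python
# Rough estimate of memory reads for convolution filters on a 12-megapixel
# image with 3 color planes (illustrative arithmetic only).
PIXELS = 12_000_000
PLANES = 3

def total_reads(kernel_size: int) -> int:
    # Each output pixel reads kernel_size x kernel_size neighbors, per plane.
    return PIXELS * PLANES * kernel_size * kernel_size

print(total_reads(3))  # 324000000 reads for a 3x3 kernel
print(total_reads(5))  # 900000000 reads for a 5x5 kernel
```

Every one of those hundreds of millions of reads has to be served either by the cache or by main memory, which is exactly where the bottleneck described next comes from.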
The problem is that the memory bus and the memory itself do not operate as fast as the central processor. For example, the processor clock may run at 3GHz while the memory runs at only 1333MHz, or about 1.3GHz, which is considerably slower. The processor essentially waits until the memory responds and transfers the data; thus, by some stretch of imagination, one can say the computer operates closer to the 1.3GHz memory speed than to its 3GHz processor speed when it is accessing memory very often, as when running image processing filters.
To mitigate this performance bottleneck, processor manufacturers added a small amount of very fast memory close to the processor itself. When main memory is accessed, the cache is filled with the data under the assumption that it will be needed again; on subsequent accesses the data is returned from the very fast cache if it is still there, which spares the processor from reaching out to main memory and waiting for it to react and return the data.
A complete description of CPU caches is beyond the scope of this article, but it boils down to the following: the larger the cache, the more data it can keep ready for fast access and the less slowdown there is when accessing memory. When the data is not in the cache – either because it has not been accessed yet, or because it was accessed a relatively long time ago and has been discarded since – a “cache miss” occurs and main memory must be accessed to retrieve the data.
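To make the hit/miss idea concrete, here is a minimal sketch of a direct-mapped cache, the simplest possible design. The line size and line count are made-up illustrative values, and real caches are implemented in hardware, not code:

```python
# Minimal sketch of a direct-mapped cache to illustrate hits and misses.
# LINE_SIZE and NUM_LINES are made-up illustrative values (a tiny 64KB cache).
LINE_SIZE = 64          # bytes fetched per cache line
NUM_LINES = 1024        # number of lines in the cache

cache_tags = [None] * NUM_LINES  # which memory block each line currently holds

def access(address: int) -> str:
    line = (address // LINE_SIZE) % NUM_LINES
    tag = address // (LINE_SIZE * NUM_LINES)
    if cache_tags[line] == tag:
        return "hit"            # data served from the fast cache
    cache_tags[line] = tag      # fetch from slow main memory, fill the line
    return "miss"

print(access(0))   # miss: first touch goes out to main memory
print(access(8))   # hit: the same 64-byte line is already cached
```

Note how the first access pulls in a whole 64-byte line, so a nearby second access is a hit – this is why sequential pixel access is so much cheaper than scattered access.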
The above is only a tiny part of the story, as the cache also buffers memory writes, and the whole cached memory system is actually quite complex, in particular in the presence of multiple processors with separate caches that must be kept coherent. But again, we don't need to dig too deep into the details: the one thing to take with us is “the more cache, the better”.
Pentium 4 processors came with 128KB of cache, then 256KB and finally 512KB. Contemporary processors like the Intel i7 can have up to 12MB of cache, or 24 to 96 times more than the old P4s. Needless to say, for that reason alone an i7 will outperform a P4 by a considerable margin (huge, in fact) even if clocked at the same speed, and in cache-miss situations current RAM chips, running at 1333, 1600, 1800 or 2000MHz, supply the data considerably faster than before, adding to the overall performance.
Image processing is a bit special in the sense that its algorithms need to access data all over the place (for example, when accessing the pixels above or below the current one, the memory locations to be reached are quite far apart from each other); in technical terms, one says they exhibit poor locality of reference. The consequence is that this type of behavior quickly overwhelms small memory caches and makes them much less effective. With relatively large cache sizes (8MB or more), a significant portion of the image can be kept right in the processor cache, dramatically speeding up access to pixel data. To keep things in perspective, server processors such as the Intel Itanium have a relatively low clock speed (about 2GHz) but a gigantic 24MB CPU cache, making them ideally suited for intensive number crunching over large data sets.
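Locality of reference is easy to demonstrate: the sketch below sums the same 2D pixel buffer twice, once row by row (neighboring accesses touch adjacent memory) and once column by column (each access jumps a full row ahead). The buffer size and pixel values are made-up illustrative choices; in pure Python the interpreter overhead masks much of the effect, but in compiled languages the cache-friendly order can be several times faster:

```python
# Sketch of good vs poor locality of reference on a 2D pixel buffer.
# N and the pixel values are illustrative, not from any real image.
import time

N = 1000
image = [[(r * N + c) % 256 for c in range(N)] for r in range(N)]

def sum_rows():
    # Row-major order: consecutive accesses hit adjacent memory locations.
    return sum(image[r][c] for r in range(N) for c in range(N))

def sum_cols():
    # Column-major order: each access strides a whole row ahead in memory.
    return sum(image[r][c] for c in range(N) for r in range(N))

start = time.perf_counter()
by_rows = sum_rows()
t_rows = time.perf_counter() - start

start = time.perf_counter()
by_cols = sum_cols()
t_cols = time.perf_counter() - start

assert by_rows == by_cols  # identical result, very different access pattern
print(f"rows: {t_rows:.3f}s  cols: {t_cols:.3f}s")
```

Both loops compute exactly the same number; only the order of memory accesses differs, which is precisely the property a large cache helps absorb.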
We are now ready for a quote from Intel on the subject: “In recent years, the resolution of image sensors has increased significantly, but image processing algorithms have not improved in efficiency. The bottleneck for nearly all image processing algorithms is memory access, and access time increases significantly when image data resides outside the L2 cache. Even with more computing power available, the increased size of the images impacts the ability to achieve high performance due to the increased number of cache misses.”
Wrap up and conclusion
Memory accesses represent a significant bottleneck for image processing, and a large CPU cache helps mitigate the issue. Given the choice, always pick the processor with the largest CPU cache: all other things being equal, it will perform better for image processing than one with a smaller cache, even if the latter is clocked faster! Of course, you also want multiple cores, say 4 or more, and a modern 64-bit operating system with plenty of memory (12GB seems to be enough at the time of this writing) to gain the ability to smoothly multitask several memory-hungry applications such as FastPictureViewer, Adobe Lightroom, Nikon Capture NX and Adobe Photoshop all together, with enough room to run your favorite productivity applications like email and instant messaging on the side.
RELATED: The More Cores, the Better.