Current imaging software (as of 2010) does not always use all the CPU power available on multicore or multiprocessor computers. Batch export and batch conversion are the areas where modern applications are most likely to take advantage of multiple cores, and there the next limiting factor is likely to be the disk subsystem: batch processing eight 25 MB raw files concurrently on an 8-core system (two quad-core processors) requires at least 200 MB/s of sustained disk I/O throughput, assuming each core gets through about one file per second, just to make sure the CPUs never run out of fresh data. Very few systems are able to sustain that level of performance.
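The 200 MB/s figure works out when each core consumes about one 25 MB file per second. Here is a minimal sketch of that arithmetic; the function name and the seconds-per-file parameter are illustrative assumptions, not measurements.

```python
def required_throughput_mb_s(file_size_mb: float, cores: int,
                             seconds_per_file: float) -> float:
    """Sustained read rate (MB/s) needed so that no core waits for data.

    Assumes one file per core at a time and that every core takes
    `seconds_per_file` to process a file (a hypothetical, uniform rate).
    """
    if seconds_per_file <= 0:
        raise ValueError("seconds_per_file must be positive")
    return cores * file_size_mb / seconds_per_file


# Eight 25 MB raw files on eight cores, one second of CPU work per file.
print(required_throughput_mb_s(25, 8, 1.0))  # 200.0

# If each file takes two seconds of CPU work, the disks get more slack.
print(required_throughput_mb_s(25, 8, 2.0))  # 100.0
```

The faster the cores get through each file, the more pressure moves to the disks, which is why the disk subsystem becomes the next bottleneck.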
Future software may make better use of multiple cores in more places, such as when processing a single image, by increasing the level of parallelism in the program code and reducing the granularity of the parallel regions. Today the level of parallelism is very coarse: it works at the file level, as in processing multiple files in parallel during batch export operations. Tomorrow a histogram, sharpening or color conversion filter will be computed in parallel on multiple cores while processing a single image, and those operations will take a fraction of the time they take today, with a dramatic positive impact on application responsiveness.
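To make the idea of finer-grained parallelism concrete, here is a minimal sketch of a histogram computed over a single image by splitting it into strips, counting each strip separately and merging the partial results. The function names, the flat 8-bit grayscale layout and the worker count are assumptions chosen for illustration. In CPython, threads running pure-Python loops do not execute on several cores at once, so the sketch shows the structure of the computation rather than the speedup that native code would get.

```python
from concurrent.futures import ThreadPoolExecutor


def strip_histogram(pixels):
    """Count the 8-bit values found in one strip of the image."""
    counts = [0] * 256
    for value in pixels:
        counts[value] += 1
    return counts


def parallel_histogram(pixels, workers=4):
    """Split one image into strips, histogram each strip, merge the results."""
    if not pixels:
        return [0] * 256
    strip_len = -(-len(pixels) // workers)  # ceiling division
    strips = [pixels[i:i + strip_len]
              for i in range(0, len(pixels), strip_len)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(strip_histogram, strips))
    # Merging is a per-bin sum, so the strips never need to share state.
    return [sum(column) for column in zip(*partials)]


# Hypothetical 1024-pixel grayscale image: every value appears four times.
image = bytes(range(256)) * 4
histogram = parallel_histogram(image)
print(histogram == strip_histogram(image))  # True
```

The histogram suits this treatment because each strip can be counted independently and the merge step is cheap. Filters that read neighboring pixels need extra care at the strip boundaries.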
In fact, some applications already try to optimize CPU usage in specific places. For example, the Save-for-Web function of our own FastPictureViewer Pro’s batch file processor processes multiple files in parallel when the program is running on a multicore computer. Also, when processing individual files, the optional sharpening filter that can be applied to exported images is itself computed in parallel on up to three CPU cores: one for each of the red, green and blue color channels. Additionally, the batch processor takes special care to avoid thrashing the disks by throttling its data throughput and, while not necessarily topping the CPU usage charts all the time, exhibits a much higher overall throughput than what could be achieved in the standard way, where each image is processed in turn. The throughput is also higher than what would be achieved by simply processing as many files in parallel as there are cores in the system, with each file being processed sequentially on a single core.
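The per-channel approach can be sketched as follows. This is not FastPictureViewer Pro's actual code: the 3x3 kernel, the list-of-rows image layout and the function names are assumptions picked to keep the example short. Each color plane is handed to its own worker, which is why at most three cores can be kept busy this way.

```python
from concurrent.futures import ThreadPoolExecutor


def sharpen_channel(plane):
    """Sharpen one 8-bit color plane with a simple 3x3 kernel.

    Kernel (an illustrative choice): 5 at the center, -1 on the four
    direct neighbors. Border pixels are copied unchanged.
    """
    height, width = len(plane), len(plane[0])
    out = [row[:] for row in plane]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            value = (5 * plane[y][x]
                     - plane[y - 1][x] - plane[y + 1][x]
                     - plane[y][x - 1] - plane[y][x + 1])
            out[y][x] = max(0, min(255, value))  # clamp to the 8-bit range
    return out


def sharpen_rgb(red, green, blue):
    """Sharpen the three planes concurrently, one worker per channel."""
    # Threads illustrate the structure; a native filter would be needed
    # to occupy three cores for real.
    with ThreadPoolExecutor(max_workers=3) as pool:
        return tuple(pool.map(sharpen_channel, (red, green, blue)))


flat = [[100] * 3 for _ in range(3)]
bump = [[100, 100, 100], [100, 110, 100], [100, 100, 100]]
r, g, b = sharpen_rgb(flat, bump, flat)
print(g[1][1])  # 150: the small bump is amplified
```

The channels are independent of one another, so no locking is required. The price is a hard ceiling of three workers, however many cores the machine has.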
Application developers need to revisit all the time-consuming portions of their program code and identify the candidates for parallelism. The algorithms then need to be rewritten in a parallel fashion, and sometimes the program’s flow needs to be restructured into a number of ‘tasks’ that can run independently. By doing so, developers increase the level of latent parallelism in the program code. When such programs are executed on multicore hardware, actual parallelism takes place and the processing time decreases accordingly, increasing application performance.
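As a rough illustration of latent parallelism, the sketch below expresses a batch export as independent tasks and lets the number of workers follow the number of cores reported by the system. The task body is a hypothetical stand-in for real decode, resize and encode work, and all names are invented for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def export_image(name):
    """Hypothetical independent task: stands in for decode + resize + encode."""
    return name.rsplit(".", 1)[0] + ".jpg"


def export_batch(names, workers=None):
    """Run one task per file; the worker count defaults to the core count."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever the worker count.
        return list(pool.map(export_image, names))


files = ["a.raw", "b.raw", "c.raw"]
print(export_batch(files, workers=1))  # ['a.jpg', 'b.jpg', 'c.jpg']
print(export_batch(files) == export_batch(files, workers=1))  # True
```

The same program gives the same result with one worker or many. The parallelism is latent in the task structure and only becomes actual when more cores are available to run the tasks.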
Clearly, those changes will not happen overnight. Multithreaded programming, where an application literally executes different parts of itself concurrently, is a very difficult art, and fine-grained parallelism is even harder as it requires a complete rethinking of every minute detail of the program’s operations. Finding errors in parallel program code is a whole new challenge in itself and takes the art of debugging to an entirely new level. It is therefore likely that the shift to truly multicore-enabled parallel applications will be made incrementally, in small steps at every release, particularly in large existing applications with a huge code base and a lot of legacy code.
We are only at the beginning of the multicore/manycore era and many of the benefits are still to come, in the form of software upgrades from our favorite software publishers.