The Metadata Working Group, a consortium of leading companies in the digital media industry, is working hard to establish metadata interoperability standards. The benefits of such a gigantic effort are obvious but adding metadata within existing image file formats is less than evident as most of these formats were never designed to be edited in-place. This is true of the JFIF container, used to store images compressed in the ubiquitous JPEG format, as well as for TIFF and all TIFF derivatives (DNG, TIFF/EP, and all current camera raw formats such as Canon’s CR2 or Nikon NEF, just to mention two).
These container formats were designed to be extensible in the sense that it is possible to define new content parts by creating new tags, and this extensibility worked well already as it allowed adding EXIF data to JPEG images when EXIF was invented, for example, without the need to significantly alter the JFIF container format specifications.
However, when it comes to the physical implementation, the files are laid out as a set of contiguous segments, and growing an existing segment to hold new metadata elements is not possible without moving everything that follows it.
Let’s say an image does not contain copyright information and an application is used to add this information afterwards. With the current file layouts, the application needs to extend the size of the EXIF segment in order to make room for the additional data. The larger segment will no longer fit at its original place in the file, and the application needs to build an entirely new file, deconstructing the original file by splitting it into its various constituents and building a new file from scratch, moving file segments one by one in the correct order, and of course substituting the original EXIF data segment with the new, lengthier one.
One unfortunate side effect of this “transcoding” operation is that it shifts parts of the file further back to make room for the additional data at the front, thus the new file is substantially different from the original and all internal offsets must be adjusted accordingly. (“Offsets” are values indicating the location of various elements in the files, written at file creation and used by file parsers to find their way through the various file constituents).
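The effect of a stale offset can be illustrated with a deliberately tiny sketch. The layout below (a 4-byte offset, then an EXIF blob, then image data) is a made-up toy format, not JFIF or TIFF, and the function names are ours:

```python
import struct

def build_file(exif: bytes, image: bytes) -> bytes:
    """Toy layout: 4-byte big-endian offset to the image data, then EXIF, then image."""
    image_offset = 4 + len(exif)
    return struct.pack(">I", image_offset) + exif + image

def read_image(blob: bytes) -> bytes:
    """Follow the stored offset, the way a file parser would."""
    (image_offset,) = struct.unpack_from(">I", blob, 0)
    return blob[image_offset:]

original = build_file(b"EXIF", b"PIXELS")

# Naive edit: splice in a longer EXIF segment but leave the stored offset alone.
naive = original[:4] + b"EXIF+COPYRIGHT" + original[8:]

# Proper edit: rebuild the whole file so the offset is recomputed.
rebuilt = build_file(b"EXIF+COPYRIGHT", b"PIXELS")

print(read_image(original))  # the image data
print(read_image(naive))     # garbage: the offset now points into the EXIF segment
print(read_image(rebuilt))   # the image data again
```

The naive edit still yields a file that "opens", which is precisely why this kind of damage tends to go unnoticed.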
Camera manufacturers insist on storing opaque data segments in their files. Those segments, known as Maker Notes, are present in JPEG, TIFF and raw files created by all digital cameras and contain undocumented proprietary information – you can think of it as private metadata.
The problem is that 3rd party applications do not really understand the content and meaning of this private data and cannot “fix” it appropriately when reconstructing files with an altered layout.
As a result, more often than not, files that were transcoded to make room for additional metadata are left in a subtly broken state: while they may appear to work well with some applications (in particular, with the 3rd party application that edited them), manufacturer software may fail to recognize the files afterwards, or may fail to correctly interpret the private data they contain, since the offsets still refer to the original file layout that was created by the camera’s firmware, and not to the layout reconstructed by the application that added the extra metadata. One exception to this is the manufacturer’s own software, such as Nikon’s Capture NX, which knows all details about Nikon’s proprietary files and can of course rewrite them properly.
There is no easy solution to this problem. Adobe’s DNG effort, for example, does not help at all since the Maker Notes are stored as-is in DNG, and are just as opaque as they were in the original file.
Around 2006, Microsoft came up with an attempt to mitigate the problem: they created the OffsetSchema tag (59933), providing a well-known place where an application would write the number of bytes by which the Maker Notes were moved in the new file relative to their location in the original file.
The idea was that file readers who actually understand the Maker Notes (i.e. the manufacturer software) would have a chance to repair the internal offsets by looking at the OffsetSchema tag and, knowing how far the data was moved, be able to fix any opaque pointers by the correct amount and restore the meaning of the moved data.
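As a rough sketch of the intended repair step, assume the reader has already located the absolute pointers inside the Maker Notes (which only the manufacturer's software can do) and that the tag holds "new position minus original position". That sign convention, the helper name and the sample bytes are our assumptions for illustration:

```python
def repair_pointers(stale_pointers, offset_schema):
    """Shift absolute pointers recorded by the camera by the OffsetSchema delta."""
    return [pointer + offset_schema for pointer in stale_pointers]

# Hypothetical camera file: a pointer in the Maker Notes refers to position 140.
camera_file = bytes(140) + b"LENS"
stale_pointers = [140]

# An editor inserts 32 bytes near the front, so the data moves 32 bytes back,
# and the editor records that shift in the OffsetSchema tag.
edited_file = bytes(32) + camera_file
offset_schema = 32

repaired = repair_pointers(stale_pointers, offset_schema)
print(edited_file[stale_pointers[0]:stale_pointers[0] + 4])  # wrong bytes
print(edited_file[repaired[0]:repaired[0] + 4])              # the moved data
```

The arithmetic is trivial; the scheme only works if every application that moves the Maker Notes keeps the delta up to date.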
The creation of this field got some bad press initially, people accusing Microsoft of subverting established standards while their intention was simply to provide a reasonable solution to an existing problem they did not create in the first place.
This attempt was not without problems, however: the main one was that it required a worldwide coordination from all imaging software vendors who’d create (or appropriately update) the OffsetSchema tag when transcoding files.
This never happened and as a result the new tag is not always present in transcoded files and, when it is, its content cannot be relied upon as the files may have been subsequently modified and rewritten by any number of other applications that failed to update it (the default behavior in the presence of unknown tags is to ignore their content and copy them as-is).
To make a long story short, adding metadata to a file requires rewriting it entirely to make room for the new data and often breaks part of the file’s content in subtle and undetectable ways that can never be reliably repaired afterwards.
As a mitigation, during the first transcoding of a file, a smart imaging application adds a certain amount of padding data at the appropriate places, right at the end of the metadata segments, leaving some empty space in the otherwise compact file layout to allow some further growth of the metadata without forcing a rewrite of the entire file for each update.
Not all applications add this padding data while transcoding files, and not all applications take advantage of it to perform in-place metadata edits when the padding room would be sufficient to accommodate the additional data.
The padding also needs to be properly managed: for example, if an application adds 1KB of metadata and finds that 4KB of padding exists in the file, it must update the amount of remaining padding to 3KB. Also, should the available padding be too small to accommodate the new metadata, an entirely new file must be created yet again and a full complement of fresh padding must be added to the new file, in order to allow fast in-place editing for the next few updates. Unfortunately, current digital cameras never add padding, forcing a file rewrite the first time additional metadata is added to out-of-camera files.
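The bookkeeping described above can be sketched in a few lines. The MetadataSegment type, the add_metadata function and the 4KB default are hypothetical names and values chosen for illustration, not taken from any real library:

```python
from dataclasses import dataclass

@dataclass
class MetadataSegment:
    data: bytes    # the metadata currently stored
    padding: int   # free bytes reserved right after the metadata

def add_metadata(segment, extra, fresh_padding=4096):
    """Return (new_segment, rewritten); rewritten is True when a full rewrite is needed."""
    if len(extra) <= segment.padding:
        # In-place edit: consume padding, leave the rest of the file untouched.
        grown = MetadataSegment(segment.data + extra, segment.padding - len(extra))
        return grown, False
    # Not enough room: the file must be rebuilt, with a fresh complement of padding.
    return MetadataSegment(segment.data + extra, fresh_padding), True

fits, rewritten_fits = add_metadata(MetadataSegment(b"x" * 100, 4096), b"c" * 1024)
tight, rewritten_tight = add_metadata(MetadataSegment(b"", 512), b"c" * 1024)
print(fits.padding, rewritten_fits)     # 3072 bytes of padding left, no rewrite
print(tight.padding, rewritten_tight)   # fresh padding, full rewrite required
```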
One could think that it would suffice to somehow append the data at the end of the file, but the current container standards require that tags be stored in ascending numerical order, so the EXIF segment must stay at the same logical place in the file and cannot be relocated to the end. Such a scheme could also create issues as multiple versions of the metadata would be present in the files and, for one or the other of the reasons just mentioned, would likely break many existing software applications.
Another little-known issue is that many raw converters (at least all those based on the popular DCRAW source code from David Coffin) rely on the actual file size to identify certain raw files or camera models. Increasing the file size by even a single byte would break this detection scheme and cause problems for many existing raw conversion programs.
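To see why size-based detection is so fragile, consider a sketch of such a lookup. The byte counts and model names below are invented for illustration and are not taken from DCRAW:

```python
# Hypothetical table: the sizes and model names are made up.
KNOWN_SIZES = {
    1_447_680: "ExampleCam A100",
    3_145_728: "ExampleCam B200",
}

def identify_by_size(file_size):
    """Return the camera model for an exact file size, or None if unknown."""
    return KNOWN_SIZES.get(file_size)

print(identify_by_size(1_447_680))      # recognized
print(identify_by_size(1_447_680 + 1))  # one extra byte: no longer recognized
```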
Finally, rewriting a 25MB raw file (or an 80MB TIFF) just to add 1KB of copyright information is inefficient at best and also poses problems with compressed formats, as the compressed parts must be moved losslessly from the original file to the transcoded one: this may or may not be easy depending on the actual file format and the availability (and features) of decoding libraries and file parsers handling this particular format.
From the above, one readily understands that a lot of highly complex file surgery happens under the hood when a user hits the ‘5’ key in FastPictureViewer Pro to give a five-star rating to a favorite photo, or clicks the Save Metadata to File menu option in Adobe Lightroom to export the metadata to a JPEG, most of which could be avoided if the file containers allowed in-place edits…
A big part of what we do here at FastPictureViewer.com is to read image files. We support hundreds of digital camera models and dozens of raster formats in our image viewer and Windows codecs, we also expose EXIF, XMP and IPTC metadata to the Windows Property System, from where it can be consumed by Windows Explorer or any application that cares to take advantage of this data. We also support EXIF, XMP and IPTC metadata embedding within JPEG, TIFF and HD Photo as well as reading and writing external Adobe XMP sidecar files, just to say we are at the heart of all those issues and have been dealing with them, or working around them, on a daily basis for many years.
What we propose below is not a revolution, but a possible step towards easier-to-manage image files.
Camera manufacturers will probably never give up their competitive advantage and are likely to keep storing opaque proprietary data in their files for the foreseeable future, so the “opaque metadata problem” is unlikely to be solved anytime soon. (In an ideal world, opaque binary metadata blocks would not exist and all camera manufacturers would use properly labeled discrete fields to store all the data they need to store. Each manufacturer would be able to keep their trade secrets as there would be no obligation or need to reveal the intent and purpose of each data element, but storing them in discrete fields – say in XML CDATA format – would make it possible for all 3rd parties to deal with all metadata – dissect it, extend it and reassemble it – in a uniform and reliable manner.)
What can be avoided, however, is the necessity to entirely rewrite files – a time-consuming and resource-intensive operation with possibly dangerous consequences – when adding metadata. (If you are not fully convinced that metadata embedding could be a risky business, check out this post on the Lightroom Journal describing a potentially data-corrupting metadata bug in current Adobe flagships.)
The first step towards a possible solution would be to get rid of hard-coded file offsets: they would disappear entirely and various parts of the files would be referenced by a logical name, for example “/Image/Thumbnail”, “/Image/Preview”, or “/Image/RawData”, and it would absolutely not matter where the actual elements are physically laid out in the file.
A directory, sporting a well-known name, would list the file’s content and readers could simply reference or access the various parts by name. One such part would of course be called something like “/Image/Metadata/EXIF” and could very well contain said metadata in a format similar to the one currently used, namely a binary EXIF segment, containing XMP data and the Maker Notes segment as it does today.
The Maker Notes and EXIF data would only require minimal changes, namely the use of logical names instead of file offsets where they store the location of the various file components. Existing metadata parsing code could be used with little modification, as the metadata itself would be stored in essentially the same format as it is today, save for the file offsets replaced by names.
The container itself would be a “filesystem-within-a-file” and use separate data streams for the various file constituents, image data – one stream per image representation – and metadata.
One huge advantage of such a container is that it would be possible to add, extend, truncate or remove data streams from the file without disturbing the other data streams it contains. As such, extending the metadata stream within a file would simply be a matter of appending data to it, or just rewriting the metadata stream entirely, with complete disregard for the rest of the file: the storage container would take care of allocating the extra space needed, exactly as a file system such as FAT, EXT3 or NTFS does for normal files on hard disks or memory cards.
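To make the idea concrete, here is a minimal in-memory sketch of such a container: fixed-size blocks plus a directory mapping stream names to block chains. The StreamContainer class, its 16-byte block size and its allocation strategy are our own simplifications for illustration, not the layout of any real format:

```python
class StreamContainer:
    """Toy 'filesystem-within-a-file': fixed-size blocks plus a name directory."""

    BLOCK_SIZE = 16

    def __init__(self):
        self.blocks = []      # physical storage, one bytes object per block
        self.free = []        # indices of blocks released by earlier writes
        self.directory = {}   # stream name -> (list of block indices, length)

    def write(self, name, data):
        """Create a stream, or replace its content, without moving other streams."""
        old_chain, _ = self.directory.get(name, ([], 0))
        self.free.extend(old_chain)
        chain = []
        for start in range(0, len(data), self.BLOCK_SIZE):
            chunk = data[start:start + self.BLOCK_SIZE].ljust(self.BLOCK_SIZE, b"\0")
            if self.free:
                index = self.free.pop(0)      # reuse a released block
                self.blocks[index] = chunk
            else:
                index = len(self.blocks)      # grow the file at the end
                self.blocks.append(chunk)
            chain.append(index)
        self.directory[name] = (chain, len(data))

    def read(self, name):
        chain, length = self.directory[name]
        return b"".join(self.blocks[i] for i in chain)[:length]

container = StreamContainer()
container.write("/Image/Metadata/EXIF", b"exif")
container.write("/Image/RawData", b"R" * 40)
raw_chain_before = list(container.directory["/Image/RawData"][0])

# Grow the metadata well beyond its original block: the raw data does not move.
container.write("/Image/Metadata/EXIF", b"exif plus a long copyright notice")
raw_chain_after = container.directory["/Image/RawData"][0]
print(raw_chain_before == raw_chain_after)
```

The grown metadata stream ends up scattered across non-adjacent blocks, which is harmless because readers only ever ask for it by name.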
Metadata updates would be much faster than when a full file rewrite is necessary and also perfectly safe as the physical layout of the various file constituents would be irrelevant, so an arbitrary amount of metadata could be added or removed without fear. Entirely new types of data or metadata could be invented and added in their own streams, and co-exist peacefully with existing data without breaking existing readers.
Since the actual binary or XML data would be stored much as it is today, changes to firmware and application software would be confined to the file I/O portion of the code, where the software would open the “/Image/Thumbnail” stream, for example, instead of seeking to absolute file location X, where the thumbnail is supposed to be stored.
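The difference in the I/O layer might look like the following sketch, where a plain dictionary stands in for the container and the open_stream helper is hypothetical:

```python
import io

def thumbnail_by_offset(f, offset, length):
    """Today: seek to an absolute position that must be known (and correct)."""
    f.seek(offset)
    return f.read(length)

def thumbnail_by_name(open_stream):
    """Proposed: ask the container for the stream by its logical name."""
    with open_stream("/Image/Thumbnail") as stream:
        return stream.read()

thumb = b"\xff\xd8thumb\xff\xd9"
flat_file = io.BytesIO(b"HEADER--" + thumb + b"RAWDATA")
streams = {"/Image/Thumbnail": thumb, "/Image/RawData": b"RAWDATA"}

def open_stream(name):
    # Stand-in for a real container API; returns a file-like object.
    return io.BytesIO(streams[name])

old_way = thumbnail_by_offset(flat_file, 8, len(thumb))
new_way = thumbnail_by_name(open_stream)
print(old_way == new_way)
```

Both calls hand the same bytes to the decoder, so everything downstream of the read stays as it is.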
The remainder of the file handling code could be essentially identical, as the binary data (RGB pixels, sensor data…) retrieved from the various parts of the file would be exactly the same. Also, since the creation of those files would require updated firmware, camera manufacturers could jump on the opportunity to add a new “Model ID” field in the form of a globally unique identifier (GUID), stored straight in the file’s header, that unambiguously identifies the camera model, so raw decoders and applications could identify the files beyond any doubt.
To sum it up, this proposed change is not much about data itself but mainly about how it is stored, more precisely in which type of container.
One readily available and widespread example of a suitable container would be the Microsoft Compound File, whose binary specifications are open and readily available, and which can be implemented on any platform or device, from camera phones to DSLRs to Linux and the Apple Mac, with little effort (all copies of the Windows operating system have shipped with an implementation since about 1995 and this format is already well known to the open source community). For those who’d see an issue related to the company that created this specification, think twice: Microsoft invented TIFF (with Aldus) and this does not seem to be a problem to anyone…
With such a container it would be child’s play to define and add new types of streams, say “/Audio/MP3” or “/Video/H.264”, to existing files. Multiple formats could be stored side-by-side in the same file, such as “/Audio/MP3” and “/Audio/PCM”, if it makes sense for some application.
A software application could also add or rewrite, say, the preview image of any file without fear of breaking anything, for example in a stream called “/Image/Preview/JPEG”, which would contain a standard JPEG image (SOI, APP0, …, EOI).
Applications could actually look for such a preview stream and discover it at runtime without knowing anything about the format’s specifics. Likewise, an “/Image/Metadata/EXIF” stream could be found, read and updated by applications, again without the need to take care of – or even know – the details of the file layout. It would also be possible to add metadata to a particular image within the file without affecting other images in the same file, say “/Image/Preview/JPEG/Metadata/XMP”, if it makes sense, etc.
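Runtime discovery could be as simple as filtering the directory by name prefix. In this sketch the directory is a plain dictionary and the helper names are ours; the only format fact relied upon is that a JPEG stream starts with the SOI marker (FF D8) and ends with the EOI marker (FF D9):

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker

def find_streams(directory, prefix):
    """List every stream at or below the given logical path."""
    return sorted(n for n in directory if n == prefix or n.startswith(prefix + "/"))

def looks_like_jpeg(data):
    """Cheap sanity check on a candidate preview stream."""
    return data.startswith(JPEG_SOI) and data.endswith(JPEG_EOI)

# Hypothetical file content, keyed by stream name.
directory = {
    "/Image/RawData": b"sensor data",
    "/Image/Preview/JPEG": JPEG_SOI + b"preview pixels" + JPEG_EOI,
    "/Image/Preview/JPEG/Metadata/XMP": b"<x:xmpmeta/>",
    "/Audio/MP3": b"audio frames",
}

previews = find_streams(directory, "/Image/Preview")
usable = [n for n in previews if looks_like_jpeg(directory[n])]
print(previews)
print(usable)
```

An application written this way never needs to know which camera produced the file, only which stream names it cares about.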
With a little coordination regarding the stream names and hierarchy, such a multimedia file format could become universal and used to store virtually any type of data with very little effort and maximum reuse of existing code bases and libraries. In-place extensibility, data discoverability and reliable updates would be key advantages compared to today’s container formats.
– Axel Rietschin (May 21, 2011)