Me: I live in Silicon Valley with my wife, child and cat. I have worked at Microsoft since I graduated from College, both in the Macintosh Business Unit on products such as Outlook Express, Entourage, IE, and Virtual PC and in Windows Live on Hotmail, Calendar and People. I am currently a Principal Lead Program Manager on the Windows Live Social Networking team. I basically manage a team of Program Managers responsible for delivering features to support our web and client applications. I've been blogging since 2001 and like to play around with .NET in my spare time working on projects such as dasBlog (the blog that powers this site) and Send to SmugMug (an application for uploading photos to SmugMug). I blog about a number of technology and productivity related topics.
© Copyright 2010, Omar Shahine
The other day I was looking for some code that would extract XMP metadata from a JPEG. On Vista, all metadata is now written to the file using XMP for a number of image formats, one of which is JPEG. This is truly glorious, because on XP there was no interop story for any keywords, captions, etc. that were entered through Microsoft APIs (Win32 GDI+ and .NET System.Drawing).
This is possible because Vista and the .NET Framework 3.0 have a new Photo subsystem called the Windows Imaging Component and it's part of the Windows Presentation Foundation (WPF). This is a subsystem that relies on image codecs to describe the contents of images (like video codecs). These codecs also handle reading and writing metadata.
For Vista/.NET Microsoft has written a number of codecs that ship in the box. This includes:
Metadata support is described on the Microsoft Photography Blog in this post.
EXIF, IPTC, and XMP – oh my!
There are a number of competing standards for imaging metadata. That is, different ways of reading and writing metadata for photos. One of the biggest standards, EXIF, is commonly written to photos by most cameras, but has many limitations. It’s somewhat antiquated, fragile, not very flexible, and doesn’t support international languages like Japanese very well. IPTC is a standard that is used pretty widely in journalism applications, but is undergoing a transformation towards an XMP-based system.
XMP is an extensible framework for embedding metadata in files that was developed by Adobe, and is the foundation for our “truth is in the file” goal. All metadata written to photos by Windows Vista will be written to XMP (always directly to the file itself, never to a ‘sidecar’ file). When reading metadata from photos on Windows Vista, we will first look for XMP metadata, but if we don’t find any, we’ll also look for legacy EXIF and IPTC metadata as well. If we find legacy metadata, we’ll write future changes back to both XMP and the legacy metadata blocks (to improve compatibility with legacy applications).
Well, what I wanted to do was add some code to Send to SmugMug that can read the keywords, ratings and captions that I enter using Vista, Adobe Photoshop Bridge and Microsoft's new Expression Media (formerly iView Media Pro). However, Send to SmugMug is a .NET 1.1 application and all this cool new stuff is in .NET 3.0. Ugh.
After a lot of searching I came up empty-handed. It seemed impossible to extract XMP from a JPEG. Or so I thought. But I found this hidden gem. It turns out that if you just open the JPEG file and read it in using a StreamReader, the XMP text is sitting right there in plain view, right in the middle of all the binary data.
Here is a code snippet to load a jpeg and extract the XMP section.
// Requires: using System.Diagnostics;
public static string GetXmpXmlDocFromImage(string filename)
{
    string contents;
    string xmlPart;
    string beginCapture = "<rdf:RDF";
    string endCapture = "</rdf:RDF>";
    int beginPos;
    int endPos;

    // Read the whole file as text; the XMP block is plain XML embedded in the binary data.
    using (System.IO.StreamReader sr = new System.IO.StreamReader(filename))
    {
        contents = sr.ReadToEnd();
        Debug.WriteLine(contents.Length + " chars");
    }

    beginPos = contents.IndexOf(beginCapture, 0);
    if (beginPos < 0)
    {
        return null; // no XMP block in this file
    }

    // Look for the end tag only after the start tag.
    endPos = contents.IndexOf(endCapture, beginPos);
    if (endPos < 0)
    {
        return null; // truncated or malformed XMP block
    }
    Debug.WriteLine("xml found at pos: " + beginPos + " - " + endPos);

    xmlPart = contents.Substring(beginPos, (endPos - beginPos) + endCapture.Length);
    Debug.WriteLine("Xml len: " + xmlPart.Length);
    return xmlPart;
}
Notice that I am looking for the <rdf:RDF and </rdf:RDF> start and end tags here. This is to ensure maximum compatibility. Normally an XMP block starts with <x:xmpmeta and ends with </x:xmpmeta>. However, this root tag is optional per the XMP spec, and for some reason Vista uses <xmp:xmpmeta and </xmp:xmpmeta>.
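The same <rdf:RDF search can also be done over the raw bytes of the file, which avoids relying on how a text decoder treats the binary parts of the JPEG. The following is only a minimal sketch under a few assumptions: the class and method names (XmpScanner, ExtractRdf, ExtractRdfFromFile) are made up for illustration and are not part of Send to SmugMug, the XMP block is assumed to be UTF-8, and it uses APIs newer than .NET 1.1 (File.ReadAllBytes, StringComparison).

```csharp
using System;
using System.IO;
using System.Text;

public static class XmpScanner
{
    // Hypothetical helper: returns the first <rdf:RDF ...>...</rdf:RDF> block
    // found in the data, or null if there is none.
    public static string ExtractRdf(byte[] data)
    {
        // Latin-1 maps every byte to exactly one char, so a char index in
        // this string is also a byte index into the original data.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        string text = latin1.GetString(data);

        const string beginTag = "<rdf:RDF";
        const string endTag = "</rdf:RDF>";

        int begin = text.IndexOf(beginTag, StringComparison.Ordinal);
        if (begin < 0)
        {
            return null;
        }

        int end = text.IndexOf(endTag, begin, StringComparison.Ordinal);
        if (end < 0)
        {
            return null;
        }

        // Decode only the XMP slice as UTF-8 (an assumption about the file)
        // so that non-ASCII captions and keywords survive intact.
        int length = end + endTag.Length - begin;
        return Encoding.UTF8.GetString(data, begin, length);
    }

    // Convenience wrapper that loads the file into memory first.
    public static string ExtractRdfFromFile(string filename)
    {
        return ExtractRdf(File.ReadAllBytes(filename));
    }
}
```

Calling XmpScanner.ExtractRdfFromFile on a JPEG path would return the RDF block as a string, or null when the file has no XMP, so the caller can skip metadata for that photo instead of catching an exception.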
Once you have the xml extracted from the binary file you simply load it into an XML Document and go looking for what you want. In the below code example I'm looking for Rating, Keywords and Description.
// NamespaceManager, Rating, Keywords and Description are members of the containing class.
private void LoadDoc(string xmpXmlDoc)
{
    XmlDocument doc = new XmlDocument();
    try
    {
        doc.LoadXml(xmpXmlDoc);
    }
    catch (Exception ex)
    {
        throw new ApplicationException("An error occurred while loading XML metadata from image. The error was: " + ex.Message, ex);
    }

    try
    {
        NamespaceManager = new XmlNamespaceManager(doc.NameTable);
        NamespaceManager.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        NamespaceManager.AddNamespace("exif", "http://ns.adobe.com/exif/1.0/");
        NamespaceManager.AddNamespace("x", "adobe:ns:meta/");
        NamespaceManager.AddNamespace("xap", "http://ns.adobe.com/xap/1.0/");
        NamespaceManager.AddNamespace("tiff", "http://ns.adobe.com/tiff/1.0/");
        NamespaceManager.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

        // get ratings
        XmlNode xmlNode = doc.SelectSingleNode("/rdf:RDF/rdf:Description/xap:Rating", NamespaceManager);

        // Alternatively, there is a common form of RDF shorthand that writes simple properties as
        // attributes of the rdf:Description element. Selecting the attribute through XPath matches
        // it by namespace, whatever prefix the file uses, and is safe when rdf:Description is missing.
        if (xmlNode == null)
        {
            xmlNode = doc.SelectSingleNode("/rdf:RDF/rdf:Description/@xap:Rating", NamespaceManager);
        }

        if (xmlNode != null)
        {
            this.Rating = Convert.ToInt32(xmlNode.InnerText);
        }

        // get keywords
        xmlNode = doc.SelectSingleNode("/rdf:RDF/rdf:Description/dc:subject/rdf:Bag", NamespaceManager);
        if (xmlNode != null)
        {
            foreach (XmlNode li in xmlNode)
            {
                Keywords.Add(li.InnerText);
            }
        }

        // get description
        xmlNode = doc.SelectSingleNode("/rdf:RDF/rdf:Description/dc:description/rdf:Alt", NamespaceManager);
        if (xmlNode != null && xmlNode.HasChildNodes)
        {
            this.Description = xmlNode.ChildNodes[0].InnerText;
        }
    }
    catch (Exception ex)
    {
        throw new ApplicationException("An error occurred while reading metadata from image. The error was: " + ex.Message, ex);
    }
}
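To make the two RDF forms concrete, here is a small self-contained sketch that runs the same kind of XPath queries against a hand-written sample. Everything in it is illustrative: the XmpSummary and XmpReader names and the sample XMP are made up, and real files carry many more properties. The sample deliberately stores the rating as an attribute under the xmp: prefix while the query uses xap:, which works because XPath matches on the namespace URI, not on the prefix. It targets .NET 2.0 or later (generics, nullable types).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

// Hypothetical result holder for this example only.
public sealed class XmpSummary
{
    public int? Rating;
    public List<string> Keywords = new List<string>();
}

public static class XmpReader
{
    // Hand-written sample: rating in attribute shorthand, keywords in an rdf:Bag.
    public const string Sample =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>" +
        "<rdf:Description rdf:about='' xmlns:xmp='http://ns.adobe.com/xap/1.0/' xmp:Rating='4'/>" +
        "<rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>" +
        "<dc:subject><rdf:Bag><rdf:li>beach</rdf:li><rdf:li>sunset</rdf:li></rdf:Bag></dc:subject>" +
        "</rdf:Description>" +
        "</rdf:RDF>";

    public static XmpSummary Read(string xmpXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmpXml);

        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ns.AddNamespace("xap", "http://ns.adobe.com/xap/1.0/");
        ns.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

        XmpSummary summary = new XmpSummary();

        // Try the element form first, then the attribute shorthand.
        XmlNode rating = doc.SelectSingleNode("/rdf:RDF/rdf:Description/xap:Rating", ns);
        if (rating == null)
        {
            rating = doc.SelectSingleNode("/rdf:RDF/rdf:Description/@xap:Rating", ns);
        }
        if (rating != null)
        {
            summary.Rating = int.Parse(rating.InnerText, CultureInfo.InvariantCulture);
        }

        // Each rdf:li inside dc:subject/rdf:Bag is one keyword.
        XmlNodeList items = doc.SelectNodes("/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li", ns);
        foreach (XmlNode li in items)
        {
            summary.Keywords.Add(li.InnerText);
        }

        return summary;
    }
}
```

Calling XmpReader.Read(XmpReader.Sample) should yield a rating of 4 and the keywords "beach" and "sunset". A file that omits the rating simply leaves Rating as null.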
There you have it. I hope this saves someone a few hours when they try and do this.
 
// Streaming variant: reads one character at a time and stops at the end tag,
// so the whole file never has to be held in memory.
// Requires: using System.Diagnostics; using System.Text;
public static string GetXmpXmlDocFromImage(string filename)
{
    string beginCapture = "<rdf:RDF";
    string endCapture = "</rdf:RDF>";
    StringBuilder collection = new StringBuilder();
    bool collecting = false;
    bool complete = false;
    int beginMatched = 0; // characters of beginCapture matched so far
    int endMatched = 0;   // characters of endCapture matched so far

    using (System.IO.StreamReader sr = new System.IO.StreamReader(filename))
    {
        int next;
        while ((next = sr.Read()) != -1)
        {
            char contents = (char)next;
            if (!collecting)
            {
                if (contents == beginCapture[beginMatched])
                {
                    // we are still looking, but on track to start collecting
                    beginMatched++;
                }
                else
                {
                    // false start: reset, but let this character begin a new match
                    beginMatched = (contents == beginCapture[0]) ? 1 : 0;
                }

                if (beginMatched == beginCapture.Length)
                {
                    // found the begin element; stop matching and start collecting
                    collection.Append(beginCapture);
                    collecting = true;
                }
            }
            else
            {
                collection.Append(contents);
                if (contents == endCapture[endMatched])
                {
                    endMatched++;
                }
                else
                {
                    endMatched = (contents == endCapture[0]) ? 1 : 0;
                }

                if (endMatched == endCapture.Length)
                {
                    // we are finished; found the end of the XMP data
                    complete = true;
                    break;
                }
            }
        }
    }

    Debug.WriteLine("Collection: " + collection);
    // Return null when no complete XMP block was found.
    return complete ? collection.ToString() : null;
}