shahine.com/omar/

yet another Microsoft blogger

# Monday, January 29, 2007

Reading XMP Metadata from a JPEG using C#

The other day I was looking for some code that would extract XMP metadata from a JPEG. You see, on Vista all metadata is now written to the file using XMP for a number of image formats, one of which is JPEG. This is truly glorious, as on XP there was no interop story for any keywords, captions, etc. that were entered through Microsoft APIs (Win32 GDI+ and .NET System.Drawing).

This is possible because Vista and the .NET Framework 3.0 have a new photo subsystem called the Windows Imaging Component (WIC), which the Windows Presentation Foundation (WPF) exposes to managed code. The subsystem relies on image codecs to decode and encode images (much as video codecs do for video). These codecs also handle reading and writing metadata.

For Vista/.NET, Microsoft has written a number of codecs that ship in the box. These include:

  • BMP
  • GIF
  • ICO
  • JPEG
  • PNG
  • TIFF
  • Windows Media Photo

Metadata support is described on the Microsoft Photography Blog in this post.

EXIF, IPTC, and XMP – oh my!
There are a number of competing standards for imaging metadata. That is, different ways of reading and writing metadata for photos. One of the biggest standards, EXIF, is commonly written to photos by most cameras, but has many limitations. It’s somewhat antiquated, fragile, not very flexible, and doesn’t support international languages like Japanese very well. IPTC is a standard that is used pretty widely in journalism applications, but is undergoing a transformation towards an XMP-based system.

XMP is an extensible framework for embedding metadata in files that was developed by Adobe, and is the foundation for our “truth is in the file” goal. All metadata written to photos by Windows Vista will be written to XMP (always directly to the file itself, never to a ‘sidecar’ file). When reading metadata from photos on Windows Vista, we will first look for XMP metadata, but if we don’t find any, we’ll also look for legacy EXIF and IPTC metadata as well. If we find legacy metadata, we’ll write future changes back to both XMP and the legacy metadata blocks (to improve compatibility with legacy applications).

Well, what I wanted to do is add some code to Send to smugmug that can read the keywords, ratings and captions that I enter using Vista, Adobe Photoshop Bridge and Microsoft's new Expression Media (formerly iView Media Pro). However, Send to smugmug is a .NET 1.1 application and all this cool new stuff is in .NET 3.0. Ugh.

After a lot of searching I came up empty-handed. It seemed impossible to extract XMP from a JPEG. Or so I thought. But I found this hidden gem. It turns out that if you just open the JPEG file and read it in using a StreamReader, the XMP text is sitting right there in plain view, right in the middle of all the binary data.

Here is a code snippet to load a JPEG and extract the XMP section.

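// Requires: using System.Diagnostics; using System.IO;
// Returns the <rdf:RDF>...</rdf:RDF> block as a string, or null if the file has none.
public static string GetXmpXmlDocFromImage(string filename) 
{ 
    string contents; 
    string xmlPart; 
    string beginCapture = "<rdf:RDF"; 
    string endCapture = "</rdf:RDF>"; 
    int beginPos; 
    int endPos; 
    
    // Read the whole file as text (StreamReader defaults to UTF-8).
    // The using block closes the reader, so no explicit Close() is needed.
    using (StreamReader sr = new StreamReader(filename))
    {
        contents = sr.ReadToEnd(); 
        Debug.WriteLine(contents.Length + " chars"); 
    }
    
    beginPos = contents.IndexOf(beginCapture); 
    if (beginPos < 0)
    {
        return null; // no XMP block in this file
    }

    // Search for the end tag only after the start tag.
    endPos = contents.IndexOf(endCapture, beginPos); 
    if (endPos < 0)
    {
        return null; // start tag without a matching end tag
    }

    Debug.WriteLine("xml found at pos: " + beginPos + " - " + endPos); 
    
    xmlPart = contents.Substring(beginPos, (endPos - beginPos) + endCapture.Length); 

    Debug.WriteLine("Xml len: " + xmlPart.Length); 

    return xmlPart; 
} 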

Notice that I am looking for the <rdf:RDF and </rdf:RDF> start and end tags here. This is to ensure maximum compatibility. Normally an XMP block starts with <x:xmpmeta and ends with </x:xmpmeta>. However, this root tag is optional per the XMP spec, and for some reason Vista uses <xmp:xmpmeta and </xmp:xmpmeta>.
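The string search above treats the whole file as text. A stricter alternative, which is not what Send to smugmug does, is to walk the JPEG segment headers and pick out the XMP segment directly. The sketch below assumes the file follows the XMP specification's JPEG embedding: an APP1 segment (marker FF E1) whose payload starts with the identifier http://ns.adobe.com/xap/1.0/ followed by a NUL byte. The class and method names are made up for illustration, and the sketch ignores marker fill bytes and multi-segment "extended" XMP.

```csharp
using System;
using System.IO;
using System.Text;

public static class XmpSegmentSketch
{
    // Identifier that precedes the XMP packet inside a JPEG APP1 segment.
    private static readonly byte[] XmpId =
        Encoding.ASCII.GetBytes("http://ns.adobe.com/xap/1.0/\0");

    // Returns the XMP packet as text, or null if no XMP segment is found.
    public static string ReadXmpPacket(Stream jpeg)
    {
        // Every JPEG starts with the SOI marker FF D8.
        if (jpeg.ReadByte() != 0xFF || jpeg.ReadByte() != 0xD8)
            return null;

        while (true)
        {
            int prefix = jpeg.ReadByte();
            int marker = jpeg.ReadByte();
            if (prefix != 0xFF || marker < 0)
                return null; // not positioned on a marker, or end of stream

            // SOS (DA) starts the compressed image data and EOI (D9) ends the
            // file; metadata segments normally come before either of them.
            if (marker == 0xDA || marker == 0xD9)
                return null;

            // Two-byte big-endian length that includes the length field itself.
            int high = jpeg.ReadByte();
            int low = jpeg.ReadByte();
            if (high < 0 || low < 0)
                return null;
            int payloadLength = ((high << 8) | low) - 2;
            if (payloadLength < 0)
                return null;

            byte[] payload = new byte[payloadLength];
            if (!ReadFully(jpeg, payload))
                return null;

            if (marker == 0xE1 && StartsWith(payload, XmpId))
            {
                return Encoding.UTF8.GetString(
                    payload, XmpId.Length, payload.Length - XmpId.Length);
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until full.
    private static bool ReadFully(Stream s, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = s.Read(buffer, offset, buffer.Length - offset);
            if (read <= 0) return false;
            offset += read;
        }
        return true;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Called with a stream from File.OpenRead, this stops reading as soon as the XMP segment is found, so large images are not loaded in full. The returned packet still contains whatever wrapper the writer used, so the same <rdf:RDF ... </rdf:RDF> substring search can be applied to it before parsing.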

Once you have the XML extracted from the binary file, you simply load it into an XmlDocument and go looking for what you want. In the code example below I'm looking for Rating, Keywords and Description.

// Requires: using System; using System.Xml;
// NamespaceManager, Rating, Keywords and Description are members of the
// containing class (not shown here).
private void LoadDoc(string xmpXmlDoc) 
{ 
    XmlDocument doc = new XmlDocument(); 

    try 
    { 
        doc.LoadXml(xmpXmlDoc); 
    } 
    catch (Exception ex) 
    { 
        throw new ApplicationException("An error occurred while loading XML metadata from image. The error was: " + ex.Message, ex); 
    } 

    try 
    { 
        NamespaceManager = new XmlNamespaceManager(doc.NameTable);
        NamespaceManager.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        NamespaceManager.AddNamespace("exif", "http://ns.adobe.com/exif/1.0/");
        NamespaceManager.AddNamespace("x", "adobe:ns:meta/");
        NamespaceManager.AddNamespace("xap", "http://ns.adobe.com/xap/1.0/");
        NamespaceManager.AddNamespace("tiff", "http://ns.adobe.com/tiff/1.0/");
        NamespaceManager.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

        // get ratings
        XmlNode xmlNode = doc.SelectSingleNode("/rdf:RDF/rdf:Description/xap:Rating", NamespaceManager);
        
        // Alternatively, there is a common form of RDF shorthand that writes simple properties as
        // attributes of the rdf:Description element. Selecting the attribute directly avoids a
        // null reference when there is no rdf:Description element at all.
        if (xmlNode == null)
        {
            xmlNode = doc.SelectSingleNode("/rdf:RDF/rdf:Description/@xap:Rating", NamespaceManager);
        }
        
        if (xmlNode != null)
        {
            this.Rating = Convert.ToInt32(xmlNode.InnerText);
        }

        // get keywords
        xmlNode = doc.SelectSingleNode("/rdf:RDF/rdf:Description/dc:subject/rdf:Bag", NamespaceManager);
        
        if (xmlNode != null)
        {
            foreach (XmlNode li in xmlNode)
            {
                Keywords.Add(li.InnerText);
            }
        }

        // get description (the first rdf:li entry of the rdf:Alt list)
        xmlNode = doc.SelectSingleNode("/rdf:RDF/rdf:Description/dc:description/rdf:Alt", NamespaceManager);
        
        if (xmlNode != null && xmlNode.HasChildNodes)
        {
            this.Description = xmlNode.ChildNodes[0].InnerText;
        }
    } 
    catch (Exception ex) 
    { 
        throw new ApplicationException("An error occurred while reading metadata from image. The error was: " + ex.Message, ex); 
    } 
} 
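To make the XPath queries concrete, here is a small self-contained sketch that runs the same kind of lookups against a hand-written XMP fragment. The sample XML is made up for illustration (it is not output captured from Vista or Bridge), and the class and method names are hypothetical. It stores the rating in the attribute shorthand form, so the fallback query is the one that matches.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XmpQuerySketch
{
    // Hand-written sample for illustration; real files carry many more properties.
    public const string SampleXmp =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>" +
        "<rdf:Description rdf:about='' xmlns:xap='http://ns.adobe.com/xap/1.0/'" +
        " xmlns:dc='http://purl.org/dc/elements/1.1/' xap:Rating='4'>" +
        "<dc:subject><rdf:Bag><rdf:li>beach</rdf:li><rdf:li>family</rdf:li></rdf:Bag></dc:subject>" +
        "<dc:description><rdf:Alt><rdf:li xml:lang='x-default'>Sunset at the pier</rdf:li></rdf:Alt></dc:description>" +
        "</rdf:Description></rdf:RDF>";

    public static XmlDocument Load(string xmp)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmp);
        return doc;
    }

    // XPath prefixes are resolved through this manager, not through the document.
    private static XmlNamespaceManager Namespaces(XmlDocument doc)
    {
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ns.AddNamespace("xap", "http://ns.adobe.com/xap/1.0/");
        ns.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");
        return ns;
    }

    // Returns the rating, or -1 when neither the element nor the attribute form is present.
    public static int ReadRating(XmlDocument doc)
    {
        XmlNamespaceManager ns = Namespaces(doc);
        XmlNode node = doc.SelectSingleNode("/rdf:RDF/rdf:Description/xap:Rating", ns);
        if (node == null)
        {
            node = doc.SelectSingleNode("/rdf:RDF/rdf:Description/@xap:Rating", ns);
        }
        return node == null ? -1 : int.Parse(node.InnerText);
    }

    // Returns every rdf:li entry of the dc:subject bag (empty list if there is none).
    public static List<string> ReadKeywords(XmlDocument doc)
    {
        List<string> keywords = new List<string>();
        XmlNodeList items = doc.SelectNodes(
            "/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li", Namespaces(doc));
        foreach (XmlNode li in items)
        {
            keywords.Add(li.InnerText);
        }
        return keywords;
    }

    // Returns the first language alternative of dc:description, or null if absent.
    public static string ReadDescription(XmlDocument doc)
    {
        XmlNode li = doc.SelectSingleNode(
            "/rdf:RDF/rdf:Description/dc:description/rdf:Alt/rdf:li", Namespaces(doc));
        return li == null ? null : li.InnerText;
    }
}
```

With the sample above, ReadRating gives 4, ReadKeywords gives "beach" and "family", and ReadDescription gives "Sunset at the pier". Note that generic collections need .NET 2.0, so a .NET 1.1 project would swap List<string> for an ArrayList.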

There you have it. I hope this saves someone a few hours when they try and do this.

 

Tuesday, January 30, 2007 5:45:59 PM (Pacific Standard Time, UTC-08:00)
There is another way: use the Windows Imaging Component, as you mentioned. The key is using the BitmapMetadata classes. In my experiments they provide access to ALL metadata stored within a JPG, including XMP. It's a bit baffling at first, but once you get the hang of the classes and the query syntax, it's robust and simple (I'm not going to go so far as saying logical though!). Better yet, it's all C# friendly with .NET 3.0 installed. I've got a hacky application to view and show it that I could post to my web site if anyone is interested.
Tuesday, January 30, 2007 5:53:38 PM (Pacific Standard Time, UTC-08:00)
I needed something that didn't require .NET 3.0, hence this post.
Wednesday, January 31, 2007 5:40:27 AM (Pacific Standard Time, UTC-08:00)
Maybe it's time to upgrade to a new version of .NET. :)

Wednesday, January 31, 2007 9:30:53 AM (Pacific Standard Time, UTC-08:00)
I confess, I'm only just getting into the world of portable image metadata (hey, what can I say, I'm lazy) but I thought I read - while doing some learning - that Smugmug natively supports XMP-based IPTC information (keywords, captions, etc.). All you need do is upload the file and the systems on their end can parse out the meta. If this is true (and I don't know if it is, I haven't had a chance to try this stuff out yet), why do you need your loader to explicitly know it beforehand?
Wednesday, January 31, 2007 10:48:10 AM (Pacific Standard Time, UTC-08:00)
Because one feature of my uploader is the ability to filter uploads based on ratings and keywords. I only upload photos rated 3 or higher.
Friday, April 20, 2007 4:53:45 AM (Pacific Daylight Time, UTC-07:00)
I used the example code posted on this site, but for some files I could not read the XMP info.
I'm developing a small system to store and organise images for a photographer, and it's essential to be able to access (read and write) the info associated with each image.
Adobe Photoshop shows the description, but with this example some images didn't show the info.
In my tests, I saw that the information appears to be stored in binary format, not text.
Do you have more information about this? Would sending you an image help to work out how the info is stored in it?
Thanks a lot for any help, and sorry for my poor English.
I'm from Brazil, Porto Alegre.

Thank you.
Friday, April 27, 2007 1:48:11 PM (Pacific Daylight Time, UTC-07:00)
omar: this code & usenet bit rocks, thanks! i wrote an EXIF parser, but exif is so flaky between cameras it was really hackish. now that ive switched to Adobe LightRoom for my tagging & editing, and it exports XMP in the JPGs, im looking forward to trying this out for my photo site. (i maintain an XML quasi-db in memory, which allows users to search for any of the 5k+ images very rapidly)
Tuesday, July 17, 2007 1:12:00 PM (Pacific Daylight Time, UTC-07:00)
Great Post! Any thoughts on how you might modify this to write xmp metadata to a jpeg? I have looked into the .NET 3.0 solutions above but they still seem a little weak. I am looking for something a little more reliable...
Torey Maerz
Friday, September 07, 2007 1:29:50 AM (Pacific Daylight Time, UTC-07:00)
It really works! With a small bugfix:
if (beginPos != -1 && endPos != -1)
    xmlPart = contents.Substring(beginPos, (endPos - beginPos) + endCapture.Length);

...I could just import it into my project. Omar, you are great!

Blasius
Tuesday, September 18, 2007 3:59:48 AM (Pacific Daylight Time, UTC-07:00)
It's a great post and has helped me a lot. I was wondering how to modify or append to the XMP metadata in JPEGs. Any ideas?
AB
Tuesday, November 20, 2007 10:10:18 AM (Pacific Standard Time, UTC-08:00)
Thanks for the post... I have even found the 3.0 stuff to be hit or miss on some of the XMP stuff. I like your solution because it is simple, but the one change I made to the function that loads the XML from the image is to process the file as it is read. The metadata is in the header, so it is not necessary to read the entire file, and with large images this will save a lot of processing time.

Here is the updated code that only reads until it finds the XMP block in the file. (I tried to get the formatting right in the comments, but it didn't work out)


public static string GetXmpXmlDocFromImage(string filename)
{
    char contents;
    string beginCapture = "<rdf:RDF";
    string endCapture = "</rdf:RDF>";
    string collection = string.Empty;
    bool collecting = false;
    bool matching = false;
    int collectionCount = 0;

    using (System.IO.StreamReader sr = new System.IO.StreamReader(filename))
    {
        while (!sr.EndOfStream)
        {
            contents = (char)sr.Read();

            if (!matching && !collecting && contents == '<')
            {
                matching = true;
            }

            if (matching)
            {
                collection += contents;

                if (collection.Contains(beginCapture))
                {
                    //found the begin element we can stop matching and start collecting
                    matching = false;
                    collecting = true;
                }
                else if (contents == beginCapture[collectionCount++])
                {
                    //we are still looking, but on track to start collecting
                    continue;
                }
                else
                {
                    //false start reset everything
                    collection = string.Empty;
                    matching = false;
                    collecting = false;
                    collectionCount = 0;
                }
            }
            else if (collecting)
            {
                collection += contents;

                if (collection.Contains(endCapture))
                {
                    //we are finished found the end of the XMP data
                    break;
                }
            }
        }
    }

    Debug.WriteLine("Collection: " + collection);

    return collection;
}
Brian
Saturday, December 08, 2007 8:43:54 AM (Pacific Standard Time, UTC-08:00)
Hello,

I found this article while I was looking for a way to read XMP information from JPEG/TIFF files. I have already managed to read EXIF data (though my implementation is not yet as "beautiful" as I'd like it to be), but images processed in Adobe Lightroom don't contain EXIF but mostly XMP data only. BTW, you can read the XMP data just like EXIF, with the tag value of 0x2BC. Read the byte[], convert it to UTF8 and load it into an XmlDocument, and there you go. I could then use your XML code above, thanks for that one! (I failed at all those namespaces at first...) Only the EXIF values inside the XMP section are still (mostly) encoded the same way as in regular EXIF sections. I'll need to apply my EXIF value parser to them, too. (Resolve RATIONALs like 1/60, etc.)

I was going to make a C# class that can read photo information like saved from the camera, but from EXIF and XMP. When it's done, you can likely find it on my website.

More information on reading EXIF in C#:
http://www.codeproject.com/KB/graphics/exifextractor.aspx
The EXIF specification:
http://www.exif.org/Exif2-2.PDF
Thursday, February 28, 2008 4:52:36 PM (Pacific Standard Time, UTC-08:00)
Great post. I am in particular interested in IPTC data. At the top of this thread, it makes reference to EXIF, IPTC, and XMP, but the actual samples here only work with exif and xmp. Does anyone know how to both read and write IPTC data back to a jpeg using c#? (in a lossless manner)

cheers,
j

Jon
Thursday, February 28, 2008 10:33:43 PM (Pacific Standard Time, UTC-08:00)
Hi,
I need a .NET application to read metadata (EXIF and IPTC) from any image file, whether a native format like TIFF, JPEG, BMP or GIF or a RAW image file. It would be great if someone could provide me with the code for this. Thanks in advance.
Sathish kumar
Thursday, March 06, 2008 12:24:26 PM (Pacific Standard Time, UTC-08:00)
IPTC is not something I've ever figured out. If you need that use WIC (windows imaging component) that is part of .net 3.0.
Sunday, March 09, 2008 9:21:52 PM (Pacific Daylight Time, UTC-07:00)
I'm very new to this, but isn't the IPTC data just a subset of the XMP data? (www.iptc.org)

Also a couple of comments above, ask about how to write back to the JPEG file with changed values. Any ideas on this?

I am assuming this will involve removing the XMP XML from the original file, re-inserting it and saving it...somehow?
Stu
Tuesday, March 18, 2008 11:05:18 AM (Pacific Daylight Time, UTC-07:00)
Thanks for the example, this helped me out a lot!
Sunday, May 25, 2008 6:47:24 PM (Pacific Daylight Time, UTC-07:00)
Brilliant, thanks Omar. Also, thanks to Brian for the optimisation, that really helps me.
Saturday, July 12, 2008 9:58:07 PM (Pacific Daylight Time, UTC-07:00)
I am also having the same query as Stu...

"How to write back to the JPEG file with new/changed values?"

Can you post some code for novice like me?

Thanks in adv.
Nhilesh Baua
Sunday, July 13, 2008 5:43:31 PM (Pacific Daylight Time, UTC-07:00)
To write back you'll want to use the Windows Imaging Component in .NET 3.0.
Monday, August 04, 2008 2:29:55 PM (Pacific Daylight Time, UTC-07:00)
The WIC looks super scary. There's no simple documentation for C# developers. Does anyone have a simple way, like how we have here for reading, to write XMP back to JPEGs?

I was using the C# XMP Toolkit, but after getting a working version in development, upon deployment it was obvious it was not 64-bit compatible, so I'm back to square one for writing. Argh!
Wednesday, August 13, 2008 4:20:03 AM (Pacific Daylight Time, UTC-07:00)
hi guys,

Reading XMP metadata from a JPEG using C# is fine, but I want to write XMP metadata to a JPEG using C#.

Can anyone help me, please?

eranna
Wednesday, August 13, 2008 9:17:22 AM (Pacific Daylight Time, UTC-07:00)
thank you so much...

I created an event handler for Microsoft SharePoint. The customer uploads pictures to SharePoint but wants to see the XMP metadata in a SharePoint field.

I was able to read the XMP from a TIF (PropertyID 700), but JPG was completely different. I tried a few other ways to fulfill this requirement.

With your way of reading XMP I am now able to read the metadata of JPG, TIF and EPS (PDF comes later; so far I haven't run any tests with PDF).

Best Regards
Adrian