shahine.com/omar/


yet another Microsoft blogger

# Wednesday, January 17, 2007

Stuff you didn't want to know about IFilters

Why am I writing this post? I wish I didn't have to. I spent a few days trying to figure out why PDFs were not being indexed on Vista, and why none of my TIFF files that were created with Microsoft Office Document Imaging (and have embedded OCR text) were showing up in search results. This was happening on Vista and on XP with Office 2007 and Windows Desktop Search.

After a bit of digging around and a couple of emails, I got the answer.

Vista and Windows Desktop Search 3 (which share the same technology) do not support IFilters that only implement IPersistFile. In order for the contents of files to be indexed the IFilter must support IPersistStream.
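To make the distinction concrete, here is a toy model in Python. It is not COM and not the real indexer: the class names and the methods `load_from_path`, `load_from_stream` and `get_text` are all made up for illustration, standing in roughly for IPersistFile::Load, IPersistStream::Load and the IFilter text extraction. It only sketches the idea that a host which hands filters a stream, and never a file path, cannot use a filter that can only load itself from a path.

```python
import io


class PathOnlyFilter:
    """Hypothetical filter that can only load from a file path (IPersistFile-like)."""

    def load_from_path(self, path):
        with open(path, "rb") as handle:
            self._text = handle.read().decode("utf-8", errors="replace")

    def get_text(self):
        return self._text


class StreamFilter:
    """Hypothetical filter that can load from an open stream (IPersistStream-like)."""

    def load_from_stream(self, stream):
        self._text = stream.read().decode("utf-8", errors="replace")

    def get_text(self):
        return self._text


class StreamOnlyHost:
    """Toy stand-in for an indexer that only ever passes filters a stream."""

    def extract_text(self, filter_obj, data):
        load = getattr(filter_obj, "load_from_stream", None)
        if load is None:
            # The host never falls back to a path, so this filter is unusable.
            raise TypeError("filter cannot load from a stream; contents not indexed")
        load(io.BytesIO(data))
        return filter_obj.get_text()


host = StreamOnlyHost()
indexed = host.extract_text(StreamFilter(), b"receipt for TV")

try:
    host.extract_text(PathOnlyFilter(), b"receipt for TV")
    path_only_indexed = True
except TypeError:
    path_only_indexed = False
```

In this sketch the stream-capable filter gets its text indexed and the path-only one is rejected, which mirrors the symptom: the old filter is installed, but nothing it handles shows up in search.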

If you want an IFilter for PDF files then you should download Adobe Acrobat Reader 8 for Vista (only the Vista version has the IFilter). The previous IFilter does not work. While you are at it, read this post for instructions on how to make your own very well behaved Adobe Reader Installer.

If you want an IFilter for your TIFF or MDI files then you are out of luck for now.

I would like to add that Microsoft Office Document Imaging is one of the best values in the Office suite. It's completely ignored, and it is no longer installed by default with Office 2007. If you do install it, you can use it with any TWAIN-compatible scanner to scan all your legacy paper. The text in those documents is OCR'ed (the text is recognized), and you can save the result as a TIFF file, which can be viewed on almost any computer. That text can also be indexed and searched on your computer. This is a handy way to find, say, that receipt for your TV from two years ago. You can also create these files from any application that can print. I've been using this for years to save any web receipts that I need.

 

Thursday, January 18, 2007 2:38:16 AM (Pacific Standard Time, UTC-08:00)
In the interim, while you wait for a TIFF IFilter, you could drag your existing scans (TIFFs etc.) into OneNote 2007 and have its IFilter expose the OCR'ed text to the new indexing engine.

http://blogs.msdn.com/chris_pratley/archive/2005/09/14/unifying-the-analog-and-the-digital-with-onenote.aspx

Cheers
Thursday, January 18, 2007 4:18:36 AM (Pacific Standard Time, UTC-08:00)
I use the Foxit IFilter and Vista search indexes the contents of my PDFs as expected:
http://gallery.live.com/liveItemDetail.aspx?li=d0365ff5-6ec1-4ddc-99b9-3991dd43062a&l=3

Why install Acrobat Reader when you don't have to?
Saj
Friday, January 19, 2007 6:33:00 AM (Pacific Standard Time, UTC-08:00)
It's good to see someone else thumbing-up MDI. As a format, I think it rocks - it doesn't make any sense at all that it's not installed by default...!
Friday, January 19, 2007 9:35:09 AM (Pacific Standard Time, UTC-08:00)
Don't forget that TIFFs can be HUGE, so if you've got lots of legacy paper, then you might want to invest in some good storage....
Sean
Monday, March 05, 2007 12:46:41 PM (Pacific Standard Time, UTC-08:00)
I really miss the indexing functionality for TIFF and MDI files. It is not only desktop search that no longer works: SharePoint 2007 does not have this functionality anymore either. It is a real step backwards, and I don't understand why the decision was made to remove it.
Rolf Hansmann
Wednesday, March 28, 2007 6:24:44 AM (Pacific Daylight Time, UTC-07:00)
I am happy to have found this discussion and a group of TIF-MDI fans. I love the format, and not having the TIF IFilter is a show stopper for our business. Let's hope MS puts Document Imaging back on the agenda.
PS Sean - TIF files from scanned documents are no larger than PDFs and in many cases much smaller. MDI files are very small.
Mike
Mike in cairns
Monday, July 02, 2007 9:16:14 PM (Pacific Daylight Time, UTC-07:00)
Wow, indexing tif is no longer supported in Vista? This is bad news for me. What is the alternative?
stmer
Monday, July 02, 2007 9:25:14 PM (Pacific Daylight Time, UTC-07:00)
Couldn't one register a plain text filter with .tif? Assuming the OCR has already been run, the plain text will live somewhere in the file and the plain text filter should pick it up.

Now, if I could just figure out how to change the filter for .tif. Must be in the registry here somewhere...
stmer
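For anyone hunting for the same registry location, here is a minimal Python sketch of where the filter association for an extension is usually described as living. It only builds the key paths as strings and does not read or write the registry. The function name `filter_lookup_keys` is made up for this example. The layout (the extension's PersistentHandler value pointing to a handler GUID, whose PersistentAddinsRegistered entry for the IFilter interface ID names the filter) and the plain text handler GUID used in the example are stated from memory of the Windows filter registration documentation, so treat them as assumptions and verify them on your own machine. Whether the plain text filter would actually surface the OCR text stored inside a TIFF is also untested here.

```python
# Interface ID of IFilter (assumed from Windows filter registration docs).
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"

# Persistent handler GUID commonly associated with the plain text filter
# (an assumption for this example; check it before relying on it).
PLAIN_TEXT_HANDLER = "{5E941D80-BF96-11CD-B579-08002B30BFEB}"


def filter_lookup_keys(extension, handler_guid):
    """Return the two registry key paths involved in finding a filter.

    The first key's default value would hold the persistent handler GUID
    for the extension; the second key's default value would hold the
    CLSID of the filter registered for that handler.
    """
    ext = extension if extension.startswith(".") else "." + extension
    root = "HKEY_CLASSES_ROOT"
    return [
        "\\".join([root, ext, "PersistentHandler"]),
        "\\".join([root, "CLSID", handler_guid,
                   "PersistentAddinsRegistered", IID_IFILTER]),
    ]


keys = filter_lookup_keys("tif", PLAIN_TEXT_HANDLER)
for key in keys:
    print(key)
```

Pointing .tif at a different handler would mean changing the default value of the first key, ideally after exporting it as a backup.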
Tuesday, July 03, 2007 8:29:43 AM (Pacific Daylight Time, UTC-07:00)
My hope is that the TIFF filter gets updated by the Office folks, maybe in SP1.
Tuesday, July 03, 2007 8:08:13 PM (Pacific Daylight Time, UTC-07:00)
In the meantime, one could set up a Virtual PC with XP, set up file sharing, and let XP index the documents. All searches would need to be performed in the Virtual PC image.
stmer
Wednesday, July 04, 2007 7:05:17 PM (Pacific Daylight Time, UTC-07:00)
I set up a Virtual PC as described above, and it works fine. It's a pretty heavyweight workaround. Just be aware that you need Windows Desktop Search 2.x, not 3.x, because 3.x suffers from the same problem as Vista. The MODI IFilter components don't support IPersistStream, which both Vista and WDS 3.x require but WDS 2.x doesn't. I hope the Office team supplies an update for this, but I don't hold out much hope.
stmer