Issue with Windows Desktop Search 4.0 and PPT (PowerPoint 2003) documents produced by Aspose

We have an issue with Windows Desktop Search 4.0 and PPT (PowerPoint 2003) documents produced by Aspose.

If we create a PPT file with the Aspose library, Windows Desktop Search 4.0 (http://www.microsoft.com/windows/products/winfamily/desktopsearch/default.mspx) cannot find it.

If we save the same document through the native PowerPoint interface, the search engine can find it.

The issue is urgent for us. Please advise how it can be fixed.

Dear Dmitry,

Thanks for reporting.

Our development team will look into this issue.

Hello Dmitry,

Do you mean search by keywords or full-text search? In the first case, I think it's necessary to set document properties such as Title, Subject, Author, Keywords, etc.

We use full-text search on the content of the documents (slides).

Desktop Search probably has the same problem as SQL Server full-text search. I described the main idea in this thread, but I'm not sure it can be fixed by changing the search engine settings.

Hello.

We still have the problem with full-text search in the Windows Desktop Search engine. Full-text search does not work for presentations created by the Aspose libraries.

We analyzed the link you gave, but unfortunately it does not help us.

The situation has now become even worse: we found that the IIS 6.0 search engine cannot work properly with slides created by the Aspose libraries either. Full-text search cannot be applied to these files. This functionality is critical for our application. Can you please tell us whether this is a defect of the Aspose library, or whether there are steps we can take to make it work?

Thank you.

Hello,

I think this is a problem of the search engine, not of Aspose.Slides. The PPT specification allows Unicode text (and it must be used for any language except English), so a properly working search engine should process it. I saw something in MSDN about custom IFilters and "language specific search" options. By the way, the standard Vista search works fine with all PPT files and any language.

Below is the error message from the IIS 6.0 search engine crawler:

Crawled (The filtering process could not load the item. This is possibly caused by an unrecognized item format or item corruption. )

As you can see, whether or not Unicode is used is not the issue. The crawler reports that the file format is unrecognized. A file created with the Aspose library is attached. Please advise.

Colleagues from Aspose team,

Please let me know whether (and when) we can expect a response from you. Thank you.

Hello,

The attached presentation contains several blocks that Aspose.Slides cannot create and that shouldn't appear in a normal PPT file, so I think the presentation was processed by another application. In any case, PowerPoint reads it properly, and the standard search engine in Vista also works fine with this file. Sorry, but we don't have any other ideas. In my opinion, this is simply a problem of the IIS search engine that can't be fixed on our side.

Colleagues,

Could you please point out exactly which blocks of the file were not created by Aspose.Slides and do not comply with the normal PPT format? This information may help us analyze the issue.

Thank you.

Hello,

Please check the attached dump of the pres.ppt file. It contains an MsoDataStore container with Office XML schemas. A standard PPT file created by MS PowerPoint or by Aspose.Slides shouldn't have it.
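To check which files carry the MsoDataStore container without opening them in PowerPoint, a quick scan is possible. The sketch below is a minimal heuristic, assuming only that a binary .ppt is an OLE2 compound file whose storage and stream names (such as "PowerPoint Document" or "MsoDataStore") are stored as UTF-16LE in its directory entries. The function names are hypothetical, and it does not parse the directory structure, so the same text inside slide content could give a false positive; a full compound-file parser would give an exact listing.

```python
import sys

# Every OLE2 / Compound File Binary (CFB) file, including binary .ppt, starts with this 8-byte signature.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_compound_file(data: bytes) -> bool:
    """Return True if the bytes start with the CFB signature."""
    return data[:8] == CFB_SIGNATURE


def has_storage_name(data: bytes, name: str) -> bool:
    """Heuristic (assumption): CFB directory entry names are UTF-16LE, so look
    for the encoded name anywhere in the file. This is a raw byte scan, not a
    directory parse, so matching slide text can cause a false positive."""
    return name.encode("utf-16-le") in data


def check_ppt(path: str) -> dict:
    """Summarize, for one file, the containers discussed in this thread."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "is_compound_file": is_compound_file(data),
        "has_MsoDataStore": has_storage_name(data, "MsoDataStore"),
        "has_PowerPoint_Document": has_storage_name(data, "PowerPoint Document"),
    }


if __name__ == "__main__":
    # Hypothetical usage: python check_ppt.py saved_by_aspose.ppt saved_by_powerpoint.ppt
    for p in sys.argv[1:]:
        print(p, check_ppt(p))
```

Running it on a file that fails indexing and on the same file re-saved by PowerPoint shows whether the MsoDataStore container is what differs between them.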

Thank you for the prompt response.

We analyzed the attached dump, and we can say that those blocks are custom tags created through the Aspose.Slides API (public sealed class Tags).

These tags contain custom strings in XML format. They are created through the Aspose.Slides API, e.g. slide.Tags.Add(some_key, some_text);

Please let us know:

  1. Whether it is normal to use this API.

  2. Whether you have tested, on your side, the standard MS IIS 6.0 search engine or the standard MS WDS 4.0 search engine with slides created via Aspose (the standard Vista search engine is not of interest to us now).

  3. If you have run these tests, which we are extremely interested in, please let us know your results.

Thank you.

It is absolutely normal to use the Tags API, but all tags are created inside the "PowerPoint Document" container. Aspose.Slides can't create or parse "MsoDataStore". I don't know what it is, and this is the first presentation with such a container I have seen in my practice.

Colleagues,

You mentioned that full text search in Vista works properly with presentations created by Aspose libraries ("standard Vista's search works fine with all ppt files and any language").

Could you please attach an example of a presentation created by the Aspose libraries, so that we can see on our side how it works with Vista's full-text search, WDS 4.0, and IIS 6.0? Also, please let us know the exact version of the library you use on your side.

Thank you.

Colleagues,

Please answer our previous question. Also, please let us know whether the MS SharePoint search engine works with PPT files created via the Aspose library.

Again, the issue is very urgent for us and we rely on your cooperation. Please find attached an email from Shelia Holt recommending that we set up an urgent call or online chat. Please let us know when we can have this conversation.

Thank you.

Hello Dmitry,

First, about Vista's search engine. I tried the presentation attached in this thread and several other presentations that were sent by our customers or created locally with Aspose.Slides. I currently use the latest version of Aspose.Slides, 3.1.0, but some of the presentations were created with very old versions a couple of years ago. Vista's search has no problems with any of these files.

Unfortunately, I can't give you information about the SharePoint search engine because we have never tested it.

Hello,

Could we ask you to test it under SharePoint and let us know the results? It is very important for us. We tested it on our side, but it does not work. Your prompt response is highly appreciated.

Thank you.

Hello Dmitry,

We don't have SharePoint installed for testing right now, but we will test it in the near future.

Windows Desktop Search, I believe, uses the same IFilters as SQL Server's indexing. This has apparently been an issue for years, and we still have it. Apparently Aspose is putting something into the files, or removing something from them, that makes the IFilter either fail to recognize the file format or run into some internal error. One of my team members noted that the files that fail indexing on SQL Server also fail in Windows Search.

If I take a PPT created in PowerPoint, and save to the database, it can be indexed. When I save from Aspose, it cannot be indexed. If I take a file last saved by Aspose, and open it with PowerPoint, and then save it back out to the database, it can be indexed again. Are you able to interpret the difference between files to see what it is that you're putting into them or taking out of them? Even if you're putting something (like tags) into these files for your own purposes, could you perhaps provide a save option that doesn't do that?

This is a very big issue for us, and Aspose has been aware of it for years.