SQL Server 2005 Full-Text Index search seems to fail to index ppt files saved by Aspose.Slides. The exact same setup has no problem indexing presentations that have not been saved by Aspose. The only difference is that I open a presentation with Aspose and save it to the database.
It's critical that we can index the uploaded presentations in our database. Do you have a solution for this?
This is a really strange problem. Indexing is a completely separate process and should not have any relation to Aspose.Slides or its output ppts. I have asked the technical team to look into this issue and provide you with a solution, if there is one.
I would suppose that the IFilter in SQL Server 2005 encounters something that’s different in a ppt saved by Aspose.Slides.
My temporary solution to this issue was to iterate through all Text shapes in the slides, save them all to a varchar(max) field, and then index that field and not the varbinary(max) field containing the ppt file.
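For anyone wanting to try the same workaround, here is a minimal sketch of the flattening step only. It assumes the text of each shape has already been pulled out of the presentation; the nested list `deck` and the function name `collect_slide_text` are hypothetical and are not part of Aspose.Slides or SQL Server. The resulting string is what would go into the varchar(max) column.

```python
from typing import Iterable, List


def collect_slide_text(slides: Iterable[Iterable[str]]) -> str:
    """Flatten per-slide shape text into one string for a text column."""
    parts: List[str] = []
    for shapes in slides:
        for text in shapes:
            cleaned = text.strip()
            if cleaned:  # skip empty placeholders
                parts.append(cleaned)
    return "\n".join(parts)


# Hypothetical extracted text: one inner list per slide,
# one string per text shape on that slide.
deck = [["Sales Review", "Q1 2011"], ["", "Revenue grew 12%"]]
indexable_text = collect_slide_text(deck)
```

Joining with newlines keeps the text of separate shapes from running together, so a word breaker would not see the last word of one shape fused with the first word of the next.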
I would be very interested in any progress on this issue.
That is not a problem with Aspose.Slides, and we can’t fix it. To solve it, SQL Server should index both Unicode and ANSI text from a ppt file instead of ANSI only.
I only gave my opinion about this issue and why it can happen. I’m not a SQL Server admin and can’t give you any exact information on how to set up full-text index search.
But you still conclude that the full-text index is configured incorrectly and that the generated ppt is perfect.
I was really looking for some constructive feedback on this issue.
- What is the difference between an ordinary ppt and a ppt generated by Aspose.Slides?
- Why is SQL Server 2005 Full-Text index able to index one but not the other?
- Are you able to reproduce this behavior?
MS PowerPoint stores text as:
- ANSI: pure English text
- Unicode: all other languages. This also includes all European languages that use the Latin alphabet with accented characters such as á, é, ó, etc.
If you use any special characters inside English text, then the whole text will also be stored as Unicode.
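To make the ANSI versus Unicode distinction concrete, here is a rough sketch of why a filter that only looks for single-byte text could miss text stored as UTF-16 LE. It is based only on the description in this thread, not on the PowerPoint file format specification. The helper names `storage_form` and `encode_as` are made up for illustration, and treating "pure English text" as 7-bit ASCII is an assumption.

```python
def storage_form(text: str) -> str:
    """Return the storage form described above for a run of text."""
    # Assumption: "pure English text" is approximated as 7-bit ASCII.
    return "ANSI" if text.isascii() else "Unicode"


def encode_as(text: str, form: str) -> bytes:
    """Encode text as one byte per character (ANSI) or as UTF-16 LE."""
    if form == "ANSI":
        return text.encode("ascii")
    return text.encode("utf-16-le")


plain = "Quarterly results"
accented = "Résumé"

# The same English text, laid out both ways.
ansi_bytes = encode_as(plain, storage_form(plain))
unicode_bytes = encode_as(plain, "Unicode")

# A naive single-byte scan finds the word in the ANSI form only,
# because UTF-16 LE puts a zero byte after every ASCII character.
found_in_ansi = b"Quarterly" in ansi_bytes
found_in_unicode = b"Quarterly" in unicode_bytes
```

If a writer stores even plain English text in the Unicode form, a reader that only handles the single-byte form would see no matching words, which would be consistent with the behavior reported in this thread.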
We are running into this same issue. Slides created with Aspose do not get indexed by SQL Server in a default SQL Server installation. We do not know the proper procedure for correcting this issue, as this forum thread (which appears to be the only one on this issue) only gives vague suggestions. If someone is aware of the proper procedure to correct this, please post that information. This will prove to be a big inconvenience for customers, and may prove to be a deterrent for future customers, as our product would no longer support an "out of the box" SQL Server installation.
Natively created Office documents can be indexed in SQL Server, regardless of language, if the proper IFilter is installed for that language. I believe this relies on some kind of setting in the file that indicates the language. I have seen a few symptoms suggesting that Aspose may not correctly set the language in the files it creates (not limited to .Slides). Are you familiar with how this works in native Office documents? Does Aspose handle document content languages in a way that should support proper indexing in SQL Server?
I have asked our development team about the issue you raised. As soon as I receive some information from them, I will share it with you. I really appreciate your patience.
I have discussed the issue with our development team. According to them, the issue is not related to Aspose.Slides, and a Unicode (UTF-16 LE) text filter should be applied on the SQL Server end to resolve the problem, as mentioned in a previous post.
I ran your suggestion by a team member familiar with ifilters. His response was:
The “tokenization” of the terms to be indexed from a Microsoft Office 2003 document (.doc, .ppt, etc.) is performed by the IFilter (DLL) that is assigned to the file “type” (extension). For Microsoft Office documents these IFilters are provided, in a variety of ways, by Microsoft.
That said, Microsoft has released numerous versions of the “office” IFilter DLL on different versions of the Windows OS and/or SQL Server releases, so there “may” be a version of the IFilter that will work. Case in point: we found an issue with the indexing of embedded documents (e.g., Excel within Word) in Office 2003 and 2007 only on 64-bit platforms. It was fixed by a later revision of the filter subsystem plus some Registry modifications to SQL Server to point to the newer DLLs.
Now if Aspose is stating that a different IFilter DLL is to be installed (?)/used, then we need the specifics. I.e., name of the DLL, how acquired, where should/does it reside on the file system (i.e., where is it installed to), any Registry changes to have SQL Server reference it, etc., etc.
So, we are looking for your assistance on instructions for how to resolve this issue.
2011-03-04 12:17:53.59 spid32s Error ‘0x8004170c: The document format is not recognized by the filter.’ occurred during full-text index population for table or indexed view ‘[Sandbox].[dbo].[CONTENT]’ (table or indexed view ID ‘565577053’, database ID ‘8’), full-text key value ‘610EFF57-93B4-40BF-BF69-1AAB161CA072’. Failed to index the row.
I have asked our development team to share their feedback in response to your request. As per my initial observation, the issue does not seem to be related to Aspose.Slides. I would really appreciate your patience until our development team shares its response.