SQL Server 2005 Full-Text Index search

SQL Server 2005 Full-Text Index search seems to fail to index ppt files saved by Aspose.Slides. The exact same setup has no problem indexing presentations that have not been saved by Aspose. The only difference is that I open a presentation with Aspose and save it to the database.

It's critical that we can index the uploaded presentations in our database. Do you have a solution for this?

Kind regards
Dennis

Dear Dennis,

This is really a strange problem. Indexing is a completely separate process and should not have any relation to Aspose.Slides or the ppt files it produces. I have asked the technical team to look into this issue and provide you with a solution, if there is one.

I would suppose that the IFilter in SQL Server 2005 encounters something that’s different for a ppt saved by Aspose.Slides.

My temporary solution to this issue was to iterate through all Text shapes in the slides, save them all to a varchar(max) field, and then index that field and not the varbinary(max) field containing the ppt file.
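The flattening step of this workaround might look roughly like the sketch below. It is only an illustration: the nested list `deck` and the function `collect_slide_text` are hypothetical stand-ins for text that has already been read from the text shapes through the Aspose.Slides API (that part is not shown here), and writing the result to the varchar(max) column is left out.

```python
from typing import Iterable


def collect_slide_text(slides: Iterable[Iterable[str]]) -> str:
    """Flatten per-slide shape text into one string for a text column.

    `slides` is a hypothetical stand-in: each inner iterable holds the text
    already read from the text shapes of one slide.
    """
    parts = []
    for shape_texts in slides:
        for text in shape_texts:
            # Collapse line breaks and runs of spaces inside a shape.
            cleaned = " ".join(text.split())
            if cleaned:  # skip shapes that hold no text
                parts.append(cleaned)
    return "\n".join(parts)


# Two slides; the first shape on the second slide is empty.
deck = [["Quarterly  results", "Revenue\nup 12%"], ["", "Next steps"]]
searchable_text = collect_slide_text(deck)
```

The resulting string is what would be stored in the varchar(max) field and covered by the full-text index, in place of the varbinary(max) field.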

I would be very interested in any progress related to this issue.

Are there any updates on this issue?

Kind regards
Dennis Milandt

Hello Dennis,

That is not a problem with Aspose.Slides and we can’t fix it.
To solve it, SQL Server should index both Unicode and ANSI text from a ppt file instead of ANSI only.

Could you please give some more details on how to configure this?

As far as I am aware, Full-text index on varbinary(max) columns doesn’t care whether the text is Unicode or not.

How do I configure Full-text index to index Unicode files in varbinary columns then?

Alexey, your post didn’t make much sense. Could you please go into more detail?

This question should probably be addressed to a SQL Server administrator.

Full-text index search works fine for other ppt files as long as they are not created by Aspose.Slides.

I only gave my opinion about this issue and why it can happen. I’m not a SQL Server admin and can’t give you any exact information on how to set up full-text index search.

But still you conclude that Full-text index is configured wrong, and the generated ppt is perfect.

I was really looking for some constructive feedback on this issue.

- What is the difference between an ordinary ppt and a ppt generated by Aspose.Slides?
- Why is SQL Server 2005 Full-Text index able to index one but not the other?
- Are you able to reproduce this behavior?

There is only one difference.

MS PowerPoint stores text as:
- ANSI - pure English text
- Unicode - all other languages. This also includes all European languages that use the Latin alphabet with accented characters such as á, é, ó etc. If you use any special characters inside English text, then the whole text will also be stored as Unicode.

Aspose.Slides stores everything as Unicode.

Unicode = UTF-16LE
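To make the difference concrete, here is a minimal sketch of the storage rule as it is described in this post. It only illustrates the byte layouts. It does not reproduce the ppt file format or how any IFilter actually reads it, the function names are hypothetical, and "ANSI" is simplified to plain ASCII (in practice it would be a Windows code page).

```python
from typing import Tuple


def encode_like_powerpoint(text: str) -> Tuple[str, bytes]:
    """Follow the rule described above: ANSI for pure English text, else UTF-16LE."""
    if text.isascii():
        return "ansi", text.encode("ascii")
    return "utf-16le", text.encode("utf-16-le")


def encode_always_unicode(text: str) -> Tuple[str, bytes]:
    """Model a writer that stores every text run as UTF-16LE."""
    return "utf-16le", text.encode("utf-16-le")


def naive_ansi_search(haystack: bytes, word: str) -> bool:
    """A deliberately naive scan that only looks for single-byte text."""
    return word.encode("ascii") in haystack


kind_a, ansi_bytes = encode_like_powerpoint("Budget")
kind_b, wide_bytes = encode_always_unicode("Budget")

print(kind_a, ansi_bytes)   # ansi b'Budget'
print(kind_b, wide_bytes)   # utf-16le b'B\x00u\x00d\x00g\x00e\x00t\x00'
print(naive_ansi_search(ansi_bytes, "Budget"))  # True
print(naive_ansi_search(wide_bytes, "Budget"))  # False
```

In UTF-16LE every ASCII character is followed by a zero byte, so a reader that expects single-byte text would not find the word. Whether the SQL Server filter behaves this way is exactly the open question in this thread.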

We are running into this same issue. Slides created with Aspose do not get indexed by SQL Server in a default SQL Server installation. We do not know the proper procedure for correcting this issue, as this forum thread (which appears to be the only one on this issue) only gives vague suggestions. If someone is aware of the proper procedure to correct this, please post that information. This will prove to be a big inconvenience for customers, and may prove to be a deterrent for future customers, as our product would no longer support an "out of the box" SQL Server installation.

Natively created Office documents can be indexed in SQL Server, regardless of language, if the proper IFilter is installed for that language. I believe this relies on some kind of setting in the file that indicates the language. I have seen a few symptoms that may indicate that Aspose does not correctly set the language in files it creates (not limited to .Slides). Are you familiar with how this works in native Office documents? Does Aspose handle document content languages in a way that should support proper indexing in SQL Server?

Hi Jason,

Thanks for your interest in Aspose.Slides.

I have asked our development team about the issue you raised. As soon as I receive some information from them, I will share it with you. I really appreciate your patience.

We are sorry for the inconvenience,

Hi Jason,

I have discussed the issue with our development team. According to them, the issue is not related to Aspose.Slides; to resolve the problem, a Unicode (UTF-16 LE) text filter should be applied on the SQL Server end, as mentioned in the previous post.

We are extremely sorry for the inconvenience,

We are evaluating your suggestion. Any assistance you can provide on specifics would be appreciated.

I ran your suggestion by a team member familiar with IFilters. His response was:

The “tokenization” of the terms to be indexed from a Microsoft Office 2003 document (.doc, .ppt, etc.) is performed by the IFilter (DLL) that is assigned to the file “type” (extension). For Microsoft Office documents these IFilters are provided, in a variety of ways, by Microsoft.

That said, Microsoft has released numerous versions of the “office” IFilter DLL on different versions of the Windows OS and/or SQL Server release, so there “may” be a version of the IFilter that will work. Case in point: we found an issue with the indexing of embedded documents (e.g., Excel within Word) in Office 2003 and 2007 only on 64-bit platforms. It was fixed by a later revision of the filter subsystem plus some Registry modifications to SQL Server to point to the newer DLLs.

Now if Aspose is stating that a different IFilter DLL is to be installed (?)/used, then we need the specifics. I.e., name of the DLL, how acquired, where should/does it reside on the file system (i.e., where is it installed to), any Registry changes to have SQL Server reference it, etc., etc.

So, we are looking for your assistance on instructions for how to resolve this issue.

We are still unsure what exactly your suggestion means, so a team member ran a fresh test with updated IFilters. Here is his reply:

Still fails on our SQL Server 2008 R2 x64 instance (see log below).

I’m using the Microsoft Office 2010 Filter Pack (AKA Filter Pack 2.0).

document_type | class_id | path | version | manufacturer
.ppt | 64F1276A-7A68-4190-882C-5F14B7852019 | C:\Program Files\Common Files\Microsoft Shared\Filters\OFFFILT.DLL | 2010.1400.4746.1000 | Microsoft Corporation

componenttype | componentname | clsid | fullpath | version | manufacturer
filter | .ppt | 64F1276A-7A68-4190-882C-5F14B7852019 | C:\Program Files\Common Files\Microsoft Shared\Filters\OFFFILT.DLL | 2010.1400.4746.1000 | Microsoft Corporation

2011-03-04 12:17:53.59 spid32s Error ‘0x8004170c: The document format is not recognized by the filter.’ occurred during full-text index population for table or indexed view ‘[Sandbox].[dbo].[CONTENT]’ (table or indexed view ID ‘565577053’, database ID ‘8’), full-text key value ‘610EFF57-93B4-40BF-BF69-1AAB161CA072’. Failed to index the row.
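When many rows fail, it can help to pull the error code and the full-text key value out of log lines like the one above, so the affected documents can be listed. The following is a rough sketch that assumes the message layout matches the single line quoted here; the pattern and the function name `parse_fulltext_failure` are illustrative and not part of any SQL Server tooling.

```python
import re
from typing import Dict, Optional

# Accept both straight and curly quotes, since the quoted log uses curly ones.
FAILURE_PATTERN = re.compile(
    r"Error [\u2018'](?P<code>0x[0-9a-fA-F]+): (?P<message>.*?)[\u2019']"
    r".*?full-text key value [\u2018'](?P<key>[^\u2019']+)[\u2019']"
)


def parse_fulltext_failure(line: str) -> Optional[Dict[str, str]]:
    """Return code, message and key for a failed row, or None if the line does not match."""
    match = FAILURE_PATTERN.search(line)
    return match.groupdict() if match else None


# The log line quoted above, copied as-is.
sample = (
    "2011-03-04 12:17:53.59 spid32s Error ‘0x8004170c: The document format is "
    "not recognized by the filter.’ occurred during full-text index population "
    "for table or indexed view ‘[Sandbox].[dbo].[CONTENT]’ (table or indexed "
    "view ID ‘565577053’, database ID ‘8’), full-text key value "
    "‘610EFF57-93B4-40BF-BF69-1AAB161CA072’. Failed to index the row."
)

failure = parse_fulltext_failure(sample)
print(failure["code"], failure["key"])
```

The extracted key can then be matched against the full-text key column of the table to find which stored presentation failed.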

Hello Dear,

I have asked our development team to share their feedback in response to your request. Based on my initial observation, the issue does not seem to be related to Aspose.Slides. I would really appreciate your patience until our development team shares its response.

Thanks and Regards,