Documents converted from Word to PDF are not properly extractable by an ifilter

mga · February 5, 2016, 3:29am

We use an IFilter to extract text from PDF documents so that our search engine can index document content.

The IFilter in question is the Adobe PDF iFilter (ftp://ftp.adobe.com/pub/adobe/acrobat/win/9.x/PDFiFilter64installer.zip).

When we extract text from a Word document that was converted to PDF by another provider, the words are separated by spaces as expected: “This is a text line this is a new text line” (correct). When the document was converted with Aspose.Words, no space is inserted at the line break, so the last word of one line is joined to the first word of the next: “This is a text linethis is a new text line” (incorrect).

Is there a setting we can change to fix this, or is it a bug?

mga · February 7, 2016, 11:08am

Just wanted to add the simple code we actually use to convert from Word to PDF:

// Load the Word document to convert.
var doc = new Aspose.Words.Document(fileToConvert);
// Save as PDF/A-1b with the document structure (tagging) exported.
var options = new Aspose.Words.Saving.PdfSaveOptions
{
    SaveFormat = Aspose.Words.SaveFormat.Pdf,
    Compliance = Aspose.Words.Saving.PdfCompliance.PdfA1b,
    ExportDocumentStructure = true
};
doc.Save(destination, options);

awais.hafeez · February 9, 2016, 12:28am

Hi Marius,

Thanks for your inquiry. We have just released Aspose.Words for .NET 16.1.0; please upgrade to this latest version and see how it goes on your end. I hope this helps.

If the problem persists, please attach your input Word document and the output PDF generated by Aspose.Words here for testing. We will investigate the issue on our end and provide you with more information.

Best regards,