Convert Word to PDF with Image

Images are distorted when I convert to PDF using the new rendering engine in version 6. I’m using v6.0.1. Attached is an example (Image Test.doc).

I would just revert to using Aspose.PDF, but we ran into an issue there as well: a line was duplicated in the generated PDF. Look at the third paragraph from the bottom in Duplicate.doc for an example. The end of the paragraph, "18:1672-1681 (2007)", is repeated.

Any suggestions on how to solve one or both of these issues would be greatly appreciated.

Hi
Thanks for your request.

  1. I managed to reproduce the problem and created a new issue, #7238, in our defect database. I will notify you as soon as it is fixed.
  2. I cannot reproduce the problem with duplication. Could you please attach the output PDF?

Best regards.

Sorry, the attached Word document did not reproduce the problem. I’ll give you the full details this time. What I’m doing is inserting the attached HTML (html.txt) using Aspose’s InsertHtml function. I then convert the resulting Word document to PDF using the function below. This should reproduce the issue. If I modify the HTML, say by removing one of the paragraphs that appear before the third paragraph from the bottom, the problem goes away. I’m unable to identify the cause, and so I cannot assure my customers that this is not occurring in other documents as well.

Thanks for your help.

Imports System.IO
Imports Aspose.Words

Public Shared Function ConvertAsposeDocToPDF(ByVal doc As Aspose.Words.Document) As MemoryStream
    Try
        'Save the document in Aspose.Pdf.Xml format into a memory stream.
        Dim xmlStream As MemoryStream = New MemoryStream
        doc.Save(xmlStream, SaveFormat.AsposePdf)
        'Seek to the beginning so the XML can be read back from the start.
        xmlStream.Seek(0, SeekOrigin.Begin)

        'Load the XML document into Aspose.Pdf.
        Dim pdf As Aspose.Pdf.Pdf = New Aspose.Pdf.Pdf
        'Make sure the images that were saved by Aspose.Words into the Windows temporary
        'folder are automatically deleted by Aspose.Pdf when they are no longer needed.
        pdf.IsImagesInXmlDeleteNeeded = True

        pdf.BindXML(xmlStream, Nothing)

        '*** Aspose.Pdf font cache, see comments below.
        pdf.IsTruetypeFontMapCached = False

        'If you convert multiple files to PDF in your application,
        'uncomment the following lines to improve the speed of conversion.
        'pdf.IsTruetypeFontMapCached = True
        'pdf.TruetypeFontMapPath = <some path where you have read/write access>

        Dim msOut As New MemoryStream
        pdf.Save(msOut)

        Return msOut

    Catch ex As Exception
        'Keep the original exception as the inner exception so the stack trace is not lost.
        Throw New Exception("[ConvertAsposeDocToPDF] " & ex.Message, ex)
    End Try
End Function

You can ignore the HTML file in my previous post, as it contains the modified HTML that I was able to get working. Attached is an example that I tested to make sure it reproduces the problem this time. The text of the HTML file is inserted into the Word document, which is then converted to PDF. These are the actual files I used to build the example.

Hi
Thank you for the additional information. I still cannot reproduce the problem on my side.

  1. I tried to convert “pub.doc” to PDF using the SaveToPdf method. The output PDF looks fine, with no duplication.
  2. I tried using the following code:
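using System.IO;
using Aspose.Words;

// Read the HTML and insert it into a new, empty document.
string html = File.ReadAllText(@"Test100\html.txt");
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.InsertHtml(html);
// Save both as DOC and directly as PDF to compare the results.
doc.Save(@"Test100\out.doc");
doc.SaveToPdf(@"Test100\out.pdf");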

Still no duplications.
Best regards.

The SaveToPdf function works. The duplication happens only when I convert through Aspose.PDF using the function from my previous reply. I switched back to Aspose.PDF because of the image problems.

Hi
Thank you for the additional information. This seems to be an issue with Aspose.Pdf. Aspose.Words generates correct intermediate XML. As you can see, there is no duplication in the XML produced by Aspose.Words:

<Text MarginTop="14" IsSpaced="true" LineSpacing="1.35">
<Segment FontName="Verdana" FontSize="9">"Scaling of the Resolving Power and Sensitivity for Planar FAIMS and Mobility-Based Discrimination in Flow- and Field-Driven Analyzers," A.A. Shvartsburg and R.D. Smith, </Segment>
<Segment FontName="Verdana" IsTrueTypeFontItalic="true" FontSize="9">J. Amer. Soc. Mass Spectrom.</Segment>
<Segment FontName="Verdana" FontSize="9">, </Segment>
<Segment FontName="Verdana" IsUnderline="true" FontSize="9">18</Segment>
<Segment FontName="Verdana" FontSize="9">: 1672-1681 (2007). </Segment>
</Text>
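
To double-check a paragraph like this without reading the XML by eye, you could join the text of its segments and count how often the suspect phrase occurs. The following is only a minimal sketch in Python using the standard library; it is not part of any Aspose product, the helper name paragraph_text is made up for illustration, and it assumes the <Text> fragment is pasted in exactly as quoted above.

```python
import xml.etree.ElementTree as ET

# The <Text> paragraph quoted above, copied from the intermediate XML.
FRAGMENT = '''<Text MarginTop="14" IsSpaced="true" LineSpacing="1.35">
<Segment FontName="Verdana" FontSize="9">"Scaling of the Resolving Power and Sensitivity for Planar FAIMS and Mobility-Based Discrimination in Flow- and Field-Driven Analyzers," A.A. Shvartsburg and R.D. Smith, </Segment>
<Segment FontName="Verdana" IsTrueTypeFontItalic="true" FontSize="9">J. Amer. Soc. Mass Spectrom.</Segment>
<Segment FontName="Verdana" FontSize="9">, </Segment>
<Segment FontName="Verdana" IsUnderline="true" FontSize="9">18</Segment>
<Segment FontName="Verdana" FontSize="9">: 1672-1681 (2007). </Segment>
</Text>'''


def paragraph_text(text_xml: str) -> str:
    """Hypothetical helper: join the text of every <Segment> in one <Text> element."""
    root = ET.fromstring(text_xml)
    return "".join(segment.text or "" for segment in root.iter("Segment"))


if __name__ == "__main__":
    text = paragraph_text(FRAGMENT)
    # A count greater than 1 would mean the phrase is already duplicated in the XML.
    print(text.count("1672-1681 (2007)"))
```

If the count is 1 for the intermediate XML but the phrase shows up twice in the PDF, the duplication is introduced after Aspose.Words has written the XML.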

Therefore, I think you should post this question in the Aspose.Pdf forum. Our colleagues there will answer you shortly.
Best regards.

The issues you have found earlier (filed as 7238) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by alexey.noskov.

Unfortunately I’m having the same problem with the image in this file.

Hi
Thanks for your request. From what I can see, the image looks a bit better, but the problem is still there. I have reopened this issue.
Best regards.

The issues you have found earlier (filed as 7238) have been fixed in this update.