Converting PDF->DOCX->XML->PDF corrupts list bullets

I am not sure whether this is an Aspose.Pdf or an Aspose.Words issue.

One of our customers creates a document, then converts it to XML (through DOCX), and then converts it to PDF. As a result, the list bullets in the output PDF are corrupted:
image.png (2.2 KB)

As far as I could investigate, the unexpected behavior happens during the first conversion, from PDF to DOCX. For some reason, the Symbol font references get unexpected prefixes:
<w:rFonts w:ascii="IHLJPA+Symbol" w:hAnsi="IHLJPA+Symbol" w:cs="IHLJPA+Symbol"/>

But Word displays that document properly, which is why I created the topic here.

The XML file (created from the DOCX) contains the following font declaration:
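To check whether a converted DOCX contains such prefixed font names, a small scan of its XML parts may help. The following is only a sketch under assumptions: the prefix is taken to be the usual PDF subset tag (six uppercase letters followed by `+`), and `SubsetFontScanner` and its method names are hypothetical helpers, not part of any Aspose API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SubsetFontScanner
{
    // WordprocessingML (DOCX) main namespace.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumed shape of the prefix: six uppercase letters and '+', e.g. "IHLJPA+Symbol".
    private static readonly Regex SubsetPrefix = new Regex(@"^[A-Z]{6}\+");

    // Collects every prefixed font name referenced by w:rFonts in one XML part.
    public static ISet<string> FindInXml(XDocument part)
    {
        return new SortedSet<string>(
            part.Descendants(W + "rFonts")
                .SelectMany(e => e.Attributes())
                .Select(a => a.Value)
                .Where(v => SubsetPrefix.IsMatch(v)));
    }

    // Scans all XML parts under word/ (document.xml, numbering.xml, styles.xml, ...).
    public static ISet<string> FindInDocx(string docxPath)
    {
        var found = new SortedSet<string>();
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                if (!entry.FullName.StartsWith("word/") || !entry.FullName.EndsWith(".xml"))
                    continue;
                using (Stream s = entry.Open())
                    found.UnionWith(FindInXml(XDocument.Load(s)));
            }
        }
        return found;
    }
}
```

Calling `SubsetFontScanner.FindInDocx(path)` on the intermediate DOCX should list names such as `IHLJPA+Symbol` if the prefix was introduced by the PDF to DOCX step.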

		<w:font w:name="IHLJPA+Symbol">
			<w:panose-1 w:val="05050102010706020507"/>
			<w:charset w:val="01"/>
			<w:family w:val="Auto"/>
			<w:pitch w:val="variable"/>
			<w:sig w:usb-0="01010101" w:usb-1="01010101" w:usb-2="01010101" w:usb-3="01010101" w:csb-0="01010101" w:csb-1="01010101"/>
		</w:font>

As I understand it, this is an unknown font, so it cannot be rendered when converting to PDF. By the way, adding just one line eliminates the issue:

<w:altName w:val="Symbol"/>
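As a stopgap, that line could be added to the intermediate XML automatically before the final conversion. The sketch below is an assumption-laden illustration, not an Aspose feature: it assumes the file uses the Word 2003 WordML namespace, that the prefix is always six uppercase letters plus `+`, and that the text after the prefix is a usable font name. `SubsetFontPatcher` is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SubsetFontPatcher
{
    // Word 2003 WordML namespace (assumed to match the intermediate XML file).
    private static readonly XNamespace W =
        "http://schemas.microsoft.com/office/word/2003/wordml";

    // Assumed prefix shape: six uppercase letters and '+', e.g. "IHLJPA+Symbol".
    private static readonly Regex SubsetPrefix = new Regex(@"^[A-Z]{6}\+(?<name>.+)$");

    // Adds <w:altName w:val="..."/> to every prefixed <w:font> that lacks one.
    // Returns the number of font declarations that were patched.
    public static int AddAltNames(XDocument doc)
    {
        int patched = 0;
        foreach (XElement font in doc.Descendants(W + "font").ToList())
        {
            string name = (string)font.Attribute(W + "name");
            if (name == null) continue;

            Match m = SubsetPrefix.Match(name);
            if (!m.Success || font.Element(W + "altName") != null) continue;

            // Insert as the first child, ahead of w:panose-1 and the other entries.
            font.AddFirst(new XElement(W + "altName",
                new XAttribute(W + "val", m.Groups["name"].Value)));
            patched++;
        }
        return patched;
    }
}
```

In use, the file would be loaded with `XDocument.Load`, passed to `SubsetFontPatcher.AddAltNames`, and written back with `XDocument.Save` before converting it to PDF. This only hides the symptom; the prefixed font name itself still comes from the earlier conversion step.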

Please find attached the sample input, the unexpected output, the intermediate XML and a sample application. BulletsFiles.zip (55.3 KB)

@Vitaly_Filatenko,

Please copy the latest versions of the following font files from a Windows 10 machine and install them on the machine where you are converting “InputSample.docx” to PDF format:

  • Calibri
  • Times New Roman
  • Symbol

On our end, this is what the DOCX to PDF conversions look like (with and without these fonts):

That is not the root of the issue. I have all these fonts installed. Moreover, your sample “21.3 output without fonts.pdf” does not contain these fonts (open it and look at its properties), while in my sample all of these fonts are included (except the Symbol font, due to the font prefix issue in the XML mentioned above), yet the bullets are still not displayed:

image.png (13.7 KB)

I’ve just created a very simple DOCX with just one font, and the issue was reproduced: InputSample2.zip (10.3 KB)

I suppose you didn’t try to run my samples and missed my explanation about the Symbol font.

@Vitaly_Filatenko,

I am afraid I could not find the sample application in the BulletsFiles.zip that you attached to your first post. Please create a simple standalone Console Application (source code without compilation errors) that helps us reproduce this problem on our end and attach it here for testing. Please do not include the Aspose.Words DLL files in it, to reduce the file size.

I can even observe the problem with bullets when I open “_Output.WordML” with MS Word on my end (please see screenshot.png (12.4 KB)). Does MS Word on your end produce correct output when saving _Output.WordML to PDF format?

I have also noticed that you are using a very old (20.2) version of Aspose.Words for .NET. Please try upgrading to the latest 21.3 version of Aspose.Words for .NET and see if it resolves the problem on your end.

Regarding the InputSample2.docx that you attached in your second post, both MS Word and the following code of Aspose.Words produce correct output in PDF:

Document doc = new Document("C:\\temp\\InputSample2\\InputSample2.docx");
doc.Save("C:\\Temp\\InputSample2\\21.3.pdf");

Sorry for the misunderstanding about the sample application. I created and packed it, but forgot to attach it. ConvertBullets.zip (14.9 KB)

Unfortunately not. This WordML is not displayed properly, either when opened in Word or when saved to PDF. And that is the issue, because this WordML was created by Aspose.

It is reproducible with the latest Aspose versions as well; I used the latest version in the sample application.

I know. It works fine for DOCX->PDF and vice versa. The issue is in converting from PDF to XML (WordML) through DOCX. The application demonstrates this step by step, and I have added some comments inside, too.
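For readers without the attachment, the round trip described in this thread could look roughly like the sketch below. It is not the code from ConvertBullets.zip: the class name and file names are placeholders, and it assumes both the Aspose.PDF and Aspose.Words for .NET packages are referenced.

```csharp
using System.IO;

public static class BulletRoundTrip
{
    // workDir is a hypothetical folder containing "input.pdf".
    public static void Run(string workDir)
    {
        string inputPdf = Path.Combine(workDir, "input.pdf");
        string docxPath = Path.Combine(workDir, "step1.docx");
        string xmlPath  = Path.Combine(workDir, "step2.xml");
        string outPdf   = Path.Combine(workDir, "step3.pdf");

        // Step 1: PDF -> DOCX with Aspose.PDF.
        // This is where the prefixed font names were reported to appear.
        using (var pdf = new Aspose.Pdf.Document(inputPdf))
        {
            pdf.Save(docxPath, Aspose.Pdf.SaveFormat.DocX);
        }

        // Step 2: DOCX -> WordML (Word 2003 XML) with Aspose.Words.
        var docx = new Aspose.Words.Document(docxPath);
        docx.Save(xmlPath, Aspose.Words.SaveFormat.WordML);

        // Step 3: WordML -> PDF with Aspose.Words.
        // The bullets were reported as corrupted in this output.
        var wordMl = new Aspose.Words.Document(xmlPath);
        wordMl.Save(outPdf, Aspose.Words.SaveFormat.Pdf);
    }
}
```

Keeping each intermediate file on disk makes it possible to open the DOCX and the WordML separately and see at which step the bullets stop rendering.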

@Vitaly_Filatenko,

We tested the scenario and managed to reproduce the same problem on our end. We have logged it in our issue tracking system with the ID WORDSNET-21974. We will look further into the details of this problem and will keep you updated on the status of the fix. We apologize for the inconvenience.
