Convert PDF to PPT in C# using Aspose.PDF for .NET - Problems in output files

@stefan.net.test

Thanks for sharing the font files.

We have tested the scenario in our environment using Aspose.PDF for .NET 20.12 and Aspose.Words for .NET 21.1. The following code snippet was used to carry out the conversions:

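// Step 1: DOCX -> PDF with Aspose.Words (dataDir is the folder holding the input files)
Aspose.Words.Document doc = new Aspose.Words.Document(dataDir + "20201218_Admin_DAS.docx");

Aspose.Words.Saving.PdfSaveOptions saveOption = new Aspose.Words.Saving.PdfSaveOptions();
saveOption.Compliance = Aspose.Words.Saving.PdfCompliance.PdfA1b;
saveOption.UseHighQualityRendering = true;
doc.Save(dataDir + "20201218_Admin_DAS.pdf", saveOption);

// Step 2: PDF -> PPTX with Aspose.PDF (fully qualified to avoid the Document name clash)
Aspose.Pdf.Document document = new Aspose.Pdf.Document(dataDir + "20201218_Admin_DAS.pdf");
document.Save(dataDir + "Converted.pptx", new Aspose.Pdf.PptxSaveOptions());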

For Files_V1, we did not notice any issue in either the generated PDF or the PPTX output after installing the fonts on our system. Please check the attached output PDF and PPTX; both contain the desired font.

Files_V1.zip (57.8 KB)

For the second set of files, i.e. Files_v2, the output PDF did not have the desired font, and neither did the output PPTX. The reason seems to be the font name in the input DOCX file: the font name used in the Word file was different from the name of the font installed on the system. However, this issue is more related to Aspose.Words, so we request that you create a topic in the respective category so that it can be investigated further.

Regarding the other issue, i.e. PDFNET-46581, we have the same situation as in PDFNET-46582 and PDFNET-46583. A new bullet would appear only if you were formatting a list. However, a list is an element of the logical structure, which the API does not recognize. Without this recognition, you are formatting just an ordinary paragraph, in which the bullets are ordinary characters rather than list item markers.

The ticket information has been updated as per your feedback and we will inform you as soon as we have further updates regarding its resolution.

PS: We have uninstalled and removed the fonts from our system after testing the scenario.

I cannot download the files you attached because I’m not the owner of this topic. Could you please send me the files in a private message?

@stefan.net.test

We have shared the files with you in a private message.

Are there any updates on PDFNET-46567?

@stefan.net.test

We have investigated ticket PDFNET-46567 and found that, if the initial DOC file is generated on the same machine, the font is most likely modified during the DOC->PDF conversion, which is not performed by the Aspose.PDF API. As for Arial in “Detecon_Carr_DE_2020-01-22 (1).pdf”, you can see in Adobe Acrobat Reader that the font names are actually “Arial-BoldMT” and “ArialMT”. We do not have fonts with these names installed on our machine, so we observed the same issues that you described.

Could you please verify again whether the DOC to PDF conversion preserves the actual fonts at your end? Please share your feedback and we will proceed with the investigation accordingly.

You’re right. The font name in the PDF file is Arial-BoldMT. What do you suggest? What should the name of the font be? Just Arial-Bold?

@stefan.net.test

Yes, the font name should be the same as it is in the DOC file before conversion.

Why are some characters added in front of the font name? They are added after the conversion to PPT. Is it because of the already wrong font name in the PDF? image.png (748 Bytes)

@stefan.net.test

Yes, the change in font name during DOC to PDF conversion is causing it.

I rechecked one thing. If I have a DOC file with a bold font and save it as PDF (directly from Word), “MT” is also appended to the font name. But this does not happen for every font: for example, it is added for Arial but not for Calibri. So the problem is not caused by the software we are using to convert DOC -> PDF.
I also don’t think that the other user (who originally reported this bug) uses the same software as we do.
Could you please check again whether there is a fix or workaround for this?

@stefan.net.test

The DOC to PDF conversion is not done using Aspose.PDF. It is carried out with Aspose.Words, and the right place to get an answer about DOC to PDF conversion is the respective category. Please create a post there with the details of your scenario and you will be assisted accordingly.

But we are not even using Aspose for the DOC -> PDF conversion.
I just rechecked by removing the strange characters that were added when using Aspose, and then the font looks correct.
As I already said in my last comment, “MT” is also added when exporting to PDF directly from Word. So is this an issue with Word?
You can see it in the attached screenshots: first, the output we get using Aspose; second, the result after removing GODOLG from the font name; third, the result when I choose the font directly from the dropdown in PPT. image.png (3.4 KB) image.png (701 Bytes)
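A six-letter uppercase prefix such as “GODOLG” looks like a PDF font subset tag: by PDF convention, the name of an embedded subset font starts with exactly six uppercase letters followed by a plus sign (for example “ABCDEF+ArialMT”). Assuming that is what ends up in the PPT font name (not verified against the attached files), a minimal sketch of stripping it could look like this. `SubsetTag.Strip` is a hypothetical helper for illustration, not an Aspose.PDF API.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.PDF): removes a PDF subset tag,
// i.e. six uppercase letters followed by '+', from the start of a font name.
public static class SubsetTag
{
    // Anchored at the start so that only a leading tag is removed.
    private static readonly Regex Tag = new Regex(@"^[A-Z]{6}\+");

    public static string Strip(string fontName) => Tag.Replace(fontName, "");
}
```

Names without such a prefix are returned unchanged, so the helper is safe to apply to every font name.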

@stefan.net.test

As we now understand it, you are converting a Word document into PDF using MS Word and then generating a PPTX/PPT file from the resulting PDF using Aspose.PDF, and you notice font-related issues in the final PPTX/PPT output. Please confirm whether this is correct. Also, kindly share the new PDF file you mentioned in your last two messages. We will proceed accordingly.

You can find the files in this comment (replied to the comment from Jan 20).

So we are actually using different software for the DOC -> PDF conversion, and after that we use Aspose to convert the PDF to PPT.
To test whether the other software causes the problem, I generated the PDF directly from Word, and both looked the same (“MT” added to the font name).
I have also attached the PDF that was generated directly from Word. In its font properties you can see that “MT” is also added to the name. test.pdf (6.3 KB)
I did not test converting the PDF generated in Word to PPT. But if the problem really occurs because of the “MT”, the same problem should appear there.

@stefan.net.test

Thanks for the feedback and for explaining the issue further. We have updated the ticket information accordingly and will investigate from this perspective. We will let you know once we find a way to tackle such a situation using Aspose.PDF.

Are there any updates?

@stefan.net.test

We have performed an initial investigation and found that the font name did change when we converted a simple Word document into PDF using MS Word. The “MT” suffix was added to the font name only when the font was Arial. We are afraid that we currently cannot match a system font by ignoring the “MT” suffix, and further investigation or an enhancement is required for that. Another ticket has been opened as PDFNET-51820 in our issue tracking system to address this particular case. We will let you know once we make some progress in this regard.
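To illustrate what “ignoring MT” would involve, here is a minimal sketch of a name-normalization heuristic, assuming that a trailing “MT” is a vendor suffix and that PostScript-style names separate family and style with a hyphen (as in “Arial-BoldMT”). `PostScriptFontName.Split` is a hypothetical helper, not part of Aspose.PDF, and it also shows why such a rule is risky: it would wrongly truncate any family whose real name ends in “MT”.

```csharp
using System;

// Hypothetical helper, not an Aspose.PDF API: splits a PostScript-style font name
// such as "Arial-BoldMT" into a family name and a style after dropping a trailing "MT".
public static class PostScriptFontName
{
    public static (string Family, string Style) Split(string psName)
    {
        // Assumption: a trailing "MT" is a vendor suffix (as in "ArialMT"), not part of
        // the family name. This heuristic is wrong for any family that really ends in "MT".
        string name = psName.EndsWith("MT", StringComparison.Ordinal)
            ? psName.Substring(0, psName.Length - 2)
            : psName;

        // PostScript names usually separate family and style with a hyphen.
        int dash = name.IndexOf('-');
        return dash < 0
            ? (name, "Regular")
            : (name.Substring(0, dash), name.Substring(dash + 1));
    }
}
```

For example, `Split("Arial-BoldMT")` yields family “Arial” and style “Bold”, and `Split("ArialMT")` yields “Arial” and “Regular”; the family could then be compared against the names of installed fonts.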

We apologize for the inconvenience.

Hi,
is there any new information regarding PDFNET-51820?

@stefan.net.test

We are afraid that we do not have any news about ticket resolution at the moment. We will let you know once we find some way to prevent the issue you are currently facing. Please give us some time. We apologize for the inconvenience.

Hi, are there any updates?