Convert PDF to DOCX - bold text appears double bold in the output document

Hello Aspose team,

After upgrading Aspose.PDF to version 20.11, we encountered an issue with fonts. Words that are supposed to be bold appear double bold when exported to Word documents (see the screenshot in the zip file). The issue does not occur in the PDF export. We are currently using the Arial font in our exported PDF and Word documents.

Please refer to the attached resources, which contain executable code and templates, to analyze this further on your end.

FontTest.zip (1006.5 KB)

Please also let us know whether there is a workaround we could apply with minor changes to our current code.

Thanks,
Astha

@aasthapa

Could you please also provide the PDF document generated on your side that you are converting to DOC/DOCX? We ran into an issue while running the sample application you shared and are looking into it at the moment.

The exported PDF is attached here: output.zip (57.6 KB)

@aasthapa

We were able to reproduce the issue in our environment when converting your recently shared PDF to DOCX with Aspose.PDF for Java 20.12. We used the following code snippet for testing:

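import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

// Load the PDF exported on the customer's side
Document doc = new Document(dataDir + "output.pdf");
// Flow mode reconstructs paragraphs for a maximally editable DOCX
DocSaveOptions saveOption = new DocSaveOptions();
saveOption.setMode(DocSaveOptions.RecognitionMode.Flow);
saveOption.setFormat(DocSaveOptions.DocFormat.DocX);
saveOption.setRecognizeBullets(true);
doc.save(dataDir + "sample20.12.docx", saveOption);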

We have logged the issue as PDFJAVA-40031 in our issue management system. We will look into it further and keep you posted on the status of the fix. Please bear with us.

We are sorry for the inconvenience.

@asad.ali

Thank you for your quick reply. We hope it will be fixed soon.

@aasthapa

We will surely resolve this issue. However, it will be investigated and fixed on a first-come, first-served basis. You will receive a notification in this forum thread once the ticket is resolved.

Hi, do you have any updates on this? Our client is waiting for the fix, so we would like to know the current status.

Thanks

@aasthapa

We are afraid that the earlier logged ticket has not been investigated yet, and we cannot share an ETA for its resolution at the moment. However, we will let you know as soon as the issue has been fully analyzed and more information about the fix or its ETA is available. Please give us some time.

We apologize for the inconvenience.

@asad.ali

Hello again, is there any update on this?

@aasthapa

We would like to let you know that the fix for your issue is expected in version 21.5 of the API, which will be released in May 2021. We will notify you as soon as the fix-in version is available.

@aasthapa

We have investigated the ticket, and this is not a bug. The double-bold effect on bold text results from the Arial-BoldMT font being combined with the bold style, while the rest of the text uses the ArialMT font. The visual difference stems from differences in how Word and Acrobat Reader render this combination.

However, you can replace the Arial-BoldMT font with the regular Arial font plus the bold style, which avoids the double-bold effect in the resulting document.

Code snippet:

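import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;
import com.aspose.pdf.Font;
import com.aspose.pdf.FontRepository;
import com.aspose.pdf.FontStyles;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentState;

Document doc = new Document(dataDir + "output.pdf");

// Collect every text fragment on every page
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
doc.getPages().accept(textFragmentAbsorber);

// Regular Arial face; bold is applied as a style instead of a dedicated bold font
Font arial = FontRepository.findFont("Arial");
for (TextFragment textFragment : textFragmentAbsorber.getTextFragments()) {
    TextFragmentState textState = textFragment.getTextState();
    // Swap the dedicated bold font for Arial + Bold style
    if ("Arial-BoldMT".equals(textState.getFont().getFontName())) {
        textState.setFont(arial);
        textState.setFontStyle(FontStyles.Bold);
    }
}

DocSaveOptions saveOption = new DocSaveOptions();
saveOption.setMode(DocSaveOptions.RecognitionMode.Flow);
saveOption.setFormat(DocSaveOptions.DocFormat.DocX);
saveOption.setRecognizeBullets(true);

doc.save(dataDir + "sample_22_9__substituredFont.docx", saveOption);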
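The workaround above matches only the exact font name Arial-BoldMT. If other bold faces in the same document (for example Arial,Bold, or a subset-embedded name such as ABCDEF+Arial-BoldMT) show the same effect, the name check could be generalized. Below is a minimal pure-Java sketch under that assumption; it is not part of the Aspose.PDF API, and the class BoldFontNames and its methods are hypothetical. The family mapping is deliberately naive: it only strips a subset tag, the style suffix, and a trailing "MT", so a name like TimesNewRomanPS-BoldMT would yield TimesNewRomanPS, which may not match an installed font family.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not an Aspose API): detects whether a PDF font name denotes a
 * dedicated bold face such as "Arial-BoldMT" and derives a naive base family ("Arial").
 */
public final class BoldFontNames {

    private BoldFontNames() {}

    /** Removes a subset tag such as "ABCDEF+" that PDF producers prepend to embedded subsets. */
    public static String stripSubsetPrefix(String name) {
        int plus = name.indexOf('+');
        // A subset tag is exactly six uppercase letters followed by '+'
        if (plus == 6 && name.substring(0, 6).chars().allMatch(c -> c >= 'A' && c <= 'Z')) {
            return name.substring(7);
        }
        return name;
    }

    /** True if the style part after '-' (or ',') contains "Bold", e.g. "Arial-BoldMT". */
    public static boolean isBoldFace(String fontName) {
        String name = stripSubsetPrefix(fontName);
        int sep = styleSeparator(name);
        if (sep < 0) {
            return false; // no style suffix, e.g. "ArialMT"
        }
        return name.substring(sep + 1).toLowerCase(Locale.ROOT).contains("bold");
    }

    /** Family part before the style separator, with a trailing "MT" removed (ArialMT -> Arial). */
    public static String baseFamily(String fontName) {
        String name = stripSubsetPrefix(fontName);
        int sep = styleSeparator(name);
        String family = sep < 0 ? name : name.substring(0, sep);
        return family.endsWith("MT") ? family.substring(0, family.length() - 2) : family;
    }

    /** Index of the first '-' or, failing that, ',' (returns -1 if neither is present). */
    private static int styleSeparator(String name) {
        int dash = name.indexOf('-');
        return dash >= 0 ? dash : name.indexOf(',');
    }
}
```

Inside the loop from the snippet above, the exact comparison with "Arial-BoldMT" could then be replaced by BoldFontNames.isBoldFace(name), with FontRepository.findFont(BoldFontNames.baseFamily(name)) used instead of the fixed Arial font; that lookup may fail if no matching font is available. Italic information (e.g. Arial-BoldItalicMT) is not carried over by this sketch, since the snippet only sets FontStyles.Bold.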