How to retain title formatting when converting PDF to DOCX (so that selecting a title in the DOCX shows a title style rather than body text)

David_Matin · September 25, 2024, 2:08am

The DOCX I converted has its titles formatted as body text.

Professionalize.Discourse · September 25, 2024, 2:08am

To retain the formatting attributes of titles when converting a PDF to DOCX with Aspose.PDF for Java, you can use the DocSaveOptions class, which lets you configure settings for the conversion process.

Here's a simple example that converts a PDF to DOCX using the Flow recognition mode, which may help titles be recognized correctly:

import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

public class ConvertPDFToDOCX {
    public static void main(String[] args) {
        // Load the source PDF document
        Document pdfDocument = new Document("path/to/your/input.pdf");

        // Create an instance of DocSaveOptions
        DocSaveOptions saveOptions = new DocSaveOptions();
        
        // Set the output format to DOCX
        saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
        
        // Optionally, set recognition mode to Flow to improve text layout
        saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
        
        // Save the document in DOCX format
        pdfDocument.save("path/to/your/output.docx", saveOptions);
    }
}

In this code, the DocSaveOptions class is used to specify that the output format should be DOCX. Additionally, setting the recognition mode to Flow can help in maintaining the structure of the document, which may assist in preserving the formatting of titles versus body text.

Adjust the input and output file paths to match your environment. This approach may help retain the formatting attributes of titles during conversion, but the result depends on the source PDF.
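To check whether titles in the converted file actually carry a title or heading style, one option is to read the paragraph style IDs straight out of the DOCX. The following is a minimal sketch in Python using only the standard library; it is not part of the Aspose API. It relies on a DOCX being a ZIP archive whose main part is word/document.xml, with each paragraph's style stored in a w:pStyle element. The function name paragraph_styles and the file name in the usage comment are hypothetical, and style IDs such as "Heading1" or "Title" are typical values only; the exact IDs depend on how the file was produced.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_styles(docx_path):
    """Return (style_id, text) for each non-empty paragraph in word/document.xml.

    docx_path may be a file path or a binary file-like object.
    style_id is None when the paragraph has no explicit <w:pStyle>,
    which means it uses the document's default paragraph style.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    results = []
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # Concatenate all text runs that belong to this paragraph
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            results.append((style_id, text))
    return results


# Example usage (hypothetical output path):
# for style_id, text in paragraph_styles("path/to/your/output.docx"):
#     print(style_id or "(default)", text[:60])
```

If every title is listed with no style ID, or with the same style ID as the surrounding body text, the conversion did not preserve the title style. Note that this reports the style ID, not the display name shown in Word; the display names are defined in word/styles.xml.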

For more advanced options and configurations, you may want to refer to the official documentation on Aspose.PDF for Java, which provides detailed information on handling various conversion scenarios.

Sources:
[1]: Change PDFs to DOC or DOCX Format | Aspose.PDF Java PDF Processing API

David_Matin · September 25, 2024, 2:08am

【pdf-书签】江苏鲜之源水产食品有限公司报告.pdf (408.1 KB)

David_Matin · September 25, 2024, 2:09am

image.png (263.1 KB)

asad.ali · September 25, 2024, 9:49am

@David_Matin

Can you please share which code snippet you are using to perform PDF to DOCX conversion? We will test the scenario in our environment and address it accordingly.

David_Matin · September 26, 2024, 10:30am

import aspose.pdf as ap

# Load the source PDF document
docc = ap.Document("【pdf-书签】江苏鲜之源水产食品有限公司报告.pdf")

save_options = ap.DocSaveOptions()
# The output file below is saved as .docx, so request the DOCX format
save_options.format = ap.DocSaveOptions.DocFormat.DOC_X
# Set the recognition mode to Flow
save_options.mode = ap.DocSaveOptions.RecognitionMode.FLOW
# Set the relative horizontal proximity to 2.5
save_options.relative_horizontal_proximity = 2.5
# Enable bullet recognition during conversion
save_options.recognize_bullets = True
docc.save("江苏鲜之源水产食品有限公司报告.docx", save_options)

asad.ali · September 26, 2024, 6:14pm

@David_Matin

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-58234

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.