PDF to docx issue in indentation

Hi,


While converting PDF to DOCX, we are facing an issue with indentation.

For example:
The input text in the PDF has a left indent of 0.8 inch and a right indent of 0.55 inch.
After converting to DOCX, the values are as follows:

Actual output: left indent: 0.0 inch, right indent: 0.0 inch

Expected output: left indent: 0.8 inch, right indent: 0.55 inch

Please see the attachment for more information and help us with this.

Thanks,
Rajesh




Hi Rajesh,


Thanks for your inquiry. I have tested the conversion and noticed that paragraphs in the resultant DOCX have 0.0 indentation. However, can you please confirm how you are measuring/identifying the indent in the PDF document? I am unable to find any indentation value for the PDF text.

We are sorry for the inconvenience.

Best Regards,
Hi Ahmad,
For the above query, we have given a brief description below. Please update us as soon as possible.

Below are the steps we followed.
1. As per our requirement, we need a PDF as the input file, so we first created a file, InputDoc.docx, with a left indent of 0.8 inch and a right indent of 0.55 inch.
2. We then saved InputDoc.docx in PDF format as InputPdf.pdf.
3. When you open InputPdf.pdf, you can see the proper indentation, with a left indent of 0.8 inch and a right indent of 0.55 inch, as set in InputDoc.docx. We measure the indentation values visually, by comparing the line positions in InputPdf.pdf against InputDoc.docx.
4. We then convert InputPdf.pdf into OutputDoc.docx using the jar "aspose.pdf-11.6.0.jar". The indentation issue is visible in OutputDoc.docx, where the left indent is 0.0 inch and the right indent is 0.0 inch.

Could you please investigate and help us.
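
For reference, the indent values discussed in this thread can also be read directly from the DOCX file rather than judged by eye. The following is a minimal sketch and not part of Aspose.Pdf. It assumes the indents are stored as direct paragraph formatting in word/document.xml (w:ind elements whose left/right or start/end values are in twentieths of a point, 1440 per inch), and it does not see indents inherited from styles. The class name DocxIndentReader and its method names are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the direct paragraph indents stored in a DOCX file.
final class DocxIndentReader {
    private static final double TWIPS_PER_INCH = 1440.0;
    // Matches each <w:ind .../> element and captures its attribute text.
    private static final Pattern IND_TAG = Pattern.compile("<w:ind\\b([^>]*)>");

    static double twipsToInches(long twips) {
        return twips / TWIPS_PER_INCH;
    }

    // Returns the first of the named w: attributes found, in twips, or 0 if none is present.
    static long attr(String attrs, String... names) {
        for (String name : names) {
            Matcher m = Pattern.compile("\\bw:" + name + "=\"(-?\\d+)\"").matcher(attrs);
            if (m.find()) {
                return Long.parseLong(m.group(1));
            }
        }
        return 0;
    }

    // Returns one {leftInches, rightInches} pair per <w:ind> element in the XML.
    static List<double[]> parseIndents(String documentXml) {
        List<double[]> result = new ArrayList<>();
        Matcher m = IND_TAG.matcher(documentXml);
        while (m.find()) {
            String attrs = m.group(1);
            result.add(new double[] {
                twipsToInches(attr(attrs, "left", "start")),
                twipsToInches(attr(attrs, "right", "end"))
            });
        }
        return result;
    }

    // A DOCX file is a ZIP archive; the main body lives in word/document.xml.
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a sample fragment: 1152 twips = 0.8 in, 792 twips = 0.55 in.
        String xml = args.length > 0
                ? readDocumentXml(args[0])
                : "<w:p><w:pPr><w:ind w:left=\"1152\" w:right=\"792\"/></w:pPr></w:p>";
        for (double[] indent : parseIndents(xml)) {
            System.out.printf(Locale.ROOT, "left: %.2f in, right: %.2f in%n", indent[0], indent[1]);
        }
    }
}
```

Running it with the path of InputDoc.docx and then OutputDoc.docx as the argument would let the two sets of values be compared side by side.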

Hi Rajesh,


Thanks for sharing the details. I have noticed that Aspose.Pdf indents the paragraph in the resultant DOC(X) properly but sets the indentation properties to 0. I have logged an enhancement ticket, PDFJAVA-35941, in our issue tracking system to investigate and set the indentation values as per the actual layout. We will keep you updated about the issue resolution progress.

We are sorry for the inconvenience caused.

Best Regards,

Hi Ahmad,


Are there any updates on this issue? Waiting for your response.

Hi Rajesh,


Thanks for your patience.

As we noticed the reported issue only recently, it is still pending review and is not yet resolved. However, the product team will investigate and fix it as per the development schedule, and as soon as we have a definite update regarding its resolution, we will let you know. Please be patient and spare us a little time. We are sorry for this delay and inconvenience.

Hi Nayyer Shahbaz,


Thanks for your response.

While converting PDF to DOCX, we are able to get the left and right indentation with the help of the code below.


com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("Input Document");

DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
saveOptions.setRelativeHorizontalProximity(2.5f);
saveOptions.setAddReturnToLineEnd(false);
pdfDocument.save("Output document", saveOptions);

However, we still face issues with the right indent in most scenarios when reading from the DOCX converted from PDF.

For example:

The paragraph below was indented on both the left and the right in the input document, but Aspose detects only the left indent, not the right indent.

Line of Credit. Your Line is an open-end line of credit which you may use

to obtain cash advances (Advances) from time to time for a period of 10

years (Term). Your Line will mature on the last day of the billing cycle

ending on May 2011 (Maturity Date). If you continue to meet Bank's then

current standards for credit criteria and collateral value, at Bank's

discretion, Bank will either extend the Maturity Date for one or more

additional Terms or Bank will refinance your Line on the terms then being

offered by Bank for Equity Reserve Line of Credit.


Expected indentation : left indent : 0.5
Right indent : 0.5

Actual Indentation : left indent : 0.5
Right indent : 0.0

PFA for more information.

Converted docx from pdf- "c_Cc_2015-Ohio-4092_ASPOSE.docx"

Input PDF : "Cc_2015-Ohio-4092_ASPOSE.pdf"

Could you please help us with this?

Thanks,
Rajesh



Hi Rajesh,


Thanks for your patience.

We have further investigated the earlier reported PDFJAVA-35941. With the latest release version, when using the following code snippet, the left indent value is 0.8 inches and the right indent value is 1.48 inches.

[Java]

Document document = new Document("InputPdf(1).pdf");
DocSaveOptions saveOption = new DocSaveOptions();
saveOption.setFormat(DocSaveOptions.DocFormat.DocX);
saveOption.setAddReturnToLineEnd(false);
saveOption.setMode(DocSaveOptions.RecognitionMode.Flow);
saveOption.setRelativeHorizontalProximity(2.5f);
document.save("Resultant.docx", saveOption);

Please, note that:

1. The document OutputDoc.docx was saved with the Textbox recognition mode (saveOption.setMode(DocSaveOptions.RecognitionMode.Textbox)), although the code snippet uses saveOption.setMode(DocSaveOptions.RecognitionMode.Flow). In Textbox mode, a document will not have paragraph indent values because its text elements are positioned absolutely. To restore the original document author’s intent and produce a maximally editable document, the Flow recognition mode should be used.

2. The option:

saveOption.setAddReturnToLineEnd(true);

adds hard line breaks at the ends of lines and sets the right indent values to 0. To preserve the right indent values, please use:

saveOption.setAddReturnToLineEnd(false);

3. The right indent value in the output document is 1.48 inches, although in the input document it is 0.55 inches. The difference arises because we always have to set the right margin of .DOC documents to 0 for better conversion of most documents, so the right indent value grows by the value of the right margin (~0.93 inches).
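
The arithmetic in point 3 can be sketched as follows. This is only an illustration of the relationship described above (converted right indent = original right indent + original right margin), using the approximate 0.93 inch margin quoted in the reply. The class and method names are hypothetical and are not part of the Aspose.Pdf API.

```java
import java.util.Locale;

// Hypothetical helper illustrating how the right indent shifts when the
// converter sets the right page margin to 0.
final class RightIndentShift {

    // Right indent expected in the converted DOCX: the original right margin
    // is folded into the paragraph's right indent.
    static double convertedRightIndent(double originalRightIndent, double originalRightMargin) {
        return originalRightIndent + originalRightMargin;
    }

    // Reverse direction: recover the original right indent from the converted
    // value, given the right margin of the source document.
    static double originalRightIndent(double convertedRightIndent, double originalRightMargin) {
        return convertedRightIndent - originalRightMargin;
    }

    public static void main(String[] args) {
        double rightMargin = 0.93; // approximate value from the reply above, in inches
        double converted = convertedRightIndent(0.55, rightMargin);
        double recovered = originalRightIndent(1.48, rightMargin);
        System.out.printf(Locale.ROOT, "converted right indent: %.2f in%n", converted);
        System.out.printf(Locale.ROOT, "recovered original right indent: %.2f in%n", recovered);
    }
}
```

When comparing indents before and after conversion, subtracting the source document's right margin from the converted right indent should therefore give a value close to the original one, within the rounding of the margin.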

Hi Rajesh,

Thanks for using our APIs.

I have tested the scenario and managed to reproduce the same problem. I have logged it as PDFJAVA-36178 in our issue tracking system. We will look further into the details of this problem and keep you posted on the status of the fix. Please be patient and spare us a little time. We are sorry for this inconvenience.

The issues you have found earlier (filed as PDFJAVA-36178) have been fixed in Aspose.Pdf for Java 16.12.0.

