Detect numbered items when converting PDF to DOCX

Good day,

We are using Aspose.PDF to convert PDF documents to Word documents. During conversion, Aspose correctly identifies bullet list items, but for some reason numbered list items are converted to plain paragraphs, with the item number included in the paragraph text. This applies to all types of numbered items, including alphabetic and Roman numerals. Calling .isListItem() in Aspose.WORDS returns false for these numbered list items.

Is there a way to configure Aspose.PDF to detect numbered list items during conversion? If not, is there a way to convert paragraphs to numbered list items in Aspose.WORDS?

@jacogericke

Could you please ZIP and attach your input PDF and problematic output DOCX here for testing? We will investigate the issue and provide you more information on it.

Please see the attached ZIP file. I have also included another DOCX which was converted from the same PDF using a different application. The other DOCX correctly identifies both numbered items as well as different heading levels, e.g. Heading 1, Heading 2. Is it also possible to detect headings when converting PDF to DOCX with Aspose?

Example Document.zip (79.5 KB)

@jacogericke

We have logged this problem in our issue tracking system as PDFNET-53371. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

@tahir.manzoor I would just like to add that I checked the output Word document again, and in this case Aspose.WORDS also did not detect the bulleted list items; they were converted to normal paragraphs (not list paragraphs). To reiterate, these are the three issues I have with converting PDF to DOCX:

  1. Headings are not detected, e.g. Heading 1, Heading 2, etc.
  2. Numbered list items are not detected (including numeric, alphabetic and Roman numerals, as well as numbers in parentheses and multi-level numbers separated by periods, e.g. 1.2.3)
  3. Some bulleted list items are not detected (perhaps not all possible bullet characters are being considered?)

Can you please update the ticket to include all of the points mentioned?
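
Until the ticket is resolved, one possible workaround is to post-process the paragraph text and guess which paragraphs start with a list marker. The following is a minimal sketch in plain Java, not Aspose API: the class name `ListMarkerSketch`, its methods and the regular expressions are all hypothetical, and the patterns only cover the marker styles listed above. Applying actual list formatting to the matching paragraphs in Aspose.WORDS is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses the list-marker style at the start of a paragraph.
// This is a heuristic, not part of Aspose.PDF or Aspose.WORDS.
public class ListMarkerSketch {

    // Insertion order matters: the first matching pattern wins.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        // Multi-level numbers such as "1.2.3 Scope" or "2.1. Terms"
        PATTERNS.put("SECTION", Pattern.compile("^\\s*(\\d+(?:\\.\\d+)+)\\.?\\s+(.*)$"));
        // "1. text", "1) text", "(1) text"
        PATTERNS.put("DECIMAL", Pattern.compile("^\\s*\\(?(\\d+)[.)]\\s+(.*)$"));
        // "iv) text", "(IX) text"; single letters like "c" or "i" are ambiguous
        // and are treated as Roman numerals here because this pattern is checked first.
        PATTERNS.put("ROMAN", Pattern.compile("^\\s*\\(?([ivxlcdm]+|[IVXLCDM]+)[.)]\\s+(.*)$"));
        // "a. text", "b) text", "(A) text"
        PATTERNS.put("ALPHA", Pattern.compile("^\\s*\\(?([a-zA-Z])[.)]\\s+(.*)$"));
        // A few common bullet characters; real documents may use others.
        PATTERNS.put("BULLET", Pattern.compile("^\\s*([\\u2022\\u25E6\\u25AA\\u2013*-])\\s+(.*)$"));
    }

    // Returns the name of the first matching marker style, or "NONE".
    public static String classify(String paragraphText) {
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(paragraphText).matches()) {
                return entry.getKey();
            }
        }
        return "NONE";
    }

    // Returns the paragraph text without its leading marker, or the text unchanged.
    public static String stripMarker(String paragraphText) {
        for (Pattern pattern : PATTERNS.values()) {
            Matcher matcher = pattern.matcher(paragraphText);
            if (matcher.matches()) {
                return matcher.group(2);
            }
        }
        return paragraphText;
    }

    public static void main(String[] args) {
        String[] samples = {
            "1. Introduction", "(a) First option", "iv) Fourth point",
            "1.2.3 Scope", "Plain paragraph text."
        };
        for (String sample : samples) {
            System.out.println(classify(sample) + " | " + stripMarker(sample));
        }
    }
}
```

Because the check is purely textual, it will misclassify ordinary sentences that happen to start like a marker (for example "3.14 is approximately pi"), so results should be reviewed, ideally together with indentation or neighbouring paragraphs, before any list formatting is applied.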

@jacogericke

Thanks for sharing the details. We have logged them in our issue tracking system. You will be informed once this issue is resolved.