I am using Python with aspose-words 25.6.0 to convert a PDF to Markdown, and extra spaces appear in the converted .md file. For example, the PDF contains a paragraph with a normally formatted URL; after conversion, extra spaces are inserted into that URL in the Markdown document.
code:
import aspose.words as aw

# file_path points to the source PDF
document = aw.Document(file_path)
save_options = aw.saving.MarkdownSaveOptions()
save_options.image_resolution = 150
document.save("【标准报表】问题汇总 副本.md", save_options)
PDF:
【标准报表】问题汇总 副本.pdf (6.8 MB)
markdown:
file:https://drive.google.com/file/d/1aLQCxoefC_H68CnpYL8Z4-wDRXg5iZ-s/view?usp=sharing
@David_Matin
Cause
Extra spaces can appear in the converted Markdown file because of how Aspose.Words interprets the PDF content and translates it into Markdown. This most often affects URLs and other text elements whose spacing in the original PDF is irregular.
Solution
To address this, you can adjust the Markdown save options to better handle spacing and formatting. Specifically, explore the MarkdownSaveOptions class to find settings that can help reduce unwanted spaces.
Code
Here’s an example of how you can modify your existing code to include additional options:
import aspose.words as aw
# Load the PDF document
document = aw.Document(file_path)
# Create Markdown save options
save_options = aw.saving.MarkdownSaveOptions()
save_options.image_resolution = 150
# Optionally, you can explore other properties of save_options to adjust formatting
# For example, you can set table content alignment if you have tables
# save_options.table_content_alignment = aw.saving.TableContentAlignment.LEFT
# Save the document as Markdown
document.save("【标准报表】问题汇总 副本.md", save_options)
Next Steps
- Test the above code to see if it resolves the spacing issue.
- If the problem persists, check the original PDF for any hidden formatting or characters that might be affecting the conversion.
- Explore the documentation for MarkdownSaveOptions to find additional properties that may assist with formatting issues.
Please implement these suggestions and let me know the outcome!
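If the spaces remain after trying the options above, one possible stopgap is to post-process the saved Markdown. The sketch below is not part of Aspose.Words; it is plain Python, and the function name strip_spaces_in_link_targets is hypothetical. It assumes the affected URLs were written as Markdown link targets, either [text](url) or <url>, and that link targets carry no quoted title. Bare URLs in running text are left alone, because once a space has been inserted there is no reliable way to tell where the URL ends.

```python
import re

# Destination of an inline Markdown link: [text](destination)
_INLINE_DEST = re.compile(r"(\]\()([^)\n]*)(\))")
# Autolink such as <https://example.com/page>
_AUTOLINK = re.compile(r"<(https?://[^>\n]*)>")


def strip_spaces_in_link_targets(markdown_text: str) -> str:
    """Remove whitespace inside Markdown link destinations and autolinks."""

    def fix_inline(match: re.Match) -> str:
        destination = match.group(2)
        # Assumption: a double quote means the link has a title; leave it untouched.
        if '"' in destination:
            return match.group(0)
        return match.group(1) + re.sub(r"\s+", "", destination) + match.group(3)

    def fix_autolink(match: re.Match) -> str:
        return "<" + re.sub(r"\s+", "", match.group(1)) + ">"

    text = _INLINE_DEST.sub(fix_inline, markdown_text)
    return _AUTOLINK.sub(fix_autolink, text)
```

To use it, read the .md file produced by document.save(...) as UTF-8 text, pass the contents through the function, and write the result back. Review the output before relying on it, since a URL that legitimately contains a space would also be altered.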
@David_Matin
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-28388
You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.
The issue you found earlier (filed as WORDSNET-28388) has been fixed in the Aspose.Words for .NET 25.10 update, which is also available on NuGet.
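For Python users, a quick way to see whether the installed package is new enough is sketched below. It assumes the aspose-words package on PyPI follows the same version numbering as the .NET release, which this thread does not state explicitly; the helper names parse_version and has_fix are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Return the (major, minor) part of a version string such as '25.10.0'."""
    return tuple(int(part) for part in text.split(".")[:2])


def has_fix(installed: str, minimum: tuple = (25, 10)) -> bool:
    """True if the installed version is at least the release named in the thread."""
    return parse_version(installed) >= minimum


try:
    installed = version("aspose-words")
    status = "should include the fix" if has_fix(installed) else "predates the fix"
    print(f"aspose-words {installed} {status}")
except PackageNotFoundError:
    print("aspose-words is not installed")
```

If the installed version predates 25.10, upgrading with pip install --upgrade aspose-words and re-running the original conversion is the simplest way to confirm the fix.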