Hey,
I have some PDF files that contain checkboxes. When I convert a PDF to HTML, the checked state of each checkbox comes through as an image. As a requirement, I need to remove all images from the HTML file.
So when I remove all images, the checkbox information is removed as well, because it is rendered as an image.
Is there any other way to retrieve the checkbox's ticked state instead of getting it as an image?
I have attached a PDF file and my code for your consideration.
Code:
from io import BytesIO
import aspose.pdf as ap

# pdf_recover (the PDF bytes) and html_file (the output path) are defined elsewhere
pdf_bytes = BytesIO(pdf_recover)
converted_pdf_load = ap.Document(pdf_bytes)
save_options = ap.HtmlSaveOptions()
save_options.raster_images_saving_mode = 2
save_options.parts_embedding_mode = 0  # embed CSS and fonts only
# Delete all images on all pages (page and image indices are 1-based)
for i in range(len(converted_pdf_load.pages)):
    while len(converted_pdf_load.pages[i + 1].resources.images) != 0:
        converted_pdf_load.pages[i + 1].resources.images.delete(1)
# converted_pdf_load.save(html_file)
converted_pdf_load.save(html_file, save_options)
example.pdf (159.6 KB)