Read an ADA-compliant or tagged PDF in the sequence of its content

testinghtmltopdf · August 2, 2021, 3:23pm

We receive the PDF as bytes and need to create a PDF with its content in the same sequence.

asad.ali · August 2, 2021, 5:57pm

The Aspose.Pdf.Document can be initialized using Stream and you can save it into Stream as well. You can read bytes to create a FileStream in order to initialize the Document object and after saving it into output Stream, you can obtain bytes from that Stream. Please feel free to let us know in case you need more information.
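The stream-based approach described above might look like the following minimal sketch. It uses a MemoryStream instead of a FileStream, on the assumption that the input bytes are already in memory; the class and method names (PdfBytes, RoundTrip) are hypothetical and only illustrate the idea.

```csharp
using System.IO;
using Aspose.Pdf;

public static class PdfBytes
{
    // Hypothetical helper: loads a PDF from bytes and returns the saved result as bytes.
    public static byte[] RoundTrip(byte[] inputBytes)
    {
        using (var input = new MemoryStream(inputBytes))
        using (var document = new Document(input))   // initialize the Document from a Stream
        using (var output = new MemoryStream())
        {
            // Any processing of the document would go here.
            document.Save(output);                   // save the Document into a Stream
            return output.ToArray();                 // obtain the bytes from the output Stream
        }
    }
}
```

Loading and saving this way does not by itself change the order of the page content; it only moves the document between bytes and the Document object.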

testinghtmltopdf · August 2, 2021, 6:39pm

We are reading the bytes into a stream and then initializing TextFragment and paragraph objects to read through the content, but is there any way to read the content in the same sequence as it appears in the bytes?

asad.ali · August 3, 2021, 8:52am

@testinghtmltopdf

Can you please share a sample code snippet so that we can better understand the scenario? We will then proceed to assist you accordingly.

testinghtmltopdf · August 5, 2021, 4:32pm

As Aspose.PDF does not support ADA-compliant PDFs, we are reading the bytes from the PDF and manually creating a new PDF that will be ADA compliant.

How can I maintain the same sequence that I have in the original PDF?
We are using ParagraphAbsorber.

asad.ali · August 5, 2021, 7:08pm

@testinghtmltopdf

Aspose.PDF does support ADA compliance and it offers the capability to generate Tagged PDFs. As requested earlier, can you please share the complete use case with a sample PDF and code snippet so that we can test the scenario in our environment and proceed further to assist you accordingly?

testinghtmltopdf · August 5, 2021, 10:53pm

We can take any PDF that has a table and text and is not ADA compliant.

using System;
using Aspose.Pdf;
using Aspose.Pdf.LogicalStructure;
using Aspose.Pdf.Tagged;
using Aspose.Pdf.Text;

// New document that will receive the tagged content
Document taggedDocument = new Document();
ITaggedContent taggedContent = taggedDocument.TaggedContent;
StructureElement rootElement = taggedContent.RootElement;

// Source document that is not ADA compliant
Document doc = new Document("READ ANY PDF");
int pageCount = doc.Pages.Count;

foreach (Page page in doc.Pages)
{
    // Extract the paragraphs of the current page
    ParagraphAbsorber absorber = new ParagraphAbsorber();
    absorber.Visit(page);
    PageMarkup markup = absorber.PageMarkups[0];

    foreach (var section in markup.Sections)
    {
        foreach (var paragraph in section.Paragraphs)
        {
            ParagraphElement p = taggedContent.CreateParagraphElement();
            string s = paragraph.Text.Replace(Environment.NewLine, String.Empty);

            // Copy the font style; the last fragment's style wins
            foreach (var fragment in paragraph.Fragments)
            {
                p.StructureTextState.FontStyle = fragment.TextState.FontStyle;
            }

            p.SetText(s);
            rootElement.AppendChild(p);
        }
    }
}

taggedDocument.Save("taggedpdf");

testinghtmltopdf · August 5, 2021, 10:55pm

The above code creates a tagged PDF, but it reads everything as text.

I understand we can use TableAbsorber too.

If the original PDF has a table, it should be read as a table, and the sequence of the original PDF should be maintained in the tagged PDF. How can we do that?
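One possible way to think about the ordering problem, shown here only as a sketch and not as a confirmed Aspose.PDF solution, is to collect paragraphs and tables together with their positions on the page and then sort them top to bottom and left to right. The PageItem and ReadingOrder types below are hypothetical and are not part of Aspose.PDF; the positions are assumed to come from the bounding rectangles of the absorbed paragraphs and tables, in PDF coordinates where the origin is at the bottom-left of the page.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical holder for one extracted item; not an Aspose.PDF type.
public sealed class PageItem
{
    public string Kind { get; set; }  // "paragraph" or "table"
    public double Top { get; set; }   // upper Y of the bounding box (origin at bottom-left)
    public double Left { get; set; }  // left X of the bounding box
    public string Text { get; set; }
}

public static class ReadingOrder
{
    // Sorts items top to bottom, then left to right.
    // A larger Top value means the item is higher on the page.
    public static List<PageItem> Sort(IEnumerable<PageItem> items)
    {
        return items
            .OrderByDescending(item => item.Top)
            .ThenBy(item => item.Left)
            .ToList();
    }
}
```

With the items sorted this way, a paragraph element or a table element could be appended to the root element in that order. This simple sort assumes a single-column layout, and paragraphs that lie inside a table's rectangle would need to be skipped so that table text is not added twice.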

testinghtmltopdf · August 6, 2021, 5:03pm

Please provide your input, because we are not sure how to read the original PDF in the same order.

asad.ali · August 9, 2021, 1:07pm

@testinghtmltopdf

We need to further investigate whether this requirement is feasible to achieve or not. For this purpose, we have logged an investigation ticket as PDFNET-50359 in our issue tracking system. We will further look into its details and keep you posted with the status of its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.

testinghtmltopdf · September 8, 2021, 3:13pm

Any update?

asad.ali · September 8, 2021, 9:00pm

@testinghtmltopdf

The ticket has recently been logged in our issue tracking system and is pending analysis. We will investigate and resolve the ticket on a first-come, first-served basis and let you know as soon as we have a definite update regarding its resolution. Please spare us some time.

We are sorry for the inconvenience.