Hi,
My test document contains two text boxes, each containing a phrase. When I convert the document to HTML, the first word of each phrase is split across multiple span elements.
The resulting HTML:
<div class="stl_03 stl_04">
  <div class="stl_01" style="left:1.6667em;top: 0.6559em; "><span class="stl_05 stl_06 stl_07">Evaluation Only. Created with Aspose.PDF. Copyright 2002-2018 Aspose Pty Ltd. </span></div>
  <div class="stl_01" style="left:8.3268em;top: 6.9276em; z-index:2; "><span class="stl_08 stl_09 stl_10">W</span><span class="stl_08 stl_09 stl_07">a</span><span class="stl_08 stl_09 stl_11" style="word-spacing:0.0053em;">arde in tekstvak 1 </span></div>
  <div class="stl_01" style="left:8.3268em;top: 16.4355em; z-index:22; "><span class="stl_08 stl_09 stl_10">W</span><span class="stl_08 stl_09 stl_07">a</span><span class="stl_08 stl_09 stl_11" style="word-spacing:0.0053em;">arde in tekstvak 2 </span></div>
</div>
The document is structured like this (extracted with Adobe Acrobat):
<Document xml:lang="nl-NL">
  <Article>
    <Artikel>
      <NormalParagraphStyle>Waarde in tekstvak 1</NormalParagraphStyle>
    </Artikel>
    <Artikel>
      <NormalParagraphStyle>Waarde in tekstvak 2</NormalParagraphStyle>
    </Artikel>
  </Article>
</Document>
The code I used to convert the document to HTML:
Document pdfDoc = new Document(sourcePdfPath);
HtmlSaveOptions htmlSaveOptions = new HtmlSaveOptions();
htmlSaveOptions.SplitIntoPages = false;
pdfDoc.Save(targetHtmlPath, htmlSaveOptions);
Can I use the SDK to ensure that each phrase is converted to a single span element?
Kind regards,
Stefaan