Version 23.8:
Issue 1: characters are duplicated:
image.png (20.2 KB)
Issue 2: the data from several cells is merged into one cell:
image.png (17.7 KB)
Issue 3: two cells are merged into one:
image.png (20.2 KB)
PDF file:
#5993504(QWP002-PCO-B1)Vendor Gateway 2024.5.14(GZHL2404016509LW)-TRF.pdf (76.6 KB)
@softboy
I converted the file with the code below and compared the output DOCX document with the original PDF.
var doc = new Document(dataDir + "5993504.pdf");
doc.Save(dataDir + "5993504-out.docx");
I do not see the discrepancies you describe (see screenshot).
What version of the library are you using?
We use version 23.8.
Below are the options I tried with your latest version, 24.5.1:
DocSaveOptions saveOptions = new DocSaveOptions
{
    // Specify the output format as DOCX
    Format = DocSaveOptions.DocFormat.DocX,
    // Set other DocSaveOptions params
    Mode = DocSaveOptions.RecognitionMode.EnhancedFlow
};
image.png (28.0 KB)
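For anyone trying to reproduce this, the options above can be combined with the load and save calls from the earlier reply. This is only a minimal sketch: it assumes the Aspose.PDF for .NET package is referenced, the `dataDir` value and the output file name are placeholders, and the input name follows the shortened `5993504.pdf` used earlier in the thread.

```csharp
using Aspose.Pdf;

class PdfToDocxRepro
{
    static void Main()
    {
        // Assumption: placeholder folder; point it at the directory that holds the PDF.
        string dataDir = "./";

        // Load the source PDF (file name as used earlier in the thread).
        using (var doc = new Document(dataDir + "5993504.pdf"))
        {
            var saveOptions = new DocSaveOptions
            {
                // Specify the output format as DOCX
                Format = DocSaveOptions.DocFormat.DocX,
                // EnhancedFlow is the recognition mode reported to show the merged cells
                Mode = DocSaveOptions.RecognitionMode.EnhancedFlow
            };

            // Placeholder output name, chosen to keep it apart from the default-options output.
            doc.Save(dataDir + "5993504-enhancedflow-out.docx", saveOptions);
        }
    }
}
```

Saving once with these options and once with the plain `doc.Save(path)` call makes it easier to see whether the differences come from the recognition mode or from the library version.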
@softboy
Thanks for the explanation. With these save options I can reproduce the problem.
In this case, I believe the conversion was correct. If the words were separated by spaces (even a single space), one would expect them to be split into two cells. Here, however, a single word takes up the entire width, so placing it in one cell is the expected result, and I do not consider it wrong.
@softboy
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-57355
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.