Whenever I find text that needs to be replaced in my main document, how can I insert a PDF document in its place?

@panCognity Sure, we will keep you updated and let you know once the issues are resolved or we have more information for you.

The issue you found earlier (filed as WORDSNET-25732) has been fixed in the Aspose.Words for .NET 23.9 update, which is also available on NuGet.

Thank you! Sorry for the late reply. I had an accident. Anyway, I will try it out and get back to you if I have any questions.

The problem still remains. Do you want me to open a different ticket for the issue?

@panCognity Two issues were reported in this topic:

WORDSNET-25732 - Font size is changed after converting PDF to DOCX
WORDSNET-25731 - Content is damaged after converting PDF to DOCX.

WORDSNET-25732 has been resolved. WORDSNET-25731 has not been resolved yet.

OK. I will wait for it to be resolved. Thank you for your answer.

Documents.zip (2.3 MB)

Hi.

I convert all PDF files to DOCX files. Then I replace all placeholders in the main file (which is converted from the ΤΣΙΓ_ΝΔ_3_0025.pdf file) with the additional documents, which produces a larger DOCX file. When finished, I save the result as both a DOCX file and a PDF file. Finally, I compress the PDF file to reduce its size. I saw that the “ΤΣΙΓ_ΝΔ_3_0025.docx” file has some issues in tables (lines, fields). Is there a way I can fix those problems?

@panCognity Could you please specify where the problem is? Your documents are quite big and it is not quite clear what problem you mean. It would be great if you simplify the example to demonstrate the problem and provide simple code that will allow us to reproduce the problem.

ΤΣΙΓ_ΝΔ_3_0025_Output.zip (4.9 MB)

I am sending you the code for the conversion to DOCX, the code for finding and replacing text in the document, the code for compressing the output file, and the output file itself. In the output file, I have drawn a red rectangle where the problem occurs. It looks exactly the same in the converted DOCX file, before I save it to PDF. There are other files that are much larger than this one, but the problem is the same. I had some issues with Greek fonts and images (there is an open ticket for this; see your reply: WORDSNET-25731 has not been resolved yet). Therefore, I have chosen to convert all PDF files to DOCX files first, replace the placeholders in the DOCX file, and then save it as both a DOCX and a PDF file, as you can see from the code I have attached.

The main template is the ΤΣΙΓ_ΝΔ_3_0025.pdf file. All other PDF files will be embedded in the ΤΣΙΓ_ΝΔ_3_0025.pdf file, replacing the placeholders.

@panCognity Thank you for the additional information. Unfortunately, I cannot reproduce the problem on my side. Could you please create a simple console application that includes all required documents and resources and allows us to reproduce the problem on our side?

Also, please note that Aspose.Words is designed to work with MS Word documents. MS Word documents are flow documents, and their structure is very similar to the Aspose.Words Document Object Model. PDF documents, on the other hand, are fixed-page format documents. When a PDF document is converted to an MS Word document, the fixed-page document structure has to be converted into the flow Document Object Model. Unfortunately, such a conversion does not guarantee 100% fidelity, so it is not always possible to retain the PDF document layout in the MS Word document or after a PDF-DOCX-PDF roundtrip.

Agwges.zip (17.3 KB)

Project_Horizon.zip (2.1 MB)

Attached is the console app (Agwges.zip) and also sample files my application uses (Project_Horizon.zip).

After you inspect this problem, I would also like advice on how I could improve compression when I have larger files.

@panCognity Thank you for the additional information. The problem is not in Aspose.Words. It is caused by Aspose.PDF during conversion from PDF to DOCX. The problem can be reproduced with the following simple code:

Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"C:\Temp\ΤΣΙΓ_ΝΔ_3_0025.pdf");
Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions
{
    Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX,
    Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow,
    RelativeHorizontalProximity = 5.5f,
    // Recognize bullets during the conversion process
    RecognizeBullets = true
};
pdfDocument.Save(@"C:\Temp\out.docx", saveOptions); // The problem is already there in the DOCX produced by Aspose.PDF

Aspose.Words.Document doc = new Aspose.Words.Document(@"C:\Temp\out.docx");
doc.Save(@"C:\Temp\out.pdf");

So you should report the problem to the Aspose.PDF team in the appropriate forum.

I am afraid there is no way to speed up the document processing, since you are converting from PDF to DOCX. As I have mentioned, the models of these formats are quite different, so the conversion process is quite complex and time-consuming.

I understand. That’s why I asked whether there is any workaround or solution for this. Could you suggest the appropriate forum? I want to make sure that I report it in the right place. Thank you in advance!

@panCognity If the problem occurs after processing the document with Aspose.PDF, you should report the problem in the Aspose.PDF forum:
https://forum.aspose.com/c/pdf/10

If the problem is caused by Aspose.Words, you should post the problem here in this forum.

Thank you! I appreciate it!
