On some runtime instances, spaces are removed from some lines

We have an issue with DOCX to PDF conversion in Aspose.Words 20.5. For some DOCX documents, on some of our runtime instances, spaces are being removed from specific lines of the DOCX file.

Our environment is set up with 2 instances of Aspose running on each server. On some of our servers, Aspose inside one of the two instances has this bug.

We have managed to isolate the issue to either the document-loading code or the PDF-saving code. Below is an example of what we do to convert the DOCX to PDF. The DOCX file is exactly the same on all instances.

Document doc = new Document(pathDOCX);
doc.save(pathPDF, SaveFormat.PDF);

The affected lines are exactly the same across all instances that experience this issue. Not all instances experience it.

While googling this issue, we also found an article that contains something that might be related: Replace Text in PDF|Aspose.PDF for .NET

Aspose.PDF for .NET supports the feature to search and replace text inside the PDF file. However recently some customers encountered issues during text replace when particular TextFragment is replaced with smaller contents and some extra spaces are displayed in resultant PDF or in case the TextFragment is replaced with some longer string, then words overlap existing page contents. So the requirement was to introduce a mechanism that once the text inside a PDF document is replaced, the contents should be re-arranged.

We are wondering whether the code that makes that spacing change could be what is causing this issue.

We are unable to produce a non-production file that causes this issue, so we cannot provide sample files.

Are there any guidelines on additional tests we should run?

Thanks,
~Ryan

@RyanWilliamsUSC

Please ZIP and attach your input Word document and output PDF along with screenshots of problematic output sections. We will investigate the issue and provide you more information on it.

@tahir.manzoor

The issue has only happened with live documents from our clients. We have not been able to reproduce it with non-production documents. Because of the sensitive nature of the client documents that are experiencing this issue, we cannot share them with anyone, including most people internal to my company and any external companies.

The issue happens when the DOCX to PDF conversion happens on 2 of our instances. The binary we run and the DOCX file are identical on ALL of our instances but only 2 instances experience this bug. Both instances share a physical machine with a working instance.

Since Aspose is a black box to us we are wondering if you have any guidance for us on how to troubleshoot without us sending you the files. I have been unable to reproduce this issue outside of our production environment with non-production data.

Some additional questions that might be relevant:

Does Aspose use any libraries that might not be thread safe? We have 2 instances of our document server, each using Aspose, on a given machine, so if Aspose is using non-thread-safe libraries, that could be the cause of this issue. Only one of the 2 instances on a given server has this issue.

Have there been any bug fixes related to DOCX to PDF conversion since Words 20.5? If yes then it might be worth it to update Aspose.

Thanks,

Ryan

@RyanWilliamsUSC

Please note that Aspose.Words is secure and thread safe. We suggest that you use the latest version, Aspose.Words for Java 20.9, and let us know how it goes on your side.

If you still face the problem, please ZIP and attach your input Word document, problematic output PDF and expected output PDF here for testing.

We have made this forum thread private. Now only you and Aspose staff members can access it. You can share your documents in this thread.

@RyanWilliamsUSC

Please post your documents via private message. To send a private message with an attachment, please click on my name and find the “Message” button. Please check the attached image: send message.png (20.7 KB)

@tahir.manzoor

We are going to see whether upgrading Aspose works first. I will update this thread after we test this issue again in our production environment with the updated Aspose.

Unless I am able to reproduce this issue outside of our production environments, I will be unable to send you sample files. I am unable to send any files from our clients because the information they contain has legal protections restricting disclosure (similar to HIPAA).

If you have other suggestions on how I can troubleshoot this issue, I would appreciate them.

@RyanWilliamsUSC

Please note that Aspose.Words requires TrueType fonts when rendering a document to fixed-page formats (JPEG, PNG, PDF or XPS). You need to install the fonts that are used in your document on the machine where you are converting documents to PDF. Please refer to the following articles:
Using TrueType Fonts
Manipulating and Substitution TrueType Fonts

Unfortunately, it is difficult to say what the problem is without the documents. We need your documents for investigation. Once we have them, we will start investigating your issue and provide you more information on it.

@tahir.manzoor

Thanks for your suggestions. I do understand it is near impossible to troubleshoot without documents.

Regarding fonts: each server has a fonts directory containing TTFs that is shared by both instances on that server. The fonts available to Aspose are the fonts installed on the host AND the fonts in this directory. Is it possible that one of the Aspose instances holds a lock on a given font? Would Aspose log this event?
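One check that needs no client documents is to confirm that every host really sees the same font files. The following is a minimal sketch using only the Java standard library, not any Aspose API; the class name `FontDirFingerprint` and the directory argument are hypothetical. It prints a SHA-256 digest for each file under a directory so the output from a working host and an affected host can be diffed. It cannot detect file locks, and it only covers the directory it is given, not fonts installed elsewhere on the host.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: fingerprints a font directory so two hosts can be compared.
final class FontDirFingerprint {

    /** Returns a sorted map of relative file path to SHA-256 hex digest for every regular file under dir. */
    static Map<String, String> fingerprint(Path dir) throws IOException, NoSuchAlgorithmException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> result = new TreeMap<>();
        for (Path file : files) {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            result.put(dir.relativize(file).toString(), hex.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Pass the shared fonts directory as the first argument; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        fingerprint(dir).forEach((name, hash) -> System.out.println(hash + "  " + name));
    }
}
```

Running it on each host and diffing the output would show missing, extra, or modified font files. The same method can be pointed at the application binaries to double-check that they are byte-identical across instances.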

Also, is it possible for us to turn on Aspose's internal logging on our end? We might be able to send you logs in the absence of the files.

@RyanWilliamsUSC

Yes, you can achieve it using Aspose.Words. We suggest that you use multi-threading. The only thing you need to make sure of is that you always use a separate Document instance for each thread. One thread should use one Document object.

@tahir.manzoor

At present, each document generation task that uses Aspose runs in its own worker thread within a given Java runtime. On each “physical” server or host we have 2 instances of our application. Both instances on a host share an application server but execute separately. Only one of these instances on each of the affected hosts has this issue.

I do need to validate that each document instance is only accessed by one thread at a time. This should be the case but we do not store it in a thread safe way (As in only one thread can access at once). I am going to look into this.

Could having 2 instances of Aspose running in different JVMs but sharing the same file system cause font access issues?

Also, could it be anything else, given that it only affects some hosts and only one of the 2 instances on a given affected host?
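Since the same binary behaves differently in two JVMs, one thing that can be compared without any documents is the JVM environment itself. The sketch below uses only the Java standard library; the class name `JvmEnvironmentSnapshot` and the choice of properties are assumptions, and nothing in this thread establishes that any of these settings affects Aspose. It simply dumps a few settings (default charset, locale, headless mode, JVM version) that can differ between two instances on one host.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper: captures JVM settings worth diffing between a working and an affected instance.
final class JvmEnvironmentSnapshot {

    /** Returns selected system properties and JVM defaults in a stable order. */
    static Map<String, String> capture() {
        Map<String, String> env = new LinkedHashMap<>();
        String[] keys = {
            "java.version", "java.vendor", "java.home",
            "os.name", "os.version", "os.arch",
            "file.encoding", "java.awt.headless", "java.io.tmpdir", "user.dir"
        };
        for (String key : keys) {
            // String.valueOf turns an unset property into the text "null" instead of failing.
            env.put(key, String.valueOf(System.getProperty(key)));
        }
        env.put("default.charset", Charset.defaultCharset().name());
        env.put("default.locale", Locale.getDefault().toString());
        env.put("default.timezone", TimeZone.getDefault().getID());
        env.put("max.heap.bytes", Long.toString(Runtime.getRuntime().maxMemory()));
        return env;
    }

    public static void main(String[] args) {
        capture().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Calling `capture()` from inside each running instance (for example, logging it once at startup) and diffing the two outputs would rule these settings in or out as a difference between the instances.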

Is there any way we can turn on internal logging for Aspose so we can send you logs?

@tahir.manzoor

Our class encapsulating Document is only ever accessed by a worker thread and is only ever held by that same worker thread. When we create a worker thread, its input is an object containing either all of the data it needs to generate a report plus a file path to save the DOCX to, OR a file path to load a DOCX from plus a file path to save the PDF to.

The document object only exists for a given worker thread.
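The confinement described above can be sketched as follows. This is an illustration only, not our actual code: `ConfinedConversion`, `Job`, and `runAll` are hypothetical names, and the conversion step is passed in as a function so the sketch runs without Aspose on the classpath. The point is that each task receives only immutable paths and would create its own Document inside the task, so no Document is ever visible to a second thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of thread-confined conversion tasks.
final class ConfinedConversion {

    /** Immutable input for one conversion task (hypothetical stand-in for the worker input object). */
    static final class Job {
        final String docxPath;
        final String pdfPath;

        Job(String docxPath, String pdfPath) {
            this.docxPath = docxPath;
            this.pdfPath = pdfPath;
        }
    }

    /** Runs every job on a fixed pool and returns the results in submission order. */
    static <R> List<R> runAll(List<Job> jobs, Function<Job, R> convert, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Job job : jobs) {
                // Anything created inside convert.apply lives and dies on one worker thread.
                Callable<R> task = () -> convert.apply(job);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Job> jobs = Arrays.asList(new Job("a.docx", "a.pdf"), new Job("b.docx", "b.pdf"));
        List<String> log = runAll(jobs, job -> {
            // In real code this is where the Document would be created and saved, e.g.
            // Document doc = new Document(job.docxPath); doc.save(job.pdfPath, SaveFormat.PDF);
            return Thread.currentThread().getName() + " handled " + job.docxPath;
        }, 2);
        log.forEach(System.out::println);
    }
}
```

Because the task input holds only strings, there is nothing mutable for two workers to share by accident, which makes the one-Document-per-thread rule easy to audit.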

We do use a static initializer to initialise our Aspose license. I am not sure whether this, combined with the license file being loaded twice on a given node (each instance loads it once), could cause this issue. I am also not sure whether we need one license instance per thread or whether one for the whole application is fine.

At the class level:

private final static License WORD_LICENSE = new License();

In a static initializer

WORD_LICENSE.setLicense(AsposeWordHelper.class.getResourceAsStream("/aspose/Aspose.Total.Java.lic"));

This is the only time the license is accessed. Could we be missing a step with loading our license triggering any DRM you might have?

@RyanWilliamsUSC

You do not need to set the license multiple times. The license only needs to be set once per application domain.

In your case, we suggest that you get the font sources and set them separately for each instance. Hope this helps you.

The following code example shows how to get the font sources and set them.

Document doc = new Document(getMyDir() + "Rendering.docx");

// Retrieve the array of environment-dependent font sources that are searched by default.
// For example, this will contain a "Windows\Fonts\" source on a Windows machine.
// We copy this array into a new ArrayList to make adding or removing font entries much easier.
ArrayList<FontSourceBase> fontSources = new ArrayList<>(Arrays.asList(FontSettings.getDefaultInstance().getFontsSources()));

// Add a new folder source which will instruct Aspose.Words to search the following folder for fonts.
FolderFontSource folderFontSource = new FolderFontSource("C:\\MyFonts\\", true);

// Add the custom folder which contains our fonts to the list of existing font sources.
fontSources.add(folderFontSource);

// Convert the ArrayList of sources back into an array of FontSourceBase objects.
FontSourceBase[] updatedFontSources = fontSources.toArray(new FontSourceBase[fontSources.size()]);

// Apply the new set of font sources to use
FontSettings.getDefaultInstance().setFontsSources(updatedFontSources);

doc.save(getArtifactsDir() + "Rendering.SetFontsFoldersSystemAndCustomFolder.pdf");

If you still face the problem, please create a simplified Java application and share it with us along with the document for testing. We will investigate the issue and provide you more information on it.