Document conversion inserting null characters

MDnetwork · January 14, 2014, 10:35am

Hello,

I'm having a problem converting Word documents: when a Word document is converted to .doc or .docx, Aspose inserts null characters. The attached “word.png” shows the text before conversion and “aspose.png” shows it after. Does anyone know of a setting or anything else that can be done to prevent a null (hex 00) from being inserted between each character?

Thanks guys

awais.hafeez · January 15, 2014, 3:29am

Hi Warren,

Thanks for your inquiry. To ensure a timely and accurate response, please attach the following resources here for testing:

Your input Word document that shows this problem.
The Aspose.Words-generated output document (.docx) that shows the undesired behavior.
The piece of code you used for the conversion.
The version of Aspose.Words for .NET you are currently using.

As soon as you have these ready, I’ll start investigating your issue and provide you with more information.

Best regards,

MDnetwork · January 15, 2014, 9:20am

Thanks for your reply, Awais Hafeez.

I tested Aspose.Words 11.6.1.0 and the latest, 13.12.0.0, with the same results. To reproduce the problem, simply load the .docx and save it as .doc. If I open “HeaderSample.docx” in Microsoft Word and save it as .doc, I get the bytes shown in the image “WordConversion.png”. If I convert the document with Aspose instead, I get the bytes shown in “AsposeConversion.png”. The .doc created by Aspose has a hex 00 between each character.

Code:

Aspose.Words.Document doc = new Aspose.Words.Document(@"C:\Test Docs\HeaderSample.docx");
doc.Save(@"C:\Test Docs\HeaderSample.doc", Aspose.Words.SaveFormat.Doc);

Thanks

awais.hafeez · January 16, 2014, 3:02am

Hi Warren,

Thanks for the additional information. I tested the scenario and managed to reproduce the same problem on my side. I have logged it in our issue tracking system as WORDSNET-9544 so that it can be corrected. Our development team will look into the details of this problem, and we will keep you updated on the status of the fix. We apologize for the inconvenience.

Best regards,

MDnetwork · January 16, 2014, 9:38am

Thanks Awais, look forward to hearing back with a resolution soon.

awais.hafeez · January 16, 2014, 10:49am

Hi Warren,

The null characters you’re referring to are actually part of a Unicode string: Aspose.Words always writes document text in Unicode (UTF-16), so each plain ASCII character is followed by a 00 byte. Could you please explain why you need the binary of the exported document, and why you want to access the byte codes in the DOC file directly?
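To illustrate the point above (an illustration added here, not code from the thread): as we understand the [MS-DOC] specification, a run of text in a binary .doc file can be stored either as 8-bit characters or as 16-bit UTF-16 little-endian. In UTF-16LE every plain ASCII character takes two bytes, the second of which is 00, which is exactly the "hex 00 between each character" pattern described earlier in this thread. The sketch below uses only the standard System.Text.Encoding class; the class name EncodingDemo and the sample word "Header" are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper that shows how the same text looks as bytes
// in UTF-16LE versus a single-byte encoding.
public static class EncodingDemo
{
    // UTF-16 little-endian bytes (Encoding.Unicode; GetBytes emits no byte-order mark).
    public static byte[] Utf16Le(string text) => Encoding.Unicode.GetBytes(text);

    // 8-bit bytes. For plain ASCII text this is identical to the cp1252 (ANSI) form;
    // non-ASCII characters would be replaced by '?', so use ASCII input only.
    public static byte[] SingleByte(string text) => Encoding.ASCII.GetBytes(text);

    // Formats bytes as space-separated hex pairs, e.g. "48 00 65 00".
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");
}
```

For example, Hex(Utf16Le("Header")) returns "48 00 65 00 61 00 64 00 65 00 72 00", while Hex(SingleByte("Header")) returns "48 65 61 64 65 72". The 00 bytes are the high halves of the 16-bit characters, not inserted garbage.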

Best regards,

MDnetwork · January 16, 2014, 11:02am

Awais,

We export Word documents to various systems, and the Unicode text is causing errors on import. Is there a reason Aspose writes the document text in Unicode while a document saved by Microsoft Word does not? And yes, would it be possible to access the bytes?

Thanks

awais.hafeez · January 20, 2014, 9:24am

Hi Warren,

Thanks for your inquiry. I am in communication with our development team and will get back to you as soon as I have any further information about this.

Best regards,

MDnetwork · January 23, 2014, 1:10pm

Hi Awais, just checking in to see if you have any update. Thanks

awais.hafeez · January 24, 2014, 2:58am

Hi Warren,

Thanks for your inquiry. Unfortunately, this issue is not resolved yet. Our development team has completed its analysis and identified the root cause, but we have not yet heard back from them about a fix. We will inform you as soon as we have any updates. We apologize for the delay.

Best regards,

MDnetwork · February 5, 2014, 9:32am

Hi Awais. Anything new to report about the resolution to this problem we are having?

Thanks.

awais.hafeez · February 6, 2014, 4:43am

Hi Warren,

Thanks for your inquiry. We discussed the problem, and a fix may well come onto the product roadmap in the future, but unfortunately we cannot provide a reliable estimate at the moment. We would also ask you to describe your requirements in detail so that we may be able to suggest a workaround or solution. In particular, it would help if you could answer the following questions:

Could you please give us the exact details of the problems you’re encountering when importing Aspose.Words-generated documents into your system?
Why do you need direct byte access to the text inside a binary Word document?
Is it possible for you to use more “user-friendly” formats such as WordML or Rtf instead?

Best regards,

MDnetwork · February 6, 2014, 6:07am

We have several clients that import our MS Word “DOC” files into their Medical EMR system. Their import process extracts the text from the MS Word file using a legacy proprietary program. This program is extracting the text incorrectly due to the NULL characters. This is something we have no control over.

Our clients’ import process works correctly when we use MS Word to convert the DOCX to DOC. This is because the file saved by MS Word doesn’t contain the NULL characters that come from the Unicode text.

We cannot use any format other than MS Word DOC because of the legacy proprietary program our clients use. I wish there were an alternative, but there isn’t.

Is there a way that we could change that unicode text somehow to remove the NULL characters? Perhaps a method of directly modifying the bytes of the file?
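A caveat and a safer alternative (added here as an illustration, not from the thread): deleting bytes from a .doc file in place would very likely corrupt it, because the format records the positions of text and other structures as byte offsets. What can be done safely is to check which form a given file uses. The sketch below counts how often a known ASCII phrase appears as 8-bit bytes and as UTF-16LE bytes. The names DocTextProbe, CountOccurrences, Probe and ProbeFile are hypothetical. It is only a rough heuristic: a .doc file is an OLE compound file whose streams are stored in sectors, so a phrase can straddle a sector boundary, and the same phrase may also appear in metadata.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostic: does a file store a known phrase as 8-bit or UTF-16LE text?
public static class DocTextProbe
{
    // Counts non-overlapping occurrences of `pattern` in `data` (naive byte search).
    public static int CountOccurrences(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;
        int count = 0;
        for (int i = 0; i <= data.Length - pattern.Length; )
        {
            bool match = true;
            for (int j = 0; j < pattern.Length; j++)
            {
                if (data[i + j] != pattern[j]) { match = false; break; }
            }
            if (match) { count++; i += pattern.Length; } // skip past this match
            else { i++; }
        }
        return count;
    }

    // Assumes `phrase` is plain ASCII, so its 8-bit form is the same in ASCII and cp1252.
    public static (int singleByte, int utf16Le) Probe(byte[] fileBytes, string phrase)
    {
        int single = CountOccurrences(fileBytes, Encoding.ASCII.GetBytes(phrase));
        int wide = CountOccurrences(fileBytes, Encoding.Unicode.GetBytes(phrase));
        return (single, wide);
    }

    // Reads the whole file read-only; the file itself is never modified.
    public static (int singleByte, int utf16Le) ProbeFile(string path, string phrase) =>
        Probe(File.ReadAllBytes(path), phrase);
}
```

For instance, ProbeFile could be run with the same phrase from the header text against both the Word-saved and the Aspose-saved copies of HeaderSample.doc. Based on the screenshots described in this thread, one would expect the Word-saved copy to show hits in the single-byte count and the Aspose-saved copy to show hits in the UTF-16LE count.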

Any workaround would be greatly appreciated until an option is added to the Aspose.Words API.

Thanks

awais.hafeez · February 7, 2014, 4:05am

Hi Warren,

Thanks for the additional information. I have passed this information to our development team. We will keep you informed of any developments and let you know once it is fixed.

Best regards,

MDnetwork · March 7, 2014, 9:00am

Hello, just checking in to see whether there is an estimated date for this fix, and which release it might be included in.
Thanks,
Warren

awais.hafeez · March 9, 2014, 1:05pm

Hi Warren,

Thanks for your inquiry. Unfortunately, your issue is not resolved yet and no ETA is available; the implementation of the fix has been postponed. We will inform you via this thread as soon as it is resolved. We apologize for the inconvenience.

Best regards,

MDnetwork · July 14, 2014, 4:13pm

Just checking to see if there is an ETA to getting this implemented. Thanks, Warren

awais.hafeez · July 15, 2014, 10:36pm

Hi Warren,

Thanks for your inquiry. Unfortunately, we cannot provide a reliable estimate at the moment, and this is not likely to be fixed in the near future. The fix has been postponed to a later date, and we cannot push it into production right now because there are many other important issues we have to work on. We will inform you via this thread as soon as this issue is resolved. We apologize for the inconvenience.

PS: If this issue is important to you and you need a fast resolution, please have a look at our enhanced support options. For example, purchasing Priority Support allows you to post issues in our Priority Support forum and raise their priority directly with our development teams; where possible, we then aim to resolve them as soon as we can. Many Priority Support customers find that this leads to their issue being fixed in the next release of the software.

If you would like to take advantage of Enhanced Support, please request a quote in our purchase forum: https://forum.aspose.com/c/purchase/6

Best regards,