Error saving to HTML: "hexadecimal value 0x01, is an invalid character"

Hi,

I'm getting an invalid character error when I try to save some PDF documents to HTML using Aspose.Pdf.

'', hexadecimal value 0x01, is an invalid character.; at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)

Is there a way of filtering out invalid characters, or validating the original PDF? Code snippet below:

Aspose.Pdf.Document pdfDoc = new Aspose.Pdf.Document(inFilepath);
outFilepath = Path.Combine(workingDir, string.Format("{0}.html", Path.GetFileNameWithoutExtension(inFilename)));
pdfDoc.Save(outFilepath, Aspose.Pdf.SaveFormat.Html);


Many thanks!
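
On the question of filtering out invalid characters: 0x01 is a control character that XML 1.0 does not permit, which is why the XML writer rejects it. Below is a minimal sketch of how such characters could be stripped from a string in .NET using XmlConvert.IsXmlChar. XmlTextSanitizer and RemoveInvalidXmlChars are hypothetical names made up for this illustration and are not part of Aspose.Pdf. The sketch only helps where you can get at the text before it is written as XML; it does not by itself change what Document.Save does internally.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration only; not part of Aspose.Pdf.
public static class XmlTextSanitizer
{
    // Returns a copy of 'text' without characters that XML 1.0 forbids
    // (for example the 0x01 control character from the error message).
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            if (char.IsSurrogatePair(text, i))
            {
                // A well-formed surrogate pair encodes a code point above
                // U+FFFF, which XML allows; keep both halves together.
                result.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                // Ordinary character accepted by the XML 1.0 Char production.
                result.Append(c);
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }

        return result.ToString();
    }
}
```

For example, XmlTextSanitizer.RemoveInvalidXmlChars("a\u0001b") returns "ab", while tabs and line breaks are left in place because XML allows them.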

Hi Aisha,


Thanks for using our products.

Could you please share the sample PDF files that cause this exception, so that we can test the scenario on our end? The problem seems to be related to the specific files that you are using.

We are sorry for this inconvenience.

Thank you Nayyer. Here are two examples of PDF docs that gave this error.

Hi Aisha,


Thanks for sharing the resource files.

I have tested the scenario with Aspose.Pdf for .NET 7.7.0, using the following code snippet to convert the PDF files to HTML format, and I am unable to reproduce the exception. However, I have observed some formatting issues in the resultant files.

TestDoc1_LP.pdf
The text on the image on the second page is missing, and some paragraphs are also missing at the bottom of the page. This problem has been logged as PDFNEWNET-35015 in our issue tracking system.

Houses+of+Parliament.pdf
There are many formatting issues in the resultant HTML. I have logged this issue separately as PDFNEWNET-35016.

We will look further into the details of these problems and will keep you updated on the status of the corrections. Please be patient and spare us a little time. We are sorry for the inconvenience.


PS: For your reference, I have also attached the resultant HTML files generated on my end.

The issues you have found earlier (filed as PDFNEWNET-35016) have been fixed in Aspose.Pdf for .NET 9.5.0.


This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

The issues you have found earlier (filed as PDFNEWNET-35015) have been fixed in Aspose.Pdf for .NET 16.10.0.

