Converting an MHTML file to PDF generates an empty PDF file

VasilisGiannoutsos · October 17, 2019, 5:47pm

Aspose.Words generates an empty PDF from the attached MHTML file (I use an Enterprise license).
Can you please check whether my code is doing something wrong or whether there is an issue in the product?

Here is the code:

private static string _inputPath = @"…\TestPath";
private static string _pattern = "*.mhtml";

public static void ReadAndSaveWordDoc()
{
    foreach (FileInfo wordFile in new DirectoryInfo(_inputPath).GetFiles(_pattern, SearchOption.AllDirectories))
    {
        try
        {
            Console.WriteLine(string.Format("Processing file '{0}'.", wordFile.FullName));
            string testWordDocPath = wordFile.FullName;

            SetLicense(Properties.Resources.Aspose_Total);
            WordDocs.Saving.PdfSaveOptions options = new WordDocs.Saving.PdfSaveOptions();
            WordDocs.LoadOptions loadOptions = new WordDocs.LoadOptions();
            loadOptions.LoadFormat = WordDocs.LoadFormat.Mhtml;
            WordDocs.Document wordDoc = new WordDocs.Document(testWordDocPath, loadOptions);

            string outputPDF = Path.GetFileNameWithoutExtension(testWordDocPath) + ".pdf";
            string savePath = Path.GetDirectoryName(testWordDocPath);

            options.Compliance = WordDocs.Saving.PdfCompliance.Pdf15;
            options.JpegQuality = 90;
            options.PageIndex = 0;
            options.PageCount = wordDoc.PageCount;
            options.ZoomBehavior = WordDocs.Saving.PdfZoomBehavior.FitWidth;
            options.CustomPropertiesExport = WordDocs.Saving.PdfCustomPropertiesExport.Standard;
            wordDoc.CustomDocumentProperties.Add("CV_TextLayerGood ", true);

            string savePdfPath = Path.Combine(savePath, outputPDF);
            wordDoc.Save(savePdfPath, options);
        }
        catch (Aspose.Words.FileCorruptedException)
        {
            Console.WriteLine(string.Format("File '{0}' is corrupted.", Path.GetFileNameWithoutExtension(wordFile.FullName)));
            continue;
        }
        catch (Aspose.Words.IncorrectPasswordException)
        {
            Console.WriteLine(string.Format("File '{0}' is password protected.", Path.GetFileNameWithoutExtension(wordFile.FullName)));
            continue;
        }
        catch (Exception)
        {
            throw;
        }
    }
}

The MHTML file is attached in the zip archive.

MHTMLToPDF.zip (3.7 KB)

tahir.manzoor · October 17, 2019, 6:22pm

@VasilisGiannoutsos

We have tested the scenario and reproduced the issue on our side. We have logged it in our issue tracking system as WORDSNET-19394. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

VasilisGiannoutsos · November 1, 2019, 1:20pm

Can you estimate the delivery date?

tahir.manzoor · November 1, 2019, 1:44pm

@VasilisGiannoutsos

We try our best to deal with every customer request in a timely fashion, but unfortunately we cannot guarantee a delivery date for every customer issue. We work on issues on a first come, first served basis. We feel this is the fairest and most appropriate way to satisfy the needs of the majority of our customers.

Currently, your issue is pending analysis and is in the queue. Once we complete the analysis, we will be able to provide you with an estimate.

VasilisGiannoutsos · November 7, 2019, 2:17pm

Thank you for the feedback. Could you explain what causes the empty PDF, especially when converting from the HTML?
Is there a workaround we can use until we get the fix?

tahir.manzoor · November 7, 2019, 4:00pm

@VasilisGiannoutsos

Please use the following code example to get the desired output. Hope this helps you.

// Load the .mhtml file with the HTML load format instead of MHTML.
LoadOptions loadOptions = new LoadOptions();
loadOptions.LoadFormat = LoadFormat.Html;
Aspose.Words.Document doc = new Aspose.Words.Document(MyDir + "Vs._Nr.  VN.mhtml", loadOptions);

// Remove the first paragraph if its text starts with "From".
if (doc.FirstSection.Body.FirstParagraph.ToString(SaveFormat.Text).Trim().StartsWith("From"))
    doc.FirstSection.Body.FirstParagraph.Remove();

PdfSaveOptions options = new PdfSaveOptions();
options.Compliance = PdfCompliance.Pdf15;
options.JpegQuality = 90;
options.PageIndex = 0;
options.PageCount = doc.PageCount;
options.ZoomBehavior = PdfZoomBehavior.FitWidth;
options.CustomPropertiesExport = PdfCustomPropertiesExport.Standard;
doc.CustomDocumentProperties.Add("CV_TextLayerGood ", true);

doc.Save(MyDir + "19.10.pdf", options);
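
For readers who want to apply this workaround to a whole folder, as in the original post, here is a minimal sketch. It is not part of the reply above. The class name `MhtmlBatchConverter`, the methods `ConvertMhtmlToPdf` and `ConvertAll`, and the `inputDir` parameter are hypothetical names chosen for illustration. The sketch assumes the `Aspose.Words` namespaces are imported directly rather than through the `WordDocs` alias, and that a license has already been applied elsewhere. The guess that the leading "From" paragraph is MHTML header text is an assumption; the thread does not explain it.

```csharp
using System;
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

public static class MhtmlBatchConverter
{
    // Hypothetical helper: applies the workaround from this thread to a single file.
    public static void ConvertMhtmlToPdf(string mhtmlPath)
    {
        // Workaround: load with the HTML load format instead of MHTML.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.LoadFormat = LoadFormat.Html;
        Document doc = new Document(mhtmlPath, loadOptions);

        // Remove the first paragraph if its text starts with "From"
        // (assumed to be MHTML header text; not confirmed in the thread).
        Paragraph first = doc.FirstSection.Body.FirstParagraph;
        if (first != null &&
            first.ToString(SaveFormat.Text).Trim().StartsWith("From", StringComparison.Ordinal))
        {
            first.Remove();
        }

        PdfSaveOptions options = new PdfSaveOptions();
        options.Compliance = PdfCompliance.Pdf15;
        options.JpegQuality = 90;

        // Write the PDF next to the source file, with the same base name.
        doc.Save(Path.ChangeExtension(mhtmlPath, ".pdf"), options);
    }

    // Hypothetical batch driver mirroring the loop in the original post.
    public static void ConvertAll(string inputDir)
    {
        foreach (FileInfo file in new DirectoryInfo(inputDir).GetFiles("*.mhtml", SearchOption.AllDirectories))
        {
            try
            {
                Console.WriteLine("Processing file '{0}'.", file.FullName);
                ConvertMhtmlToPdf(file.FullName);
            }
            catch (FileCorruptedException)
            {
                Console.WriteLine("File '{0}' is corrupted.", file.Name);
            }
        }
    }
}
```

Calling `MhtmlBatchConverter.ConvertAll(folder)` would then process every `.mhtml` file under `folder`. The "From" check is specific to the attached file, so it may need adjusting for other inputs.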

aspose.notifier · August 13, 2020, 8:35am

The issue you found earlier (filed as WORDSNET-19394) has been fixed in the Aspose.Words for .NET 20.8 update and the Aspose.Words for Java 20.8 update.