How can I save a PDF to HTML without generating an additional folder?


#1
I use the C# code below to convert a PDF stream to an HTML file, but it generates an additional folder alongside the HTML file, as shown here:
aspose issue.jpg (25.8 KB)

How can I make it generate only the HTML file? Converting PPTX or DOCX to HTML produces just a single HTML file. Thanks.
var data = FileProvider.DownloadFile(FileDownMess);
MemoryStream fs = new MemoryStream();
// Copy the downloaded stream into fs, starting from its beginning
data[0].MemoryStream.Position = 0;
data[0].MemoryStream.CopyTo(fs);
fs.Position = 0;
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(fs);
Aspose.Pdf.HtmlSaveOptions htmlOptions = new Aspose.Pdf.HtmlSaveOptions();
htmlOptions.SplitIntoPages = false;
// Save the document
string pdfPath = @"C:\AsposeTest\testPdf.html";
pdfDocument.Save(pdfPath, htmlOptions);

#2

@Owen_Sun

Thanks for contacting support.

Please use the code snippet given in the following documentation article to achieve your requirements. If you face any issue, please feel free to let us know.
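
The linked article is not reproduced in this thread. Judging from the options quoted later in post #11, the relevant setting is most likely PartsEmbeddingMode. A minimal sketch, assuming Aspose.PDF for .NET is referenced; the input and output paths are hypothetical:

```csharp
using Aspose.Pdf;

class PdfToSingleHtml
{
    static void Main()
    {
        // Hypothetical paths, for illustration only
        string inputPath = @"C:\AsposeTest\input.pdf";
        string outputPath = @"C:\AsposeTest\testPdf.html";

        Document pdfDocument = new Document(inputPath);
        HtmlSaveOptions options = new HtmlSaveOptions();

        // Embed all resources inside the HTML instead of writing them to a separate folder
        options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
        // Save raster images as embedded parts of the PNG page background
        options.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
        // Keep all pages in one HTML file
        options.SplitIntoPages = false;

        pdfDocument.Save(outputPath, options);
    }
}
```

With these options the save call should produce a single HTML file; whether every resource type is embedded may depend on the Aspose.PDF version in use.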


#3

@asad.ali

Thanks for your answer, it works now. By the way, I also want to convert Excel to HTML without generating an additional folder. I use the code below, but it doesn't work and still generates an additional folder. Can this be done the same way as PDF to HTML? If so, could you give me the link? Thanks.

Workbook excel = new Workbook(fs);
Aspose.Cells.HtmlSaveOptions opts = new Aspose.Cells.HtmlSaveOptions();
opts.ExportImagesAsBase64 = true;
string excelPath = @"C:\AsposeTest\testExcelnew.html";
excel.Save(excelPath, opts);


#4

@Owen_Sun,

Regarding Aspose.Cells, you may add two lines (the ExportActiveWorksheetOnly and ExportSingleTab settings) to accomplish the task, e.g.
Sample code:

Workbook excel = new Workbook(fs);
Aspose.Cells.HtmlSaveOptions opts = new Aspose.Cells.HtmlSaveOptions();
opts.ExportImagesAsBase64 = true;
opts.ExportActiveWorksheetOnly = true;
opts.ExportSingleTab = false;
string excelPath = @"C:\AsposeTest\testExcelnew.html";
excel.Save(excelPath, opts);

Hope this helps a bit.


#5

@Amjad_Sahi
It works, but your code can only export one sheet. Thanks.


#6

@Owen_Sun,

Yes, this is what MS Excel also does. If a spreadsheet has multiple sheets and you need to render it as a single HTML file with all resources embedded, that is not possible even in MS Excel. Aspose.Cells follows MS Excel standards and specifications when rendering Excel to the HTML file format, so by default it creates a folder containing the resource files for the worksheets in the workbook. You may still use the following approach to accomplish the task:

Export every worksheet in the workbook to its own HTML file, then combine these individual HTML files into one final HTML file yourself, e.g. via some tag control or your own code. In a loop, set each sheet as the active sheet and render a separate HTML file for it via the Aspose.Cells APIs. Please note that when exporting each worksheet to a separate HTML file, you need to export images in Base64 format (using the HtmlSaveOptions class); otherwise folders will be created.
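
The per-sheet loop described above could look roughly like the following. This is a sketch only, assuming Aspose.Cells for .NET is referenced; the paths and the output file naming are hypothetical, and combining the resulting files into one page is left to your own code:

```csharp
using System.IO;
using Aspose.Cells;

class SheetsToHtml
{
    static void Main()
    {
        // Hypothetical paths, for illustration only
        string inputPath = @"C:\AsposeTest\input.xlsx";
        string outputDir = @"C:\AsposeTest";

        Workbook workbook = new Workbook(inputPath);

        HtmlSaveOptions opts = new HtmlSaveOptions();
        opts.ExportImagesAsBase64 = true;       // inline images so no resource folder is needed
        opts.ExportActiveWorksheetOnly = true;  // render only the currently active sheet

        for (int i = 0; i < workbook.Worksheets.Count; i++)
        {
            // Make sheet i the active sheet, then save it as its own HTML file
            workbook.Worksheets.ActiveSheetIndex = i;
            string htmlPath = Path.Combine(outputDir, "sheet" + i + ".html");
            workbook.Save(htmlPath, opts);
        }
    }
}
```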

Hope this helps a bit.


#7
@Amjad_Sahi
I solved it another way, using the code below:
Workbook excel = new Workbook(fs);
string excelPath = @"C:\AsposeTest\testExcelnew321.mht";
excel.Save(excelPath, Aspose.Cells.SaveFormat.MHtml);

But I have another question: when I convert DOCX to MHTML using the code below, the header and footer of every page are missing. Is there any way to keep the page headers and footers along with the content? Thanks.
Aspose.Words.Document docx = new Aspose.Words.Document(fs);
string outFn = @"C:\AsposeTest\test123.mht";
docx.Save(outFn, Aspose.Words.SaveFormat.Mhtml);

If I convert DOCX to HTML using the code below, some pictures in the content lose their styling and do not display correctly. Is there any way to fix this? Thanks.
Aspose.Words.Document docx = new Aspose.Words.Document(fs);
HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();

options.PageIndex = 0;
options.PageCount = docx.PageCount;
options.ExportEmbeddedImages = true;
options.ExportEmbeddedCss = true;
options.ExportEmbeddedSvg = true;
options.ExportEmbeddedFonts = true;
options.NumeralFormat = NumeralFormat.System;
options.UseHighQualityRendering = true;
options.SaveFormat = Aspose.Words.SaveFormat.HtmlFixed;
options.PageHorizontalAlignment = HtmlFixedPageHorizontalAlignment.Center;

string outFn = @"C:\AsposeTest\Test1122.html";
docx.Save(outFn, options);

#8

@Owen_Sun,

Good to know that you have sorted it out now. And yes, MHTML is another option for you; it is a single-file format with its resources embedded in it.

Regarding your other issue with the Aspose.Words API, kindly provide your sample document and output file(s) showing the issue, and we will check it soon.


#9
@Amjad_Sahi

Below is my sample document and the HTML generated from it:

testdocandhtml.zip (628.7 KB)

In the HTML, the word "Prerequisites" does not match its original location in the document; it is offset to the right. If there are multiple pages with pictures, the pictures hide part of the content because of this offset. How can I deal with it in code, based on the code in my last post? Thanks.

#10

@Owen_Sun

Please note that Aspose.Words mimics the behavior of MS Word. If you convert your DOCX to HTML using MS Word, you will get the same output.


#11
@tahir.manzoor
I have another problem: is it possible to add a watermark directly to the HTML converted from a PDF file? I tried reusing my earlier code for adding a watermark to a converted PDF, feeding it a memory stream built from the HTML file instead, but it does not work. Below is my code:
byte[] byteArrDOC = null;
Aspose.Pdf.Document pdfDocumentbyteArrDOC = null;
Aspose.Pdf.TextStamp textStampbyteArrDOC = null;
MemoryStream memStream = null;
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(fs);
Aspose.Pdf.HtmlSaveOptions newOptions = new Aspose.Pdf.HtmlSaveOptions();

// Enable option to embed all resources inside the HTML
newOptions.PartsEmbeddingMode = Aspose.Pdf.HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

// This is just optimization for IE and can be omitted
newOptions.LettersPositioningMethod = Aspose.Pdf.HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.RasterImagesSavingMode = Aspose.Pdf.HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = Aspose.Pdf.HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
string pdfPath = @"C:\AsposeTest\testPdf.html";

using (FileStream fileStream = System.IO.File.OpenRead(pdfPath))
{
    memStream = new MemoryStream();
    memStream.SetLength(fileStream.Length);
    fileStream.Read(memStream.GetBuffer(), 0, (int)fileStream.Length);
}

byteArrDOC = ObjectToByteArray(memStream);
pdfDocumentbyteArrDOC = new Aspose.Pdf.Document(new MemoryStream(byteArrDOC)); // this line throws an exception ("Startxref not found")

#12

@Owen_Sun

You would need to add a watermark to the source PDF document, save it to a memory stream, and then convert it to an HTML document with the Aspose.PDF for .NET API. Please see the documentation article below for your reference, and feel free to contact us if you need any further assistance.

Working with Stamps and Watermarks
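
The order of operations described above (stamp the PDF first, then convert the stamped PDF to HTML) might be sketched as follows. This assumes Aspose.PDF for .NET is referenced; the paths and the watermark text are hypothetical, and the stamp and save options are taken from the snippets elsewhere in this thread:

```csharp
using System.IO;
using Aspose.Pdf;

class WatermarkThenHtml
{
    static void Main()
    {
        // Hypothetical paths and watermark text, for illustration only
        string inputPath = @"C:\AsposeTest\input.pdf";
        string outputPath = @"C:\AsposeTest\watermarked.html";

        Document pdf = new Document(inputPath);

        TextStamp stamp = new TextStamp("Sample watermark");
        stamp.Background = false;
        stamp.HorizontalAlignment = HorizontalAlignment.Center;
        stamp.VerticalAlignment = VerticalAlignment.Center;
        stamp.RotateAngle = 45;

        // Add the stamp to every page (the thread's code indexes pages from 1)
        for (int i = 1; i <= pdf.Pages.Count; i++)
        {
            pdf.Pages[i].AddStamp(stamp);
        }

        using (MemoryStream stamped = new MemoryStream())
        {
            // Save the stamped PDF to memory and rewind before reloading it
            pdf.Save(stamped);
            stamped.Position = 0;

            // Convert the stamped PDF (not an HTML file) to a single HTML file
            Document stampedPdf = new Document(stamped);
            HtmlSaveOptions options = new HtmlSaveOptions();
            options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
            stampedPdf.Save(outputPath, options);
        }
    }
}
```

The "Startxref not found" exception in post #11 is consistent with loading HTML bytes into Aspose.Pdf.Document, which expects PDF data; stamping before conversion avoids that.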


#13
@Farhan.Raza
I can use your solution to add a watermark to a PDF document. But to add a watermark to an Excel file, I convert the Excel file to PDF, add the watermark, and then save to HTML. The result loses all sheet formatting and only shows every sheet's content one after another on one page. If I save to an MHT file instead, it shows garbled code. Could you advise how to solve this?
The code is as below:
MemoryStream outputforpdf = new MemoryStream();
Workbook excel = new Workbook(fs);
excel.Save(outputforpdf, Aspose.Cells.SaveFormat.Pdf);
byteArrDOC = ObjectToByteArray(outputforpdf);
pdfDocumentbyteArrDOC = new Aspose.Pdf.Document(new MemoryStream(byteArrDOC));

string textStampContent = string.Format("{0}-{1}", "Aspose.Words", DateTime.Now.ToLongTimeString());

textStampbyteArrDOC = new Aspose.Pdf.TextStamp(textStampContent);
//set whether stamp is background
textStampbyteArrDOC.Background = false;

//set origin
textStampbyteArrDOC.Height = 100;
textStampbyteArrDOC.Width = 500;
textStampbyteArrDOC.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Center;
textStampbyteArrDOC.VerticalAlignment = Aspose.Pdf.VerticalAlignment.Center;
//rotate stamp
textStampbyteArrDOC.RotateAngle = 45;

//set text properties
textStampbyteArrDOC.TextState.Font = FontRepository.FindFont("Arial");
textStampbyteArrDOC.TextState.FontSize = 14.0F;
textStampbyteArrDOC.TextState.ForegroundColor = Aspose.Pdf.Color.Gray;
textStampbyteArrDOC.TextState.StrokingColor = Aspose.Pdf.Color.Gray;

//add stamp to particular page
for (var i = 1; i <= pdfDocumentbyteArrDOC.Pages.Count; i++)
{
    pdfDocumentbyteArrDOC.Pages[i].AddStamp(textStampbyteArrDOC);
}
MemoryStream temOutputForPdf = new MemoryStream();
pdfDocumentbyteArrDOC.Save(temOutputForPdf);

//convert to mht
excel = new Workbook(temOutputForPdf);
string excelPath = @"C:\AsposeTest\xlswithwatermark.mht";
excel.Save(excelPath, Aspose.Cells.SaveFormat.MHtml);

#14

@Owen_Sun

Please elaborate on the problem and share the respective files as a ZIP archive, along with some screenshots, so that we can investigate further.


#15

@Farhan.Raza

Please see the attached Excel file and the converted HTML file below. Thanks.
AsposeTest.zip (253.0 KB)


#16

@Owen_Sun

Thank you for sharing the data.

We have modified the code snippet to narrow down the problem and figure out which API is causing it. Kindly try the code snippet below, then describe the issue along with screenshots and the expected output so that we can proceed further.

FileStream fs = new FileStream(dataDir + "testES_5.4.3_to_6.7.1_upgrade_plan.xlsx", FileMode.Open, FileAccess.Read);
MemoryStream outputforpdf = new MemoryStream();
Aspose.Cells.Workbook excel = new Aspose.Cells.Workbook(fs);
excel.Save(outputforpdf, Aspose.Cells.SaveFormat.Pdf);
var byteArrDOC = ObjectToByteArray(outputforpdf);
var pdfDocumentbyteArrDOC = new Aspose.Pdf.Document(new MemoryStream(byteArrDOC));

string textStampContent = string.Format("{0}-{1}", "Aspose.Words", DateTime.Now.ToLongTimeString());

var textStampbyteArrDOC = new Aspose.Pdf.TextStamp(textStampContent);
//set whether stamp is background
textStampbyteArrDOC.Background = false;

//set origin

textStampbyteArrDOC.Height = 100;
textStampbyteArrDOC.Width = 500;
textStampbyteArrDOC.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Center;
textStampbyteArrDOC.VerticalAlignment = Aspose.Pdf.VerticalAlignment.Center;
//rotate stamp
textStampbyteArrDOC.RotateAngle = 45;

//set text properties
textStampbyteArrDOC.TextState.Font = FontRepository.FindFont("Arial");
textStampbyteArrDOC.TextState.FontSize = 14.0F;
textStampbyteArrDOC.TextState.ForegroundColor = Aspose.Pdf.Color.Gray;
textStampbyteArrDOC.TextState.StrokingColor = Aspose.Pdf.Color.Gray;

//add stamp to particular page
for (var i = 1; i <= pdfDocumentbyteArrDOC.Pages.Count; i++)
{
    pdfDocumentbyteArrDOC.Pages[i].AddStamp(textStampbyteArrDOC);
}
MemoryStream temOutputForPdf = new MemoryStream();
pdfDocumentbyteArrDOC.Save(temOutputForPdf);

Aspose.Pdf.Document document = new Document(temOutputForPdf);
HtmlSaveOptions options = new HtmlSaveOptions();
options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
options.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
document.Save(dataDir + "PDF.html" , options);