Loading document throws Startxref not found exception

We have the exact same issue. We are evaluating Aspose.PDF as a possible replacement for ActivePDF DocConverter but are struggling. We can load PDFs and export PDFs to images, but we cannot convert any other type of document to a PDF.

We have tried .doc, .docx, .xlsx and .png files, all with the same result.

var pdfDoc = new Aspose.Pdf.Document(tempStoragePath);

We don’t get any further than that. The path is valid and the file exists, and it fails both on the server and locally. We’ve tried loading via a path and via a stream, and both produce the Startxref error.

Hi Dustin,


Thanks for using our APIs.

Can you please share some details about the issue you are facing? Are you getting an exception while rendering the PDF document to another format, or is the resulting output incorrect?

Furthermore, please let us know whether the issue occurs with every document or only with certain files. To investigate further, please share the resource files so that we can test the scenario in our environment. We are sorry for the inconvenience.
Hi Dustin,

In addition to the above reply: Aspose.Pdf for .NET generally throws the "Startxref not found" exception when it encounters an unsupported file format. Aspose.Pdf for .NET does not accept DOC, DOCX, XLS or XLSX files as input. For PNG (image) to PDF conversion, please check this documentation link for details.
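Some background on the message: a well-formed PDF ends with a `startxref` keyword that points to its cross-reference table, and a Word, Excel or PNG file has no such trailer, so a PDF parser fails at that point. A cheap guard is to check for the `%PDF-` header before calling `new Aspose.Pdf.Document(...)`. The sketch below is a hypothetical helper (`PdfSniffer` is not part of any Aspose API); searching the first 1024 bytes rather than only offset 0 is an assumption chosen to tolerate files with a few stray bytes before the header.

```csharp
using System;
using System.IO;
using System.Text;

static class PdfSniffer
{
    // Returns true if "%PDF-" appears within the first 1024 bytes of the stream.
    // The 1024-byte window is an assumption; strictly conforming files start with the header at offset 0.
    // The stream position is restored afterwards so the caller can still load from it.
    public static bool LooksLikePdf(Stream stream)
    {
        if (!stream.CanRead || !stream.CanSeek)
            throw new ArgumentException("Stream must be readable and seekable.", nameof(stream));

        long start = stream.Position;
        var buffer = new byte[1024];
        int total = 0, read;
        while (total < buffer.Length && (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
            total += read;
        stream.Position = start;

        // Non-ASCII bytes decode to '?', which cannot produce a false "%PDF-" match.
        string head = Encoding.ASCII.GetString(buffer, 0, total);
        return head.Contains("%PDF-");
    }

    // Convenience overload for file paths such as tempStoragePath in the question.
    public static bool LooksLikePdf(string path)
    {
        using (var fs = File.OpenRead(path))
            return LooksLikePdf(fs);
    }
}
```

For example, `if (!PdfSniffer.LooksLikePdf(tempStoragePath))` could route the file to another converter instead of Aspose.Pdf. A header check only rules out obvious non-PDFs; a damaged PDF can still pass it and fail later.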

Moreover, to work with DOC/DOCX files you need Aspose.Words for .NET, and for XLS/XLSX files you need Aspose.Cells for .NET.
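Putting that advice together, a converter front end might pick the product by file extension. This is a minimal sketch under assumptions: `ToPdfConverter` and `Engine` are hypothetical names, the extension-to-product mapping follows the recommendation above (plus `.txt`, which Aspose.Words can load), and images are deliberately left out because with Aspose.Pdf they are placed on a new page rather than converted directly. It needs the Aspose.Words and Aspose.Cells packages referenced.

```csharp
using System;
using System.IO;

static class ToPdfConverter
{
    public enum Engine { Words, Cells, AlreadyPdf }

    // Maps a file extension to the Aspose product suggested in the thread.
    public static Engine PickEngine(string path)
    {
        switch (Path.GetExtension(path).ToLowerInvariant())
        {
            case ".doc": case ".docx": case ".txt": return Engine.Words;
            case ".xls": case ".xlsx": return Engine.Cells;
            case ".pdf": return Engine.AlreadyPdf;
            // Images (.png etc.) are not mapped here: they need a new Aspose.Pdf document with an Image paragraph.
            default: throw new NotSupportedException("No converter mapped for " + path);
        }
    }

    // Converts inputPath to a PDF at pdfPath using the selected product.
    public static void Convert(string inputPath, string pdfPath)
    {
        switch (PickEngine(inputPath))
        {
            case Engine.Words:
                new Aspose.Words.Document(inputPath).Save(pdfPath, Aspose.Words.SaveFormat.Pdf);
                break;
            case Engine.Cells:
                new Aspose.Cells.Workbook(inputPath).Save(pdfPath, Aspose.Cells.SaveFormat.Pdf);
                break;
            case Engine.AlreadyPdf:
                File.Copy(inputPath, pdfPath, overwrite: true);
                break;
        }
    }
}
```

Keeping the routing in `PickEngine` separate from the Aspose calls means an unsupported file is rejected with a clear message instead of surfacing as a "Startxref not found" exception.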


Please feel free to contact us for any further assistance.

Best Regards,

You’re correct. We realized late Friday afternoon that we needed Words and Cells for the document conversion. Through some finagling we were also able to get the image to convert, once we realized that Aspose.PDF doesn’t actually convert images at all: you have to create a PDF document and insert the image into it. (This is one place where ActivePDF’s DocConverter shines: it does true conversions. Its only drawback is poor-quality image extraction of a single page.)

We also had a little difficulty with the inconsistent implementations, because we had inadvertently ended up in the Words documentation. With Words you can simply pass the path of a plain-text file to the Document constructor, while with Aspose.PDF you have to add the text as a section. Because both classes are called “Document”, this led to confusion among the different developers reading the documentation.

Hi,

Thanks for sharing the details.

Do you mean you are facing an issue while converting image files to PDF format? If so, please share the resource file so that we can test the conversion in our environment.

Yes, you are correct. Aspose.Pdf and Aspose.Words both have a Document class, so if you are using both APIs you need to use the fully qualified class name, i.e. Aspose.Pdf.Document or Aspose.Words.Document.
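Besides writing the full namespace at every use, C# using-alias directives can give each class a distinct short name in a file that references both libraries. A minimal sketch, assuming Aspose.Words and Aspose.Pdf are both referenced; the aliases `PdfDocument` and `WordsDocument` and the `TextToPdf` helper are illustrative names. It converts a plain-text file to PDF with Aspose.Words (which loads .txt directly, as mentioned earlier in the thread) and reopens the result with Aspose.Pdf.

```csharp
// Using-alias directives: each product's Document class gets its own unambiguous name in this file.
using PdfDocument = Aspose.Pdf.Document;
using WordsDocument = Aspose.Words.Document;

static class TextToPdf
{
    // Converts a plain-text file to PDF with Aspose.Words,
    // then reopens the result with Aspose.Pdf and returns its page count.
    public static int Convert(string txtPath, string pdfPath)
    {
        var wordsDoc = new WordsDocument(txtPath);
        wordsDoc.Save(pdfPath, Aspose.Words.SaveFormat.Pdf);

        using (var pdfDoc = new PdfDocument(pdfPath))
            return pdfDoc.Pages.Count;
    }
}
```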

Hi Dustin,

Thanks for your feedback, and please ignore Nayyer’s image-related comments. We are sorry for the confusion. Yes, you are correct: for image conversion we need to create a new PDF and add the image to the page’s Paragraphs collection.
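The image workflow described here can be sketched as follows, assuming a recent Aspose.Pdf for .NET: create an empty `Document`, add a `Page`, point an `Aspose.Pdf.Image` at the image file, and add it to the page's `Paragraphs`. `ImageToPdf` and both paths are placeholders; page size, margins and image scaling are left at their defaults.

```csharp
using Aspose.Pdf;

static class ImageToPdf
{
    // "Converts" an image by placing it on the first page of a brand-new PDF.
    public static void Convert(string imagePath, string pdfPath)
    {
        using (var doc = new Document())
        {
            // A new document has no pages, so add one to hold the image.
            Page page = doc.Pages.Add();

            // Aspose.Pdf.Image is a paragraph element that renders the referenced file.
            var image = new Image { File = imagePath };
            page.Paragraphs.Add(image);

            doc.Save(pdfPath);
        }
    }
}
```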

Moreover, we recommend using Aspose.Pdf (the new generator). It can be used both to create a PDF from scratch and to manipulate an existing PDF document, and it is improved and more efficient compared to the old generator. The same documentation applies when creating a PDF from scratch: start with the parameterless Document() constructor, add a page (a page is the equivalent of the old generator’s section object), and proceed as per your requirements. Please check the sample code below for reference.

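// Output folder for the sample (placeholder path; adjust to your environment)
string myDir = @"C:\Temp\";

// Create a new, empty PDF document and add a page to it
Aspose.Pdf.Document doc = new Aspose.Pdf.Document();
doc.Pages.Add();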

// Initializes a new instance of the Table
Aspose.Pdf.Table table1 = new Aspose.Pdf.Table();
table1.Left = 100;

// Set the table border color as LightGray
table1.Border = new Aspose.Pdf.BorderInfo(Aspose.Pdf.BorderSide.All, .5f, Aspose.Pdf.Color.FromRgb(System.Drawing.Color.LightGray));

// set the border for table cells
table1.DefaultCellBorder = new Aspose.Pdf.BorderInfo(Aspose.Pdf.BorderSide.All, .5f, Aspose.Pdf.Color.FromRgb(System.Drawing.Color.LightGray));

// create a loop to add 10 rows
for (int row_count = 1; row_count <= 10; row_count++)
{
    // add row to table
    Aspose.Pdf.Row row = table1.Rows.Add();

    // add table cells
    row.Cells.Add("Column (" + row_count + ", 1)");
    row.Cells.Add("Column (" + row_count + ", 2)");
    row.Cells.Add("Column (" + row_count + ", 3)");
}

// Initializes a new instance of the Table
Aspose.Pdf.Table table = new Aspose.Pdf.Table();
table.Left = 200;
table.Top = 400;

// Set the table border color as LightGray
table.Border = new Aspose.Pdf.BorderInfo(Aspose.Pdf.BorderSide.All, .5f, Aspose.Pdf.Color.FromRgb(System.Drawing.Color.LightGray));

// set the border for table cells
table.DefaultCellBorder = new Aspose.Pdf.BorderInfo(Aspose.Pdf.BorderSide.All, .5f, Aspose.Pdf.Color.FromRgb(System.Drawing.Color.LightGray));

// create a loop to add 10 rows
for (int row_count = 1; row_count <= 10; row_count++)
{
    // add row to table
    Aspose.Pdf.Row row = table.Rows.Add();

    // add table cells
    row.Cells.Add("Column (" + row_count + ", 1)");
    row.Cells.Add("Column (" + row_count + ", 2)");
    row.Cells.Add("Column (" + row_count + ", 3)");
}

// Add both table objects to the first page of the document
doc.Pages[1].Paragraphs.Add(table1);
doc.Pages[1].Paragraphs.Add(table);

// Save the updated document containing table objects
doc.Save(myDir + "document_with_table.pdf");

Please feel free to contact us for any further assistance.

Best Regards,

Hello experts!

I came across this old thread while searching for the same issue.

Question: Is it possible to create a PDF file from an HTML template using Aspose? If yes, what could be the reason for this exception?

Note: The same HTML file successfully generates a PDF with another component we are exploring.

Thanks.

@samCDAY

Would you kindly share your sample HTML with us in ZIP format? We will test the scenario in our environment and address it accordingly.

reviewLetter.zip (1.4 KB)

@samCDAY

We did not notice any issue when using the following code snippet with Aspose.PDF for .NET 20.3:

string dataDir = @"C:\Temp\"; // placeholder: folder containing reviewLetter.html
var objLoadOptions = new Aspose.Pdf.HtmlLoadOptions(dataDir);
// Set page margins
objLoadOptions.PageInfo.Margin = new Aspose.Pdf.MarginInfo(0, 0, 0, 0);
var doc = new Aspose.Pdf.Document(dataDir + "reviewLetter.html", objLoadOptions);
doc.Save(dataDir + "out20.3.pdf");

out20.3.pdf (128.7 KB)

Would you please try the latest version with the shared code snippet? If you still face any issue, please feel free to let us know.

Thanks for the quick response. Yes, the recommended code worked for a file on disk!

Now, what do we need to do to make it work when loading from a stream? We got a blank file when we tried.

Thanks.

@samCDAY

Would you please share the code snippet you are using to convert HTML to PDF with streams? We will test the scenario in our environment and address it accordingly.

The original problem was that I did not set HtmlLoadOptions; it works once they are provided:

public byte[] GenerateByFile(string fileName)
{
    var objLoadOptions = new Aspose.Pdf.HtmlLoadOptions(fileName);
    // Set page margins
    objLoadOptions.PageInfo.Margin = new Aspose.Pdf.MarginInfo(0, 0, 0, 0);
    Document pdfDocument = new Document(fileName, objLoadOptions);

    using (var outStream = new MemoryStream())
    {
        pdfDocument.Save(outStream);
        return outStream.ToArray();
    }
}

However, in production the HTML is generated on the fly and held in memory. The code below did not work when we tried loading it from a stream. Our understanding is that HtmlLoadOptions can load from a file only, so there must be another way.

public byte[] Generate(string bodyHtml)
{
    var stream = new MemoryStream();
    var writer = new StreamWriter(stream);
    writer.Write(bodyHtml);
    writer.Flush();
    var objLoadOptions = new Aspose.Pdf.HtmlLoadOptions();
    // Set page margins
    objLoadOptions.PageInfo.Margin = new Aspose.Pdf.MarginInfo(0, 0, 0, 0);
    Document pdfDocument = new Document(stream, objLoadOptions);

    var outStream = new MemoryStream();

    pdfDocument.Save(outStream);
    var bytes = outStream.ToArray();
    return bytes;
}
// This results in a blank file.
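The thread does not say why this version produced a blank PDF while the stream-based snippet in the next reply worked, but a likely cause (our inference, not confirmed by the participants) is the stream position: after `StreamWriter.Write` and `Flush`, the `MemoryStream` is positioned at its end, so a reader that starts from the current position sees no bytes. The helper names below are hypothetical; the code only demonstrates the position behaviour with the .NET base class library and does not call Aspose.

```csharp
using System.IO;
using System.Text;

static class HtmlStreams
{
    // Reproduces the pattern in Generate(): after the writer flushes,
    // Position == Length, so anything reading from "here" gets nothing.
    public static MemoryStream WriteWithoutRewind(string html)
    {
        var stream = new MemoryStream();
        var writer = new StreamWriter(stream); // left undisposed so the stream stays open
        writer.Write(html);
        writer.Flush();
        return stream;
    }

    // Fix 1: rewind before handing the stream to a reader.
    public static MemoryStream WriteAndRewind(string html)
    {
        MemoryStream stream = WriteWithoutRewind(html);
        stream.Position = 0;
        return stream;
    }

    // Fix 2 (the approach in the Aspose reply): wrap the encoded bytes; the new stream starts at Position 0.
    public static MemoryStream FromBytes(string html) =>
        new MemoryStream(Encoding.UTF8.GetBytes(html));
}
```

In `Generate`, the corresponding change would be adding `stream.Position = 0;` before `new Document(stream, objLoadOptions)`, or building the input with `new MemoryStream(Encoding.UTF8.GetBytes(bodyHtml))`.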

@samCDAY

We tested the scenario using streams as well and could not reproduce the issue. Please check the code snippet we used:

StringBuilder htmlPage = new StringBuilder();
htmlPage.Append(File.ReadAllText(dataDir + "reviewLetter.html"));
byte[] bytes = Encoding.UTF8.GetBytes(htmlPage.ToString());
var streamHtml = new MemoryStream(bytes);
var objLoadOptions = new Aspose.Pdf.HtmlLoadOptions();
// Set page margins
objLoadOptions.PageInfo.Margin = new Aspose.Pdf.MarginInfo(0, 0, 0, 0);
var doc = new Aspose.Pdf.Document(streamHtml, objLoadOptions);
doc.Save(dataDir + "out20.3.pdf");

The HtmlLoadOptions constructor optionally takes the path of the folder where the resources used by the HTML content (e.g. images) are located.
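Combining the stream approach with a base path, an in-memory conversion might look like the following sketch. `HtmlToPdf` and `resourceDir` are hypothetical; `resourceDir` stands for the folder that relative URLs in the HTML (images, stylesheets) should resolve against and, as with `dataDir` in the earlier snippets, it is assumed to end with a directory separator.

```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;

static class HtmlToPdf
{
    // Converts an HTML string held in memory to PDF bytes.
    public static byte[] Convert(string html, string resourceDir)
    {
        // Base path used to resolve relative resource references in the HTML.
        var options = new HtmlLoadOptions(resourceDir);
        options.PageInfo.Margin = new MarginInfo(0, 0, 0, 0);

        // A MemoryStream built from the bytes starts at Position 0, so the loader sees the whole document.
        using (var input = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        using (var doc = new Document(input, options))
        using (var output = new MemoryStream())
        {
            doc.Save(output);
            return output.ToArray();
        }
    }
}
```

Every stream and the `Document` are disposed, and the output is returned as a byte array, matching the shape of `Generate` in the question.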

If the issue persists, would you kindly share a sample console application that replicates it? We will test the scenario again in our environment and address it accordingly.

Thanks for the prompt responses and inputs!

The recommended code snippet worked.

Thanks!


@samCDAY

It is good to know that the suggestions worked for you. Please keep using our API, and if you need further assistance, please feel free to ask.

Hi,

Please find attached the sample HTML and the PDF generated from it. Why do we see a gray background above the name “Alexander Burgess TEST!”? Is it because of the unlicensed version, or because some CSS/HTML is not supported? Is there a list of unsupported CSS properties?

Documents.zip (645.5 KB)

Thanks.

@samCDAY

We were able to reproduce the background issue even with the licensed version of the API. The issue seems related to the positioning of DIV elements in the HTML. However, we need to investigate further to determine its cause and fix it. For this purpose, we have logged the issue as PDFNET-47933 in our issue tracking system. We will look into it in detail and keep you posted on the status of its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.

Thanks for the information Asad. Looking forward to the solution.

Hi Asad,

Any update on the issue? I am attaching the latest version of the HTML, which covers all of our use cases, along with the generated PDF. Can you please take a look and let us know the chances of addressing all of these challenges?

We appreciate the efforts the team has put in to help us finalize our choice of component.

Regards.

AsposeIssues9Apr2020.zip (1.3 MB)