Conversion of DOC, XLS, PPT and PDF to PDF/A

Hi,


I would like to check how to convert doc, xls, ppt, html and pdf to PDF/A.

Do I have to do it step by step? For example, convert the doc to PDF first and then convert that PDF to PDF/A?

If so, which function should I use?

Just an update.


I have converted doc and xls to PDF/A without any issue.

But when I tried to convert ppt to PDF/A using Aspose.Slides, every validator I used reported that the document does not conform to the PDF/A standard.

The same happens when converting HTML to PDF/A.

The code used for the ppt conversion is:
// Instantiate a Presentation object that represents a presentation file
Presentation pres = new Presentation(DATA_DIR + "sample.pptx");
Aspose.Slides.Export.PdfOptions opts = new Aspose.Slides.Export.PdfOptions();

opts.EmbedFullFonts = true;

opts.Compliance = Aspose.Slides.Export.PdfCompliance.PdfA1b;
// Save the presentation to PDF with PDF/A-1b compliance requested in the options
pres.Save(DATA_DIR + "ConvertedPPT.pdf", Aspose.Slides.Export.SaveFormat.Pdf, opts);


The code for the HTML conversion is:
HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();
htmlLoadOptions.PageInfo.Margin.Bottom = 10;
htmlLoadOptions.PageInfo.Margin.Top = 20;
htmlLoadOptions.PageInfo.Width = 1500f;
// Note: logFile is never used below; the conversion log is written to "log.xml"
string logFile = "log_" + DateTime.Now + ".xml";
Aspose.Pdf.Document doc = new Aspose.Pdf.Document(new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)), htmlLoadOptions);
doc.Convert("log.xml", PdfFormat.PDF_A_1B, ConvertErrorAction.Delete);
// Save the PDF document
doc.Save(DATA_DIR + "ConvertedHTML.pdf");

I have uploaded the XML conversion logs for the ppt to PDF/A and pdf to PDF/A conversions.


Hi,

I have uploaded the test result from using this validator to validate the converted file.

Please note that the errors for converting an HTML string to PDF/A and for converting PDF to PDF/A are the same, as shown in pdfError.txt.

I also tried converting an existing PDF file to PDF/A and got the same errors as with the HTML string.

I have uploaded the PDF file used.

Hi Benjamin,


Thanks for your interest in Aspose. I have tested your shared code for HTML to PDF/A conversion using Aspose.Pdf for .NET 11.6.0 and verified the PDF/A compliance of the resultant file with the Adobe Preflight tool without any issue. Please find the output PDF/A file attached for reference.

As for PPT to PDF/A conversion, you may try combining Aspose.Slides and Aspose.Pdf. In the first step, convert the PPT to PDF using Aspose.Slides; then convert the resultant PDF to PDF/A using Aspose.Pdf. Hopefully it will help you to accomplish the task.

Furthermore, please note that we use Adobe Preflight to validate PDF/A conformance, as all tools on the market have their own "representation" of PDF/A conformance. Please check this article on PDF/A validation tools for reference. We chose Adobe products for verifying how Aspose.Pdf produces PDF files because Adobe is at the center of everything connected to PDF.

Please feel free to contact us for any further assistance.

Best Regards,

Hi Tilal,


Thanks for replying. I have tried the two-step approach for ppt. Now the validator reports the following errors:

Validating file “output.pdf” for conformance level pdfa-1b
xmp:CreateDate :: Wrong value type. Expected type ‘Date’.
xmp:ModifyDate :: Wrong value type. Expected type ‘Date’.
dc:creator :: Wrong value type. Expected type ‘seq’.
dc:title :: Wrong value type. Expected type ‘lang alt’.
dc:description :: Wrong value type. Expected type ‘lang alt’.
The required XMP property ‘pdfaid:part’ is missing.
The required XMP property ‘pdfaid:conformance’ is missing.
The XMP property ‘xmp:CreatorTool’ is not synchronized with the document information entry ‘Creator’.
The XMP property ‘xmp:CreateDate’ is not synchronized with the document information entry ‘CreationDate’.
The XMP property ‘xmp:ModifyDate’ is not synchronized with the document information entry ‘ModDate’.
The XMP property ‘pdf:Producer’ is not synchronized with the document information entry ‘Producer’.
The value of the key N is 4 but must be 3.
The value of the key SMask is an image but must be None. (2)
The document does not conform to the requested standard.
The document doesn’t conform to the PDF reference (missing required entries, wrong value types, etc.).
The document contains transparency.
The document’s meta data is either missing or inconsistent or corrupt.
Done.

Code snippet used:

// Instantiate a Presentation object that represents a presentation file
Presentation pres = new Presentation(DATA_DIR + "sample.pptx");

Aspose.Slides.Export.PdfOptions opts = new Aspose.Slides.Export.PdfOptions();

opts.EmbedFullFonts = true;
opts.EmbedTrueTypeFontsForASCII = true;

opts.TextCompression = Aspose.Slides.Export.PdfTextCompression.None;
// Step 1: save the presentation as a regular PDF (fonts embedded, no text compression)
pres.Save(DATA_DIR + "ConvertedNewPPT.pdf", Aspose.Slides.Export.SaveFormat.Pdf, opts);

// Step 2: load that PDF with Aspose.Pdf and convert it to PDF/A-1b
Aspose.Pdf.Document pptDocument = new Aspose.Pdf.Document(DATA_DIR + "ConvertedNewPPT.pdf");

pptDocument.Convert(DATA_DIR + "logPPPT.xml", PdfFormat.PDF_A_1B,
    ConvertErrorAction.Delete);
pptDocument.OptimizeResources(new Aspose.Pdf.Document.OptimizationOptions() {
    RemoveUnusedObjects = true,
    RemoveUnusedStreams = true,
    AllowReusePageContent = true,
    CompressImages = true
});
// Save output document
pptDocument.Save(DATA_DIR + "output.pdf");


And when I open the supposedly converted file output.pdf, Adobe Reader detects it as a normal PDF rather than PDF/A.
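For a quick check before running a full validator, the sketch below scans a PDF for the pdfaid:part and pdfaid:conformance XMP entries that the validator output above reports as missing. As far as I know, these are also the entries a viewer relies on to flag a file as PDF/A. The class and method names are hypothetical and not part of any Aspose API. It assumes the XMP packet is stored as uncompressed single-byte text (PDF/A-1 does not allow a filter on the metadata stream), so a UTF-16 packet would not be detected. Finding the markers only shows that the file claims PDF/A; it does not prove conformance.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Aspose.Pdf or Aspose.Slides.
public static class PdfAMarkerCheck
{
    // Match both XMP serializations: <pdfaid:part>1</pdfaid:part> and pdfaid:part="1".
    private static readonly Regex Part =
        new Regex(@"pdfaid:part\s*(?:=\s*[""']|>)\s*(\d)");
    private static readonly Regex Conformance =
        new Regex(@"pdfaid:conformance\s*(?:=\s*[""']|>)\s*([A-Za-z])");

    // Returns the claimed level such as "1B" when both markers are present, otherwise null.
    public static string FindPdfAMarker(byte[] pdfBytes)
    {
        // Latin-1 maps every byte to exactly one char, so binary content cannot break decoding.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        Match part = Part.Match(text);
        Match conformance = Conformance.Match(text);
        if (!part.Success || !conformance.Success)
        {
            return null;
        }
        return part.Groups[1].Value + conformance.Groups[1].Value.ToUpperInvariant();
    }
}

// Usage (the path is an example only):
// string level = PdfAMarkerCheck.FindPdfAMarker(File.ReadAllBytes(DATA_DIR + "output.pdf"));
// Console.WriteLine(level ?? "no PDF/A identification found");
```

A null result for output.pdf would match what the validator reports; a non-null result still needs a real validator such as Preflight.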

I have attached the XML log file for this conversion.

Also, there is already an option to set PDF/A compliance when converting ppt to pdf. I would suggest that your development team fix that option, so that conversion from ppt/pptx to PDF/A is simpler and cleaner.

Hi,


I got hold of a PC with Adobe Acrobat Pro DC and ran a Preflight check, and the result still failed. Refer to the attached image.

Code snippet used:
Aspose.Slides.License license = new Aspose.Slides.License();

// Pass only the name of the license file embedded in the assembly
license.SetLicense("Aspose.Total.lic");

Aspose.Pdf.License pdflicense = new Aspose.Pdf.License();
pdflicense.SetLicense("Aspose.Total.lic");
pdflicense.Embedded = true;

// Instantiate a Presentation object that represents a presentation file
Presentation pres = new Presentation(DATA_DIR + "sample.pptx");

// Save the presentation to PDF with default options
pres.Save(DATA_DIR + "ConvertedNewPPT.pdf", Aspose.Slides.Export.SaveFormat.Pdf);

Aspose.Pdf.Document pptDocument = new Aspose.Pdf.Document(DATA_DIR + "ConvertedNewPPT.pdf");
pptDocument.Convert(DATA_DIR + "logPPPT.xml", PdfFormat.PDF_A_1B, ConvertErrorAction.Delete);
pptDocument.Save(DATA_DIR + "ConvertedNewPPT.pdf");

Furthermore, I tested the older code below, which converts pptx directly to PDF/A. Refer to the attached image for the test result.

Code snippet:
Aspose.Slides.License license = new Aspose.Slides.License();

// Pass only the name of the license file embedded in the assembly
license.SetLicense("Aspose.Total.lic");

// Instantiate a Presentation object that represents a presentation file
Presentation pres = new Presentation(DATA_DIR + "sample.pptx");
Aspose.Slides.Export.PdfOptions opts = new Aspose.Slides.Export.PdfOptions();
opts.EmbedTrueTypeFontsForASCII = false;

opts.EmbedFullFonts = true;
opts.Compliance = Aspose.Slides.Export.PdfCompliance.PdfA1b;
// Save the presentation to PDF with PDF/A-1b compliance requested in the options
pres.Save(DATA_DIR + "ConvertedOLDPPT.pdf", Aspose.Slides.Export.SaveFormat.Pdf, opts);
Hi Benjamin,

I have tested the scenario and noticed that Aspose.Pdf produces a non-conforming PDF/A file when converting a PDF generated by Aspose.Slides, so I have logged ticket PDFNEWNET-40786 in our issue tracking system for further investigation and rectification. We will keep you updated on the progress of its resolution.

bczm8703:

And there is an option during ppt conversion to pdf to set compliance to pdf/a. Would suggest your development team to try fix the issue so that conversion from ppt/pptx to pdf/a would be simpler and cleaner

Yes, you are right: the direct conversion issue should be investigated and fixed. We are already looking into the Aspose.Slides feature for PPT(X) to PDF/A conversion and will update you on our findings soon.

We are sorry for the inconvenience caused.

Best Regards,

Hi, another question.


The first parameter of Aspose.Pdf.Document.Convert is the location where the log file is saved.

Is there an option to NOT log the conversion?

Hi Benjamin,

Thanks for your inquiry. I am afraid there is no option to convert PDF to PDF/A without a log file. However, if you do not want to keep the log file, you can delete it after the conversion as follows. Hopefully it will help you to accomplish the task.

...
documentpdf.Convert(myDir + "log.xml", PdfFormat.PDF_A_1B,
ConvertErrorAction.Delete);

if (File.Exists(myDir + "log.xml"))
{
    File.Delete(myDir + "log.xml");
}
...
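
A variation on the snippet above, as a minimal sketch: a fixed name such as "log.xml" can collide when several conversions run at the same time, and the delete is skipped if the conversion throws. The helper below (the TempLog class and its Run method are hypothetical names, not part of Aspose.Pdf) generates a unique temporary log path, passes it to whatever conversion call you supply, and removes the file in a finally block.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of Aspose.Pdf.
public static class TempLog
{
    // Runs 'convert' with a unique temporary log path and deletes the log afterwards,
    // even when the conversion throws.
    public static void Run(Action<string> convert)
    {
        string logPath = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".xml");
        try
        {
            convert(logPath);
        }
        finally
        {
            if (File.Exists(logPath))
            {
                File.Delete(logPath);
            }
        }
    }
}

// Usage with the Convert call from this thread:
// TempLog.Run(log => documentpdf.Convert(log, PdfFormat.PDF_A_1B, ConvertErrorAction.Delete));
```

The conversion is passed in as a delegate only so the cleanup logic does not depend on any particular library call.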

Please feel free to contact us for any further assistance.

Best Regards,

bczm8703:

And there is an option during ppt conversion to pdf to set compliance to pdf/a. Would suggest your development team to try fix the issue so that conversion from ppt/pptx to pdf/a would be simpler and cleaner

Hi Benjamin,

Thanks for your patience. After an initial investigation of the direct PPT(X) to PDF/A-1b conversion feature in Aspose.Slides, we have logged ticket SLIDESNET-37465 in our issue tracking system for further analysis and rectification. We will notify you as soon as it is resolved.

We are sorry for the inconvenience.

Best Regards,


Hi Benjamin,

Adding to Tilal's comments, you may consider passing a Stream object to receive the conversion log and then closing the Stream instance.

[C#]

// Load the existing PDF file
Document pdfDocument = new Document("c:/pdftest/source.pdf");
// Write the conversion log to an in-memory stream; the using block closes it, so no log file is left on disk
using (MemoryStream logStream = new MemoryStream())
{
    pdfDocument.Convert(logStream, PdfFormat.PDF_A_1B, ConvertErrorAction.Delete);
}
pdfDocument.Save("c:/pdftest/Resultant.pdf");

The issues you have found earlier (filed as SLIDESNET-37623) have been fixed in this update.