Hello,
I have problems converting PDF files to PDF/A-3b. Especially the formatting of tables is often weird. Sometimes one or more words of a sentence are moved up or down, or placed over other text. Sometimes hyperlinks are shown as a blue block - it seems that the color of the text is now the background color as well.
This happens mostly with HTML and Word files that are first converted to PDF with Aspose.Words (which works fine).
.NET 3.5, tested with 11.4.0
Hi Bea,
bea.grosse-venhaus: "I have problems converting PDF files to PDF/A-3b. Especially the formatting of tables is often weird. Sometimes one or more words of a sentence are moved up or down, or placed over other text."
The issue has been logged as PDFNEWNET-40443 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the correction. Please be patient and allow us a little time. We are sorry for this inconvenience.
bea.grosse-venhaus: "Sometimes hyperlinks are shown as a blue block - it seems that the color of the text is now the background color as well."
I have also tried converting the EMail_2016-145.html file to PDF/A_3b format and I am unable to notice any issue. For your reference, I have also attached the output generated at my end.
bea.grosse-venhaus: "This happens mostly with HTML and Word files that are first converted to PDF with Aspose.Words (which works fine)."
Aspose.Words.Document worddoc = new Aspose.Words.Document(@"C:\pdftest\EMail_2016-145.html", new Aspose.Words.LoadOptions( Aspose.Words.LoadFormat.Html, "",""));
worddoc.Save(@"C:\pdftest\EMail_2016-145.pdf", Aspose.Words.SaveFormat.Pdf);
var pdfDocument = new Document(@"C:\pdftest\EMail_2016-145.pdf");
pdfDocument.Convert(@"C:\pdftest\Besprechung_Source.txt", PdfFormat.PDF_A_3B, ConvertErrorAction.Delete);
pdfDocument.Save(@"C:\pdftest\EMail_2016-145_PDF_A_3b.pdf");
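The steps above can also be written so that the outcome of the conversion is checked rather than assumed. This is a minimal sketch, assuming Aspose.Words and Aspose.PDF for .NET are both referenced; `Document.Convert` and `Document.Validate` each return a `bool` and write a log to the given path. The log file names `convert_log.xml` and `validate_log.xml` are hypothetical, and the HTML file is loaded with default load options here. Note that validation only reports PDF/A conformance; it would not detect visual problems such as shifted words or blue hyperlink blocks.

```csharp
using System;
using Aspose.Pdf;

class PdfA3bCheck
{
    static void Main()
    {
        // Step 1: HTML -> PDF with Aspose.Words (default load options, an assumption).
        var wordDoc = new Aspose.Words.Document(@"C:\pdftest\EMail_2016-145.html");
        wordDoc.Save(@"C:\pdftest\EMail_2016-145.pdf", Aspose.Words.SaveFormat.Pdf);

        // Step 2: PDF -> PDF/A-3b; Convert returns false if the conversion failed.
        var pdfDocument = new Document(@"C:\pdftest\EMail_2016-145.pdf");
        bool converted = pdfDocument.Convert(
            @"C:\pdftest\convert_log.xml",   // hypothetical log file name
            PdfFormat.PDF_A_3B,
            ConvertErrorAction.Delete);
        pdfDocument.Save(@"C:\pdftest\EMail_2016-145_PDF_A_3b.pdf");

        // Step 3: reopen the result and validate it against PDF/A-3b.
        var result = new Document(@"C:\pdftest\EMail_2016-145_PDF_A_3b.pdf");
        bool valid = result.Validate(
            @"C:\pdftest\validate_log.xml",  // hypothetical log file name
            PdfFormat.PDF_A_3B);

        Console.WriteLine("Converted: " + converted + ", valid PDF/A-3b: " + valid);
    }
}
```

Attaching both log files to a support request may make it easier to see which objects were removed or changed during conversion.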
Thank you very much for your fast reply.
private byte[] ConvertHtmlToPdf(byte[] fileBytes, SPWeb referateWeb, string folderUrl)
{
using(MemoryStream memoryStream = new MemoryStream(fileBytes))
{
Aspose.Pdf.HtmlLoadOptions htmlOptions = new Aspose.Pdf.HtmlLoadOptions(folderUrl);
htmlOptions.CustomLoaderOfExternalResources = uri =>
{
string fileNameSrc = uri.Replace("cid:", "");
fileNameSrc = fileNameSrc.Substring(0, fileNameSrc.IndexOf('@'));
//referateWeb is a SPWeb-object
SPFile file = referateWeb.GetFile(Path.Combine(folderUrl, fileNameSrc));
if(file.Exists)
{
FilesToDelete.Add(file);
LoadOptions.ResourceLoadingResult result = new LoadOptions.ResourceLoadingResult(file.OpenBinary());
return result;
}
return new LoadOptions.ResourceLoadingResult(new byte[] { });
};
Document htmlDocument = new Document(memoryStream, htmlOptions) { IgnoreCorruptedObjects = true };
htmlDocument.OptimizeResources(Document.OptimizationOptions.All());
using(MemoryStream documentStream = new MemoryStream())
{
htmlDocument.Save(documentStream);
return documentStream.ToArray();
}
}
}
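One detail in the resource loader above may be worth guarding: `Substring(0, fileNameSrc.IndexOf('@'))` throws an `ArgumentOutOfRangeException` when the URI contains no `@`, because `IndexOf` then returns -1. A minimal sketch of a more defensive variant follows; the class and method names (`CidHelper`, `FileNameFromCid`) are hypothetical and not part of Aspose or SharePoint. Unlike the original `Replace` call, it strips `cid:` only when it is a prefix.

```csharp
using System;

static class CidHelper
{
    // Hypothetical helper: extracts the file name part of a "cid:" reference,
    // e.g. "cid:image001.png@01D1A2B3.C4D5E6F0" becomes "image001.png".
    public static string FileNameFromCid(string uri)
    {
        if (string.IsNullOrEmpty(uri))
        {
            return string.Empty;
        }

        // Strip the "cid:" scheme only if it is a prefix.
        string name = uri.StartsWith("cid:", StringComparison.OrdinalIgnoreCase)
            ? uri.Substring(4)
            : uri;

        // Cut at the first '@' if present; otherwise keep the whole name.
        int at = name.IndexOf('@');
        return at >= 0 ? name.Substring(0, at) : name;
    }
}
```

Inside the lambda, the two string lines could then be replaced by `string fileNameSrc = CidHelper.FileNameFromCid(uri);`.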
bea.grosse-venhaus: "To sum it up: Besides the formatting problems that occur when converting DOCX files to PDF/A-3b (which you confirmed), we also have problems when converting RTF and HTML files to PDF/A-3b. These files are created within an Outlook add-in, saved in a SharePoint library and afterwards converted. The formatting problems occur in the DOCX, RTF as well as HTML files. Images in RTF files can get blown up. The process is as follows:
RTF e-mail
-> RTF file
-> PDF (shared class file)
-> PDF/A-3b (shared class file)
Formatting is wrong, pictures can get blown up"
Hi Bea,
Thanks for your patience. I have tested the scenario using the EMail_2016-78.rtf file and have managed to reproduce the issues stated above. For the sake of correction, I have logged it as PDFNEWNET-40523 in our issue tracking system.
I have also tested the scenario and observed that some special characters are rendered in the PDF file when converting the HTML file to PDF format using Aspose.Words, so I have informed my colleagues on the respective team so that they can look further into this matter and reply accordingly.
bea.grosse-venhaus: "HTML e-mail
-> HTML file of body
-> PDF (with previous method I shared)
Formatting is wrong (especially in tables), hyperlinks can be blue
-> PDF/A-3b (shared class file)"
I have also noticed a file, Ursprungsmail WG Neue Pressemitteilung - Martin Lohse wird neuer Wissenschaftlicher Vorstand des MDC_PDF3B008.pdf, in the earlier attachments, but its source file is missing. Can you please share the input file and some details regarding the issues you are facing with this document, so that we can look further into this scenario? We are sorry for this inconvenience.
bea.grosse-venhaus: "HTML e-mail
-> HTML file of body
-> PDF (with previous method I shared)
Formatting is wrong (especially in tables), hyperlinks can be blue
-> PDF/A-3b (shared class file)"
Hi Bea,
In a separate attempt at converting the HTML file to PDF/A_3b format, the text at the bottom of the resultant file is garbled (characters are overlapping). For the sake of correction, I have logged this problem as PDFNEWNET-40524 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the correction. Please be patient and allow us a little time. We are sorry for this inconvenience.
Hi Bea,
Thanks for your reply. The customer uses MS Office 2010; I don't know if this makes a difference...
Next problem: I just tried to convert an MSG file to PDF with the newest DLLs, but now the Concatenate() method does not work anymore: the "target" MemoryStream is 0 bytes (it works with PDF 11.4.0, .NET 3.5). I use the method to create one PDF for the e-mail message and all attached files.
AND... also in PDF documents some pictures get blown up...
Hi Bea,
Can you please also share the input email (MSG) file and your complete code to reproduce this issue at our end?
Best Regards,
I already did that when I started this thread.
Hi Bea,
bea.grosse-venhaus: "AND... also in PDF documents some pictures get blown up..."
The issues you found earlier (filed as PDFNET-40524, PDFNET-40638) have been fixed in Aspose.PDF for .NET 19.10.
The issue you found earlier (filed as WORDSNET-13352) has been fixed in the Aspose.Words for .NET 23.8 update, which is also available on NuGet.