@hasanirmak
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Hi Sergei
I have checked it with the variant you suggested and get the following results:
Aspose PDF version 24.7: I get 11 out of 107 documents as PDF_A_2A valid.
With Aspose PDF version 24.10: I get 11 out of 107 documents as PDF_A_2A valid.
I can see various errors in the XML report, but ConvertErrorAction.Delete does not seem to have fixed or deleted them, and very few documents are PDF_A_2A valid.
Do you have any other suggestions for getting better results?
A possible reason may be the lack of fonts. In short: the original document may not describe any of the fonts used, but the PDF/A document must have a font description.
You can see that this is happening in the reports. This situation (lack of font description in the document) corresponds to lines like:
<Problem Severity="Error" Clause="6.3.4" ObjectID="1151" Page="32" Convertable="False">Font 'ArialMT' is not embedded</Problem>
If Convertable="True", the font description was found; if Convertable="False", it was not.
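To see which fonts are affected across a large batch of reports, the entries can be collected programmatically. Below is a minimal sketch, assuming the report contains `<Problem>` elements shaped like the line above and that the message text follows the pattern `Font '...' is not embedded`. The names `ReportSummary` and `MissingFonts` are hypothetical helpers, not part of Aspose.PDF, and the layout of the rest of the report file is not assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class ReportSummary
{
    // Returns the distinct font names from <Problem> entries that are
    // marked Convertable="False" (attribute spelling as in the report).
    public static List<string> MissingFonts(string reportXml)
    {
        const string prefix = "Font '";
        var fonts = new List<string>();

        foreach (var problem in XDocument.Parse(reportXml).Descendants())
        {
            // Match by local name so an XML namespace on the report does not matter.
            if (problem.Name.LocalName != "Problem") continue;

            var convertable = (string)problem.Attribute("Convertable");
            if (!string.Equals(convertable, "False", StringComparison.OrdinalIgnoreCase)) continue;

            // Assumed message format: Font '<name>' is not embedded
            var text = problem.Value.Trim();
            if (!text.StartsWith(prefix, StringComparison.Ordinal)) continue;

            int end = text.IndexOf('\'', prefix.Length);
            if (end < 0) continue;

            var name = text.Substring(prefix.Length, end - prefix.Length);
            if (!fonts.Contains(name)) fonts.Add(name);
        }

        return fonts;
    }
}
```

Calling `ReportSummary.MissingFonts(File.ReadAllText(reportPath))` for each report gives the list of fonts that need either a font source or a substitution before the conversion can succeed.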
You can see ways to solve this problem in one of my answers in this forum thread. PDF conversion loosing data - #8
(where I provide the development team’s answer)
@hasanirmak
Sorry, I forgot that this is a private topic.
I will also provide the response from the development team here.
The source document has two problems that prevent its conversion to PDF/A-2b. First, it has XMP metadata associated with its pages that contains some non-standard properties. This problem will be addressed in version 24.4: the page metadata will be updated to contain only standard entries.
Second, the document doesn’t contain a definition for the font “MinionPro-Regular”, therefore it isn’t possible to make a valid PDF/A document unless the missing font or its substitution is provided. To create a valid PDF/A-2b document beginning with the 24.4 version the customer may use one of the following techniques:
Use a default substitution font instead (for all text that uses MinionPro-Regular, the font will be changed to the Times New Roman):
var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);
// Replace the inaccessible MinionPro-Regular with the default substitution font (Times New Roman)
options.FontEmbeddingOptions.UseDefaultSubstitution = true;
tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");
tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");
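Use a font of your choice as the substitution (for all text that uses MinionPro-Regular, the font will be changed to Arial):
// Replace the inaccessible MinionPro-Regular with the user chosen font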
FontRepository.Substitutions.Add(new SimpleFontSubstitution("MinionPro-Regular", "Arial"));
var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);
tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");
tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");
Provide the external font definition for the MinionPro-Regular font if it’s not installed in the system:
// Add the folder containing the MinionPro-Regular font definition to the list of font sources
FontRepository.Sources.Add(new FolderFontSource("path_to_the_folder_with_the_font"));
var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);
tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");
tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");
I also searched the forum for “PDF conversion loosing data” and found one relevant thread, which manipulates the document in order to make it PDF/A valid.
In our case the document can’t be manipulated; it should be converted as it is.
I have applied the suggestions regarding fonts, and with the following code I have been able to get a large number of the documents to validate as PDF/A-2a.
FontRepository.Sources.Add(new FolderFontSource(ConfigHelper.FontPath));
foreach (var font in ConfigHelper.ListofFonts)
{
    FontRepository.Substitutions.Add(new SimpleFontSubstitution(font.Key, font.Value)); // the font.Value for all of the fonts is Arial for now
}
var temp_stream = new MemoryStream();
var result = new ConvertedResult();
try
{
    var documentStream = org_doc.InMemoryStream;
    temp_doc = new Document(documentStream);
    temp_doc.Form.Type = AsposeForm.FormType.Standard;
    var options = new PdfFormatConversionOptions(settings.PdfSaveSettings.PdfFormat);
    options.FontEmbeddingOptions.UseDefaultSubstitution = true;
    options.ErrorAction = ConvertErrorAction.Delete;
    options.LogFileName = $"{org_doc.OutPutPath}\\{org_doc.Name}_report.xml";
    bool isValidated = temp_doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
    int count = 0;
    while (isValidated == false && count < 2)
    {
        temp_doc.Convert(options);
        //temp_doc.Convert($"{org_doc.OutPutPath}\\{org_doc.Name}_report.xml", settings.PdfSaveSettings.PdfFormat, ConvertErrorAction.Delete);
        isValidated = temp_doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
        count++;
    }
    using (var resultPdfStream = new MemoryStream())
    {
        temp_doc.Save(resultPdfStream);
        var valid_Message = "";
        isValidated = temp_doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
        if (isValidated)
        {
            valid_Message = $" It is {settings.PdfSaveSettings.PdfFormat} valid after {count} iterations.";
            RunConvert.pdfAValid++;
        }
        else
        {
            valid_Message = $" It is not valid after {count} iterations.";
        }
        result.ConvertedDocument = resultPdfStream.ToArray();
        result.Result = Result.Successful;
        result.Message = $"PDF Conversion was successful. {valid_Message}";
    }
Some of them, however, are not valid after the first pass, and I have to run them through the converter twice before they finally validate.
I converted 107 documents: 3 of them are not converted at all and generate errors (see error file), and some are converted but, despite the font adjustment, are still not PDF/A-2a valid (see report file).
Based on your example, the question is why I could not get a valid PDF/A-2a document on the first pass, but only after the second conversion (for example, 31 of 92 documents were affected).
How can I bring the remaining (invalid) documents into a valid state, including those with a permission problem?
@hasanirmak
I have already finished work for this week and unfortunately will not be able to answer today. I will study the issue on Monday and write to you.
@hasanirmak
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-58481
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
@hasanirmak
I checked the conversion of the three documents in the Errors archive.
I tested on Windows, in a .NET 6 project, with Aspose.Pdf library version 24.10. The following code was used:
For two files, the result passes Adobe Preflight validation, which is our reference. For the document that does not pass the check after conversion (“02. OpenText Archive Server 10.1.1 Storage Platforms Release Notes.pdf”), I created task PDFNET-58481 for the development team. I used only one pass for the conversion.
Please check whether the given code works properly in your environment (for two documents that I converted).
We are using .NET Framework 4.8 with Aspose.Pdf library versions 24.7 and 24.10. Could you please check this as well and see whether it is reproducible?
We can’t switch the whole project to .NET Core in a hurry.
However, when I created an empty .NET Framework 4.8 project and added the library via NuGet, the conversion of the two documents mentioned was successful.
I will attach them to make sure that we are talking about the same documents. 01. OpenText Archive Server 10.1.1 Release Notes.pdf (194.2 KB)
I have now updated the solution to 25.1 using the recommended Aspose.PDF for .NET Framework 4.0 25.1 (DLLs only), but I still get errors with some PDF files, including the one mentioned in your answer.
The error occurs during conversion and results in the following exception:
System.InvalidCastException: Unable to cast object of type '#=zjtuHkLJyRmgrxygqkbXXLDRbFimV' to type '#=zlb5r82XT9HdpaZOfEWz57niCk3MG'.
Here is the code I use to convert and save the document:
public static ConvertedResult ConvertDirectPathToPDFA(BaseDocument org_doc, AsposeSettings settings)
{
    var result = new ConvertedResult();
    try
    {
        var doc = new Document(org_doc.FullPath);
        doc.Convert($"{org_doc.OutPutPath}\\{org_doc.Name}_report.xml", settings.PdfSaveSettings.PdfFormat, settings.PdfASaveSettings.ErrorAction);
        bool isValidated = doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
        var temp_doc = org_doc.OutPutPath + "\\" + org_doc.Name + ".pdf";
        var valid_Message = "";
        if (isValidated)
        {
            valid_Message = $" It is {settings.PdfSaveSettings.PdfFormat} valid.";
            RunConvert.pdfAValid++;
        }
        else
        {
            valid_Message = $" It is not valid.";
        }
        doc.Save(temp_doc);
        result.ConvertedDocument = GetMemoryStream(temp_doc).ToArray();
        result.Result = Result.Successful;
        result.Message = $"PDF Conversion was successful. {valid_Message}";
    }
    catch (Exception ex)
    {
        result.Result = Result.Error;
        result.Message = "ConvertToPDF was not successful";
        Logger.Error(ex);
    }
    return result;
}
I got the following error message while converting:
System.InvalidCastException: 'Unable to cast object of type '#=zjtuHkLJyRmgrxygqkbXXLDRbFimV' to type '#=zlb5r82XT9HdpaZOfEWz57niCk3MG'.'
I have also noticed that the validation result differs depending on whether I validate after the conversion or after saving.
Case 1: validating after converting
var doc = new Document(org_doc.FullPath);
doc.Convert($"{org_doc.OutPutPath}\\{org_doc.Name}_report.xml", settings.PdfSaveSettings.PdfFormat, settings.PdfASaveSettings.ErrorAction);
bool isValidated = doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
var temp_doc = org_doc.OutPutPath + "\\" + org_doc.Name + ".pdf";
var valid_Message = "";
if (isValidated)
{
    valid_Message = $" It is {settings.PdfSaveSettings.PdfFormat} valid.";
    RunConvert.pdfAValid++;
}
else
{
    valid_Message = $" It is not valid.";
}
doc.Save(temp_doc);
result.ConvertedDocument = GetMemoryStream(temp_doc).ToArray();
result.Result = Result.Successful;
result.Message = $"PDF Conversion was successful. {valid_Message}";
Case 2: validating after doc.Save
var doc = new Document(org_doc.FullPath);
doc.Convert($"{org_doc.OutPutPath}\\{org_doc.Name}_report.xml", settings.PdfSaveSettings.PdfFormat, settings.PdfASaveSettings.ErrorAction);
var temp_doc = org_doc.OutPutPath + "\\" + org_doc.Name + ".pdf";
doc.Save(temp_doc);
bool isValidated = doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
var valid_Message = "";
if (isValidated)
{
    valid_Message = $" It is {settings.PdfSaveSettings.PdfFormat} valid.";
    RunConvert.pdfAValid++;
}
else
{
    valid_Message = $" It is not valid.";
}
result.ConvertedDocument = GetMemoryStream(temp_doc).ToArray();
result.Result = Result.Successful;
result.Message = $"PDF Conversion was successful. {valid_Message}";
The conversion results for case 1 and case 2 with the same two documents look like this:
case 1:
Sucessful=> File Name: 00-Submitting Indexer sizing requests - iManage Support.pdf -Message: PDF Conversion was successful. It is PDF_A_2A valid. -Time: 13 seconds.
Sucessful=> File Name: 00-Work 10 Indexer Powered by RAVN - Study Guide.pdf -Message: PDF Conversion was successful. It is not valid. -Time: 1 seconds.
case 2:
Sucessful=> File Name: 00-Submitting Indexer sizing requests - iManage Support.pdf -Message: PDF Conversion was successful. It is not valid. -Time: 14 seconds.
Sucessful=> File Name: 00-Work 10 Indexer Powered by RAVN - Study Guide.pdf -Message: PDF Conversion was successful. It is PDF_A_2A valid. -Time: 2 seconds.
Why do I get different results, and which makes more sense? Should I validate only after converting?
In many cases, the PDF document is fully formed only when Save() is called, so the result obtained after calling Save() can be considered more accurate.
For performance, you can save to memory:
doc.Save(new MemoryStream());
Regarding the already created tasks, nothing new, unfortunately.
Regarding your post about throwing an exception during conversion - please create a request in a separate topic so as not to overload the current one.
Document.Validate() should be called after Document.Save(). This is not obvious and should be mentioned in the documentation, and possibly taken into account in future changes to the library.