Free Support Forum - aspose.com

Error while trying to convert PDF to XML

Hello,
I have a .NET Core project that uses Aspose.PDF version 21.7, and I'm trying to convert a PDF file to an XML file and extract its content. When I run my code on Windows, it works. But when I run it on Linux (Amazon Linux 2), I keep getting an error.

Here is an example of the code I'm using (fileBytes is the PDF byte array):

string newPath = System.IO.Path.Combine(workFolder, Guid.NewGuid() + ".xml");

License license = new License();
license.SetLicense("Aspose.Pdf.lic");
Stream fileStream = new MemoryStream(fileBytes);
Document pdfDocument = new Document(fileStream);

// Set output format
ExcelSaveOptions option = new ExcelSaveOptions();
option.MinimizeTheNumberOfWorksheets = minimizeTheNumberOfWorksheets;
option.InsertBlankColumnAtFirst = insertBlankColumnAtFirst;
option.ScaleFactor = scaleFactor;
pdfDocument.Save(newPath, option);

string content = File.ReadAllText(newPath);
File.Delete(newPath);
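When Save throws (as it does here on Amazon Linux), the File.Delete call in the snippet above is never reached. A minimal sketch, assuming you want the temporary file cleaned up regardless of the outcome: the hypothetical helper `TempFileConverter.SaveAndRead` (not part of Aspose.PDF) takes the save step as a delegate, so it can be exercised without the library. In the real code the delegate would be `path => pdfDocument.Save(path, option)`.

```csharp
using System;
using System.IO;

// Hypothetical helper: runs a save step that writes to a temp file,
// reads the result back, and always deletes the temp file afterwards.
public static class TempFileConverter
{
    public static string SaveAndRead(string workFolder, Action<string> save)
    {
        if (save == null) throw new ArgumentNullException(nameof(save));

        string newPath = Path.Combine(workFolder, Guid.NewGuid() + ".xml");
        try
        {
            // In the original code this step is: pdfDocument.Save(newPath, option);
            save(newPath);
            return File.ReadAllText(newPath);
        }
        finally
        {
            // File.Delete does not throw when the file does not exist,
            // so this is safe even if save() failed before creating it.
            File.Delete(newPath);
        }
    }
}
```

Usage would look like `string content = TempFileConverter.SaveAndRead(workFolder, path => pdfDocument.Save(path, option));`. The exception from Save still propagates, so the original error remains visible.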

I get the following error:
Exception: Object reference not set to an instance of an object.
Stack trace:

at #=zHSCBrEm4MU2lcr3JXjwt5i0gDdpAuNsSiojTnaF7Nq1F.#=zoImXG2BOX2Qv()
at #=zQBYeJ$O5zoYs4C$JZYuFCXtquWfR0J$MzWEp_DwLel50yNd88g==.#=zvuA0zN8=(#=zIiLJFlmpyWhFpRxTGDlVzgjD$q0YZ_iIoFA_ygw1OswO #=zm3eKfM0=, #=ztYWYwnALNhuNwST_2qf1gVsyoxW9eiXHbU__MyPU2vN1 #=zDCk_6RtBORjf, #=zcWI4kWkSlHDJrILbyVXAIte_j_$621h_Sw== #=zxVD1aAk=)
at #=zXzdHs2c9gwU7I0R1mRvFnrdA0lE9HHt9VleRqJ_tOVpM3_CghA==.#=zvuA0zN8=(#=zIiLJFlmpyWhFpRxTGDlVzgjD$q0YZ_iIoFA_ygw1OswO #=zm3eKfM0=, #=ztYWYwnALNhuNwST_2qf1gVsyoxW9eiXHbU__MyPU2vN1 #=zDCk_6RtBORjf)
at #=zQ1QRfnYGY_Cf6TD93PUWH$ywpSJCHHAX6VHK60NvO0CI.#=z7U6SjlinAWt_in3BrQ==(#=zIiLJFlmpyWhFpRxTGDlVzgjD$q0YZ_iIoFA_ygw1OswO #=zm3eKfM0=, List`1 #=zFhiZL34=)
at #=zQ1QRfnYGY_Cf6TD93PUWH$ywpSJCHHAX6VHK60NvO0CI.#=zLQr6x5KvRealA_X7DQ==(#=zIiLJFlmpyWhFpRxTGDlVzgjD$q0YZ_iIoFA_ygw1OswO #=zm3eKfM0=, List`1 #=zFhiZL34=)
at #=zQ1QRfnYGY_Cf6TD93PUWH$ywpSJCHHAX6VHK60NvO0CI.#=zRFmAjLfGKo03(#=zIiLJFlmpyWhFpRxTGDlVzgjD$q0YZ_iIoFA_ygw1OswO #=zm3eKfM0=, List`1 #=zX6QibqsMQDh5)
at #=zQ1QRfnYGY_Cf6TD93PUWH$ywpSJCHHAX6VHK60NvO0CI.#=ztFDpuUo=(#=zLddFn9Q2DICjd859gbl8nwmP8mvQ #=zWSWMwpI=, #=zELMwkqufAmt6AlqpGYl3U96Ze5J8rVt7VQ== #=zrKM3PGU=, #=zIiLJFlmpyWhFpRxTGDlVzgjD$q0YZ_iIoFA_ygw1OswO #=zm3eKfM0=)
at #=zHE0hpllHKmxdg7IfhSSuNVtcFIS0wmiEBg==.#=z9eyn$ls=(Int32 #=zVB1rgdk=, IList`1 #=ztUv4fQKA$K5gWEGxgg==, #=zgu2311$AmpO2 #=zMcH9ap0=)
at #=zHE0hpllHKmxdg7IfhSSuNVtcFIS0wmiEBg==.#=ztFDpuUo=()
at #=zIQshx$k1WL1J5YDCvS_Lkwn0kV4e.#=zvR9KMq4aEDHn(Document #=zWSWMwpI=, #=zIiLJFlmpyWhFpRxTGDlVzgjD$q0YZ_iIoFA_ygw1OswO& #=zrpHMiH8dF7sWN4Ojkg==, UnifiedSaveOptions #=z48c_1lM=, Int32& #=z4CoMxxO3EEJD, Boolean #=zxe8hevM=)
at #=zIQshx$k1WL1J5YDCvS_Lkwn0kV4e.#=zLu0vit7_hyx5s4K_Kg==(Document #=zEf3DAEXah6Y7, #=zIiLJFlmpyWhFpRxTGDlVzgjD$q0YZ_iIoFA_ygw1OswO& #=zm3eKfM0=, UnifiedSaveOptions #=z48c_1lM=, Int32& #=zdrvoQ9Lw2IfX, Boolean #=zxe8hevM=)
at (Object , Object[] )
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zwak55gkoeFd4lXwJkmnACioHB5qy(MethodBase #=zXxxDIBA=, Object #=zKodf9pI=, Object[] #=zs6r$9C0=, Boolean #=zdq50OEA=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zp26A7lNDRqor9BMclnDXuI4qhcEBto55N7P9548=(MethodBase #=zXxxDIBA=, Boolean #=zKodf9pI=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zI3ob6H4h9K_cPPondrrqZ20=(#=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w= #=zXxxDIBA=, #=qfTjkMLvhcrB7qxIOrZgPBNk_hKdLj9eQcG6djEAg8HU= #=zKodf9pI=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zTPwCuOOAHqzt2JnYyg==()
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zEzTjmt9VV5Xqq_KESfLe6Im3N43$OntCCA==(Boolean #=zXxxDIBA=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zPEgoD$$lbXlw9S9YzLM2Ox0mnAJrhUThWtrfWtjGzTJ3(Object #=zXxxDIBA=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=z8oBKkxDjVXLBXYrt7fQRRAY=()
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zTczS9N$wrPaSprJCwqbcyxo=(#=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w= #=zXxxDIBA=, #=qfTjkMLvhcrB7qxIOrZgPBNk_hKdLj9eQcG6djEAg8HU= #=zKodf9pI=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zTPwCuOOAHqzt2JnYyg==()
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zEzTjmt9VV5Xqq_KESfLe6Im3N43$OntCCA==(Boolean #=zXxxDIBA=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zPEgoD$$lbXlw9S9YzLM2Ox0mnAJrhUThWtrfWtjGzTJ3(Object #=zXxxDIBA=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=z8oBKkxDjVXLBXYrt7fQRRAY=()
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zGOVlxmhFPx3VCWM83k9RCmG6qcNnK3cTmtsYLo8=(Object #=zXxxDIBA=, UInt32 #=zKodf9pI=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zEzTjmt9VV5Xqq_KESfLe6Im3N43$OntCCA==(Boolean #=zXxxDIBA=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zEzTjmt9VV5Xqq_KESfLe6Im3N43$OntCCA==(Boolean #=zXxxDIBA=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=z6lNr3FJrSGwtPSOoANYHHQIbYoU8MH$jTUxWk8I=()
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zY1Zbb_qmSpp6492OmgQL03ZtUFROo8cz38pjorA=(Object[] #=zXxxDIBA=, Type[] #=zKodf9pI=, Type[] #=zs6r$9C0=, Object[] #=zdq50OEA=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zHfr0rLig_ZNWwbr8vx5mZ70=(Stream #=zXxxDIBA=, String #=zKodf9pI=, Object[] #=zs6r$9C0=, Type[] #=zdq50OEA=, Type[] #=zx3KnP8k=, Object[] #=ze2YZvio=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zInT_IqovepTP3mcHeUlRmbA=(Stream #=zXxxDIBA=, String #=zKodf9pI=, Object[] #=zs6r$9C0=)
at #=qe$ho3bMkY9_r70Ox5qCeRAMOByG2fDOiOE_jE3vat0w=.#=zBTjOW2jWCnSuiT4NTNDNH4M=(Stream #=zXxxDIBA=, String #=zKodf9pI=, Object[] #=zs6r$9C0=)
at #=zuNpRk0KtGqCC$jcC$9pOdwc=.#=zU9Q7PKJnRQ9i(Document #=zwT9Qjlo=, Stream #=z94FTxwBexupB, ExcelSaveOptions #=z8uw8i3pYb5fQvAInuA==, DocSaveOptions #=zqB9KEMrMJpTv)
at #=zuNpRk0KtGqCC$jcC$9pOdwc=.#=zmuvS0so=(Document #=zwT9Qjlo=, Stream #=zzLWoftMiV2$R, ExcelSaveOptions #=z48c_1lM=)
at Aspose.Pdf.Document.#=zxUcMrARBMU09(Stream #=zzLWoftMiV2$R, SaveOptions #=z48c_1lM=)
at Aspose.Pdf.Document.#=zxUcMrARBMU09(String #=zzPdnIPDxw5oo, SaveOptions #=z48c_1lM=)
at Aspose.Pdf.Document.Save(String outputFileName, SaveOptions options)

Thank you.

@paulofbdc

The issue seems to be related to missing fonts on the system. Please make sure the Microsoft TrueType core fonts (the msttcorefonts package) are installed on Linux. If the issue still persists, please share your sample source PDF with us so that we can test the scenario in our environment and address it accordingly.

Hello,

I installed msttcorefonts, but it still doesn't work. On Windows it still works; the error only happens when I run it on Amazon Linux. I'm attaching a sample of the PDF that we are using.

Thank you for your help.
sample.pdf (80.8 KB)

We tested the scenario in our environment on Linux and macOS and did not notice any issue. We used the following code snippet to test the case:

string newPath = @"/Users/mudassirkhan/Downloads/"+"generated.xml";
string path = @"/Users/mudassirkhan/Downloads/";

//            Stream fileStream = new MemoryStream(fileBytes);
Document pdfDocument = new Document(path + "sample.pdf");

// Set output format
ExcelSaveOptions option = new ExcelSaveOptions();
option.MinimizeTheNumberOfWorksheets = true;
option.InsertBlankColumnAtFirst = true;
option.ScaleFactor = 2.0;
pdfDocument.Save(newPath, option);

Please make sure the fonts are placed in the “/usr/share/fonts/truetype/msttcorefonts” directory, as Aspose.PDF for .NET scans this folder on Linux-like operating systems. If your operating system uses a different default font directory, add the following line of code before performing any operation with Aspose.PDF:

Aspose.Pdf.Text.FontRepository.Sources.Add(new FolderFontSource("<user's path to ms fonts>"));
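The line above needs the actual folder that holds the fonts, which can differ between distributions. A minimal sketch, assuming you want to probe a few candidate folders and use the first one that actually contains .ttf files: only the first path below comes from the reply above; the other two are assumed common locations and may not apply to Amazon Linux 2. `FontFolderLocator` is a hypothetical helper, not part of Aspose.PDF.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FontFolderLocator
{
    // Candidate folders to probe. Only the first path comes from the Aspose
    // reply; the others are assumed locations and may not exist on your system.
    public static readonly string[] DefaultCandidates =
    {
        "/usr/share/fonts/truetype/msttcorefonts",
        "/usr/share/fonts/msttcorefonts",
        "/usr/share/fonts",
    };

    // Returns the first existing candidate folder that contains at least one
    // .ttf file (searched recursively, extension matched case-insensitively),
    // or null if none qualifies.
    public static string FindFontFolder(IEnumerable<string> candidates)
    {
        foreach (string dir in candidates)
        {
            if (!Directory.Exists(dir)) continue;

            bool hasTtf = Directory
                .EnumerateFiles(dir, "*", SearchOption.AllDirectories)
                .Any(f => f.EndsWith(".ttf", StringComparison.OrdinalIgnoreCase));

            if (hasTtf) return dir;
        }
        return null;
    }
}
```

The returned path could then be registered with `FontRepository.Sources.Add(new FolderFontSource(fontDir))` before any other Aspose.PDF call; a null result suggests the fonts are not installed in any of the probed folders.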

Furthermore, please make sure that the libgdiplus package is installed and up to date. Please feel free to let us know if you still notice any issues.
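libgdiplus is a native library, so being installed does not guarantee the .NET runtime can find it. A minimal sketch, assuming .NET Core 3.0 or later (for `System.Runtime.InteropServices.NativeLibrary`), that tries to load it from inside the application: the probed file names are assumptions, since packages may install it as libgdiplus.so or only as a versioned libgdiplus.so.0. `LibGdiPlusCheck` is a hypothetical helper.

```csharp
using System;
using System.Runtime.InteropServices;

public static class LibGdiPlusCheck
{
    // Assumed file names; adjust to what your distribution's package installs.
    public static readonly string[] CandidateNames = { "libgdiplus.so", "libgdiplus.so.0" };

    // Returns the first candidate name the runtime can load, or null.
    // A successfully loaded handle is freed right away; we only check loadability.
    public static string FindLoadable()
    {
        foreach (string name in CandidateNames)
        {
            if (NativeLibrary.TryLoad(name, out IntPtr handle))
            {
                NativeLibrary.Free(handle);
                return name;
            }
        }
        return null;
    }
}
```

Running this on the Amazon Linux 2 host, a null result would point to libgdiplus being missing or not on the native loader's search path.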