Save DOC to PDF converts Arial to ArialMT

Dear Aspose-Team,

we have a font problem converting a DOC to a PDF. We’re using this function with formatPDFA=false:


private synchronized boolean createPDFByAspose(Document doc, String target, boolean formatPDFA)
{
    try
    {
        PdfSaveOptions options = new PdfSaveOptions();
        // Request PDF/A-1b output only when the caller asks for it.
        if (formatPDFA)
            options.setCompliance(PdfCompliance.PDF_A_1_B);

        doc.save(target, options);

        return true;
    }
    catch (Exception ex)
    {
        logger.logError(ex,
            getName() + ": createPDFByAspose() failed! " + doc.getOriginalFileName() + "->"
            + target + " (" + formatPDFA + ")");
        return false;
    }
}


When you open the PDF, select some text and look at the font properties, you can see that ArialMT is used instead of Arial (Font_Property_in_PDF.png). The conversion was done on a Windows Server 2003 machine with Aspose.Words 13.9 for Java.
When the document is saved as PDF from Word 2010, the font properties say that the PDF uses Arial, like the original Word document.

What do I have to do so that Aspose.Words uses Arial in the PDF and not ArialMT?

Kind regards,
Frank

Hi Frank,


Thanks for your inquiry. The PDF specification says that PostScript font names should be used, and ‘ArialMT’ is simply the PostScript name of the Arial font. This is the expected behavior: Aspose.Words writes font names to PDF according to the specification. If we can help you with anything else, please feel free to ask.

Best regards,

Dear Awais,

thanks for your fast reply. Is there a way to bypass the specification with Aspose.Words? Can I include the TrueType fonts, as described in "Rendering in C# | Aspose.Words for .NET", to get around it? If not, is there a way to do this with Aspose.PDF?

Best regards,
Frank

Hi Frank,


Thanks for your inquiry. I am in communication with our development team and will get back to you soon.

Best regards,

Hi Frank,


Thanks for being patient. Unfortunately, Aspose.Words offers no way to bypass the PDF specification in this case. The PDF specification clearly requires PostScript font names, and we can think of no reason to violate it. We think the “Arial” font name written by Microsoft Word 2013 may just be a bug that will be fixed in a future update.

Frank:
If not, is there a way to do this with Aspose.PDF?

Secondly, I am moving your request to the Aspose.Pdf forum. My colleagues from the Aspose.Pdf team will answer you shortly.

Best regards,

Hi Frank,

Thanks for your inquiry. I’m afraid Aspose.Pdf doesn’t support DOC to PDF conversion. As a workaround, however, you can first convert the DOC to PDF using Aspose.Words, then use Aspose.Pdf to replace ArialMT with Arial in the resulting PDF file and remove the unused fonts, as follows. Hopefully this will help you meet your requirements.

Document doc = new Document(myDir + "Arial.pdf");

TextFragmentAbsorber absorber = new TextFragmentAbsorber(new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));

doc.Pages.Accept(absorber);

foreach (TextFragment textFragment in absorber.TextFragments)
{
    if (textFragment.TextState.Font.FontName == "ArialMT")
    {
        textFragment.TextState.Font = FontRepository.FindFont("Arial");
    }
}

doc.Save(myDir + "testout.pdf");

Please feel free to contact us for any further assistance.

Best Regards,

Hi Frank,

Adding to Tilal’s comments: the code snippet shared above is for Aspose.Pdf for .NET. I see that you are working on the Java platform, and I am afraid Aspose.Pdf for Java currently does not support an overload of TextFragmentAbsorber(…) that accepts TextEditOptions as an argument.

To have this corrected, I have logged it in our issue tracking system as PDFNEWJAVA-33798. We will investigate this issue in detail and will keep you updated on the status of a correction.

We apologize for your inconvenience.

Thanks for the comments. I’ve tested the code with a trial version of Aspose.PDF 4.3.1 for Java.

When you use the attached document as the input, Aspose.PDF always reports the font as Arial, as you can see by inspecting the variable fontName while running this code:

Document pdf = new Document("C:/arial.pdf");

TextFragmentAbsorber absorber = new TextFragmentAbsorber();

TextSearchOptions opt = new TextSearchOptions(true);

absorber.setTextSearchOptions(opt);

pdf.getPages().accept(absorber);

TextFragmentCollection col = absorber.getTextFragments();

for (Iterator iterator = col.iterator(); iterator.hasNext();)
{
    // The raw Iterator returns Object, so cast each element.
    TextFragment textFragment = (TextFragment) iterator.next();

    String fontName = textFragment.getTextState().getFont().getFontName();

    if (fontName.equals("ArialMT"))
    {
        textFragment.getTextState().setFont(FontRepository.findFont("Arial"));
        textFragment.getTextState().getFont().isEmbedded(true);
    }
}

pdf.save("C:/testout.pdf");

How could that be? Is there anything wrong in the code?

Hi Frank,


I have tested the scenario using Aspose.Pdf for Java 4.3.0 in Eclipse Juno running on Windows 7 (x64) with JDK 1.7, and as per my observations, the font names Arial and Arial Bold are being returned.

Dear Nayyer,

this test shows exactly what I mean! When you open the original PDF in a PDF viewer like PDF-XChange Viewer, select some text and open the text properties, it shows ArialMT (see the attachment).

Why does the code in both my test and yours return Arial instead of ArialMT?

Kind regards,
Frank


Hi Frank,


Thanks for sharing the details.

I have again tested the scenario and I am able to reproduce the same issue: an incorrect font name is being returned. To have this corrected, I have logged the issue as PDFNEWJAVA-33828 in our issue tracking system. We will look further into the details of this problem and will keep you updated on the status of the correction. Please be patient and spare us a little time. We are sorry for this inconvenience.
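
While the name mismatch is open, one possible stopgap is to compare font names tolerantly rather than with a single equals() check. The following is a minimal sketch in plain Java and is not part of the Aspose API: the class FontNameMatcher and its methods are hypothetical names introduced for illustration. It treats both the family name Arial and the PostScript name ArialMT as regular Arial, and it strips the six-letter subset tag (for example ABCDEF+ArialMT) that PDF files prepend to the names of subsetted fonts. Bold and italic variants are deliberately not matched.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an Aspose class: tolerant matching of Arial font names.
public final class FontNameMatcher {
    // Subsetted fonts in a PDF carry a tag of six uppercase letters plus '+',
    // e.g. "ABCDEF+ArialMT".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    private FontNameMatcher() {}

    // Removes a leading subset tag if one is present; otherwise returns the name unchanged.
    public static String stripSubsetTag(String fontName) {
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }

    // Returns true for regular Arial under its family name or its PostScript name.
    public static boolean isArial(String fontName) {
        if (fontName == null) {
            return false;
        }
        String base = stripSubsetTag(fontName.trim());
        return base.equalsIgnoreCase("Arial") || base.equalsIgnoreCase("ArialMT");
    }
}
```

With such a helper, a check like fontName.equals("ArialMT") could be written as FontNameMatcher.isArial(fontName), so the branch is taken whichever of the two names the library reports.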

Hi Nayyer,

I’m very sorry but I still have a question. I’m running the following code with the arial.pdf document attached in the original post:

Document pdf = new Document("C:/arial.pdf");

TextFragmentAbsorber absorber = new TextFragmentAbsorber();

TextSearchOptions opt = new TextSearchOptions(true);

absorber.setTextSearchOptions(opt);

pdf.getPages().accept(absorber);

TextFragmentCollection col = absorber.getTextFragments();

for (Iterator iterator = col.iterator(); iterator.hasNext();)
{
    // The raw Iterator returns Object, so cast each element.
    TextFragment textFragment = (TextFragment) iterator.next();

    String fontName = textFragment.getTextState().getFont().getFontName();

    if (fontName.equals("Arial"))
    {
        textFragment.getTextState().setFont(FontRepository.findFont("Times New Roman"));
        textFragment.getTextState().getFont().isEmbedded(true);
    }
}

pdf.save("C:/testout.pdf");

All text in the font Arial should be changed to Times New Roman. But when I look at testout.pdf, all the fonts are still Arial. Am I doing something wrong?

Kind regards,
Frank

Hi Frank,

Thanks for contacting support.

I have tested the scenario and I am able to reproduce the same problem. To have this corrected, I have logged it separately as PDFNEWJAVA-33836 in our issue tracking system. We will look further into the details of this problem and will keep you updated on the status of the correction. Please be patient and spare us a little time. We are sorry for this inconvenience.

Hi Frank,

Thanks for your patience.

The development team has further investigated the issue PDFNEWJAVA-33828 reported earlier and as per our current estimates, we plan to get this issue resolved in Aspose.Pdf for Java 4.7.0 (which will be published in Q1-2014). Please be patient and spare us a little time.

The issues you have found earlier (filed as PDFNEWJAVA-33828) have been fixed in Aspose.Pdf for Java 4.6.0.

The issues you have found earlier (filed as PDFNEWJAVA-33798) have been fixed in Aspose.Pdf for Java 9.0.0.

The issues you have found earlier (filed as ) have been fixed in this update.