Aspose.Pdf produces inefficient XPS

Brian_THOMAS · October 28, 2016, 11:19am

In thread https://forum.aspose.com/t/34912 I reported that when Aspose.Words saves XPS content from a docx, the resulting XPS is inefficient: each glyph of each string appears in its own XPS tag.

I'd like to report similar behaviour from Aspose.Pdf. In the attached Bundle2.zip there's a Word document named Single Page.docx which is the source document for these tests.

From inside WinWord I exported that docx file to create two files: Single Page Exported from WinWord.pdf and (for reference) Single Page Exported from WinWord.xps.

Then I opened the PDF file in Aspose.Pdf and saved it as XPS, producing the attached file FromAsposePdf.xps.

If you compare the content of the two XPS files, the one produced by Aspose looks like this, where the string "Document" is represented one character at a time:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
. . .

In the XPS from WinWord, an entire string is placed in a single tag, which is much more efficient:


  
    
      
    
    <Glyphs Name="a0" BidiLevel="0" Fill="#FF000000"
      FontUri="/Resources/0B2C1876-228A-2D5D-D07B-A95EB73A2F0B.odttf"
      FontRenderingEmSize="11.04" StyleSimulations="None"
      OriginX="270.29" OriginY="45.72"
      UnicodeString="Document 1"
      Indices=",61.957;,53.261;,42.391,0,0;,51.087,0,0;,80.435,0,0;,50,0,0;,51.087,0,0;,33.696,0,0;,22.826,0,0;,50,0,0">
    
    . . .
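A rough way to put a number on the difference is to count the Glyphs elements in a page's markup and compare that with the total number of characters they carry. The following is only a minimal sketch using the Python standard library; the function name `glyph_stats` and the two sample pages are hypothetical and hand-written for illustration, not taken from the attached files.

```python
import xml.etree.ElementTree as ET


def glyph_stats(fixed_page_xml):
    """Return (glyphs_element_count, total_characters) for one page's markup."""
    root = ET.fromstring(fixed_page_xml)
    count = 0
    chars = 0
    for elem in root.iter():
        # Drop the XML namespace prefix, e.g. "{http://...}Glyphs" -> "Glyphs".
        if elem.tag.rsplit("}", 1)[-1] == "Glyphs":
            count += 1
            chars += len(elem.get("UnicodeString", ""))
    return count, chars


# Hypothetical, hand-written samples (not copied from the attached files).
NS = "http://schemas.microsoft.com/xps/2005/06"
per_character = (
    f'<FixedPage xmlns="{NS}">'
    '<Glyphs UnicodeString="D"/><Glyphs UnicodeString="o"/><Glyphs UnicodeString="c"/>'
    "</FixedPage>"
)
whole_string = f'<FixedPage xmlns="{NS}"><Glyphs UnicodeString="Doc"/></FixedPage>'

print(glyph_stats(per_character))  # (3, 3): one element per character
print(glyph_stats(whole_string))   # (1, 3): one element for the whole string
```

A ratio of characters to elements close to 1 would indicate the per-character output described above.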

Is there an "optimise output" setting for Aspose.Pdf to make it behave like Aspose.Words?

codewarior · October 31, 2016, 2:42pm

Hi,

Thanks for using our APIs.

We are looking into this matter and will get back to you soon. We are sorry for this inconvenience.

codewarior · November 13, 2016, 12:36pm

Hi,

Thanks for your patience and sorry for the delayed response.

I have tried to replicate the difference in content between the XPS files generated with Aspose.Words and Aspose.Pdf, but I am afraid I was unable to notice any issue. When I view the contents of the XPS files in Notepad, only special characters appear. Can you please share some details on how you are viewing the contents, so that we can look into this matter further? We are sorry for this inconvenience.

Brian_THOMAS · November 14, 2016, 4:45am

Thanks for looking at this.

If you try to open XPS files in Notepad you will not be able to make sense of the display!

To investigate this issue you will need to google for OpenXMLEditor.vsix, which is a Visual Studio plug-in that lets you explore docx and XPS files. I did mention it when I posted https://forum.aspose.com/t/34912

Then you need to open the supplied Single Page Exported from WinWord.pdf inside Aspose.Pdf and save it as XPS.

Open the new XPS file inside Visual Studio and you'll see the content is expressed very inefficiently.

Until recently, Aspose.Words exhibited the same behaviour, but you fixed it.
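If the plug-in is not to hand, the page markup can also be pulled out with a few lines of script, since an XPS file is a ZIP package. This is only a sketch under some assumptions: the helper name `read_fixed_pages` is made up for illustration, and it assumes each page is stored as a single whole `.fpage` entry (a package that splits parts into interleaved pieces would need extra handling).

```python
import zipfile


def read_fixed_pages(xps_file):
    """Map each .fpage part name in an XPS package to its raw XML bytes.

    xps_file may be a path or a file-like object, as accepted by zipfile.ZipFile.
    """
    pages = {}
    with zipfile.ZipFile(xps_file) as package:
        for name in package.namelist():
            # Assumption: page markup lives in entries ending in ".fpage".
            if name.lower().endswith(".fpage"):
                pages[name] = package.read(name)
    return pages
```

For example, `read_fixed_pages("FromAsposePdf.xps")` should return the page parts of the attached file, whose bytes can then be saved as .xml and opened in any XML-aware editor.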

The inefficient behaviour is that each character is expressed in its own tag in the XPS. The more efficient behaviour is that a single tag contains an entire string of text.

codewarior · November 15, 2016, 5:09am

Hi,

Thanks for sharing the details.

I have logged the above-stated problem as an investigation ticket, PDFNET-41789, in our issue tracking system. We will look further into the details of this problem and will keep you posted on the status of the correction. Please be patient and spare us a little time. We are sorry for this inconvenience.

Brian_THOMAS · June 12, 2017, 11:14am

Hi

Please can I have an update on this issue?

Thanks

codewarior · June 12, 2017, 3:17pm

Hi,

Thanks for contacting support.

I am afraid the earlier reported issue is not yet resolved, as the team has been busy fixing other previously reported high-priority issues. However, the team will start investigating this problem as per their development schedule, and as soon as we have some details regarding it, we will let you know.

Please be patient and spare us a little time. We are sorry for this delay and inconvenience.

Brian_THOMAS · November 6, 2017, 11:13am

Hi
Do you have an update on this for me please?

Thanks

imran.rafique · November 6, 2017, 5:40pm

@Brian_THOMAS,

The linked ticket ID PDFNET-41789 is not resolved yet. We have logged an ETA request under the same ticket ID. We will let you know once significant progress has been made in this regard.