Combine adjacent runs with identical formatting?


#1

Is there a method to combine all adjacent runs that have identical formatting?

The reason I see a need for this is that a mail-merge operation on a run containing one field creates three runs (one before the field, one for the field, and one after the field), all with identical formatting. This sometimes causes problems when the file is converted to PDF with Aspose.Pdf, if the field replacement contains spaces and a space falls on a soft line break, as in the following example:

[I'm substituting curly braces for XML less/greater than brackets because this editor eats them!]

original text: text_before {mergefield} text_after

after m-merge: text_before fld fill with blanks text_after //fld value has spaces

Suppose there's a soft line break at the end of the field value, i.e. after "... blanks".

When saved as formatAsposePdf XML, it becomes 3 segments with identical formatting:

{segment ..}text_before {/segment}

{segment ..}fld fill with blanks{/segment}

{segment ..} text_after{/segment} // leading space in text

When the PDF is created, the second line of the text fragment starts with a space instead of at the left margin.
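
To make the request concrete, here is a minimal Python sketch of the kind of merge I mean. It is not Aspose code: the `Run` and `Formatting` classes and the `merge_adjacent_runs` function are hypothetical stand-ins, and a real implementation would have to compare every formatting attribute the library tracks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formatting:
    # Hypothetical subset of run formatting; real runs carry many more attributes.
    font: str = "Arial"
    size: float = 10.0
    bold: bool = False
    italic: bool = False


@dataclass
class Run:
    # Hypothetical stand-in for a run of text with uniform formatting.
    text: str
    fmt: Formatting


def merge_adjacent_runs(runs):
    """Return a new list in which neighbouring runs with equal formatting are joined."""
    merged = []
    for run in runs:
        if merged and merged[-1].fmt == run.fmt:
            # Same formatting as the previous run: append the text to it.
            merged[-1] = Run(merged[-1].text + run.text, run.fmt)
        else:
            merged.append(Run(run.text, run.fmt))
    return merged


plain = Formatting()
runs = [
    Run("text_before ", plain),
    Run("fld fill with blanks", plain),
    Run(" text_after", plain),
]
merged = merge_adjacent_runs(runs)
print(len(merged), repr(merged[0].text))
# prints: 1 'text_before fld fill with blanks text_after'
```

With the three segments collapsed into one, the space before "text_after" is an ordinary interior space rather than the first character of a separate segment.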

The attached files provide a real example that demonstrates the problem [even if my analysis of the cause is wrong :)]. The file temp5.doc was generated with Aspose.Word. The first word on line 2, "initial", was a mail-merge field. In the Word document it is correctly left-justified. In temp5.pdf there is a space in the first column of the second line.

The problem of multiple runs shows up differently on the transition from line 2 to line 3: the text "Option A", in parentheses, is from a merge field. In the Word document the line breaks after "Option", and "A)" is kept together on the next line. In the PDF, "A" appears before the line break and the adjacent closing paren appears on the next line.

How can I combine adjacent runs with identical formatting, or otherwise work around this problem?

Thanks.

-Charu Tevari


#2

On re-examining the pre-PDF XML, it appears that the algorithm for deciding which run ends up containing the spaces adjacent to filled merge fields is more complex than I assumed in my note above.

The problems resulting from too many runs around a soft line break remain and I look forward to their resolution.

Thanks.

-Charu


#3

Have you guys had a chance to look at this problem, which shows up as incorrect formatting of soft line breaks in text when converted to PDF? [see attached files temp5.doc, temp5.xml, and temp5.pdf].

Is this more of a PDF layout problem? Should I cross post this to the Aspose.PDF forum?

Thanks.

-Charu


#4

That is a known problem. We are discussing it right now with the Aspose.Pdf team. I will inform you of the results.


#5

Combining runs with identical formatting is certainly on our task list. However, this will only partially cure the problem you mentioned. The real problem is in Aspose.Pdf’s line-breaking algorithm, which evidently does things like starting a new line with a space or wrapping just a sentence-ending period onto a new line. Therefore I’m transferring this matter to the Aspose.Pdf team so that they can improve the line-breaking algorithm.
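
As an illustration only, the behaviour one would expect from the line breaker can be sketched in Python as below. This is not Aspose.Pdf's actual algorithm; the `wrap` function and its `width` parameter (measured in characters rather than real glyph widths) are assumptions made for the example. The idea is to find break points in the joined text of all segments, so that segment boundaries never create a break opportunity and whitespace at a break is dropped.

```python
def wrap(segments, width):
    """Greedy word wrap over the joined text of all segments.

    Breaks happen only at whitespace in the joined text, so a segment
    boundary alone never causes a break, and no line starts with a space.
    """
    words = "".join(segments).split()
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            # The word fits, or the line is empty and must take it anyway.
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


# The three segments from the first post, with room for 32 characters per line.
print(wrap(["text_before ", "fld fill with blanks", " text_after"], 32))
# prints: ['text_before fld fill with blanks', 'text_after']

# A merge-field value in parentheses: "A" and ")" stay together.
print(wrap(["see (", "Option A", ") here"], 11))
# prints: ['see (Option', 'A) here']
```

In both cases the second line starts at the left margin, and the closing parenthesis is not separated from the "A" it follows.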