Fill empty space with special chars

Hi,

We are generating contracts with Aspose.Words.
The law in some countries requires us to fill the empty space at the end of paragraphs with characters, so that nobody can insert text after the contract is signed.

I’m looking for a way to insert dashes or asterisks at the end of every left-aligned paragraph (excluding titles, which are centered or right-aligned), without causing the paragraph to wrap onto a new line.

I have absolutely no idea how to compute the remaining free space.
Can you point me in the right direction?

Thanks.

Hi Carlotti,

Thanks for your inquiry. Could you please attach your input and expected output Word documents here for our reference? We will then provide more information about your query, along with code.

Hi,

Here is a sample with input and expected output.

Thanks for your help.

Hi Carlotti,

Thanks for sharing the document. The Aspose.Words.Layout namespace provides classes that allow you to access information such as which page a particular document element is on and where on that page it is positioned, once the document is formatted into pages.

Please check the “RenderedDocument” example project in the Aspose.Words for .NET examples repository on GitHub. Please use the following code example to get the required output.

Hope this helps you. Please let us know if you have any more queries.

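Document doc = new Document(MyDir + "Input.docx");
// Get the third paragraph of the document (the index is zero-based)
Paragraph paragraph = (Paragraph)doc.GetChild(NodeType.Paragraph, 2, true);
// RenderedDocument comes from the "RenderedDocument" example project
RenderedDocument layoutDoc = new RenderedDocument(doc);
// Number of layout entities the paragraph occupies before filling
int lines = layoutDoc.GetLayoutEntitiesOfNode(paragraph).Count;
while (true)
{
    // Rebuild the layout after each appended dash
    layoutDoc = new RenderedDocument(doc);
    if (layoutDoc.GetLayoutEntitiesOfNode(paragraph).Count == lines + 1)
    {
        // The last dash pushed the paragraph onto a new line, so remove it
        paragraph.LastChild.Remove();
        break;
    }
    paragraph.AppendChild(new Run(doc, "-"));
}
doc.Save(MyDir + "Out.docx");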

Hi,

This looks fine functionally, but the performance is extremely poor: up to 7 seconds per line on a powerful computer. That is more than 1 minute per page!

I’ve tried to parallelize the calls, but I get errors in the new LayoutEnumerator() call inside the renderer’s constructor.

Additionally, I’ve fixed a memory issue: your static mLayoutToNodeLookup dictionary must be cleared between each test. (I do it by making RenderedDocument implement IDisposable and wrapping each call in using(), but you can simply remove the static keyword from this dictionary.)
Another thing: I would not compare with " == lines + 1" but with " != lines", as the count sometimes jumps from 1 to 3 directly.
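
To illustrate why the comparison matters, here is a minimal, self-contained sketch that does not use Aspose.Words at all. The class, the method and the simulated line counts are hypothetical; the only assumption taken from the thread is that the count can jump from 1 to 3 in a single step.

```csharp
using System;

public static class LineBreakDetection
{
    // Simulates appending one fill character at a time.
    // lineCountAfter(n) is a stand-in for "rebuild the layout and count the
    // lines after n characters have been appended".
    // Returns the number of appended characters at which 'stop' first fires,
    // or -1 if it never fires within maxAppends (a guard against looping forever).
    public static int CountAppendsUntil(
        Func<int, int> lineCountAfter,
        Func<int, int, bool> stop,
        int maxAppends)
    {
        int initialLines = lineCountAfter(0);
        for (int appended = 1; appended <= maxAppends; appended++)
        {
            if (stop(lineCountAfter(appended), initialLines))
                return appended;
        }
        return -1;
    }
}
```

With a simulated layout such as `n => n <= 10 ? 1 : 3`, the condition `(now, start) => now == start + 1` never fires and the method returns -1, while `(now, start) => now != start` fires at the 11th character. In the real loop there is no upper bound, so the strict comparison would keep appending dashes indefinitely.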

Can you provide something faster?

Thanks.

Hi Carlotti,

Thanks for your inquiry. We are working on your query and will get back to you soon.

Hi Carlotti,

Thanks for your patience. Please use the following code example to achieve your requirements. Hope this helps you.

Please note that when you create a LayoutCollector and specify a Document object to attach to, the collector records the mapping of document nodes to layout objects when the document is formatted into pages.

The LayoutEnumerator class enumerates the page layout entities of a document. You can use this class to walk over the page layout model. Available properties are the type, geometry, text and page index where the entity is rendered, as well as the overall structure and relationships. Use the combination of LayoutCollector.GetEntity and LayoutEnumerator.Current to move to the entity that corresponds to a document node.

Document doc = new Document(MyDir + "Input.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
//Get the third paragraph of document
Paragraph paragraph = (Paragraph)doc.GetChild(NodeType.Paragraph, 2, true);
paragraph.AppendChild(new Run(doc, "-"));
builder.MoveTo(paragraph.Runs[paragraph.Runs.Count - 1]);
BookmarkStart bmStart = builder.StartBookmark("bm");
builder.EndBookmark("bm");
LayoutCollector layoutCollector = new LayoutCollector(doc);
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);
var renderObject = layoutCollector.GetEntity(bmStart);
layoutEnumerator.Current = renderObject;
RectangleF location = layoutEnumerator.Rectangle;
float TopPosition = location.Top;
int i = 1;
while (true)
{
    paragraph.AppendChild(new Run(doc, "-"));
    builder.MoveTo(paragraph.Runs[paragraph.Runs.Count - 1]);
    bmStart = builder.StartBookmark("bm" + i);
    builder.EndBookmark("bm" + i);
    i++;
    layoutCollector = new LayoutCollector(doc);
    layoutEnumerator = new LayoutEnumerator(doc);
    renderObject = layoutCollector.GetEntity(bmStart);
    layoutEnumerator.Current = renderObject;
    location = layoutEnumerator.Rectangle;
    if (TopPosition != location.Top)
    {
        break;
    }
}
paragraph.LastChild.Remove();
doc.Range.Bookmarks.Clear();
doc.Save(MyDir + "Out.docx");

Hi,

Thanks for your help, but it doesn’t change anything in terms of performance.

I tried another approach: compute the “remaining space” in the current paragraph and insert a batch of characters in one call, in order to reduce the number of layout passes.

Unfortunately, I’m lost in the unit conversions…
I use this code to measure the width of one character:

var formattedText = new Aspose.Pdf.Facades.FormattedText("-", Color.Black, paragraph.ParagraphFormat.Style.Font.Name,
Aspose.Pdf.Facades.EncodingType.Winansi, false, (float)paragraph.ParagraphFormat.Style.Font.Size);
var charSpace = formattedText.TextWidth;

Then, in the while loop, I subtract the right margin and the current position from the page width, in order to compute the number of characters I need to insert.

double remainingSpace = paragraph.ParentSection.PageSetup.PageWidth - paragraph.ParentSection.PageSetup.RightMargin - location.Right;
double nbChar = remainingSpace / charSpace;

// insert
paragraph.AppendChild(new Run(doc, String.Concat(Enumerable.Repeat("-", Math.Max((int)nbChar, 1)))));

This inserts twice as many characters as needed, but I don’t understand what’s wrong.
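
For reference, the arithmetic described above can be isolated into a small helper, which makes it easier to check each input separately. This is only a sketch: the class and method names are hypothetical, and it assumes that all four values are expressed in points. Aspose.Words reports page setup values and layout rectangles in points; whether FormattedText.TextWidth uses the same unit is not confirmed anywhere in this thread.

```csharp
using System;

public static class FillEstimate
{
    // Estimates how many fill characters fit in the space left on the line.
    // Assumption: every argument is expressed in the same unit (points).
    public static int EstimateFillCount(
        double pageWidth, double rightMargin, double lastRight, double charWidth)
    {
        if (charWidth <= 0)
            throw new ArgumentOutOfRangeException(nameof(charWidth));

        // Space between the end of the existing text and the right margin.
        double remaining = pageWidth - rightMargin - lastRight;
        if (remaining <= 0)
            return 0;

        // Round down so the estimate never overshoots the line.
        return (int)Math.Floor(remaining / charWidth);
    }
}
```

For example, `FillEstimate.EstimateFillCount(600, 72, 300, 4)` returns 57. If the measured character width were half the real one (2 instead of 4), the same call would return 114, so a result that is about twice too large is consistent with a character width that is underestimated by roughly half. The sketch also ignores any right indent on the paragraph itself.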

Here is my current code:

Document doc = new Document(inputPath);

int worked = 0;
//Get the third paragraph of document
object verrou = new object();
int paragraphIndex = 0;
bool run = true;
while (run)
// ParallelUtils.While(() => run, () =>
{
    Paragraph paragraph = null;
    DocumentBuilder builder = new DocumentBuilder(doc);
    lock (verrou)
    {
        paragraphIndex++;
        paragraph = (Paragraph)doc.GetChild(NodeType.Paragraph, paragraphIndex, true);
    }
    if (paragraph == null)
    {
        run = false;
        break;
        //return;
    }

    if (paragraph.ParentStory.NodeType == NodeType.Body // in the body text
    && paragraph.Runs.Count > 0 // ignore empty lines
    && paragraph.ParagraphFormat.Style.Font.Underline == Underline.None // ignore underlined elements, which are probably titles
    && // only take left-aligned and justified paragraphs
    (paragraph.ParagraphFormat.Alignment == ParagraphAlignment.Left || paragraph.ParagraphFormat.Alignment == ParagraphAlignment.Justify))
    {
        //detect current last char position
        builder.MoveTo(paragraph.Runs[paragraph.Runs.Count - 1]);
        // insert empty bookmark
        BookmarkStart bmStart = builder.StartBookmark("bm");
        builder.EndBookmark("bm");
        // render
        LayoutCollector layoutCollector = new LayoutCollector(doc);
        LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);
        var renderObject = layoutCollector.GetEntity(bmStart);
        if (renderObject == null)
            continue; //return;
        layoutEnumerator.Current = renderObject;
        // detect item position
        RectangleF location = layoutEnumerator.Rectangle;
        float topPosition = location.Top;
        var formattedText = new Aspose.Pdf.Facades.FormattedText("-", Color.Black, paragraph.ParagraphFormat.Style.Font.Name,
        Aspose.Pdf.Facades.EncodingType.Winansi, false, (float)paragraph.ParagraphFormat.Style.Font.Size);
        var charSpace = formattedText.TextWidth + 1;
        // paragraph.ParentStory.
        int i = 1;
        // while previous top position == current top position, insert BM
        while (topPosition == location.Top)
        {

            double remainingSpace = paragraph.ParentSection.PageSetup.PageWidth - paragraph.ParentSection.PageSetup.RightMargin - location.Right;
            double nbChar = remainingSpace / charSpace;

            // insert
            paragraph.AppendChild(new Run(doc, String.Concat(Enumerable.Repeat("-", Math.Max((int)nbChar, 1)))));
            // calc again
            builder.MoveTo(paragraph.Runs[paragraph.Runs.Count - 1]);
            bmStart = builder.StartBookmark("bm" + i);
            builder.EndBookmark("bm" + i);
            i++;
            layoutCollector = new LayoutCollector(doc);
            layoutEnumerator = new LayoutEnumerator(doc);
            renderObject = layoutCollector.GetEntity(bmStart);
            layoutEnumerator.Current = renderObject;
            location = layoutEnumerator.Rectangle;
        }
        paragraph.LastChild.Remove();
        // del inserted bookmarks
        doc.Range.Bookmarks.Clear();
    }
    paragraphIndex++;
    worked++;
    Debug.WriteLine(DateTime.Now.ToString("H:mm:ss zzz") + " : " + worked);
}
//);

doc.Save(inputPath);

Hi Carlotti,

Thanks for your inquiry. Please note that an MS Word document is a flow document and does not contain any information about its layout into lines and pages. Therefore, technically there is no “Page” or “Line” concept in a Word document. Pages and lines are created by Microsoft Word on the fly.

The shared document does not use monospace fonts, so each character’s width may differ from the others. To check a character’s position, we need to build the page layout. We have modified the code example so that it builds the page layout only twice. Hope this helps you. Please let us know if you have any more queries.

Document doc = new Document(MyDir + "Input.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
//Get the third paragraph of document
Paragraph paragraph = (Paragraph)doc.GetChild(NodeType.Paragraph, 2, true);
RenderedDocument layoutDoc = new RenderedDocument(doc);
LayoutCollection<LayoutEntity> lines = layoutDoc.GetLayoutEntitiesOfNode(paragraph);
int characters = 0;
if (lines.Count > 1)
{
    characters = lines[0].Text.Length;
    for (int i = 0; i < characters; i++)
    {
        paragraph.AppendChild(new Run(doc, " -"));
        builder.MoveTo(paragraph.Runs[paragraph.Runs.Count - 1]);
        builder.StartBookmark("bm" + i);
        builder.EndBookmark("bm" + i);
    }
    LayoutCollector layoutCollector = new LayoutCollector(doc);
    LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);
    var renderObject = layoutCollector.GetEntity(doc.Range.Bookmarks["bm0"].BookmarkStart);
    layoutEnumerator.Current = renderObject;
    RectangleF location = layoutEnumerator.Rectangle;
    float TopPosition = location.Top;
    ArrayList nodes = new ArrayList();
    foreach (Bookmark bookmark in paragraph.Range.Bookmarks)
    {
        if (bookmark.Name.StartsWith("bm"))
        {
            renderObject = layoutCollector.GetEntity(bookmark.BookmarkStart);
            layoutEnumerator.Current = renderObject;
            location = layoutEnumerator.Rectangle;
            if (TopPosition != location.Top)
            {
                Node currentNode = bookmark.BookmarkStart;
                while (currentNode != paragraph.LastChild)
                {
                    currentNode = currentNode.NextSibling;
                    if (currentNode.NodeType == NodeType.Run)
                    {
                        nodes.Add(currentNode);
                    }
                }
                foreach (Node node in nodes)
                {
                    node.Remove();
                }
                break;
            }
        }
    }
}
doc.Range.Bookmarks.Clear();
paragraph.LastChild.Remove();
doc.Save(MyDir + "Out.docx");

Hi,

It took me some time to figure out how your code works. You insert a batch of “-”, then remove those that end up on a new line. That’s a very good idea.
I’ve adapted your code so that it no longer depends on the paragraph being multiline, and I’ve set a fixed number (150) of “-” to fill the paragraph.
This also removes the multiline row rendering step.

I’ve also improved the second part, which removes the extra “-”. It’s much faster now.
It’s a pity that this is not parallelizable, but it’s far better than before.
Thanks!

Here is the current code:

private static void FillParagraph(Paragraph paragraph, DocumentBuilder builder, Document doc, string caracter)
{
    // 150 characters is a full row of "-" in Times New Roman 10. Insert the character plus a bookmark used as an index
    for (int i = 0; i < 150; i++)
    {
        paragraph.AppendChild(new Run(doc, caracter));
        builder.MoveTo(paragraph.Runs[paragraph.Runs.Count - 1]);
        builder.StartBookmark("bm" + i);
        builder.EndBookmark("bm" + i);
    }
    // render
    LayoutCollector layoutCollector = new LayoutCollector(doc);
    LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);

    // get initial endofline
    var renderObject = layoutCollector.GetEntity(doc.Range.Bookmarks["bm0"].BookmarkStart);
    layoutEnumerator.Current = renderObject;
    RectangleF location = layoutEnumerator.Rectangle;
    float topPosition = location.Top;

    // iterate through bookmark to find the first which is on a new line
    Node endNode = null;
    foreach (Bookmark bookmark in paragraph.Range.Bookmarks)
    {
        if (bookmark.Name.StartsWith("bm"))
        {
            renderObject = layoutCollector.GetEntity(bookmark.BookmarkStart);
            layoutEnumerator.Current = renderObject;
            location = layoutEnumerator.Rectangle;

            // if current bm is on a new line
            if (topPosition != location.Top)
            {
                endNode = bookmark.BookmarkStart;
                break;
            }
        }
    }
    // if an end has been found
    if (endNode != null)
    {
        while (paragraph.LastChild != endNode) // del everything after this bookmark. Iterate from end to start.
            paragraph.LastChild.Remove();
        paragraph.LastChild.Remove(); // del last start bookmark
    }
    doc.Range.Bookmarks.Clear(); // clear bookmark, keep Runs
}
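
One detail in the search step is that the top positions are floats compared with !=. That worked in this case, but a small tolerance is a common precaution when comparing layout coordinates. The sketch below isolates the search over a list of top positions; the class name, the method name and the tolerance parameter are hypothetical and not part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;

public static class WrapDetection
{
    // Returns the index of the first position whose top differs from the
    // first one by more than 'tolerance', or -1 if every position is on
    // the same line (or the list is empty).
    public static int FindFirstWrapped(IReadOnlyList<float> tops, float tolerance)
    {
        if (tops.Count == 0)
            return -1;

        float firstTop = tops[0];
        for (int i = 1; i < tops.Count; i++)
        {
            if (Math.Abs(tops[i] - firstTop) > tolerance)
                return i;
        }
        return -1;
    }
}
```

In FillParagraph, the values of layoutEnumerator.Rectangle.Top collected for each bookmark would play the role of `tops`, and the returned index would identify the first bookmark that landed on a new line.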

Hi Carlotti,

Thanks for your feedback. It is nice to hear that you have achieved your requirement. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.