Application Fields/Bookmarks Losing Data (Replaced with ?)

vhattarki · February 10, 2009, 7:35am

This is urgent!
This question was originally posted in the Aspose.Editor forum, and I was asked to repost it here in the Words forum.
The original post link has a sample image showing the exact problem…

Attached is the sample document. Try opening it in Aspose.Editor: all the fields get replaced with “?”.
Let me know if there is any way to get the data behind the “?”. I am doing this for a 1200-page document, so performance is important!
If this is possible, sample code would be really nice. Based on the original post, it looks like it is (… Aspose.Words to translate fields into actual data).
Thanks,
Vinay Hattarki

vhattarki · February 10, 2009, 9:08am

I have tried the code below, but it's not working as I expected. I have a few FieldFormTextInput fields that have no name. I also tried assigning new names to them and then finding the bookmark, but it's not replacing the bookmark as expected.
Please reply with a fix; I am looking to replace the “?” with the actual data. I also referred to this link:
https://forum.aspose.com/t/112097

Dim lformFields As Aspose.Words.Fields.FormFieldCollection = mobjMainAsposeDoc.Range.FormFields
Dim lobjDocBuilder As Aspose.Words.DocumentBuilder = New DocumentBuilder(mobjMainAsposeDoc)
Dim liFieldNo As Integer = 1
For Each lField As Aspose.Words.Fields.FormField In lformFields
    If lField.Type = Fields.FieldType.FieldFormTextInput Then
        If lField.Name <> "" Then
            If liFieldNo > 1 Then
                lobjDocBuilder.MoveToBookmark(lField.Name)
            End If
        Else
            lField.Name = "FieldNo" & liFieldNo
            lobjDocBuilder.MoveToBookmark(lField.Name)
        End If
        lobjDocBuilder.Write(lField.Result)
        'lField.Result = String.Empty
        liFieldNo = liFieldNo + 1
    End If
Next

alexey.noskov · February 10, 2009, 9:15am

Hi
Thanks for your request. You can try using the following code to unlink fields in your document and remove bookmarks:

public void Test152()
{
    // Open document
    Document doc = new Document(@"Test152\in.doc");
    // Unlink fields
    UnlinkFields(doc);
    // Also we should remove bookmarks
    doc.Range.Bookmarks.Clear();
    // Save the document
    doc.Save(@"Test152\out.doc");
}
private void UnlinkFields(Document doc)
{
    // Get collection of FieldStart nodes
    NodeCollection fieldStarts = doc.GetChildNodes(NodeType.FieldStart, true);
    // Get collection of FieldSeparator nodes
    NodeCollection fieldSeparators = doc.GetChildNodes(NodeType.FieldSeparator, true);
    // And get collection of FieldEnd nodes
    NodeCollection fieldEnds = doc.GetChildNodes(NodeType.FieldEnd, true);
    // Loop through all FieldStart nodes
    foreach (FieldStart start in fieldStarts)
    {
        // Search for the FieldSeparator node; it marks the end of the field code to remove
        Node curNode = start;
        while (curNode.NodeType != NodeType.FieldSeparator && curNode.NodeType != NodeType.FieldEnd)
        {
            curNode = curNode.NextPreOrder(doc);
            if (curNode == null)
                break;
        }
        // Remove all nodes between FieldStart and FieldSeparator (or FieldEnd, depending on field type)
        if (curNode != null)
        {
            RemoveSequence(start, curNode);
        }
    }
    // Now we can remove FieldStart, FieldSeparator and FieldEnd nodes
    fieldStarts.Clear();
    fieldSeparators.Clear();
    fieldEnds.Clear();
}
/// <summary>
/// Remove all nodes between start and end nodes, except start and end nodes
/// </summary>
/// <param name="start">The start node</param>
/// <param name="end">The end node</param>
private void RemoveSequence(Node start, Node end)
{
    Node curNode = start.NextPreOrder(start.Document);
    while (curNode != null && !curNode.Equals(end))
    {
        // Move to next node
        Node nextNode = curNode.NextPreOrder(start.Document);
        // Check whether current contains end node
        if (curNode.IsComposite)
        {
            if (!(curNode as CompositeNode).ChildNodes.Contains(end) &&
            !(curNode as CompositeNode).ChildNodes.Contains(start))
            {
                nextNode = curNode.NextSibling;
                curNode.Remove();
            }
        }
        else
        {
            curNode.Remove();
        }
        curNode = nextNode;
    }
}

Hope this could help you.
Best regards.

vhattarki · February 10, 2009, 9:30am

As I understand it, this code removes the nodes and clears out the bookmarks. Won't it also clear out the data in the fields?
I want to keep the data that is behind the “?” and show it instead of the “?”.
Please update.

alexey.noskov · February 10, 2009, 9:41am

Hi
Thanks for your inquiry. This code will not remove the values of fields or the text between BookmarkStart and BookmarkEnd, so the content of the document will not be distorted.
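To illustrate why the field values survive, here is a toy model of a field's node sequence. It is a simplified sketch and not the Aspose.Words API: the `Kind` enum and the `Unlink` helper are hypothetical names introduced only for this illustration, and nested fields are ignored. A field is stored as FieldStart, field code, FieldSeparator, field result, FieldEnd; unlinking drops the code and the three marker nodes and keeps the result.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy node kinds for illustration; NOT the Aspose.Words NodeType enum.
enum Kind { FieldStart, FieldSeparator, FieldEnd, Text }

static class UnlinkDemo
{
    // Hypothetical helper: drops field codes and field marker nodes,
    // keeps field results and all text outside fields.
    public static string Unlink(IEnumerable<(Kind Type, string Text)> nodes)
    {
        var kept = new StringBuilder();
        bool inFieldCode = false;
        foreach (var node in nodes)
        {
            switch (node.Type)
            {
                case Kind.FieldStart:
                    inFieldCode = true;   // the field code follows
                    break;
                case Kind.FieldSeparator:
                case Kind.FieldEnd:
                    inFieldCode = false;  // the field code is over
                    break;
                default:
                    if (!inFieldCode) kept.Append(node.Text);
                    break;
            }
        }
        return kept.ToString();
    }

    static void Main()
    {
        var nodes = new List<(Kind, string)>
        {
            (Kind.Text, "Name: "),
            (Kind.FieldStart, ""),
            (Kind.Text, " FORMTEXT "),   // field code, removed
            (Kind.FieldSeparator, ""),
            (Kind.Text, "John Smith"),   // field result, kept
            (Kind.FieldEnd, ""),
            (Kind.Text, ".")
        };
        Console.WriteLine(Unlink(nodes)); // prints "Name: John Smith."
    }
}
```

A field without a separator (FieldStart, code, FieldEnd) has no result, so in this model nothing of it remains after unlinking.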
Best regards.

vhattarki · February 10, 2009, 10:45am

This actually removes data apart from the text fields. I converted this C# code to VB.NET, and it seems to remove more than expected.
Any idea why it's doing that?

alexey.noskov · February 10, 2009, 2:30pm

Hi
Thanks for your request. Could you please attach the document you are using for testing and the output document? I will investigate the issue and provide you with more information.
Best regards.

vhattarki · February 10, 2009, 3:01pm

Hello,
By the way, I attached the sample document to my previous posts. I think I have fixed the previous issue. Please review the code below, which shows the actual data and removes the “?”. I see only one problem: I am processing a 1200-page document that uses the Shape class to draw a line shape, and I am unable to get the line to show in the Editor.
Is there any way to get the line displayed by tweaking the VB code below? Also, please review the code and suggest any improvements.
Thanks

Private Sub AsposeUnlinkFormFieldData()
    Dim lFieldStarts As Aspose.Words.NodeCollection = mobjMainAsposeDoc.GetChildNodes(NodeType.FieldStart, True)
    Dim lFieldSeparators As Aspose.Words.NodeCollection = mobjMainAsposeDoc.GetChildNodes(NodeType.FieldSeparator, True)
    Dim lFieldEnds As Aspose.Words.NodeCollection = mobjMainAsposeDoc.GetChildNodes(NodeType.FieldEnd, True)
    For Each lStart As Aspose.Words.Fields.FieldStart In lFieldStarts
        Dim lCurNode As Aspose.Words.Node = lStart
        While (lCurNode.NodeType <> NodeType.FieldSeparator And lCurNode.NodeType <> NodeType.FieldEnd)
            lCurNode = lCurNode.NextPreOrder(mobjMainAsposeDoc)
            If IsNothing(lCurNode) = True Then
                Exit While
            End If
        End While
        If IsNothing(lCurNode) = False Then
            AsposeRemoveSequence(lStart, lCurNode)
        End If
    Next
    lFieldStarts.Clear()
    lFieldSeparators.Clear()
    lFieldEnds.Clear()
End Sub
Private Sub AsposeRemoveSequence(ByVal pStart As Aspose.Words.Node, ByVal pEnd As Aspose.Words.Node)
    Dim lCurNode As Aspose.Words.Node = pStart.NextPreOrder(pStart.Document)
    While (IsNothing(lCurNode) = False)
        Dim lNextNode As Aspose.Words.Node = lCurNode.NextPreOrder(pStart.Document)
        If lCurNode.IsComposite = False Then
            If Trim(lCurNode.Range.Text).ToUpper = "FORMTEXT" Then
                lCurNode.Range.Delete()
            End If
            If lCurNode.NodeType = NodeType.FormField Then
                lCurNode.Remove()
            End If
        End If
        lCurNode = lNextNode
    End While
End Sub

alexey.noskov · February 11, 2009, 12:44am

Hi
Thanks for your request. Here is the VB equivalent of my code:

Private Sub UnlinkFields(ByVal doc As Document)
    'Get collection of FieldStart nodes
    Dim fieldStarts As NodeCollection = doc.GetChildNodes(NodeType.FieldStart, True)
    'Get collection of FieldSeparator nodes
    Dim fieldSeparators As NodeCollection = doc.GetChildNodes(NodeType.FieldSeparator, True)
    'And get collection of FieldEnd nodes
    Dim fieldEnds As NodeCollection = doc.GetChildNodes(NodeType.FieldEnd, True)
    'Loop through all FieldStart nodes
    For Each start As FieldStart In fieldStarts
        'Search for the FieldSeparator node; it marks the end of the field code to remove
        Dim curNode As Node = start
        While (Not curNode.NodeType.Equals(NodeType.FieldSeparator) And Not curNode.NodeType.Equals(NodeType.FieldEnd))
            curNode = curNode.NextPreOrder(doc)
            If (curNode Is Nothing) Then
                Exit While
            End If
        End While
        'Remove all nodes between FieldStart and FieldSeparator (or FieldEnd, depending on field type)
        If (Not curNode Is Nothing) Then
            RemoveSequence(start, curNode)
        End If
    Next
    'Now we can remove FieldStart, FieldSeparator and FieldEnd nodes
    fieldStarts.Clear()
    fieldSeparators.Clear()
    fieldEnds.Clear()
End Sub
''' <summary>
''' Remove all nodes between start and end nodes, except start and end nodes
''' </summary>
''' <param name="startNode">The start node</param>
''' <param name="endNode">The end node</param>
Private Sub RemoveSequence(ByVal startNode As Node, ByVal endNode As Node)
    Dim curNode As Node = startNode.NextPreOrder(startNode.Document)
    'AndAlso short-circuits, so Equals is never called when curNode is Nothing
    While (curNode IsNot Nothing AndAlso Not curNode.Equals(endNode))
        'Move to next node
        Dim nextNode As Node = curNode.NextPreOrder(startNode.Document)
        'Check whether current contains end node
        If (curNode.IsComposite) Then
            If (Not (DirectCast(curNode, CompositeNode)).ChildNodes.Contains(endNode) AndAlso _
                Not (DirectCast(curNode, CompositeNode)).ChildNodes.Contains(startNode)) Then
                nextNode = curNode.NextSibling
                curNode.Remove()
            End If
        Else
            curNode.Remove()
        End If
        curNode = nextNode
    End While
End Sub

Regarding the line, could you please show me the code you are using to insert this shape? Also, please attach a document that shows the expected output.
Best regards.

vhattarki · February 11, 2009, 9:48am

This is the final test, and it is urgent!
I used the VB.NET code you provided and it seems to work fine. I have the same code running on a large document that I build by appending on the fly, and it gives the error below when I try to open the document in the Editor, whereas I was able to save that 670-page document and open it in MS Word. (Small documents do open in the Editor.) It looks like some text form field is still present and not getting removed, but then it should have shown as “?”.

FileCorruptedException : The document does not appear to be a valid WordprocessingML document or contains unsupported elements.

Here is how I open the document in the Editor once my object is saved.

objMainAsposeDoc.Save(System.IO.Path.GetTempPath & "PrintBySession.Doc", SaveFormat.Doc)
AsposeEditor1.Open(System.IO.Path.GetTempPath & "PrintBySession.Doc")

Also, the line shape still does not appear in the Editor, although the line does show in MS Word. Here is how I create the line.

Dim lobjLineDoc As New Aspose.Words.Document
Dim lobjDocBuilder As Aspose.Words.DocumentBuilder = New DocumentBuilder(lobjLineDoc)
Dim lWidth As Double = lobjDocBuilder.CurrentSection.PageSetup.PageWidth
Dim lLine As New Aspose.Words.Drawing.Shape(lobjLineDoc, Drawing.ShapeType.Line)
lLine.Width = lWidth / 2
'lLine.RelativeHorizontalPosition = Drawing.RelativeHorizontalPosition.Page
lLine.HorizontalAlignment = Drawing.HorizontalAlignment.Center
'lLine.RelativeVerticalPosition = Drawing.RelativeVerticalPosition.Paragraph
lLine.StrokeColor = Color.Black
lLine.Stroke.LineStyle = Drawing.ShapeLineStyle.Single
lLine.StrokeWeight = 1
'lobjDocBuilder.CurrentSection.PageSetup.SectionStart = SectionStart.Continuous
lobjDocBuilder.InsertNode(lLine)
lobjLineDoc.FirstSection.PageSetup.SectionStart = SectionStart.Continuous
lobjLineDoc.FirstSection.PageSetup.PaperSize = Aspose.Words.PaperSize.Legal
mobjMainAsposeDoc.AppendDocument(lobjLineDoc, ImportFormatMode.UseDestinationStyles)

Thanks,
Vinay Hattarki

alexey.noskov · February 11, 2009, 10:40am

Hi
Thanks for your request. Could you please attach this corrupted document for testing? I will investigate the issue and provide you with more information.
Best regards,

vhattarki · February 11, 2009, 12:15pm

As requested, attached is the 670-page document. This document is built in a for loop, where each part is appended from the database. This works fine when I process around 50-100 pages with a different sample, but when I process this large 670-page document, it gives an error on opening in the Editor.
I am running the same code you provided to unlink the fields. The document saves fine with Aspose.Words; the issue is with opening it in Aspose.Editor.
Thanks,

vhattarki · February 11, 2009, 12:19pm

I also tried opening this document with the Demo Editor (Windows Forms Demo), and it gives an error. Attached is a GIF image of the error, in case it helps.
Thanks,
Vinay Hattarki

alexey.noskov · February 11, 2009, 2:26pm

Hi
Thank you for the additional information. Please try saving the document in WordML format instead of DOC.
objMainAsposeDoc.Save(System.IO.Path.GetTempPath & "PrintBySession.xml", SaveFormat.WordML)
AsposeEditor1.Open(System.IO.Path.GetTempPath & "PrintBySession.xml")
Hope this helps.
Best regards.

vhattarki · February 11, 2009, 2:46pm

Thanks for the update!
I tried this, and it errors out on Editor.Open with:
FileCorruptedException was caught… The document does not appear to be a valid WordprocessingML document or contains unsupported elements.
Did it work for you with the sample document I provided? I expect not. I am not even able to open this document with the Demo Editor.
Any update on the line/shape issue?
Thanks,
Vinay Hattarki

alexey.noskov · February 11, 2009, 3:29pm

Yes, I can reproduce the problem on my side. I think you should ask this question in the Aspose.Editor forum. Our colleagues will answer you shortly.
Have you tried opening this document without first processing it with the code that removes fields and bookmarks? Can Aspose.Editor open it then?
Best regards.

vhattarki · February 13, 2009, 12:27pm

Hello,
I tried to save the large document attached to my earlier post to TIFF as an image and received the error below.

System.Runtime.InteropServices.ExternalException: A generic error occurred in GDI+.
   at System.Drawing.Image.SaveAdd(Image image, EncoderParameters encoderParams)
   at ڲ.ᫀ.᫈(Image ᫅, ᫗ ᫉)
   at ≠.䀂.䀈(Int32 Ԝ, Int32 Ԟ, Stream Ԏ)
   at ≠.䀂.䀇(Int32 Ԝ, Int32 Ԟ, Stream Ԏ, ImageFormat ڷ, ImageOptions ᩀ)
   at Aspose.Words.Document.SaveToImage(Int32 pageIndex, Int32 pageCount, Stream stream, ImageFormat imageFormat, ImageOptions options)
   at Aspose.Words.Document.SaveToImage(Int32 pageIndex, Int32 pageCount, String fileName, ImageOptions options)
   at Verdict.frmPrintMinuteAspose.ubtnDisplay_Click(Object sender, EventArgs e)

The error occurred in the code below while trying to save the object to TIFF. Saving the object as DOC works fine.

Dim lObjImageOption As New Aspose.Words.Rendering.ImageOptions
lObjImageOption.TiffCompression = TiffCompression.None
mobjMainAsposeDoc.SaveToImage(0, mobjMainAsposeDoc.PageCount, System.IO.Path.GetTempPath & "PrintBySession.TIF", lObjImageOption)

Thanks,
Vinay Hattarki

alexey.noskov · February 13, 2009, 2:37pm

Hi

Thanks for your request. I managed to reproduce the problem and created a new issue, #7536, in our defect database. I will notify you as soon as it is fixed.
In the meantime, you can try using Lzw compression:

Document doc = new Document(@"Test168\Aspose+670+Pages+PrintBySession.Doc");
ImageOptions opt = new ImageOptions();
opt.TiffCompression = TiffCompression.Lzw;
doc.SaveToImage(0, doc.PageCount, @"Test168\out.tif", opt);

Best regards.

romank · February 13, 2009, 8:04pm

Hi Vinay,
Thank you for your interest in Aspose.Words and Aspose.Editor.
I hope that by now you have realized that these are two different products, supported and developed by two different teams. It will help greatly if you distinguish whether your question is about Words or Editor and post in the appropriate forums.
Another thing I wanted to mention is that 1200-page and even 670-page documents are quite big for our products in some use cases.
A good maximum document size for Aspose.Editor is probably around 100 pages at the moment. The same guideline applies to Aspose.Words rendering to PDF or TIFF at the moment. As we optimize performance further, we will be able to process bigger documents with ease. Other conversions in Aspose.Words that do not require rendering are capable of handling much bigger documents.
Converting a 670-page document to TIFF without compression probably puts a tremendous amount of strain on the system. We use the .NET TIFF codec to write images in the TIFF format. I am not sure whether it keeps all data in memory before writing; quite possibly it does. If this is the case, then keeping so many uncompressed TIFF images in memory will certainly kill the system.
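One way to stay near the roughly 100-page guideline is to render a large document in page batches, one TIFF file per batch. The sketch below only computes the page ranges; `PageBatching` and `Batches` are hypothetical names introduced for illustration, and the batch size of 100 is an assumption taken from the guideline above. Whether batching avoids the GDI+ error has not been tested here. The commented call mirrors the `SaveToImage(pageIndex, pageCount, fileName, options)` overload already used in this thread.

```csharp
using System;
using System.Collections.Generic;

static class PageBatching
{
    // Hypothetical helper: splits [0, pageCount) into consecutive ranges
    // of at most pagesPerFile pages, as (first page index, page count).
    public static IEnumerable<(int First, int Count)> Batches(int pageCount, int pagesPerFile)
    {
        if (pagesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        for (int first = 0; first < pageCount; first += pagesPerFile)
            yield return (first, Math.Min(pagesPerFile, pageCount - first));
    }

    static void Main()
    {
        // 670 pages as in this thread; 100 pages per file is an assumption.
        foreach (var batch in Batches(670, 100))
        {
            Console.WriteLine("first page index {0}, page count {1}", batch.First, batch.Count);
            // With Aspose.Words, each batch would go to its own file, e.g.:
            // doc.SaveToImage(batch.First, batch.Count, fileName, options);
        }
    }
}
```

For 670 pages and 100 pages per file this yields 7 batches; the last one starts at page index 600 and holds 70 pages.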

romank · February 13, 2009, 8:21pm

Hi Vinay,
Let me point out that converting just 10 pages of your document to TIFF with no compression works fine, but creates a TIFF file that is 42 MB in size!
Do you realize that converting all 670 pages will create a TIFF file that is about 2814 MB (67 × 42 MB) in size? That is 2.8 GB. Is that what you want?
Given that the exception occurs in the .NET code and only when processing this extremely big file with no compression, I conclude that the .NET TIFF codec has some sort of limitation and throws an exception. So I doubt we can fix this problem, and I am therefore closing the issue.
Is it possible for you to specify compression? A black-and-white fax compression, CCITT3 or CCITT4, will work here. It created only a 220 KB file instead of 42 MB for 10 pages.
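The size estimate above is a plain linear extrapolation from the 10-page sample. A minimal sketch of that arithmetic follows; `TiffSizeEstimate` and `EstimateMb` are hypothetical names, and the assumption that file size grows linearly with page count is only a rough approximation, especially for compressed output.

```csharp
using System;
using System.Globalization;

static class TiffSizeEstimate
{
    // Linear extrapolation: scales a measured sample size to the full page count.
    public static double EstimateMb(double sampleMb, int samplePages, int totalPages)
    {
        return sampleMb * totalPages / samplePages;
    }

    static void Main()
    {
        // Figures from this thread: 10 pages -> 42 MB uncompressed, 220 KB with CCITT.
        double uncompressedMb = EstimateMb(42.0, 10, 670);
        double ccittMb = EstimateMb(220.0 / 1024.0, 10, 670);

        Console.WriteLine(uncompressedMb.ToString("F0", CultureInfo.InvariantCulture) + " MB"); // prints "2814 MB"
        Console.WriteLine(ccittMb.ToString("F1", CultureInfo.InvariantCulture) + " MB");        // prints "14.4 MB"
    }
}
```

Under the same assumption, the CCITT-compressed output for all 670 pages would be on the order of 14 MB rather than 2.8 GB.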