Hi,
I use Aspose.Words to manipulate a Word file and then try to extract the manipulated text from that file using
document.ToString(SaveFormat.Text)
But it removes all the spaces. I need to keep the spaces in the extracted text. Is there any way to do this?
@zhinous.shokrani Could you please attach your input, output and expected output documents here for our reference? We will check the issue and provide you more information.
The files I tried are below (I have the issue with the docx one: spaces are not preserved when reading it as text):
input2.pdf (1.5 MB)
input2.docx (71.3 KB)
Aspose.Words.Document document = new Aspose.Words.Document(@"c:\temp\input2.docx");
var text = document.ToString(Aspose.Words.SaveFormat.Text).Replace("\r", "");
I get this output (page 1 copied here) when I try the above code:
ONTARIO[SEAL]Court File Number
(Name of court)atForm 8: Application (General)Court office addressApplicant(s)Applicant(s) LawyerFull legal name:Name:Address:Address:Phone & fax:Phone & fax:Email:Email:Respondent(s)Respondent(s) LawyerFull legal name:Name:Address:Address:Phone & fax:Phone & fax:Email:Email:TO THE RESPONDENT(S):A COURT CASE HAS BEEN STARTED AGAINST YOU IN THIS COURT. THE DETAILS ARE SET OUT ON THE ATTACHED PAGES.THE FIRST COURT DATE IS (date)ATa.m. p.m. or as soon as possible after that time, at: (address)
NOTE: If this is a divorce case, no date will be set unless an Answer is filed. If you have also been served with a notice of motion, there may be an earlier court date and you or your lawyer should come to court for the motion.THIS CASE IS ON THE FAST TRACK OF THE CASE MANAGEMENT SYSTEM. A case management judge will be assigned by the time this case first comes before a judge.THIS CASE IS ON THE STANDARD TRACK OF THE CASE MANAGEMENT SYSTEM. No court date has been set for this case but, if you have been served with a notice of motion, it has a court date and you or your lawyer should come to court for the motion. A case management judge will not be assigned until one of the parties asks the clerk of the court to schedule a case conference or until a motion is scheduled, whichever comes first.IF, AFTER 365 DAYS, THE CASE HAS NOT BEEN SCHEDULED FOR TRIAL, the clerk of the court will send out a warning that the case will be dismissed within 60 days unless the parties file proof that the case has been settled or one of the parties asks for a case or a settlement conference. IF YOU WANT TO OPPOSE ANY CLAIM IN THIS CASE, you or your lawyer must prepare an Answer (Form 10 – a blank copy should be attached), serve a copy on the applicant(s) and file a copy in the court office with an Affidavit of Service (Form 6B). YOU HAVE ONLY 30 DAYS AFTER THIS APPLICATION IS SERVED ON YOU (60 DAYS IF THIS APPLICATION IS SERVED ON YOU OUTSIDE CANADA OR THE UNITED STATES) TO SERVE AND FILE AN ANSWER. IF YOU DO NOT, THE CASE WILL GO AHEAD WITHOUT YOU AND THE COURT MAY MAKE AN ORDER AND ENFORCE IT AGAINST YOU.
Whereas if I try the PDF version of the form, I get all the spaces:
Aspose.Pdf.Document document = new Aspose.Pdf.Document(@"c:\temp\input2.pdf");
var textAbsorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
document.Pages.Accept(textAbsorber);
var text = textAbsorber.Text.Replace("\r", "");
Output for PDF has spaces:
ONTARIO
Court File Number
(Name of court)
Form 6B: Affidavit of Service
at sworn/affirmed
Court office address
Applicant(s)
Full legal name & address for service — street & number, municipality, Lawyer’s name & address — street & number, municipality, postal
postal code, telephone & fax numbers and e-mail address (if any). code, telephone & fax numbers and e-mail address (if any).
Respondent(s)
Full legal name & address for service — street & number, municipality, Lawyer’s name & address — street & number, municipality, postal
postal code, telephone & fax numbers and e-mail address (if any). code, telephone & fax numbers and e-mail address (if any).
My name is (full legal name)
I live in (municipality & province)
and I swear/affirm that the following is true:
1. On (date) , at (time) , I served (name of person to be served)
with the following document(s) in this case:
Date when document signed, issued,
Name of document Author (if applicable)
sworn, etc.
List the
documents
served
NOTE: You can leave out any part of this form that is not applicable.
@zhinous.shokrani Your document contains tables, so please try using the TxtSaveOptions.PreserveTableLayout option:
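using Aspose.Words;
using Aspose.Words.Saving;
Document doc = new Document(@"C:\Temp\in.docx");
TxtSaveOptions opt = new TxtSaveOptions();
// Keep table cells laid out in columns in the plain-text output.
opt.PreserveTableLayout = true;
string text = doc.ToString(opt);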
Thank you.
It works now when saving with the table layout preserved.