Free Support Forum - aspose.com

Words are cut when converting HTML to DOCX

Hello,


We have a problem when converting HTML to DOCX. When a line ends in MS Word, the word at the end of the line is cut.

Could you advise us on how to avoid the words being cut?

Source code:

using System.IO;
using System.Text;
using Aspose.Words;
// Assumption: ZipFile and ZipEntry come from SharpZipLib.
using ICSharpCode.SharpZipLib.Zip;

public void htmltodocxtest()
{
    ZipFile zip = new ZipFile(@"D:\Temp\80098285_002\80098285_002.zip");
    using (MemoryStream htmlDoc = new MemoryStream())
    {
        // Collect the content of every HTML entry in the archive.
        foreach (ZipEntry zipEntry in zip)
        {
            if (!zipEntry.IsDirectory)
            {
                string name = zipEntry.Name.ToUpper();
                if (name.EndsWith(".HTML") || name.EndsWith(".HTM"))
                {
                    // CopyTo reads the whole entry; a single Read call
                    // may return fewer bytes than requested.
                    using (Stream entryStream = zip.GetInputStream(zipEntry))
                    {
                        entryStream.CopyTo(htmlDoc);
                    }
                }
            }
        }

        // Conversion of HTML file
        if (htmlDoc.Length > 0)
        {
            htmlDoc.Position = 0;
            Aspose.Words.LoadOptions opt = new Aspose.Words.LoadOptions();
            Aspose.Words.DocumentBuilder builder = new Aspose.Words.DocumentBuilder();

            opt.LoadFormat = Aspose.Words.LoadFormat.Html;
            opt.Encoding = Encoding.UTF8;
            Aspose.Words.Document doc = new Aspose.Words.Document(htmlDoc, opt);
            builder.Document = doc;
            builder.PageSetup.Orientation = Orientation.Portrait;
            builder.PageSetup.PaperSize = PaperSize.A4;

            // Save
            doc.Save(@"D:\Temp\80098285_002\res.docx", Aspose.Words.SaveFormat.Docx);
        }
    }
}


Regards,
Maxime


Hi Maxime,


Thanks for your inquiry. Please note that Aspose.Words mimics the behavior of MS Word. If you load the HTML in MS Word, you will get the same issue. Non-breaking spaces do not allow a line break between the words they join, so a long run of such words can only wrap in the middle of a word.

In your case, we suggest replacing the non-breaking spaces with normal spaces, as shown below. Hope this helps.

Document doc = new Document(MyDir + "80098285_002.html");
doc.Range.Replace(ControlChar.NonBreakingSpace, " ", false, false);
doc.FirstSection.PageSetup.Orientation = Orientation.Portrait;
doc.FirstSection.PageSetup.PaperSize = Aspose.Words.PaperSize.A4;
doc.Save(MyDir + "Out.docx");
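
To illustrate why the replacement helps, here is a minimal sketch that uses only the .NET standard library, not Aspose.Words. The class name `NbspDemo` and both helper methods are hypothetical names made up for this illustration, and counting regular spaces is only a rough model of where a layout engine may wrap a line. It assumes the words in the HTML are joined by U+00A0 characters.

```csharp
using System;

public static class NbspDemo
{
    // Hypothetical helper: swaps every non-breaking space (U+00A0)
    // for a regular space (U+0020).
    public static string ReplaceNonBreakingSpaces(string text)
    {
        return text.Replace('\u00A0', ' ');
    }

    // Rough model of the places where a line may wrap:
    // simply the number of regular spaces in the text.
    public static int CountBreakOpportunities(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == ' ') count++;
        }
        return count;
    }

    public static void Main()
    {
        // Three words glued together with non-breaking spaces.
        string glued = "one\u00A0two\u00A0three";
        string repaired = ReplaceNonBreakingSpaces(glued);

        Console.WriteLine(CountBreakOpportunities(glued));    // prints 0
        Console.WriteLine(CountBreakOpportunities(repaired)); // prints 2
    }
}
```

With Aspose.Words, the equivalent step is the doc.Range.Replace call shown in the answer, applied after the document has been loaded and before it is saved.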

Thanks for this advice.