Words for .NET output questions

doc-gen format error.zip (2.1 MB)

Please see the client's questions below and help. Thank you!

We are currently experiencing issues with Aspose when converting Excel and Word files to PDF, specifically differences in styling and layout distortion.

Please refer to the two attached files within the Excel and Word data for comparison:

  • OutPut_Expected.png: This shows the expected output.
  • OutPut_Aspose.png: This shows the output generated by Aspose.

We expect the files converted by Aspose to match the format and styling of OutPut_Expected.png identically. We have also included the conversion code snippet for your reference.

@saphira_linksoft_com_tw

Cause

The differences in styling and layout distortion when converting Excel and Word files to PDF using Aspose can arise from various factors, including font availability, document structure, and specific formatting settings in the source files.

Solution

To address these issues, consider the following steps:

  1. Check Font Availability: Ensure that all fonts used in the original documents are installed on the server where the conversion is taking place. Missing fonts can lead to unexpected styling changes.

  2. Use the Latest Version: Make sure you are using the latest version of Aspose.Words and Aspose.Cells. Updates often include bug fixes and improvements related to document conversion.

  3. Adjust Layout Options: Review and adjust the layout options in your conversion code. For example, you can set the PdfSaveOptions to preserve the layout more accurately.

  4. Test with Simplified Documents: If possible, create simplified versions of your documents to isolate the issue. This can help identify specific elements causing the distortion.

Code

Here’s a sample code snippet to convert a Word document to PDF while preserving layout:

using Aspose.Words;
using Aspose.Words.Saving;

Document doc = new Document("input.docx");
// PdfSaveOptions always targets PDF, so SaveFormat does not need to be set.
// PrettyFormat is omitted: it only affects text-based output formats, not PDF layout.
PdfSaveOptions options = new PdfSaveOptions();
doc.Save("output.pdf", options);

For Excel files, you can use:

Workbook workbook = new Workbook("input.xlsx");
workbook.Save("output.pdf", SaveFormat.Pdf);

Next Steps

  1. Implement the suggested changes and test the conversion again.
  2. If issues persist, please share the specific code snippet you are using for conversion, along with any additional details about the documents.
  3. Review the output files closely to identify any specific elements that are not rendering as expected.

Please proceed with these steps, and let me know if you need further assistance!

@saphira_linksoft_com_tw I cannot reproduce the problem on my side using the latest 25.7 version of Aspose.Words. Here is the PDF document produced by MS Word on my side:
ms.pdf (155.0 KB)

Here is the output produced by Aspose.Words and the following simple code:

Document doc = new Document(@"C:\Temp\in.docx");
doc.Save(@"C:\Temp\out.pdf");

out.pdf (62.1 KB)

The output differs slightly from the MS Word output, though. To get closer to the MS Word output, OpenType features must be enabled. The Aspose.Words.Shaping.HarfBuzz package provides support for OpenType features in Aspose.Words using the HarfBuzz text shaping engine. To enable them, add a reference to the Aspose.Words.Shaping.HarfBuzz plugin and use the following code to convert your document:

Document doc = new Document(@"C:\Temp\in.docx");
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
doc.Save(@"C:\Temp\out_HarfBuzz.pdf");

out_HarfBuzz.pdf (68.2 KB)

In addition, the problem can occur on your side because the fonts used in your original document are not available in the environment where the document is converted to PDF. The fonts are required to build the document layout. If Aspose.Words cannot find a font used in the document, the font is substituted. This can lead to a font mismatch and to layout differences, because the substitute font has different metrics. You can implement IWarningCallback to get notifications when font substitution is performed.
Please see our documentation to learn where Aspose.Words looks for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/
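As a minimal sketch of the IWarningCallback approach mentioned above, the following collects font substitution warnings while the document is saved. The class name FontSubstitutionWarningCallback and the file paths are illustrative assumptions; writing to the console is only one possible way to report the warnings.

```csharp
using System;
using Aspose.Words;

// Hypothetical helper class: reports every font substitution warning.
public class FontSubstitutionWarningCallback : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        // Only font substitution warnings are of interest here.
        if (info.WarningType == WarningType.FontSubstitution)
            Console.WriteLine("Font substitution: " + info.Description);
    }
}

public static class Program
{
    public static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx");
        // Assign the callback before saving so that warnings raised
        // while the layout is built are reported.
        doc.WarningCallback = new FontSubstitutionWarningCallback();
        doc.Save(@"C:\Temp\out.pdf");
    }
}
```

If any substitutions are reported, the listed fonts are the ones to install or to make available to Aspose.Words in the conversion environment.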

I will move the topic into the Aspose.Total category. My colleagues from Aspose.Cells will answer the question about Excel to PDF conversion.

@saphira_linksoft_com_tw,

I reviewed the issue using the sample Excel file you provided regarding the Aspose.Cells functionality for Excel to PDF conversion. It appears that the “Rows to repeat at top” option (found under the “Page Setup | Sheet” tab) is configured for a specific range of rows. As a result, some title rows are being repeated on the second page. Additionally, certain content using the “Leelawadee UI” font in some cells is not being displayed completely or properly. This occurs because the relevant rows and columns are set to “automatic.” When the file is opened in MS Excel, the auto-fit rows/columns option is applied to ensure the content is fully displayed. To address these issues, kindly replace the relevant code segment:

....
var ws = workbook.Worksheets[0];
ws.PageSetup.PrintGridlines = true;
var data = JsonConvert.DeserializeObject<Dictionary<string, string>>(json);

foreach (var pair in data)
{
    ws.Cells[pair.Key].PutValue(pair.Value ?? "");
}

with:

....
var ws = workbook.Worksheets[0];
ws.PageSetup.PrintGridlines = true;
ws.PageSetup.PrintTitleRows = null;
var data = Newtonsoft.Json.JsonConvert.DeserializeObject<Dictionary<string, string>>(json);

foreach (var pair in data)
{
    ws.Cells[pair.Key].PutValue(pair.Value ?? "");
}
ws.AutoFitColumns();
ws.AutoFitRows();

Kindly add the suggested lines of code to your snippet, and it should work as expected. I tested it, and the output PDF file (attached) appears to be fine.
out1.pdf (59.2 KB)

Let us know if you still find any issue with Aspose.Cells APIs.

Please help with the client's questions below. Thank you!

I don’t see support for the following syntax. Does this need to be installed separately?

I’ve already installed the corresponding fonts on Linux, and the layout is very close to what I expected. However, I still have the following issue (as shown in the attachment): Template_2.pdf (104.9 KB)

It should display ‘£ Applicable ¢ Non-applicable’ but instead it shows ‘▼’
‘Security over Immovable Property: £Applicable ¢Non-applica’

@saphira_linksoft_com_tw The Aspose.Words.Shaping.HarfBuzz package provides support for OpenType features in Aspose.Words using the HarfBuzz text shaping engine. You should enable OpenType features. To do so, add a reference to the Aspose.Words.Shaping.HarfBuzz plugin and use the following code to convert your document:

Document doc = new Document(@"C:\Temp\in.docx");
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
doc.Save(@"C:\Temp\out_HarfBuzz.pdf");

On Windows, no additional effort is required to install HarfBuzz, because Aspose.Words.Shaping.HarfBuzz already includes a compiled HarfBuzz library.

On other systems, Aspose.Words.Shaping.HarfBuzz relies on an already installed HarfBuzz library. For instance, many Linux-based systems have HarfBuzz installed system-wide by default. If not, a package is usually available through the package manager.

For example, in a clean Ubuntu Docker image, HarfBuzz must be installed additionally using a command like this:

RUN apt-get update && apt-get install -y libharfbuzz-dev

Good news: The bounding boxes are now appearing (see attachment for details on the regeneration results). Bad news: There are still slight discrepancies.

My current configuration has been fully adjusted according to your suggestions, as shown in
Template_0730_1.pdf (137.6 KB)

<PackageReference Include="Aspose.Cells" Version="25.6.0" />
<PackageReference Include="Aspose.Words" Version="25.7.0" />
<PackageReference Include="Aspose.Words.Shaping.HarfBuzz" Version="25.7.0" />

The program is configured as follows:

/// <summary>
/// Converts a Word document to a PDF document
/// </summary>
/// <param name="sourceStream">Word document stream</param>
/// <returns>PDF document stream</returns>
public static Stream Word2Pdf(Stream sourceStream)
{
    var fontSettings = new FontSettings();
    var doc = new Document(sourceStream);
    doc.FontSettings = fontSettings;
    doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
    sourceStream.Close();
    var destStream = new MemoryStream();
    doc.Save(destStream, SaveFormat.Pdf);
    destStream.Position = 0;
    return destStream;
}

The output boxes are appearing, but they are not what we expected.
Original sample:

Generated result:

Any update?
Thanks

@saphira_linksoft_com_tw Could you please provide the problematic input MS Word document here for testing? We will check conversion on our side and provide you more information.

Figure 1 shows a blank template. When inserting a new row of data to generate an Excel sheet, the new row does not automatically copy the formatting from the previous row (as shown in Figure 3).

If custom code is required to apply the cell styles (since it won’t automatically grab the style from the previous row), please let me know. Thank you!

Figure 1: Blank Template


Figure 2: Our Expected Result

Figure 3: The Actual Generated Excel Sheet, with no formatting

@saphira_linksoft_com_tw,

Thanks for the screenshots.

Aspose.Cells replicates the formatting and styles of MS Excel when new rows or records are added. We are unsure how you are adding or inserting rows into the template file’s sheet cells. Could you create a standalone VS.NET application (complete source code that compiles without errors and has no other dependencies), zip the project (you can exclude Aspose.Cells.Dll to reduce the size of the archive), and share it with us? We will check your issue soon. Additionally, please provide a sample Excel file that demonstrates your expected results.

Excel2PDF.zip (45.7 KB)
Please see the attachment.
Template.xlsx is a blank Excel template into which the template data will be inserted.
The data file .json will be written to Template.xlsx to generate the output. However, the expected format is “Document Output.xlsx.”

@saphira_linksoft_com_tw,

Thanks for the resource files.

I evaluated your issue thoroughly using your files and JSON data. You are not actually inserting rows; you are just writing JSON data into existing cells in the worksheet. For your requirements, you have to insert rows, specifically at row 24, so that the styles/formatting are copied automatically.

For your needs, you may insert rows as required for your data, e.g., insert 72 rows in total starting at row 24. For example, add the InsertRows line shown in the snippet below to your code, and it will work as expected.


var workbook = new Workbook(excelPath);
var ws = workbook.Worksheets[0];
ws.PageSetup.PrintGridlines = true;
var data = Newtonsoft.Json.JsonConvert.DeserializeObject<Dictionary<string, string>>(json);

// Insert 72 rows starting at the 24th row (the row index is zero-based)
ws.Cells.InsertRows(23, 72);

// Zero-based index of the row whose height is applied to each written row
int templateRow = 22;

foreach (var pair in data)
{
    var cell = ws.Cells[pair.Key];
    int row = cell.Row;
    int col = cell.Column;
    var rowHeight = ws.Cells.GetRowHeight(templateRow);
    cell.PutValue(pair.Value ?? "");
    ws.Cells.SetRowHeight(row, rowHeight);
}

Hope this helps a bit.

@saphira_linksoft_com_tw,

Moreover, to apply your custom formatting, i.e., “_-* #,##0.00_-;-* #,##0.00_-;_-* "-"??_-;_-@_-” properly for certain columns, you have to convert your JSON data to numeric type. You may just replace the line of code:

cell.PutValue(pair.Value ?? "");

with:

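cell.PutValue(pair.Value ?? "", true);

The second argument tells Aspose.Cells to convert the string to a suitable data type (such as a number) where possible, so the custom number format can be applied. It will then work fine.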