DOCX to PDF conversion in C++

dani2496 · July 12, 2022, 3:45pm

Hello! I have a problem: when I convert a DOCX file to PDF, the layout in the PDF changes. I know that DOCX layout depends on the output page size and margins. Is there a way to adjust these so the PDF matches what is shown in Word?

alexey.noskov · July 12, 2022, 7:27pm

@dani2496 In most cases layout issues are caused by font substitution, which Aspose.Words performs when it cannot find the fonts used in the document. You can implement IWarningCallback to get a notification when a font is substituted.
Also, could you please attach your input and output documents here for testing? We will check the issue and provide you with more information.

dani2496 · July 14, 2022, 3:17pm

Thanks for your answer.

I have another question:

Does Aspose provide the ability to query all the elements on the page?
(like the paragraph, table information, etc?)

Regards.

alexey.noskov · July 14, 2022, 3:29pm

@dani2496 Sure, you can access everything in the document. Please see our documentation to learn more about the Aspose.Words Document Object Model. You can use DocumentVisitor, for example, to go through all document elements. But note that MS Word documents are flow documents and have no concept of a page. However, if you need to process the content of a particular page, you can use the Document.ExtractPages method to extract that page as a separate document.

dani2496 · July 14, 2022, 4:01pm

I have used the code from your documentation to get field information:

class FieldVisitor : public DocumentVisitor
{
public:
    FieldVisitor()
    {
        mBuilder = MakeObject<System::Text::StringBuilder>();
    }

    String GetText()
    {
        return mBuilder->ToString();
    }

    VisitorAction VisitFieldStart(SharedPtr<FieldStart> fieldStart) override
    {
        mBuilder->AppendLine(String(u"Found field: ") + System::ObjectExt::ToString(fieldStart->get_FieldType()));
        mBuilder->AppendLine(String(u"\tField code: ") + fieldStart->GetField()->GetFieldCode());
        mBuilder->AppendLine(String(u"\tDisplayed as: ") + fieldStart->GetField()->get_Result());

        return VisitorAction::Continue;
    }

    VisitorAction VisitFieldSeparator(SharedPtr<FieldSeparator> fieldSeparator) override
    {
        mBuilder->AppendLine(String(u"\tFound separator: ") + fieldSeparator->GetText());

        return VisitorAction::Continue;
    }

    VisitorAction VisitFieldEnd(SharedPtr<FieldEnd> fieldEnd) override
    {
        mBuilder->AppendLine(String(u"End of field: ") + System::ObjectExt::ToString(fieldEnd->get_FieldType()));

        return VisitorAction::Continue;
    }

private:
    SharedPtr<System::Text::StringBuilder> mBuilder;
};

void FieldCollection_()
{
    auto doc = MakeObject<Document>(u"../Input.docx");

    SharedPtr<FieldCollection> fields = doc->get_Range()->get_Fields();

    // Iterate over the field collection, and print contents and type
    // of every field using a custom visitor implementation.
    auto fieldVisitor = MakeObject<FieldVisitor>();

    {
        SharedPtr<System::Collections::Generic::IEnumerator<SharedPtr<Field>>> fieldEnumerator = fields->GetEnumerator();
        while (fieldEnumerator->MoveNext())
        {
            if (fieldEnumerator->get_Current() != nullptr)
            {
                fieldEnumerator->get_Current()->get_Start()->Accept(fieldVisitor);
                if (fieldEnumerator->get_Current()->get_Separator() != nullptr)
                {
                    fieldEnumerator->get_Current()->get_Separator()->Accept(fieldVisitor);
                }
                fieldEnumerator->get_Current()->get_End()->Accept(fieldVisitor);
            }
            else
            {
                std::cout << "There are no fields in the document." << std::endl;
            }
        }
    }

    std::cout << fieldVisitor->GetText() << std::endl;
}

but all the paragraph and table information is missing.

When running the code, this is all that I get:

Found field: FieldPage
    Field code: PAGE * MERGEFORMAT
    Displayed as: 2
    Found separator: ¶
End of field: FieldPage
Found field: FieldPage
    Field code: PAGE * MERGEFORMAT
    Displayed as: 2
    Found separator: ¶
End of field: FieldPage
Found field: FieldPage
    Field code: PAGE * MERGEFORMAT
    Displayed as: 12
    Found separator: ¶
End of field: FieldPage
Found field: FieldPage
    Field code: PAGE * MERGEFORMAT
    Displayed as: 13
    Found separator: ¶
End of field: FieldPage

alexey.noskov · July 14, 2022, 4:13pm

@dani2496 In your visitor implementation you override only VisitFieldStart, VisitFieldSeparator and VisitFieldEnd. As a result, you process only the FieldStart, FieldSeparator and FieldEnd nodes in your document. Try using the example provided in this article (DocStructureToText):
https://reference.aspose.com/words/cpp/class/aspose.words.document_visitor
Note that there is a VisitXXX method for each type of document node.
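
To illustrate why a visitor that overrides only the field methods reports only fields, here is a minimal, self-contained sketch of the visitor pattern in plain C++. All names in it (SketchVisitor, SketchNode, FieldOnlyVisitor, FullVisitor, SampleDocument, Walk) are hypothetical stand-ins made up for this example and are not part of Aspose.Words; the sketch only assumes that, as in DocumentVisitor, each VisitXXX method has a default that does nothing.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Aspose.Words types.
enum class SketchAction { Continue };

// Base visitor: every VisitXXX method has a do-nothing default.
struct SketchVisitor {
    virtual ~SketchVisitor() = default;
    virtual SketchAction VisitParagraph(const std::string&) { return SketchAction::Continue; }
    virtual SketchAction VisitTable(const std::string&) { return SketchAction::Continue; }
    virtual SketchAction VisitField(const std::string&) { return SketchAction::Continue; }
};

// A node dispatches to the VisitXXX method that matches its own kind.
struct SketchNode {
    enum class Kind { Paragraph, Table, Field };
    Kind kind;
    std::string text;

    void Accept(SketchVisitor& visitor) const {
        switch (kind) {
            case Kind::Paragraph: visitor.VisitParagraph(text); break;
            case Kind::Table:     visitor.VisitTable(text);     break;
            case Kind::Field:     visitor.VisitField(text);     break;
        }
    }
};

// Overrides only the field callback, like the FieldVisitor in the question.
struct FieldOnlyVisitor : SketchVisitor {
    std::string log;
    SketchAction VisitField(const std::string& code) override {
        log += "field: " + code + "\n";
        return SketchAction::Continue;
    }
};

// Overrides every callback, so paragraphs and tables are reported too.
struct FullVisitor : SketchVisitor {
    std::string log;
    SketchAction VisitParagraph(const std::string& text) override {
        log += "paragraph: " + text + "\n";
        return SketchAction::Continue;
    }
    SketchAction VisitTable(const std::string& text) override {
        log += "table: " + text + "\n";
        return SketchAction::Continue;
    }
    SketchAction VisitField(const std::string& code) override {
        log += "field: " + code + "\n";
        return SketchAction::Continue;
    }
};

// A tiny made-up "document": one paragraph, one table, one field.
std::vector<SketchNode> SampleDocument() {
    return {
        {SketchNode::Kind::Paragraph, "Intro"},
        {SketchNode::Kind::Table, "2x2"},
        {SketchNode::Kind::Field, "PAGE"},
    };
}

// Passes every node to the visitor in document order.
void Walk(const std::vector<SketchNode>& nodes, SketchVisitor& visitor) {
    for (const SketchNode& node : nodes) {
        node.Accept(visitor);
    }
}
```

Calling Walk(SampleDocument(), visitor) with a FieldOnlyVisitor leaves only the field line in its log, while a FullVisitor also records the paragraph and the table. In the same way, adding overrides for the paragraph and table VisitXXX methods of DocumentVisitor should make that information appear.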

dani2496 · July 14, 2022, 4:19pm

Using the sample code from: returned better results.

Thank you!

dani2496 · July 22, 2022, 8:29pm

Thank you very much. It has worked as expected.

I have another question: to use the Aspose library, is it a must to use Visual Studio 2017 or newer, or can Visual Studio 2012 or older be used? (Do you use the C++14 standard or newer?)

Regards

alexey.noskov · July 23, 2022, 4:49am

@dani2496 Aspose.Words for C++ can be used to develop applications in any development environment that supports the Microsoft Visual Studio v142 Platform Toolset. But VS 2017 and VS 2019 are explicitly supported. See our documentation for more information.
Starting from the most recent 22.7 version, VS 2022 is also explicitly supported.

dani2496 · July 26, 2024, 1:16pm

Hi Alexey,

I have a question: to produce PDF/UA, is it enough just to set the compliance to PDF/UA before saving the document as PDF, or are there any other requirements?

Thanks,

Dan

alexey.noskov · July 26, 2024, 1:24pm

@dani2496 Yes, it is enough to set PDF/UA compliance. Also, it might be required to set the document title in the built-in document properties, since the title might be empty and it is required for accessibility.