Break PDF into Hierarchical Sections

Hello,

How can I use Aspose.PDF for Python via .NET to break a document into its hierarchical sections and then extract the text of the paragraphs and tables, in order, within each of those sections?

Maybe aspose.pdf.structure (from the Aspose.PDF for Python via .NET API Reference), or something else?

Thanks!

@ln22

Would you kindly share your sample PDF along with some details of your expected output? We will test the scenario in our environment and address it accordingly.

EU_proposed_AI_regulation_40_Pages.pdf (977.1 KB)

I would like each section header isolated, together with the text it contains and the hierarchy of headers.
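Section headings in documents like this are often numbered ("1.", "1.1.", "2.3."), so one possible way to recover the hierarchy, independent of any Aspose.PDF API, is to infer each heading's depth from its numbering. The following is a minimal pure-Python sketch under that assumption; heading_level is a hypothetical helper, and how the heading strings are pulled out of the PDF is left open.

```python
import re
from typing import Optional

# A leading section number such as "1.", "2.3." or "3.4.2", followed by whitespace.
_SECTION_NUMBER = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+")


def heading_level(heading: str) -> Optional[int]:
    """Return the depth implied by a heading's numbering, or None if it is unnumbered."""
    match = _SECTION_NUMBER.match(heading)
    if match is None:
        return None
    # "1" -> 1, "1.1" -> 2, "3.4.2" -> 3
    return len(match.group(1).split("."))


for h in [
    "1. CONTEXT OF THE PROPOSAL",
    "1.1. Reasons for and objectives of the proposal",
    "EXPLANATORY MEMORANDUM",
]:
    print(heading_level(h), h)
```

Unnumbered headings such as "EXPLANATORY MEMORANDUM" return None and would need their level from another source (for example font size or the document outline). A body line that happens to start with a number and a space would also match, so this check only makes sense on lines already identified as headings.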

The expected output would look roughly like this (covering the entire document, without missing any text):

{
  "EXPLANATORY MEMORANDUM": {
    "1. CONTEXT OF THE PROPOSAL": {
      "1.1. Reasons for and objectives of the proposal": [
        "This explanatory memorandum accompanies the proposal for a Regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Artificial Intelligence (AI) is a fast evolving family of technologies that can bring a wide array of economic and societal benefits across the entire spectrum of industries and social activities. By improving prediction, optimising operations and resource allocation, and personalising service delivery, the use of artificial intelligence can support socially and environmentally beneficial outcomes and provide key competitive advantages to companies and the European economy. Such action is especially needed in high-impact sectors, including climate change, environment and health, the public sector, finance, mobility, home affairs and agriculture. However, the same elements and techniques that power the socio-economic benefits of AI can also bring about new risks or negative consequences for individuals or the society. In light of the speed of technological change and possible challenges, the EU is committed to strive for a balanced approach. It is in the Union interest to preserve the EU’s technological leadership and to ensure that Europeans can benefit from new technologies developed and functioning according to Union values, fundamental rights and principles.",
        "This proposal delivers on the political commitment by President von der Leyen, who announced in her political guidelines for the 2019-2024 Commission “A Union that strives for more”1, that the Commission would put forward legislation for a coordinated European approach on the human and ethical implications of AI. Following on that announcement, on 19 February 2020 the Commission published the White Paper on AI - A European approach to excellence and trust2. The White Paper sets out policy options on how to achieve the twin objective of promoting the uptake of AI and of addressing the risks associated with certain uses of such technology. This proposal aims to implement the second objective for the development of an ecosystem of trust by proposing a legal framework for trustworthy AI. The proposal is based on EU values and fundamental rights and aims to give people and other users the confidence to embrace AI-based solutions, while encouraging businesses to develop them. AI should be a tool for people and be a force for good in society with the ultimate aim of increasing human well-being. Rules for AI available in the Union market or otherwise affecting people in the Union should therefore be human centric, so that people can trust that the technology is used in a way that is safe and compliant with the law, including the respect of fundamental rights. Following the publication of the White Paper, the Commission launched a broad stakeholder consultation, which was met with a great interest by a large number of stakeholders who were largely supportive of regulatory intervention to address the challenges and concerns raised by the increasing use of AI."
      ]
    },
    "2. LEGAL BASIS, SUBSIDIARITY AND PROPORTIONALITY": {
      "2.1. Legal basis": [
        "The legal basis for the proposal is in the first place Article 114 of the Treaty on the Functioning of the European Union (TFEU), which provides for the adoption of measures to ensure the establishment and functioning of the internal market."
      ],
      "2.2. Subsidiarity (for non-exclusive competence)": [
        "The proposal is based on the principle of subsidiarity. The objectives of the proposal cannot be sufficiently achieved by the Member States alone, but can be better achieved at Union level. The proposal aims to ensure a high level of protection of health and safety and of fundamental rights and freedoms of persons, while ensuring the proper functioning of the internal market."
      ],
      "2.3. Proportionality": [
        "The proposal is based on the principle of proportionality. The proposal does not go beyond what is necessary to achieve the objectives of the Treaty and the objectives of the Regulation."
      ]
    }
  }
}
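Once headings and body paragraphs have been extracted in reading order (by whatever means), turning them into nested output of this shape is essentially a stack problem. Below is a minimal sketch, assuming a hypothetical flat input of (level, text) pairs where level is an int (1 = top) for a heading and None for a body paragraph. build_tree and the "_text" key (for text that precedes a section's first subsection, which the format above has no slot for) are illustrative names, not part of Aspose.PDF.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def _new_node() -> Dict[str, Any]:
    return {"children": {}, "paragraphs": []}


def _simplify(node: Dict[str, Any]) -> Any:
    """Leaf sections become lists of paragraphs; sections with subsections become dicts."""
    if not node["children"]:
        return list(node["paragraphs"])
    out: Dict[str, Any] = {}
    if node["paragraphs"]:
        # Hypothetical key for text that appears before the first subsection.
        out["_text"] = list(node["paragraphs"])
    for title, child in node["children"].items():
        out[title] = _simplify(child)
    return out


def build_tree(events: List[Tuple[Optional[int], str]]) -> Any:
    """Nest a reading-order stream of (level, text) pairs; level None marks a paragraph.

    Heading levels must be >= 1 so the root (level 0) is never popped.
    """
    root = _new_node()
    stack = [(0, root)]  # (level, node) from the root down to the open section
    for level, text in events:
        if level is None:
            stack[-1][1]["paragraphs"].append(text)
            continue
        # A heading closes every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        node = _new_node()
        stack[-1][1]["children"][text] = node
        stack.append((level, node))
    return _simplify(root)


events = [
    (1, "EXPLANATORY MEMORANDUM"),
    (2, "1. CONTEXT OF THE PROPOSAL"),
    (3, "1.1. Reasons for and objectives of the proposal"),
    (None, "First paragraph of 1.1."),
    (2, "2. LEGAL BASIS, SUBSIDIARITY AND PROPORTIONALITY"),
    (3, "2.1. Legal basis"),
    (None, "First paragraph of 2.1."),
]
print(json.dumps(build_tree(events), indent=2, ensure_ascii=False))
```

Headings are used as dictionary keys, so two sections with identical titles under the same parent would overwrite each other; a list of {title, children} objects avoids that at the cost of a less compact format.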

Pages from IPCC_AR6_WGII_Chapter03-2.pdf (8.7 MB)

I would like similar output from this PDF, using the headers in its table of contents as a guide to split the text with the hierarchy intact.
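When the table of contents is meant to drive the split, one hedged approach is to match each extracted paragraph against the TOC titles, consuming them in TOC order. This is a sketch assuming the TOC has already been parsed into hypothetical (level, title) pairs with page numbers and dot leaders removed, and that the body text is available as a reading-order list of paragraph strings with the TOC pages themselves excluded; split_by_toc is an illustrative name.

```python
import re
from typing import List, Optional, Sequence, Tuple


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case so small extraction differences still match.
    return re.sub(r"\s+", " ", text).strip().casefold()


def split_by_toc(
    toc: Sequence[Tuple[int, str]], paragraphs: Sequence[str]
) -> List[Tuple[Optional[int], str]]:
    """Tag paragraphs matching a TOC title as (level, title); others become (None, text)."""
    toc_keys = [_normalize(title) for _, title in toc]
    events: List[Tuple[Optional[int], str]] = []
    next_entry = 0  # TOC entries before this index were already matched or skipped
    for para in paragraphs:
        key = _normalize(para)
        # Search forward only, so headings are consumed in TOC order.
        match = next(
            (i for i in range(next_entry, len(toc)) if toc_keys[i] == key), None
        )
        if match is None:
            events.append((None, para))
        else:
            level, title = toc[match]
            events.append((level, title))  # keep the TOC spelling of the title
            next_entry = match + 1
    return events
```

Because the search runs forward from the last match, a heading that extraction misses does not block the headings after it. Headings that wrap across two extracted lines would need to be joined before matching.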

@ln22

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFPYTHON-393

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.