HTML to RTF: extract partial fragment / string from document

I’m attempting to convert a fragment of HTML into RTF. I do not want an entire document; I just need the fragment converted. Currently I have the following:


Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertHtml(
    "<p align='right'>Paragraph right</p>" +
    "<b>Implicit paragraph left</b>" +
    "<div align='center'>Div center</div>" +
    "<h1 align='left'>Heading 1 left.</h1>");

doc.getFirstSection().getBody().getFirstChild().toString(SaveFormat.RTF);

This throws a java.lang.IllegalStateException: "Exporting fragments of a document in this format is not supported."

I’ve seen other forum posts attempting something similar (Extract RTF from between two nodes), and there seems to be an issue logged for it: WORDSNET-9111. This appears to be why I cannot extract the fragment as RTF, but the issue is referred to in posts that are around four years old. Is this still a bug/missing feature? How can I extract some partial HTML into RTF (again, I do not want the entire document)?

Thanks!

Hi,


Thanks for your inquiry. Using the latest version of Aspose.Words, i.e. 17.4, we managed to reproduce this issue on our end. Your request has been linked to the appropriate issue (WORDSNET-9111), and you will be notified as soon as it is supported. Sorry for the inconvenience.

Regarding WORDSNET-9111, this problem requires us to implement a new feature in Aspose.Words, and we regret to tell you that its implementation has been postponed for now. The feature may come onto the product roadmap in the future, but we cannot promise a resolution any time soon. We apologize for the inconvenience.

As a workaround, you can copy the content for which you want the RTF string into a blank Document and then save that document to RTF format as follows:
import com.aspose.words.*;
import java.io.ByteArrayOutputStream;

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertHtml(
    "<p align='right'>Paragraph right</p>" +
    "<b>Implicit paragraph left</b>" +
    "<div align='center'>Div center</div>" +
    "<h1 align='left'>Heading 1 left.</h1>");

// Import the first body node of the source into a blank document and append it there.
Document temp = new Document();
temp.getFirstSection().getBody().appendChild(
    temp.importNode(doc.getFirstSection().getBody().getFirstChild(),
        true,
        ImportFormatMode.KEEP_SOURCE_FORMATTING));

// Save the temporary document to RTF in memory and print it as a string.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
temp.save(baos, SaveFormat.RTF);

System.out.println(baos.toString("UTF-8"));

Hope this helps.

Best regards,

Unfortunately, that seems to output more than I need. Currently, using iTextPDF, I can extract the following:


\pard\plain\s0\qr\fi0\li0\ri0\plain\f0\fs24 Paragraph right\par\pard\plain\s0\fi0\li0\ri0\plain\f0\fs24\b Implicit paragraph left\b0\par\pard\plain\s0\qc\fi0\li0\ri0\plain\f0\fs24 Div center\par\pard\plain\s0\fi0\li0\ri0\plain\f0\fs48 Heading 1 left.\par

Whereas using your snippet, I get all this:

{\rtf1\ansi\ansicpg1252\uc0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deff0\adeff0{\fonttbl{\f0\froman\fcharset0\fprq2{*\panose 02020603050405020304}Times New Roman;}{\f1\froman\fcharset2\fprq2{*\panose 05050102010706020507}Symbol;}{\f2\fswiss\fcharset0\fprq2{*\panose 020b0604020202020204}Arial;}}{\colortbl;}{\stylesheet{\s0\snext0\sqformat\spriority0\aspalpha\aspnum\adjustright\ltrpar\li0\lin0\ri0\rin0\ql\faauto\rtlch\afs24\ltrch\fs24 Normal;}{*\cs10\additive\ssemihidden\spriority0 Default Paragraph Font;}{\s15\snext15\sqformat\spriority0\aspalpha\aspnum
\adjustright\ltrpar\li0\lin0\ri0\rin0\ql\faauto\rtlch\afs24\ltrch\fs24 Normal_0;}}{*\rsidtbl\rsid10976062}
{*\generator Aspose.Words for Java 15.10.0.0;}{\info\version1\edmins0\nofpages1\nofwords0\nofchars0\nofcharsws0}{\mmathPr\mbrkBin0\mbrkBinSub0\mdefJc1\mdispDef1\minterSp0\mintLim0\mintraSp0\mlMargin0\mmathFont0\mnaryLim1\mpostSp0\mpreSp0\mrMargin0\msmallFrac0\mwrapIndent1440\mwrapRight0}
\deflang1033\deflangfe2052\adeflang1025\jexpand\showxmlerrors1\validatexml1{*\wgrffmtfilter 013f}\viewkind1\viewscale100\fet0\ftnbj\aenddoc\ftnrstcont\aftnrstcont\ftnnar\aftnnrlc\widowctrl\nospaceforul\nolnhtadjtbl\alntblind\lyttblrtgr\dntblnsbdb\noxlattoyen
\wrppunct\nobrkwrptbl\expshrtn\snaptogridincell\asianbrkrule\htmautsp\noultrlspc\useltbaln\splytwnine\ftnlytwnine\lytcalctblwd\allowfieldendsel\lnbrkrule\nouicompat\nofeaturethrottle1\formshade\nojkernpunct\dghspace180\dgvspace180\dghorigin1800\dgvorigin1440\dghshow1\dgvshow1
\dgmargin\pgbrdrhead\pgbrdrfoot\sectd\sectlinegrid360\pgwsxn12240\pghsxn15840\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440\guttersxn0\headery708\footery708\colsx708\ltrsect\pard\plain\itap0\s0\aspalpha\aspnum\adjustright\ltrpar\li0\lin0\ri0\rin0\ql\faauto\rtlch\afs24\ltrch\fs24{
\rtlch\afs24\ltrch\fs24\insrsid10976062\par}\pard\plain\itap0\s15\sa240\aspalpha\aspnum\adjustright\ltrpar\li0\lin0\ri0\rin0\qr\faauto\rtlch\afs24\ltrch\fs24{\rtlch\af0\alang1024\afs24\ltrch\fs24\lang1024\langnp1024\langfe1024\langfenp1024\f0\cs10 Paragraph right}
{\rtlch\af0\alang1024\afs24\ltrch\fs24\lang1024\langnp1024\langfe1024\langfenp1024\f0\cs10\par}{*\latentstyles\lsdstimax267\lsdlockeddef0\lsdsemihiddendef0\lsdunhideuseddef0\lsdqformatdef0\lsdprioritydef0{\lsdlockedexcept\lsdqformat1 Normal;\lsdqformat1 heading 1;\lsdsemihidden1\lsdunhideused1\lsdqformat1 heading 2;\lsdsemihidden1\lsdunhideused1\lsdqformat1 heading 3;
\lsdsemihidden1\lsdunhideused1\lsdqformat1 heading 4;\lsdsemihidden1\lsdunhideused1\lsdqformat1 heading 5;\lsdsemihidden1\lsdunhideused1\lsdqformat1 heading 6;\lsdsemihidden1\lsdunhideused1\lsdqformat1 heading 7;\lsdsemihidden1\lsdunhideused1\lsdqformat1 heading 8;
\lsdsemihidden1\lsdunhideused1\lsdqformat1 heading 9;\lsdsemihidden1\lsdunhideused1\lsdqformat1 caption;\lsdqformat1 Title;\lsdqformat1 Subtitle;\lsdqformat1 Strong;\lsdqformat1 Emphasis;\lsdsemihidden1\lsdpriority99 Placeholder Text;\lsdqformat1\lsdpriority1 No Spacing;
\lsdpriority60 Light Shading;\lsdpriority61 Light List;\lsdpriority62 Light Grid;\lsdpriority63 Medium Shading 1;\lsdpriority64 Medium Shading 2;\lsdpriority65 Medium List 1;\lsdpriority66 Medium List 2;\lsdpriority67 Medium Grid 1;\lsdpriority68 Medium Grid 2;
\lsdpriority69 Medium Grid 3;\lsdpriority70 Dark List;\lsdpriority71 Colorful Shading;\lsdpriority72 Colorful List;\lsdpriority73 Colorful Grid;\lsdpriority60 Light Shading Accent 1;\lsdpriority61 Light List Accent 1;\lsdpriority62 Light Grid Accent 1;\lsdpriority63 Medium Shading 1 Accent 1;
\lsdpriority64 Medium Shading 2 Accent 1;\lsdpriority65 Medium List 1 Accent 1;\lsdsemihidden1\lsdpriority99 Revision;\lsdqformat1\lsdpriority34 List Paragraph;\lsdqformat1\lsdpriority29 Quote;\lsdqformat1\lsdpriority30 Intense Quote;\lsdpriority66 Medium List 2 Accent 1;
\lsdpriority67 Medium Grid 1 Accent 1;\lsdpriority68 Medium Grid 2 Accent 1;\lsdpriority69 Medium Grid 3 Accent 1;\lsdpriority70 Dark List Accent 1;\lsdpriority71 Colorful Shading Accent 1;\lsdpriority72 Colorful List Accent 1;\lsdpriority73 Colorful Grid Accent 1;
\lsdpriority60 Light Shading Accent 2;\lsdpriority61 Light List Accent 2;\lsdpriority62 Light Grid Accent 2;\lsdpriority63 Medium Shading 1 Accent 2;\lsdpriority64 Medium Shading 2 Accent 2;\lsdpriority65 Medium List 1 Accent 2;\lsdpriority66 Medium List 2 Accent 2;
\lsdpriority67 Medium Grid 1 Accent 2;\lsdpriority68 Medium Grid 2 Accent 2;\lsdpriority69 Medium Grid 3 Accent 2;\lsdpriority70 Dark List Accent 2;\lsdpriority71 Colorful Shading Accent 2;\lsdpriority72 Colorful List Accent 2;\lsdpriority73 Colorful Grid Accent 2;
\lsdpriority60 Light Shading Accent 3;\lsdpriority61 Light List Accent 3;\lsdpriority62 Light Grid Accent 3;\lsdpriority63 Medium Shading 1 Accent 3;\lsdpriority64 Medium Shading 2 Accent 3;\lsdpriority65 Medium List 1 Accent 3;\lsdpriority66 Medium List 2 Accent 3;
\lsdpriority67 Medium Grid 1 Accent 3;\lsdpriority68 Medium Grid 2 Accent 3;\lsdpriority69 Medium Grid 3 Accent 3;\lsdpriority70 Dark List Accent 3;\lsdpriority71 Colorful Shading Accent 3;\lsdpriority72 Colorful List Accent 3;\lsdpriority73 Colorful Grid Accent 3;
\lsdpriority60 Light Shading Accent 4;\lsdpriority61 Light List Accent 4;\lsdpriority62 Light Grid Accent 4;\lsdpriority63 Medium Shading 1 Accent 4;\lsdpriority64 Medium Shading 2 Accent 4;\lsdpriority65 Medium List 1 Accent 4;\lsdpriority66 Medium List 2 Accent 4;
\lsdpriority67 Medium Grid 1 Accent 4;\lsdpriority68 Medium Grid 2 Accent 4;\lsdpriority69 Medium Grid 3 Accent 4;\lsdpriority70 Dark List Accent 4;\lsdpriority71 Colorful Shading Accent 4;\lsdpriority72 Colorful List Accent 4;\lsdpriority73 Colorful Grid Accent 4;
\lsdpriority60 Light Shading Accent 5;\lsdpriority61 Light List Accent 5;\lsdpriority62 Light Grid Accent 5;\lsdpriority63 Medium Shading 1 Accent 5;\lsdpriority64 Medium Shading 2 Accent 5;\lsdpriority65 Medium List 1 Accent 5;\lsdpriority66 Medium List 2 Accent 5;
\lsdpriority67 Medium Grid 1 Accent 5;\lsdpriority68 Medium Grid 2 Accent 5;\lsdpriority69 Medium Grid 3 Accent 5;\lsdpriority70 Dark List Accent 5;\lsdpriority71 Colorful Shading Accent 5;\lsdpriority72 Colorful List Accent 5;\lsdpriority73 Colorful Grid Accent 5;
\lsdpriority60 Light Shading Accent 6;\lsdpriority61 Light List Accent 6;\lsdpriority62 Light Grid Accent 6;\lsdpriority63 Medium Shading 1 Accent 6;\lsdpriority64 Medium Shading 2 Accent 6;\lsdpriority65 Medium List 1 Accent 6;\lsdpriority66 Medium List 2 Accent 6;
\lsdpriority67 Medium Grid 1 Accent 6;\lsdpriority68 Medium Grid 2 Accent 6;\lsdpriority69 Medium Grid 3 Accent 6;\lsdpriority70 Dark List Accent 6;\lsdpriority71 Colorful Shading Accent 6;\lsdpriority72 Colorful List Accent 6;\lsdpriority73 Colorful Grid Accent 6;
\lsdqformat1\lsdpriority19 Subtle Emphasis;\lsdqformat1\lsdpriority21 Intense Emphasis;\lsdqformat1\lsdpriority31 Subtle Reference;\lsdqformat1\lsdpriority32 Intense Reference;\lsdqformat1\lsdpriority33 Book Title;\lsdsemihidden1\lsdunhideused1\lsdpriority37 Bibliography;
\lsdsemihidden1\lsdunhideused1\lsdqformat1\lsdpriority39 TOC Heading;}}}

Hi,


Thanks for your inquiry. We will provide functionality in Aspose.Words that allows you to export individual document nodes to RTF format. The ID of this issue is WORDSNET-9111. We will inform you via this thread as soon as this feature is available. We apologize for any inconvenience.

Best regards,
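
Until node-level export is available, one possible stopgap is to post-process the full RTF string produced by the workaround and keep only the paragraph content. The following is a minimal sketch in plain Java, not an Aspose.Words feature. The class name RtfBodySketch, the method extractBody, and the list of header destinations are assumptions made for illustration. It drops top-level header groups (font table, colour table, stylesheet, info, and any ignorable `{\*...}` group), removes the outer braces, and returns everything from the first `\pard` onward. It does not handle `\bin` data, and the resulting fragment still refers to font and style indices (such as `\f0` or `\s15`) whose definitions were in the removed header, so the consumer of the fragment has to supply them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class RtfBodySketch {
    // Destinations treated as header material in this sketch (an assumption, not a complete list).
    private static final Set<String> HEADER_GROUPS =
            new HashSet<>(Arrays.asList("fonttbl", "colortbl", "stylesheet", "info"));

    /** Returns the content from the first \pard onward, without header groups or the outer braces. */
    static String extractBody(String rtf) {
        StringBuilder kept = new StringBuilder();
        int depth = 0;
        int i = 0;
        while (i < rtf.length()) {
            char c = rtf.charAt(i);
            if (c == '\\' && i + 1 < rtf.length()) {
                // Copy the backslash and the next character together so \{ and \} never change depth.
                if (depth >= 1) kept.append(c).append(rtf.charAt(i + 1));
                i += 2;
            } else if (c == '{') {
                if (depth == 1 && isHeaderGroup(rtf, i)) {
                    i = findGroupEnd(rtf, i) + 1; // skip the whole header group
                } else {
                    if (depth >= 1) kept.append(c); // the outermost brace is not copied
                    depth++;
                    i++;
                }
            } else if (c == '}') {
                depth--;
                if (depth >= 1) kept.append(c);
                i++;
            } else {
                if (depth >= 1) kept.append(c);
                i++;
            }
        }
        int start = kept.indexOf("\\pard");
        return (start >= 0 ? kept.substring(start) : kept.toString()).trim();
    }

    // True for ignorable destinations ({\*...}) and for groups named in HEADER_GROUPS.
    private static boolean isHeaderGroup(String rtf, int open) {
        if (rtf.startsWith("{\\*", open)) return true;
        if (!rtf.startsWith("{\\", open)) return false;
        int end = open + 2;
        while (end < rtf.length() && Character.isLetter(rtf.charAt(end))) end++;
        return HEADER_GROUPS.contains(rtf.substring(open + 2, end));
    }

    // Index of the brace that closes the group opened at 'open', honouring backslash escapes.
    private static int findGroupEnd(String rtf, int open) {
        int depth = 0;
        for (int i = open; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return rtf.length() - 1; // unbalanced input: treat the rest as part of the group
    }

    public static void main(String[] args) {
        String full = "{\\rtf1\\ansi{\\fonttbl{\\f0 Times New Roman;}}"
                + "\\sectd\\pard\\plain\\qr{\\f0 Paragraph right}\\par}";
        System.out.println(extractBody(full));
    }
}
```

With the workaround above, this would be called as RtfBodySketch.extractBody(baos.toString("UTF-8")). The output will still carry more control words than the iTextPDF fragment, because the per-paragraph formatting that Aspose.Words writes is left untouched.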