When I read RTF into a stream and save it as a PDF file, it works fine.
But if I save the same RTF to a stream as PDF and then either inspect the result or write it out as a PDF file, the output is unreadable.
Snippet from correct PDF output:
%PDF-1.7
5 0 obj
<</Type /Page/Parent 3 0 R/Contents 6 0 R/MediaBox [0 0 612 792]/Resources<</Font<</FAAABA 10 0 R/FAAABD 13 0 R/FAAABG 16 0 R>>/XObject<</XC1 7 0 R>>>>/Group <</Type/Group/S/Transparency/CS/DeviceRGB>>>>
endobj
6 0 obj
<</Length 18 0 R/Filter /FlateDecode>>stream
xœ½ZksÛºý+øV{Æ¢ ¾éo~äÕi'V›Nsû"!½|]‚´£ßÝIQ¶$ëÊÓÎÍ$
From incorrect output:
%PDF-1.7\r\n5 0 obj\r\n<</Type /Page/Parent 3 0 R/Contents 6 0 R/MediaBox [0 0 612 792]/Resources<</Font<</FAAABA 10 0 R/FAAABD 13 0 R/FAAABG 16 0 R>>/XObject<</XC1 7 0 R>>>>/Group <</Type/Group/S/Transparency/CS/DeviceRGB>>>>\r\nendobj\r\n6 0 obj\r\n<</Length 18 0 R/Filter /FlateDecode>>stream\r\nx��Zksۺ\u0011�+�V{Ƣ\t��o~��i\u0012'V�Ns�\u0001\"!\u0011�|]���\u007f��\u0005IQ�$�����$\u0014\tb�g��\u001e��3\u001b��q�+�\u001d�\u0014�\u000
@rspiewak The provided snippet looks valid. Could you please provide your input RTF, output PDF and problematic output PDF? Also, please provide code that will allow us to reproduce the problem.
UnitTestData.pdf (2.0 MB)
UnitTestDataFromStream.pdf (3.6 MB)
Here is the input RTF as a string:
internal class TestData
{
internal static string GetRtfString()
{
string rtfString = @"
{\rtf\ansi\deff0{\fonttbl{\f0\froman Tms Rmn;}{\f1\fdecor \r\nSymbol;}{\f2\fswiss Helv;}}{\protect\v0 \li0\cf2\f0\fs18 Organization Name: \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\f0\fs18 Organization Address: \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\f0\fs18 Patient: UnitTestPatientName \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\f0\fs18 Patient MRN: UnitTestExtPatientId \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\f0\fs18 DOB: 7/13/2003 \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\f0\fs18 Encounter ID: 9223372036854775807 \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\f0\fs18 Encounter Date: 7/13/2023 \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\f0\fs18 Gender: UnitTestGender \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\f0\fs18 \f0\fs20\cf2\par\pard \v{\footer\protect\v0 \li0\cf2\f0\fs18 Patient MRN: UnitTestExtPatientId \f0\fs20\cf2\par\pard \v\protect\v0 \li0\cf2\
f0\fs18 Encounter ID: 9223372036854775807 \f0\fs20\cf2\par\pard \v\v0 \pard Page \b\chpgn of \b{\field{\*\fldinst NUMPAGES }}\par}\colortbl;\red0\green0\blue0;\r\n\red0\green0\blue255;\red0\green255\blue255;\red0\green255\\r\nblue0;\red255\green0\blue255;\red255\green0\blue0;\red255\\r\ngreen255\blue0;\red255\green255\blue255;}{\stylesheet{\fs20 \snext0Normal;}}\widoctrl\ftnbj \sectd\linex0\endnhere \pard\plain \fs20 This is plain text. This is plain text. This is pl
ain text. This is plain text. This is plain text. This is plain text. This is plain text. This is plain t
ext. This is plain text. This is plain text. This is plain text. This is plain text.
This is plain text. This is plain text. This is plain text. This is plain text. This
is plain text. This is plain text. This is plain text. This is plain text. This is plain text. This is pl
ain text. This is plain text. This is plain text. This is plain text. This is plain text. This is plain t
ext. This is plain text. This is plain text. This is plain text. This is plain text.
This is plain text. This is plain text. This is plain text. This is plain text. This
is plain text. This is plain text. This is plain text. This is plain text. This is plain text. This is pl
ain text. This is plain text. This is plain text. This is plain text. This is plain text. This is plain t
ext. This is plain text. This is plain text. This is plain text. This is plain text.
This is plain text. This is plain text. This is plain text. This is plain text. This
is plain text. This is plain text. This is plain text. This is plain text. This is plain text. This is pl
ain text. This is plain text. This is plain text. This is plain text. This is plain text. This is plain t
ext. This is plain text. This is plain text. This is plain text. This is plain text.
This is plain text. This is plain text. This is plain text. This is plain text. This
is plain text. This is plain text. This is plain text. This is plain text. This is plain text. This is pl
ain text. This is plain text. This is plain text. This is plain text. This is plain text. This is plain t
ext. This is plain text. This is plain text. This is plain text. This is plain text.
This is plain text. This is plain text. This is plain text. This is plain text. \par}\r\n""
}";
return rtfString;
}
}
Here is the test code that produces both PDF outputs. One is from the stream conversion, one from saving to a file:
string testString = TestData.GetRtfString();
Stream stream = StringsAndStreams.GetStreamFromString(testString);
document = new(stream, loadOptions);
MemoryStream outputStream = new();
PdfSaveOptions saveOptions = new() { SaveFormat = SaveFormat.Pdf, EmbedFullFonts = true };
document.Save(outputStream, saveOptions);
outputStream.Position = 0;
string output = StringsAndStreams.GetStringFromStream(outputStream);
string outputFileName = "UnitTestData";
string outputFile = outputFileName + ".pdf";
string outputPath = Path.Combine(outputFolder, outputFile);
document.Save(outputPath, saveOptions);
outputFile = "UnitTestDataFromStream.pdf";
outputPath = Path.Combine(outputFolder, outputFile);
File.WriteAllText(outputPath, output);
@rspiewak PDF is a binary format, and converting it to a string damages it. Use a byte array instead of a string:
MemoryStream outputStream = new();
PdfSaveOptions saveOptions = new() { SaveFormat = SaveFormat.Pdf, EmbedFullFonts = true };
document.Save(outputStream, saveOptions);
byte[] output = outputStream.ToArray();
outputFile = "UnitTestDataFromStream.pdf";
outputPath = Path.Combine(outputFolder, outputFile);
File.WriteAllBytes(outputPath, output);
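To illustrate why the string round trip is lossy, here is a minimal sketch that uses only the .NET base class library. It assumes the string helper decodes the stream as UTF-8; the actual encoding used by StringsAndStreams.GetStringFromStream is not shown in this thread, so treat that as an assumption. The class and method names are hypothetical. The sample bytes are the first bytes of a typical Flate-compressed stream, like the one that follows "stream" in the snippets above.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryRoundTripDemo
{
    // Decodes the bytes as UTF-8 text, then encodes that text back to bytes.
    // This mimics reading a binary stream into a string and writing it out again.
    public static byte[] RoundTripThroughString(byte[] data)
    {
        string text = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(text);
    }

    // True only if the bytes are unchanged after the string round trip.
    public static bool SurvivesStringRoundTrip(byte[] data) =>
        RoundTripThroughString(data).SequenceEqual(data);

    public static void Main()
    {
        // Plain ASCII, such as the PDF header, survives the round trip.
        byte[] header = Encoding.ASCII.GetBytes("%PDF-1.7");
        Console.WriteLine(SurvivesStringRoundTrip(header));      // True

        // 0x9C and 0xBD are not valid UTF-8 on their own, so the decoder
        // substitutes U+FFFD and the original byte values are lost.
        byte[] compressed = { 0x78, 0x9C, 0xBD, 0x5A };
        Console.WriteLine(SurvivesStringRoundTrip(compressed));  // False
    }
}
```

Under that UTF-8 assumption, each substituted character is written back as several bytes, which would also be consistent with UnitTestDataFromStream.pdf being larger than UnitTestData.pdf.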
One more thing: if I convert from RTF to RTF, it works and the output looks fine, but I seem to get a whole lot of extra data (runs of fffff) in it when I look at the file in Notepad. Is there something I’m missing here?
I’d like to use this to clean up RTF, but I’m not sure how to test it by comparing input and output!
Thanks!
@rspiewak I am afraid it is not always possible to preserve the original RTF structure after opening and saving the document. When you open a document using Aspose.Words, it is loaded into the Aspose.Words DOM, and the whole document is stored in memory as a hierarchy of DOM objects. When the document is saved, the file is built from the DOM, so there is no guarantee that the original file's internal representation is preserved.
Thanks - as I said, the output works fine as RTF, cleaning up some format problems in the input.
Are there any options to reduce redundant size?
@rspiewak You can use RtfSaveOptions.ExportCompactSize property to minimize the output RTF file size.
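A minimal sketch of how that option might be applied, assuming the Aspose.Words package is referenced. The file names are hypothetical and used only for illustration. As far as I recall, the Aspose documentation warns that compact output may not display right-to-left text correctly, so it is worth checking the API reference before relying on it.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// Hypothetical input path; replace with the RTF you want to clean up.
Document document = new Document("input.rtf");

RtfSaveOptions saveOptions = new RtfSaveOptions
{
    // Ask Aspose.Words to write a more compact RTF representation.
    ExportCompactSize = true
};

// Hypothetical output path.
document.Save("output-compact.rtf", saveOptions);
```

The output is still rebuilt from the DOM, so it should not be expected to match the input byte for byte; the option only affects how compactly the RTF is written.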
Here’s what I see with a simple test case:
Input RTF: the same GetRtfString() test data posted earlier in this thread.
UnitTestData.7z (25.3 KB)
The attached file is the result of converting this. It expands quite a bit!
Any thoughts?
Thanks!
@rspiewak As I mentioned, it is not always possible to preserve the original RTF structure after opening and saving the document. If you open and save your RTF document using MS Word or OpenOffice, the RTF file size will not be the same either.