.NET: XMP metadata property value containing < and > characters is set with escaped/encoded characters &lt; and &gt;

amytant · March 9, 2022, 12:15pm

We are attempting to use the Aspose.PDF .NET library to set XMP metadata in PDF and AI files. This generally works as expected. But when we set a value that contains < and > characters on an XMP metadata property (like RDF tags), then look directly at the XMP metadata packet within the file data (for example by opening it in Notepad++), we see that the value that’s been written has these characters escaped/encoded as &lt; and &gt;. This is not the case when using the Aspose.Imaging .NET or Aspose.PSD .NET libraries to set XMP metadata in JPEG/PNG/TIFF/GIF or PSD/PSB file formats.

Example:
We set the value of dc:creator as <rdf:Seq><rdf:li>2021</rdf:li></rdf:Seq>
In the XMP metadata packet, we see: <dc:creator>&lt;rdf:Seq&gt;&lt;rdf:li&gt;Aprimo 2022 (C)&lt;/rdf:li&gt;&lt;/rdf:Seq&gt;</dc:creator>
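The escaping described above is what a typical XML serializer does when markup is handed to it as a plain string. The following is a minimal sketch using only System.Xml.Linq from the .NET base library, not Aspose.PDF; it makes no claim about how Aspose.PDF works internally. The class and method names (EscapingDemo, AsText, AsElements) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Xml.Linq;

static class EscapingDemo
{
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Markup passed as a string becomes text content, so the serializer escapes it.
    public static string AsText()
    {
        var element = new XElement(Dc + "creator",
            new XAttribute(XNamespace.Xmlns + "dc", Dc),
            "<rdf:Seq><rdf:li>2021</rdf:li></rdf:Seq>");
        return element.ToString(SaveOptions.DisableFormatting);
    }

    // Markup built as child elements is written out as real elements.
    public static string AsElements()
    {
        var element = new XElement(Dc + "creator",
            new XAttribute(XNamespace.Xmlns + "dc", Dc),
            new XAttribute(XNamespace.Xmlns + "rdf", Rdf),
            new XElement(Rdf + "Seq", new XElement(Rdf + "li", "2021")));
        return element.ToString(SaveOptions.DisableFormatting);
    }

    static void Main()
    {
        Console.WriteLine(AsText());     // value contains &lt;rdf:Seq ...
        Console.WriteLine(AsElements()); // value contains <rdf:Seq> ...
    }
}
```

If Aspose.PDF treats the string we pass as text content in the same way, that would explain the &lt; and &gt; in the packet, but we have no confirmation that this is the cause.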

We thought it might be related to setting this RDF data as a string when perhaps it should be a XmpValue array:

pdf.Metadata.Add("dc:keywords", new XmpValue(new XmpValue[] { new XmpValue("<rdf:Seq><rdf:li>2021</rdf:li></rdf:Seq>") }));

But then we get the same result, except it’s enclosed by <rdf:Bag> </rdf:Bag> tags.

We could instead add only the string values from inside the RDF tags to the array, which works like this:

pdf.Metadata.Add("dc:keywords", new XmpValue(new XmpValue[] { new XmpValue("2021"), new XmpValue("2022") }));

and results in this value in the XMP metadata property:

<dc:keywords>
<rdf:Bag>
<rdf:li>2021</rdf:li>
<rdf:li>2022</rdf:li>
</rdf:Bag>
</dc:keywords>

In this case, the < and > characters aren’t escaped/encoded. But then we don’t have any control over the enclosing tags of the inserted value: it’s always <rdf:Bag> </rdf:Bag>, whereas we want to be able to choose the RDF container (Bag/Alt/Seq; see the Stack Overflow question “How rdf:Bag, rdf:Seq and rdf:Alt is different while using them?”).

So the question is, is there an issue with how we are using the Aspose.PDF library to set XMP metadata, or is this a bug?

A notable difference between Aspose.PDF and Aspose.Imaging / Aspose.PSD is that with the latter two libraries, we are using the XmpPacketWrapper object. Aspose.PDF doesn’t have an equivalent XmpPacketWrapper object that we have access to or can use in the same way to manipulate the file’s XMP metadata packet.

We are using Aspose.PDF version 21.12.0. Release notes for the 22.x releases do not indicate that this is an issue that has been resolved in later versions.

Our implementation to set XMP metadata in Aspose.PDF:

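var pdf = new Aspose.Pdf.Document(localInputPath);

// Register the Dublin Core namespace if the document does not know the "dc" prefix yet.
// ("namespace" is a reserved word in C#, so the variable needs a different name.)
var dcNamespace = pdf.Metadata.GetNamespaceUriByPrefix("dc");
if (String.IsNullOrEmpty(dcNamespace))
{
    pdf.Metadata.RegisterNamespaceUri("dc", "http://purl.org/dc/elements/1.1/");
}

// Overwrite the property if it exists, otherwise add it.
if (pdf.Metadata.ContainsKey("dc:keywords"))
{
    pdf.Metadata["dc:keywords"] = "<rdf:Seq><rdf:li>2021</rdf:li></rdf:Seq>";
}
else
{
    pdf.Metadata.Add("dc:keywords", "<rdf:Seq><rdf:li>2021</rdf:li></rdf:Seq>");
}

pdf.Save(localOutputPath);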
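To check the saved file without opening it in a text editor, something like the following sketch could be used. It uses only the .NET base library and assumes the XMP packet is stored uncompressed in the file, as it is in the files we inspected in Notepad++. XmpPacketCheck, ExtractPacket and HasEscapedRdf are hypothetical helper names, not part of any Aspose API.

```csharp
using System;
using System.Text;

static class XmpPacketCheck
{
    // Returns the first XMP packet found in the raw file bytes, or null if none is found.
    public static string ExtractPacket(byte[] data)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary parts of the file
        // cannot cause decoding errors or shift offsets.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(data);

        int start = text.IndexOf("<?xpacket begin", StringComparison.Ordinal);
        if (start < 0) return null;

        int end = text.IndexOf("<?xpacket end", start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Include the closing "?>" of the trailing processing instruction.
        int close = text.IndexOf("?>", end, StringComparison.Ordinal);
        return close < 0 ? null : text.Substring(start, close + 2 - start);
    }

    // True when the packet contains RDF markup that was written as escaped text.
    public static bool HasEscapedRdf(string packet)
    {
        return packet != null && packet.Contains("&lt;rdf:");
    }
}
```

For example, XmpPacketCheck.HasEscapedRdf(XmpPacketCheck.ExtractPacket(System.IO.File.ReadAllBytes(localOutputPath))) would be expected to return true for the output file produced by the code above.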

asad.ali · March 9, 2022, 7:40pm

@amytant

Can you please share a sample PDF document along with the output PDF generated at your end? We will test the scenario in our environment and address it accordingly.

amytant · March 9, 2022, 10:12pm

We’re seeing this with every PDF or AI file we try to set XMP metadata on. Here are sample input and output PDF documents.
MultiColumn input.pdf (36.8 KB)
MultiColumn output.pdf (35.2 KB)

asad.ali · March 9, 2022, 11:36pm

@amytant

We have tested the scenario in our environment using version 22.2 of the API and noticed a similar issue in the generated PDF output. Therefore, an issue has been logged as PDFNET-51477 in our issue tracking system. We will look further into its details and keep you posted on the status of its rectification. Please be patient and spare us some time.

We are sorry for the inconvenience.

amytant · June 2, 2022, 12:16pm

Can I ask what the current status of issue PDFNET-51477 is? When can we expect a fix to be released?

asad.ali · June 2, 2022, 4:47pm

@amytant

We are afraid that the earlier logged ticket has not yet been resolved due to other issues in the queue that were logged prior to it. Issues in the free support model are resolved on a first come, first served basis, and we will surely inform you once we have some updates in this regard. Please spare us some time.

We are sorry for the inconvenience.

amytant · November 3, 2022, 3:04pm

@asad.ali Could you provide an update about the status of PDFNET-51477?

We have been waiting for resolution on this issue since March 2022 and it’s still showing “Open” status.

When can we expect this issue to be fixed?

asad.ali · November 3, 2022, 7:38pm

@amytant

We are afraid that the investigation of this ticket has not been completed yet. As shared earlier, issues in the free support model are resolved on a first come, first served basis, unlike in paid support, where they have the highest priority and are resolved on an urgent basis. Nevertheless, we have recorded your concerns and will consider them during the ticket investigation. We will let you know via this forum thread once we have some news about the ticket resolution or an ETA.

We apologize for your inconvenience.

amytant · October 5, 2023, 9:51am

@asad.ali I see PDFNET-51477 still has Open status. Can we expect this issue to be resolved in the foreseeable future?

We have been waiting on this for 1.5 years already. This is a serious issue for us that limits our ability to use Aspose.PDF for updating XMP metadata on PDF and AI files.

asad.ali · October 5, 2023, 6:56pm

@amytant

We apologize for the inconvenience and the delay that you have been facing. We regret that the earlier logged ticket could not get resolved due to other pending issues in the queue. However, we have updated the issue priority and increased it to the next level. We will surely inform you once we have some news about ticket resolution or ETA. We again apologize for the inconvenience caused.
