Color and the DOM

Hi,

I’m trying to determine which pages in existing PDF documents contain color. I have downloaded the latest Aspose.Pdf (I’m already using the unmerged .Pdf and .Kit) as I considered the DOM API a good candidate for this.

I thought the first port of call should be assessing the text font color. This is what I have:

Aspose.Pdf.Document doc = new Aspose.Pdf.Document(_filePath);

colPages = 0;
bwPages = 0;

foreach (Page pg in doc.Pages)
{
    TextFragmentAbsorber txtFrAbsorber = new TextFragmentAbsorber();
    pg.Accept(txtFrAbsorber);
    bool isColor = false;

    foreach (TextFragment txtFrag in txtFrAbsorber.TextFragments)
    {
        if (!IsGreyScale(txtFrag.TextState.ForegroundColor))
        {
            isColor = true;
            break;
        }
    }

    if (isColor)
        colPages++;
    else
        bwPages++;
}
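The IsGreyScale helper called above is not shown in the post. A minimal sketch of what it might look like, assuming the foreground color is exposed as a System.Drawing.Color and that "greyscale" means equal red, green and blue channels (the class name ColorChecks is hypothetical):

```csharp
using System.Drawing;

static class ColorChecks
{
    // Hypothetical helper, not part of Aspose.Pdf: a color is treated as
    // greyscale (black, white or any grey) when R, G and B are identical.
    public static bool IsGreyScale(Color c)
    {
        return c.R == c.G && c.G == c.B;
    }
}
```

With this definition, pure black, pure white and mid-grey all count as greyscale, while any tinted color does not.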

This works, but it’s slow: about 2.5s for a 25-page text document. After this we would also have to look at images and other objects that could contain color.

Is there a faster way of walking the DOM using your library? Or a better way to achieve this color searching (other than creating images of each page and processing the pixels)?

Thanks

Hello Jai,

I am working on this query and will get back to you soon. We apologize for the inconvenience.

Hello Jai,

Sorry for the delay in responding.

The requirement to improve the component’s performance when determining the color inside PDF pages has been logged as PDFNEWNET-30921 in our issue tracking system. Our development team is looking into the details of this requirement, and we will keep you posted on its status. Please be patient and allow us a little time. We are sorry for the delay and inconvenience.

Ok, thanks.

I am still looking for a solution for this, so let me know.

Hello Jai,


Thanks for your patience.

I am pleased to share that the feature for determining which pages of a PDF contain colour has been implemented and will be available in the upcoming release, Aspose.Pdf for .NET 6.5.0. Please note that to determine which pages of the document have a background colour, you can use the Background property of the Page class:

System.Drawing.Color bkPageColor = document.Pages[i].Background;

This code works correctly only if the page contains a marked background, namely:
  • The document is a Tagged PDF,
  • The page’s background was set by means of Adobe Acrobat (Document -> Background -> Add/Replace…),
  • The page’s background was set by means of the Aspose.Pdf.Page::Background property. You can also set the background colour of all the pages of a PDF file using the following code line: Document1.Background = System.Drawing.Color.Crimson;
  • The page’s background was set by means of any application that marks the background when setting it.

The issues you have found earlier (filed as 30921) have been fixed in this update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

I’m also interested in the ability to detect color, but don’t know how the feature added in 6.5 is accessed - can anyone help?

George

Hi George,

As detailed by Nayyer in post 348368, you can detect the color of the page by using the Document.Pages[pageIndex].Background property. Please see the following code, and the post mentioned above for more details.

Aspose.Pdf.Document doc = new Aspose.Pdf.Document("input.pdf");

System.Drawing.Color color = doc.Pages[1].Background;

In case you are facing a different issue or have a different requirement, please share further details with us and we will check it and get back to you soon.

Sorry for the inconvenience,

Ah, OK - I think there’s some confusion over what ‘detect color’ means here! What I was looking for (and I believe the original poster was too) is to be able to determine whether the page contents as a whole (background, text, images etc.) contain any color elements, or whether everything is black and white.


Is there any way to achieve this requirement?

Thanks,

George

Hi George,

Are you interested in getting a collection of all the colors used on a particular page? Please share some more details and a sample PDF file that shows your requirement. This will help us support the exact feature you need (if possible).

We are very sorry for the inconvenience,

The requirement is this (and I assume it’s fairly common):

If a page contains any element (text, image, line etc.) that is any color other than black, convert it to a color image (e.g. JPEG), otherwise convert it to a bitonal image (e.g. TIF).

Obviously I could render the PDF to a color image first, count the colors in that, and then further convert to bitonal if necessary, but it would be nice to get the color data direct from the PDF if possible.
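The render-then-inspect fallback described above can be sketched as a scan over the rendered page’s pixels. This is only an illustration under stated assumptions: the pixels are assumed to be already available as 32-bit ARGB integers (how they are obtained from a rendered page is not shown), and PixelColorCheck, ContainsColor and the tolerance parameter are hypothetical names, not part of Aspose.Pdf.

```csharp
using System;

static class PixelColorCheck
{
    // Hypothetical helper: returns true as soon as one pixel is visibly tinted.
    // argbPixels: pixels packed as 0xAARRGGBB.
    // tolerance:  largest allowed spread between the R, G and B channels for a
    //             pixel to still count as grey (absorbs compression noise).
    public static bool ContainsColor(int[] argbPixels, int tolerance)
    {
        foreach (int p in argbPixels)
        {
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;

            int max = Math.Max(r, Math.Max(g, b));
            int min = Math.Min(r, Math.Min(g, b));

            if (max - min > tolerance)
                return true; // early exit: one colored pixel is enough
        }
        return false;
    }
}
```

Note that this treats greys as non-color; if grey content must also go to the color output rather than the bitonal one, the check would need to compare each pixel against pure black and pure white instead.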

Hi George,

Thank you for sharing the details regarding your requirement.

I am afraid that Aspose.Pdf for .NET does not currently support the requested feature. For the sake of investigation and implementation, the feature has been registered in our issue tracking system with issue ID PDFNEWNET-34413. Our development team will look further into this feature, and we will update you via this thread regarding the implementation.

We are very sorry for the inconvenience,

Hi George,

Please see the following code for your requirement. The code segment has two parts:

  • Check whether the page’s color-setting operators use only black and white (method HasOnlyBIColor);
  • Check whether all images on the page are bitonal (method HasOnlyBitonalImages).

The provided code snippet is not generic, and you will need to adjust it for your particular case.

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Devices;

class Program
{
    // Returns true if every color-setting operator on the page uses pure black or pure white.
    static private bool HasOnlyBIColor(Page page)
    {
        foreach (Operator op in page.Contents)
        {
            Operator.SetColorOperator opSC = op as Operator.SetColorOperator;
            if (opSC == null)
                continue;

            System.Drawing.Color color = opSC.getColor();
            if (!((color.R == 0 && color.G == 0 && color.B == 0) ||
                  (color.R == 255 && color.G == 255 && color.B == 255)))
                return false;
        }
        return true;
    }

    // Returns true if every pixel of the image is pure black or pure white.
    static private bool IsBitonalImage(XImage image)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            image.Save(ms);
            ms.Position = 0; // rewind so the bitmap is read from the start of the stream

            using (System.Drawing.Bitmap bmp = new System.Drawing.Bitmap(ms))
            {
                for (int j = 0; j < bmp.Height; j++)
                    for (int i = 0; i < bmp.Width; i++)
                    {
                        System.Drawing.Color color = bmp.GetPixel(i, j);
                        if (!((color.R == 255 && color.G == 255 && color.B == 255) ||
                              (color.R == 0 && color.G == 0 && color.B == 0)))
                            return false;
                    }
            }
        }
        return true;
    }

    static private bool HasOnlyBitonalImages(Page page)
    {
        // return true if no images exist or all images are bitonal
        if (page.Resources.Images.Count == 0)
            return true;

        foreach (XImage image in page.Resources.Images)
            if (!IsBitonalImage(image))
                return false;

        return true;
    }

    // Picks a 1-bpp TIFF device for black-and-white pages and a JPEG device otherwise.
    static Device GetProperDevice(Page page)
    {
        Device device;
        if (HasOnlyBIColor(page) && HasOnlyBitonalImages(page))
            device = new TiffDevice(new Resolution(100), new TiffSettings(ColorDepth.Format1bpp));
        else
            device = new JpegDevice(new Resolution(100), 100);
        return device;
    }

    static void Main(string[] args)
    {
        License lic = new License();
        lic.SetLicense(@"Aspose.Total.lic");

        Document doc = new Document("34413.pdf");

        foreach (Page page in doc.Pages)
        {
            Device device = GetProperDevice(page);
            if (device is DocumentDevice)
                doc.SendTo(device as DocumentDevice, page.Number, page.Number, string.Format("page{0}.tiff", page.Number));
            else
                page.SendTo(device as PageDevice, string.Format("page{0}.jpg", page.Number));
        }
    }
}

In case it does not fulfill your requirements, please share your template file and we will further check it and get back to you soon.

Sorry for the inconvenience,

That’s very useful - many thanks.

George

The issues you have found earlier (filed as PDFNEWNET-34413) have been fixed in Aspose.Pdf for .NET 7.5.0.



Is the resolution to PDFNEWNET-34413 something different than the code in post 422050, or is that considered the resolution for this issue?

Hi Barry,


Thanks for your inquiry. The sample code we shared in post 422050 is the same code that was suggested in the PDFNEWNET-34413 ticket as its resolution. If it doesn’t meet your requirements, you can raise your query as a new post in the Aspose.Pdf forum. We will be more than happy to help you.

Best Regards,