@Niit_deependra.khangarot,
Thank you for contacting support.
Sample code to get the page size of a PDF document:
[C#]
// import a PDF document
var document = new Document("blah.pdf");
// get the width and height of the first page (page indexes are 1-based)
double width = document.Pages[1].Rect.Width;
double height = document.Pages[1].Rect.Height;
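Sample code to list the fonts used on each page:
[C#]
// load the PDF document
Document pdf = new Document(@"c:\temp\test_pdfextractor.pdf");
// pages are 1-based, so iterate from 1 to Count
for (int i = 1; i <= pdf.Pages.Count; i++)
{
    // print the name of every font in the page resources
    foreach (Aspose.Pdf.Text.Font font in pdf.Pages[i].Resources.Fonts)
        Console.WriteLine(font.FontName);
}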
Most of our queries are resolved, but we still have a few more:
Font style (Weight, Color, Size):
How can we get the font style for a particular word?
We tried to get the color, but a different color is extracted.
Vertical text identification
A sample file is attached. There are three lines of vertical text near Figure 1.
How can we get the transformation information for this text?
Line height calculation
How can we calculate the line height?
Word spacing
Can word spacing be extracted?
Output for different resolution
Is it possible to get the extraction output for different resolutions?
I have tried the Aspose PDF to HTML conversion, and its output contains all the required information, which means there is a way to get all of this. Maybe we are just not finding the right methods or properties.
I have tested the text extraction scenario, and I am afraid the API currently cannot extract rotated text instances. I have logged this as PDFNET-43152 in our issue tracking system so that it can be corrected.
I am afraid this feature is currently not supported. However, I have logged it as PDFNET-43151 so that it can be implemented. We will look further into the details of this problem and keep you updated on its status. Please be patient and spare us a little time. We are sorry for the inconvenience.
Please try using the following line of code:
textFragment.TextState.WordSpacing
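For context, here is a minimal sketch of how that property could be read for every text fragment in a document. It assumes the Aspose.Pdf assembly is referenced; the file name `input.pdf` is a placeholder and not a file from this thread.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class WordSpacingExample
{
    static void Main()
    {
        // "input.pdf" is a placeholder path (assumption), replace it with your own file
        Document pdfDocument = new Document("input.pdf");

        // An absorber created without a search phrase collects every text fragment
        TextFragmentAbsorber absorber = new TextFragmentAbsorber();
        pdfDocument.Pages.Accept(absorber);

        foreach (TextFragment textFragment in absorber.TextFragments)
        {
            // WordSpacing is the property suggested in the line above
            Console.WriteLine("Text: {0}, word spacing: {1}",
                textFragment.Text, textFragment.TextState.WordSpacing);
        }
    }
}
```

The value is reported per fragment, so text with mixed spacing may need to be inspected fragment by fragment.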
Can you please share some further details regarding this requirement, so that we can reply accordingly?
@Deepsa,
You can get the original color of each text element as below:
[C#]
// Open document
Document pdfDocument = new Document(@"C:\Pdf\test213\Sample.pdf");
// Create a TextFragmentAbsorber object; without a search phrase it collects all text fragments
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// Loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    Aspose.Pdf.Color color = textFragment.TextState.ForegroundColor;
    //color.ToRgb();
    Console.WriteLine("Text : {0} ", textFragment.Text);
}
The ToRgb() method of the Color class lets you convert the color to RGB. We have tested your PDF with the latest version, 17.8, and could not reproduce the issue of incorrect color codes. If this does not help, then kindly share your code snippet. We will investigate and share our findings with you.
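As a rough sketch of how the font name, size, style, and color of a particular word might be read together, the example below searches for one phrase and prints its text state. The file name `input.pdf` and the search word `Figure` are placeholders, and the exact printed form of the value returned by ToRgb() is not verified here, so treat the output format as an assumption.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class FontStyleExample
{
    static void Main()
    {
        // Placeholder path and search word (assumptions, not from the thread)
        Document pdfDocument = new Document("input.pdf");
        TextFragmentAbsorber absorber = new TextFragmentAbsorber("Figure");

        // Search all pages for the phrase
        pdfDocument.Pages.Accept(absorber);

        foreach (TextFragment textFragment in absorber.TextFragments)
        {
            TextFragmentState state = textFragment.TextState;

            Console.WriteLine("Text  : {0}", textFragment.Text);
            Console.WriteLine("Font  : {0}", state.Font.FontName);
            Console.WriteLine("Size  : {0}", state.FontSize);
            Console.WriteLine("Style : {0}", state.FontStyle);

            // Guard against a missing color before converting it
            Aspose.Pdf.Color color = state.ForegroundColor;
            if (color != null)
            {
                // ToRgb() is the conversion method mentioned above
                var rgb = color.ToRgb();
                Console.WriteLine("Color : {0}", rgb);
            }
        }
    }
}
```

Passing a search phrase to the absorber limits the output to the matching fragments, which is usually easier to compare against the expected color than a dump of the whole document.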