How to split a single large image into multiple image files

spothuganti · February 15, 2012, 4:39am

Hi AsposeTeam,

I have an image file that is very long. Please see the attached file.

I want to know how to split this image into multiple smaller image files so that each one fits on a page of a Word or PDF document (A4, A3, etc.).

I have used the code from this forum thread: http://www.aspose.com/community/forums/339863/converting-large-images/showthread.aspx#339863

But it still puts the whole image on a single page of the resulting Word or PDF document.

Can you please help me split this large image into smaller images?

Thanks,

Siddi.

spothuganti · February 16, 2012, 10:55am

Hi Aspose Team,

The project requires that this image, which is in a PDF file, can be printed. Can you help me split the image into multiple documents that are in a printable format?

Thanks,

Siddi.

tahir.manzoor · February 16, 2012, 12:42pm

Hi Siddi,

Thanks for your query. I regret to say that Aspose.Words does not provide the requested feature.

We apologize for the inconvenience.

spothuganti · February 16, 2012, 2:14pm

Hi Tahir,

Thanks for your reply.

I want to know whether this is possible with Aspose.Pdf or with other DLLs.

If not, could this be included in a future release of Aspose?

Could you also suggest any other ways of achieving this?

Thanks,

Siddi.

adam.skelton · February 16, 2012, 8:24pm

Hi Siddi,

Thanks for your inquiry.

Of course, this is possible using Aspose.Words. Please see the code below for a demonstration on how to achieve this.

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Shape shape = builder.InsertImage(Image.FromFile(dataDir + "first.jpg"),
RelativeHorizontalPosition.Page, // Insert the image floating at the top left of the page.
0,
RelativeVerticalPosition.Page,
0,
doc.FirstSection.PageSetup.PageWidth, // Make the image fit the page width.
-1); // The height doesn't matter as we will split the image into separate parts.
builder.MoveToShape(shape, ShapePosition.Floating, shape.Width, shape.Height);

doc.Accept(new LargeImageSplitter());

doc.Save("Document Out.pdf");

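// Requires: using System.Drawing; using Aspose.Words; using Aspose.Words.Drawing;
public class LargeImageSplitter : DocumentVisitor
{
    public override VisitorAction VisitShapeStart(Shape shape)
    {
        // Only shapes that carry an image need splitting.
        if (shape.HasImage)
            SplitImage(shape, shape.ParentParagraph.ParentSection.PageSetup);

        // Always continue the traversal, whether or not the shape was split.
        return VisitorAction.Continue;
    }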

    private void SplitImage(Shape shape, PageSetup pageSetup)
    {
        // Store the original height of shape.
        double origHeight = shape.ImageData.ImageSize.HeightPoints;
        double currentHeight = origHeight;

        // Get the available height on the page.
        double pageHeight = pageSetup.PageHeight - (pageSetup.TopMargin + pageSetup.BottomMargin);

        // If the height of this shape is bigger than the page then split it.
        if (currentHeight > pageHeight)
        {
            // At what ratio of the shape does it cut off at the end of the page. e.g. 0.5 if half of the image is on the first page, the other half in the second.
            double ratio = pageHeight / currentHeight;

            // Create a new bitmap from the image data.
            Bitmap bitmap = new Bitmap(shape.ImageData.ToImage());

            // Find the pixel height point where the image is to be cut. Ratio needs to be used in this case as the shape height is in points while the bitmap is in pixels.
            int heightToCrop = (int)(bitmap.Height * ratio);

            // Define which section of the original image should be used for the new shape on the next page. This is all of the image which is cut off at the end of the page.
            Rectangle cropRect = new Rectangle(0, heightToCrop, bitmap.Width, bitmap.Height - heightToCrop);

            // Crop the image at this point, create a clone of the original shape and insert it after the original shape.
            Image croppedImage = CropImage(bitmap, cropRect);
            Shape newShape = (Shape)shape.Clone(true);
            newShape.ImageData.SetImage((Image)croppedImage);

            // Repeat the process but this time for the original image on the preceding page. Crop this to the point where the page ends.
            bitmap = new Bitmap(shape.ImageData.ToImage());
            heightToCrop = (int)(bitmap.Height * ratio);
            cropRect = new Rectangle(0, 0, bitmap.Width, heightToCrop);
            Image topCroppedImage = CropImage(bitmap, cropRect);
            shape.ImageData.SetImage((Image)topCroppedImage);

            // The height of the original shape should now be the page height.
            shape.Height = pageHeight;

            // The shape on the next page should be the difference of the original height and what the original shape is now.
            newShape.Height = origHeight - shape.Height;

            // Create a new paragraph after the parent paragraph of the original shape and insert the new shape into it.
            // This new image should automatically be pushed to the next page.
            Paragraph parentPara = shape.ParentParagraph;
            Paragraph newPara = new Paragraph(shape.Document);
            newPara.AppendChild(newShape);
            parentPara.ParentNode.InsertAfter(newPara, parentPara);

            // If the original image is not inline then we need to separate the images with a section break.
            if (!shape.IsInline)
                parentPara.AppendChild(new Run(shape.Document, ControlChar.SectionBreak));

            // Visit the new shape.
            VisitShapeStart(newShape);
        }
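    }

    // CropImage is called above but its body was not posted.
    // This is one possible implementation (an assumption): Bitmap.Clone copies
    // only the requested rectangle of the source bitmap.
    private static Image CropImage(Bitmap source, Rectangle cropRect)
    {
        return source.Clone(cropRect, source.PixelFormat);
    }
}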
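
The recursion above cuts off one page-sized piece per call. As an illustration of the arithmetic only, the sketch below computes the pixel row range of every slice in a single pass. `SlicePlanner` and `SliceRows` are hypothetical names, not part of Aspose.Words, and the sketch assumes the image is drawn at a uniform scale, so pixel rows map linearly to points.

```csharp
using System;
using System.Collections.Generic;

public static class SlicePlanner
{
    // Hypothetical helper, not an Aspose API.
    // Returns (Start, Count) pairs of pixel rows, one pair per page-sized slice.
    // imageHeightPx: bitmap height in pixels.
    // imageHeightPt: height the image occupies in the document, in points.
    // pageHeightPt:  usable page height in points.
    public static List<(int Start, int Count)> SliceRows(
        int imageHeightPx, double imageHeightPt, double pageHeightPt)
    {
        if (imageHeightPx <= 0 || imageHeightPt <= 0 || pageHeightPt <= 0)
            throw new ArgumentOutOfRangeException();

        // Pixel rows that fit on one page (the same ratio idea as in SplitImage).
        int rowsPerPage = (int)(imageHeightPx * pageHeightPt / imageHeightPt);
        if (rowsPerPage < 1) rowsPerPage = 1;

        var slices = new List<(int Start, int Count)>();
        for (int start = 0; start < imageHeightPx; start += rowsPerPage)
        {
            // The last slice holds whatever rows remain.
            int count = Math.Min(rowsPerPage, imageHeightPx - start);
            slices.Add((start, count));
        }
        return slices;
    }
}
```

For example, a 2500-pixel-high bitmap drawn 2000 points tall on an 800-point page gives three slices: two of 1000 rows and one of 500 rows. Each pair could be turned into a crop rectangle of the form new Rectangle(0, Start, bitmap.Width, Count).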

If you have any other queries, please feel free to ask.

Thanks,

spothuganti · February 21, 2012, 6:28am

Hi Adam,

I am able to split the image into multiple images and insert them into the PDF.

It is working fine, but some lines of text are cut through the middle and split across two pages, which makes them unreadable.

Can you tell me whether it is possible to avoid splitting a line of text in the image?

Please see the input image file and the output PDF file for more information.

Thanks,

Siddi.

awais.hafeez · February 21, 2012, 1:27pm

Hi Siddi,

Thanks for your inquiry. Unfortunately, Aspose.Words does not support reading or recognizing text in an image file. Maybe you can use Aspose.OCR as a character recognition component.

Best Regards,

adam.skelton · February 22, 2012, 4:20am

Hi Siddi,

Thanks for this additional information.

You're correct, there is a small bug in the code. Please delete the margin subtraction, - (pageSetup.TopMargin + pageSetup.BottomMargin), from the following line and it should work as expected:

// Get the available height on the page.
double pageHeight = pageSetup.PageHeight - (pageSetup.TopMargin + pageSetup.BottomMargin);

I hope this resolves the issue.
Thanks,

spothuganti · February 23, 2012, 3:56am

Hi Adam/Hafeez,

Thanks for looking into this.

With the following statement to calculate the page height, the output PDF has less text split across pages.

double pageHeight = pageSetup.PageHeight;

Please see the attachment "OutPUTPDFS.rar".

But I have another image for which the output PDF still has text split across two pages.

Please see the attached file "TextSplit.rar", in which the output PDF has the field "Desired Base Rate 40000" split across two pages.

In the previous reply, Hafeez mentioned that Aspose.OCR can detect characters.

I think we could use this to detect whether any characters are present on the line where the image is about to be split, and decrease or increase the cut height accordingly.

I don't know how to achieve this.

It would be helpful if you could explain how to use Aspose.OCR to split the image more accurately before inserting it into the PDF.

Thanks,

Siddi.

awais.hafeez · February 23, 2012, 6:59am

Hi Siddi,

Thanks for the additional information. First of all, please note that Aspose.OCR supports BMP and TIFF image formats only. Secondly, to be able to get the starting point, width and height of text inside a BMP, please use the following code snippet:

static void Main(string[] args)
{
    const string resourceFileName = @"D:\2011.08.05 v1.1 Aspose.OCR.Resouces.zip";

    try
    {
        OcrEngine ocrEngine = new OcrEngine();
        ocrEngine.Image = ImageStream.FromFile(@"C:\temp\Sample.bmp");
        ocrEngine.Languages.AddLanguage(Language.Load("english"));
        ocrEngine.Config.NeedRotationCorrection = false;
        ocrEngine.Config.UseDefaultDictionaries = true;

        using (ocrEngine.Resource = new FileStream(resourceFileName, FileMode.Open))
        {
            try
            {
                if (ocrEngine.Process())
                {
                    // Print the bounding box of each recognized text part.
                    foreach (IRecognizedTextPartInfo part in ocrEngine.Text.PartsInfo)
                    {
                        Rectangle textBox = part.Box;
                        Console.WriteLine("X: " + textBox.X + "; Y: " + textBox.Y +
                            "; Width: " + textBox.Width + "; Height: " + textBox.Height + "\n\n");
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Exception: " + ex.Message);
            }
        }

        ocrEngine = null;
    }
    catch (Exception ex)
    {
        Console.WriteLine("Exception: " + ex.Message);
    }

    Console.ReadKey();
}

Moreover, you can download the resource file (2011.08.05 v1.1 Aspose.OCR.Resouces.zip) from the following link:

https://releases.aspose.com/

Also, I have attached a sample BMP here for you to play with.
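
Once the text boxes are known, one way to use them is to move the intended cut row upward until it no longer passes through any box. The following is a minimal sketch of that idea, not an Aspose.OCR feature: `CutAdjuster` and `AvoidTextBoxes` are hypothetical names, each box is reduced to a (Top, Height) pair in pixel rows, and the cut is taken to be the first row of the second slice.

```csharp
using System;
using System.Collections.Generic;

public static class CutAdjuster
{
    // Hypothetical helper, not an Aspose API.
    // Moves a proposed horizontal cut upward until it no longer passes
    // through any text box. Rows [0, cut) stay on the current page.
    // Returns the original cut if no box is crossed, and never goes
    // above minCut.
    public static int AvoidTextBoxes(
        int proposedCut, IEnumerable<(int Top, int Height)> boxes, int minCut)
    {
        var boxList = new List<(int Top, int Height)>(boxes);
        int cut = proposedCut;
        bool moved = true;

        // Repeat, because moving above one box can land inside another.
        while (moved && cut > minCut)
        {
            moved = false;
            foreach (var box in boxList)
            {
                int bottom = box.Top + box.Height; // first row below the box
                // The cut splits the box only if it lies strictly inside it.
                if (cut > box.Top && cut < bottom)
                {
                    cut = Math.Max(box.Top, minCut);
                    moved = true;
                }
            }
        }
        return cut;
    }
}
```

With the snippet above, the pairs could be built from (part.Box.Y, part.Box.Height) for each recognized part, and the returned row used in place of heightToCrop. If the cut is pushed all the way up to minCut, the caller still has to decide whether splitting the text is acceptable.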

Best Regards,

spothuganti · February 24, 2012, 5:13am

Hi ,

I am unable to read the text correctly using Aspose.OCR.

I am using the following code:

protected void Button1_Click(object sender, EventArgs e)
{
    const string resourceFileName = @"D:\2011.08.05 v1.1 Aspose.OCR.Resouces.zip";

    Aspose.OCR.License l = new Aspose.OCR.License();
    l.SetLicense(@"C:\Jetstream\500\Resources\Aspose.OCR.lic");

    try
    {
        OcrEngine ocrEngine = new OcrEngine();
        ocrEngine.Image = ImageStream.FromFile(@"D:\Sample.bmp");
        //ocrEngine.Image = ImageStream.FromFile(@"D:\Sample.tif");

        ocrEngine.Languages.AddLanguage(Language.Load("english"));
        ocrEngine.Config.NeedRotationCorrection = false;
        ocrEngine.Config.UseDefaultDictionaries = true;

        using (ocrEngine.Resource = new FileStream(resourceFileName, FileMode.Open))
        {
            try
            {
                if (ocrEngine.Process())
                {
                    foreach (IRecognizedTextPartInfo part in ocrEngine.Text.PartsInfo)
                    {
                        Rectangle textBox = part.Box;

                        Response.Write("Detected Text:" + part.Text);
                        Response.Write("<br>X: " + textBox.X + "; Y: " + textBox.Y + "; Width: " + textBox.Width + "; Height: " + textBox.Height + "\n\n");
                    }
                }
            }
            catch (Exception ex)
            {
                Response.Write("Exception: " + ex.Message);
            }
        }

        ocrEngine = null;
    }
    catch (Exception ex)
    {
        Response.Write("Exception: " + ex.Message);
    }

    //Console.ReadKey();
}

For the Sample.bmp that you attached, I am getting the output text "SiSAL":
Detected Text:SiSAL
X: 1; Y: 1; Width: 274; Height: 129

I also tried the approach described at https://docs.aspose.com/ocr/net/image-regions-extract/
With this approach the output text is also SiSAL.

Can you please verify the above code, tell me if I am missing anything, and let me know if you need any more information?

Thanks,
Siddi.

alexey.noskov · February 25, 2012, 8:26am

Hi

Thanks for your request. I will move your request into the Aspose.OCR forum. Our colleagues will answer you shortly.

Best regards,

spothuganti · February 29, 2012, 7:02am

Hi ,

Is there any update on this?

The issue is that I am not able to read the text from an image properly using Aspose.OCR.

The input image contains the text: This is a text.
With the code above, the detected text is SiSAL instead of "This is a text".

I am trying this on Windows Server 2003.

Please let me know if you need more information.

Thanks,
Siddi.

muhammad.ijaz · February 29, 2012, 11:52pm

Hi Siddi,

Unfortunately, Aspose.OCR does not support font sizes smaller than 28pt. This issue has been logged in our issue tracking system as OCR-29048, and we will keep you updated in this thread. Sorry for the inconvenience.

Best Regards,

spothuganti · March 12, 2012, 12:41am

Hi Ijaz,

Thanks for looking into this.

I want to check one more approach.

Is it possible to detect a blank line using Aspose.OCR?

If so, I could reduce the height at which I split the image in the code above,
i.e., decrement the "heightToCrop" value so that the cut does not go through the middle of a line of text and split it across two pages.

Thanks,
Siddi.

muhammad.ijaz · March 14, 2012, 7:07am

Hi Siddi,

You cannot check the dimensions of a blank line; however, you can check whether a specific block of the image contains text or is empty, e.g.

// Select the block in which to recognize text
int startX = 0, startY = 0, width = pageWidth, height = 5;
IRecognitionBlock rectangleBlock = Aspose.Ocr.RecognitionBlock.FromRectangle(startX, startY, width, height);
ocrEngine.AddRecognitionBlock(rectangleBlock);
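
A different technique that does not involve OCR at all is to scan pixel rows for one that contains no ink. The sketch below shows that plain pixel scan under stated assumptions: `BlankRowFinder` and `FindBlankRowAtOrAbove` are hypothetical names, the image has a light background, and it has already been copied into a grayscale array (how that array is filled from the bitmap is left out).

```csharp
using System;

public static class BlankRowFinder
{
    // Hypothetical helper, not an Aspose API.
    // gray[row, col] holds brightness values from 0 (black) to 255 (white).
    // Searches upward from startRow for a row in which every pixel is at
    // least as bright as threshold, i.e. a row with no ink on a light
    // background. Returns the row index, or -1 if no such row exists.
    public static int FindBlankRowAtOrAbove(byte[,] gray, int startRow, byte threshold)
    {
        int rows = gray.GetLength(0);
        int cols = gray.GetLength(1);
        if (rows == 0) return -1;

        for (int row = Math.Min(startRow, rows - 1); row >= 0; row--)
        {
            bool blank = true;
            for (int col = 0; col < cols; col++)
            {
                // One dark pixel is enough to reject the row.
                if (gray[row, col] < threshold) { blank = false; break; }
            }
            if (blank) return row;
        }
        return -1;
    }
}
```

Calling it with startRow set to the intended cut row would give the nearest blank row at or above that position, which could then serve as the cut. Scanned or JPEG-compressed images usually need a threshold below 255 to tolerate noise.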

Best Regards,