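I want to convert a text-based PDF directly into an image-based PDF, with the bookmarks carried over and still clickable. Is there a simple way to do this without rendering every page to an image and then converting the images back to a PDF? Alternatively, is there a way to read the bookmarks from the text-based PDF and write them into the image-based PDF? Thanks.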
Please try the following code snippet and check the attached output file that we generated with it on our end.
// Namespaces used: Aspose.Pdf, Aspose.Pdf.Annotations, Aspose.Pdf.Devices,
// System, System.Collections.Specialized, System.IO
Aspose.Pdf.Document document = new Aspose.Pdf.Document(dataDir + "文本格式PDF.pdf");
Aspose.Pdf.Document doc = new Aspose.Pdf.Document();
// Create PdfBookmarkEditor and bind it to the source document
Aspose.Pdf.Facades.PdfBookmarkEditor bookmarkEditor = new Aspose.Pdf.Facades.PdfBookmarkEditor();
bookmarkEditor.BindPdf(document);
// Extract bookmarks
Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
// Maps bookmark title -> page number in the new document
NameValueCollection lstPages = new NameValueCollection();
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
    // Render the bookmarked page to a PNG image at 300 DPI
    PngDevice device = new PngDevice(new Resolution(300));
    MemoryStream stream = new MemoryStream();
    device.Process(document.Pages[bookmark.PageNumber], stream);
    // Place the image on a new page of the output document
    var page = doc.Pages.Add();
    Aspose.Pdf.Image img = new Aspose.Pdf.Image();
    img.ImageStream = stream;
    page.Paragraphs.Add(img);
    lstPages.Add(bookmark.Title, page.Number.ToString());
}
// The first extracted bookmark becomes the parent outline item
OutlineItemCollection pdfOutline = new OutlineItemCollection(doc.Outlines);
pdfOutline.Title = lstPages.Keys[0];
// Set the destination page
pdfOutline.Action = new GoToAction(doc.Pages[Convert.ToInt32(lstPages[0])]);
// Add the remaining bookmarks as children of the first one
for (int i = 1; i < lstPages.Count; i++)
{
    OutlineItemCollection pdfChildOutline = new OutlineItemCollection(doc.Outlines);
    pdfChildOutline.Title = lstPages.Keys[i];
    pdfChildOutline.Italic = true;
    pdfChildOutline.Bold = true;
    pdfChildOutline.Action = new GoToAction(doc.Pages[Convert.ToInt32(lstPages[i])]);
    // Add child bookmark in parent bookmark's collection
    pdfOutline.Add(pdfChildOutline);
}
// Add the parent bookmark to the document's outline collection
doc.Outlines.Add(pdfOutline);
doc.Save(dataDir + "output.pdf");
If you need any further help, please feel free to let us know.
output.pdf (3.2 MB)
@asad.ali
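Thanks for the reply. I tried it, and the bookmark problem is solved perfectly.
However, the images in the converted PDF are distorted, possibly because of the page setup. The original PDF has both portrait and landscape pages, but the output of this code appears to have only landscape pages, which may be the cause of the distortion. I have only just started using Aspose.PDF and don't know how to change this. Could you take another look? Thanks.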
You can change the image-adding code as follows to get correct output. Please let us know if you still run into any problems:
Aspose.Pdf.Image img = new Aspose.Pdf.Image();
img.ImageStream = stream;
page.PageInfo.Margin = new MarginInfo(0, 0, 0, 0);
// Size the page to the rendered image so that the image is not stretched
using (Bitmap bitmap = new Bitmap(stream))
{
    page.PageInfo.Height = bitmap.Height;
    page.PageInfo.Width = bitmap.Width;
}
page.Paragraphs.Add(img);
output.pdf (3.2 MB)
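A note on units, offered as a sketch rather than as part of the original reply: the snippet above assigns the bitmap's pixel dimensions directly to the page size, which keeps the aspect ratio correct. If page dimensions are measured in points (1/72 inch), as is usual for PDF, a page rendered at 300 DPI ends up roughly 4.17 times larger than the source page. The hypothetical helper `PixelsToPoints` below shows the conversion back to the original size; its name and the surrounding class are illustrative and not part of Aspose.PDF.

```csharp
using System;

static class PageSizeSketch
{
    // Hypothetical helper: converts a pixel count at a given rendering DPI
    // to points (1 point = 1/72 inch).
    public static double PixelsToPoints(int pixels, double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return pixels * 72.0 / dpi;
    }

    static void Main()
    {
        // Example values: an image of 2480 x 3508 pixels rendered at 300 DPI
        double widthPt = PixelsToPoints(2480, 300);
        double heightPt = PixelsToPoints(3508, 300);
        Console.WriteLine($"{widthPt:F1} x {heightPt:F1} pt");
    }
}
```

Under that assumption, the converted values could be assigned to `page.PageInfo.Width` and `page.PageInfo.Height` in place of the raw pixel counts.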
@asad.ali
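Thanks, asad.ali. The image distortion problem is solved.
There is still a small problem with the bookmarks, though, so I have to trouble you again.
Your code works fine when there is only one top-level bookmark with child bookmarks. When there are two or more top-level bookmarks with children (see the attachment),
the line pdfChildOutline.Action = new GoToAction(doc.Pages[Convert.ToInt32(lstPages[i])]) throws "Input string was not in a correct format". From what I could find, lstPages[i] seems to contain a negative number. Could you take another look? Many thanks.
报价合并文件(包).zip (30.7 KB)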
Please try the following code snippet, and let us know if you need further help.
// Namespaces used: Aspose.Pdf, Aspose.Pdf.Annotations, Aspose.Pdf.Devices, System,
// System.Collections.Generic, System.Collections.Specialized, System.Drawing, System.IO
Aspose.Pdf.Document document = new Aspose.Pdf.Document(dataDir + "文本格式PDF.pdf");
Aspose.Pdf.Document doc = new Aspose.Pdf.Document();
// Create PdfBookmarkEditor and bind it to the source document
Aspose.Pdf.Facades.PdfBookmarkEditor bookmarkEditor = new Aspose.Pdf.Facades.PdfBookmarkEditor();
bookmarkEditor.BindPdf(document);
// Extract bookmarks
Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
// One title -> page number collection per bookmark
List<NameValueCollection> lstPages = new List<NameValueCollection>();
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
    // Render the bookmarked page to a PNG image at 300 DPI
    PngDevice device = new PngDevice(new Resolution(300));
    MemoryStream stream = new MemoryStream();
    device.Process(document.Pages[bookmark.PageNumber], stream);
    var page = doc.Pages.Add();
    Aspose.Pdf.Image img = new Aspose.Pdf.Image();
    img.ImageStream = stream;
    page.PageInfo.Margin = new MarginInfo(0, 0, 0, 0);
    // Size the page to the rendered image so that the image is not stretched
    using (Bitmap bitmap = new Bitmap(stream))
    {
        page.PageInfo.Height = bitmap.Height;
        page.PageInfo.Width = bitmap.Width;
    }
    page.Paragraphs.Add(img);
    NameValueCollection nvc = new NameValueCollection();
    nvc.Add(bookmark.Title, page.Number.ToString());
    lstPages.Add(nvc);
}
// The first extracted bookmark becomes the parent outline item
OutlineItemCollection pdfOutline = new OutlineItemCollection(doc.Outlines);
pdfOutline.Title = lstPages[0].Keys[0];
// Set the destination page
pdfOutline.Action = new GoToAction(doc.Pages[Convert.ToInt32(lstPages[0][0])]);
// Add the remaining bookmarks as children of the first one
for (int i = 1; i < lstPages.Count; i++)
{
    OutlineItemCollection pdfChildOutline = new OutlineItemCollection(doc.Outlines);
    pdfChildOutline.Title = lstPages[i].Keys[0];
    pdfChildOutline.Italic = true;
    pdfChildOutline.Bold = true;
    pdfChildOutline.Action = new GoToAction(doc.Pages[Convert.ToInt32(lstPages[i][0])]);
    // Add child bookmark in parent bookmark's collection
    pdfOutline.Add(pdfChildOutline);
}
// Add the parent bookmark to the document's outline collection
doc.Outlines.Add(pdfOutline);
doc.Save(dataDir + "output.pdf");
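One possible explanation for the "Input string was not in a correct format" error in the first snippet, stated as an assumption because the attached file is not reproduced here: `NameValueCollection` merges values added under the same key into a single comma-separated string. If two bookmarks share a title, `lstPages[i]` would return something like "2,5", which `Convert.ToInt32` cannot parse. The standalone sketch below uses made-up titles and page numbers to show that behavior; it would also explain why keeping one collection per bookmark, as in the snippet above, avoids the error.

```csharp
using System;
using System.Collections.Specialized;

static class DuplicateTitleSketch
{
    // Adds two page numbers under the same title and returns what the
    // collection stores at index 0.
    public static string MergedValue(string title, int firstPage, int secondPage)
    {
        var pagesByTitle = new NameValueCollection();
        pagesByTitle.Add(title, firstPage.ToString());
        pagesByTitle.Add(title, secondPage.ToString());
        return pagesByTitle[0];
    }

    // True when the stored value is a plain integer page number.
    public static bool CanParseAsPage(string value)
    {
        return int.TryParse(value, out _);
    }

    static void Main()
    {
        // "Quotation" is an invented title used only for illustration
        string merged = MergedValue("Quotation", 2, 5);
        Console.WriteLine(merged);
        Console.WriteLine(CanParseAsPage(merged));
    }
}
```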
@asad.ali
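The reply was very quick, thank you. Perhaps I did not describe my requirement clearly: the level of the second top-level bookmark has not been carried over (the generated bookmark hierarchy should be the same as in the original text-based PDF). I have attached a comparison. Could you take another look? Thanks.
pdf.png (29.0 KB)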
After further investigation, we were able to prepare a cleaner solution. Please try the following code snippet:
// Namespaces used: Aspose.Pdf, Aspose.Pdf.Devices, System,
// System.Collections.Generic, System.Drawing, System.IO
Aspose.Pdf.Document document = new Aspose.Pdf.Document(dataDir + "文本格式PDF.pdf");
Aspose.Pdf.Document doc = new Aspose.Pdf.Document();
// Create PdfBookmarkEditor and bind it to the source document
Aspose.Pdf.Facades.PdfBookmarkEditor bookmarkEditor = new Aspose.Pdf.Facades.PdfBookmarkEditor();
bookmarkEditor.BindPdf(document);
// Extract bookmarks
Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
List<Aspose.Pdf.Facades.Bookmark> lstPages = new List<Aspose.Pdf.Facades.Bookmark>();
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
    // Render the bookmarked page to a PNG image at 300 DPI
    PngDevice device = new PngDevice(new Resolution(300));
    MemoryStream stream = new MemoryStream();
    device.Process(document.Pages[bookmark.PageNumber], stream);
    var page = doc.Pages.Add();
    Aspose.Pdf.Image img = new Aspose.Pdf.Image();
    img.ImageStream = stream;
    page.PageInfo.Margin = new MarginInfo(0, 0, 0, 0);
    // Size the page to the rendered image so that the image is not stretched
    using (Bitmap bitmap = new Bitmap(stream))
    {
        page.PageInfo.Height = bitmap.Height;
        page.PageInfo.Width = bitmap.Width;
    }
    page.Paragraphs.Add(img);
    // Point the extracted bookmark at the new page, top-left corner
    bookmark.PageNumber = page.Number;
    bookmark.PageDisplay_Zoom = 0;
    bookmark.PageDisplay_Top = Convert.ToInt32(page.PageInfo.Height);
    bookmark.PageDisplay_Left = 0;
    lstPages.Add(bookmark);
}
// Write the bookmarks into the new document with a second editor
Aspose.Pdf.Facades.PdfBookmarkEditor bookmarkEditor2 = new Aspose.Pdf.Facades.PdfBookmarkEditor();
bookmarkEditor2.BindPdf(doc);
foreach (Aspose.Pdf.Facades.Bookmark bookmark in lstPages)
{
    // Only top-level bookmarks are created explicitly
    if (bookmark.Level == 1)
    {
        bookmarkEditor2.CreateBookmarks(bookmark);
    }
}
doc.Save(dataDir + "output.pdf");
output.pdf (6.3 MB)
With large PDFs I ran out of memory; lowering the 300 in Resolution(300) fixed that.
Everything else is solved perfectly. Many thanks.
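A rough back-of-the-envelope sketch of why lowering the resolution helps: the pixel count of a rendered page grows with the square of the DPI. The helper below is hypothetical and assumes an uncompressed bitmap at 4 bytes per pixel and a page size given in points (1/72 inch); the memory actually used by the library may differ.

```csharp
using System;

static class RenderMemorySketch
{
    // Hypothetical estimate of the uncompressed bitmap size for one page.
    // Assumes 4 bytes per pixel unless stated otherwise.
    public static long EstimateBitmapBytes(double widthPt, double heightPt, int dpi, int bytesPerPixel = 4)
    {
        long widthPx = (long)Math.Ceiling(widthPt / 72.0 * dpi);
        long heightPx = (long)Math.Ceiling(heightPt / 72.0 * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }

    static void Main()
    {
        // An A4 page is about 595 x 842 points
        foreach (int dpi in new[] { 300, 150, 96 })
        {
            double megabytes = EstimateBitmapBytes(595, 842, dpi) / (1024.0 * 1024.0);
            Console.WriteLine($"{dpi} DPI: about {megabytes:F1} MB per page");
        }
    }
}
```

Halving the resolution cuts the estimate to roughly a quarter, which is consistent with the out-of-memory error disappearing at a lower value.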
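Yes, you are right: the previous code snippet converts only the bookmarked pages to images. To convert every page to an image and copy the bookmarks, please use the following code: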
// Namespaces used: Aspose.Pdf, Aspose.Pdf.Devices, System,
// System.Collections.Generic, System.Drawing, System.IO
Aspose.Pdf.Document document = new Aspose.Pdf.Document(dataDir + "文本格式.pdf");
Aspose.Pdf.Document doc = new Aspose.Pdf.Document();
// Create PdfBookmarkEditor and bind it to the source document
Aspose.Pdf.Facades.PdfBookmarkEditor bookmarkEditor = new Aspose.Pdf.Facades.PdfBookmarkEditor();
bookmarkEditor.BindPdf(document);
// Render every page of the source document, not only the bookmarked ones
foreach(var targetpage in document.Pages)
{
    PngDevice device = new PngDevice(new Resolution(300));
    MemoryStream stream = new MemoryStream();
    device.Process(targetpage, stream);
    var page = doc.Pages.Add();
    Aspose.Pdf.Image img = new Aspose.Pdf.Image();
    img.ImageStream = stream;
    page.PageInfo.Margin = new MarginInfo(0, 0, 0, 0);
    // Size the page to the rendered image so that the image is not stretched
    using (Bitmap bitmap = new Bitmap(stream))
    {
        page.PageInfo.Height = bitmap.Height;
        page.PageInfo.Width = bitmap.Width;
    }
    page.Paragraphs.Add(img);
}
// Extract bookmarks
Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
List<Aspose.Pdf.Facades.Bookmark> lstPages = new List<Aspose.Pdf.Facades.Bookmark>();
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
    // Page numbers are unchanged, so only the view position is updated
    bookmark.PageDisplay_Zoom = 0;
    bookmark.PageDisplay_Top = Convert.ToInt32(doc.Pages[bookmark.PageNumber].PageInfo.Height);
    bookmark.PageDisplay_Left = 0;
    lstPages.Add(bookmark);
}
// Write the bookmarks into the new document with a second editor
Aspose.Pdf.Facades.PdfBookmarkEditor bookmarkEditor2 = new Aspose.Pdf.Facades.PdfBookmarkEditor();
bookmarkEditor2.BindPdf(doc);
foreach (Aspose.Pdf.Facades.Bookmark bookmark in lstPages)
{
    // Only top-level bookmarks are created explicitly
    if (bookmark.Level == 1)
    {
        bookmarkEditor2.CreateBookmarks(bookmark);
    }
}
doc.Save(dataDir + "output.pdf");
@asad.ali
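Thank you.
The line device.Process(targetpage, stream) reports "error CS1503: Argument 1: cannot convert from 'object' to 'Aspose.Pdf.Page'".
After I changed foreach (var targetpage in document.Pages) to foreach (Aspose.Pdf.Page targetpage in document.Pages), it ran normally.
However, the line doc.Save(dfile) then fails with System.ArgumentException: "Invalid image stream (Out of memory.)"
(the test file is attached). Could you take another look? Thanks.
One more question: does Aspose.PDF support two-layer PDFs, that is, an image-based PDF whose text can be copied, like the result of running OCR on it in Adobe Acrobat?
文本格式.zip (652.9 KB)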
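As we understand it, you need to convert a non-searchable PDF into a searchable document. For that, please try the following code snippet, which uses Tesseract.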
// Namespaces used: Aspose.Pdf, System.Diagnostics, System.IO
Document doc = new Document("D:/Downloads/input.pdf");
// Convert() passes each image to the callback and expects hOCR markup back
doc.Convert(CallBackGetHocr);
doc.Save("E:/Data/pdf_searchable.pdf");

//********************* CallBackGetHocr method ***********************//
static string CallBackGetHocr(System.Drawing.Image img)
{
    string dir = @"E:\Data\";
    // Save the image so that Tesseract can read it
    img.Save(dir + "ocrtest.jpg");
    ProcessStartInfo info = new ProcessStartInfo(@"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe");
    info.WindowStyle = ProcessWindowStyle.Hidden;
    // Arguments: <input image> <output base name> hocr
    info.Arguments = dir + "ocrtest.jpg " + dir + "out hocr";
    Process p = new Process();
    p.StartInfo = info;
    p.Start();
    p.WaitForExit();
    // Read the hOCR result written by Tesseract
    string text = File.ReadAllText(dir + "out.html");
    return text;
}
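A small robustness sketch for the callback above. The name of the file Tesseract writes for the `hocr` config typically depends on the installed version: newer releases write `.hocr`, while older ones wrote `.html`, so reading a hard-coded `out.html` may fail on some installations. The hypothetical helper `FindHocrOutput` below checks for either extension; the helper and the temporary paths in `Main` are illustrative only and not part of Aspose.PDF or Tesseract.

```csharp
using System;
using System.IO;

static class HocrOutputSketch
{
    // Hypothetical helper: returns the hOCR file written for the given
    // output base name, or null when neither candidate exists.
    public static string FindHocrOutput(string outputBase)
    {
        foreach (string extension in new[] { ".hocr", ".html" })
        {
            string candidate = outputBase + extension;
            if (File.Exists(candidate)) return candidate;
        }
        return null;
    }

    static void Main()
    {
        // Simulate a Tesseract run by writing a placeholder file to a temp folder
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string outputBase = Path.Combine(dir, "out");
        File.WriteAllText(outputBase + ".hocr", "<html></html>");

        Console.WriteLine(FindHocrOutput(outputBase) ?? "no hOCR output found");
        Directory.Delete(dir, true);
    }
}
```

In the callback, the returned path could replace the hard-coded `out.html` before the file is read.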
Yes.
I will need to study the code you provided. Thank you very much.