API Reference
Document
- class redstork.Document(file_name, password=None): PDF document. A list-like container of pages. Sample use:
    doc = Document('sample.pdf')
    print('Number of pages:', len(doc))
    for key, value in doc.meta.items():
        print(' ', key, ':', value)
- __init__(file_name, password=None): Create a new PDF Document object from a file.
- __getitem__(page_index): Returns the Page at this index. Example:
    doc = ...
    page = doc[0]  # first page
  Parameters: page_index (int) – zero-based page index
  Returns: Page object
- __len__(): Returns the number of pages in this document.
- __iter__(): Iterates over the pages of this document.
- get_all_pages_size(): Gets the width and height of all pages without loading each page.
- changed: True if the PDF was changed since it was loaded (or last saved).
- save(filename): Saves the PDF file and resets Document.changed to False.
Page
- class redstork.Page(page, page_index, parent): Represents a page of a PDF file.
- crop_box: Page crop box.
- media_box: Page media box.
- rotation: Page rotation.
  - 0 - no rotation
  - 1 - rotated 90 degrees clockwise
  - 2 - rotated 180 degrees clockwise
  - 3 - rotated 270 degrees clockwise
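The rotation codes listed above map directly onto quarter turns. As a minimal sketch (the helper names rotation_to_degrees and rotated_size are hypothetical, not part of redstork), the code can be translated to degrees and used to work out the displayed page size. Whether crop_box and media_box already account for rotation is not stated in this reference, so the size helper takes plain width and height numbers.

```python
# Rotation codes as listed above, mapped to clockwise degrees.
ROTATION_DEGREES = {0: 0, 1: 90, 2: 180, 3: 270}


def rotation_to_degrees(rotation):
    """Translate a Page.rotation code (0..3) into clockwise degrees."""
    try:
        return ROTATION_DEGREES[rotation]
    except KeyError:
        raise ValueError(f"unexpected rotation code: {rotation!r}") from None


def rotated_size(width, height, rotation):
    """Return (width, height) as displayed after applying the rotation."""
    if rotation_to_degrees(rotation) in (90, 270):
        return height, width  # quarter turns swap the two dimensions
    return width, height


print(rotation_to_degrees(1))
print(rotated_size(612, 792, 3))
```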
- label: Page label.
- __len__(): Number of objects on this page.
- __getitem__(index): Get the object at this index.
- __iter__(): Iterates over page objects.
- flat_iter(): Iterates over all non-container objects (Text, Image, Path).
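One use of a flat iteration is a quick inventory of what a page contains. The sketch below counts objects by their type attribute, using the OBJ_TYPE_* values given in the PageObject section of this reference. The function count_object_types is a hypothetical helper, and SimpleNamespace objects stand in for real page objects so the snippet runs on its own.

```python
from collections import Counter
from types import SimpleNamespace

# Type codes copied from the PageObject.OBJ_TYPE_* constants in this reference.
TYPE_NAMES = {1: "text", 2: "path", 3: "image", 4: "shading", 5: "form"}


def count_object_types(objects):
    """Count objects by type name; unknown codes are grouped under 'other'."""
    return Counter(TYPE_NAMES.get(obj.type, "other") for obj in objects)


# Stand-in objects; with the real library this would be page.flat_iter().
sample = [SimpleNamespace(type=t) for t in (1, 1, 2, 3, 1)]
print(dict(count_object_types(sample)))
```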
- render_to_buffer(scale=1.0, rect=None): Render the page (or a rectangle on the page) to memory. The pixel format is BGRx.
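Many imaging tools expect RGB rather than BGRx, so a rendered buffer usually needs its channels reordered. The sketch below shows that conversion in plain Python. It assumes a tightly packed buffer of 4 bytes per pixel with no row padding, and that the width and height are known; the exact return value of render_to_buffer is not described here, so bgrx_to_rgb is a hypothetical helper that operates on raw bytes.

```python
def bgrx_to_rgb(buf, width, height):
    """Convert a tightly packed BGRx buffer to RGB bytes (3 bytes per pixel)."""
    data = bytes(buf)
    expected = width * height * 4
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    rgb = bytearray(width * height * 3)
    rgb[0::3] = data[2::4]  # red is the third byte of each BGRx pixel
    rgb[1::3] = data[1::4]  # green stays in the middle
    rgb[2::3] = data[0::4]  # blue is the first byte; the x byte is dropped
    return bytes(rgb)


# One blue pixel followed by one red pixel (the fourth byte is padding).
pixels = bytes([255, 0, 0, 0, 0, 0, 255, 0])
print(bgrx_to_rgb(pixels, 2, 1))
```

For large pages a vectorized library would be faster; the slice assignments are used here to keep the example dependency-free.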
- render(file_name, scale=1.0, rect=None): Render the page (or a rectangle on the page) as a PPM image file.
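A PPM file is simple enough to inspect without an imaging library. The sketch below parses the header of a binary PPM to recover the image dimensions. It assumes the rendered file uses the binary "P6" variant with no comment lines, which this reference does not confirm; read_ppm_header is a hypothetical helper, and the sample bytes are constructed by hand.

```python
def read_ppm_header(data):
    """Parse the header of a binary PPM (P6) image.

    Returns (width, height, maxval, offset), where offset is the index of the
    first pixel byte. Header comments ('#' lines) are not handled.
    """
    if data[:2] != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    fields = []
    pos = 2
    while len(fields) < 3:
        # Skip the whitespace separating header tokens.
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            raise ValueError("truncated PPM header")
        fields.append(int(data[start:pos]))
    width, height, maxval = fields
    return width, height, maxval, pos + 1  # one whitespace byte ends the header


# Hand-built 2x1 image: header followed by 6 bytes of pixel data.
sample = b"P6\n2 1\n255\n" + bytes(6)
print(read_ppm_header(sample))
```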
PageObject
- class redstork.PageObject(obj, index, typ, parent): Common superclass of all page objects.
- OBJ_TYPE_TEXT = 1: see TextObject
- OBJ_TYPE_PATH = 2: see PathObject
- OBJ_TYPE_IMAGE = 3: see ImageObject
- OBJ_TYPE_SHADING = 4: see ShadingObject
- OBJ_TYPE_FORM = 5: see FormObject
- type = None: type of this object
- matrix = None: transformation matrix of this object
- page: Links back to the parent page.
TextObject
- class redstork.TextObject(obj, index, typ, parent): Represents a string of text on a page.
- font_size = None: font size of this text object
- matrix = None: matrix for this page object
- __len__(): Number of items in this string.
- __getitem__(index): Returns the item at this index. Each item is a 3-tuple: (charcode, x, y).
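Because each item carries a character code rather than text, turning a text object into a string means looking each code up in a font. The sketch below illustrates that step under several assumptions: every item is a character (how kerning entries appear among the items is not described here), the font is indexable by charcode as Font.__getitem__ is documented to be, and an unmapped code raises KeyError or IndexError. The helper items_to_text is hypothetical, and a dict stands in for the font.

```python
def items_to_text(items, font):
    """Join the Unicode text of each (charcode, x, y) item.

    `font` is anything indexable by charcode. Charcodes that fail to map
    are replaced with U+FFFD, the Unicode replacement character.
    """
    chars = []
    for charcode, _x, _y in items:
        try:
            chars.append(font[charcode])
        except (KeyError, IndexError):
            chars.append("\ufffd")
    return "".join(chars)


# Stand-in data: a dict plays the role of a Font object.
fake_font = {72: "H", 105: "i"}
fake_items = [(72, 0.0, 0.0), (105, 7.2, 0.0), (999, 10.0, 0.0)]
print(items_to_text(fake_items, fake_font))
```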
- __iter__(): Iterates over items.
- char_iter(): Iterates over characters (skips kerns).
- text_geometry_iter(): Iterates over characters and returns each character's text and bounds.
- effective_font_size: Returns the effective (user-visible) font size.
- scale_y: Returns the Y-scale of the text matrix transformation.
- scale_x: Returns the X-scale of the text matrix transformation.
- skew: Returns the skew value of the text matrix.
- box(x0, y0, x1, y1): Computes the bounding box after transformation with the text matrix.
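The bounding-box computation for a transformed rectangle can be sketched independently of the library: transform all four corners, then take the extremes. The example below assumes the usual PDF six-number matrix (a, b, c, d, e, f); how redstork represents its matrix attribute is not documented in this reference, and transform_box is a hypothetical helper rather than the library's implementation of box.

```python
def transform_box(matrix, x0, y0, x1, y1):
    """Bounding box of a rectangle after an affine transform.

    `matrix` is (a, b, c, d, e, f), mapping (x, y) to
    (a*x + c*y + e, b*x + d*y + f).
    """
    a, b, c, d, e, f = matrix
    # All four corners are needed: under rotation or skew, the transformed
    # extremes do not necessarily come from the original two corners.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Scale by 2 and translate by (10, 20).
print(transform_box((2, 0, 0, 2, 10, 20), 0, 0, 1, 1))
```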
Font
- class redstork.Font(font, parent): Represents a font used in a PDF file.
- FLAGS_NORMAL = 0: Normal font
- FLAGS_FIXED_PITCH = 1: Fixed pitch font
- FLAGS_SERIF = 2: Serif font
- FLAGS_SYMBOLIC = 4: Symbolic font
- FLAGS_SCRIPT = 8: Script font
- FLAGS_NONSYMBOLIC = 32: Non-symbolic font
- FLAGS_ITALIC = 64: Italic font
- FLAGS_ALLCAP = 65536: All-cap font
- FLAGS_SMALLCAP = 131072: Small-cap font
- FLAGS_FORCE_BOLD = 262144: Force-bold font
- name: Font name in the PDF document.
- simple_name: Font name without PDF-specific prefix.
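In PDF files, embedded font subsets are named with a tag of six uppercase letters and a plus sign in front of the base name (for example "ABCDEF+Helvetica-Bold"). The "PDF-specific prefix" mentioned above is presumably this subset tag, though that is an assumption, as is the idea that simple_name does exactly what is shown. The sketch uses a hypothetical helper, strip_subset_tag, to illustrate the idea on plain strings.

```python
import re

# A subset tag is exactly six uppercase letters followed by '+'.
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def strip_subset_tag(font_name):
    """Remove a leading subset tag such as 'ABCDEF+' from a font name."""
    return _SUBSET_TAG.sub("", font_name)


print(strip_subset_tag("ABCDEF+Helvetica-Bold"))
print(strip_subset_tag("Times-Roman"))
```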
- flags: Font flags.
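The FLAGS_* values are distinct powers of two, which suggests that flags is a bitmask combining several of them. Under that assumption, a value can be decoded into readable names as sketched below. The constant values are copied from this reference; describe_flags is a hypothetical helper, not a redstork function.

```python
# Values copied from the FLAGS_* constants listed in this reference.
FONT_FLAGS = {
    1: "FIXED_PITCH",
    2: "SERIF",
    4: "SYMBOLIC",
    8: "SCRIPT",
    32: "NONSYMBOLIC",
    64: "ITALIC",
    65536: "ALLCAP",
    131072: "SMALLCAP",
    262144: "FORCE_BOLD",
}


def describe_flags(flags):
    """Return the names of all flag bits set in `flags` (['NORMAL'] for 0)."""
    names = [name for bit, name in FONT_FLAGS.items() if flags & bit]
    return names or ["NORMAL"]


print(describe_flags(2 | 64))
print(describe_flags(0))
```

With the library itself, the same test could be written directly against the class constants, for example font.flags & Font.FLAGS_ITALIC.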
- weight: Font weight.
- is_vertical: True for vertical writing systems (CJK).
- id: Tuple of (Object_id, Generation_id) identifying the underlying stream in the PDF file.
- load_glyph(charcode): Load a glyph, see Glyph.
  Parameters: charcode (int) – the character code (see TextObject)
- __getitem__(charcode): Returns the Unicode text of this character.
  Parameters: charcode (int) – the character code (see TextObject)
- is_editable: True if the font encoding can be changed.
ImageObject
PathObject
ShadingObject
- class redstork.ShadingObject(obj, index, typ, parent): Represents a shading object on a page.
FormObject
- class redstork.FormObject(obj, index, typ, parent): Represents a form (XObject) on a page, a container of other page objects (used internally).
- matrix = None: matrix for this page object
- form_matrix = None: transformation matrix for contained objects
- flat_iter(): Iterates over all non-container objects in this form.