API Reference

Document

class redstork.Document(file_name, password=None)

PDF document.

A list-like container of pages. Sample use:

doc = Document('sample.pdf')
print('Number of pages:', len(doc))

for key, value in doc.meta.items():
    print('     ', key, ':', value)

__init__(file_name, password=None)

Creates a new PDF Document object from a file.

Parameters:
  • file_name (str) – Name of PDF file
  • password (str) – File password (optional)

numpages = None

int – total number of pages

meta = None

dict – document meta info (Author, Title, etc)

fonts = None

dict – font collection (populated lazily as pages are parsed)

__getitem__(page_index)

Returns Page at this index.

Example:

doc = ...

page = doc[0]  # first page
Parameters: page_index (int) – zero-based page index
Returns: Page object

__len__()

Returns number of pages in this document

__iter__()

Iterate over the pages of this document

get_all_pages_size()

Gets the width and height of all pages without loading each page.

changed

True if the PDF was changed since the load (or last save).

save(filename)

Saves the PDF file and resets Document.changed to False.
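
The changed flag and save() can be combined into a small guard that writes the file only when there is something to write. The following is a minimal sketch: save_if_changed is a hypothetical helper, not part of redstork, and it relies only on the changed attribute and the save(filename) method described above.

```python
def save_if_changed(doc, filename):
    """Save doc to filename only if it was modified; return True if saved.

    Hypothetical helper: `doc` is any object exposing the `changed`
    attribute and the `save(filename)` method documented for Document.
    """
    if not doc.changed:
        return False  # nothing to write
    doc.save(filename)
    return True
```

A call such as save_if_changed(doc, 'out.pdf') could follow any step that edits the document, for example an update to a font encoding.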

Page

class redstork.Page(page, page_index, parent)

Represents page of a PDF file.

crop_box

Page crop box.

media_box

Page media box.

rotation

Page rotation.

  • 0 - no rotation
  • 1 - rotated 90 degrees clockwise
  • 2 - rotated 180 degrees clockwise
  • 3 - rotated 270 degrees clockwise
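
The rotation codes listed above map to degrees by multiplying by 90. A small sketch of that conversion follows; rotation_degrees is a hypothetical helper name, not a redstork function.

```python
def rotation_degrees(rotation):
    """Convert a Page.rotation code (0..3) to clockwise degrees."""
    if rotation not in (0, 1, 2, 3):
        raise ValueError("unexpected rotation code: %r" % (rotation,))
    return rotation * 90
```
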
label

Page label.

__len__()

Number of objects on this page.

__getitem__(index)

Get object at this index.

__iter__()

Iterates over page objects.

flat_iter()

Iterates over all non-container objects (Text, Image, Path).
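
One possible use of flat_iter() is to tally what a page contains. The sketch below assumes that each yielded object has a type attribute holding one of the PageObject.OBJ_TYPE_* values listed in this reference; count_object_types and TYPE_NAMES are hypothetical names introduced for illustration.

```python
from collections import Counter

# Values mirror the PageObject.OBJ_TYPE_* constants listed in this reference.
TYPE_NAMES = {1: "text", 2: "path", 3: "image", 4: "shading", 5: "form"}


def count_object_types(page):
    """Tally the objects yielded by page.flat_iter() by type name.

    Assumption: every object exposes a `type` attribute with an
    OBJ_TYPE_* value; anything else is counted as "unknown".
    """
    counts = Counter()
    for obj in page.flat_iter():
        counts[TYPE_NAMES.get(obj.type, "unknown")] += 1
    return dict(counts)
```
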

render_to_buffer(scale=1.0, rect=None)

Render page (or rectangle on the page) to memory (the pixel format is BGRx)

Parameters:
  • scale (float) – scale to use (default is 1.0, which maps 1 pt to 1 px)
  • rect (tuple) – optional rectangle to render. Value is a 4-tuple of (x0, y0, x1, y1) in PDF coordinates. If None, the page’s crop_box is used for rendering.

render(file_name, scale=1.0, rect=None)

Render page (or rectangle on the page) as PPM image file.

Parameters:
  • file_name (str) – name of the output file
  • scale (float) – scale to use (default is 1.0, which maps 1 pt to 1 px)
  • rect (tuple) – optional rectangle to render. Value is a 4-tuple of (x0, y0, x1, y1) in PDF coordinates. If None, the page’s crop_box is used for rendering.
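
BGRx stores four bytes per pixel: blue, green, red, and one unused byte. Many image tools expect RGB instead, so a conversion step is often needed. The sketch below shows one way to do it on raw bytes. It is not part of redstork; bgrx_to_rgb is a hypothetical helper, and it assumes the pixel data is available as a tightly packed bytes object with no row padding, which this reference does not confirm for render_to_buffer.

```python
def bgrx_to_rgb(data):
    """Convert tightly packed BGRx bytes to RGB bytes.

    Assumption: `data` has no per-row padding, so its length is
    exactly 4 * width * height.
    """
    if len(data) % 4:
        raise ValueError("BGRx data length must be a multiple of 4")
    rgb = bytearray(len(data) // 4 * 3)
    rgb[0::3] = data[2::4]  # red is the third byte of each BGRx pixel
    rgb[1::3] = data[1::4]  # green is the second byte
    rgb[2::3] = data[0::4]  # blue is the first byte
    return bytes(rgb)
```
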

PageObject

class redstork.PageObject(obj, index, typ, parent)

OBJ_TYPE_TEXT = 1

see TextObject

OBJ_TYPE_PATH = 2

see PathObject

OBJ_TYPE_IMAGE = 3

see ImageObject

OBJ_TYPE_SHADING = 4

see ShadingObject

OBJ_TYPE_FORM = 5

see FormObject

PageObject is the common superclass of all page objects.

type = None

type of this object

matrix = None

transformation matrix of this object

page

Links back to the parent page

TextObject

class redstork.TextObject(obj, index, typ, parent)

Represents a string of text on a page

font = None

Font for this text object

font_size = None

font size of this text object

matrix = None

matrix for this page object

__len__()

Number of items in this string

__getitem__(index)

Returns item at this index.

Each item is a 3-tuple: (charcode, x, y).

__iter__()

Iterates over items.

char_iter()

Iterates over characters (skips kerns)

text_geometry_iter()

Iterates over characters and returns character text and bounds

effective_font_size

Returns effective (user-visible) font size

scale_y

Returns Y-scale of text matrix transformation

scale_x

Returns X-scale of text matrix transformation

skew

Returns skew value of text matrix.

box(x0, y0, x1, y1)

Computes bounding box after transformation with text matrix

Font

class redstork.Font(font, parent)

Represents font used in a PDF file.

FLAGS_NORMAL = 0

Normal font

FLAGS_FIXED_PITCH = 1

Fixed pitch font

FLAGS_SERIF = 2

Serif font

FLAGS_SYMBOLIC = 4

Symbolic font

FLAGS_SCRIPT = 8

Script font

FLAGS_NONSYMBOLIC = 32

Non-symbolic font

FLAGS_ITALIC = 64

Italic font

FLAGS_ALLCAP = 65536

All-cap font

FLAGS_SMALLCAP = 131072

Small-cap font

FLAGS_FORCE_BOLD = 262144

Force-bold font

name

Font name in the PDF document.

simple_name

Font name without PDF-specific prefix.

flags

Font flags.
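
The FLAGS_* values above are distinct powers of two, which suggests that flags is a bitmask combining them. Under that assumption, the set bits can be listed with a helper like the one sketched here; flag_names and FONT_FLAGS are hypothetical names, not part of redstork.

```python
# Values mirror the Font.FLAGS_* constants listed in this reference.
FONT_FLAGS = {
    1: "FIXED_PITCH",
    2: "SERIF",
    4: "SYMBOLIC",
    8: "SCRIPT",
    32: "NONSYMBOLIC",
    64: "ITALIC",
    65536: "ALLCAP",
    131072: "SMALLCAP",
    262144: "FORCE_BOLD",
}


def flag_names(flags):
    """Return the names of all flag bits set in `flags`.

    Assumption: `flags` is an integer bitmask; 0 is reported as NORMAL.
    """
    names = [name for bit, name in FONT_FLAGS.items() if flags & bit]
    return names or ["NORMAL"]
```

For a loaded font, flag_names(font.flags) would give a readable summary of its style bits.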

weight

Font weight.

is_vertical

True for vertical writing systems (CJK)

id

Tuple of (Object_id, Generation_id), identifying the underlying stream in the PDF file

load_glyph(charcode)

Load glyph, see Glyph

Parameters: charcode (int) – the character code (see TextObject)

__getitem__(charcode)

Returns Unicode text of this character.

Parameters: charcode (int) – the character code (see TextObject)
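
Because Font.__getitem__ maps a character code to Unicode text, it can be combined with a TextObject to recover the string it draws. The sketch below is one way this might look. It assumes that TextObject.char_iter() yields the same (charcode, x, y) tuples as TextObject.__getitem__, which this reference does not state explicitly; text_of is a hypothetical helper name.

```python
def text_of(text_obj):
    """Join the Unicode text of every character in a text object.

    Assumption: char_iter() yields (charcode, x, y) tuples.
    """
    font = text_obj.font
    return "".join(font[charcode] for charcode, _x, _y in text_obj.char_iter())
```
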
is_editable

True if font encoding can be changed

__setitem__(charcode, text)

Updates font encoding.

Parameters:
  • charcode (int) – character code
  • text (str) – new text for this character code
Raises:

ReadOnlyEncodingError – if encoding is read-only (no “ToUnicode” map in the font dictionary)
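
To avoid the exception, is_editable can be checked before assigning. A minimal sketch follows, using only the is_editable attribute and item assignment described above; try_set_text is a hypothetical helper, not part of redstork.

```python
def try_set_text(font, charcode, text):
    """Update the font encoding if allowed; return True on success.

    Checks `is_editable` first so that a read-only encoding is
    skipped instead of raising ReadOnlyEncodingError.
    """
    if not font.is_editable:
        return False
    font[charcode] = text
    return True
```
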

Glyph

class redstork.Glyph(glyph, parent)

Represents Glyph drawing instructions

LINETO = 0

LineTo instruction

CURVETO = 1

CurveTo instruction

MOVETO = 2

MoveTo instruction

__getitem__(i)

Returns a 4-tuple representing this drawing instruction: (x, y, type, close).

Parameters: i (int) – index of the instruction
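
Given a sequence of such (x, y, type, close) tuples, a rough bounding box of the outline can be computed from the listed points. This is only a sketch: instruction_bounds is a hypothetical helper, it takes an already collected list of tuples (this reference does not say how to obtain the number of instructions in a Glyph), and it treats every listed point alike, so the box may be larger than the drawn shape if some points are curve control points.

```python
def instruction_bounds(instructions):
    """Return (min_x, min_y, max_x, max_y) over all instruction points.

    `instructions` is any iterable of (x, y, type, close) tuples.
    Returns None when there are no instructions.
    """
    points = [(x, y) for x, y, _type, _close in instructions]
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))
```
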

ImageObject

class redstork.ImageObject(obj, index, typ, parent)

Represents image on a page.

matrix = None

matrix for this page object

pixel_width

width of the bitmap, in pixels

pixel_height

height of the bitmap, in pixels

PathObject

class redstork.PathObject(obj, index, typ, parent)

Represents vector graphics on a page.

matrix = None

matrix for this page object

ShadingObject

class redstork.ShadingObject(obj, index, typ, parent)

Represents a shading object on a page.

FormObject

class redstork.FormObject(obj, index, typ, parent)

Represents a form (XObject) on a page - a container of other page objects (used internally).

matrix = None

matrix for this page object

form_matrix = None

transformation matrix for contained objects

flat_iter()

Iterates over all non-container objects in this form.