Name

pdf2djvu — creates DjVu files from PDF files

Synopsis

pdf2djvu [ { -o | --output } output-djvu-file ] [option...] pdf-file

pdf2djvu { -i | --indirect } index-djvu-file [option...] pdf-file

pdf2djvu { --version | --help | -h }

Description

This program creates a DjVu file from the Portable Document Format file pdf-file.

Options

pdf2djvu accepts the following options:

Document type, file names

-o, --output=output-djvu-file

Generate a bundled multi-page document. Write the file into output-djvu-file instead of standard output.

-i, --indirect=index-djvu-file

Generate an indirect multi-page document. Use index-djvu-file as the index file name; put the component files into the same directory. The directory must exist and be writable.

--pageid-prefix=prefix

Specifies the naming scheme for page identifiers: prefix, followed by the 0-padded page number, followed by the djvu extension. prefix must consist only of letters, digits, _, +, - and dot. The default is p.
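
The naming scheme can be illustrated with a minimal Python sketch. The function name page_id is hypothetical, and two details are assumptions rather than documented behaviour: the padding width (taken here from the number of digits in the page count) and the reading of "letters" as ASCII letters.

```python
import re

# Characters allowed in the prefix: letters, digits, _, +, - and dot.
# Assumption: "letters" means ASCII letters, and the prefix is non-empty.
_PREFIX_RE = re.compile(r"[A-Za-z0-9_+.-]+")


def page_id(page_number, page_count, prefix="p"):
    """Build a page identifier: prefix + 0-padded page number + ".djvu"."""
    if _PREFIX_RE.fullmatch(prefix) is None:
        raise ValueError(f"invalid page identifier prefix: {prefix!r}")
    # Assumption: pad to the number of digits of the highest page number.
    width = len(str(page_count))
    return f"{prefix}{page_number:0{width}d}.djvu"
```

Under these assumptions, page 7 of a 120-page document with the default prefix would be named p007.djvu.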

Resolution, page size

-d, --dpi=resolution

Specifies the desired resolution, in dots per inch. The default is 300 dpi. The allowed range is 72 ≤ resolution ≤ 6000.

--media-box

Use MediaBox to determine page size. CropBox is used by default.

--page-size=widthxheight

Specifies the preferred page size as width pixels × height pixels. The actual page size may be altered in order to respect the aspect ratio and DjVu limitations on resolution. (This option takes precedence over -d/--dpi.)

Image quality

--bg-slices=n+...+n, --bg-slices=n,...,n

Specifies the encoding quality of the IW44 background layer. This option is similar to the -slice option of c44. Consult the c44(1) manual page for details. The default is 72+11+10+10.
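
As a rough illustration of the two spellings, the Python sketch below normalizes either form to a list of cumulative slice counts. It assumes, following the usual reading of the c44(1) manual page, that plus-separated numbers are per-chunk increments and comma-separated numbers are running totals; the function name parse_bg_slices is hypothetical and the validation rules are guesses, not the program's actual checks.

```python
from itertools import accumulate


def parse_bg_slices(spec):
    """Return cumulative slice counts for a plus- or comma-separated spec."""
    if "+" in spec and "," in spec:
        raise ValueError("mixing '+' and ',' is not handled in this sketch")
    if "," in spec:
        # Assumption: comma-separated values are already running totals.
        totals = [int(item) for item in spec.split(",")]
    else:
        # Assumption: plus-separated values are increments per chunk.
        totals = list(accumulate(int(item) for item in spec.split("+")))
    if totals[0] <= 0 or any(b <= a for a, b in zip(totals, totals[1:])):
        raise ValueError(f"slice counts must be positive and increasing: {spec!r}")
    return totals
```

With this interpretation, the default 72+11+10+10 corresponds to the totals 72, 83, 93 and 103.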

--bg-subsample=n

Specifies the background subsampling ratio. The default is 3. Valid values are integers between 1 and 12, inclusive.

--fg-colors=web

Reduce foreground layer colors to the web palette (216 colors). This is the default.

--fg-colors=n

Use GraphicsMagick to reduce the number of distinct colors in the foreground layer to n. Valid values are integers between 1 and 4080, inclusive. This option is not recommended.

--monochrome

Render pages as monochrome bitmaps. With this option, the --bg-slices, --bg-subsample and --fg-colors options are not respected.

--loss-level=N

Specify the aggressiveness of the lossy compression. The default is 0 (lossless). Valid values are integers between 0 and 200, inclusive. This option is similar to the -losslevel option of cjb2; consult the cjb2(1) manual page for details. This option is respected only along with the --monochrome option.

--lossy

Synonym for --loss-level=100.

--anti-alias

Enable font and vector anti-aliasing. This option is not recommended.

Extraction

--no-metadata

Don't extract the metadata.

By default:

  • XMP metadata is extracted.

  • The following entries of the document information dictionary are extracted: Title, Author, Subject, Creator, Producer, CreationDate, ModDate. Timestamps are formatted according to RFC 3339, with date and time components separated by a single space.
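
To make the timestamp format concrete, here is a minimal Python sketch that converts a PDF date string into a timestamp with a single space between date and time. The function name format_pdf_date is hypothetical, only fully specified PDF dates (D:YYYYMMDDHHmmSS with an optional Z or ±HH'mm' suffix) are handled, and the exact output of pdf2djvu may differ in details such as how a missing time zone is rendered.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplification: only complete date strings are accepted.
_PDF_DATE_RE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:(Z)|([+-])(\d{2})'(\d{2})'?)?"
)


def format_pdf_date(value):
    """Format a PDF date string as 'YYYY-MM-DD HH:MM:SS[+HH:MM]'."""
    match = _PDF_DATE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported PDF date: {value!r}")
    fields = [int(group) for group in match.groups()[:6]]
    zulu, sign, off_hours, off_minutes = match.groups()[6:]
    if zulu:
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(off_hours), minutes=int(off_minutes))
        tz = timezone(-offset if sign == "-" else offset)
    else:
        tz = None  # no offset given: leave the timestamp without one
    # isoformat(sep=" ") puts a single space between date and time.
    return datetime(*fields, tzinfo=tz).isoformat(sep=" ")
```

For example, D:20090213233130+01'00' would be rendered as 2009-02-13 23:31:30+01:00.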

--verbatim-metadata

Keep the original metadata intact.

By default, the Producer entry is extended by the pdf2djvu version information.

--no-outline

Don't extract the document outline.

--hyperlinks=options

Specifies hyperlink display options. options must be a comma-separated list of:

border-avis

Make hyperlinks' borders always visible. (Otherwise, the border will be visible only when the mouse is over the hyperlink.)

#RRGGBB

Set the color of hyperlinks' borders.
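
The shape of the options argument can be sketched in Python as follows. The function name parse_hyperlink_options and the returned dictionary keys are hypothetical; the sketch only recognizes the two items described above and rejects anything else, which may be stricter or looser than the program itself.

```python
import re

_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_hyperlink_options(options):
    """Parse a comma-separated list such as 'border-avis,#ff0000'."""
    result = {"border_always_visible": False, "border_color": None}
    for item in options.split(","):
        if item == "border-avis":
            result["border_always_visible"] = True
        elif _COLOR_RE.fullmatch(item):
            result["border_color"] = item
        else:
            raise ValueError(f"unknown hyperlink option: {item!r}")
    return result
```

A value such as border-avis,#ff0000 would thus request always-visible red borders.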

--no-hyperlinks

Don't extract hyperlinks.

--no-text

Don't extract the text.

--words

Extract the text. Record the location of every word. This is the default.

--lines

Extract the text. Record the location of every line, rather than every word.

--no-nfkc

Don't NFKC-normalize the text.

--pages=page-range

Specifies pages to convert. page-range is a comma-separated list of sub-ranges. Each sub-range is either a single page (e.g. 17) or a contiguous range of pages (e.g. 37-42). Pages are numbered from 1.

The default is to convert all pages.
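
The page-range syntax described above can be sketched in Python like this. The function name parse_page_range is hypothetical, and the handling of out-of-range or reversed sub-ranges (rejected here) is an assumption, not documented behaviour.

```python
def parse_page_range(spec, page_count):
    """Expand a spec such as '17,37-42' into a list of 1-based page numbers."""
    pages = []
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        start = int(first)
        # A sub-range without a dash denotes a single page.
        end = int(last) if dash else start
        # Assumption: reversed or out-of-range sub-ranges are errors.
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page sub-range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages
```

For a 100-page document, 17,37-42 selects page 17 and pages 37 through 42.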

Verbosity, help

-v, --verbose

Display more informational messages while converting the file.

-q, --quiet

Don't display informational messages while converting the file.

--version

Output version information and exit.

-h, --help

Display help and exit.

Implementation details

Layer separation algorithm

Unless the --monochrome option is on, pdf2djvu uses the following naïve layer separation algorithm:

  1. For each page, do the following:

    1. Rasterize the page into a pixmap, in the usual manner.

    2. Rasterize the page into another pixmap, omitting the following page elements:

      • text,

      • 1 bit-per-pixel raster images,

      • vector elements (except fills of large areas).

    3. Compare both pixmaps, pixel by pixel:

      1. If their colors match, classify the pixel as a part of the background layer.

      2. Otherwise, classify the pixel as a part of the foreground layer.
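
The comparison step of the algorithm above can be sketched in Python as follows. This is an illustration only: the function name separate_layers is hypothetical, pixmaps are modelled as nested lists of (r, g, b) tuples, and the two rendering passes are assumed to have happened already.

```python
def separate_layers(full, stripped):
    """Return a mask with True for foreground pixels, False for background.

    full:     pixmap of the page rendered in the usual manner
    stripped: pixmap rendered without text, 1 bit-per-pixel images
              and vector elements
    """
    if len(full) != len(stripped):
        raise ValueError("pixmaps differ in height")
    mask = []
    for row_full, row_stripped in zip(full, stripped):
        if len(row_full) != len(row_stripped):
            raise ValueError("pixmaps differ in width")
        # Matching colors: background. Differing colors: foreground.
        mask.append([a != b for a, b in zip(row_full, row_stripped)])
    return mask
```

A pixel where both renderings agree ends up in the background layer; a pixel covered by, say, black text in the first rendering but white in the second ends up in the foreground layer.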

See also

djvudigital(1), csepdjvu(1)