Chapter 15
PDF and JSON

cpdf in.pdf -output-json -o out.json
     [-output-json-parse-content-streams]
     [-output-json-no-stream-data]

Writes the PDF as JSON data. The output is an array of pairs, each holding an object number and the object itself. Object number zero holds the trailer dictionary; negative object numbers are reserved for future expansion of the format. A stream is written as a two-element array: its dictionary followed by its data. Here is an example of the output for a small PDF:

[ [ 0,
    { "/Size": 4, "/Root": 4,
      "/ID": ["<elided>", "<elided>"] } ],
  [ 3,
    { "/Type": "/Page", "/Parent": 1,
      "/Resources": { "/Font": { "/F0": { "/Type": "/Font",
                                          "/Subtype": "/Type1",
                                          "/BaseFont": "/Times-Italic" } } },
      "/MediaBox": [ 0, 0, 595.275591, 841.889764 ], "/Rotate": 0,
      "/Contents": [ 2 ] } ],
  [ 4, { "/Type": "/Catalog", "/Pages": 1 } ],
  [ 1, { "/Type": "/Pages", "/Kids": [ 3 ], "/Count": 1 } ],
  [ 2,
    [ { "/Length": 49 },
      "1 0 0 1 50 770 cm BT/F0 36 Tf(Hello, World!)Tj ET" ] ] ]

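As a rough illustration of how this layout might be consumed, the following Python sketch indexes the pairs by object number and walks from the trailer to the page's content stream. The sample is an abridged copy of the example above (the /Resources entry is omitted), the helper name index_objects is hypothetical, and treating integers such as the value of "/Root" as object numbers is an assumption read off the example rather than a documented guarantee.

```python
import json

# Abridged copy of the example output above (IDs elided as in the source).
SAMPLE = '''
[ [ 0, { "/Size": 4, "/Root": 4, "/ID": ["<elided>", "<elided>"] } ],
  [ 3, { "/Type": "/Page", "/Parent": 1,
         "/MediaBox": [ 0, 0, 595.275591, 841.889764 ], "/Rotate": 0,
         "/Contents": [ 2 ] } ],
  [ 4, { "/Type": "/Catalog", "/Pages": 1 } ],
  [ 1, { "/Type": "/Pages", "/Kids": [ 3 ], "/Count": 1 } ],
  [ 2, [ { "/Length": 49 },
         "1 0 0 1 50 770 cm BT/F0 36 Tf(Hello, World!)Tj ET" ] ] ]
'''


def index_objects(pairs):
    """Turn the top-level list of [number, object] pairs into a dict."""
    return {number: obj for number, obj in pairs}


objects = index_objects(json.loads(SAMPLE))

trailer = objects[0]                 # object 0 holds the trailer dictionary
catalog = objects[trailer["/Root"]]  # assumption: the integer is an object number
pages = objects[catalog["/Pages"]]
first_page = objects[pages["/Kids"][0]]

# A stream is a two-element list: its dictionary, then its data.
stream_dict, stream_data = objects[first_page["/Contents"][0]]
print(stream_dict["/Length"], len(stream_data))
```

For real output, json.load on the file written by -o out.json would take the place of the embedded sample.
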
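The option -output-json-parse-content-streams also converts content streams to JSON. In our example, the stream data of object 2 is replaced by an array of operations, each listing its operands followed by the operator name, so the end of the output becomes: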

      [ [ 1.000000, 0.000000, 0.000000, 1.000000, 50.000000, 770.000000,
          "cm" ],
        [ "BT" ], [ "/F0", 36.000000, "Tf" ], [ "Hello, World!", "Tj" ],
        [ "ET" ] ] ] ] ]
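
With the content stream in this form, simple processing no longer needs a PDF operator parser. The Python sketch below collects the strings shown by Tj operators from a list of operations shaped like the example. The function name shown_text is hypothetical, and the sketch deliberately ignores other text-showing operators and any font encoding issues.

```python
# Operations as in the expanded example above: operands first, operator last.
operations = [
    [1.0, 0.0, 0.0, 1.0, 50.0, 770.0, "cm"],
    ["BT"],
    ["/F0", 36.0, "Tf"],
    ["Hello, World!", "Tj"],
    ["ET"],
]


def shown_text(ops):
    """Collect the string operand of each Tj (show text) operation."""
    return [op[0] for op in ops if len(op) == 2 and op[-1] == "Tj"]


print(shown_text(operations))
```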

The option -output-json-no-stream-data elides the stream data instead, producing much smaller JSON files.
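
When the export is driven from a script, the invocation can be assembled from the flags in the synopsis. The following Python sketch does only that; the helper name cpdf_json_command is hypothetical, it assumes cpdf is on the PATH, and whether combining both optional flags is meaningful is not stated here, so the example enables just one.

```python
import subprocess


def cpdf_json_command(pdf_in, json_out,
                      parse_content_streams=False, no_stream_data=False):
    """Build the argument list for the invocation shown in the synopsis."""
    cmd = ["cpdf", pdf_in, "-output-json", "-o", json_out]
    if parse_content_streams:
        cmd.append("-output-json-parse-content-streams")
    if no_stream_data:
        cmd.append("-output-json-no-stream-data")
    return cmd


cmd = cpdf_json_command("in.pdf", "out.json", no_stream_data=True)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to run; requires cpdf installed
```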