PageSnap API Documentation

Simplify Integration: Unravel Our API's Potential for Your Projects.

Introduction

Welcome to PageSnap, a SaaS service that converts website links or HTML content into high-quality PDFs through simple HTTP requests. The API lets you customize the PDF output and delivers a download link both through an HTTP POST webhook and in your dashboard. The PDFs are stored securely in our AWS S3 bucket for 2 days and are removed promptly afterwards. The service is designed for efficiency and reliability and integrates with your existing workflows, so you can focus on your core activities while we handle your document conversion needs.

Authentication

To authenticate your account, include your API username and keys in the Basic Auth header of your HTTP requests. You can manage your API keys in the dashboard.

Keep your API keys confidential and do not share them with others. Each API key is associated with your account and is used for billing purposes. If you need to provide API access to multiple users or services, create separate API keys for each use case. Regularly review and deactivate any unused keys to maintain the security of your account. Remember, you are responsible for all API requests made using your API keys.
Request Header
Include the Basic Auth username and keys in the request header using your preferred programming language.
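
As a minimal Python sketch, the header can be built with the standard library alone. The credential values are placeholders, and the sketch assumes that the API key takes the place of the Basic Auth password, which is not stated explicitly above.

```python
import base64


def basic_auth_header(username: str, api_key: str) -> dict:
    """Return an Authorization header for HTTP Basic Auth.

    Assumption: the API key is used where Basic Auth expects a password.
    """
    raw = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
headers = basic_auth_header("my-username", "my-api-key")
```

Most HTTP clients can also build this header for you when given a username and password pair.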

Sandbox

When getting started with PageSnap or working in development mode, it's strongly advised to enable the sandbox parameter by setting it to true. This parameter adds a watermark to the generated PDF without deducting from your conversion credits, allowing you to set up your code and perform multiple tests without worrying about exhausting your credit balance.

Once you've completed your setup and testing, you can conduct a final local test with the sandbox parameter removed or set to false to ensure the PDF renders precisely as intended. Please note that there is a limit of 50 sandbox requests per day per account.

Request Body
{
    "sandbox": true
}

Generate API Key

To interact with the PageSnap API for capturing PDFs, you'll need an API key. PageSnap uses this key to grant access permission and track credit usage. You can generate multiple API keys if needed, to manage different keys for different projects. To generate an API key, create a PageSnap account and generate it via your dashboard.

Basic Example

This is one of the most basic examples, which you can use to quickly test your API key. It generates the PDF in sandbox mode, and you can retrieve the result from your dashboard account. If a webhook_url is provided, the result and the PDF download link will also be sent via webhook.

Request Header
Include the Basic Auth username and keys in the request header using your preferred programming language.
Request Body
{
    "sandbox": true,
    "contents": {
        "urls": [
            "https://en.wikipedia.org/wiki/Tech"
        ]
    },
    "options": {}
}
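
The request above could be assembled in Python roughly as follows. This is a sketch under stated assumptions: API_URL is a hypothetical placeholder (the real endpoint is not given in this document), the request is assumed to be a JSON POST, and the API key is assumed to act as the Basic Auth password. The request is built but not sent.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: replace with the URL provided for your account.
API_URL = "https://api.example.com/pdf"


def build_request(username: str, api_key: str, page_url: str) -> urllib.request.Request:
    """Build (but do not send) a sandbox conversion request for one URL."""
    payload = {
        "sandbox": True,
        "contents": {"urls": [page_url]},
        "options": {},
    }
    raw = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("my-username", "my-api-key", "https://en.wikipedia.org/wiki/Tech")
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```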

Response

This is the API response format you will receive after submitting a request. You can use the returned request_id to verify and match the incoming webhook notification once the PDF conversion is completed. Below are the potential response statuses you may receive when interacting with our API, along with the corresponding reasons for each status:

Response Example
{
    "request_id": "5d2509ce-93f0-4e52-aeae-cab134a11c08",
    "success": true,
    "message": "Request sent to processing queue successfully",
    "error": ""
}
Status Code    Reason
200            Request successful.
401            Missing Authorization Token.
402            Insufficient credit.
403            Invalid API key or user not authorized.
429            Sandbox limit per day reached.
500            Internal Server Error.
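
One way to act on these statuses in client code is sketched below in Python. The function name describe_response is a hypothetical helper; the status reasons and body fields are taken from the response example and status list above.

```python
# Status codes and reasons as listed above.
STATUS_REASONS = {
    200: "Request successful.",
    401: "Missing Authorization Token.",
    402: "Insufficient credit.",
    403: "Invalid API key or user not authorized.",
    429: "Sandbox limit per day reached.",
    500: "Internal Server Error.",
}


def describe_response(status_code: int, body: dict) -> str:
    """Summarize an API response as a short human-readable string."""
    if status_code == 200 and body.get("success"):
        # Keep the request_id so the later webhook can be matched to it.
        return f"queued as {body['request_id']}"
    reason = STATUS_REASONS.get(status_code, "Unknown status.")
    detail = body.get("error") or body.get("message") or ""
    return f"failed ({status_code}): {reason} {detail}".strip()
```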

Help

If you encounter any problems while integrating with our API, please don't hesitate to contact our support team at [email protected]. We're here to assist you in achieving your desired document conversions.

Request Setup

Parameters that allow you to customize and control the behavior of the request.

contents required array

At least one entry is required for either the urls or htmls parameter. When a request with this JSON payload is sent to the API, the service will process each web page specified by the URLs and each HTML snippet independently, creating a distinct PDF document for each content source provided. This allows users to efficiently generate multiple PDFs from different types of content with a single API request. The maximum count of contents per request is 200.

  • urls: This array contains a list of URLs pointing to web pages. The API will retrieve the content from each of these web pages and generate a separate PDF for each URL.
  • htmls: This array holds raw HTML strings. Each HTML snippet will be rendered and captured into its own separate PDF.
Request Body
{
    ...
    "contents": {
        "urls": [
            "https://en.wikipedia.org/wiki/Tech",
            "https://en.wikipedia.org/wiki/Web"
        ],
        "htmls": [
            "<div>Hello World!</div>",
            "<p>Hello World Again!</p>"
        ]
    }
    ...
}
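
The rules above (at least one entry, at most 200 per request) can be checked on the client before sending. The Python sketch below uses a hypothetical helper, build_contents, and assumes the 200 limit applies to urls and htmls combined.

```python
MAX_CONTENTS = 200  # maximum count of contents per request


def build_contents(urls=None, htmls=None) -> dict:
    """Return a contents object, validating the documented limits.

    Assumption: the limit of 200 counts urls and htmls together.
    """
    urls = list(urls or [])
    htmls = list(htmls or [])
    total = len(urls) + len(htmls)
    if total == 0:
        raise ValueError("provide at least one entry in urls or htmls")
    if total > MAX_CONTENTS:
        raise ValueError(f"{total} entries exceed the limit of {MAX_CONTENTS}")
    contents = {}
    if urls:
        contents["urls"] = urls
    if htmls:
        contents["htmls"] = htmls
    return contents
```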

webhook_url string

The webhook_url parameter allows you to specify a URL where our service will send an HTTP POST request containing the results of the PDF capture process. The body of that request will include the following information:

  • request: The original request payload you sent to our API for capturing the PDFs.
  • results: An array with one entry per captured PDF. Each entry includes a URL where the PDF can be downloaded. These URLs will be valid for 2 hours.
  • error: If there were any errors during the PDF capture process, this field will contain an error message. If the process was successful, this field will be null.

If the webhook_url parameter is not provided, the request submission will still work. However, you will not receive any webhook notifications with the download links. In this case, you can manually retrieve the download links for your captured PDFs from our dashboard under the Requests section.

Note: Ensure that the provided webhook_url, if specified, is a valid, publicly accessible URL that can handle incoming POST requests.

Request Body
{
    ...
    "webhook_url": "https://your-webhook-url.com"
    ...
}
Webhook Response Sample
{
    "request_id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
    "request": {
        ...Request Payload...
    },
    "results": [
        {
            "source": "https://en.wikipedia.org/wiki/Tech",
            "pdf": "https://pagesnap.io/download/5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b/tech.pdf",
            "expired_at": "2021-12-31T23:59:59Z",
            "error": null
        }
    ]
}
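
A receiving endpoint might process this payload along the following lines. This Python sketch assumes the field names shown in the sample above and assumes that a failed capture carries a non-null error string in its result entry; handle_webhook and pending_request_ids are hypothetical names.

```python
import json


def handle_webhook(raw_body: bytes, pending_request_ids: set) -> dict:
    """Parse a webhook POST body and split results into PDFs and failures."""
    payload = json.loads(raw_body)
    request_id = payload["request_id"]
    # Match the webhook against a request_id returned at submission time.
    if request_id not in pending_request_ids:
        raise KeyError(f"unexpected request_id: {request_id}")
    pdfs, failures = {}, {}
    for item in payload.get("results", []):
        if item.get("error"):
            failures[item["source"]] = item["error"]
        else:
            pdfs[item["source"]] = item["pdf"]
    pending_request_ids.discard(request_id)
    return {"request_id": request_id, "pdfs": pdfs, "failures": failures}
```

Because the download links expire, it is usually safer to fetch or copy the PDFs soon after the webhook arrives rather than storing the links.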

s3_path_url string

The s3_path_url is an optional parameter that allows you to store the captured PDFs in your own AWS S3 bucket. Provide the s3_path_url in the format s3://your-bucket-name/path/to/store/pdfs/ and ensure our service has write permissions to your bucket. You can copy and paste our sample bucket policy to apply to your AWS S3 bucket. When s3_path_url is specified, the results array in the webhook response and our dashboard will contain the S3 path to each PDF in your designated bucket. Please note that the path is not a pre-signed link, because we do not have permission to create pre-signed links for your bucket.

Please remember to replace the <Your_bucket_name> placeholder in our policy sample.
Request Body
{
    ...
    "s3_path_url": "s3://your-bucket-name/path/to/store/pdfs/",
    ...
}
Required Bucket Policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Only allow writes to this bucket with bucket owner full control",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::146666182615:role/pagesnap-role-lambda"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<Your_bucket_name>/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}
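
To avoid editing the placeholder by hand, the policy can be generated from the s3_path_url itself. The Python sketch below is illustrative only: bucket_policy_for is a hypothetical helper, and the policy content is copied from the sample above with the bucket name substituted.

```python
from urllib.parse import urlparse


def bucket_policy_for(s3_path_url: str) -> dict:
    """Return the sample bucket policy filled in for the given s3:// URL."""
    parsed = urlparse(s3_path_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected the form s3://your-bucket-name/path/")
    bucket = parsed.netloc  # the bucket name is the host part of the URL
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Only allow writes to this bucket with bucket owner full control",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::146666182615:role/pagesnap-role-lambda"
                },
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            }
        ],
    }
```

Serializing the result with json.dumps gives text that can be pasted into the bucket policy editor.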

Request Options

Comprehensive Tuning Options for Achieving Your Desired Output.

set_java_script_enabled boolean

Enables or disables JavaScript execution on the page, allowing control over whether JavaScript code runs during the page load and interactions. By default, any JavaScript code present on the web page will be executed during the page load and interaction.

Request Body
{
    ...
    "options": {
        "set_java_script_enabled": false
    }
    ...
}

set_extra_http_headers json object

Set custom HTTP headers for all requests made by the page, enabling modification of the headers sent with each request. HTTP headers are key-value pairs that provide additional information about the request or the response. By default, PageSnap sends some standard headers with each request, such as User-Agent, Accept-Language, etc.

Request Body
{
    ...
    "options": {
        "set_extra_http_headers": {
            "User-Agent": "MyCustomUserAgent/1.0",
            "Referer": "https://example.com"
        }
    }
    ...
}

auth json object

Provides credentials for HTTP authentication requests made by the page, enabling automatic login or access to protected resources that require authentication. Keep in mind that this option only works for HTTP authentication mechanisms supported by the browser, such as Basic and Digest authentication. It does not handle other types of authentication, such as form-based login or OAuth.

Request Body
{
    ...
    "options": {
        "auth": {
            "username": "myname",
            "password": "abc123"
        }
    }
    ...
}

cookies array

Allows you to set cookies for the current page programmatically. Each cookie is defined by two properties: name and value.

Request Body
{
    ...
    "options": {
        "cookies": [{
            "name": "cookie1",
            "value": "value1"
        }, {
            "name": "cookie2",
            "value": "value2"
        }]
    }
    ...
}
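
If your cookies already live in a name-to-value mapping, a small conversion produces the array format shown above. This is a Python sketch; cookies_option is a hypothetical helper name.

```python
def cookies_option(cookies: dict) -> list:
    """Convert a {name: value} mapping into the cookies array format."""
    return [{"name": name, "value": value} for name, value in cookies.items()]


options = {"cookies": cookies_option({"cookie1": "value1", "cookie2": "value2"})}
```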

stop_for_status boolean

Stop capturing the page if the website HTTP response code is not 2XX, allowing control over the response status code.

Request Body
{
    ...
    "options": {
        "stop_for_status": true
    }
    ...
}

click_selector string

Fetches an element with the given selector, scrolls it into view if needed, and then clicks in the center of the element.

Request Body
{
    ...
    "options": {
        "click_selector": "#myElement"
    }
    ...
}

add_style string

This will append the given CSS styles to the document before saving it. The value can be a URL to a CSS style sheet file or a string of CSS rules.

Request Body
{
    ...
    "options": {
        "add_style": "https://example.com/styles.css"
    }
    ...
}
{
    ...
    "options": {
        "add_style": "body {background-color: red;}"
    }
    ...
}

add_script string

This will execute the given JavaScript before saving the document. It can be a URL or a string of JS code.

Request Body
{
    ...
    "options": {
        "add_script": "https://example.com/script.js"
    }
    ...
}
{
    ...
    "options": {
        "add_script": "document.body.style.backgroundColor = 'lightblue';"
    }
    ...
}

emulate_media_type string

Allows setting the page's emulated media type, to simulate different rendering scenarios. Supported values are screen and print.

Request Body
{
    ...
    "options": {
        "emulate_media_type": "print"
    }
    ...
}

wait_for_selector string

Waits for a specified selector to appear in the page's DOM. It pauses the execution until the element matching the selector is found or until a timeout is reached.

Request Body
{
    ...
    "options": {
        "wait_for_selector": "#myElement"
    }
    ...
}

wait_until string

Allows you to specify when to consider the navigation to be finished. You can provide multiple values separated by a comma. Supported values are:

  • load: Waits for the load event to be fired, indicating that the page has fully loaded.
  • domcontentloaded: Waits for the DOMContentLoaded event to be fired, indicating that the initial HTML document has been parsed completely.
  • networkidle0: Waits until there are no more than 0 network connections for at least 500 ms.
  • networkidle2: Waits until there are no more than 2 network connections for at least 500 ms.
Request Body
{
    ...
    "options": {
        "wait_until": "domcontentloaded,load"
    }
    ...
}
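
The comma-separated value can be assembled and checked against the supported list before sending. The Python sketch below uses a hypothetical helper, wait_until_value.

```python
SUPPORTED_WAIT_UNTIL = ("load", "domcontentloaded", "networkidle0", "networkidle2")


def wait_until_value(*events: str) -> str:
    """Join one or more supported events into a wait_until string."""
    if not events:
        raise ValueError("provide at least one event")
    unknown = [event for event in events if event not in SUPPORTED_WAIT_UNTIL]
    if unknown:
        raise ValueError(f"unsupported wait_until values: {unknown}")
    return ",".join(events)


options = {"wait_until": wait_until_value("domcontentloaded", "load")}
```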

wait_for_timeout string

Specify the maximum time (in milliseconds) that the navigation should take before it times out. By default, the timeout is set to 30 seconds (30,000 milliseconds). If the navigation takes longer than the specified timeout, PageSnap will abort the navigation.

Request Body
{
    ...
    "options": {
        "wait_for_timeout": 30000
    }
    ...
}

idle_time_after_loaded integer

Wait for the specified time in milliseconds after each page load or click event. The default is 1000 milliseconds. This is an extra delay to ensure that the page has fully loaded and any animations or dynamic content have settled before capturing the screenshot or generating the PDF. This option does not override other waiting options.

Request Body
{
    ...
    "options": {
        "idle_time_after_loaded": 1000
    }
    ...
}

header_template string

Specify a custom HTML template for the header of each page in the generated PDF. This option works in conjunction with the display_header_footer option. When display_header_footer is set to true, PageSnap will use the provided header_template to render the header on each page of the PDF. The value should be a valid HTML string that defines the content and styling of the header. You can also use HTML classes date, title, url, pageNumber, totalPages to inject printing values. Make sure to provide appropriate margins using the margin_top option to ensure that the header has enough space and doesn't overlap with the main content of the page.

Request Body
{
    ...
    "options": {
        "display_header_footer": true,
        "header_template": "<span style='float: right;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></span>",
        "margin_top": "5cm"
    }
    ...
}

page_ranges string

Specify the range of pages to include in the generated PDF. By default, when this option is not provided, PageSnap generates a PDF that includes all the pages of the web page, which might consume more credits than necessary. You can specify the pages to include by providing a comma-separated list of page numbers or ranges. For example, "1,3-5,8" will include pages 1, 3, 4, 5, and 8 in the generated PDF.

Request Body
{
    ...
    "options": {
        "page_ranges": "1,3-5,8"
    }
    ...
}
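
To see which pages a given page_ranges string selects, it can be expanded locally. This Python sketch only illustrates the syntax described above; expand_page_ranges is a hypothetical helper and does not reflect how the service parses the value internally.

```python
def expand_page_ranges(page_ranges: str) -> list:
    """Expand a string such as "1,3-5,8" into a list of page numbers."""
    pages = []
    for part in page_ranges.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(expand_page_ranges("1,3-5,8"))  # prints [1, 3, 4, 5, 8]
```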

scale float

Determines how the web page is scaled when rendering it to a PDF. It affects the size and resolution of the content in the generated PDF. The default value of scale is 1, which means the web page is rendered at its original size. If you set scale to a value greater than 1, the content will be scaled up, resulting in a larger PDF. Conversely, if you set scale to a value less than 1, the content will be scaled down, resulting in a smaller PDF. The supported values are in the range of 0.1 to 2.0.

Request Body
{
    ...
    "options": {
        "scale": 1
    }
    ...
}
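
Since only values from 0.1 to 2.0 are supported, a client-side check can catch out-of-range values early. The Python sketch below uses a hypothetical helper, validate_scale.

```python
def validate_scale(scale: float) -> float:
    """Return scale as a float, or raise if outside the supported range."""
    if not 0.1 <= scale <= 2.0:
        raise ValueError(f"scale must be between 0.1 and 2.0, got {scale}")
    return float(scale)


options = {"scale": validate_scale(1)}
```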

width string

Allows you to specify the width of the generated PDF in cm. It's important to note that setting the width option may affect the layout and appearance of the content in the PDF. If the specified width is smaller than the actual content width, the content will be scaled down to fit within the given width. This may result in a different layout compared to the original web page.

Request Body
{
    ...
    "options": {
        "width": "10cm"
    }
    ...
}

height string

Allows you to specify the height of the generated PDF in cm. It's important to note that setting the height option may affect the layout and appearance of the content in the PDF. If the specified height is smaller than the actual content height, the content will be scaled down to fit within the given height. This may result in a different layout compared to the original web page.

Request Body
{
    ...
    "options": {
        "height": "50cm"
    }
    ...
}

margin_top string

Defines the top margin of the PDF pages. The margin is specified in cm. This option is particularly useful when you want to add some breathing space around the content in the PDF or when you need to accommodate headers, footers, or other elements that require specific positioning.

Request Body
{
    ...
    "options": {
        "margin_top": "5cm"
    }
    ...
}

margin_bottom string

Defines the bottom margin of the PDF pages. The margin is specified in cm. This option is particularly useful when you want to add some breathing space around the content in the PDF or when you need to accommodate headers, footers, or other elements that require specific positioning.

Request Body
{
    ...
    "options": {
        "margin_bottom": "5cm"
    }
    ...
}

margin_left string

Defines the left margin of the PDF pages. The margin is specified in cm. This option is particularly useful when you want to add some breathing space around the content in the PDF or when you need to accommodate headers, footers, or other elements that require specific positioning.

Request Body
{
    ...
    "options": {
        "margin_left": "5cm"
    }
    ...
}

margin_right string

Defines the right margin of the PDF pages. The margin is specified in cm. This option is particularly useful when you want to add some breathing space around the content in the PDF or when you need to accommodate headers, footers, or other elements that require specific positioning.

Request Body
{
    ...
    "options": {
        "margin_right": "5cm"
    }
    ...
}

print_background boolean

Control whether the background graphics of the web page should be included in the generated PDF. By default, this option is set to false, which means that background graphics, such as background colors, images, and gradients, are not included in the PDF. Keep in mind that enabling this option may increase the file size of the generated PDF, especially if the web page contains large background images or complex background graphics. It can also impact the rendering performance and the time required to generate the PDF.

Request Body
{
    ...
    "options": {
        "print_background": false
    }
    ...
}

landscape boolean

Allows you to specify the orientation of the generated PDF pages. By default, the option is set to false, which means that the PDF pages are generated in portrait orientation. In portrait orientation, the height of the page is greater than its width. To generate the PDF pages in landscape orientation, set the option to true.

Request Body
{
    ...
    "options": {
        "landscape": false
    }
    ...
}

format string

Specify the paper size of the generated PDF. By default, the paper size is set to Letter. You can choose from a variety of paper sizes, including A0, A1, A2, A3, A4, A5, A6, Letter, Legal, and Tabloid.

Request Body
{
    ...
    "options": {
        "format": "Letter"
    }
    ...
}
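
The page layout options (format, landscape, and the four margins) are often set together. The Python sketch below combines them in one hypothetical helper, layout_options; applying a single margin value to all four sides is a simplification for illustration, not a feature of the API.

```python
PAPER_FORMATS = {
    "A0", "A1", "A2", "A3", "A4", "A5", "A6",
    "Letter", "Legal", "Tabloid",
}


def layout_options(paper_format="Letter", landscape=False, margin_cm=None) -> dict:
    """Build an options object for paper size, orientation, and margins."""
    if paper_format not in PAPER_FORMATS:
        raise ValueError(f"unsupported format: {paper_format}")
    options = {"format": paper_format, "landscape": landscape}
    if margin_cm is not None:
        # Margins are passed as strings in cm, e.g. "2cm".
        for side in ("top", "bottom", "left", "right"):
            options[f"margin_{side}"] = f"{margin_cm}cm"
    return options


options = layout_options("A4", landscape=True, margin_cm=2)
```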