PageSnap API Documentation
Simplify Integration: Unravel Our API's Potential for Your Projects.
Introduction
Welcome to PageSnap, a SaaS service that converts website links or HTML content into high-quality PDFs through simple HTTP requests. The API lets you customize the PDF output and delivers a download link through an HTTP POST webhook and your dashboard. PDFs are stored securely in our AWS S3 bucket for 2 days and are promptly removed thereafter. The service is designed for efficiency and reliability and fits into your existing workflows, so you can focus on your core activities while we handle your document conversion needs.
Authentication
To authenticate your requests, include your API username and key in the Basic Auth header of each HTTP request. You can manage your API tokens in the dashboard.
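As a minimal sketch of the header described above, the snippet below builds a standard HTTP Basic Authorization header. It assumes the username and API key are joined as username:key, with the key in the password position; the function name and the demo credentials are placeholders, not PageSnap values.

```python
import base64


def basic_auth_header(username: str, api_key: str) -> dict:
    """Build an HTTP Basic Authorization header from a username and API key."""
    # Basic auth base64-encodes "username:password"; the API key is assumed
    # to take the password position here.
    raw = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("demo-user", "demo-key")
print(headers)
```

Most HTTP clients can also build this header for you when given a username and password pair.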
Sandbox
When getting started with PageSnap or working in development mode, it's strongly advised to enable the sandbox parameter by setting it to true. This parameter adds a watermark to the generated PDF without deducting from your conversion credits, allowing you to set up your code and perform multiple tests without worrying about exhausting your credit balance.
Once you've completed your setup and testing, you can conduct a final local test with the sandbox parameter removed or set to false to ensure the PDF renders precisely as intended. Please note that there is a limit of 50 sandbox requests per day per account.
{
"sandbox": true
}
Generate API Key
To interact with the PageSnap API for capturing PDFs, you'll need an API key. PageSnap uses this key to grant access permission and track credit usage. You can generate multiple API keys if needed, to manage different keys for different projects. To generate an API key, create a PageSnap account and generate it via your dashboard.
Basic Example
This is one of the most basic examples you can use to quickly test your API key. It generates the PDF in sandbox mode, and you can retrieve it from your dashboard. If a webhook_url is provided, the result and the PDF download link are also sent via webhook.
{
"sandbox": true,
"contents": {
"urls": [
"https://en.wikipedia.org/wiki/Tech"
]
},
"options": {}
}
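The payload above could be submitted from Python roughly as follows. This is a sketch under stated assumptions: the endpoint URL is a placeholder (this document does not state the real one), the POST method and JSON content type are assumed, and the Authorization value stands in for your own Basic Auth credentials.

```python
import json
import urllib.request

# Placeholder endpoint: the real PageSnap URL is not given in this document.
API_URL = "https://api.example.invalid/pdf"


def build_request(payload: dict, auth_header: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying the payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
        method="POST",
    )


payload = {
    "sandbox": True,
    "contents": {"urls": ["https://en.wikipedia.org/wiki/Tech"]},
    "options": {},
}
request = build_request(payload, "Basic <your-base64-credentials>")
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect while you are still in sandbox mode.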
Response
This is the API response format you will receive after submitting a request. You can use the returned request_id to verify and match the incoming webhook notification once the PDF conversion is completed. Below are the potential response statuses you may receive when interacting with our API, along with the corresponding reasons for each status:
{
"request_id": "5d2509ce-93f0-4e52-aeae-cab134a11c08",
"success": true,
"message": "Request sent to processing queue successfully",
"error": ""
}
Status Code | Reason |
200 | Request successful. |
401 | Missing Authorization Token. |
402 | Insufficient credit. |
403 | Invalid API key or user not authorized. |
429 | Daily sandbox request limit reached. |
500 | Internal Server Error. |
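One way to act on these statuses in client code is sketched below. The reasons follow the table above; the exception class and function name are hypothetical, and treating a body with success set to false as a failure is an assumption.

```python
import json

# Reasons follow the status table in this section.
STATUS_REASONS = {
    200: "Request successful.",
    401: "Missing Authorization Token.",
    402: "Insufficient credit.",
    403: "Invalid API key or user not authorized.",
    429: "Daily sandbox request limit reached.",
    500: "Internal Server Error.",
}


class PageSnapError(Exception):
    """Hypothetical exception type for a request the API did not accept."""


def extract_request_id(status_code: int, body: str) -> str:
    """Return request_id from an accepted request, or raise with the reason."""
    if status_code != 200:
        reason = STATUS_REASONS.get(status_code, "Unexpected status code.")
        raise PageSnapError(f"{status_code}: {reason}")
    data = json.loads(body)
    # Assumption: a 200 response with success == false is still a failure.
    if not data.get("success"):
        raise PageSnapError(data.get("error") or "Request was not accepted.")
    return data["request_id"]
```

Keep the returned request_id so that you can match it against the webhook notification later.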
Help
If you encounter any problems while integrating with our API, please don't hesitate to contact our support team at [email protected]. We're here to assist you in achieving your desired document conversions.
Request Setup
Parameters that allow you to customize and control the behavior of the request.
contents required array
At least one entry is required for either the urls or htmls parameter. When a request with this JSON payload is sent to the API, the service will process each web page specified by the URLs and each HTML snippet independently, creating a distinct PDF document for each content source provided. This allows users to efficiently generate multiple PDFs from different types of content with a single API request. The maximum count of contents per request is 200.
- urls: This array contains a list of URLs pointing to web pages. The API will retrieve the content from each of these web pages and generate a separate PDF for each URL.
- htmls: This array holds raw HTML strings. Each HTML snippet will be rendered and captured into its own separate PDF.
{
...
"contents": {
"urls": [
"https://en.wikipedia.org/wiki/Tech",
"https://en.wikipedia.org/wiki/Web"
],
"htmls": [
"<div>Hello World!</div>",
"<p>Hello World Again!</p>"
]
}
...
}
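A small client-side check can catch an empty or oversized contents object before the request is sent. The helper below is a sketch: the function name is hypothetical, and it assumes the limit of 200 counts URLs and HTML strings together.

```python
MAX_CONTENTS = 200  # maximum count of contents per request, as stated above


def build_contents(urls=None, htmls=None) -> dict:
    """Build the contents object, enforcing the documented constraints."""
    urls = list(urls or [])
    htmls = list(htmls or [])
    total = len(urls) + len(htmls)
    if total == 0:
        raise ValueError("Provide at least one entry in urls or htmls.")
    # Assumption: the limit applies to urls and htmls combined.
    if total > MAX_CONTENTS:
        raise ValueError(f"{total} entries exceed the limit of {MAX_CONTENTS}.")
    contents = {}
    if urls:
        contents["urls"] = urls
    if htmls:
        contents["htmls"] = htmls
    return contents


contents = build_contents(
    urls=["https://en.wikipedia.org/wiki/Tech"],
    htmls=["<div>Hello World!</div>"],
)
```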
webhook_url string
The webhook_url parameter allows you to specify a URL where our service will send an HTTP POST request containing the results of the PDF capture process. The body of that request includes the following information:
- request_id: The ID returned when you submitted the request, which you can use to match the notification to your submission.
- request: The original request payload you sent to our API for capturing the PDFs.
- results: An array of results, one per captured PDF. Each entry includes the source, the pdf download URL, and its expired_at timestamp. Download URLs are valid for 2 hours.
- error: If there were any errors during the PDF capture process, this field will contain an error message. If the process was successful, this field will be null.
If the webhook_url parameter is not provided, the request submission will still work. However, you will not receive any webhook notifications with the download links. In this case, you can manually retrieve the download links for your captured PDFs from our dashboard under the Requests section.
Note: Ensure that the provided webhook_url, if specified, is a valid, publicly accessible URL that can handle incoming POST requests.
{
...
"webhook_url": "https://your-webhook-url.com"
...
}
{
"request_id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
"request": {
...Request Payload...
},
"results": [
{
"source": "https://en.wikipedia.org/wiki/Tech",
"pdf": "https://pagesnap.io/download/5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b/tech.pdf",
"expired_at": "2021-12-31T23:59:59Z",
"error": null
}
]
}
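On the receiving side, a webhook handler might process a payload of this shape as sketched below. The function name is hypothetical, and the code assumes only the fields shown in the sample payload above; how you expose the endpoint (web framework, routing) is left out.

```python
def collect_pdf_links(webhook_body: dict, expected_request_id: str) -> dict:
    """Map each source to its PDF download link, skipping failed results."""
    # Match the notification to the request_id returned at submission time.
    if webhook_body.get("request_id") != expected_request_id:
        raise ValueError("Webhook does not match the submitted request.")
    links = {}
    for result in webhook_body.get("results", []):
        # A null error means the capture for this source succeeded.
        if result.get("error") is None:
            links[result["source"]] = result["pdf"]
    return links


sample = {
    "request_id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
    "results": [
        {
            "source": "https://en.wikipedia.org/wiki/Tech",
            "pdf": "https://pagesnap.io/download/5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b/tech.pdf",
            "expired_at": "2021-12-31T23:59:59Z",
            "error": None,
        }
    ],
}
links = collect_pdf_links(sample, "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b")
```

Because the download links expire, it is usually safest to fetch the PDFs soon after the notification arrives.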
s3_path_url string
The s3_path_url is an optional parameter that lets you store the captured PDFs in your own AWS S3 bucket. Provide the s3_path_url in the format s3://your-bucket-name/path/to/store/pdfs/ and ensure our service has write permission to your bucket. You can copy our sample bucket policy and apply it to your AWS S3 bucket. When s3_path_url is specified, the results array in the webhook payload and our dashboard will contain the S3 path to each PDF in your designated bucket. Please note that the path is not a pre-signed link, because we do not have permission to create pre-signed links for your bucket.
{
...
"s3_path_url": "s3://your-bucket-name/path/to/store/pdfs/",
...
}
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Only allow writes to this bucket with bucket owner full control",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::146666182615:role/pagesnap-role-lambda"
},
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::<Your_bucket_name>/*",
"Condition": {
"StringEquals": {
"s3:x-amz-acl": "bucket-owner-full-control"
}
}
}
]
}
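If you prefer to generate the policy rather than edit it by hand, the sketch below fills the bucket name from an s3_path_url value. The function name is hypothetical; the policy fields are copied from the sample above, and applying the policy to your bucket remains a separate step in AWS.

```python
import json
from urllib.parse import urlparse

# Role ARN copied from the sample bucket policy.
PAGESNAP_ROLE_ARN = "arn:aws:iam::146666182615:role/pagesnap-role-lambda"


def bucket_policy_for(s3_path_url: str) -> dict:
    """Build the sample write-only policy for the bucket named in s3_path_url."""
    parsed = urlparse(s3_path_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Expected a value like s3://your-bucket-name/path/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Only allow writes to this bucket with bucket owner full control",
                "Effect": "Allow",
                "Principal": {"AWS": PAGESNAP_ROLE_ARN},
                "Action": "s3:PutObject",
                # The bucket name is the host part of the s3:// URL.
                "Resource": f"arn:aws:s3:::{parsed.netloc}/*",
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            }
        ],
    }


policy = bucket_policy_for("s3://your-bucket-name/path/to/store/pdfs/")
print(json.dumps(policy, indent=2))
```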
Request Options
Comprehensive Tuning Options for Achieving Your Desired Output.
set_java_script_enabled boolean
Enables or disables JavaScript execution on the page, allowing control over whether JavaScript code runs during the page load and interactions. By default, any JavaScript code present on the web page will be executed during the page load and interaction.
{
...
"options": {
"set_java_script_enabled": false
}
...
}
set_extra_http_headers json object
Set custom HTTP headers for all requests made by the page, enabling modification of the headers sent with each request. HTTP headers are key-value pairs that provide additional information about the request or the response. By default, PageSnap sends some standard headers with each request, such as User-Agent, Accept-Language, etc.
{
...
"options": {
"set_extra_http_headers": {
"User-Agent": "MyCustomUserAgent/1.0",
"Referer": "https://example.com"
}
}
...
}
auth json object
Provides credentials for HTTP authentication requests made by the page, enabling automatic login or access to protected resources that require authentication. Keep in mind that this option only works for HTTP authentication mechanisms supported by the browser, such as Basic and Digest authentication. It does not handle other types of authentication, such as form-based login or OAuth.
{
...
"options": {
"auth": {
"username": "myname",
"password": "abc123"
}
}
...
}
stop_for_status boolean
Stops capturing the page if the website's HTTP response code is not 2XX, so that error responses are not converted into PDFs.
{
...
"options": {
"stop_for_status": true
}
...
}
click_selector string
Fetches an element with the given selector, scrolls it into view if needed, and then clicks in the center of the element.
{
...
"options": {
"click_selector": "#myElement"
}
...
}
add_style string
Appends the given CSS styles to the document before saving it. The value can be a URL to a CSS stylesheet file or a string of CSS rules.
{
...
"options": {
"add_style": "https://example.com/styles.css"
}
...
}
{
...
"options": {
"add_style": "body {background-color: red;}"
}
...
}
add_script string
This will execute the given JavaScript before saving the document. It can be a URL or a string of JS code.
{
...
"options": {
"add_script": "https://example.com/script.js"
}
...
}
{
...
"options": {
"add_script": "document.body.style.backgroundColor = 'lightblue';"
}
...
}
emulate_media_type string
Sets the page's emulated media type to simulate different rendering scenarios. Supported values are screen and print.
{
...
"options": {
"emulate_media_type": "print"
}
...
}
wait_for_selector string
Waits for a specified selector to appear in the page's DOM. It pauses the execution until the element matching the selector is found or until a timeout is reached.
{
...
"options": {
"wait_for_selector": "#myElement"
}
...
}
wait_until string
Allows you to specify when to consider the navigation to be finished. You can provide multiple values separated by a comma. Supported values are:
- load: Waits for the load event to be fired, indicating that the page has fully loaded.
- domcontentloaded: Waits for the DOMContentLoaded event to be fired, indicating that the initial HTML document has been parsed completely.
- networkidle0: Waits until there are no more than 0 network connections for at least 500 ms.
- networkidle2: Waits until there are no more than 2 network connections for at least 500 ms.
{
...
"options": {
"wait_until": "domcontentloaded,load"
}
...
}
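Since wait_until takes a comma-separated string, a small helper can assemble it and reject unsupported values before the request is sent. This is a sketch; the function name is hypothetical and the supported values are those listed above.

```python
SUPPORTED_WAIT_EVENTS = ("load", "domcontentloaded", "networkidle0", "networkidle2")


def wait_until_value(*events: str) -> str:
    """Join the given events into the comma-separated wait_until string."""
    if not events:
        raise ValueError("Provide at least one event.")
    unknown = [event for event in events if event not in SUPPORTED_WAIT_EVENTS]
    if unknown:
        raise ValueError(f"Unsupported wait_until value(s): {', '.join(unknown)}")
    return ",".join(events)


options = {"wait_until": wait_until_value("domcontentloaded", "load")}
```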
wait_for_timeout string
Specify the maximum time (in milliseconds) that the navigation should take before it times out. By default, the timeout is set to 30 seconds (30,000 milliseconds). If the navigation takes longer than the specified timeout, PageSnap will abort the navigation.
{
...
"options": {
"wait_for_timeout": 30000
}
...
}
idle_time_after_loaded integer
Wait for the specified time in milliseconds after each page load or click event. The default is 1000 milliseconds. This is an extra delay to ensure that the page has fully loaded and any animations or dynamic content have settled before capturing the screenshot or generating the PDF. This option does not override other waiting options.
{
...
"options": {
"idle_time_after_loaded": 1000
}
...
}
header_template string
Specify a custom HTML template for the header of each page in the generated PDF. This option works in conjunction with the display_header_footer option. When display_header_footer is set to true, PageSnap will use the provided header_template to render the header on each page of the PDF. The value should be a valid HTML string that defines the content and styling of the header. You can also use HTML classes date, title, url, pageNumber, totalPages to inject printing values. Make sure to provide appropriate margins using the margin_top option to ensure that the header has enough space and doesn't overlap with the main content of the page.
{
...
"options": {
"display_header_footer": true,
"header_template": "<span style='float: right;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></span>",
"margin_top": "5cm"
}
...
}
page_ranges string
Specify the range of pages to include in the generated PDF. By default, when this option is not provided, PageSnap generates a PDF that includes all pages of the web page, which may consume more credits than necessary. You can specify the pages to include by providing a comma-separated list of page numbers or ranges. For example, "1,3-5,8" will include pages 1, 3, 4, 5, and 8 in the generated PDF.
{
...
"options": {
"page_ranges": "1,3-5,8"
}
...
}
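To illustrate how a page_ranges string reads, the sketch below expands it into the individual page numbers. It is a client-side illustration of the syntax described above, not PageSnap's own parser; the function name is hypothetical.

```python
from typing import List


def expand_page_ranges(spec: str) -> List[int]:
    """Expand a string such as "1,3-5,8" into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "3-5" includes both endpoints.
            start, end = (int(piece) for piece in part.split("-", 1))
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(expand_page_ranges("1,3-5,8"))  # prints [1, 3, 4, 5, 8]
```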
scale float
Determines how the web page is scaled when rendering it to a PDF. It affects the size and resolution of the content in the generated PDF. The default value of scale is 1, which means the web page is rendered at its original size. If you set scale to a value greater than 1, the content will be scaled up, resulting in a larger PDF. Conversely, if you set scale to a value less than 1, the content will be scaled down, resulting in a smaller PDF. The supported values are in the range of 0.1 to 2.0.
{
...
"options": {
"scale": 1
}
...
}
width string
Allows you to specify the width of the generated PDF in cm. It's important to note that setting the width option may affect the layout and appearance of the content in the PDF. If the specified width is smaller than the actual content width, the content will be scaled down to fit within the given width. This may result in a different layout compared to the original web page.
{
...
"options": {
"width": "10cm"
}
...
}
height string
Allows you to specify the height of the generated PDF in cm. It's important to note that setting the height option may affect the layout and appearance of the content in the PDF. If the specified height is smaller than the actual content height, the content will be scaled down to fit within the given height. This may result in a different layout compared to the original web page.
{
...
"options": {
"height": "50cm"
}
...
}
margin_top string
Defines the top margin of the PDF pages, specified in cm. This option is particularly useful when you want to add some breathing space around the content in the PDF or when you need to accommodate headers, footers, or other elements that require specific positioning.
{
...
"options": {
"margin_top": "5cm"
}
...
}
margin_bottom string
Defines the bottom margin of the PDF pages, specified in cm. This option is particularly useful when you want to add some breathing space around the content in the PDF or when you need to accommodate headers, footers, or other elements that require specific positioning.
{
...
"options": {
"margin_bottom": "5cm"
}
...
}
margin_left string
Defines the left margin of the PDF pages, specified in cm. This option is particularly useful when you want to add some breathing space around the content in the PDF or when you need to accommodate headers, footers, or other elements that require specific positioning.
{
...
"options": {
"margin_left": "5cm"
}
...
}
margin_right string
Defines the right margin of the PDF pages, specified in cm. This option is particularly useful when you want to add some breathing space around the content in the PDF or when you need to accommodate headers, footers, or other elements that require specific positioning.
{
...
"options": {
"margin_right": "5cm"
}
...
}
print_background boolean
Control whether the background graphics of the web page should be included in the generated PDF. By default, this option is set to false, which means that background graphics, such as background colors, images, and gradients, are not included in the PDF. Keep in mind that enabling this option may increase the file size of the generated PDF, especially if the web page contains large background images or complex background graphics. It can also impact the rendering performance and the time required to generate the PDF.
{
...
"options": {
"print_background": false
}
...
}
landscape boolean
Allows you to specify the orientation of the generated PDF pages. By default, the option is set to false, which means that the PDF pages are generated in portrait orientation. In portrait orientation, the height of the page is greater than its width. To generate the PDF pages in landscape orientation, set the option to true.
{
...
"options": {
"landscape": false
}
...
}
format string
Specify the paper size of the generated PDF. By default, the paper size is set to Letter. You can choose from a variety of paper sizes, including A0, A1, A2, A3, A4, A5, A6, Letter, Legal, and Tabloid.
{
...
"options": {
"format": "Letter"
}
...
}
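Several options in this section have a fixed set or range of supported values. The sketch below checks a few of them (format, scale, emulate_media_type) on the client before a request is sent. The function name is hypothetical, and treating format names as case-sensitive is an assumption.

```python
SUPPORTED_FORMATS = {
    "A0", "A1", "A2", "A3", "A4", "A5", "A6", "Letter", "Legal", "Tabloid",
}
SUPPORTED_MEDIA_TYPES = {"screen", "print"}


def check_options(options: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    paper = options.get("format")
    # Assumption: format names must match the documented spelling exactly.
    if paper is not None and paper not in SUPPORTED_FORMATS:
        problems.append(f"format: unsupported paper size {paper!r}")
    scale = options.get("scale")
    if scale is not None and not 0.1 <= scale <= 2.0:
        problems.append(f"scale: {scale} is outside the range 0.1 to 2.0")
    media = options.get("emulate_media_type")
    if media is not None and media not in SUPPORTED_MEDIA_TYPES:
        problems.append(f"emulate_media_type: unsupported value {media!r}")
    return problems


print(check_options({"format": "Letter", "scale": 1, "landscape": False}))
```

Options that are absent are skipped, so the check can run on partial option sets.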