Take a screenshot of any website with Puppeteer and Python

Jan 18, 2024

Puppeteer is a powerful Node.js library that provides a high-level API to control headless Chrome or Chromium browsers. It allows you to automate web interactions, perform web scraping, and capture screenshots of web pages. In this article, we'll explore how to take screenshots of websites using Puppeteer and Python.

Key Points

Puppeteer is a Node.js library for controlling headless Chrome/Chromium browsers.
Screenshots can be captured using the screenshot() method of page or element objects in Puppeteer.
You can set the viewport size, specify the screenshot format (PNG, JPEG, or WebP), and control whether to capture the full page or a specific element.
When scraping dynamic web pages, it's important to wait for the page to fully load before capturing a screenshot.

Taking Screenshots with Puppeteer

To take a screenshot with Puppeteer, you first need to launch a browser instance and navigate to the desired web page. Here's an example of how to capture a screenshot using Puppeteer:

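const puppeteer = require('puppeteer');

async function run() {
    // Launch a headless browser and open a new tab.
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto("https://httpbin.dev/html");

    // Save a PNG of the full scrollable page, not just the visible viewport.
    await page.screenshot({
        "type": "png",
        "path": "screenshot.png",
        "fullPage": true,
    });

    // Wait for the browser to shut down before the function returns.
    await browser.close();
}

run();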

In this code snippet:

1. We launch a new browser instance using puppeteer.launch().
2. We create a new page using browser.newPage() and navigate to the desired URL using page.goto().
3. We capture a screenshot of the entire page using page.screenshot(), specifying the screenshot format as PNG, the file path to save the screenshot, and setting fullPage to true to capture the full page content.
4. Finally, we close the browser instance.
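The key points also mention setting the viewport size and choosing a format other than PNG. A minimal sketch of both follows. The helper name captureJpeg, the 1280x720 default size, and the quality value of 80 are illustrative assumptions, not part of the original example; page is assumed to be a Puppeteer page that has already navigated to the target URL.

```javascript
// Hypothetical helper: capture the visible viewport as a JPEG.
// `page` is assumed to be a Puppeteer Page that has already loaded a URL.
async function captureJpeg(page, path, width = 1280, height = 720) {
  // The viewport sets the size of the rendered window before capture.
  await page.setViewport({ width, height });

  // `quality` (0-100) applies to JPEG and WebP screenshots, not PNG.
  // The value 80 is an arbitrary choice for this sketch.
  return page.screenshot({ path, type: 'jpeg', quality: 80, fullPage: false });
}

// Example call from inside an async function, after page.goto():
//   await captureJpeg(page, 'screenshot.jpg');
```

Lowering the quality value trades image fidelity for a smaller file, which can matter when capturing many pages.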

Capturing Specific Elements

In addition to capturing the entire page, Puppeteer allows you to take screenshots of specific elements on the page. Here's an example:

// page.$() resolves to the first matching element, or null if nothing matches.
const element = await page.$("p");
await element.screenshot({"path": "just-the-paragraph.png", "type": "png"});

In this code, we use page.$() to select a specific element (in this case, a paragraph <p>) and then call element.screenshot() to capture a screenshot of just that element.
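Because page.$() resolves to null when the selector matches nothing, calling screenshot() on the result can throw. One way to guard against that is sketched below; the helper name screenshotElement and its boolean return value are assumptions made for illustration.

```javascript
// Hypothetical helper: screenshot the first element matching `selector`.
// Returns true if a screenshot was taken, false if no element matched.
async function screenshotElement(page, selector, path) {
  const element = await page.$(selector);
  if (element === null) {
    // Nothing matched, so there is nothing to capture.
    return false;
  }
  await element.screenshot({ path, type: 'png' });
  return true;
}

// Example call from inside an async function, after page.goto():
//   const ok = await screenshotElement(page, 'p', 'just-the-paragraph.png');
```

Returning a boolean lets the caller decide whether a missing element is an error or simply something to skip.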

Considerations for Dynamic Web Pages

When scraping dynamic web pages that load content asynchronously, make sure the content you need has rendered before capturing a screenshot. Puppeteer's page.waitForSelector() waits for a specific element to appear, and page.waitForNavigation() waits for a navigation (for example, one triggered by a click) to finish.
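As a rough sketch of that waiting step, the helper below combines the waitUntil option of page.goto() with page.waitForSelector() before taking the screenshot. The helper name screenshotWhenReady, the 10-second timeout, and the choice of 'networkidle0' are assumptions for illustration; the right wait condition depends on the page being captured.

```javascript
// Hypothetical helper: navigate, wait for content to render, then capture.
async function screenshotWhenReady(page, url, selector, path) {
  // 'networkidle0' treats navigation as finished once there have been
  // no network connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Wait until the element exists and is visible; this rejects if the
  // (assumed) 10-second timeout expires first.
  await page.waitForSelector(selector, { visible: true, timeout: 10000 });

  await page.screenshot({ path, fullPage: true });
}

// Example call from inside an async function:
//   await screenshotWhenReady(page, 'https://httpbin.dev/html', 'p', 'screenshot.png');
```

Waiting on a selector that only appears once the dynamic content has loaded is usually more reliable than a fixed delay.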

Using Puppeteer with Python

While Puppeteer is a Node.js library, you can get the same workflow in Python with pyppeteer, an unofficial Python port of Puppeteer. Its API closely mirrors the JavaScript version, with methods called through Python's asyncio.

Here's an example of taking a screenshot using pyppeteer in Python:

import asyncio

from pyppeteer import launch


async def main():
    # Launch a headless browser and open a new tab.
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    # Save a screenshot of the current viewport.
    await page.screenshot({'path': 'screenshot.png'})
    await browser.close()


asyncio.run(main())

Summary

Puppeteer is a powerful tool for automating web interactions and capturing screenshots of websites. With its simple API, you can easily take screenshots of entire pages or specific elements. When working with dynamic web pages, make sure to wait for the page to fully load before capturing a screenshot. While Puppeteer is a Node.js library, you can also use it with Python by leveraging the pyppeteer library.

By following the code examples and considerations mentioned in this article, you'll be able to capture screenshots of any website using Puppeteer and Python.
