Downloading a file using Puppeteer can be tricky. On some systems, there can be issues with the usual file saving process that prevent you from doing it the easy way. However, there are different techniques that work... most of the time ;-)

These techniques are only necessary when we don't have a direct link to the file, which is usually the case when the file is generated by a more complicated data export.

1. Setting up a download path and reading from the disk

Let's start with the easier option.

This method tells the browser which folder to save files into when a download is triggered from Puppeteer. Once the download has finished, we use the file system to read the file from the actor's disk into memory, or save it into the Key-value store for later use or download.

await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: './my-downloads'})

We use the mysterious ._client API, which gives us access to all the functions of the underlying Chrome DevTools Protocol. Basically, it extends Puppeteer's functionality.
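Because `_client` is an underscore-prefixed internal property, it may change between Puppeteer versions. A minimal sketch of sending the same command through a public CDP session follows, assuming a Puppeteer version that provides `page.target().createCDPSession()`; the helper name `allowDownloads` is made up for illustration.

```javascript
// Hypothetical helper: `allowDownloads` is not part of Puppeteer.
// Assumes `page` is a Puppeteer Page exposing page.target().createCDPSession().
async function allowDownloads(page, downloadPath) {
    // Open a Chrome DevTools Protocol session through the public API.
    const client = await page.target().createCDPSession();
    // Send the Page.setDownloadBehavior command without touching page._client.
    await client.send('Page.setDownloadBehavior', {
        behavior: 'allow',
        downloadPath,
    });
    return client;
}
```

A call such as `await allowDownloads(page, './my-downloads')` would then take the place of the `_client` line.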

Then we can download the file by clicking on it.

await page.click('.export-button');

Let's wait for one minute. In a real use case, you would check the state of the file in the file system instead of waiting for a fixed time.

await new Promise(resolve => setTimeout(resolve, 60000));
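Checking the file system instead of waiting a fixed time could look roughly like the sketch below. It relies on Chrome giving in-progress downloads a `.crdownload` suffix; the helper names `findFinishedDownload` and `waitForDownload`, the polling interval, and the default timeout are illustrative choices, not part of Puppeteer.

```javascript
const fs = require('fs');

// Hypothetical helper: returns the first file name that is not a partial
// Chrome download (those end with ".crdownload"), or undefined if none.
function findFinishedDownload(fileNames) {
    return fileNames.find((name) => !name.endsWith('.crdownload'));
}

// Hypothetical helper: polls `dir` until a finished file appears or the
// timeout (in milliseconds) runs out.
async function waitForDownload(dir, timeoutMs = 60000, intervalMs = 500) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        // The folder may not exist until the browser starts the download.
        const names = fs.existsSync(dir) ? fs.readdirSync(dir) : [];
        const finished = findFinishedDownload(names);
        if (finished) return finished;
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    throw new Error(`No finished download in ${dir} after ${timeoutMs} ms`);
}
```

With this in place, `const fileName = await waitForDownload('./my-downloads');` returns as soon as the file is complete rather than always taking a full minute.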

To extract the file from the file system into memory, we have to first find its name, and then we can read it.

const fs = require('fs');
const fileNames = fs.readdirSync('./my-downloads');
// We expect only one file in the folder, so let's pick the first
const fileData = fs.readFileSync(`./my-downloads/${fileNames[0]}`);

// Now we can use the data or save it into Key-value store.
// Let's assume it is a PDF file
await Apify.setValue('MY-PDF', fileData, { contentType: 'application/pdf'});

2. Intercepting and replicating a Puppeteer file download request


For this second option, we can trigger the file download, intercept the request going out, and then replicate it to get the actual data.

Setting up file interception

First, we need to enable request interception. This is done using the following line of code:

await page.setRequestInterception(true);

More info on this can be found in the Puppeteer docs.


Triggering file export

Next, we need to trigger the actual file export. We might need to fill in some form, select an exported file type, etc. In the end, it will look something like this:

page.click('.export-button');

We don't need to await this promise since we'll be waiting for the result of this action anyway (the triggered request).

Intercepting file request

The crucial part is intercepting the request that would result in downloading the file. Since the interception is already enabled, we just need to wait for the request to be sent.

const xRequest = await new Promise(resolve => {
    page.on('request', interceptedRequest => {
        interceptedRequest.abort();     // abort the request so the browser doesn't download the file itself
        resolve(interceptedRequest);
    });
});
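The snippet above resolves with whichever request fires first. If the page sends other requests as well, a variant that matches only the export request and lets everything else through may be safer. The sketch below assumes request interception is already enabled; `waitForMatchingRequest` and the `/export` path in the usage note are hypothetical.

```javascript
// Hypothetical helper: resolves with the first intercepted request for which
// `predicate` returns true. Assumes page.setRequestInterception(true) was called.
function waitForMatchingRequest(page, predicate) {
    return new Promise((resolve) => {
        const onRequest = (interceptedRequest) => {
            if (!predicate(interceptedRequest)) {
                // Not the export request: let it proceed untouched.
                interceptedRequest.continue();
                return;
            }
            // Stop listening, then abort so the browser does not download the file itself.
            page.off('request', onRequest);
            interceptedRequest.abort();
            resolve(interceptedRequest);
        };
        page.on('request', onRequest);
    });
}
```

A call might look like `waitForMatchingRequest(page, (req) => req.url().includes('/export'))`. Creating this promise before clicking the export button and awaiting it afterwards avoids missing a request that fires quickly. Requests that arrive after the match stay paused while interception remains enabled.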

Replicating request

The last thing is to convert the intercepted Puppeteer request into a request-promise options object.

First, we need to require the request-promise package.

const request = require('request-promise');

The intercepted request does not include cookies, so we need to add them afterwards.

const options = {
    encoding: null,                 // return the response body as a Buffer
    method: xRequest.method(),
    uri: xRequest.url(),
    body: xRequest.postData(),
    headers: xRequest.headers()
};

/* add the cookies */
const cookies = await page.cookies();
options.headers.Cookie = cookies.map(ck => ck.name + '=' + ck.value).join('; ');

/* resend the request */
const response = await request(options);

Now the response contains the binary data of the downloaded file. It can be saved to a hard drive, uploaded somewhere, or submitted with another form, as described in the article "Submitting a form with a file attachment using Puppeteer".
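Saving the data to disk can be sketched as follows, assuming `response` is a Node.js `Buffer` (which is what `encoding: null` is meant to produce) and that a suitable file name is known; `saveDownload` and the paths used here are illustrative.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: writes binary data into `dir` and returns the full path.
function saveDownload(dir, fileName, data) {
    // Create the target folder if it does not exist yet.
    fs.mkdirSync(dir, { recursive: true });
    const filePath = path.join(dir, fileName);
    // A Buffer is written byte for byte; no text encoding is applied.
    fs.writeFileSync(filePath, data);
    return filePath;
}
```

For example, `saveDownload('./my-downloads', 'export.pdf', response)` would write the file and return its path.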
