Downloading a file using Puppeteer can be tricky. On some systems, the usual file-saving process runs into issues that prevent you from doing it the easiest way. However, there is another technique that works, most of the time ;-)

This technique is only necessary when we don't have a direct file link, which is usually the case when the file being downloaded is based on a more complicated data export.

There are actually two techniques to do this, so if one does not work for you, you can try the other :)

1. Setting up a download path and then reading from disk

This is the simpler way to do it so let's start with it.

This method tells the browser which folder to download the file into once we click the export button. It then uses the file system to read the file from the actor's disk into memory, or to save it into the Key-value store for later use or download.

await page._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: './my-downloads'})

We use the mysterious ._client API, which gives us access to all functions of the underlying Chrome DevTools Protocol. Basically, it extends Puppeteer's functionality. Since it is an internal, underscore-prefixed property, it may change between Puppeteer versions.
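As a minimal sketch, the download folder can be created up front and passed as an absolute path, so there is no ambiguity about what a relative path is resolved against. The helper name `prepareDownloadDir` is hypothetical; it is not part of Puppeteer or the Apify SDK.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: makes sure the folder exists and returns its absolute path.
function prepareDownloadDir(dir) {
    const absolutePath = path.resolve(dir);
    // `recursive: true` creates missing parent folders and does nothing if the folder already exists.
    fs.mkdirSync(absolutePath, { recursive: true });
    return absolutePath;
}

// Usage inside an async function that already has a Puppeteer `page`:
// const downloadPath = prepareDownloadDir('./my-downloads');
// await page._client.send('Page.setDownloadBehavior', { behavior: 'allow', downloadPath });
```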

Then we can download the file by clicking on it.

await page.click('.export-button');

Let's wait for one minute. In a real use case, you would instead check the state of the file in the file system.

await page.waitFor(60000);
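Checking the state of the file could look roughly like the sketch below. It assumes the download folder is empty before the export starts and relies on Chrome keeping unfinished downloads under a temporary `.crdownload` name. The helper names `findFinishedDownload` and `waitForDownload` are hypothetical.

```javascript
const fs = require('fs');

// Returns the name of the first finished file in `dir`, or null if there is none yet.
// Chrome stores an unfinished download under a temporary `.crdownload` name.
function findFinishedDownload(dir) {
    if (!fs.existsSync(dir)) return null;
    const finished = fs.readdirSync(dir).filter((name) => !name.endsWith('.crdownload'));
    return finished.length > 0 ? finished[0] : null;
}

// Hypothetical helper: polls the folder until a finished file shows up or the timeout passes.
async function waitForDownload(dir, timeoutMs = 60000, intervalMs = 500) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        const fileName = findFinishedDownload(dir);
        if (fileName) return fileName;
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    throw new Error(`No finished download in ${dir} after ${timeoutMs} ms`);
}

// Usage inside an async function, instead of the fixed one-minute wait:
// const fileName = await waitForDownload('./my-downloads');
```

Polling returns as soon as the file is complete, so short exports do not have to sit through the full minute.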

And extract the file from the file system into memory. We have to first find its name and then we can read it.

const fs = require('fs');
const fileNames = fs.readdirSync('./my-downloads');
// There won't be more files so let's pick the first
const fileData = fs.readFileSync(`./my-downloads/${fileNames[0]}`);

// Now we can use the data or save it into Key-value store.
// Let's assume it is a PDF file
await Apify.setValue('MY-PDF', fileData, { contentType: 'application/pdf' });

2. Intercepting and replicating file download request


In essence, we trigger the file download, intercept the outgoing request, and then replicate it ourselves to get the actual data.

Setting up file interception

First we need to enable request interception. This is done using the following line of code:

await page.setRequestInterception(true);

More info on this can be found in the Puppeteer docs.


Triggering file export

Next, we need to trigger the actual file export. We might need to fill in some form, select exported file type, etc., but in the end, there will be something like this:

page.click('.export-button');

We don't need to await this promise since we'll be waiting for the result of this action anyway (the triggered request).

Intercepting file request

The crucial part is intercepting the request that would result in downloading the file. Since the interception is already enabled, we just need to wait for the request to be sent.

const xRequest = await new Promise(resolve => {
    page.on('request', interceptedRequest => {
        interceptedRequest.abort();     // abort the request so the browser does not download the file itself
        resolve(interceptedRequest);
    });
});
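The listener above aborts every request it sees. If the page fires other requests around the same time, a variation that only stops the export request and lets everything else through might look like this sketch. The helper name `waitForExportRequest` and the `/export` URL fragment in the usage comment are assumptions made for illustration.

```javascript
// Hypothetical helper: resolves with the first intercepted request accepted by `isExportRequest`.
// `page` is expected to be a Puppeteer page with request interception already enabled.
function waitForExportRequest(page, isExportRequest) {
    return new Promise((resolve) => {
        // The listener stays attached on purpose: with interception enabled,
        // every request has to be continued or aborted by some handler.
        page.on('request', (interceptedRequest) => {
            if (!isExportRequest(interceptedRequest)) {
                // Let unrelated requests (images, analytics, ...) through untouched.
                interceptedRequest.continue();
                return;
            }
            // Abort the export request so the browser does not download the file itself.
            interceptedRequest.abort();
            resolve(interceptedRequest);
        });
    });
}

// Usage inside an async function, right after triggering the export:
// const xRequest = await waitForExportRequest(page, (req) => req.url().includes('/export'));
```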

Replicating request

The last step is to convert the intercepted Puppeteer request into a request-promise options object.

First, we need to require the request-promise package.

const request = require('request-promise');

Since the intercepted request does not include cookies, we need to add them afterwards.

const options = {
    encoding: null,                 // return the body as a Buffer instead of a string
    method: xRequest.method(),
    uri: xRequest.url(),
    body: xRequest.postData(),
    headers: xRequest.headers()
};

/* add the cookies */
const cookies = await page.cookies();
options.headers.Cookie = cookies.map(ck => ck.name + '=' + ck.value).join('; ');

/* resend the request */
const response = await request(options);
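The conversion above can also be wrapped in a small function, which makes it easier to reuse and to check without a browser. This is only a sketch: `buildRequestOptions` is a hypothetical name, and it assumes the intercepted request exposes Puppeteer's public `method()`, `url()`, `postData()` and `headers()` accessors.

```javascript
// Hypothetical helper: turns an intercepted Puppeteer request and the page cookies
// into an options object in the shape used above for request-promise.
function buildRequestOptions(interceptedRequest, cookies) {
    return {
        encoding: null, // ask for a Buffer instead of a decoded string
        method: interceptedRequest.method(),
        uri: interceptedRequest.url(),
        body: interceptedRequest.postData(),
        // Copy the headers so the intercepted request's own object is not modified.
        headers: {
            ...interceptedRequest.headers(),
            Cookie: cookies.map((ck) => `${ck.name}=${ck.value}`).join('; '),
        },
    };
}

// Usage inside an async function:
// const options = buildRequestOptions(xRequest, await page.cookies());
// const response = await request(options);
```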

Now the response contains the binary data of the downloaded file. It can be saved to a hard drive, uploaded somewhere, or submitted with another form, as described in the article "Submitting a form with file attachment using Puppeteer".
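For the first of these options, saving to disk, a minimal sketch could look as follows. The helper name `saveDownload` and the file name in the usage comment are assumptions made for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: writes the downloaded bytes to `dir/fileName` and returns the full path.
function saveDownload(data, dir, fileName) {
    fs.mkdirSync(dir, { recursive: true });
    // `path.basename` drops any directory part, so the file cannot land outside `dir`.
    const filePath = path.join(dir, path.basename(fileName));
    fs.writeFileSync(filePath, data);
    return filePath;
}

// Usage with the `response` Buffer from the replicated request:
// const savedTo = saveDownload(response, './my-downloads', 'export.pdf');
```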