The most common integration of Apify with your system is quite simple: you run an actor or task, wait for it to finish, and then collect the data. With all the features Apify provides, new users may not be sure of the easiest standard way to implement this. So let's dive in and show that it is actually pretty simple.
Don't forget to check the full API documentation, which includes examples in different languages and a live API console. I also recommend testing the API with a nice desktop client like Postman.
We will go through the three major steps chronologically:
- Run actor/task
- Wait for it to finish
- Collect the data into your system
1. Run actor/task
The API endpoints and their usage are basically the same for actors and tasks. If you are still not sure of the difference between an actor and a task, read about it in the tasks docs. In short, tasks are just pre-saved inputs for actors, nothing more.
To call (that's how we say "to run") an actor/task, you will need a few things:
- The name or ID of the actor/task. The name is in the format username~actorName or username~taskName.
- Your API token (make sure it doesn't leak anywhere!).
- Optionally, an input or other settings if you want to change the default values (like memory, build, etc.)
The template URL for a POST request to run the actor looks like this:
https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs?token=YOUR_TOKEN
For tasks, we just switch the path from acts to actor-tasks:
https://api.apify.com/v2/actor-tasks/TASK_NAME_OR_ID/runs?token=YOUR_TOKEN
If we send a correct POST request to this endpoint, the actor/task will start just as if we had pressed the Run button in the web app.
Additional settings
We can also add any settings (these will override the default settings) as additional query parameters. So if you want to change how much memory to allocate and which build to run, simply add these as parameters separated with &.
https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs?token=YOUR_TOKEN&memory=8192&build=beta
This works almost identically for actors and tasks. However, for a task it makes no sense to provide a build, since a task is already tied to one specific actor build.
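Assembling these run URLs can be sketched in a few lines of Python using only the standard library. The helper name `build_run_url` is our own, not part of any Apify SDK:

```python
import urllib.parse

API_BASE = "https://api.apify.com/v2"

def build_run_url(actor_or_task: str, token: str, is_task: bool = False, **overrides) -> str:
    """Build the URL that starts an actor or task run.

    Any extra keyword arguments (e.g. memory=8192, build="beta")
    become query parameters that override the default run settings.
    """
    path = "actor-tasks" if is_task else "acts"
    query = urllib.parse.urlencode({"token": token, **overrides})
    return f"{API_BASE}/{path}/{actor_or_task}/runs?{query}"

print(build_run_url("apify~web-scraper", "YOUR_TOKEN", memory=8192, build="beta"))
```

Sending a POST request to the printed URL would start the run with 8192 MB of memory on the beta build.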
Input JSON
Most actors wouldn't be of much use if you could not pass any input to change their behavior. And even though each task already has an input, it is handy to be able to overwrite it in the API call.
The input of an actor or task can be arbitrary JSON, so its structure depends only on the specific actor. This input JSON should be sent as the body of the POST request.
If you want to run one of the major actors from Apify Store, you usually don't need to provide all possible fields in the input. Good actors have reasonable defaults for most of them.
Let's try to run the most popular actor - generic Web Scraper.
The full input with all possible fields is pretty long and ugly so we won't show it here. As it has default values for most of its fields, we can provide just a simple JSON input.
We will send a POST request to
https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN
and add the JSON as a body.
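In code, the same request can be prepared with the Python standard library. This is only a sketch: `build_run_request` is our own helper, and the input below is a hypothetical minimal one, not Web Scraper's full schema:

```python
import json
import urllib.request

def build_run_request(run_url: str, run_input: dict) -> urllib.request.Request:
    """Prepare the POST request that starts a run with a JSON input body."""
    return urllib.request.Request(
        run_url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# A hypothetical minimal input; the real field names depend on the actor's input schema.
run_input = {"startUrls": [{"url": "https://example.com"}]}
req = build_run_request(
    "https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN",
    run_input,
)
# urllib.request.urlopen(req) would send the request and return the run info JSON.
```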
This is how it can look in Postman.

If we press Send, the API will immediately return some info about the run. The status will be either READY (which means that it is waiting to be allocated on a server) or RUNNING (99% of cases).

We will use this run info JSON later to retrieve the data. You can also get this info about the run with another call to the GET RUN endpoint.
2. Wait for finish
There may be cases where we simply need to run the actor and walk away. But in any kind of integration, we are usually interested in its output. We have three basic options for how to wait for the actor/task to finish:
- Synchronous call
- Webhooks
- Polling
Synchronous call
For simple and short actor runs, the synchronous call is the easiest one to implement. You can make the POST request wait by adding a waitForFinish parameter with a value from 0 to 300, which is a time in seconds (the maximum wait time is 5 minutes). The example URL can be extended like this:
https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN&waitForFinish=300
Again, the final response will be the run info object, but now its status should be SUCCEEDED or FAILED. If the run exceeds the waitForFinish duration, the status will still be RUNNING.
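This can be sketched with two small Python helpers. `with_wait` and `is_finished` are our own names, and we only check the two terminal statuses discussed here:

```python
def with_wait(run_url: str, seconds: int = 300) -> str:
    """Append the waitForFinish parameter, clamped to the 0-300 s range."""
    seconds = min(max(seconds, 0), 300)
    separator = "&" if "?" in run_url else "?"
    return f"{run_url}{separator}waitForFinish={seconds}"

def is_finished(run_info: dict) -> bool:
    # A RUNNING status after the wait means the run outlived the waitForFinish window.
    return run_info.get("status") in ("SUCCEEDED", "FAILED")
```

If `is_finished` returns False after the synchronous call, you can fall back to webhooks or polling, described below.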
Run-sync endpoint
There is also one special case with a limited use case. The Apify API provides a special run-sync endpoint for actors and tasks that waits just as in the previous case. The advantage over the waitForFinish parameter is that the response contains the data right away along with the run info JSON, which saves you one more call. The disadvantage is that this only works if the data is stored in the key-value store of the run. Most of the time you store the data in a dataset, where this endpoint doesn't help.
Webhooks
If you have a server, webhooks are the most elegant and flexible solution. You can set up a webhook for any actor or task, and that webhook will send a POST request to your server after a given event happens. Usually this event is a successfully finished run, but you can also set a different webhook for failed runs, etc.

The webhook will send you a pretty complicated JSON, but usually you are only interested in the resource object, which is basically the run info JSON from the previous sections. For our use case you can leave the payload template as is, since it contains what we need.
Once you receive this request from the webhook, you know the event happened and you can ask for the complete data. Don't forget to respond to the webhook with a 200 status. Otherwise, it will ping you again.
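A minimal webhook receiver might look like the sketch below, using only the Python standard library. The handler class and the `extract_run_info` helper are our own; the only assumption about the payload is the `resource` wrapper described above:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def extract_run_info(webhook_payload: dict) -> dict:
    """Pull out the resource object, i.e. the run info JSON."""
    return webhook_payload["resource"]

class ApifyWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        run_info = extract_run_info(payload)
        # ...start collecting data, e.g. using run_info["defaultDatasetId"]...
        # Respond with 200 so the webhook is not retried.
        self.send_response(200)
        self.end_headers()

# HTTPServer(("", 8080), ApifyWebhookHandler).serve_forever() would run the receiver.
```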
Polling
However, there are cases where you don't have a server and the run is too long for a synchronous call. In those cases, periodic polling of the run status is the solution.
You run the actor with the usual call shown at the beginning of this article. That will start the actor and give you back the run info JSON, from which you need to extract the id field: the ID of the actor run you just started. Then you set up an interval (let's say every 5 seconds) and on every tick call the GET RUN endpoint to retrieve the status of the run, replacing RUN_ID with the id in the following URL:
https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs/RUN_ID
Once it returns a status of SUCCEEDED or FAILED, you know it has finished, and you can cancel the interval and ask for the data.
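The polling loop can be sketched as follows. `fetch_run` stands in for whatever HTTP call you make against the GET RUN endpoint, which keeps the loop itself independent of any HTTP library:

```python
import time

def poll_run(fetch_run, interval_seconds: float = 5, max_polls: int = 720):
    """Poll until the run reaches SUCCEEDED or FAILED.

    fetch_run is any zero-argument callable returning the run info JSON,
    e.g. a GET to https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs/RUN_ID.
    """
    for _ in range(max_polls):
        run_info = fetch_run()
        if run_info["status"] in ("SUCCEEDED", "FAILED"):
            return run_info
        time.sleep(interval_seconds)
    raise TimeoutError("run did not finish within the polling budget")
```

The `max_polls` cap is our own addition so a run that never finishes cannot poll forever.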
3. Collect the data
Unless you have used the special run-sync endpoint mentioned above, you will have to make one additional request to the API to retrieve the data. The run info JSON also contains the IDs of the default dataset and key-value store that are allocated separately for each run. This is usually everything you need. The fields are called defaultDatasetId and defaultKeyValueStoreId.
Collecting dataset
If you are scraping products or basically any list of items with similar fields, the dataset is the storage of choice. Don't forget that dataset items are immutable: you can only push to the dataset, not change its content.
Retrieving the data is simple: send a GET request to the GET ITEMS endpoint and pass the defaultDatasetId in the URL. For a GET request to the default dataset, no token is needed.
https://api.apify.com/v2/datasets/DATASET_ID/items
By default, it will return the data in JSON format with some metadata. The actual data are in the items array.
There are plenty of additional parameters that you can use. Learning about them is not the focus of this article, so check the docs. We will only mention that you can pass a format parameter that transforms the response into any popular format, like CSV, XML, Excel, RSS, etc. Also, the items are paginated, which means you can ask for only a subset of the data; this is specified with the limit and offset parameters. There is an overall limit of 250,000 items that the endpoint can return per request, so to retrieve more, you need to send more requests, incrementing the offset.
https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv&offset=250000
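Paginating through a large dataset can be sketched like this. `fetch_page` stands in for a GET to the items endpoint with the given offset and limit, so the pagination logic is testable on its own:

```python
def fetch_all_items(fetch_page, page_size: int = 250_000) -> list:
    """Collect every dataset item by incrementing the offset.

    fetch_page(offset, limit) should GET
    https://api.apify.com/v2/datasets/DATASET_ID/items?offset=...&limit=...
    and return the list of items on that page.
    """
    items, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:  # a short page means we reached the end
            return items
        offset += page_size
```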
Collecting files from key value store
The key-value store is mainly useful if you have a single output or any kind of files that cannot be stringified, like images, PDFs, etc. When you want to retrieve anything from a key-value store, the defaultKeyValueStoreId is not enough. You also need to know the name of the record you want to retrieve.
If you have a single output JSON, the convention is to return it as a record named OUTPUT in the default key-value store. To retrieve the content of the record, call the GET RECORD endpoint. Again, no token is needed for simple GET requests.
https://api.apify.com/v2/key-value-stores/STORE_ID/records/RECORD_KEY
If you don't know the keys (names) of the records in advance, you can retrieve just the keys with the LIST KEYS endpoint. Just keep in mind that you can get at most 1,000 keys per request, so you will need to paginate over the keys using the exclusiveStartKey parameter if you have more than 1,000 keys. After each call, you take the last record key and provide it as the exclusiveStartKey parameter of the next call, until you get no keys back.
https://api.apify.com/v2/key-value-stores/STORE_ID/keys?exclusiveStartKey=myLastRecordKey
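The key pagination can be sketched the same way. `fetch_keys` stands in for a GET to the LIST KEYS endpoint and is assumed to return a flat list of key names:

```python
def list_all_keys(fetch_keys) -> list:
    """Collect every record key using exclusiveStartKey pagination.

    fetch_keys(exclusive_start_key) should GET the LIST KEYS endpoint
    (with exclusiveStartKey omitted on the first call, i.e. None) and
    return the key names that follow it, up to 1,000 per call.
    """
    keys, last_key = [], None
    while True:
        batch = fetch_keys(last_key)
        if not batch:  # zero keys back means we are done
            return keys
        keys.extend(batch)
        last_key = batch[-1]
```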
Summary
We have reviewed the basic integration process with all of its main options. Of course, there are plenty of parameters and functionalities that you can use to make your integration smoother. Check our help section for more knowledge content.