TL;DR

1 GB of memory × 1 hour = 1 compute unit

Introduction

Serverless and scraping platforms around the world use different metrics to measure your usage: the number of pages opened, the number of concurrent connections, data traffic, and so on. Although these metrics are easy to understand, they lack flexibility.

Compute units are flexible

At Apify, we decided to value flexibility and give you complete control over how you use our platform. Our users have many different use cases, for example:

  • Scrape 10 million pages with one week-long scraping run every month.
  • Run 50 parallel quick scrapes at peak time.
  • Run a one-page scraper every 5 seconds.
  • Build a complex system that integrates many different actors and produces high-value data combined from many sources.
  • Run an actor as a server.

All of these use cases and many more are measured with the same predictable unit: the compute unit.

Compute units are fair and transparent

At its core, Apify manages a large pool of servers with a strong platform on top, and lends this platform to users. When you run an actor or a task, you are allocated a share of our servers, along with the added services we provide. This allocated share does some work for you, and once everything is done, it is returned to our pool.

This means that our costs and your costs are directly proportional, which makes compute unit usage very transparent.

How exactly are compute units calculated?

To calculate compute units, you multiply two factors (a worked sketch follows the list):

  • Duration (hours) - how long the server is used (the runtime of your actor or task).
  • Memory (GB) - the size of the server share allocated to your actor or task run.
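
Putting the two factors together, here is a minimal sketch of the calculation (computeUnits is a hypothetical helper name for illustration, not part of the Apify API):

```typescript
// Compute units = memory (GB) × duration (hours).
// Hypothetical helper for illustration; not part of the Apify API.
function computeUnits(memoryGb: number, durationHours: number): number {
  return memoryGb * durationHours;
}

console.log(computeUnits(1, 1));    // 1 GB for 1 hour  -> 1 compute unit
console.log(computeUnits(4, 0.25)); // 4 GB for 15 min  -> 1 compute unit
```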

Duration

Run durations are measured in seconds. If you know the number of seconds your run took (or should take), just convert that into hours to get the number you need for the compute unit calculation. For example, if your run took 6 minutes (360 seconds), you can use 0.1 (hours) as the first factor in the calculation.
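
As a quick sketch of that conversion (plain arithmetic, nothing Apify-specific):

```typescript
// Convert a run duration from seconds to hours for the calculation above.
const durationSeconds = 360;                  // a 6-minute run
const durationHours = durationSeconds / 3600; // 360 s -> 0.1 h
console.log(durationHours);                   // 0.1
```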

Memory

Memory determines how big a share of a server you are allocated. The number of MB (megabytes) directly corresponds to the allocated RAM, but for each 4096 MB of RAM you are also allocated one CPU core (for 1024 MB you get 25% of a CPU core, and so on).

It is important to understand that memory in this case directly corresponds to the overall power and speed of the underlying computer. Real RAM usage is usually well below the maximum limit, but that doesn't mean you should reduce the memory allocation. In 99% of cases, the bottleneck is CPU usage (or at least it should be, unless there is a bug in the code). By allocating more memory, you also get a larger portion of the CPU, and therefore more power (speed).
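
A sketch of that proportionality, assuming the 4096 MB per core ratio described above (cpuCores is a hypothetical helper, not an Apify API):

```typescript
// CPU share grows linearly with allocated memory: 4096 MB = 1 full core.
const MB_PER_CPU_CORE = 4096;

function cpuCores(memoryMb: number): number {
  return memoryMb / MB_PER_CPU_CORE;
}

console.log(cpuCores(1024)); // 0.25 -> a quarter of a CPU core
console.log(cpuCores(8192)); // 2    -> two full CPU cores
```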

How to calculate monthly usage

Apify offers several predefined subscription plans, each with a certain number of monthly compute units. Estimating how many you need might seem complicated, but a simple test will give you a pretty accurate estimate.

Most actors in our Store state in their README roughly how many compute units they consume per amount of data scraped. So if you read that an actor scrapes 1,000 pages per compute unit and you want to scrape approximately 2 million pages monthly, you will need 2,000 compute units per month and should subscribe to the Business plan.
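
The arithmetic behind that example, as a sketch (the pages-per-unit figure comes from the hypothetical README above):

```typescript
// Estimate monthly compute units from a README's pages-per-unit figure.
const pagesPerComputeUnit = 1_000; // stated in the actor's README
const pagesPerMonth = 2_000_000;   // your monthly scraping target
console.log(pagesPerMonth / pagesPerComputeUnit); // 2000 compute units
```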

If the actor doesn't provide this information, or you want to use your own solution, just run your solution the way you intend to use it long term. Let's say you want to scrape the data every hour for the whole month. You set a reasonable memory allocation such as 4096 MB, and the whole run takes 15 minutes. That consumes 1 compute unit (4 GB * 0.25 hours = 1). Now multiply that by the number of hours in a day and the number of days in a month, and you get an estimated usage of 720 (1 * 24 * 30) compute units monthly.
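
The same estimate as a sketch, using the numbers from the example above:

```typescript
// One run: 4096 MB (4 GB) allocated for 15 minutes (0.25 hours).
const unitsPerRun = 4 * 0.25; // 1 compute unit
// One run every hour, every day, for a 30-day month.
const runsPerDay = 24;
const daysPerMonth = 30;
console.log(unitsPerRun * runsPerDay * daysPerMonth); // 720 compute units
```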

How do I know how much memory to allocate?

Actors built on top of the Apify SDK are autoscaling. This means they will always run as efficiently as they can with the memory they have allocated. So if you allocate twice as much memory, the run should finish twice as fast and consume the same number of compute units (1 GB * 1 hour = 2 GB * 0.5 hours).
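
A sketch of that trade-off, assuming the run speed scales perfectly with memory:

```typescript
// Doubling the memory halves the duration, so the product stays the same.
console.log(1 * 1.0); // 1 GB × 1 hour    -> 1 compute unit
console.log(2 * 0.5); // 2 GB × 0.5 hours -> 1 compute unit
```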

A good middle ground to start with is 4096 MB. If you need the results faster, increase the memory. You can also try decreasing it to lower the pressure on the target site.

Autoscaling only applies to solutions that process multiple tasks (URLs) for at least 30 seconds. If you need to scrape just one URL, or you use an actor like Google Sheets that performs a single isolated job, we recommend lowering the memory.
