We're really happy that you are considering submitting your web scraping project to Apify! Apify is the leading web scraping and web automation platform and there really isn't a project that we wouldn't be able to deliver 😉 As you might know, your web scraping or web automation project can be delivered in one of two ways. Here is a short recap for clarity:

  • Apify Freelancers - the most affordable and fastest delivery of your custom web scraping solution. Ideal for smaller projects or one-time data extraction. In this case, you are basically hiring a certified web scraping developer (freelancer) who will deliver your web scraping project.

  • Apify Enterprise - fully customized web scraping and automation solution for any scale. Ideal for companies that require data continuously. In this case, you will have the whole Apify Delivery Team and even our Apify Solution Providers to rely on.

To make sure that we can send you an accurate price estimate for your web scraping project, we've created this guide to walk you through the most important aspects of a web scraping or web automation project request.

How do we reach the items you need?

Try to describe whether you are only interested in a specific category/subcategory of the website or whether you need to extract data from the whole shop/website. In general, describe how to reach the detail-page URLs of the items that you are interested in. Optionally, we can also search for keywords in the shop and store only the results that come up. The best way to tell us which URLs you are interested in is to describe step-by-step what we should do to get all the URLs you need, as if you were doing the data extraction manually yourself. This will help us understand the whole workflow and estimate how difficult it would be to set up scrapers or automation bots for your project.

What information do you need to save for results?

The most important thing is to tell us what information needs to be extracted and stored for each record that we get for you. Do you want to store the title, price, stock count and shipping costs for each item, or is there any other information that you would like to store, such as breadcrumbs? If you are familiar with data types, you can specify the type of each field. If not, don't worry about this and we'll suggest some.

Example web scraping project

Name of your project:
Notino.co.uk pagination scraper

List of URLs of the web pages to extract the data from:
https://www.notino.co.uk/

List of data attributes to extract from each web page:

{
  "itemId": 15843458, // int
  "itemUrl": "https://www.notino.co.uk/armani/code-absolu-eau-de-parfum-for-men/", // string
  "itemName": "Armani Code Absolu Eau de Parfum for Men 110 ml", // string
  "discounted": false, // boolean
  "currentPrice": 2500, // float
  "originalPrice": null, // float
  "category": ["Brands","Armani","Code"] // array of breadcrumbs
}
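If you are comfortable with code, the same record can also be written down as a typed structure. The following is only a minimal Python sketch of the example above; the class name `ProductRecord` is our own illustrative choice, and treating `originalPrice` as an optional float is an assumption based on the `null` value shown.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ProductRecord:
    """One scraped item, mirroring the example attributes above."""
    itemId: int
    itemUrl: str
    itemName: str
    discounted: bool
    currentPrice: float
    originalPrice: Optional[float]  # assumption: null when there is no discount
    category: List[str]             # breadcrumbs, outermost first


record = ProductRecord(
    itemId=15843458,
    itemUrl="https://www.notino.co.uk/armani/code-absolu-eau-de-parfum-for-men/",
    itemName="Armani Code Absolu Eau de Parfum for Men 110 ml",
    discounted=False,
    currentPrice=2500,
    originalPrice=None,
    category=["Brands", "Armani", "Code"],
)

# Serialize to plain JSON (without the type comments used in the example).
print(json.dumps(asdict(record), indent=2))
```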

How many web pages do you need to crawl?
1,000-10,000

How often do you need to get fresh data?
Daily

Please provide a detailed description of how to extract the data. Imagine you're explaining the steps to a person who will perform them manually:

Prepare a solution that will scrape all products from the website. It will start at https://www.notino.co.uk/, collect all menu categories, go through every category and every page of its pagination, and from those listing pages collect the URLs of all item detail pages.
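The traversal order described here (categories, then paginated listing pages, then item URLs) could be sketched roughly as follows. This is not a working scraper: `FAKE_SITE` and its paths are hypothetical in-memory stand-ins for the real website, used only to show the loop structure and the de-duplication of items that appear in more than one category.

```python
from typing import Dict, List

# Hypothetical stand-in for the shop: category -> listing pages -> item URLs.
FAKE_SITE: Dict[str, List[List[str]]] = {
    "perfumes": [["/item/a", "/item/b"], ["/item/c"]],
    "skincare": [["/item/c", "/item/d"]],
}


def collect_item_urls(site: Dict[str, List[List[str]]]) -> List[str]:
    """Walk every category and every listing page, keeping each item URL once."""
    seen = set()
    ordered: List[str] = []
    for _category, pages in site.items():
        for page in pages:              # one entry per pagination page
            for url in page:
                if url not in seen:     # the same item may sit in several categories
                    seen.add(url)
                    ordered.append(url)
    return ordered


print(collect_item_urls(FAKE_SITE))
```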

Then visit each detail page and store the data as in the example above.

For every item, we need to save the unique identifier of the product, which can be found on the product's detail page. For notino.co.uk, this is the ID number in the data-product-code attribute in the HTML, e.g. 15843458 on this URL: https://www.notino.co.uk/armani/code-absolu-eau-de-parfum-for-men/
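To make this step concrete, here is a minimal sketch of reading such an attribute with Python's standard `html.parser` module. The `SAMPLE_HTML` snippet is a made-up fragment, not the real markup of the page, and the helper name `extract_item_id` is purely illustrative.

```python
from html.parser import HTMLParser
from typing import Optional


class ProductCodeParser(HTMLParser):
    """Remembers the first data-product-code attribute it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.product_code: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.product_code is not None:
            return
        for name, value in attrs:
            if name == "data-product-code" and value:
                self.product_code = value


def extract_item_id(html: str) -> Optional[int]:
    """Return the product code as an int, or None if it is missing or not numeric."""
    parser = ProductCodeParser()
    parser.feed(html)
    code = parser.product_code
    return int(code) if code is not None and code.isdigit() else None


# Hypothetical fragment; the real page structure may differ.
SAMPLE_HTML = '<div class="product" data-product-code="15843458"><h1>Armani Code Absolu</h1></div>'
print(extract_item_id(SAMPLE_HTML))
```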

If the item is discounted, set discounted to true. In that case, currentPrice is the discounted price and originalPrice is the price before the discount.
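Expressed as code, this rule could look like the sketch below. The function name `price_fields` is hypothetical, and leaving `originalPrice` empty for non-discounted items is an assumption taken from the `null` in the example record.

```python
from typing import Any, Dict, Optional


def price_fields(current_price: float,
                 price_before_discount: Optional[float]) -> Dict[str, Any]:
    """Build the three price-related fields of a record."""
    # An item counts as discounted only if a higher earlier price is known.
    discounted = (price_before_discount is not None
                  and price_before_discount > current_price)
    return {
        "discounted": discounted,
        "currentPrice": current_price,
        # Assumption: no original price is stored when there is no discount.
        "originalPrice": price_before_discount if discounted else None,
    }


print(price_fields(2000, 2500))
print(price_fields(2500, None))
```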

If the price contains the word "from" (see the attached screenshot), then we need to visit the detail page and collect all variants, so we don't miss any items.
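A check like this is easy to describe precisely. As a rough sketch, assuming the listing shows the price as plain text (the sample strings below are invented, not taken from the site):

```python
import re

# Match "from" as a whole word, in any letter case.
FROM_PRICE = re.compile(r"\bfrom\b", re.IGNORECASE)


def needs_variant_visit(price_text: str) -> bool:
    """True when the listing price is a 'from' price, i.e. the item has variants."""
    return bool(FROM_PRICE.search(price_text))


print(needs_variant_visit("from £25.00"))
print(needs_variant_visit("£25.00"))
```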

Upload any relevant files, e.g. screenshots or a screenshare recording (for example via Vidyard or Loom):
