You might be wondering whether it is possible to publish Actor tasks. The answer is: sort of. Instead of publishing tasks directly, you can create a new unique Actor from your task using the Actor metamorph feature and publish the Actor instead.
This allows you to use Actor features such as an input schema, which defines user-friendly input fields for your Actor, or a README.md file containing documentation for the new Actor.
In this example, we will publish an Actor task of the apify/web-scraper Actor that scrapes all articles from a selected category of the Economist.com website.
First, create a new task of the apify/web-scraper Actor and set the following input configuration:
Start URL is the first page of the category we want to scrape:
https://www.economist.com/united-states/?page=1
Pseudo-URL is a pattern that matches all the subsequent pages:
https://www.economist.com/united-states/?page=[\d+]
Link selector is a CSS selector matching every link on the page:
a
Page function is executed on each page the crawler visits:
async function pageFunction(context) {
    // request is an instance of Apify.Request (https://sdk.apify.com/docs/api/request)
    // $ is an instance of jQuery (http://jquery.com/)
    const request = context.request;
    const $ = context.jQuery;
    // The page number is the value after "?page=" in the URL (base 10).
    const pageNum = parseInt(request.url.split('?page=').pop(), 10);

    context.log.info(`Scraping ${request.url}`);

    // Extract all articles.
    const articles = [];
    $('article').each((index, articleEl) => {
        const $articleEl = $(articleEl);

        // H3 contains 2 child elements: the first one is the topic, the second is the article title.
        const $h3El = $articleEl.find('h3');

        // Extract additional info and push it to the articles array.
        articles.push({
            pageNum,
            topic: $h3El.children().first().text(),
            title: $h3El.children().last().text(),
            url: $articleEl.find('a')[0].href,
            teaser: $articleEl.find('.teaser__text').text(),
        });
    });

    // Return results.
    return articles;
}
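The Pseudo-URL field above mixes literal text with a regular expression in square brackets. The following is a minimal sketch of how such a pattern could be compiled and matched; purlToRegExp is a hypothetical, simplified helper (it assumes no nested or escaped brackets) and not Web Scraper's own implementation. In Web Scraper, links found by the link selector are enqueued only when they match one of the pseudo-URLs, which is why the broad selector a works here.

```javascript
// Hypothetical helper (simplified): compile a pseudo-URL into a RegExp.
// Text inside [...] is copied as a regular expression; everything else
// is escaped so it matches literally. Assumes no nested or escaped brackets.
function purlToRegExp(purl) {
    const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    let pattern = '';
    let rest = purl;
    while (rest.length > 0) {
        const open = rest.indexOf('[');
        if (open === -1) {
            pattern += escapeLiteral(rest);
            break;
        }
        const close = rest.indexOf(']', open);
        if (close === -1) {
            throw new Error(`Unclosed "[" in pseudo-URL: ${purl}`);
        }
        pattern += escapeLiteral(rest.slice(0, open)); // literal part
        pattern += rest.slice(open + 1, close);        // regex part, verbatim
        rest = rest.slice(close + 1);
    }
    return new RegExp(`^${pattern}$`);
}

// In a JavaScript string the backslash must be doubled: this is [\d+].
const re = purlToRegExp('https://www.economist.com/united-states/?page=[\\d+]');
console.log(re.test('https://www.economist.com/united-states/?page=2'));   // true
console.log(re.test('https://www.economist.com/united-states/?page=abc')); // false
console.log(re.test('https://www.economist.com/finance/?page=2'));         // false
```

Escaping the literal parts matters: without it, the dots in www.economist.com would match any character.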
Now that you have created the task, click the "Create Actor from task" button in the top-right menu.
After you confirm, the system creates a new Actor and opens its page. If you go to the Source tab and open the main.js file, you will see that it contains the whole Actor task configuration and ends with an Apify.metamorph() call that metamorphs your Actor run into apify/web-scraper.
const Apify = require('apify');

Apify.main(async () => {
    // Get input of the Actor. Input fields can be modified in the INPUT_SCHEMA.json file.
    // For more information, see https://apify.com/docs/actor/input-schema
    const input = await Apify.getInput();
    console.log('Input:');
    console.dir(input);

    // Here you can prepare the input for the apify/web-scraper Actor. This input is based on
    // the Actor task you used as the starting point.
    const metamorphInput = {
        "startUrls": [
            {
                "url": "https://www.economist.com/united-states/?page=1",
                "method": "GET"
            }
        ],
        "pseudoUrls": [
            {
                "purl": "https://www.economist.com/united-states/?page=[\\d+]",
                "method": "GET"
            }
        ],
        "useRequestQueue": true,
        "linkSelector": "a",
        // ... here are other fields such as pageFunction that we omit for simplicity
    };

    // Now let's metamorph into the apify/web-scraper Actor using the created input.
    await Apify.metamorph('apify/web-scraper', metamorphInput);
});
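The listing above omits pageFunction for simplicity, but its shape is worth knowing: in the Web Scraper input, pageFunction is a string containing JavaScript source, not a function object. One way to produce that string from a function kept as real code is Function.prototype.toString(). This is a minimal sketch; the shortened pageFunction and the metamorphInput excerpt below are illustrative assumptions, not the generated main.js.

```javascript
// Illustrative, shortened page function (not the full one from the task).
async function pageFunction(context) {
    const { request, log } = context;
    log.info(`Scraping ${request.url}`);
    return [];
}

// Hypothetical excerpt of a metamorph input. Web Scraper's pageFunction
// field holds JavaScript source text, so the function is serialized.
const metamorphInput = {
    startUrls: [{ url: 'https://www.economist.com/united-states/?page=1', method: 'GET' }],
    linkSelector: 'a',
    pageFunction: pageFunction.toString(),
};

console.log(typeof metamorphInput.pageFunction); // string
// The whole input survives a JSON round trip, as JSON input must.
const roundTrip = JSON.parse(JSON.stringify(metamorphInput));
console.log(roundTrip.pageFunction === metamorphInput.pageFunction); // true
```

Keeping the page function as real code lets an editor lint it; the trade-off is that it must not rely on variables from main.js, because only its source text reaches Web Scraper.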
To learn more about the metamorph feature, please see the documentation.
Now, let's say we want to let users of the new Actor choose which Economist.com category to scrape. To do that, we only need to update the following lines of code:
"startUrls": [
    {
        "url": "https://www.economist.com/united-states/?page=1",
        "method": "GET"
    }
],
"pseudoUrls": [
    {
        "purl": "https://www.economist.com/united-states/?page=[\\d+]",
        "method": "GET"
    }
],
to use the category from the input:
"startUrls": [
    {
        "url": `https://www.economist.com/${input.category}/?page=1`,
        "method": "GET"
    }
],
"pseudoUrls": [
    {
        "purl": `https://www.economist.com/${input.category}/?page=[\\d+]`,
        "method": "GET"
    }
],
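Interpolating input.category directly produces URLs containing "undefined" when the field is missing. Below is a minimal sketch of a helper that validates the category before building both URLs; buildCategoryUrls and the slug format it accepts (lowercase letters, digits, hyphens) are assumptions for illustration, not part of the generated Actor.

```javascript
// Hypothetical helper: build the category-dependent part of the metamorph input.
// Assumption: categories are URL path slugs such as "united-states" or "briefing".
function buildCategoryUrls(input) {
    const raw = input && input.category;
    const category = typeof raw === 'string' ? raw.trim() : '';
    if (!/^[a-z0-9-]+$/.test(category)) {
        throw new Error(`Invalid category ${JSON.stringify(raw)}; expected a slug such as "united-states"`);
    }
    return {
        startUrls: [
            { url: `https://www.economist.com/${category}/?page=1`, method: 'GET' },
        ],
        pseudoUrls: [
            // "\\d" in a template literal yields "\d", so the pseudo-URL ends with [\d+].
            { purl: `https://www.economist.com/${category}/?page=[\\d+]`, method: 'GET' },
        ],
    };
}

const urls = buildCategoryUrls({ category: 'briefing' });
console.log(urls.startUrls[0].url);   // https://www.economist.com/briefing/?page=1
console.log(urls.pseudoUrls[0].purl); // https://www.economist.com/briefing/?page=[\d+]
```

In main.js, the returned object could be spread into metamorphInput (for example { ...buildCategoryUrls(input), linkSelector: 'a' }), so a bad category stops the run before the metamorph instead of crawling a URL such as https://www.economist.com/undefined/?page=1.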
Then we just need to update the INPUT_SCHEMA.json file to include a text field for the category selection:
{
    "title": "My input schema",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "category": {
            "title": "Category",
            "type": "string",
            "description": "Economist.com category to be scraped",
            "editor": "textfield",
            "prefill": "briefing"
        }
    }
}
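According to the Apify input schema documentation, prefill only pre-populates the form in the UI, so a run started through the API without a category would get no value for it; the schema's default and required properties cover that case. The sketch below imitates, in a simplified way, how a default could fill a missing field and how required fields could be checked. The two schema additions and the resolveInput helper are assumptions for illustration; resolveInput is not the platform's own validation code.

```javascript
// Abridged copy of the input schema, with two assumed additions:
// "default" and "required".
const inputSchema = {
    title: 'My input schema',
    type: 'object',
    schemaVersion: 1,
    properties: {
        category: {
            title: 'Category',
            type: 'string',
            prefill: 'briefing',
            default: 'briefing', // assumption: used when the input omits category
        },
    },
    required: ['category'],
};

// Hypothetical, simplified resolver; not the Apify platform's validator.
// "prefill" is ignored on purpose because it only pre-populates the UI form.
function resolveInput(schema, input) {
    const resolved = { ...input }; // spreading null/undefined yields {}
    for (const [name, prop] of Object.entries(schema.properties)) {
        if (resolved[name] === undefined && prop.default !== undefined) {
            resolved[name] = prop.default;
        }
        if (resolved[name] !== undefined && prop.type === 'string'
            && typeof resolved[name] !== 'string') {
            throw new Error(`Field "${name}" must be a string`);
        }
    }
    for (const name of schema.required || []) {
        if (resolved[name] === undefined) {
            throw new Error(`Field "${name}" is required`);
        }
    }
    return resolved;
}

console.log(resolveInput(inputSchema, null).category);                    // briefing
console.log(resolveInput(inputSchema, { category: 'finance' }).category); // finance
```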
After you build the Actor, you will see the automatically generated user interface for the input.
Now all that remains is to write a short description, document the features of the new Actor in the README.md file, and publish it using the Publication tab.
And that's it: you have just published a task as a new Actor. You can find this example in Apify Store as the mtrunkat/economist-category-scraper Actor.