Build a Lists Crawler: Managed Scraper + Automation

September 5, 2024
7 minute read


Why create a lists crawler?

Web scraping is a fast, efficient way to capture data from tables and nested pages. It’s widely used for research, price monitoring, e-commerce, marketing, and much more. But sometimes a single scrape, whether you’ve written a script, installed a browser extension, or even hired a data mining service, doesn’t go far enough: you need to collect the same information from multiple URLs.
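To see what that looks like by hand, here is a minimal sketch of a do-it-yourself lists crawler in Python. The URL file, CSS selectors, and field names are illustrative assumptions, not part of Byteline’s service:

```python
# A minimal DIY lists crawler: read URLs from a CSV file, scrape each
# page, and collect the results. Selectors and field names are
# hypothetical -- every site needs its own.
import csv
import requests
from bs4 import BeautifulSoup

def scrape_page(url: str) -> dict:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.select_one("h1")        # hypothetical selector
    price = soup.select_one(".price")    # hypothetical selector
    return {
        "url": url,
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
    }

def crawl(url_file: str) -> list[dict]:
    results = []
    with open(url_file, newline="") as f:
        for row in csv.DictReader(f):    # expects a header column named "URL"
            results.append(scrape_page(row["URL"]))
    return results

if __name__ == "__main__":
    for record in crawl("urls.csv"):
        print(record)
```

Every site needs its own selectors, error handling, and scheduling, which is exactly the maintenance burden a managed lists crawler removes.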

How does Byteline solve this?

We’ve combined the power of our Managed Scraper service with the flexibility of Workflow Automation, letting you configure a lists crawler in minutes. Byteline reads the URL list from your spreadsheet and passes each URL through the configured scrape. The steps to go from an idea to a live lists crawler are:

  1. Submit a request for the site you want scraped.
  2. Once your request has been completed, go to the Scraper Dashboard and add the scraper from your ‘Site Request Status’.
  3. Create a flow from your Automation Dashboard, starting with the Batch Scheduler trigger node that pulls the URL list from a Google Sheet (see the sketch after this list for what that trigger does conceptually).
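For intuition, pulling a URL list out of a Google Sheet is, conceptually, what the Batch Scheduler trigger handles for you. A minimal sketch, assuming the sheet is shared as viewable by link so its CSV export is reachable; the sheet ID and the ‘URL’ column name are placeholders:

```python
# Conceptual sketch: fetch a URL list from a Google Sheet's CSV export.
import csv
import io
import requests

SHEET_ID = "your-google-sheet-id"  # placeholder: your sheet's ID
EXPORT_URL = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/export?format=csv"

def fetch_url_list() -> list[str]:
    response = requests.get(EXPORT_URL, timeout=30)
    response.raise_for_status()
    reader = csv.DictReader(io.StringIO(response.text))
    return [row["URL"] for row in reader]  # expects a header column named "URL"

if __name__ == "__main__":
    for url in fetch_url_list():
        print(url)
```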

Step-by-step walkthrough

Below is how this looks in practice, using Eventbrite as the example site.

  1. From your dashboard (https://console.byteline.io/scraper/dashboard), go to the Managed Scraper service and select “Request a site”.
  2. Enter one of the URLs from the site that you would like scraped.
  3. Describe the information you would like captured and the scraping frequency.
  4. You will be directed to a confirmation page with an overview of the information you provided.
  5. Once the request is fulfilled, you will receive an email with the subject ‘Your request has been successfully completed!’. From the Byteline console, or by clicking ‘Click here to get data’ in the email, navigate to the Scraper Dashboard and select “Add Scraper”.
  6. Select which data fields you want captured.
  7. From your Configured Web Scrapers table, select Automate on the configured scrape’s row.
  8. This brings you to a pre-configured flow in Workflow Automation with a Scheduler trigger.
  9. Change this to Batch Scheduler from the Select Trigger Node popup.
  10. Select Batch Scheduler to set the frequency and the source of the URL data.

Your spreadsheet should have a header row, with the list of URLs underneath.
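For the Eventbrite example, a sheet might look like the following; the event URLs are purely illustrative:

```
URL
https://www.eventbrite.com/e/sample-event-1
https://www.eventbrite.com/e/sample-event-2
https://www.eventbrite.com/e/sample-event-3
```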

  11. On the Managed Scraper node, check ‘Do you want to overwrite the URL?’ and pick your input from the input box.
  12. Choose where you want the output of the batch scrape to go and map the fields. NOTE: select at least one field as ‘Mark as unique’ to prevent duplicate records (a sketch of this deduplication follows this list).
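To illustrate why a unique field matters, here is a minimal sketch of the deduplication idea, keyed on a hypothetical event_url field. It shows the general upsert technique, not Byteline’s internal implementation:

```python
# Sketch of "mark as unique": re-running a batch scrape upserts records
# by their unique key instead of appending duplicates.
# The field name "event_url" is a hypothetical example.

def upsert(store: dict[str, dict], records: list[dict], unique_key: str) -> None:
    for record in records:
        # A re-scraped record overwrites the old row with the same key.
        store[record[unique_key]] = record

store: dict[str, dict] = {}
first_run = [{"event_url": "https://example.com/e/1", "price": "$10"}]
second_run = [{"event_url": "https://example.com/e/1", "price": "$12"}]  # same event, updated price

upsert(store, first_run, "event_url")
upsert(store, second_run, "event_url")
print(len(store))                                   # 1 -- no duplicate record
print(store["https://example.com/e/1"]["price"])    # $12 -- updated in place
```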

Run a test to ensure everything is working correctly. From there, you can add more nodes or set your flow live.

That’s it! Get started at the Byteline Console (https://console.byteline.io) to request your first managed scrape.

Resources

Upvote this feature

If you like this feature and are interested in using it, please upvote it from the Byteline Console at https://console.byteline.io

How can I use it?

This feature is generally available and you can start using it from the Byteline Console at https://console.byteline.io/