AI-Powered Web Scraping with Jina, Google Sheets and OpenAI: the EASY way

Created by

Derek Cheung

Last edited 10 days ago

Purpose of workflow: This workflow automates scraping a website, transforming the scraped content into a structured format, and loading the result directly into a Google Sheets spreadsheet.

How it works:

  1. Web Scraping: Uses the Jina AI service to scrape website data and convert it into LLM-friendly text.
  2. Information Extraction: Employs an AI node to extract specific book details (title, price, availability, image URL, product URL) from the scraped data.
  3. Data Splitting: Splits the extracted information into individual book entries.
  4. Google Sheets Integration: Automatically populates a Google Sheets spreadsheet with the structured book data.
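
To make the splitting and sheet-loading steps concrete, here is a minimal Python sketch of the same idea outside n8n. It assumes the extraction step returns a single object whose `books` key holds a list of book entries. The key name, the field names, and the sample values are illustrative assumptions, not taken from the workflow itself.

```python
from typing import Any

# Column order for the spreadsheet rows. The field names are assumed from the
# book details listed above; the real workflow may name them differently.
COLUMNS = ["title", "price", "availability", "image_url", "product_url"]


def split_books(extracted: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one extraction result into one item per book (the split step)."""
    return [dict(book) for book in extracted.get("books", [])]


def to_row(book: dict[str, Any]) -> list[str]:
    """Map one book item onto the sheet columns; missing fields become ''."""
    return [str(book.get(column, "")) for column in COLUMNS]


# Hypothetical extraction output, for illustration only.
sample = {
    "books": [
        {
            "title": "Book A",
            "price": "10.00",
            "availability": "In stock",
            "image_url": "https://example.com/a.jpg",
            "product_url": "https://example.com/a",
        },
        {"title": "Book B", "price": "12.50", "availability": "Out of stock"},
    ]
}

# One row per book, ready to append to a spreadsheet.
rows = [to_row(book) for book in split_books(sample)]
```

Each entry in `rows` corresponds to one spreadsheet row, which is roughly what the Google Sheets node receives once the split step has produced one item per book.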

Step by step setup:

  1. Set up Jina AI service:

    • Sign up for a Jina AI account and obtain an API key.
  2. Configure the HTTP Request node:

    • Enter the Jina AI Reader URL (https://r.jina.ai/) followed by the address of the target website.
    • Add the API key to the request headers for authentication, as a Bearer token in the Authorization header.
  3. Set up the Information Extractor node:

    • Use Claude AI to generate a JSON schema for data extraction.
    • Upload a screenshot of the target website to Claude AI.
    • Ask Claude AI to suggest a JSON schema for extracting required information.
    • Copy the generated schema into the Information Extractor node.
  4. Configure the Split node:

    • Set it up to separate the extracted data into individual book entries.
  5. Set up the Google Sheets node:

    • Create a Google Sheets spreadsheet with columns for title, price, availability, image URL, and product URL.
    • Configure the node to map the extracted data to the appropriate columns.
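
As a rough illustration of steps 2 and 3, the sketch below prepares the kind of request the HTTP Request node would send and defines a JSON schema of the sort an LLM might suggest. It assumes the Jina Reader endpoint `https://r.jina.ai/` and a Bearer token in the `Authorization` header. The target URL, the API key placeholder, the `build_jina_request` helper, and every schema field name are hypothetical.

```python
import json
import urllib.request

# Assumed Jina Reader endpoint; the target URL is appended to it.
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_jina_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a request like the HTTP Request node's."""
    return urllib.request.Request(
        JINA_READER_PREFIX + target_url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Example schema for the Information Extractor node. The field names are
# assumptions based on the columns listed in step 5.
BOOK_SCHEMA = {
    "type": "object",
    "properties": {
        "books": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "price": {"type": "string"},
                    "availability": {"type": "string"},
                    "image_url": {"type": "string"},
                    "product_url": {"type": "string"},
                },
                "required": ["title"],
            },
        }
    },
}

# Placeholder values; substitute a real page address and API key.
request = build_jina_request("https://example.com/books", "YOUR_JINA_API_KEY")

# Text form of the schema, suitable for pasting into the node.
schema_text = json.dumps(BOOK_SCHEMA, indent=2)
```

Passing `request` to `urllib.request.urlopen` would perform the actual call. The sketch stops short of that so it can be run without network access or a valid key.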

New to n8n?

Need help building new n8n workflows? Process automation for you or your company will save you time and money, and it's completely free!