API Schema Extractor
Created by
Polina Medvedieva
Last edited 39 days ago
This workflow automates discovering and extracting API operations from various services, then generating custom schemas from them. It runs in three stages (research, extraction, and schema generation), and each stage tracks its progress in a Google Sheet.
🙏 Jim Le deserves major kudos for helping to build this sophisticated three-stage workflow that cleverly automates API documentation processing using a smart combination of web scraping, vector search, and LLM technologies.
How it works
Stage 1 - Research:
- Fetches pending services from a Google Sheet
- Uses Google search to find API documentation
- Employs Apify for web scraping to filter relevant pages
- Stores webpage contents and metadata in Qdrant (vector database)
- Updates progress status in Google Sheet (pending, ok, or error)
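The status tracking shared by all three stages can be sketched in a few lines. This is a minimal illustration only, not the workflow's actual n8n nodes: it assumes the sheet rows are already loaded as dictionaries, and the names `run_stage`, `status_col`, and `requires` are hypothetical.

```python
from typing import Callable, Optional

PENDING, OK, ERROR = "pending", "ok", "error"


def run_stage(rows: list, status_col: str,
              worker: Callable[[dict], None],
              requires: Optional[str] = None) -> list:
    """Run `worker` on every row that is still pending for this stage.

    `status_col` is an assumed column name holding the stage status.
    `requires` optionally names the previous stage's status column;
    rows whose previous stage did not finish with "ok" are skipped.
    """
    processed = []
    for row in rows:
        if row.get(status_col, PENDING) != PENDING:
            continue  # already handled in an earlier run
        if requires and row.get(requires) != OK:
            continue  # previous stage has not succeeded
        try:
            worker(row)
            row[status_col] = OK
        except Exception:
            row[status_col] = ERROR  # record the failure, keep going
        processed.append(row)
    return processed
```

Because a failing service only marks its own row as `error`, one bad service does not stop the rest of the batch, and re-running the stage skips rows that are already `ok` or `error`.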
Stage 2 - Extraction:
- Processes services that completed research successfully
- Queries vector store to identify products and offerings
- Further queries for relevant API documentation
- Uses Gemini (LLM) to extract API operations
- Records extracted operations in Google Sheet
- Updates progress status (pending, ok, or error)
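As a rough sketch of the step between the LLM reply and the sheet, the function below turns a model response into rows ready to be recorded. It assumes the model was asked to return a JSON array; the field names (`resource`, `operation`, `description`, `method`, `url`) and the function `parse_operations` are illustrative assumptions, not the workflow's confirmed output format.

```python
import json

REQUIRED = ("method", "url")  # assumed minimum fields for a usable operation


def parse_operations(llm_text: str, service: str) -> list:
    """Turn an LLM reply into sheet-ready rows, dropping incomplete entries."""
    # Models often add prose or a code fence around the JSON,
    # so keep only the outermost [...] span before parsing.
    start, end = llm_text.find("["), llm_text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in LLM reply")
    items = json.loads(llm_text[start:end + 1])

    rows, seen = [], set()
    for item in items:
        if not isinstance(item, dict):
            continue
        if not all(item.get(key) for key in REQUIRED):
            continue  # skip operations missing a method or URL
        key = (str(item["method"]).upper(), item["url"])
        if key in seen:
            continue  # the same endpoint may appear on several pages
        seen.add(key)
        rows.append({
            "service": service,
            "resource": item.get("resource", ""),
            "operation": item.get("operation", ""),
            "description": item.get("description", ""),
            "method": key[0],
            "url": key[1],
        })
    return rows
```

A reply with no array raises `ValueError`, which a caller could map to the `error` status for that service.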
Stage 3 - Generation:
- Takes services with successful extraction
- Retrieves all API operations from the database
- Combines and groups operations into a custom schema
- Uploads final schema to Google Drive
- Updates final status in sheet with file location
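The combine-and-group step might look like the sketch below. The schema shape (a `service` name plus a list of `resources`, each holding its `operations`) is an assumption for illustration, as are the function name `build_schema` and the `"General"` fallback group; the workflow's real schema may be structured differently.

```python
import json
from collections import defaultdict


def build_schema(service: str, operations: list) -> dict:
    """Group flat operation rows by resource into one nested schema."""
    grouped = defaultdict(list)
    for op in operations:
        # Operations without a resource fall into an assumed "General" group.
        grouped[op.get("resource") or "General"].append({
            "operation": op.get("operation", ""),
            "description": op.get("description", ""),
            "method": op["method"],
            "url": op["url"],
        })
    return {
        "service": service,
        "resources": [
            {"resource": name, "operations": ops}
            for name, ops in sorted(grouped.items())
        ],
    }


if __name__ == "__main__":
    demo = build_schema("Example", [
        {"resource": "Users", "operation": "List users",
         "method": "GET", "url": "/users"},
    ])
    # The serialized text is what would be uploaded as the schema file.
    print(json.dumps(demo, indent=2))
```

Sorting the resource names keeps the generated file stable between runs, which makes differences between schema versions easier to review.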
Ideal for:
- Development teams needing to catalog multiple APIs
- API documentation initiatives
- Creating standardized API schema collections
- Automating API discovery and documentation
Accounts required:
- Google account (for Sheets and Drive access)
- Apify account (for web scraping)
- Qdrant database
- Gemini API access
Set up instructions:
- Prepare your Google Sheets document with the services information. An example Google Sheet is provided; you can copy it and change or remove the values under the columns. Then update the Google Sheets nodes with the correct Google Sheet ID.
- Configure credentials for Google Sheets (OAuth2), the required third-party services (Apify and Qdrant), and Gemini.
- Ensure proper permissions for Google Drive access.