Document Q&A System with OpenAI GPT, Pinecone Vector DB & Google Drive Integration


This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

🤖 AI-Powered Document QA System using Webhook, Pinecone + OpenAI + n8n

This project demonstrates how to build a Retrieval-Augmented Generation (RAG) system in n8n and expose it as a simple question-answering service through a webhook connected to a user interface built with Lovable. The system:

🧾 Downloads PDF documents (contracts, user manuals, HR policy documents, etc.) from Google Drive

📚 Converts them into vector embeddings using OpenAI

🔍 Stores and searches them in Pinecone Vector DB

💬 Allows natural-language querying of the documents through an AI agent

📂 Flow 1: Document Loading & RAG Setup

This flow automates:

Reading documents from a Google Drive folder

Splitting them into chunks and vectorizing them with OpenAI's text-embedding-3-small model

Uploading vectors into Pinecone for later semantic search

🧱 Workflow Structure

A[Manual Trigger] --> B[Google Drive Search]
B --> C[Google Drive Download]
C --> D[Pinecone Vector Store]
D --> E[Default Data Loader]
E --> F[Recursive Character Text Splitter]
D --> G[OpenAI Embedding]

🪜 Steps

Manual Trigger: Kickstarts the workflow on demand for loading new documents.

Google Drive Search & Download

Nodes: Google Drive (Search: file/folder), followed by Google Drive (Download)

Finds the PDF documents in the folder and downloads them

Recursive Character Text Splitter (attached to the Default Data Loader): Breaks long documents into overlapping chunks

Settings:
Chunk Size: 1000
Chunk Overlap: 100
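The two settings interact as follows: each chunk holds at most 1000 characters, and each new chunk starts 900 characters after the previous one, so neighbouring chunks share 100 characters. The sketch below is a simplified illustration using plain fixed-size windows. It is not the node's actual algorithm, which also tries to break on paragraph, line, and word boundaries before falling back to raw characters.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows where neighbours share chunk_overlap characters.

    Simplified illustration only: a recursive character splitter additionally prefers
    paragraph, line, and word boundaries instead of cutting at arbitrary positions.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(ord("a") + i % 26) for i in range(2500))
    parts = chunk_text(sample)
    print(len(parts), [len(p) for p in parts])  # 3 [1000, 1000, 700]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing roughly 10% more text.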

OpenAI Embedding

Model: text-embedding-3-small
Used for creating document vectors
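For orientation, the OpenAI Embedding node roughly corresponds to a request to OpenAI's embeddings endpoint. The standard-library sketch below only builds the request and parses a response. It sends nothing, and the API key is a placeholder. The n8n node handles all of this itself. With text-embedding-3-small, each returned vector has 1536 values by default.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(chunks: list[str], api_key: str,
                            model: str = "text-embedding-3-small") -> urllib.request.Request:
    """Build (but do not send) a POST request that embeds a batch of text chunks."""
    body = json.dumps({"model": model, "input": chunks}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body: bytes) -> list[list[float]]:
    """Return one vector per input chunk, ordered by the 'index' field of each result."""
    payload = json.loads(response_body)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


if __name__ == "__main__":
    req = build_embedding_request(["first chunk", "second chunk"], api_key="YOUR_OPENAI_KEY")
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req).read(), then parse_embeddings(...)
```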

Pinecone Vector Store

Host: url
Index: index
Batch Size: 200

Pinecone Settings:

Type: Dense
Region: us-east-1
Mode: Insert Documents
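Inserting documents amounts to upserting vectors in batches of the configured size. Below is a standard-library sketch that assumes Pinecone's REST upsert endpoint (`POST https://<index host>/vectors/upsert` with an `Api-Key` header). The host and key are placeholders, and the record layout (`id`, `values`, `metadata`) is illustrative. The n8n Pinecone Vector Store node performs this step for you.

```python
import json
import urllib.request

BATCH_SIZE = 200   # matches the Batch Size setting above
DIMENSION = 1536   # must match the index dimension (text-embedding-3-small default)


def build_upsert_requests(host: str, api_key: str,
                          records: list[dict]) -> list[urllib.request.Request]:
    """Split records into batches and build one upsert request per batch (not sent).

    Each record is assumed to look like {"id": str, "values": list[float], "metadata": dict}.
    """
    reqs = []
    for start in range(0, len(records), BATCH_SIZE):
        batch = records[start:start + BATCH_SIZE]
        for record in batch:
            # Reject vectors the index would not accept
            if len(record["values"]) != DIMENSION:
                raise ValueError(f"{record['id']}: expected {DIMENSION} values")
        body = json.dumps({"vectors": batch}).encode("utf-8")
        reqs.append(urllib.request.Request(
            f"https://{host}/vectors/upsert",
            data=body,
            headers={"Api-Key": api_key, "Content-Type": "application/json"},
            method="POST",
        ))
    return reqs
```

For example, 450 chunks would produce three requests carrying 200, 200, and 50 vectors.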

💬 Flow 2: Chat-Based Q&A Agent

This flow enables chat-style querying of the stored documents through an OpenAI-powered agent that keeps conversation memory and retrieves context from the vector store.

🧱 Workflow Diagram

A[Webhook (chat message)] --> B[AI Agent]
B --> C[OpenAI Chat Model]
B --> D[Simple Memory]
B --> E[Answer with Vector Store]
E --> F[Pinecone Vector Store]
F --> G[Embeddings OpenAI]

🪜 Components

Chat (Trigger): Receives incoming chat queries

AI Agent Node

Handles query flow using:

Chat Model: OpenAI GPT

Memory: Simple Memory

Tool: Question Answer with Vector Store

Pinecone Vector Store: Connected to the same Pinecone index as Flow 1

Embeddings: Embeds each query with the same model used in Flow 1 (text-embedding-3-small), so it can be matched against the stored document chunks by vector similarity

Response Node: Returns final AI response to user via webhook
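Conceptually, answering a question starts by embedding the query and finding the stored chunks whose vectors are closest to it. Pinecone does this at scale. The in-memory sketch below shows the idea using cosine similarity, which is an assumption because the index metric is not stated above. The value of `k` and the toy 3-dimensional vectors are illustrative only, since real vectors here have 1536 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]],
          k: int = 4) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query vector, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    store = [
        ("refund policy", [1.0, 0.0, 0.0]),
        ("holiday rules", [0.0, 1.0, 0.0]),
        ("refund timeline", [0.9, 0.1, 0.0]),
    ]
    for score, text in top_k([1.0, 0.0, 0.0], store, k=2):
        print(f"{score:.3f} {text}")
```

The retrieved chunk texts are what the agent's chat model receives as context when it writes the final answer.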

🌐 Flow 3: UI-Based Query with Lovable

This flow uses a web UI built using Lovable to query contracts directly from a form interface.

📥 Webhook Setup for Lovable

Webhook Node

Method: POST
URL: url
Response: Using the 'Respond to Webhook' node
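A front end such as the Lovable UI calls this webhook with a JSON POST request. The sketch below shows a minimal client. The webhook URL and the `query` key are hypothetical, and `ask` assumes that the Respond to Webhook node is configured to return JSON.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request to the n8n webhook (not sent here)."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(webhook_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (assumes the webhook responds with JSON)."""
    req = build_webhook_request(webhook_url, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

The generous timeout reflects that the webhook only responds after the agent has retrieved context and generated an answer.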

🧱 Workflow Logic

A[Webhook (Lovable Form)] --> B[AI Agent]
B --> C[OpenAI Chat Model]
B --> D[Simple Memory]
B --> E[Answer with Vector Store]
E --> F[Pinecone Vector Store]
F --> G[Embeddings OpenAI]
B --> H[Respond to Webhook]

💡 Lovable UI

Users can submit:

Full Name
Email
Department
Freeform Query: any question the user wants to ask

The form data is sent to n8n through the webhook, and the response contains the answer drawn from the contract content.
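Before the form data is posted, it helps to check that all four fields are filled in. Here is a minimal sketch. The JSON key names (`fullName`, `email`, `department`, `query`) are hypothetical and must match whatever the Lovable form actually sends and the AI Agent node reads.

```python
# Hypothetical key names for the four form fields listed above
REQUIRED_FIELDS = ("fullName", "email", "department", "query")


def validate_form(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload can be posted."""
    problems = [f"{field} is required" for field in REQUIRED_FIELDS
                if not str(payload.get(field, "")).strip()]
    email = str(payload.get("email", "")).strip()
    # Deliberately loose check: only flags a non-empty value with no '@'
    if email and "@" not in email:
        problems.append("email must contain '@'")
    return problems
```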

🔍 Use Cases

Contract Querying for Legal/HR teams

Procurement & Vendor Agreement QA

Customer Support Automation (based on terms)

RAG Systems for private document knowledge

⚙️ Tools & Tech Stack

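n8n (workflow automation, self-hosted)

Google Drive (document source)

OpenAI (GPT chat model, text-embedding-3-small embeddings)

Pinecone (vector database)

Lovable (web UI)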

📌 Final Notes
Pinecone Index: package1536

Dimension: 1536

Chunk Size: 1000, Overlap: 100

Embedding Model: text-embedding-3-small
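These values must agree with each other. The index dimension has to match the embedding model's output size, and Flow 1 and Flows 2/3 must use the same model and index. The following is a small consistency check. The class and the model-dimension table are illustrative and not part of the workflow. The table lists the models' default output sizes.

```python
from dataclasses import dataclass

# Default output sizes of OpenAI's text-embedding-3 models
MODEL_DIMENSIONS = {"text-embedding-3-small": 1536, "text-embedding-3-large": 3072}


@dataclass(frozen=True)
class RagConfig:
    index_name: str = "package1536"
    dimension: int = 1536
    chunk_size: int = 1000
    chunk_overlap: int = 100
    embedding_model: str = "text-embedding-3-small"

    def check(self) -> None:
        """Raise ValueError if the settings are mutually inconsistent."""
        expected = MODEL_DIMENSIONS.get(self.embedding_model)
        if expected is not None and expected != self.dimension:
            raise ValueError(
                f"{self.embedding_model} produces {expected}-d vectors, "
                f"but index {self.index_name} expects {self.dimension}"
            )
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
```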

Feel free to fork the workflow or request the full JSON export.
Looking forward to your suggestions and improvements!
