Interactive Knowledge Base Chat with Supabase RAG using AI ππ¬
Categories
Created by
Last edited 10 days ago
Google Drive File Ingestion to Supabase for Knowledge Base ππΎ
Overview π
This n8n workflow automates the process of ingesting files from Google Drive into a Supabase database, preparing them for a knowledge base system. It supports text-based files (PDF, DOCX, TXT, etc.) and tabular data (XLSX, CSV, Google Sheets), extracting content, generating embeddings, and storing data in structured tables. This is a foundational workflow for building a company knowledge base that can be queried via a chat interface (e.g., using a RAG workflow). π
Problem Solved π―
Manually managing a knowledge base with files from Google Drive is time-consuming and error-prone. This workflow solves that by:
- Automatically ingesting files from Google Drive as they are created or updated.
- Extracting content from various file types (text and tabular).
- Generating embeddings for text-based files to enable vector search.
- Storing data in Supabase for efficient retrieval.
- Handling duplicates and errors to ensure data consistency.
Target Audience:
- Knowledge Managers: Build a centralized knowledge base from company files.
- Data Teams: Automate the ingestion of spreadsheets and documents.
- Developers: Integrate with other workflows (e.g., RAG for querying the knowledge base).
Workflow Description π
This workflow listens for new or updated files in Google Drive, processes them based on their type, and stores the extracted data in Supabase tables for later retrieval. Hereβs how it works:
- File Detection: Triggers when a file is created or updated in Google Drive.
- File Processing: Loops through each file, extracts metadata, and validates the file type.
- Duplicate Check: Ensures the file hasnβt been processed before.
- Content Extraction:
- Text-based Files: Downloads the file, extracts text, splits it into chunks, generates embeddings, and stores the chunks in Supabase.
- Tabular Files: Extracts data from spreadsheets and stores it as rows in Supabase.
- Metadata Storage: Stores file metadata and basic info in Supabase tables.
- Error Handling: Logs errors for unsupported formats or duplicates.
Nodes Breakdown π οΈ
1. Detect New File π
- Type: Google Drive Trigger
- Purpose: Triggers the workflow when a new file is created in Google Drive.
- Configuration:
- Credential: Google Drive OAuth2
- Event:
File Created
- Customization:
- Specify a folder to monitor specific directories.
2. Detect Updated File π
- Type: Google Drive Trigger
- Purpose: Triggers the workflow when a file is updated in Google Drive.
- Configuration:
- Credential: Google Drive OAuth2
- Event:
File Updated
- Customization:
- Currently disconnected; reconnect if updates need to be processed.
3. Process Each File π
- Type: Loop Over Items
- Purpose: Processes each file individually from the Google Drive trigger.
- Configuration:
- Input:
{{ $json.files }}
- Input:
- Customization:
- Adjust the batch size if processing multiple files at once.
4. Extract File Metadata π
- Type: Set
- Purpose: Extracts metadata like
file_id
,file_name
,mime_type
, andweb_view_link
. - Configuration:
- Fields:
file_id
:{{ $json.id }}
file_name
:{{ $json.name }}
mime_type
:{{ $json.mimeType }}
web_view_link
:{{ $json.webViewLink }}
- Fields:
- Customization:
- Add more metadata fields if needed (e.g.,
size
,createdTime
).
- Add more metadata fields if needed (e.g.,
5. Check File Type β
- Type: IF
- Purpose: Validates the file type by checking the MIME type.
- Configuration:
- Condition:
mime_type
contains supported types (e.g.,application/pdf
,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
).
- Condition:
- Customization:
- Add more supported MIME types as needed.
6. Find Duplicates π
- Type: Supabase
- Purpose: Checks if the file has already been processed by querying
knowledge_base
. - Configuration:
- Operation: Select
- Table:
knowledge_base
- Filter:
file_id = {{ $node['Extract File Metadata'].json.file_id }}
- Customization:
- Add additional duplicate checks (e.g., by file name).
7. Handle Duplicates π
- Type: IF
- Purpose: Routes the workflow based on whether a duplicate is found.
- Configuration:
- Condition:
{{ $node['Find Duplicates'].json.length > 0 }}
- Condition:
- Customization:
- Add notifications for duplicates if desired.
8. Remove Old Text Data ποΈ
- Type: Supabase
- Purpose: Deletes old text data from
documents
if the file is a duplicate. - Configuration:
- Operation: Delete
- Table:
documents
- Filter:
metadata->>'file_id' = {{ $node['Extract File Metadata'].json.file_id }}
- Customization:
- Add logging before deletion.
9. Remove Old Data ποΈ
- Type: Supabase
- Purpose: Deletes old tabular data from
document_rows
if the file is a duplicate. - Configuration:
- Operation: Delete
- Table:
document_rows
- Filter:
dataset_id = {{ $node['Extract File Metadata'].json.file_id }}
- Customization:
- Add logging before deletion.
10. Route by File Type π
- Type: Switch
- Purpose: Routes the workflow based on the fileβs MIME type (text-based or tabular).
- Configuration:
- Rules: Based on
mime_type
(e.g.,application/pdf
for text,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
for tabular).
- Rules: Based on
- Customization:
- Add more routes for additional file types.
11. Download File Content π₯
- Type: Google Drive
- Purpose: Downloads the file content for text-based files.
- Configuration:
- Credential: Google Drive OAuth2
- File ID:
{{ $node['Extract File Metadata'].json.file_id }}
- Customization:
- Add error handling for download failures.
12. Extract PDF Text π
- Type: Extract from File (PDF)
- Purpose: Extracts text from PDF files.
- Configuration:
- File Content:
{{ $node['Download File Content'].binary.data }}
- File Content:
- Customization:
- Adjust extraction settings for better accuracy.
13. Extract DOCX Text π
- Type: Extract from File (DOCX)
- Purpose: Extracts text from DOCX files.
- Configuration:
- File Content:
{{ $node['Download File Content'].binary.data }}
- File Content:
- Customization:
- Add support for other text formats (e.g., TXT, RTF).
14. Extract XLSX Data π
- Type: Extract from File (XLSX)
- Purpose: Extracts tabular data from XLSX files.
- Configuration:
- File ID:
{{ $node['Extract File Metadata'].json.file_id }}
- File ID:
- Customization:
- Add support for CSV or Google Sheets.
15. Split Text into Chunks βοΈ
- Type: Text Splitter
- Purpose: Splits extracted text into manageable chunks for embedding.
- Configuration:
- Chunk Size:
1000
- Chunk Overlap:
200
- Chunk Size:
- Customization:
- Adjust chunk size and overlap based on document length.
16. Generate Text Embeddings π
- Type: OpenAI
- Purpose: Generates embeddings for text chunks using OpenAI.
- Configuration:
- Credential: OpenAI API key
- Operation: Embedding
- Model:
text-embedding-ada-002
- Customization:
- Switch to a different embedding model if needed.
17. Store Text in Supabase πΎ
- Type: Supabase Vector Store
- Purpose: Stores text chunks and embeddings in the
documents
table. - Configuration:
- Credential: Supabase credentials
- Operation: Insert Documents
- Table Name:
documents
- Customization:
- Add metadata fields to store additional context.
18. Store Tabular Data πΎ
- Type: Supabase
- Purpose: Stores tabular data in the
document_rows
table. - Configuration:
- Operation: Insert
- Table:
document_rows
- Columns:
dataset_id
,row_data
- Customization:
- Add validation for tabular data structure.
19. Store File Metadata π
- Type: Supabase
- Purpose: Stores file metadata in the
document_metadata
table. - Configuration:
- Operation: Insert
- Table:
document_metadata
- Columns:
file_id
,file_name
,file_type
,file_url
- Customization:
- Add more metadata fields as needed.
20. Record in Knowledge Base π
- Type: Supabase
- Purpose: Stores basic file info in the
knowledge_base
table. - Configuration:
- Operation: Insert
- Table:
knowledge_base
- Columns:
file_id
,file_name
,file_type
,file_url
,upload_date
- Customization:
- Add indexes for faster lookups.
21. Log File Errors β οΈ
- Type: Supabase
- Purpose: Logs errors for unsupported file types.
- Configuration:
- Operation: Insert
- Table:
error_log
- Columns:
error_type
,error_message
- Customization:
- Add notifications for errors.
22. Log Duplicate Errors β οΈ
- Type: Supabase
- Purpose: Logs errors for duplicate files.
- Configuration:
- Operation: Insert
- Table:
error_log
- Columns:
error_type
,error_message
- Customization:
- Add notifications for duplicates.
Interactive Knowledge Base Chat with Supabase RAG using GPT-4o-mini ππ¬
Introduction π
This n8n workflow creates an interactive chat interface that allows users to query a company knowledge base using Retrieval-Augmented Generation (RAG). It retrieves relevant information from text documents and tabular data stored in Supabase, then generates natural language responses using OpenAIβs GPT-4o-mini model. Designed for teams managing internal knowledge, this workflow enables users to ask questions like βWhatβs the remote work policy?β or βShow me the latest budget dataβ and receive accurate, context-aware responses in a conversational format. π
Problem Statement π―
Managing a company knowledge base can be a daunting taskβemployees often struggle to find specific information buried in documents or spreadsheets, leading to wasted time and inefficiencies. Traditional search methods may not understand natural language queries or provide contextually relevant results. This workflow solves these issues by:
- Offering a chat-based interface for natural language queries, making it easy for users to ask questions in their own words.
- Leveraging RAG to retrieve relevant text and tabular data from Supabase, ensuring responses are accurate and context-aware.
- Supporting diverse file types, including text-based files (e.g., PDFs, DOCX) and tabular data (e.g., XLSX, CSV), for comprehensive knowledge access.
- Maintaining conversation history to provide context during interactions, improving the user experience.
Target Audience π₯
This workflow is ideal for:
- HR Teams: Quickly access company policies, employee handbooks, or benefits documents.
- Finance Teams: Retrieve budget data, expense reports, or financial summaries from spreadsheets.
- Knowledge Managers: Build a centralized assistant for internal documentation, streamlining information access.
- Developers: Extend the workflow with additional tools or integrations for custom use cases.
Workflow Description π
This workflow consists of a chat interface powered by n8nβs Chat Trigger node, an AI Agent node for RAG, and several tools to retrieve data from Supabase. Hereβs how it works step-by-step:
- User Initiates a Chat: The user interacts with a chat interface, sending queries like βSummarize our remote work policyβ or βShow budget data for Q1 2025.β
- Query Processing with RAG: The AI Agent processes the query using RAG, retrieving relevant data from Supabase tables and generating a response with OpenAIβs GPT-4o-mini model.
- Data Retrieval and Response Generation: The workflow uses multiple tools to fetch data:
- Retrieves text chunks from the
documents
table using vector search. - Fetches tabular data from the
document_rows
table based on file IDs. - Extracts full document text or lists available files as needed.
- Generates a natural language response combining the retrieved data.
- Retrieves text chunks from the
- Conversation History Management: Stores the conversation history in Supabase to maintain context for follow-up questions.
- Response Delivery: Formats and sends the response back to the chat interface for the user to view.
Nodes Breakdown π οΈ
1. Start Chat Interface π¬
- Type: Chat Trigger
- Purpose: Provides the interactive chat interface for users to input queries and receive responses.
- Configuration:
- Chat Title:
Company Knowledge Base Assistant
- Chat Subtitle:
Ask me anything about company documents!
- Welcome Message:
Hello! Iβm your Company Knowledge Base Assistant. How can I help you today?
- Suggestions:
What is the company policy on remote work?
,Show me the latest budget data.
,List all policy documents.
- Output Chat Session ID:
true
- Output User Message:
true
- Chat Title:
- Customization:
- Update the title and welcome message to align with your company branding (e.g.,
HR Knowledge Assistant
). - Add more suggestions relevant to your use case (e.g.,
What are the company benefits?
).
- Update the title and welcome message to align with your company branding (e.g.,
2. Process Query with RAG π§
- Type: AI Agent
- Purpose: Orchestrates the RAG process by retrieving relevant data using tools and generating responses with OpenAIβs GPT-4o-mini.
- Configuration:
- Credential: OpenAI API key
- Model:
gpt-4o-mini
- System Prompt:
You are a helpful assistant for a company knowledge base. Use the provided tools to retrieve relevant information from documents and tabular data. If the query involves tabular data, format it clearly in your response. If no relevant data is found, respond with "I couldnβt find any relevant information. Can you provide more details?"
- Input Field:
{{ $node['Start Chat Interface'].json.message }}
- Customization:
- Switch to a different model (e.g.,
gpt-3.5-turbo
) to adjust cost or performance. - Modify the system prompt to change the tone (e.g., more formal for HR use cases).
- Switch to a different model (e.g.,
3. Retrieve Text Chunks π
- Type: Supabase Vector Store (Tool)
- Purpose: Retrieves relevant text chunks from the
documents
table using vector search. - Configuration:
- Credential: Supabase credentials
- Operation Mode: Retrieve Documents (As Tool for AI Agent)
- Table Name:
documents
- Embedding Field:
embedding
- Content Field:
content_text
- Metadata Field:
metadata
- Embedding Model: OpenAI
text-embedding-ada-002
- Top K:
10
- Customization:
- Adjust
Top K
to retrieve more or fewer results (e.g.,5
for faster responses). - Ensure the
match_documents
function (see prerequisites) is defined in Supabase.
- Adjust
4. Fetch Tabular Data π
- Type: Supabase (Tool, Execute Query)
- Purpose: Retrieves tabular data from the
document_rows
table based on a file ID. - Configuration:
- Credential: Supabase credentials
- Operation: Execute Query
- Query:
SELECT row_data FROM document_rows WHERE dataset_id = $1 LIMIT 10
- Tool Description:
Run a SQL query - use this to query from the document_rows table once you know the file ID you are querying. dataset_id is the file_id and you are always using the row_data for filtering, which is a jsonb field that has all the keys from the file schema given in the document_metadata table.
- Customization:
- Modify the query to filter specific columns or add conditions (e.g.,
WHERE dataset_id = $1 AND row_data->>'year' = '2025'
). - Increase the
LIMIT
for larger datasets.
- Modify the query to filter specific columns or add conditions (e.g.,
5. Extract Full Document Text π
- Type: Supabase (Tool, Execute Query)
- Purpose: Fetches the full text of a document by concatenating all text chunks for a given
file_id
. - Configuration:
- Credential: Supabase credentials
- Operation: Execute Query
- Query:
SELECT string_agg(content_text, ' ') as document_text FROM documents WHERE metadata->>'file_id' = $1 GROUP BY metadata->>'file_id'
- Tool Description:
Given file id fetch the text from the documents
- Customization:
- Add filters to the query if needed (e.g., limit to specific metadata fields).
6. List Available Files π
- Type: Supabase (Tool, Select)
- Purpose: Lists all files in the knowledge base from the
document_metadata
table. - Configuration:
- Credential: Supabase credentials
- Operation: Select
- Schema:
public
- Table:
document_metadata
- Tool Description:
Use this tool to fetch all documents including the table schema if the file is csv, excel or xlsx
- Customization:
- Add filters to list specific file types (e.g.,
WHERE file_type = 'application/pdf'
). - Modify the columns selected to include additional metadata (e.g.,
file_size
).
- Add filters to list specific file types (e.g.,
7. Manage Chat History πΎ
- Type: Postgres Chat Memory (Tool)
- Purpose: Stores and retrieves conversation history to maintain context.
- Configuration:
- Credential: Supabase credentials (Postgres-compatible)
- Table Name:
n8n_chat_history
- Session ID Field:
session_id
- Session ID Value:
{{ $node['Start Chat Interface'].json.sessionId }}
- Message Field:
message
- Sender Field:
sender
- Timestamp Field:
timestamp
- Context Window Length:
5
- Customization:
- Increase the context window length for longer conversations (e.g.,
10
messages). - Add indexes on
session_id
andtimestamp
in Supabase for better performance.
- Increase the context window length for longer conversations (e.g.,
8. Format and Send Response π€
- Type: Set
- Purpose: Formats the AI Agentβs response and sends it back to the chat interface.
- Configuration:
- Fields:
response
:{{ $node['Process Query with RAG'].json.output }}
- Fields:
- Customization:
- Add additional formatting to the response if needed (e.g., prepend with a timestamp or apply markdown formatting).
Setup Instructions π οΈ
Prerequisites π
- n8n Setup:
- Ensure youβre using n8n version 1.0 or higher.
- Enable the AI features in n8n settings.
- Supabase:
- Create a Supabase project and set up the following tables:
documents
:id (uuid)
,content_text (text)
,embedding (vector(1536))
,metadata (jsonb)
document_rows
:id (uuid)
,dataset_id (varchar)
,row_data (jsonb)
document_metadata
:file_id (varchar)
,file_name (varchar)
,file_type (varchar)
,file_url (text)
knowledge_base
:id (serial)
,file_id (varchar)
,file_name (varchar)
,file_type (varchar)
,file_url (text)
,upload_date (timestamp)
n8n_chat_history
:id (serial)
,session_id (varchar)
,message (text)
,sender (varchar)
,timestamp (timestamp)
- Add the
match_documents
function to Supabase to enable vector search:CREATE OR REPLACE FUNCTION match_documents ( query_embedding vector(1536), match_count int DEFAULT 5, filter jsonb DEFAULT '{}' ) RETURNS TABLE ( id uuid, content_text text, metadata jsonb, similarity float ) LANGUAGE plpgsql AS $$ BEGIN RETURN QUERY SELECT documents.id, documents.content_text, documents.metadata, 1 - (documents.embedding <=> query_embedding) as similarity FROM documents WHERE documents.metadata @> filter ORDER BY similarity DESC LIMIT match_count; END; $$;
- Create a Supabase project and set up the following tables:
You may also like
New to n8n?
Need help building new n8n workflows? Process automation for you or your company will save you time and money, and it's completely free!