Installation
Requirements
Chunkin is built on top of LangChain, a framework for developing applications powered by language models. It uses LangChain's document loaders, text splitters, and vector store integrations.
Core Dependencies
Document Format Support
# PDF support
pip install pypdf
# Excel support
pip install openpyxl
# Word, PowerPoint, Markdown support (included in langchain-community)
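Because the PDF and Excel parsers are optional, it can help to check which ones are importable before loading a batch of files. The following is a minimal sketch using only the standard library. The `FORMAT_DEPENDENCIES` mapping and the `missing_format_deps` helper are hypothetical names, not part of Chunkin, and the sketch assumes the import names `pypdf` and `openpyxl` match the packages installed above.

```python
import importlib.util

# Hypothetical mapping from file format to the module that parses it.
# Module names match the pip packages listed above.
FORMAT_DEPENDENCIES = {
    "pdf": "pypdf",
    "xlsx": "openpyxl",
}


def missing_format_deps(deps=None):
    """Return a sorted list of formats whose parser module is not importable."""
    deps = FORMAT_DEPENDENCIES if deps is None else deps
    # find_spec returns None when a top-level module cannot be found
    return sorted(
        fmt for fmt, module in deps.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_format_deps()
    if missing:
        print("Missing parsers for:", ", ".join(missing))
    else:
        print("PDF and Excel parsers are installed.")
```

Word, PowerPoint, and Markdown are left out of the mapping because, per the note above, they come with langchain-community.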
Optional Dependencies by Extra
# OpenAI + FAISS (recommended for quick start)
pip install "chunkin[core]"
# With semantic chunking (requires an embeddings provider such as OpenAI)
pip install "chunkin[semantic]"
pip install langchain-openai
# Local vector stores (Chroma, Milvus, LanceDB, etc.)
pip install "chunkin[local]"
# Cloud providers
pip install "chunkin[aws]" # OpenSearch, Valkey, DocumentDB
pip install "chunkin[azure]" # Azure AI Search, Cosmos DB
pip install "chunkin[gcp]" # BigQuery, Vertex AI, Databricks
# All vector stores
pip install "chunkin[all]"
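After installing, you can confirm which distributions are present in the active environment. This minimal sketch uses the standard library's `importlib.metadata`. The `installed_versions` helper is hypothetical. It reports distribution versions only: an extra such as `[aws]` is not recorded as a separate package, so this check cannot tell you which extras were requested.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in dist_names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names taken from the install commands above
    names = ["chunkin", "langchain-openai", "pypdf", "openpyxl"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```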
Environment Variables
Set the following variables when using cloud providers or semantic chunking:
# OpenAI (for semantic chunking)
export OPENAI_API_KEY="your-api-key"
# Azure AI Search
export AZURE_AI_SEARCH_API_KEY="your-key"
export AZURE_AI_SEARCH_ENDPOINT="your-endpoint"
# Pinecone
export PINECONE_API_KEY="your-key"
# Weaviate
export WEAVIATE_URL="your-url"
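Checking these variables up front gives a clearer error than a failure inside a provider client. The sketch below assumes each feature needs exactly the variables listed above. `REQUIRED_ENV` and `missing_env` are hypothetical names, and Chunkin or the underlying LangChain integrations may read additional settings not shown here.

```python
import os

# Hypothetical grouping of the variables listed above by feature.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "azure_ai_search": ["AZURE_AI_SEARCH_API_KEY", "AZURE_AI_SEARCH_ENDPOINT"],
    "pinecone": ["PINECONE_API_KEY"],
    "weaviate": ["WEAVIATE_URL"],
}


def missing_env(feature, environ=None):
    """Return the variables required by `feature` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[feature] if not environ.get(name)]


if __name__ == "__main__":
    for feature in REQUIRED_ENV:
        missing = missing_env(feature)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{feature}: {status}")
```

For example, `missing_env("azure_ai_search", {"AZURE_AI_SEARCH_API_KEY": "k"})` returns `["AZURE_AI_SEARCH_ENDPOINT"]`, because the endpoint is not set in the supplied mapping.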
LangChain Integration
Chunkin works directly with LangChain objects. For example, a LangChain embeddings model is passed straight to the indexer:
from chunkin import DocumentChunker
from chunkin_indexer import DocIndexer
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
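# OpenAIEmbeddings reads OPENAI_API_KEY from the environment
embeddings = OpenAIEmbeddings()
chunker = DocumentChunker()
# FAISS index built with the LangChain embeddings above (OpenAI + FAISS ship with "chunkin[core]")
indexer = DocIndexer(vector_store_type="faiss", embeddings=embeddings)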
For more information about LangChain:
- LangChain Documentation
- LangChain Text Splitters
- LangChain Vector Stores