parent_document_retriever.py — langchain Source File
Architecture documentation for parent_document_retriever.py, a Python file in the langchain codebase. 6 imports, 0 dependents.
Dependency Diagram
graph LR
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb["parent_document_retriever.py"]
  8dfa0cac_d802_3ccd_f710_43a5e70da3a5["uuid"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> 8dfa0cac_d802_3ccd_f710_43a5e70da3a5
  cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7["collections.abc"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  c554676d_b731_47b2_a98f_c1c2d537c0aa["langchain_core.documents"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> c554676d_b731_47b2_a98f_c1c2d537c0aa
  5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb["langchain_text_splitters"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> 5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb
  c0f56003_5236_544a_87a7_b51b1dd44e68["langchain_classic.retrievers"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> c0f56003_5236_544a_87a7_b51b1dd44e68
  style c81a3a7f_13c1_e4d6_0e2c_c63d603040cb fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
import uuid
from collections.abc import Sequence
from typing import Any

from langchain_core.documents import Document
from langchain_text_splitters import TextSplitter

from langchain_classic.retrievers import MultiVectorRetriever


class ParentDocumentRetriever(MultiVectorRetriever):
    """Retrieve small chunks then retrieve their parent documents.

    When splitting documents for retrieval, there are often conflicting desires:

    1. You may want to have small documents, so that their embeddings can most
        accurately reflect their meaning. If too long, then the embeddings can
        lose meaning.
    2. You want to have long enough documents that the context of each chunk is
        retained.

    The ParentDocumentRetriever strikes that balance by splitting and storing
    small chunks of data. During retrieval, it first fetches the small chunks
    but then looks up the parent IDs for those chunks and returns those larger
    documents.

    Note that "parent document" refers to the document that a small chunk
    originated from. This can either be the whole raw document OR a larger
    chunk.

    Examples:
        ```python
        from langchain_chroma import Chroma
        from langchain_community.embeddings import OpenAIEmbeddings
        from langchain_text_splitters import RecursiveCharacterTextSplitter
        from langchain_classic.storage import InMemoryStore

        # This text splitter is used to create the parent documents
        parent_splitter = RecursiveCharacterTextSplitter(
            chunk_size=2000, add_start_index=True
        )
        # This text splitter is used to create the child documents
        # It should create documents smaller than the parent
        child_splitter = RecursiveCharacterTextSplitter(
            chunk_size=400, add_start_index=True
        )
        # The VectorStore to use to index the child chunks
        vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
        # The storage layer for the parent documents
        store = InMemoryStore()

        # Initialize the retriever
        retriever = ParentDocumentRetriever(
            vectorstore=vectorstore,
            docstore=store,
            child_splitter=child_splitter,
            parent_splitter=parent_splitter,
        )
        ```
    """
// ... (117 more lines)
Domain
- CoreAbstractions
Subdomains
- RunnableInterface
Classes
- ParentDocumentRetriever
Dependencies
- collections.abc
- langchain_classic.retrievers
- langchain_core.documents
- langchain_text_splitters
- typing
- uuid
Frequently Asked Questions
What does parent_document_retriever.py do?
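parent_document_retriever.py is a Python source file in the langchain codebase. It defines ParentDocumentRetriever, a MultiVectorRetriever subclass that indexes small chunks for search and returns the larger parent documents those chunks came from. It belongs to the CoreAbstractions domain, RunnableInterface subdomain.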
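The class docstring describes a two-step lookup: search over small chunks, then return the parent documents those chunks came from. The following is a minimal sketch of that idea in plain Python, not the library's implementation. `ToyParentRetriever`, `split_words`, and the word-overlap scoring are hypothetical stand-ins; the real class relies on a vector store and embedding similarity.

```python
import uuid


def split_words(text: str, chunk_words: int) -> list[str]:
    """Naive word-count splitter standing in for a real TextSplitter."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]


class ToyParentRetriever:
    """Hypothetical stand-in: index small chunks, return their parents."""

    def __init__(self, chunk_words: int) -> None:
        self.chunk_words = chunk_words
        self.docstore: dict[str, str] = {}  # parent ID -> parent text
        self.child_index: list[tuple[str, str]] = []  # (child text, parent ID)

    def add_documents(self, parents: list[str]) -> None:
        for parent in parents:
            parent_id = str(uuid.uuid4())
            self.docstore[parent_id] = parent
            # Every child chunk remembers which parent it was cut from.
            for child in split_words(parent, self.chunk_words):
                self.child_index.append((child, parent_id))

    def retrieve(self, query: str, k: int = 4) -> list[str]:
        terms = set(query.lower().split())
        # Assumption: rank children by word overlap instead of embeddings.
        ranked = sorted(
            self.child_index,
            key=lambda item: len(terms & set(item[0].lower().split())),
            reverse=True,
        )
        # Map the top-k children back to parents, dropping duplicates.
        parent_ids: list[str] = []
        for _, parent_id in ranked[:k]:
            if parent_id not in parent_ids:
                parent_ids.append(parent_id)
        return [self.docstore[pid] for pid in parent_ids]


parents = [
    "Chroma stores embeddings for similarity search. It runs locally.",
    "Parent documents keep surrounding context. Small chunks embed precisely.",
]
retriever = ToyParentRetriever(chunk_words=4)
retriever.add_documents(parents)
results = retriever.retrieve("small chunks", k=1)
print(results)
```

The query matches only a four-word child chunk, yet the caller receives the full parent text. Several matching children from the same parent collapse into a single result, which is why the parent IDs are deduplicated before the docstore lookup.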
What does parent_document_retriever.py depend on?
parent_document_retriever.py imports 6 modules: collections.abc, langchain_classic.retrievers, langchain_core.documents, langchain_text_splitters, typing, uuid.
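An import list like this one can be reproduced from the file's import statements with the standard-library `ast` module. The sketch below is one possible approach and makes no claim about how this page's figures were actually computed; `imported_modules` is a hypothetical helper, and `SOURCE` copies only the six import lines shown in the listing.

```python
import ast

# The import statements from the top of the source listing.
SOURCE = """\
import uuid
from collections.abc import Sequence
from typing import Any
from langchain_core.documents import Document
from langchain_text_splitters import TextSplitter
from langchain_classic.retrievers import MultiVectorRetriever
"""


def imported_modules(source: str) -> list[str]:
    """Return the sorted, de-duplicated module names a source string imports."""
    modules: set[str] = set()
    # ast.parse only builds a syntax tree; nothing is actually imported.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Bare relative imports ("from . import x") have no module name
            # and are skipped in this sketch.
            modules.add(node.module)
    return sorted(modules)


modules = imported_modules(SOURCE)
print(len(modules), modules)
```

Because the source is parsed rather than executed, the third-party packages do not need to be installed for the analysis to run.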
Where is parent_document_retriever.py in the architecture?
parent_document_retriever.py is located at libs/langchain/langchain_classic/retrievers/parent_document_retriever.py (domain: CoreAbstractions, subdomain: RunnableInterface, directory: libs/langchain/langchain_classic/retrievers).