parent_document_retriever.py — langchain Source File

Architecture documentation for parent_document_retriever.py, a Python file in the langchain codebase. 6 imports, 0 dependents.

Dependency Diagram

graph LR
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb["parent_document_retriever.py"]
  8dfa0cac_d802_3ccd_f710_43a5e70da3a5["uuid"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> 8dfa0cac_d802_3ccd_f710_43a5e70da3a5
  cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7["collections.abc"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  c554676d_b731_47b2_a98f_c1c2d537c0aa["langchain_core.documents"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> c554676d_b731_47b2_a98f_c1c2d537c0aa
  5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb["langchain_text_splitters"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> 5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb
  c0f56003_5236_544a_87a7_b51b1dd44e68["langchain_classic.retrievers"]
  c81a3a7f_13c1_e4d6_0e2c_c63d603040cb --> c0f56003_5236_544a_87a7_b51b1dd44e68
  style c81a3a7f_13c1_e4d6_0e2c_c63d603040cb fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

import uuid
from collections.abc import Sequence
from typing import Any

from langchain_core.documents import Document
from langchain_text_splitters import TextSplitter

from langchain_classic.retrievers import MultiVectorRetriever


class ParentDocumentRetriever(MultiVectorRetriever):
    """Retrieve small chunks then retrieve their parent documents.

    When splitting documents for retrieval, there are often conflicting desires:

    1. You may want to have small documents, so that their embeddings can most
        accurately reflect their meaning. If too long, then the embeddings can
        lose meaning.
    2. You want to have long enough documents that the context of each chunk is
        retained.

    The ParentDocumentRetriever strikes that balance by splitting and storing
    small chunks of data. During retrieval, it first fetches the small chunks
    but then looks up the parent IDs for those chunks and returns those larger
    documents.

    Note that "parent document" refers to the document that a small chunk
    originated from. This can either be the whole raw document OR a larger
    chunk.

    Examples:
        ```python
        from langchain_chroma import Chroma
        from langchain_community.embeddings import OpenAIEmbeddings
        from langchain_text_splitters import RecursiveCharacterTextSplitter
        from langchain_classic.storage import InMemoryStore

        # This text splitter is used to create the parent documents
        parent_splitter = RecursiveCharacterTextSplitter(
            chunk_size=2000, add_start_index=True
        )
        # This text splitter is used to create the child documents
        # It should create documents smaller than the parent
        child_splitter = RecursiveCharacterTextSplitter(
            chunk_size=400, add_start_index=True
        )
        # The VectorStore to use to index the child chunks
        vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
        # The storage layer for the parent documents
        store = InMemoryStore()

        # Initialize the retriever
        retriever = ParentDocumentRetriever(
            vectorstore=vectorstore,
            docstore=store,
            child_splitter=child_splitter,
            parent_splitter=parent_splitter,
        )
        ```
    """
// ... (117 more lines)
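
The docstring above describes a two-step lookup: index small child chunks, then return the larger parents those chunks came from. The following is a minimal, dependency-free sketch of that idea, not the class's actual implementation (the remaining lines of the file are elided here). The helper names `split_text`, `index_documents`, and `retrieve_parents` are hypothetical, substring matching stands in for vector similarity search, a plain dict stands in for the docstore, and the use of `uuid.uuid4()` for parent IDs is an assumption suggested only by the file's `uuid` import.

```python
import uuid


def split_text(text: str, chunk_size: int) -> list[str]:
    """Naive fixed-width splitter (a stand-in for a real TextSplitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def index_documents(
    docs: list[str], parent_size: int, child_size: int
) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Split docs into parents, then children, recording each child's parent ID.

    Returns (docstore, child_index):
      docstore    maps parent ID -> parent text
      child_index lists (child text, parent ID) pairs
    """
    docstore: dict[str, str] = {}
    child_index: list[tuple[str, str]] = []
    for doc in docs:
        for parent in split_text(doc, parent_size):
            parent_id = str(uuid.uuid4())  # assumed ID scheme
            docstore[parent_id] = parent
            for child in split_text(parent, child_size):
                child_index.append((child, parent_id))
    return docstore, child_index


def retrieve_parents(
    query: str, docstore: dict[str, str], child_index: list[tuple[str, str]]
) -> list[str]:
    """Match child chunks, then return their unique parents in first-hit order."""
    parent_ids: list[str] = []
    for child, parent_id in child_index:
        # Substring match is a placeholder for embedding similarity.
        if query.lower() in child.lower() and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[pid] for pid in parent_ids]


doc = "a" * 10 + "needle" + "b" * 24
docstore, child_index = index_documents([doc], parent_size=20, child_size=5)
print(retrieve_parents("need", docstore, child_index))
```

In this toy run the query matches a single 5-character child chunk, yet the 20-character parent is what comes back. That is the trade-off the docstring describes: match on small units, answer with larger context.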

Dependencies

  • collections.abc
  • langchain_classic.retrievers
  • langchain_core.documents
  • langchain_text_splitters
  • typing
  • uuid

Frequently Asked Questions

What does parent_document_retriever.py do?
parent_document_retriever.py is a Python source file in the langchain codebase. It defines ParentDocumentRetriever, a MultiVectorRetriever subclass that searches over small chunks and returns the larger parent documents they came from. It belongs to the CoreAbstractions domain, RunnableInterface subdomain.

What does parent_document_retriever.py depend on?
parent_document_retriever.py imports 6 modules: collections.abc, langchain_classic.retrievers, langchain_core.documents, langchain_text_splitters, typing, uuid.

Where is parent_document_retriever.py in the architecture?
parent_document_retriever.py is located at libs/langchain/langchain_classic/retrievers/parent_document_retriever.py (domain: CoreAbstractions, subdomain: RunnableInterface, directory: libs/langchain/langchain_classic/retrievers).
