Document Class — langchain Architecture

Architecture documentation for the Document class in base.py from the langchain codebase.

Dependency Diagram

graph TD
  38b769c4_c07d_3256_6870_9c6ee6931708["Document"]
  4de49bd6_375f_b3b1_f2be_3ee692248f19["BaseMedia"]
  38b769c4_c07d_3256_6870_9c6ee6931708 -->|extends| 4de49bd6_375f_b3b1_f2be_3ee692248f19
  cc001a5a_6b74_1bfc_b3e2_70eb3496588d["base.py"]
  38b769c4_c07d_3256_6870_9c6ee6931708 -->|defined in| cc001a5a_6b74_1bfc_b3e2_70eb3496588d
  cb75f349_06be_259f_a8ca_8ecaa856dbf8["__init__()"]
  38b769c4_c07d_3256_6870_9c6ee6931708 -->|method| cb75f349_06be_259f_a8ca_8ecaa856dbf8
  087d93f2_6ba6_49d2_6e83_48e11e65a751["is_lc_serializable()"]
  38b769c4_c07d_3256_6870_9c6ee6931708 -->|method| 087d93f2_6ba6_49d2_6e83_48e11e65a751
  e9826ab7_8d31_e2e8_fa42_c1d4c928cbde["get_lc_namespace()"]
  38b769c4_c07d_3256_6870_9c6ee6931708 -->|method| e9826ab7_8d31_e2e8_fa42_c1d4c928cbde
  bed318af_64a4_65fa_6c8d_36e32e483e9b["__str__()"]
  38b769c4_c07d_3256_6870_9c6ee6931708 -->|method| bed318af_64a4_65fa_6c8d_36e32e483e9b

Source Code

libs/core/langchain_core/documents/base.py lines 288–347

class Document(BaseMedia):
    """Class for storing a piece of text and associated metadata.

    !!! note

        `Document` is for **retrieval workflows**, not chat I/O. For sending text
        to an LLM in a conversation, use message types from `langchain.messages`.

    Example:
        ```python
        from langchain_core.documents import Document

        document = Document(
            page_content="Hello, world!", metadata={"source": "https://example.com"}
        )
        ```
    """

    page_content: str
    """String text."""

    type: Literal["Document"] = "Document"

    def __init__(self, page_content: str, **kwargs: Any) -> None:
        """Pass page_content in as positional or named arg."""
        # mypy is complaining that page_content is not defined on the base class.
        # Here, we're relying on pydantic base class to handle the validation.
        super().__init__(page_content=page_content, **kwargs)  # type: ignore[call-arg,unused-ignore]

    @classmethod
    def is_lc_serializable(cls) -> bool:
        """Return `True` as this class is serializable."""
        return True

    @classmethod
    def get_lc_namespace(cls) -> list[str]:
        """Get the namespace of the LangChain object.

        Returns:
            `["langchain", "schema", "document"]`
        """
        return ["langchain", "schema", "document"]

    def __str__(self) -> str:
        """Override `__str__` to restrict it to page_content and metadata.

        Returns:
            A string representation of the `Document`.
        """
        # The format matches pydantic format for __str__.
        #
        # The purpose of this change is to make sure that user code that feeds
        # Document objects directly into prompts remains unchanged due to the addition
        # of the id field (or any other fields in the future).
        #
        # This override will likely be removed in the future in favor of a more general
        # solution of formatting content directly inside the prompts.
        if self.metadata:
            return f"page_content='{self.page_content}' metadata={self.metadata}"
        return f"page_content='{self.page_content}'"
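
The effect of the `__str__` override can be seen without installing LangChain. The following is a minimal sketch, not the library's implementation: `DocumentSketch` is a hypothetical stand-in built on a standard-library dataclass, and the defaults chosen for `metadata` and `id` are assumptions made for illustration. Only the branching inside `__str__` is copied from the listing above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for Document; not the langchain_core class."""

    page_content: str
    # Assumed default: an empty dict when no metadata is supplied.
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assumed optional identifier; deliberately left out of __str__.
    id: Optional[str] = None

    def __str__(self) -> str:
        # Same branching as the listing: metadata is shown only when non-empty.
        if self.metadata:
            return f"page_content='{self.page_content}' metadata={self.metadata}"
        return f"page_content='{self.page_content}'"


plain = DocumentSketch("Hello, world!")
with_meta = DocumentSketch(
    "Hello, world!", metadata={"source": "https://example.com"}, id="doc-1"
)

plain_text = str(plain)
meta_text = str(with_meta)

print(plain_text)  # page_content='Hello, world!'
print(meta_text)   # page_content='Hello, world!' metadata={'source': 'https://example.com'}
```

Because `id` never appears in the string form, a prompt that interpolates the object directly produces the same text whether or not an identifier is set, which is the stability the comments in the listing describe.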

Frequently Asked Questions

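What is the Document class?
Document is a class in the langchain codebase that stores a piece of text (page_content) together with its associated metadata. It is intended for retrieval workflows rather than chat I/O, and it is defined in libs/core/langchain_core/documents/base.py.

Where is Document defined?
Document is defined in libs/core/langchain_core/documents/base.py, starting at line 288.

What does Document extend?
Document extends BaseMedia.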
