Document Class — langchain Architecture
Architecture documentation for the Document class in base.py from the langchain codebase.
Dependency Diagram
graph TD
    38b769c4_c07d_3256_6870_9c6ee6931708["Document"]
    4de49bd6_375f_b3b1_f2be_3ee692248f19["BaseMedia"]
    38b769c4_c07d_3256_6870_9c6ee6931708 -->|extends| 4de49bd6_375f_b3b1_f2be_3ee692248f19
    cc001a5a_6b74_1bfc_b3e2_70eb3496588d["base.py"]
    38b769c4_c07d_3256_6870_9c6ee6931708 -->|defined in| cc001a5a_6b74_1bfc_b3e2_70eb3496588d
    cb75f349_06be_259f_a8ca_8ecaa856dbf8["__init__()"]
    38b769c4_c07d_3256_6870_9c6ee6931708 -->|method| cb75f349_06be_259f_a8ca_8ecaa856dbf8
    087d93f2_6ba6_49d2_6e83_48e11e65a751["is_lc_serializable()"]
    38b769c4_c07d_3256_6870_9c6ee6931708 -->|method| 087d93f2_6ba6_49d2_6e83_48e11e65a751
    e9826ab7_8d31_e2e8_fa42_c1d4c928cbde["get_lc_namespace()"]
    38b769c4_c07d_3256_6870_9c6ee6931708 -->|method| e9826ab7_8d31_e2e8_fa42_c1d4c928cbde
    bed318af_64a4_65fa_6c8d_36e32e483e9b["__str__()"]
    38b769c4_c07d_3256_6870_9c6ee6931708 -->|method| bed318af_64a4_65fa_6c8d_36e32e483e9b
Source Code
libs/core/langchain_core/documents/base.py lines 288–347
class Document(BaseMedia):
    """Class for storing a piece of text and associated metadata.

    !!! note

        `Document` is for **retrieval workflows**, not chat I/O. For sending text
        to an LLM in a conversation, use message types from `langchain.messages`.

    Example:
        ```python
        from langchain_core.documents import Document

        document = Document(
            page_content="Hello, world!", metadata={"source": "https://example.com"}
        )
        ```
    """

    page_content: str
    """String text."""

    type: Literal["Document"] = "Document"

    def __init__(self, page_content: str, **kwargs: Any) -> None:
        """Pass page_content in as positional or named arg."""
        # my-py is complaining that page_content is not defined on the base class.
        # Here, we're relying on pydantic base class to handle the validation.
        super().__init__(page_content=page_content, **kwargs)  # type: ignore[call-arg,unused-ignore]

    @classmethod
    def is_lc_serializable(cls) -> bool:
        """Return `True` as this class is serializable."""
        return True

    @classmethod
    def get_lc_namespace(cls) -> list[str]:
        """Get the namespace of the LangChain object.

        Returns:
            `["langchain", "schema", "document"]`
        """
        return ["langchain", "schema", "document"]

    def __str__(self) -> str:
        """Override `__str__` to restrict it to page_content and metadata.

        Returns:
            A string representation of the `Document`.
        """
        # The format matches pydantic format for __str__.
        #
        # The purpose of this change is to make sure that user code that feeds
        # Document objects directly into prompts remains unchanged due to the addition
        # of the id field (or any other fields in the future).
        #
        # This override will likely be removed in the future in favor of a more general
        # solution of formatting content directly inside the prompts.
        if self.metadata:
            return f"page_content='{self.page_content}' metadata={self.metadata}"
        return f"page_content='{self.page_content}'"
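The effect of the `__str__` override can be illustrated without installing anything. The sketch below is a stdlib-only stand-in, not the real class: `DocumentSketch` is a hypothetical name, and the defaults for `metadata` and `id` are assumptions, since those fields are declared on `BaseMedia`, which this listing does not show. Only the formatting logic is copied from the source above.

```python
from typing import Any, Optional


class DocumentSketch:
    """Hypothetical stand-in that mirrors the shape of Document (not the real class)."""

    def __init__(self, page_content: str, **kwargs: Any) -> None:
        # page_content may be passed positionally or by name, as in the listing.
        unexpected = set(kwargs) - {"metadata", "id"}
        if unexpected:
            raise TypeError(f"unexpected arguments: {sorted(unexpected)}")
        self.page_content = page_content
        # Assumption: metadata defaults to an empty dict and id to None.
        self.metadata: dict = dict(kwargs.get("metadata") or {})
        self.id: Optional[str] = kwargs.get("id")

    def __str__(self) -> str:
        # Same branching as the source: metadata appears only when non-empty,
        # and other fields such as id are never included.
        if self.metadata:
            return f"page_content='{self.page_content}' metadata={self.metadata}"
        return f"page_content='{self.page_content}'"


with_meta = DocumentSketch(
    "Hello, world!", metadata={"source": "https://example.com"}, id="doc-1"
)
without_meta = DocumentSketch(page_content="Hello, world!")

print(with_meta)     # page_content='Hello, world!' metadata={'source': 'https://example.com'}
print(without_meta)  # page_content='Hello, world!'
```

Because `id` never reaches the string form, a prompt template that interpolates the object directly produces the same text whether or not an `id` is set, which is the compatibility goal described in the source comments.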
Defined In: libs/core/langchain_core/documents/base.py
Extends: BaseMedia
Frequently Asked Questions
What is the Document class?
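Document is a class for storing a piece of text (page_content) together with its associated metadata. It is intended for retrieval workflows rather than chat I/O, and is defined in libs/core/langchain_core/documents/base.py.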
Where is Document defined?
Document is defined in libs/core/langchain_core/documents/base.py at line 288.
What does Document extend?
Document extends BaseMedia.
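These answers can be checked against a local installation. The sketch below assumes `langchain_core` may or may not be installed, so it returns `None` when the import fails. `inspect_document_class` is a hypothetical helper name introduced here for illustration. The exact line number is not checked, since it can differ between releases.

```python
import importlib
from typing import Optional


def inspect_document_class() -> Optional[dict]:
    """Return basic facts about Document, or None if langchain_core is unavailable."""
    try:
        module = importlib.import_module("langchain_core.documents.base")
    except ImportError:
        # Package not installed in this environment; nothing to inspect.
        return None
    document_cls = module.Document
    return {
        "module": document_cls.__module__,
        "bases": [base.__name__ for base in document_cls.__bases__],
        "namespace": document_cls.get_lc_namespace(),
        "serializable": document_cls.is_lc_serializable(),
    }


facts = inspect_document_class()
print(facts)
```

When the package is present, the `bases` and `namespace` entries should match the class listing above.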