base.py — langchain Source File
Architecture documentation for base.py, a Python file in the langchain codebase: 8 imports, 0 dependents.
Entity Profile
Dependency Diagram
graph LR
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2["base.py"]
    69e1d8cc_6173_dcd0_bfdf_2132d8e1ce56["contextlib"]
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 69e1d8cc_6173_dcd0_bfdf_2132d8e1ce56
    efe366da_f9ff_cd65_dcbd_a2abe0675906["mimetypes"]
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> efe366da_f9ff_cd65_dcbd_a2abe0675906
    4e334bc1_18d9_a6a4_18e5_7a3030396c51["io"]
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 4e334bc1_18d9_a6a4_18e5_7a3030396c51
    b6ee5de5_719a_eeb5_1e11_e9c63bc22ef8["pathlib"]
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> b6ee5de5_719a_eeb5_1e11_e9c63bc22ef8
    8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
    6e58aaea_f08e_c099_3cc7_f9567bfb1ae7["pydantic"]
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 6e58aaea_f08e_c099_3cc7_f9567bfb1ae7
    30d1300e_92bb_90d4_ac5e_1afe56db09d2["langchain_core.load.serializable"]
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 30d1300e_92bb_90d4_ac5e_1afe56db09d2
    cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7["collections.abc"]
    1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7
    style 1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
"""Base classes for media and documents.

This module contains core abstractions for **data retrieval and processing workflows**:

- `BaseMedia`: Base class providing `id` and `metadata` fields
- `Blob`: Raw data loading (files, binary data) - used by document loaders
- `Document`: Text content for retrieval (RAG, vector stores, semantic search)

!!! note "Not for LLM chat messages"

    These classes are for data processing pipelines, not LLM I/O. For multimodal
    content in chat messages (images, audio in conversations), see
    `langchain.messages` content blocks instead.

"""

from __future__ import annotations

import contextlib
import mimetypes
from io import BufferedReader, BytesIO
from pathlib import Path, PurePath
from typing import TYPE_CHECKING, Any, Literal, cast

from pydantic import ConfigDict, Field, model_validator

from langchain_core.load.serializable import Serializable

if TYPE_CHECKING:
    from collections.abc import Generator

PathLike = str | PurePath


class BaseMedia(Serializable):
    """Base class for content used in retrieval and data processing workflows.

    Provides common fields for content that needs to be stored, indexed, or searched.

    !!! note

        For multimodal content in **chat messages** (images, audio sent to/from LLMs),
        use `langchain.messages` content blocks instead.

    """

    # The ID field is optional at the moment.
    # It will likely become required in a future major release after
    # it has been adopted by enough VectorStore implementations.
    id: str | None = Field(default=None, coerce_numbers_to_str=True)
    """An optional identifier for the document.

    Ideally this should be unique across the document collection and formatted
    as a UUID, but this will not be enforced.

    """

    metadata: dict = Field(default_factory=dict)
    """Arbitrary metadata associated with the content."""


class Blob(BaseMedia):
    """Raw data abstraction for document loading and file processing.
// ... (288 more lines)
Dependencies
- collections.abc
- contextlib
- io
- langchain_core.load.serializable
- mimetypes
- pathlib
- pydantic
- typing
Frequently Asked Questions
What does base.py do?
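base.py defines the base classes for media and documents used in langchain's data retrieval and processing workflows:

- `BaseMedia`, which provides the shared `id` and `metadata` fields
- `Blob`, for raw data loading by document loaders
- `Document`, for text content used in retrieval

It is written in Python and belongs to the DocumentProcessing domain, DataLoaders subdomain.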
What functions are defined in base.py?
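The visible excerpt defines no module-level functions. It defines the classes `BaseMedia` and `Blob` and the type alias `PathLike`, and the module docstring also lists `Document`. The single reported function, `collections`, appears to be an analyzer artifact of the `from collections.abc import Generator` import rather than a function defined in the file.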
What does base.py depend on?
base.py imports 8 modules: collections.abc, contextlib, io, langchain_core.load.serializable, mimetypes, pathlib, pydantic, typing. Of these, collections.abc is imported only under `TYPE_CHECKING`, so it is not loaded at runtime.
Where is base.py in the architecture?
base.py is located at libs/core/langchain_core/documents/base.py (domain: DocumentProcessing, subdomain: DataLoaders, directory: libs/core/langchain_core/documents).