base.py — langchain Source File

Architecture documentation for base.py, a Python file in the langchain codebase. 8 imports, 0 dependents.

File | Python | DocumentProcessing | DataLoaders | 8 imports | 1 function | 3 classes

Dependency Diagram

graph LR
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2["base.py"]
  69e1d8cc_6173_dcd0_bfdf_2132d8e1ce56["contextlib"]
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 69e1d8cc_6173_dcd0_bfdf_2132d8e1ce56
  efe366da_f9ff_cd65_dcbd_a2abe0675906["mimetypes"]
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> efe366da_f9ff_cd65_dcbd_a2abe0675906
  4e334bc1_18d9_a6a4_18e5_7a3030396c51["io"]
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 4e334bc1_18d9_a6a4_18e5_7a3030396c51
  b6ee5de5_719a_eeb5_1e11_e9c63bc22ef8["pathlib"]
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> b6ee5de5_719a_eeb5_1e11_e9c63bc22ef8
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  6e58aaea_f08e_c099_3cc7_f9567bfb1ae7["pydantic"]
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 6e58aaea_f08e_c099_3cc7_f9567bfb1ae7
  30d1300e_92bb_90d4_ac5e_1afe56db09d2["langchain_core.load.serializable"]
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> 30d1300e_92bb_90d4_ac5e_1afe56db09d2
  cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7["collections.abc"]
  1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 --> cfe2bde5_180e_e3b0_df2b_55b3ebaca8e7
  style 1241bfcd_16b1_a16e_1fde_a4ccdbf83db2 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

"""Base classes for media and documents.

This module contains core abstractions for **data retrieval and processing workflows**:

- `BaseMedia`: Base class providing `id` and `metadata` fields
- `Blob`: Raw data loading (files, binary data) - used by document loaders
- `Document`: Text content for retrieval (RAG, vector stores, semantic search)

!!! note "Not for LLM chat messages"

    These classes are for data processing pipelines, not LLM I/O. For multimodal
    content in chat messages (images, audio in conversations), see
    `langchain.messages` content blocks instead.
"""

from __future__ import annotations

import contextlib
import mimetypes
from io import BufferedReader, BytesIO
from pathlib import Path, PurePath
from typing import TYPE_CHECKING, Any, Literal, cast

from pydantic import ConfigDict, Field, model_validator

from langchain_core.load.serializable import Serializable

if TYPE_CHECKING:
    from collections.abc import Generator

PathLike = str | PurePath


class BaseMedia(Serializable):
    """Base class for content used in retrieval and data processing workflows.

    Provides common fields for content that needs to be stored, indexed, or searched.

    !!! note

        For multimodal content in **chat messages** (images, audio sent to/from LLMs),
        use `langchain.messages` content blocks instead.
    """

    # The ID field is optional at the moment.
    # It will likely become required in a future major release after
    # it has been adopted by enough VectorStore implementations.
    id: str | None = Field(default=None, coerce_numbers_to_str=True)
    """An optional identifier for the document.

    Ideally this should be unique across the document collection and formatted
    as a UUID, but this will not be enforced.
    """

    metadata: dict = Field(default_factory=dict)
    """Arbitrary metadata associated with the content."""


class Blob(BaseMedia):
    """Raw data abstraction for document loading and file processing.
// ... (288 more lines)

Dependencies

  • collections.abc
  • contextlib
  • io
  • langchain_core.load.serializable
  • mimetypes
  • pathlib
  • pydantic
  • typing
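
In the source excerpt, `collections.abc` is imported only inside an `if TYPE_CHECKING:` block, so the import is seen by type checkers but is not executed at runtime. The sketch below illustrates that pattern with the standard library alone. The function `count_up` is hypothetical and does not appear in base.py, and the string annotation stands in for the effect that `from __future__ import annotations` has on every annotation in the real file.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by type checkers only; skipped when the module actually runs.
    from collections.abc import Generator


# Hypothetical helper, not part of base.py. The return annotation is a string,
# so Python never looks up the name `Generator` at runtime.
def count_up(limit: int) -> "Generator[int, None, None]":
    for value in range(limit):
        yield value


values = list(count_up(3))
```

Because the annotation is never evaluated, the module loads even though `Generator` is not bound at runtime.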

Frequently Asked Questions

What does base.py do?
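base.py is a Python source file in the langchain codebase. It defines the base classes for media and documents used in data retrieval and processing workflows: `BaseMedia`, `Blob`, and `Document`. It belongs to the DocumentProcessing domain, DataLoaders subdomain.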
What functions are defined in base.py?
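The analysis reports 1 function for base.py, listed as `collections`. That name matches the `collections.abc` import guarded by `TYPE_CHECKING` rather than a function defined in the excerpt shown above, so the entry is likely a parsing artifact.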
What does base.py depend on?
base.py imports 8 modules: collections.abc, contextlib, io, langchain_core.load.serializable, mimetypes, pathlib, pydantic, typing.
Where is base.py in the architecture?
base.py is located at libs/core/langchain_core/documents/base.py (domain: DocumentProcessing, subdomain: DataLoaders, directory: libs/core/langchain_core/documents).
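
The `id` and `metadata` fields in the source excerpt follow a common pattern: an optional identifier that accepts numbers and stores them as strings, plus a metadata dictionary built by a default factory so that instances never share one dict. The sketch below imitates that pattern with a standard-library dataclass. `MediaSketch` is a hypothetical stand-in and is not the langchain class, which is built on pydantic and `Serializable`; the number-to-string conversion only loosely mirrors `coerce_numbers_to_str=True`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MediaSketch:
    """Hypothetical stand-in for the `id`/`metadata` pair, not the real BaseMedia."""

    # Optional identifier; uniqueness and UUID formatting are not enforced.
    id: Optional[str] = None
    # default_factory builds a fresh dict for every instance.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: convert plain numbers to strings, loosely imitating
        # pydantic's coerce_numbers_to_str option from the excerpt.
        if isinstance(self.id, (int, float)) and not isinstance(self.id, bool):
            self.id = str(self.id)


first = MediaSketch(id=42, metadata={"source": "example.txt"})
second = MediaSketch()
second.metadata["page"] = 1  # mutates only this instance's dict
third = MediaSketch()
```

Using a factory instead of a literal `{}` default is what keeps the write to `second.metadata` from showing up on `third`.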