base.py — langchain Source File

Architecture documentation for base.py, a Python file in the langchain codebase (domain: DataProcessing, subdomain: TextSplitters). It has 11 imports, 6 functions (as counted by the index), 4 classes, and 0 dependents.

Dependency Diagram

graph LR
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd["base.py"]
  43a94385_8919_6471_1068_d923b3a3c65f["copy"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 43a94385_8919_6471_1068_d923b3a3c65f
  e27da29f_a1f7_49f3_84d5_6be4cb4125c8["logging"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> e27da29f_a1f7_49f3_84d5_6be4cb4125c8
  50e20440_a135_6be3_a5a5_67791be5a2a6["abc"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 50e20440_a135_6be3_a5a5_67791be5a2a6
  cd5f8820_9b2e_4495_abb7_d76026ac826c["dataclasses"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> cd5f8820_9b2e_4495_abb7_d76026ac826c
  7ec08df6_88bd_07ab_d50f_0d4c4e429b7e["enum"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 7ec08df6_88bd_07ab_d50f_0d4c4e429b7e
  feec1ec4_6917_867b_d228_b134d0ff8099["typing"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> feec1ec4_6917_867b_d228_b134d0ff8099
  6a98b0a5_5607_0043_2e22_a46a464c2d62["langchain_core.documents"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 6a98b0a5_5607_0043_2e22_a46a464c2d62
  f85fae70_1011_eaec_151c_4083140ae9e5["typing_extensions"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> f85fae70_1011_eaec_151c_4083140ae9e5
  2bf6d401_816d_d011_3b05_a6114f55ff58["collections.abc"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 2bf6d401_816d_d011_3b05_a6114f55ff58
  48f5485f_680a_97b7_bfc7_aff0508d4ca0["tiktoken"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 48f5485f_680a_97b7_bfc7_aff0508d4ca0
  aec2dd6d_79fc_15ce_6af8_b63e259b6ffa["transformers.tokenization_utils_base"]
  3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> aec2dd6d_79fc_15ce_6af8_b63e259b6ffa
  style 3a8d6f7e_811d_79c5_19bf_f9486dc287bd fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

"""Text splitter base interface."""

from __future__ import annotations

import copy
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import (
    TYPE_CHECKING,
    Any,
    Literal,
    TypeVar,
)

from langchain_core.documents import BaseDocumentTransformer, Document
from typing_extensions import Self, override

if TYPE_CHECKING:
    from collections.abc import Callable, Collection, Iterable, Sequence
    from collections.abc import Set as AbstractSet


try:
    import tiktoken

    _HAS_TIKTOKEN = True
except ImportError:
    _HAS_TIKTOKEN = False

try:
    from transformers.tokenization_utils_base import PreTrainedTokenizerBase

    _HAS_TRANSFORMERS = True
except ImportError:
    _HAS_TRANSFORMERS = False

logger = logging.getLogger(__name__)

TS = TypeVar("TS", bound="TextSplitter")


class TextSplitter(BaseDocumentTransformer, ABC):
    """Interface for splitting text into chunks."""

    def __init__(
        self,
        chunk_size: int = 4000,
        chunk_overlap: int = 200,
        length_function: Callable[[str], int] = len,
        keep_separator: bool | Literal["start", "end"] = False,  # noqa: FBT001,FBT002
        add_start_index: bool = False,  # noqa: FBT001,FBT002
        strip_whitespace: bool = True,  # noqa: FBT001,FBT002
    ) -> None:
        """Create a new `TextSplitter`.

        Args:
            chunk_size: Maximum size of chunks to return
            chunk_overlap: Overlap in characters between chunks
// ... (391 more lines)
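
The constructor accepts `keep_separator: bool | Literal["start", "end"]`, but the truncated excerpt does not show how the option is applied. As a hedged illustration of what the three settings plausibly mean, here is a stdlib-only sketch. `split_keeping_separator` is a hypothetical helper, not the langchain API, and treating `True` the same as `"start"` is an assumption of this sketch.

```python
from __future__ import annotations

import re
from typing import Literal


def split_keeping_separator(
    text: str,
    separator: str,
    keep_separator: bool | Literal["start", "end"] = False,
) -> list[str]:
    """Hypothetical helper: split on a literal separator, optionally re-attaching it."""
    if not separator:
        raise ValueError("separator must be a non-empty string")
    pattern = re.escape(separator)
    if not keep_separator:
        # Plain split: the separator is dropped entirely.
        parts = re.split(pattern, text)
    else:
        # A capturing group makes re.split return [text, sep, text, sep, ..., text].
        pieces = re.split(f"({pattern})", text)
        if keep_separator == "end":
            # Glue each separator onto the piece before it.
            parts = [pieces[i] + pieces[i + 1] for i in range(0, len(pieces) - 1, 2)]
            parts.append(pieces[-1])
        else:
            # True or "start" (assumed equivalent here): glue each separator onto the next piece.
            parts = [pieces[0]]
            parts += [pieces[i] + pieces[i + 1] for i in range(1, len(pieces), 2)]
    # Drop empty strings produced by leading, trailing, or repeated separators.
    return [p for p in parts if p]


print(split_keeping_separator("a\n\nb\n\nc", "\n\n", keep_separator="end"))
# ['a\n\n', 'b\n\n', 'c']
```

Keeping the separator matters when chunks are later rejoined or displayed: paragraph breaks survive inside the chunk text instead of being silently discarded.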

Dependencies

  • abc
  • collections.abc
  • copy
  • dataclasses
  • enum
  • langchain_core.documents
  • logging
  • tiktoken
  • transformers.tokenization_utils_base
  • typing
  • typing_extensions

Frequently Asked Questions

What does base.py do?
base.py is a source file in the langchain codebase, written in Python. It belongs to the DataProcessing domain, TextSplitters subdomain.
What functions are defined in base.py?
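The index lists 6 "functions": _HAS_TIKTOKEN, _HAS_TRANSFORMERS, collections, split_text_on_tokens, tiktoken, transformers. Of these, only split_text_on_tokens is a function; _HAS_TIKTOKEN and _HAS_TRANSFORMERS are module-level boolean flags set by the optional imports, and collections, tiktoken, and transformers are imported modules.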
What does base.py depend on?
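base.py imports 11 modules: abc, collections.abc, copy, dataclasses, enum, langchain_core.documents, logging, tiktoken, transformers.tokenization_utils_base, typing, and typing_extensions. tiktoken and transformers are imported optionally, and the collections.abc imports are used only for type checking.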
Where is base.py in the architecture?
base.py is located at libs/text-splitters/langchain_text_splitters/base.py (domain: DataProcessing, subdomain: TextSplitters, directory: libs/text-splitters/langchain_text_splitters).
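
The body of split_text_on_tokens is not part of the excerpt above. As a hedged sketch of the general technique its name suggests (encode the text to token IDs once, slide a window of `tokens_per_chunk` IDs forward by `tokens_per_chunk - chunk_overlap`, decode each window), here is a stdlib-only version. `ToyTokenizer` and `split_on_token_windows` are hypothetical names, and the character-level "tokenizer" stands in for a real one such as tiktoken.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTokenizer:
    """Hypothetical config: window size, overlap, and encode/decode callables."""

    tokens_per_chunk: int
    chunk_overlap: int
    encode: Callable[[str], list[int]]
    decode: Callable[[list[int]], str]


def split_on_token_windows(text: str, tokenizer: ToyTokenizer) -> list[str]:
    """Encode once, slide a fixed-size window over the IDs, and decode each window."""
    if tokenizer.tokens_per_chunk <= tokenizer.chunk_overlap:
        raise ValueError("tokens_per_chunk must exceed chunk_overlap")
    ids = tokenizer.encode(text)
    step = tokenizer.tokens_per_chunk - tokenizer.chunk_overlap  # window advance
    chunks: list[str] = []
    start = 0
    while start < len(ids):
        end = min(start + tokenizer.tokens_per_chunk, len(ids))
        chunks.append(tokenizer.decode(ids[start:end]))
        if end == len(ids):  # window reached the end; stop to avoid redundant tail chunks
            break
        start += step
    return chunks


# Character-level stand-in: each Unicode code point is one "token".
char_tokenizer = ToyTokenizer(
    tokens_per_chunk=4,
    chunk_overlap=1,
    encode=lambda s: [ord(ch) for ch in s],
    decode=lambda ids: "".join(chr(i) for i in ids),
)
print(split_on_token_windows("abcdefghij", char_tokenizer))
# ['abcd', 'defg', 'ghij']
```

Measuring size in tokens rather than characters keeps each chunk within a model's token budget, which is the main reason to pair a splitter with a tokenizer.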
