base.py — langchain Source File
Architecture documentation for base.py, a Python file in the langchain codebase. 11 imports, 0 dependents.
Dependency Diagram
graph LR
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd["base.py"]
    43a94385_8919_6471_1068_d923b3a3c65f["copy"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 43a94385_8919_6471_1068_d923b3a3c65f
    e27da29f_a1f7_49f3_84d5_6be4cb4125c8["logging"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> e27da29f_a1f7_49f3_84d5_6be4cb4125c8
    50e20440_a135_6be3_a5a5_67791be5a2a6["abc"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 50e20440_a135_6be3_a5a5_67791be5a2a6
    cd5f8820_9b2e_4495_abb7_d76026ac826c["dataclasses"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> cd5f8820_9b2e_4495_abb7_d76026ac826c
    7ec08df6_88bd_07ab_d50f_0d4c4e429b7e["enum"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 7ec08df6_88bd_07ab_d50f_0d4c4e429b7e
    feec1ec4_6917_867b_d228_b134d0ff8099["typing"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> feec1ec4_6917_867b_d228_b134d0ff8099
    6a98b0a5_5607_0043_2e22_a46a464c2d62["langchain_core.documents"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 6a98b0a5_5607_0043_2e22_a46a464c2d62
    f85fae70_1011_eaec_151c_4083140ae9e5["typing_extensions"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> f85fae70_1011_eaec_151c_4083140ae9e5
    2bf6d401_816d_d011_3b05_a6114f55ff58["collections.abc"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 2bf6d401_816d_d011_3b05_a6114f55ff58
    48f5485f_680a_97b7_bfc7_aff0508d4ca0["tiktoken"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> 48f5485f_680a_97b7_bfc7_aff0508d4ca0
    aec2dd6d_79fc_15ce_6af8_b63e259b6ffa["transformers.tokenization_utils_base"]
    3a8d6f7e_811d_79c5_19bf_f9486dc287bd --> aec2dd6d_79fc_15ce_6af8_b63e259b6ffa
    style 3a8d6f7e_811d_79c5_19bf_f9486dc287bd fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
"""Text splitter base interface."""

from __future__ import annotations

import copy
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import (
    TYPE_CHECKING,
    Any,
    Literal,
    TypeVar,
)

from langchain_core.documents import BaseDocumentTransformer, Document
from typing_extensions import Self, override

if TYPE_CHECKING:
    from collections.abc import Callable, Collection, Iterable, Sequence
    from collections.abc import Set as AbstractSet

try:
    import tiktoken

    _HAS_TIKTOKEN = True
except ImportError:
    _HAS_TIKTOKEN = False

try:
    from transformers.tokenization_utils_base import PreTrainedTokenizerBase

    _HAS_TRANSFORMERS = True
except ImportError:
    _HAS_TRANSFORMERS = False

logger = logging.getLogger(__name__)

TS = TypeVar("TS", bound="TextSplitter")


class TextSplitter(BaseDocumentTransformer, ABC):
    """Interface for splitting text into chunks."""

    def __init__(
        self,
        chunk_size: int = 4000,
        chunk_overlap: int = 200,
        length_function: Callable[[str], int] = len,
        keep_separator: bool | Literal["start", "end"] = False,  # noqa: FBT001,FBT002
        add_start_index: bool = False,  # noqa: FBT001,FBT002
        strip_whitespace: bool = True,  # noqa: FBT001,FBT002
    ) -> None:
        """Create a new `TextSplitter`.

        Args:
            chunk_size: Maximum size of chunks to return
            chunk_overlap: Overlap in characters between chunks
// ... (391 more lines)
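To make the `chunk_size` and `chunk_overlap` arguments concrete, here is a minimal sketch of overlapping fixed-size windows over a string. It is an illustration only: the excerpt above elides the actual splitting logic, so this is not the library's algorithm, and `sliding_chunks` is a hypothetical helper name that does not appear in base.py.

```python
from __future__ import annotations


def sliding_chunks(
    text: str,
    chunk_size: int = 4000,
    chunk_overlap: int = 200,
) -> list[str]:
    """Hypothetical helper: cut `text` into windows of at most `chunk_size`
    characters, where each window repeats the last `chunk_overlap`
    characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be > 0")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    # Each new window starts `step` characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

With the defaults shown in the constructor (4000 and 200), consecutive windows in this sketch would share 200 characters, which is the usual motivation for an overlap: context at a chunk boundary is not lost entirely.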
Domain
DataProcessing
Subdomains
TextSplitters
Dependencies
- abc
- collections.abc
- copy
- dataclasses
- enum
- langchain_core.documents
- logging
- tiktoken
- transformers.tokenization_utils_base
- typing
- typing_extensions
Frequently Asked Questions
What does base.py do?
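base.py is a Python source file in the langchain codebase that defines the text splitter base interface, including the abstract TextSplitter class ("Interface for splitting text into chunks"). It belongs to the DataProcessing domain, TextSplitters subdomain.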
What functions are defined in base.py?
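The analyzer lists 6 module-level symbols for base.py: _HAS_TIKTOKEN, _HAS_TRANSFORMERS, collections, split_text_on_tokens, tiktoken, transformers. Of these, only split_text_on_tokens is a function. _HAS_TIKTOKEN and _HAS_TRANSFORMERS are boolean flags recording whether the optional imports succeeded, and collections, tiktoken, and transformers are imported package names.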
What does base.py depend on?
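base.py imports 11 modules: abc, collections.abc, copy, dataclasses, enum, langchain_core.documents, logging, tiktoken, transformers.tokenization_utils_base, typing, and typing_extensions. Of these, tiktoken and transformers.tokenization_utils_base are optional: each import is wrapped in try/except ImportError, and collections.abc is imported only under TYPE_CHECKING.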
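The optional-dependency handling seen in the source excerpt (a try/except ImportError that sets a flag such as _HAS_TIKTOKEN) can be sketched in a generic, self-contained form. The helpers `try_import` and `require` below are hypothetical names introduced for illustration and are not part of base.py; the sketch uses the standard library's `importlib.import_module` so that it runs without any third-party package installed.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def try_import(module_name: str) -> tuple[ModuleType | None, bool]:
    """Hypothetical helper: return (module, True) if the import succeeds,
    otherwise (None, False), mirroring the flag pattern in the excerpt."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        return None, False


def require(available: bool, package: str) -> None:
    """Hypothetical helper: fail with an actionable message at the point
    where the optional package is actually needed, not at import time."""
    if not available:
        msg = f"Could not import {package}. Install it with `pip install {package}`."
        raise ImportError(msg)


# Module-level flags, analogous to _HAS_TIKTOKEN in the excerpt.
_json, _HAS_JSON = try_import("json")  # stdlib, always present
_missing, _HAS_MISSING = try_import("surely_missing_module_xyz_123")

require(_HAS_JSON, "json")  # passes silently
try:
    require(_HAS_MISSING, "surely_missing_module_xyz_123")
except ImportError as exc:
    print(exc)
```

The design point is that importing the module never fails just because an optional package is absent; the error is deferred until a code path that needs the package is reached.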
Where is base.py in the architecture?
base.py is located at libs/text-splitters/langchain_text_splitters/base.py (domain: DataProcessing, subdomain: TextSplitters, directory: libs/text-splitters/langchain_text_splitters).