sentence_transformers.py — langchain Source File

Architecture documentation for sentence_transformers.py, a Python file in the langchain codebase. 3 imports, 1 dependent.

File | Python | DocumentProcessing | TextSplitters | 3 imports | 1 dependent | 2 functions | 1 class

Dependency Diagram

graph LR
  7a1ee38d_b22f_3305_565e_328c5832dd13["sentence_transformers.py"]
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  7a1ee38d_b22f_3305_565e_328c5832dd13 --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  885a8262_5dd0_fc53_460c_b7a8de727b5e["langchain_text_splitters.base"]
  7a1ee38d_b22f_3305_565e_328c5832dd13 --> 885a8262_5dd0_fc53_460c_b7a8de727b5e
  ext_sentence_transformers["sentence_transformers (external package)"]
  7a1ee38d_b22f_3305_565e_328c5832dd13 --> ext_sentence_transformers
  style 7a1ee38d_b22f_3305_565e_328c5832dd13 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

"""Sentence transformers text splitter."""

from __future__ import annotations

from typing import Any, cast

from langchain_text_splitters.base import TextSplitter, Tokenizer, split_text_on_tokens

try:
    # Type ignores needed as long as sentence-transformers doesn't support Python 3.14.
    from sentence_transformers import (  # type: ignore[import-not-found, unused-ignore]
        SentenceTransformer,
    )

    _HAS_SENTENCE_TRANSFORMERS = True
except ImportError:
    _HAS_SENTENCE_TRANSFORMERS = False


class SentenceTransformersTokenTextSplitter(TextSplitter):
    """Splitting text to tokens using sentence model tokenizer."""

    def __init__(
        self,
        chunk_overlap: int = 50,
        model_name: str = "sentence-transformers/all-mpnet-base-v2",
        tokens_per_chunk: int | None = None,
        **kwargs: Any,
    ) -> None:
        """Create a new `TextSplitter`.

        Args:
            chunk_overlap: The number of tokens to overlap between chunks.
            model_name: The name of the sentence transformer model to use.
            tokens_per_chunk: The number of tokens per chunk.

                If `None`, uses the maximum tokens allowed by the model.

        Raises:
            ImportError: If the `sentence_transformers` package is not installed.
        """
        super().__init__(**kwargs, chunk_overlap=chunk_overlap)

        if not _HAS_SENTENCE_TRANSFORMERS:
            msg = (
                "Could not import sentence_transformers python package. "
                "This is needed in order to use SentenceTransformersTokenTextSplitter. "
                "Please install it with `pip install sentence-transformers`."
            )
            raise ImportError(msg)

        self.model_name = model_name
        self._model = SentenceTransformer(self.model_name)
        self.tokenizer = self._model.tokenizer
        self._initialize_chunk_configuration(tokens_per_chunk=tokens_per_chunk)

    def _initialize_chunk_configuration(self, *, tokens_per_chunk: int | None) -> None:
        self.maximum_tokens_per_chunk = self._model.max_seq_length

        if tokens_per_chunk is None:
// ... (64 more lines)
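
The listing above is truncated before the chunking logic, so what follows is only a minimal sketch of the general technique that the chunk_overlap and tokens_per_chunk parameters describe: a sliding window over token ids. It is not the elided code. The function names resolve_tokens_per_chunk and chunk_token_ids are hypothetical, token ids are plain integers rather than the output of a real tokenizer, and the argument validation is an assumption.

```python
from __future__ import annotations


def resolve_tokens_per_chunk(requested: int | None, model_max: int) -> int:
    """Fall back to the model's maximum when no chunk size is requested."""
    return model_max if requested is None else requested


def chunk_token_ids(
    token_ids: list[int], tokens_per_chunk: int, chunk_overlap: int
) -> list[list[int]]:
    """Split token ids into windows that share `chunk_overlap` ids with their neighbor."""
    # Validation rules here are an assumption for the sketch.
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= chunk_overlap < tokens_per_chunk:
        raise ValueError("chunk_overlap must be in [0, tokens_per_chunk)")

    # Each new window starts this many ids after the previous one.
    step = tokens_per_chunk - chunk_overlap
    chunks: list[list[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start : start + tokens_per_chunk])
        # Stop once the current window reaches the end of the input.
        if start + tokens_per_chunk >= len(token_ids):
            break
        start += step
    return chunks


if __name__ == "__main__":
    size = resolve_tokens_per_chunk(None, model_max=4)
    for chunk in chunk_token_ids(list(range(10)), size, chunk_overlap=1):
        print(chunk)
```

In a real splitter the integers would come from the model tokenizer's encode step and each window would be decoded back to text; the overlap keeps some context shared between neighboring chunks.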

Frequently Asked Questions

What does sentence_transformers.py do?
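sentence_transformers.py is a Python source file in the langchain codebase, in the DocumentProcessing domain, TextSplitters subdomain. It defines SentenceTransformersTokenTextSplitter, a TextSplitter subclass that splits text into token-based chunks using the tokenizer of a sentence-transformers model (default: sentence-transformers/all-mpnet-base-v2).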
What functions are defined in sentence_transformers.py?
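The analysis lists 2 function-level entities: _HAS_SENTENCE_TRANSFORMERS and sentence_transformers. In the source shown, _HAS_SENTENCE_TRANSFORMERS is a module-level boolean flag set by the guarded import, not a function. The file also defines 1 class, SentenceTransformersTokenTextSplitter, whose visible methods are __init__ and _initialize_chunk_configuration.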
What does sentence_transformers.py depend on?
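sentence_transformers.py imports 3 modules: langchain_text_splitters.base, sentence_transformers, and typing. The sentence_transformers entry is the external sentence-transformers package, imported inside a try/except ImportError guard, not this file itself.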
What files import sentence_transformers.py?
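sentence_transformers.py is listed as imported by 1 file: sentence_transformers.py itself. This self-reference most likely arises because the file shares its name with the external sentence_transformers package it imports; no other dependents are listed.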
Where is sentence_transformers.py in the architecture?
sentence_transformers.py is located at libs/text-splitters/langchain_text_splitters/sentence_transformers.py (domain: DocumentProcessing, subdomain: TextSplitters, directory: libs/text-splitters/langchain_text_splitters).
