sentence_transformers.py — langchain Source File
Architecture documentation for sentence_transformers.py, a Python file in the langchain codebase. 3 imports, 1 dependent.
Dependency Diagram
graph LR
  7a1ee38d_b22f_3305_565e_328c5832dd13["sentence_transformers.py"]
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  7a1ee38d_b22f_3305_565e_328c5832dd13 --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  885a8262_5dd0_fc53_460c_b7a8de727b5e["langchain_text_splitters.base"]
  7a1ee38d_b22f_3305_565e_328c5832dd13 --> 885a8262_5dd0_fc53_460c_b7a8de727b5e
  7a1ee38d_b22f_3305_565e_328c5832dd13 --> 7a1ee38d_b22f_3305_565e_328c5832dd13
  style 7a1ee38d_b22f_3305_565e_328c5832dd13 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
"""Sentence transformers text splitter."""
from __future__ import annotations

from typing import Any, cast

from langchain_text_splitters.base import TextSplitter, Tokenizer, split_text_on_tokens

try:
    # Type ignores needed as long as sentence-transformers doesn't support Python 3.14.
    from sentence_transformers import (  # type: ignore[import-not-found, unused-ignore]
        SentenceTransformer,
    )

    _HAS_SENTENCE_TRANSFORMERS = True
except ImportError:
    _HAS_SENTENCE_TRANSFORMERS = False


class SentenceTransformersTokenTextSplitter(TextSplitter):
    """Splitting text to tokens using sentence model tokenizer."""

    def __init__(
        self,
        chunk_overlap: int = 50,
        model_name: str = "sentence-transformers/all-mpnet-base-v2",
        tokens_per_chunk: int | None = None,
        **kwargs: Any,
    ) -> None:
        """Create a new `TextSplitter`.

        Args:
            chunk_overlap: The number of tokens to overlap between chunks.
            model_name: The name of the sentence transformer model to use.
            tokens_per_chunk: The number of tokens per chunk.
                If `None`, uses the maximum tokens allowed by the model.

        Raises:
            ImportError: If the `sentence_transformers` package is not installed.
        """
        super().__init__(**kwargs, chunk_overlap=chunk_overlap)

        if not _HAS_SENTENCE_TRANSFORMERS:
            msg = (
                "Could not import sentence_transformers python package. "
                "This is needed in order to use SentenceTransformersTokenTextSplitter. "
                "Please install it with `pip install sentence-transformers`."
            )
            raise ImportError(msg)

        self.model_name = model_name
        self._model = SentenceTransformer(self.model_name)
        self.tokenizer = self._model.tokenizer
        self._initialize_chunk_configuration(tokens_per_chunk=tokens_per_chunk)

    def _initialize_chunk_configuration(self, *, tokens_per_chunk: int | None) -> None:
        self.maximum_tokens_per_chunk = self._model.max_seq_length

        if tokens_per_chunk is None:
// ... (64 more lines)
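The constructor above takes chunk_overlap and tokens_per_chunk, and the rest of the file is elided here. As a rough illustration of what token chunking with overlap means, the sketch below slices a token list into overlapping windows. It is not the library's split_text_on_tokens implementation: the function name split_on_token_windows and the whitespace "tokenizer" are assumptions made for this example only.

```python
from __future__ import annotations


def split_on_token_windows(
    tokens: list[str], tokens_per_chunk: int, chunk_overlap: int
) -> list[list[str]]:
    """Slice a token list into windows that share `chunk_overlap` tokens.

    Hypothetical helper for illustration; not part of langchain_text_splitters.
    """
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= chunk_overlap < tokens_per_chunk:
        raise ValueError("chunk_overlap must be in [0, tokens_per_chunk)")

    # Each new window starts this many tokens after the previous one.
    step = tokens_per_chunk - chunk_overlap
    chunks: list[list[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start : start + tokens_per_chunk])
        # Stop once the current window already reaches the end of the input.
        if start + tokens_per_chunk >= len(tokens):
            break
        start += step
    return chunks


# Toy input: whitespace splitting stands in for a real model tokenizer.
toy_tokens = "the quick brown fox jumps over the lazy dog".split()
windows = split_on_token_windows(toy_tokens, tokens_per_chunk=4, chunk_overlap=1)
for window in windows:
    print(" ".join(window))
```

With a window of 4 and an overlap of 1, each chunk begins with the last token of the previous chunk, which is the effect the chunk_overlap argument describes.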
Domain
- DocumentProcessing
Subdomains
- TextSplitters
Dependencies
- langchain_text_splitters.base
- sentence_transformers (third-party package, optional at import time)
- typing
Frequently Asked Questions
What does sentence_transformers.py do?
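sentence_transformers.py is a Python source file in the langchain codebase. It defines SentenceTransformersTokenTextSplitter, a TextSplitter subclass that splits text into token chunks using the tokenizer of a sentence-transformers model. It belongs to the DocumentProcessing domain, TextSplitters subdomain.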
What functions are defined in sentence_transformers.py?
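The excerpt shown defines one class, SentenceTransformersTokenTextSplitter, with the methods __init__ and _initialize_chunk_configuration; the remaining 64 lines are elided. The two names the analysis reports are not functions: _HAS_SENTENCE_TRANSFORMERS is a module-level boolean flag recording whether the optional package could be imported, and sentence_transformers is the name of that imported package.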
What does sentence_transformers.py depend on?
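sentence_transformers.py imports 3 module(s): langchain_text_splitters.base, typing, and the third-party sentence_transformers package. The last is optional at import time: the import is wrapped in try/except ImportError, and the error is raised only when SentenceTransformersTokenTextSplitter is constructed.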
What files import sentence_transformers.py?
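The analysis lists 1 dependent, sentence_transformers.py itself. This self-reference most likely comes from the import of the sentence_transformers package being resolved to this same-named file, so it should not be read as a real importer.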
Where is sentence_transformers.py in the architecture?
sentence_transformers.py is located at libs/text-splitters/langchain_text_splitters/sentence_transformers.py (domain: DocumentProcessing, subdomain: TextSplitters, directory: libs/text-splitters/langchain_text_splitters).