SpacyTextSplitter Class — langchain Architecture
Architecture documentation for the SpacyTextSplitter class in spacy.py from the langchain codebase.
Dependency Diagram
graph TD
    3be1f3ee_efc8_250d_4a34_cb435c2d70ac["SpacyTextSplitter"]
    de97494d_0459_2657_f593_c2b8bb6759e5["TextSplitter"]
    3be1f3ee_efc8_250d_4a34_cb435c2d70ac -->|extends| de97494d_0459_2657_f593_c2b8bb6759e5
    6511588c_fdc6_97a2_3753_2c61ff504a39["spacy.py"]
    3be1f3ee_efc8_250d_4a34_cb435c2d70ac -->|defined in| 6511588c_fdc6_97a2_3753_2c61ff504a39
    e0797988_1925_cf16_1939_4f7e0b8d7f57["__init__()"]
    3be1f3ee_efc8_250d_4a34_cb435c2d70ac -->|method| e0797988_1925_cf16_1939_4f7e0b8d7f57
    6c19aa8d_7891_ed83_3c65_d324a96d3e7e["split_text()"]
    3be1f3ee_efc8_250d_4a34_cb435c2d70ac -->|method| 6c19aa8d_7891_ed83_3c65_d324a96d3e7e
Source Code
libs/text-splitters/langchain_text_splitters/spacy.py lines 26–58
class SpacyTextSplitter(TextSplitter):
    """Splitting text using Spacy package.

    Per default, Spacy's `en_core_web_sm` model is used and
    its default max_length is 1000000 (it is the length of maximum character
    this model takes which can be increased for large files). For a faster, but
    potentially less accurate splitting, you can use `pipeline='sentencizer'`.
    """

    def __init__(
        self,
        separator: str = "\n\n",
        pipeline: str = "en_core_web_sm",
        max_length: int = 1_000_000,
        *,
        strip_whitespace: bool = True,
        **kwargs: Any,
    ) -> None:
        """Initialize the spacy text splitter."""
        super().__init__(**kwargs)
        self._tokenizer = _make_spacy_pipeline_for_splitting(
            pipeline, max_length=max_length
        )
        self._separator = separator
        self._strip_whitespace = strip_whitespace

    @override
    def split_text(self, text: str) -> list[str]:
        splits = (
            s.text if self._strip_whitespace else s.text_with_ws
            for s in self._tokenizer(text).sents
        )
        return self._merge_splits(splits, self._separator)
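The two stages of split_text() above (sentence segmentation, then merging sentences into chunks) can be illustrated with a dependency-free sketch. This is not the library's implementation: a regular expression stands in for the spaCy pipeline, and `merge_splits` is a simplified, hypothetical approximation of the inherited `_merge_splits` helper, which is not shown in this excerpt. The names `split_sentences`, `merge_splits`, and the `chunk_size` argument as used here are assumptions made for illustration only.

```python
import re
from typing import Iterable

# Hypothetical stand-in for a spaCy pipeline: a sentence ends at ".", "!" or "?"
# followed by whitespace or end of text. Trailing whitespace stays attached.
_SENTENCE_RE = re.compile(r"\S.*?(?:[.!?](?=\s|$)|$)\s*", flags=re.DOTALL)


def split_sentences(text: str, strip_whitespace: bool = True) -> list[str]:
    """Return sentences, stripped or with whitespace, loosely mirroring
    the `s.text` / `s.text_with_ws` choice in split_text()."""
    pieces = _SENTENCE_RE.findall(text)
    return [p.strip() for p in pieces] if strip_whitespace else pieces


def merge_splits(splits: Iterable[str], separator: str, chunk_size: int) -> list[str]:
    """Greedily join splits with `separator` into chunks of at most
    `chunk_size` characters. Simplified: no overlap between chunks, and a
    single split longer than `chunk_size` becomes its own oversized chunk."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in splits:
        # Characters this piece adds, including a separator if one is needed.
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            # The piece does not fit: close the current chunk and start a new one.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "First sentence. Second one! A third, longer sentence follows here."
sentences = split_sentences(text)
chunks = merge_splits(sentences, separator="\n\n", chunk_size=40)
for chunk in chunks:
    print(repr(chunk))
```

With a 40-character limit, the first two sentences fit in one chunk joined by the separator, and the third starts a new chunk. The real class delegates sentence detection to spaCy, so its boundaries will generally differ from this regex approximation.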
Frequently Asked Questions
What is the SpacyTextSplitter class?
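SpacyTextSplitter is a class in the langchain codebase that segments text into sentences with a spaCy pipeline (`en_core_web_sm` by default) and merges those sentences into chunks joined by a separator. It is defined in libs/text-splitters/langchain_text_splitters/spacy.py.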
Where is SpacyTextSplitter defined?
SpacyTextSplitter is defined in libs/text-splitters/langchain_text_splitters/spacy.py at line 26.
What does SpacyTextSplitter extend?
SpacyTextSplitter extends TextSplitter.