SpacyTextSplitter Class — langchain Architecture

Architecture documentation for the SpacyTextSplitter class in spacy.py from the langchain codebase.

Dependency Diagram

graph TD
  3be1f3ee_efc8_250d_4a34_cb435c2d70ac["SpacyTextSplitter"]
  de97494d_0459_2657_f593_c2b8bb6759e5["TextSplitter"]
  3be1f3ee_efc8_250d_4a34_cb435c2d70ac -->|extends| de97494d_0459_2657_f593_c2b8bb6759e5
  6511588c_fdc6_97a2_3753_2c61ff504a39["spacy.py"]
  3be1f3ee_efc8_250d_4a34_cb435c2d70ac -->|defined in| 6511588c_fdc6_97a2_3753_2c61ff504a39
  e0797988_1925_cf16_1939_4f7e0b8d7f57["__init__()"]
  3be1f3ee_efc8_250d_4a34_cb435c2d70ac -->|method| e0797988_1925_cf16_1939_4f7e0b8d7f57
  6c19aa8d_7891_ed83_3c65_d324a96d3e7e["split_text()"]
  3be1f3ee_efc8_250d_4a34_cb435c2d70ac -->|method| 6c19aa8d_7891_ed83_3c65_d324a96d3e7e
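
The diagram records two relationships. SpacyTextSplitter extends TextSplitter, and it defines its own `__init__()` and `split_text()`. The base class supplies chunk merging: `_merge_splits` is called in `split_text()` but not defined in SpacyTextSplitter. The subclass supplies the split points.

The following is a hypothetical, dependency-free sketch of that pattern, not langchain's code:

- `SimpleTextSplitter` stands in for TextSplitter.
- `RegexSentenceSplitter` stands in for SpacyTextSplitter.
- A naive regex replaces the spaCy pipeline.
- The merge is simplified. The real `_merge_splits` also handles `chunk_overlap` and a configurable length function; this version does neither.

```python
import re
from abc import ABC, abstractmethod
from collections.abc import Iterable
from typing import Any


class SimpleTextSplitter(ABC):
    """Hypothetical stand-in for TextSplitter: owns chunk sizing and merging."""

    def __init__(self, chunk_size: int = 100) -> None:
        self._chunk_size = chunk_size

    @abstractmethod
    def split_text(self, text: str) -> list[str]:
        """Subclasses decide where the natural boundaries in the text are."""

    def _merge_splits(self, splits: Iterable[str], separator: str) -> list[str]:
        # Greedily pack pieces into chunks of at most chunk_size characters.
        # No overlap handling; a single oversized piece becomes its own chunk.
        chunks: list[str] = []
        current: list[str] = []
        length = 0
        for piece in splits:
            extra = len(piece) + (len(separator) if current else 0)
            if current and length + extra > self._chunk_size:
                chunks.append(separator.join(current))
                current, length = [], 0
                extra = len(piece)
            current.append(piece)
            length += extra
        if current:
            chunks.append(separator.join(current))
        return chunks


class RegexSentenceSplitter(SimpleTextSplitter):
    """Hypothetical subclass: a naive regex plays the role of the spaCy pipeline."""

    def __init__(self, separator: str = "\n\n", **kwargs: Any) -> None:
        super().__init__(**kwargs)  # chunk_size is handled by the base class
        self._separator = separator

    def split_text(self, text: str) -> list[str]:
        # Treat ., ! or ? followed by whitespace as a sentence boundary.
        sentences = (s for s in re.split(r"(?<=[.!?])\s+", text) if s)
        return self._merge_splits(sentences, self._separator)


splitter = RegexSentenceSplitter(separator=" ", chunk_size=40)
print(splitter.split_text("One short line. Another one here. A third sentence!"))
# ['One short line. Another one here.', 'A third sentence!']
```

Boundary detection lives in the subclass, so any segmenter can reuse the same chunking logic. That is the design the `extends` edge in the diagram suggests.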

Source Code

libs/text-splitters/langchain_text_splitters/spacy.py lines 26–58

class SpacyTextSplitter(TextSplitter):
    """Split text using the spaCy package.

    By default, spaCy's `en_core_web_sm` model is used, and its default
    `max_length` is 1000000 (the maximum number of characters the model
    accepts, which can be increased for large files). For faster but
    potentially less accurate splitting, use `pipeline='sentencizer'`.
    """

    def __init__(
        self,
        separator: str = "\n\n",
        pipeline: str = "en_core_web_sm",
        max_length: int = 1_000_000,
        *,
        strip_whitespace: bool = True,
        **kwargs: Any,
    ) -> None:
        """Initialize the spacy text splitter."""
        super().__init__(**kwargs)
        self._tokenizer = _make_spacy_pipeline_for_splitting(
            pipeline, max_length=max_length
        )
        self._separator = separator
        self._strip_whitespace = strip_whitespace

    @override
    def split_text(self, text: str) -> list[str]:
        splits = (
            s.text if self._strip_whitespace else s.text_with_ws
            for s in self._tokenizer(text).sents
        )
        return self._merge_splits(splits, self._separator)
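
The generator in `split_text()` is where `strip_whitespace` takes effect:

- With the default `True`, each sentence contributes `Span.text`.
- With `False`, it contributes `Span.text_with_ws`. In spaCy this also carries the trailing whitespace character of the span's last token.

The following dependency-free sketch mimics that choice. `Sentence` and `naive_sents` are hypothetical stand-ins for spaCy's sentence spans and `doc.sents`. The regex only recognizes `.`, `!` and `?` followed by whitespace. It also attaches the whole whitespace gap to the preceding sentence, whereas spaCy keeps at most one trailing whitespace character per token.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    """Hypothetical stand-in for a spaCy sentence Span (only the fields split_text reads)."""

    text: str          # sentence without trailing whitespace
    text_with_ws: str  # sentence plus the whitespace gap that follows it


def naive_sents(text: str) -> list[Sentence]:
    # The capture group makes re.split return sentences and gaps interleaved:
    # [sentence, gap, sentence, gap, ..., sentence]
    parts = re.split(r"(?<=[.!?])(\s+)", text)
    sentences = parts[0::2]
    gaps = parts[1::2] + [""]  # the final piece has no gap after it
    return [Sentence(s, s + g) for s, g in zip(sentences, gaps) if s]


def sentence_texts(text: str, strip_whitespace: bool = True) -> list[str]:
    # Same choice as the generator in split_text, made before merging.
    return [s.text if strip_whitespace else s.text_with_ws for s in naive_sents(text)]


text = "Hi there.  How are you?\n"
print(sentence_texts(text))                          # ['Hi there.', 'How are you?']
print(sentence_texts(text, strip_whitespace=False))  # ['Hi there.  ', 'How are you?\n']
```

With whitespace kept, the pieces concatenate back to the exact input, which can matter when downstream code relies on the original formatting. Stripping yields cleaner pieces, and `_merge_splits` then joins them with `separator`.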

Extends

TextSplitter

Frequently Asked Questions

What is the SpacyTextSplitter class?
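SpacyTextSplitter is a TextSplitter subclass in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/spacy.py. It splits text into sentences with a spaCy pipeline (`en_core_web_sm` by default, or the faster `sentencizer`). It then merges those sentences into chunks, joined by `separator` (default `"\n\n"`), using the inherited `_merge_splits` method.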
Where is SpacyTextSplitter defined?
SpacyTextSplitter is defined in libs/text-splitters/langchain_text_splitters/spacy.py at line 26.
What does SpacyTextSplitter extend?
SpacyTextSplitter extends TextSplitter.
