
test_nlp_text_splitters.py — langchain Source File

Architecture documentation for test_nlp_text_splitters.py, a Python file in the langchain codebase. It has 6 imports and 0 dependents.

Type: File | Language: Python | Domain: LangChainCore | Subdomain: MessageInterface | 6 imports | 9 functions


Dependency Diagram

graph LR
  a159cbba_51f0_5d34_8696_299b594bb0fe["test_nlp_text_splitters.py"]
  b7996424_637b_0b54_6edf_2e18e9c1a8bf["re"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> b7996424_637b_0b54_6edf_2e18e9c1a8bf
  6b931deb_22b7_d48c_1e82_1ca024116ba7["nltk"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> 6b931deb_22b7_d48c_1e82_1ca024116ba7
  f69d6389_263d_68a4_7fbf_f14c0602a9ba["pytest"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> f69d6389_263d_68a4_7fbf_f14c0602a9ba
  6a98b0a5_5607_0043_2e22_a46a464c2d62["langchain_core.documents"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> 6a98b0a5_5607_0043_2e22_a46a464c2d62
  e763912b_9e7e_8cd9_29dd_62b413d3361c["langchain_text_splitters.nltk"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> e763912b_9e7e_8cd9_29dd_62b413d3361c
  0840293a_3f96_a163_de03_c2b3e3b176c5["langchain_text_splitters.spacy"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> 0840293a_3f96_a163_de03_c2b3e3b176c5
  style a159cbba_51f0_5d34_8696_299b594bb0fe fill:#6366f1,stroke:#818cf8,color:#fff


Source Code

"""Test text splitting functionality using NLTK and Spacy based sentence splitters."""

import re

import nltk
import pytest
from langchain_core.documents import Document

from langchain_text_splitters.nltk import NLTKTextSplitter
from langchain_text_splitters.spacy import SpacyTextSplitter


def setup_module() -> None:
    nltk.download("punkt_tab")


@pytest.fixture
def spacy() -> None:
    spacy = pytest.importorskip("spacy")

    # Check if en_core_web_sm model is available
    try:
        spacy.load("en_core_web_sm")
    except OSError:
        pytest.skip(
            "en_core_web_sm model not installed. Install with: "
            "uv add --group test_integration "
            "https://github.com/explosion/spacy-models/releases/download/"
            "en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl"
        )


def test_nltk_text_splitting_args() -> None:
    """Test invalid arguments."""
    with pytest.raises(
        ValueError,
        match=re.escape(
            "Got a larger chunk overlap (4) than chunk size (2), should be smaller."
        ),
    ):
        NLTKTextSplitter(chunk_size=2, chunk_overlap=4)


@pytest.mark.usefixtures("spacy")
def test_spacy_text_splitting_args() -> None:
    """Test invalid arguments."""
    with pytest.raises(
        ValueError,
        match=re.escape(
            "Got a larger chunk overlap (4) than chunk size (2), should be smaller."
        ),
    ):
        SpacyTextSplitter(chunk_size=2, chunk_overlap=4)


def test_nltk_text_splitter() -> None:
    """Test splitting by sentence using NLTK."""
    text = "This is sentence one. And this is sentence two."
    separator = "|||"
    splitter = NLTKTextSplitter(separator=separator)
# ... (64 more lines)
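Both `*_args` tests expect a `ValueError` whose message contains literal parentheses and ends with a period. `pytest.raises(match=...)` treats its argument as a regular expression and applies `re.search` to the exception text, so an unescaped `(4)` would be a capture group matching the bare character `4`; that is why the tests wrap the message in `re.escape`. The sketch below uses a hypothetical `check_chunk_params` helper that only reproduces the message format the tests expect (it is not LangChain's source) to show the difference.

```python
import re


def check_chunk_params(chunk_size: int, chunk_overlap: int) -> None:
    """Hypothetical validator mirroring the error message the tests expect."""
    if chunk_overlap > chunk_size:
        raise ValueError(
            f"Got a larger chunk overlap ({chunk_overlap}) than chunk size "
            f"({chunk_size}), should be smaller."
        )


def error_message(chunk_size: int, chunk_overlap: int) -> str:
    """Return the ValueError message, or an empty string if none is raised."""
    try:
        check_chunk_params(chunk_size, chunk_overlap)
    except ValueError as exc:
        return str(exc)
    return ""


expected = "Got a larger chunk overlap (4) than chunk size (2), should be smaller."
message = error_message(chunk_size=2, chunk_overlap=4)

# Escaped pattern: parentheses and the trailing period are matched literally.
print(re.search(re.escape(expected), message) is not None)  # True
# Unescaped pattern: "(4)" is a group matching "4", so the literal "(4)" fails.
print(re.search(expected, message) is not None)  # False
```

The same reasoning applies to any `match=` string taken verbatim from an error message: escape it unless you actually want regex semantics.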

Domain: LangChainCore

Subdomain: MessageInterface

Dependencies

  • langchain_core.documents
  • langchain_text_splitters.nltk
  • langchain_text_splitters.spacy
  • nltk
  • pytest
  • re
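`nltk`, and the `spacy` package that the `spacy` fixture loads through `pytest.importorskip`, are third-party packages that may be absent from a given environment. Below is a minimal sketch of probing for them without importing them, using the standard library's `importlib.util.find_spec`. The helper name `missing_modules` is hypothetical, and unlike `pytest.importorskip` it neither imports the module nor checks that the `en_core_web_sm` model is installed.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    # find_spec returns None (rather than raising) for a missing top-level name.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report which of the optional NLP backends are unavailable here.
    print(missing_modules(["re", "nltk", "spacy"]))
```

Probing with `find_spec` is cheap because it only locates the module; the spaCy model check still requires loading the package, as the fixture does.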

Frequently Asked Questions

What does test_nlp_text_splitters.py do?
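test_nlp_text_splitters.py is an integration test module in the langchain codebase, written in Python. It tests the NLTK- and spaCy-based sentence splitters, NLTKTextSplitter and SpacyTextSplitter from langchain_text_splitters, including their rejection of a chunk_overlap larger than chunk_size. It is classified under the LangChainCore domain, MessageInterface subdomain.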
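`test_nltk_text_splitter` passes two sentences and `separator="|||"` to `NLTKTextSplitter`. As a rough, dependency-free illustration of the "split into sentences, then rejoin with a separator" idea, the sketch below splits on terminal punctuation with a regular expression. It is a naive stand-in, not NLTK's Punkt tokenizer, and it will mis-split abbreviations such as "Dr." that Punkt is trained to handle. The name `naive_sentence_join` is hypothetical.

```python
import re


def naive_sentence_join(text: str, separator: str) -> str:
    """Split text at ., ! or ? followed by whitespace, then rejoin with separator."""
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return separator.join(sentences)


print(naive_sentence_join("This is sentence one. And this is sentence two.", "|||"))
# This is sentence one.|||And this is sentence two.
```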
What functions are defined in test_nlp_text_splitters.py?
test_nlp_text_splitters.py defines 9 functions: setup_module, spacy, test_nltk_text_splitter, test_nltk_text_splitter_args, test_nltk_text_splitter_with_add_start_index, test_nltk_text_splitting_args, test_spacy_text_splitter, test_spacy_text_splitter_strip_whitespace, test_spacy_text_splitting_args.
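One of the listed tests, `test_nltk_text_splitter_with_add_start_index`, concerns LangChain's `add_start_index` option, which records where each chunk begins in the source text under the resulting `Document`'s `metadata["start_index"]`. Its body is not shown in the excerpt above, so the following is only a minimal sketch of how such offsets can be derived for in-order, non-overlapping chunks by searching forward from the end of the previous match. `chunk_start_indices` is a hypothetical helper; LangChain's own implementation may differ, for example in how it handles chunk overlap.

```python
def chunk_start_indices(text: str, chunks: list[str]) -> list[int]:
    """Return the character offset of each chunk, searching left to right.

    Assumes chunks appear in order and do not overlap; str.find returns -1
    for a chunk that cannot be located.
    """
    indices: list[int] = []
    cursor = 0
    for chunk in chunks:
        index = text.find(chunk, cursor)
        indices.append(index)
        if index != -1:
            # Continue after this chunk so repeated text is not matched twice.
            cursor = index + len(chunk)
    return indices


print(chunk_start_indices("abc def abc", ["abc", "def", "abc"]))  # [0, 4, 8]
```

Searching from a moving cursor rather than from position 0 is what keeps the second `"abc"` from being reported at offset 0.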
What does test_nlp_text_splitters.py depend on?
test_nlp_text_splitters.py imports 6 modules: langchain_core.documents, langchain_text_splitters.nltk, langchain_text_splitters.spacy, nltk, pytest, re.
Where is test_nlp_text_splitters.py in the architecture?
test_nlp_text_splitters.py is located at libs/text-splitters/tests/integration_tests/test_nlp_text_splitters.py (domain: LangChainCore, subdomain: MessageInterface, directory: libs/text-splitters/tests/integration_tests).
