test_nlp_text_splitters.py — langchain Source File
Architecture documentation for test_nlp_text_splitters.py, a Python file in the langchain codebase with 6 imports and 0 dependents.
Dependency Diagram
graph LR
  a159cbba_51f0_5d34_8696_299b594bb0fe["test_nlp_text_splitters.py"]
  b7996424_637b_0b54_6edf_2e18e9c1a8bf["re"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> b7996424_637b_0b54_6edf_2e18e9c1a8bf
  6b931deb_22b7_d48c_1e82_1ca024116ba7["nltk"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> 6b931deb_22b7_d48c_1e82_1ca024116ba7
  f69d6389_263d_68a4_7fbf_f14c0602a9ba["pytest"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> f69d6389_263d_68a4_7fbf_f14c0602a9ba
  6a98b0a5_5607_0043_2e22_a46a464c2d62["langchain_core.documents"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> 6a98b0a5_5607_0043_2e22_a46a464c2d62
  e763912b_9e7e_8cd9_29dd_62b413d3361c["langchain_text_splitters.nltk"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> e763912b_9e7e_8cd9_29dd_62b413d3361c
  0840293a_3f96_a163_de03_c2b3e3b176c5["langchain_text_splitters.spacy"]
  a159cbba_51f0_5d34_8696_299b594bb0fe --> 0840293a_3f96_a163_de03_c2b3e3b176c5
  style a159cbba_51f0_5d34_8696_299b594bb0fe fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
"""Test text splitting functionality using NLTK and Spacy based sentence splitters."""
import re
import nltk
import pytest
from langchain_core.documents import Document
from langchain_text_splitters.nltk import NLTKTextSplitter
from langchain_text_splitters.spacy import SpacyTextSplitter
def setup_module() -> None:
nltk.download("punkt_tab")
@pytest.fixture
def spacy() -> None:
spacy = pytest.importorskip("spacy")
# Check if en_core_web_sm model is available
try:
spacy.load("en_core_web_sm")
except OSError:
pytest.skip(
"en_core_web_sm model not installed. Install with: "
"uv add --group test_integration "
"https://github.com/explosion/spacy-models/releases/download/"
"en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl"
)
def test_nltk_text_splitting_args() -> None:
"""Test invalid arguments."""
with pytest.raises(
ValueError,
match=re.escape(
"Got a larger chunk overlap (4) than chunk size (2), should be smaller."
),
):
NLTKTextSplitter(chunk_size=2, chunk_overlap=4)
@pytest.mark.usefixtures("spacy")
def test_spacy_text_splitting_args() -> None:
"""Test invalid arguments."""
with pytest.raises(
ValueError,
match=re.escape(
"Got a larger chunk overlap (4) than chunk size (2), should be smaller."
),
):
SpacyTextSplitter(chunk_size=2, chunk_overlap=4)
def test_nltk_text_splitter() -> None:
"""Test splitting by sentence using NLTK."""
text = "This is sentence one. And this is sentence two."
separator = "|||"
splitter = NLTKTextSplitter(separator=separator)
// ... (64 more lines)
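For context on what the truncated test checks, here is an illustrative sketch (not the file's elided lines) of how NLTKTextSplitter handles this text. The expected output is an assumption based on the default chunk_size, and running it requires the punkt_tab data that setup_module downloads:

# Illustrative sketch, not the file's elided lines. NLTKTextSplitter
# tokenizes the text into sentences with NLTK, then rejoins sentences
# that fit within chunk_size using the configured separator.
import nltk
from langchain_text_splitters.nltk import NLTKTextSplitter

nltk.download("punkt_tab")  # sentence tokenizer data, as in setup_module

text = "This is sentence one. And this is sentence two."
splitter = NLTKTextSplitter(separator="|||")
output = splitter.split_text(text)
# With the default chunk_size, both short sentences fit in one chunk,
# joined by the separator (assumed expectation).
print(output)  # e.g. ['This is sentence one.|||And this is sentence two.']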
Domain
- LangChainCore
Subdomains
- MessageInterface
Functions
- setup_module
- spacy
- test_nltk_text_splitter
- test_nltk_text_splitter_args
- test_nltk_text_splitter_with_add_start_index
- test_nltk_text_splitting_args
- test_spacy_text_splitter
- test_spacy_text_splitter_strip_whitespace
- test_spacy_text_splitting_args
Dependencies
- langchain_core.documents
- langchain_text_splitters.nltk
- langchain_text_splitters.spacy
- nltk
- pytest
- re
Source
- libs/text-splitters/tests/integration_tests/test_nlp_text_splitters.py
Frequently Asked Questions
What does test_nlp_text_splitters.py do?
test_nlp_text_splitters.py is an integration test file in the langchain codebase, written in Python. It tests sentence-based text splitting with the NLTK- and spaCy-backed splitters (NLTKTextSplitter and SpacyTextSplitter), including argument validation and expected split output. It belongs to the LangChainCore domain, MessageInterface subdomain.
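For a concrete sense of what the spaCy-side tests exercise, here is a minimal sketch of SpacyTextSplitter usage. It assumes the en_core_web_sm pipeline referenced in the fixture is installed; it is not code from the test file itself:

from langchain_text_splitters.spacy import SpacyTextSplitter

# Assumes the en_core_web_sm pipeline is installed, as the spacy fixture checks.
splitter = SpacyTextSplitter(separator="\n\n")  # "\n\n" is the default separator
chunks = splitter.split_text("This is sentence one. And this is sentence two.")
# spaCy segments the sentences; with the default chunk_size they are
# rejoined into a single chunk separated by the separator.
print(chunks)  # e.g. ['This is sentence one.\n\nAnd this is sentence two.']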
What functions are defined in test_nlp_text_splitters.py?
test_nlp_text_splitters.py defines 9 functions (including the spacy pytest fixture): setup_module, spacy, test_nltk_text_splitter, test_nltk_text_splitter_args, test_nltk_text_splitter_with_add_start_index, test_nltk_text_splitting_args, test_spacy_text_splitter, test_spacy_text_splitter_strip_whitespace, test_spacy_text_splitting_args.
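The Document import and the test_nltk_text_splitter_with_add_start_index name point at the start-index metadata feature of the splitters. The following is a hedged sketch of that pattern, not the file's actual assertions; it assumes the punkt_tab NLTK data downloaded in setup_module:

# Sketch of the add_start_index pattern; the real test's text and
# assertions may differ.
import nltk
from langchain_text_splitters.nltk import NLTKTextSplitter

nltk.download("punkt_tab")

# A separator matching the source spacing keeps each chunk findable in the
# original text, so start_index reflects a real character offset.
splitter = NLTKTextSplitter(separator=" ", add_start_index=True)
docs = splitter.create_documents(["This is sentence one. And this is sentence two."])
for doc in docs:
    # Each Document carries the character offset of its chunk in the source text.
    print(doc.metadata["start_index"], doc.page_content)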
What does test_nlp_text_splitters.py depend on?
test_nlp_text_splitters.py imports 6 modules: langchain_core.documents, langchain_text_splitters.nltk, langchain_text_splitters.spacy, nltk, pytest, re.
Where is test_nlp_text_splitters.py in the architecture?
test_nlp_text_splitters.py is located at libs/text-splitters/tests/integration_tests/test_nlp_text_splitters.py (domain: LangChainCore, subdomain: MessageInterface, directory: libs/text-splitters/tests/integration_tests).