
test_text_splitters.py — langchain Source File

Architecture documentation for test_text_splitters.py, a Python test file in the langchain codebase. It has 18 imports and 0 dependents.

Type: File | Language: python | Domain: CoreAbstractions | Subdomain: Serialization | 18 imports | 114 functions

Entity Profile

Dependency Diagram

graph LR
  bb4ae43e_9a17_0c4f_3675_b19d54184616["test_text_splitters.py"]
  d1277855_b602_121e_7de9_23a45b72f1fe["random"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> d1277855_b602_121e_7de9_23a45b72f1fe
  67ec3255_645e_8b6e_1eff_1eb3c648ed95["re"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 67ec3255_645e_8b6e_1eff_1eb3c648ed95
  06ab3965_70ce_6e2c_feb9_564d849aa5f4["string"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 06ab3965_70ce_6e2c_feb9_564d849aa5f4
  243100a0_4629_4394_a66b_1f67b00ce784["textwrap"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 243100a0_4629_4394_a66b_1f67b00ce784
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  120e2591_3e15_b895_72b6_cb26195e40a6["pytest"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 120e2591_3e15_b895_72b6_cb26195e40a6
  b19a8b7e_fbee_95b1_65b8_509a1ed3cad7["langchain_core._api"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> b19a8b7e_fbee_95b1_65b8_509a1ed3cad7
  c554676d_b731_47b2_a98f_c1c2d537c0aa["langchain_core.documents"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> c554676d_b731_47b2_a98f_c1c2d537c0aa
  5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb["langchain_text_splitters"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb
  885a8262_5dd0_fc53_460c_b7a8de727b5e["langchain_text_splitters.base"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 885a8262_5dd0_fc53_460c_b7a8de727b5e
  26e26c06_c107_2778_a237_35607f5a6d20["langchain_text_splitters.character"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 26e26c06_c107_2778_a237_35607f5a6d20
  e39c01af_a371_ebc0_15ec_4d64e7690fd7["langchain_text_splitters.html"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> e39c01af_a371_ebc0_15ec_4d64e7690fd7
  81fc5591_1396_3ebf_74fb_2ed9134e7055["langchain_text_splitters.json"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 81fc5591_1396_3ebf_74fb_2ed9134e7055
  92e7d759_5033_4d96_9770_36e8afadf2b8["langchain_text_splitters.jsx"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 92e7d759_5033_4d96_9770_36e8afadf2b8
  style bb4ae43e_9a17_0c4f_3675_b19d54184616 fill:#6366f1,stroke:#818cf8,color:#fff


Source Code

"""Test text splitting functionality."""

from __future__ import annotations

import random
import re
import string
import textwrap
from typing import TYPE_CHECKING, Any

import pytest
from langchain_core._api import suppress_langchain_beta_warning
from langchain_core.documents import Document

from langchain_text_splitters import (
    Language,
    RecursiveCharacterTextSplitter,
    TextSplitter,
    Tokenizer,
)
from langchain_text_splitters.base import split_text_on_tokens
from langchain_text_splitters.character import CharacterTextSplitter
from langchain_text_splitters.html import (
    HTMLHeaderTextSplitter,
    HTMLSectionSplitter,
    HTMLSemanticPreservingSplitter,
)
from langchain_text_splitters.json import RecursiveJsonSplitter
from langchain_text_splitters.jsx import JSFrameworkTextSplitter
from langchain_text_splitters.markdown import (
    ExperimentalMarkdownSyntaxTextSplitter,
    MarkdownHeaderTextSplitter,
)
from langchain_text_splitters.python import PythonCodeTextSplitter

if TYPE_CHECKING:
    from collections.abc import Callable

    from bs4 import Tag

FAKE_PYTHON_TEXT = """
class Foo:

    def bar():


def foo():

def testing_func():

def bar():
"""


def test_character_text_splitter() -> None:
    """Test splitting by character count."""
    text = "foo bar baz 123"
    splitter = CharacterTextSplitter(separator=" ", chunk_size=7, chunk_overlap=3)
    output = splitter.split_text(text)
    expected_output = ["foo bar", "bar baz", "baz 123"]
# ... (4037 more lines)
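The expected output in test_character_text_splitter (["foo bar", "bar baz", "baz 123"] for chunk_size=7, chunk_overlap=3) follows from greedy packing with overlap. Each chunk is filled until the next word would push it past 7 characters. Then leading words are dropped until at most 3 characters remain, and those carry into the next chunk. The following is a minimal sketch of that merge step, assuming a simplified character-length metric. merge_splits is a hypothetical name for illustration and is not CharacterTextSplitter's actual code.

```python
from typing import List


def merge_splits(
    splits: List[str], separator: str, chunk_size: int, chunk_overlap: int
) -> List[str]:
    """Greedily pack pieces into chunks of at most chunk_size characters,
    carrying at most chunk_overlap characters of trailing pieces into the next chunk.
    A single piece longer than chunk_size is emitted as its own oversized chunk."""
    sep_len = len(separator)
    chunks: List[str] = []
    current: List[str] = []
    total = 0  # always equals len(separator.join(current))
    for piece in splits:
        if current and total + sep_len + len(piece) > chunk_size:
            # The incoming piece does not fit: close the current chunk.
            chunks.append(separator.join(current))
            # Trim from the front until what remains fits the overlap budget
            # and leaves room for the incoming piece.
            while current and (
                total > chunk_overlap or total + sep_len + len(piece) > chunk_size
            ):
                total -= len(current[0]) + (sep_len if len(current) > 1 else 0)
                current.pop(0)
        total += len(piece) + (sep_len if current else 0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "foo bar baz 123"
print(merge_splits(text.split(" "), " ", chunk_size=7, chunk_overlap=3))
# ['foo bar', 'bar baz', 'baz 123']
```

With chunk_overlap=0 the same input yields ["foo bar", "baz 123"], because nothing is carried forward after a chunk closes.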


Dependencies

  • bs4
  • collections.abc
  • langchain_core._api
  • langchain_core.documents
  • langchain_text_splitters
  • langchain_text_splitters.base
  • langchain_text_splitters.character
  • langchain_text_splitters.html
  • langchain_text_splitters.json
  • langchain_text_splitters.jsx
  • langchain_text_splitters.markdown
  • langchain_text_splitters.python
  • pytest
  • random
  • re
  • string
  • textwrap
  • typing
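Among these dependencies, langchain_text_splitters.base supplies Tokenizer and split_text_on_tokens, which the test module imports for token-based splitting. As a rough illustration only, the sketch below shows a generic token-window splitter. It encodes the text, takes windows of tokens_per_chunk tokens, advances each window by tokens_per_chunk minus chunk_overlap, and decodes each window back to text. WindowTokenizer, split_on_token_windows, and the one-token-per-character toy tokenizer are hypothetical stand-ins. They are not the library's Tokenizer or split_text_on_tokens.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class WindowTokenizer:
    """Hypothetical tokenizer config: window size, overlap, and encode/decode hooks."""

    chunk_overlap: int
    tokens_per_chunk: int
    decode: Callable[[List[int]], str]
    encode: Callable[[str], List[int]]


def split_on_token_windows(text: str, tok: WindowTokenizer) -> List[str]:
    """Split text into overlapping windows of token ids, decoding each window."""
    if tok.tokens_per_chunk <= tok.chunk_overlap:
        # A non-positive stride would never advance the window.
        raise ValueError("tokens_per_chunk must be greater than chunk_overlap")
    ids = tok.encode(text)
    stride = tok.tokens_per_chunk - tok.chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(ids):
        end = min(start + tok.tokens_per_chunk, len(ids))
        decoded = tok.decode(ids[start:end])
        if decoded:
            chunks.append(decoded)
        if end == len(ids):
            break  # the last window already reached the end of the input
        start += stride
    return chunks


# Toy tokenizer for illustration: one token per character (its code point).
char_tokenizer = WindowTokenizer(
    chunk_overlap=1,
    tokens_per_chunk=4,
    decode=lambda ids: "".join(chr(i) for i in ids),
    encode=lambda s: [ord(c) for c in s],
)

print(split_on_token_windows("abcdefgh", char_tokenizer))  # ['abcd', 'defg', 'gh']
```

Counting in tokens rather than characters means chunk limits track what a model actually consumes. The toy tokenizer only makes the window arithmetic easy to check by eye.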

Frequently Asked Questions

What does test_text_splitters.py do?
test_text_splitters.py is a Python unit test module in the langchain codebase that tests text splitting functionality. It belongs to the CoreAbstractions domain and the Serialization subdomain.
What functions are defined in test_text_splitters.py?
test_text_splitters.py defines 114 functions, including __test_iterative_text_splitter, collections, custom_iframe_extractor, html_header_splitter_splitter_factory, test_additional_html_header_text_splitter, test_character_text_splitter, test_character_text_splitter_chunk_size_effect, test_character_text_splitter_discard_regex_separator_on_merge, test_character_text_splitter_discard_separator_regex, test_character_text_splitter_empty_doc, and 104 more.
What does test_text_splitters.py depend on?
test_text_splitters.py imports 18 modules: bs4, collections.abc, langchain_core._api, langchain_core.documents, langchain_text_splitters, langchain_text_splitters.base, langchain_text_splitters.character, langchain_text_splitters.html, and 10 more.
Where is test_text_splitters.py in the architecture?
test_text_splitters.py is located at libs/text-splitters/tests/unit_tests/test_text_splitters.py (domain: CoreAbstractions, subdomain: Serialization, directory: libs/text-splitters/tests/unit_tests).
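Several of the tests listed above exercise splitting on a regular-expression separator and whether that separator is kept or discarded. Examples include test_character_text_splitter_discard_separator_regex and test_character_text_splitter_discard_regex_separator_on_merge. The sketch below illustrates that general idea with Python's re.split. split_with_regex and its keep parameter are hypothetical and are not the library's API. The exact keep and discard rules those tests check are not shown in this excerpt.

```python
import re
from typing import List, Literal, Optional


def split_with_regex(
    text: str, pattern: str, keep: Optional[Literal["start", "end"]] = None
) -> List[str]:
    """Split text on a regex separator.

    keep=None discards the separators; "start" prefixes each separator to the
    piece after it; "end" suffixes it to the piece before it. Assumes pattern
    has no capturing groups of its own.
    """
    if keep is None:
        parts = re.split(pattern, text)
    else:
        # Wrapping the pattern in a group makes re.split return the separators:
        # [piece0, sep1, piece1, sep2, piece2, ...]
        raw = re.split(f"({pattern})", text)
        pieces, seps = raw[0::2], raw[1::2]
        if keep == "start":
            parts = [pieces[0]] + [s + p for s, p in zip(seps, pieces[1:])]
        else:
            parts = [p + s for p, s in zip(pieces, seps)] + [pieces[-1]]
    # Adjacent or edge separators leave empty strings behind; drop them.
    return [p for p in parts if p]


print(split_with_regex("foo.bar.baz", r"\."))                # ['foo', 'bar', 'baz']
print(split_with_regex("foo.bar.baz", r"\.", keep="start"))  # ['foo', '.bar', '.baz']
print(split_with_regex("foo.bar.baz", r"\.", keep="end"))    # ['foo.', 'bar.', 'baz']
```

Discarding the separator yields cleaner chunks. Keeping it preserves punctuation or markup that downstream consumers may rely on when chunks are rejoined.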
