test_text_splitters.py — langchain Source File
Architecture documentation for test_text_splitters.py, a Python file in the langchain codebase. 18 imports, 0 dependents.
Dependency Diagram
graph LR
  bb4ae43e_9a17_0c4f_3675_b19d54184616["test_text_splitters.py"]
  d1277855_b602_121e_7de9_23a45b72f1fe["random"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> d1277855_b602_121e_7de9_23a45b72f1fe
  67ec3255_645e_8b6e_1eff_1eb3c648ed95["re"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 67ec3255_645e_8b6e_1eff_1eb3c648ed95
  06ab3965_70ce_6e2c_feb9_564d849aa5f4["string"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 06ab3965_70ce_6e2c_feb9_564d849aa5f4
  243100a0_4629_4394_a66b_1f67b00ce784["textwrap"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 243100a0_4629_4394_a66b_1f67b00ce784
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  120e2591_3e15_b895_72b6_cb26195e40a6["pytest"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 120e2591_3e15_b895_72b6_cb26195e40a6
  b19a8b7e_fbee_95b1_65b8_509a1ed3cad7["langchain_core._api"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> b19a8b7e_fbee_95b1_65b8_509a1ed3cad7
  c554676d_b731_47b2_a98f_c1c2d537c0aa["langchain_core.documents"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> c554676d_b731_47b2_a98f_c1c2d537c0aa
  5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb["langchain_text_splitters"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb
  885a8262_5dd0_fc53_460c_b7a8de727b5e["langchain_text_splitters.base"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 885a8262_5dd0_fc53_460c_b7a8de727b5e
  26e26c06_c107_2778_a237_35607f5a6d20["langchain_text_splitters.character"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 26e26c06_c107_2778_a237_35607f5a6d20
  e39c01af_a371_ebc0_15ec_4d64e7690fd7["langchain_text_splitters.html"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> e39c01af_a371_ebc0_15ec_4d64e7690fd7
  81fc5591_1396_3ebf_74fb_2ed9134e7055["langchain_text_splitters.json"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 81fc5591_1396_3ebf_74fb_2ed9134e7055
  92e7d759_5033_4d96_9770_36e8afadf2b8["langchain_text_splitters.jsx"]
  bb4ae43e_9a17_0c4f_3675_b19d54184616 --> 92e7d759_5033_4d96_9770_36e8afadf2b8
  style bb4ae43e_9a17_0c4f_3675_b19d54184616 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
"""Test text splitting functionality."""
from __future__ import annotations
import random
import re
import string
import textwrap
from typing import TYPE_CHECKING, Any
import pytest
from langchain_core._api import suppress_langchain_beta_warning
from langchain_core.documents import Document
from langchain_text_splitters import (
    Language,
    RecursiveCharacterTextSplitter,
    TextSplitter,
    Tokenizer,
)
from langchain_text_splitters.base import split_text_on_tokens
from langchain_text_splitters.character import CharacterTextSplitter
from langchain_text_splitters.html import (
    HTMLHeaderTextSplitter,
    HTMLSectionSplitter,
    HTMLSemanticPreservingSplitter,
)
from langchain_text_splitters.json import RecursiveJsonSplitter
from langchain_text_splitters.jsx import JSFrameworkTextSplitter
from langchain_text_splitters.markdown import (
    ExperimentalMarkdownSyntaxTextSplitter,
    MarkdownHeaderTextSplitter,
)
from langchain_text_splitters.python import PythonCodeTextSplitter

if TYPE_CHECKING:
    from collections.abc import Callable

    from bs4 import Tag
FAKE_PYTHON_TEXT = """
class Foo:
def bar():
def foo():
def testing_func():
def bar():
"""
def test_character_text_splitter() -> None:
    """Test splitting by character count."""
    text = "foo bar baz 123"
    splitter = CharacterTextSplitter(separator=" ", chunk_size=7, chunk_overlap=3)
    output = splitter.split_text(text)
    expected_output = ["foo bar", "bar baz", "baz 123"]
# ... (4037 more lines)
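The expected output in `test_character_text_splitter` can be reproduced by hand with a greedy merge: pieces are packed into a chunk until adding the next one would exceed `chunk_size`, then leading pieces are dropped until the carried-over tail is no longer than `chunk_overlap`. The following is a minimal, standard-library-only sketch of that idea, not the `CharacterTextSplitter` implementation; the function name `merge_with_overlap` is hypothetical, and it assumes a non-empty, literal (non-regex) separator.

```python
def merge_with_overlap(
    text: str, separator: str, chunk_size: int, chunk_overlap: int
) -> list[str]:
    """Pack separator-delimited pieces into overlapping chunks.

    Hypothetical helper for illustration only. Assumes `separator` is a
    non-empty literal string.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    window: list[str] = []  # pieces that make up the chunk being built

    for piece in pieces:
        # Emit the current chunk if the next piece would not fit.
        if window and len(separator.join(window + [piece])) > chunk_size:
            chunks.append(separator.join(window))
            # Drop leading pieces until the carried-over tail is within
            # chunk_overlap and leaves room for the new piece.
            while window and (
                len(separator.join(window)) > chunk_overlap
                or len(separator.join(window + [piece])) > chunk_size
            ):
                window.pop(0)
        window.append(piece)

    if window:
        chunks.append(separator.join(window))
    return chunks


print(merge_with_overlap("foo bar baz 123", " ", chunk_size=7, chunk_overlap=3))
# → ['foo bar', 'bar baz', 'baz 123']
```

With `chunk_overlap=0` the same input yields `['foo bar', 'baz 123']`, which shows that the repeated `bar` and `baz` in the test's expected output come from the overlap setting alone.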
Domain
CoreAbstractions
Subdomains
Serialization
Functions
- __test_iterative_text_splitter()
- collections()
- custom_iframe_extractor()
- html_header_splitter_splitter_factory()
- test_additional_html_header_text_splitter()
- test_character_text_splitter()
- test_character_text_splitter_chunk_size_effect()
- test_character_text_splitter_discard_regex_separator_on_merge()
- test_character_text_splitter_discard_separator_regex()
- test_character_text_splitter_empty_doc()
- test_character_text_splitter_empty_input()
- test_character_text_splitter_handle_chunksize_equal_to_chunkoverlap()
- test_character_text_splitter_keep_separator_regex()
- test_character_text_splitter_keep_separator_regex_end()
- test_character_text_splitter_keep_separator_regex_start()
- test_character_text_splitter_long()
- test_character_text_splitter_longer_words()
- test_character_text_splitter_no_separator_in_text()
- test_character_text_splitter_separtor_empty_doc()
- test_character_text_splitter_short_words_first()
- test_character_text_splitter_whitespace_only()
- test_character_text_splitting_args()
- test_cobol_code_splitter()
- test_cpp_code_splitter()
- test_create_documents()
- test_create_documents_with_metadata()
- test_create_documents_with_start_index()
- test_csharp_code_splitter()
- test_decode_returns_no_chunks()
- test_experimental_markdown_syntax_text_splitter()
- test_experimental_markdown_syntax_text_splitter_header_config_on_multi_files()
- test_experimental_markdown_syntax_text_splitter_header_configuration()
- test_experimental_markdown_syntax_text_splitter_on_multi_files()
- test_experimental_markdown_syntax_text_splitter_split_lines()
- test_experimental_markdown_syntax_text_splitter_split_lines_on_multi_files()
- test_experimental_markdown_syntax_text_splitter_with_header_on_multi_files()
- test_experimental_markdown_syntax_text_splitter_with_headers()
- test_golang_code_splitter()
- test_happy_path_splitting_based_on_header_with_font_size()
- test_happy_path_splitting_based_on_header_with_whitespace_chars()
- test_happy_path_splitting_with_duplicate_header_tag()
- test_haskell_code_splitter()
- test_html_code_splitter()
- test_html_header_text_splitter()
- test_html_no_headers_with_multiple_splitters()
- test_html_splitter_keep_separator_default()
- test_html_splitter_keep_separator_end()
- test_html_splitter_keep_separator_false()
- test_html_splitter_keep_separator_start()
- test_html_splitter_keep_separator_true()
- test_html_splitter_preserve_nested_in_paragraph()
- test_html_splitter_preserved_elements_reverse_order()
- test_html_splitter_replacement_order()
- test_html_splitter_with_allowlist_tags()
- test_html_splitter_with_custom_extractor()
- test_html_splitter_with_denylist_tags()
- test_html_splitter_with_external_metadata()
- test_html_splitter_with_href_links()
- test_html_splitter_with_media_preservation()
- test_html_splitter_with_mixed_preserve_and_filter()
- test_html_splitter_with_nested_div_preserved()
- test_html_splitter_with_nested_elements()
- test_html_splitter_with_nested_preserved_elements()
- test_html_splitter_with_no_further_splits()
- test_html_splitter_with_no_headers()
- test_html_splitter_with_preserved_elements()
- test_html_splitter_with_small_chunk_size()
- test_html_splitter_with_text_normalization()
- test_iterative_text_splitter()
- test_iterative_text_splitter_discard_separator()
- test_iterative_text_splitter_keep_separator()
- test_java_code_splitter()
- test_javascript_code_splitter()
- test_jsx_text_splitter()
- test_kotlin_code_splitter()
- test_latex_code_splitter()
- test_lua_code_splitter()
- test_markdown_code_splitter()
- test_md_header_text_splitter_1()
- test_md_header_text_splitter_2()
- test_md_header_text_splitter_3()
- test_md_header_text_splitter_fenced_code_block()
- test_md_header_text_splitter_fenced_code_block_interleaved()
- test_md_header_text_splitter_mixed_headers()
- test_md_header_text_splitter_preserve_headers_1()
- test_md_header_text_splitter_preserve_headers_2()
- test_md_header_text_splitter_with_custom_headers()
- test_md_header_text_splitter_with_invisible_characters()
- test_merge_splits()
- test_metadata_not_shallow()
- test_php_code_splitter()
- test_powershell_code_splitter_longer_code()
- test_powershell_code_splitter_short_code()
- test_proto_file_splitter()
- test_python_code_splitter()
- test_python_text_splitter()
- test_r_code_splitter()
- test_recursive_character_text_splitter_keep_separators()
- test_rst_code_splitter()
- test_ruby_code_splitter()
- test_rust_code_splitter()
- test_scala_code_splitter()
- test_section_aware_happy_path_splitting_based_on_header_1_2()
- test_solidity_code_splitter()
- test_split_documents()
- test_split_json()
- test_split_json_many_calls()
- test_split_json_with_lists()
- test_split_text_on_tokens()
- test_svelte_text_splitter()
- test_swift_code_splitter()
- test_typescript_code_splitter()
- test_visualbasic6_code_splitter()
- test_vue_text_splitter()
Dependencies
- bs4
- collections.abc
- langchain_core._api
- langchain_core.documents
- langchain_text_splitters
- langchain_text_splitters.base
- langchain_text_splitters.character
- langchain_text_splitters.html
- langchain_text_splitters.json
- langchain_text_splitters.jsx
- langchain_text_splitters.markdown
- langchain_text_splitters.python
- pytest
- random
- re
- string
- textwrap
- typing
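A dependency list like the one above can be approximated with Python's `ast` module by collecting every `import` and `from ... import` statement, including those guarded by `if TYPE_CHECKING:` (which is how `bs4` and `collections.abc` would appear). This is a sketch under that assumption; the function name `imported_modules` and the sample source are hypothetical, and the tool that generated this page may work differently.

```python
import ast


def imported_modules(source: str) -> list[str]:
    """Return the sorted, de-duplicated module names imported by `source`.

    Hypothetical helper for illustration only. The source is parsed,
    never executed, so the imported packages need not be installed.
    """
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports without a module name are skipped.
            modules.add(node.module)
    return sorted(modules)


SAMPLE = '''
import re
from typing import TYPE_CHECKING, Any
from langchain_text_splitters.base import split_text_on_tokens

if TYPE_CHECKING:
    from bs4 import Tag
'''

print(imported_modules(SAMPLE))
# → ['bs4', 'langchain_text_splitters.base', 're', 'typing']
```

Because `ast.walk` visits nested blocks, imports inside conditionals and function bodies are counted as well as top-level ones.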
Frequently Asked Questions
What does test_text_splitters.py do?
test_text_splitters.py is a Python unit-test module in the langchain codebase that tests text splitting functionality. It belongs to the CoreAbstractions domain, Serialization subdomain.
What functions are defined in test_text_splitters.py?
test_text_splitters.py defines 114 functions: __test_iterative_text_splitter, collections, custom_iframe_extractor, html_header_splitter_splitter_factory, test_additional_html_header_text_splitter, test_character_text_splitter, test_character_text_splitter_chunk_size_effect, test_character_text_splitter_discard_regex_separator_on_merge, test_character_text_splitter_discard_separator_regex, test_character_text_splitter_empty_doc, and 104 more.
What does test_text_splitters.py depend on?
test_text_splitters.py imports 18 modules: bs4, collections.abc, langchain_core._api, langchain_core.documents, langchain_text_splitters, langchain_text_splitters.base, langchain_text_splitters.character, langchain_text_splitters.html, and 10 more.
Where is test_text_splitters.py in the architecture?
test_text_splitters.py is located at libs/text-splitters/tests/unit_tests/test_text_splitters.py (domain: CoreAbstractions, subdomain: Serialization, directory: libs/text-splitters/tests/unit_tests).