TextSplitters — langchain Architecture
Granular logic for token- or character-based chunking.
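As a quick orientation, the sketch below shows a typical character-based entry point, the RecursiveCharacterTextSplitter class defined in character.py and exported by the package. The chunk_size and chunk_overlap values are illustrative choices, not package defaults.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Recursively try paragraph, line, word, and character separators until
# every chunk fits within chunk_size characters; chunk_overlap keeps some
# context shared between neighbouring chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)

text = "LangChain text splitters break long documents into retrieval-sized chunks.\n\n" * 10
chunks = splitter.split_text(text)
print(len(chunks), repr(chunks[0][:60]))
```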
Entity Profile
Dependency Diagram
graph TD
    TextSplitters["TextSplitters"]
    TextSplitters --> spacy_py["spacy.py"]
    TextSplitters --> json_py["json.py"]
    TextSplitters --> nltk_py["nltk.py"]
    TextSplitters --> jsx_py["jsx.py"]
    TextSplitters --> base_py["base.py"]
    TextSplitters --> character_py["character.py"]
    TextSplitters --> html_py["html.py"]
    TextSplitters --> konlpy_py["konlpy.py"]
    TextSplitters --> markdown_py["markdown.py"]
    TextSplitters --> python_py["python.py"]
    TextSplitters --> latex_py["latex.py"]
    TextSplitters --> sentence_transformers_py["sentence_transformers.py"]
    style TextSplitters fill:#6366f1,stroke:#818cf8,color:#fff
Domain
DocumentProcessing
Functions
- _HAS_BS4()
- _HAS_KONLPY()
- _HAS_LXML()
- _HAS_NLTK() (×2)
- _HAS_SENTENCE_TRANSFORMERS()
- _HAS_SPACY()
- _HAS_TIKTOKEN()
- _HAS_TRANSFORMERS()
- __init__() (×18)
- _complete_chunk_doc()
- _create_documents()
- _encode()
- _filter_tags()
- _find_all_strings()
- _find_all_tags()
- _further_split_chunk()
- _generate_documents()
- _initialize_chunk_configuration()
- _is_custom_header()
- _join_docs()
- _json_size()
- _json_split()
- _list_to_dict_preprocessing()
- _make_spacy_pipeline_for_splitting()
- _match_code()
- _match_header()
- _match_horz()
- _merge_splits()
- _normalize_and_clean_text()
- _process_html()
- _process_links()
- _process_media()
- _reinsert_preserved_elements()
- _resolve_code_chunk()
- _resolve_header_stack()
- _set_nested_dict()
- _split_text()
- _split_text_with_regex()
- aggregate_lines_to_chunks()
- bs4()
- collections() (×2)
- convert_possible_tags_to_header()
- count_tokens()
- create_documents() (×3)
- from_huggingface_tokenizer()
- from_language()
- from_tiktoken_encoder()
- get_separators_for_language()
- konlpy()
- lxml()
- nltk() (×2)
- sentence_transformers()
- spacy()
- split_documents() (×2)
- split_html_by_headers()
- split_json()
- split_text() (×15)
- split_text_from_file() (×2)
- split_text_from_url()
- split_text_on_tokens()
- tiktoken()
- transform_documents() (×2)
- transformers()
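Several of the public entry points listed above (from_tiktoken_encoder, split_documents, create_documents) are typically combined as in the following sketch. This is a hedged illustration: the encoding name and size parameters are assumptions, and the token-based path needs the optional tiktoken dependency that the _HAS_TIKTOKEN guard checks for.

```python
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Build a splitter whose chunk sizes are measured in tiktoken tokens rather
# than raw characters (requires `pip install tiktoken`).
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # assumed encoding; any tiktoken encoding works
    chunk_size=256,
    chunk_overlap=32,
)

docs = [
    Document(
        page_content="Section one of a long report.\n\n" * 100,
        metadata={"source": "report.txt"},
    )
]

# split_documents wraps split_text and copies each document's metadata onto
# every chunk it produces.
for chunk in splitter.split_documents(docs):
    print(chunk.metadata["source"], len(chunk.page_content))
```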
Source Files
- libs/text-splitters/langchain_text_splitters/base.py
- libs/text-splitters/langchain_text_splitters/character.py
- libs/text-splitters/langchain_text_splitters/html.py
- libs/text-splitters/langchain_text_splitters/json.py
- libs/text-splitters/langchain_text_splitters/jsx.py
- libs/text-splitters/langchain_text_splitters/konlpy.py
- libs/text-splitters/langchain_text_splitters/latex.py
- libs/text-splitters/langchain_text_splitters/markdown.py
- libs/text-splitters/langchain_text_splitters/nltk.py
- libs/text-splitters/langchain_text_splitters/python.py
- libs/text-splitters/langchain_text_splitters/sentence_transformers.py
- libs/text-splitters/langchain_text_splitters/spacy.py
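Not every file above implements size-based splitting; markdown.py and html.py contain structure-aware splitters that segment on headers and record the matched header values in each chunk's metadata. A minimal sketch for the markdown case (the metadata keys "h1" and "h2" are arbitrary choices, not required names):

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

markdown_text = """# Guide
Intro paragraph.

## Installation
pip install langchain-text-splitters

## Usage
Call split_text on your document.
"""

# Each tuple maps a markdown header prefix to the metadata key it is stored under.
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2")],
)

for doc in splitter.split_text(markdown_text):
    print(doc.metadata, "->", doc.page_content[:40])
```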
Frequently Asked Questions
What is the TextSplitters subdomain?
TextSplitters is a subdomain of the DocumentProcessing domain in the langchain codebase. It provides the granular logic for token- or character-based chunking and spans 12 source files.
Which domain does TextSplitters belong to?
TextSplitters belongs to the DocumentProcessing domain.
What functions are in TextSplitters?
The TextSplitters subdomain contains 102 functions, including _HAS_BS4, _HAS_KONLPY, _HAS_LXML, _HAS_NLTK, _HAS_SENTENCE_TRANSFORMERS, _HAS_SPACY, _HAS_TIKTOKEN, _HAS_TRANSFORMERS, and 94 more.
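For reference, split_json and the json.py variant of create_documents belong to the recursive JSON splitter, which partitions nested dictionaries into size-bounded pieces. A hedged sketch, with an illustrative max_chunk_size:

```python
from langchain_text_splitters import RecursiveJsonSplitter

data = {
    "users": {f"user_{i}": {"name": f"Name {i}", "bio": "x" * 50} for i in range(20)}
}

# Recursively break the nested dict so that each chunk's JSON serialization
# stays under max_chunk_size characters.
splitter = RecursiveJsonSplitter(max_chunk_size=300)

json_chunks = splitter.split_json(json_data=data)   # list of smaller dicts
docs = splitter.create_documents(texts=[data])      # list of Document objects
print(len(json_chunks), len(docs))
```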