split_text() — langchain Function Reference
Architecture documentation for the split_text() function in markdown.py from the langchain codebase.
Dependency Diagram
graph TD
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13["split_text()"]
    cd7394a9_9856_dc15_cb00_078cf42f0529["ExperimentalMarkdownSyntaxTextSplitter"]
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13 -->|defined in| cd7394a9_9856_dc15_cb00_078cf42f0529
    b18c92c3_4d24_0e77_6322_b71c795c08ff["split_text()"]
    b18c92c3_4d24_0e77_6322_b71c795c08ff -->|calls| ca4b44a0_217b_9ee3_738c_a86f47cf5d13
    e4272ad6_fa6c_2270_2b87_1b76f6930a95["_match_header()"]
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13 -->|calls| e4272ad6_fa6c_2270_2b87_1b76f6930a95
    a98db0b3_1cd6_99eb_e5f6_1cfcd76b1885["_match_code()"]
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13 -->|calls| a98db0b3_1cd6_99eb_e5f6_1cfcd76b1885
    79e4075c_b84b_1d21_02bd_fab2d78f732e["_match_horz()"]
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13 -->|calls| 79e4075c_b84b_1d21_02bd_fab2d78f732e
    b10a7aa1_da71_adf1_c53e_7aa30fa1d45c["_complete_chunk_doc()"]
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13 -->|calls| b10a7aa1_da71_adf1_c53e_7aa30fa1d45c
    d853fccb_3a1e_9745_c402_faf93b6c62b2["_resolve_header_stack()"]
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13 -->|calls| d853fccb_3a1e_9745_c402_faf93b6c62b2
    7401677c_acf5_67de_69f0_eb4e531b0f66["_resolve_code_chunk()"]
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13 -->|calls| 7401677c_acf5_67de_69f0_eb4e531b0f66
    ca4b44a0_217b_9ee3_738c_a86f47cf5d13 -->|calls| b18c92c3_4d24_0e77_6322_b71c795c08ff
    style ca4b44a0_217b_9ee3_738c_a86f47cf5d13 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/markdown.py lines 372–432
def split_text(self, text: str) -> list[Document]:
    """Split the input text into structured chunks.

    This method processes the input text line by line, identifying and handling
    specific patterns such as headers, code blocks, and horizontal rules to split it
    into structured chunks based on headers, code blocks, and horizontal rules.

    Args:
        text: The input text to be split into chunks.

    Returns:
        A list of `Document` objects representing the structured
        chunks of the input text. If `return_each_line` is enabled, each line
        is returned as a separate `Document`.
    """
    # Reset the state for each new file processed
    self.chunks.clear()
    self.current_chunk = Document(page_content="")
    self.current_header_stack.clear()

    raw_lines = text.splitlines(keepends=True)

    while raw_lines:
        raw_line = raw_lines.pop(0)
        header_match = self._match_header(raw_line)
        code_match = self._match_code(raw_line)
        horz_match = self._match_horz(raw_line)
        if header_match:
            self._complete_chunk_doc()

            if not self.strip_headers:
                self.current_chunk.page_content += raw_line

            # add the header to the stack
            header_depth = len(header_match.group(1))
            header_text = header_match.group(2)
            self._resolve_header_stack(header_depth, header_text)
        elif code_match:
            self._complete_chunk_doc()
            self.current_chunk.page_content = self._resolve_code_chunk(
                raw_line, raw_lines
            )
            self.current_chunk.metadata["Code"] = code_match.group(1)
            self._complete_chunk_doc()
        elif horz_match:
            self._complete_chunk_doc()
        else:
            self.current_chunk.page_content += raw_line

    self._complete_chunk_doc()
    # I don't see why `return_each_line` is a necessary feature of this splitter.
    # It's easy enough to do outside of the class and the caller can have more
    # control over it.
    if self.return_each_line:
        return [
            Document(page_content=line, metadata=chunk.metadata)
            for chunk in self.chunks
            for line in chunk.page_content.splitlines()
            if line and not line.isspace()
        ]
    return self.chunks
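The comment near the end of the source suggests that per-line output is easy to produce outside the class. A minimal sketch of that idea follows. It assumes a stand-in `Chunk` dataclass and a helper named `expand_to_lines`; both are hypothetical and used here in place of langchain's `Document` so the example runs without the library installed.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def expand_to_lines(chunks: list[Chunk]) -> list[Chunk]:
    """Return one Chunk per non-blank line, reusing the parent's metadata."""
    return [
        Chunk(page_content=line, metadata=chunk.metadata)
        for chunk in chunks
        for line in chunk.page_content.splitlines()
        if line and not line.isspace()
    ]


# Illustrative input; the metadata keys are made up for this example.
chunks = [
    Chunk("first line\n\nsecond line\n", {"Header 1": "Intro"}),
    Chunk("   \nonly line\n", {"Header 1": "Usage"}),
]
lines = expand_to_lines(chunks)
print([c.page_content for c in lines])
# prints ['first line', 'second line', 'only line']
```

As in the source, every per-line item shares the same metadata dict object as its parent chunk, so a caller who intends to mutate metadata per line would probably want to copy it first.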
Frequently Asked Questions
What does split_text() do?
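split_text() is a method of ExperimentalMarkdownSyntaxTextSplitter, defined in libs/text-splitters/langchain_text_splitters/markdown.py. It processes the input text line by line, closes the current chunk whenever it meets a header, a code block, or a horizontal rule, and returns the resulting chunks as a list of Document objects. If return_each_line is enabled, each non-blank line is returned as a separate Document.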
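The line-by-line control flow can be illustrated with a simplified, standalone sketch. This is not the library's implementation: the regular expressions, the dict-based chunk shape, the `h1`/`h2` metadata keys, and the function name `split_markdown` are all assumptions made for illustration, and the behavior of the helper methods (which are not shown in the source above) is approximated. Headers are always stripped here, which corresponds to the `strip_headers` branch being enabled.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this listing clean

# Assumed patterns for illustration; the library's own regexes may differ.
HEADER_RE = re.compile(r"^(#{1,6}) (.*)")
FENCE_RE = re.compile("^" + re.escape(FENCE) + "(.*)")
RULE_RE = re.compile(r"^(\*\*\*+|---+|___+)\s*$")


def split_markdown(text: str) -> list[dict]:
    """Split text into chunks at headers, fenced code blocks, and rules."""
    chunks: list[dict] = []
    current = {"content": "", "metadata": {}}
    headers: list[tuple[int, str]] = []  # stack of (depth, title)

    def flush() -> None:
        """Store the current chunk if it has visible text, then start a new one."""
        nonlocal current
        if current["content"].strip():
            for depth, title in headers:
                current["metadata"].setdefault(f"h{depth}", title)
            chunks.append(current)
        current = {"content": "", "metadata": {}}

    lines = text.splitlines(keepends=True)
    while lines:
        line = lines.pop(0)
        if m := HEADER_RE.match(line):
            flush()
            # Drop headers at the same or a deeper level, then push this one.
            while headers and headers[-1][0] >= len(m.group(1)):
                headers.pop()
            headers.append((len(m.group(1)), m.group(2).strip()))
        elif m := FENCE_RE.match(line):
            flush()
            body = line
            # Consume lines up to and including the closing fence.
            while lines:
                nxt = lines.pop(0)
                body += nxt
                if FENCE_RE.match(nxt):
                    break
            current["content"] = body
            current["metadata"]["Code"] = m.group(1).strip()
            flush()
        elif RULE_RE.match(line):
            flush()
        else:
            current["content"] += line
    flush()
    return chunks


text = (
    "# Intro\nHello\n## Setup\n"
    f"{FENCE}python\nprint(1)\n{FENCE}\n"
    "After code\n---\nTail\n"
)
chunks = split_markdown(text)
for chunk in chunks:
    print(chunk["metadata"], repr(chunk["content"]))
```

With this input the sketch yields four chunks: the paragraph under the first header, the code block tagged with its language under `Code`, the text after the code block, and the text after the horizontal rule.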
Where is split_text() defined?
split_text() is defined in libs/text-splitters/langchain_text_splitters/markdown.py at line 372.
What does split_text() call?
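split_text() calls 7 functions according to the dependency graph: _complete_chunk_doc, _match_code, _match_header, _match_horz, _resolve_code_chunk, _resolve_header_stack, and split_text. The last entry is a separate, same-named split_text() node in the graph; the source shown above contains no direct call to it.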
What calls split_text()?
split_text() is called by 1 function: split_text, which the dependency graph records as a separate node with the same name.