aggregate_lines_to_chunks() — langchain Function Reference
Architecture documentation for the aggregate_lines_to_chunks() function in markdown.py from the langchain codebase.
Dependency Diagram
graph TD
    cd7326ce_b97a_382e_8cf3_d4647f3a82a6["aggregate_lines_to_chunks()"]
    6a11b5bb_e2e9_6671_54b0_3ed10f3c9672["MarkdownHeaderTextSplitter"]
    cd7326ce_b97a_382e_8cf3_d4647f3a82a6 -->|defined in| 6a11b5bb_e2e9_6671_54b0_3ed10f3c9672
    b18c92c3_4d24_0e77_6322_b71c795c08ff["split_text()"]
    b18c92c3_4d24_0e77_6322_b71c795c08ff -->|calls| cd7326ce_b97a_382e_8cf3_d4647f3a82a6
    style cd7326ce_b97a_382e_8cf3_d4647f3a82a6 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/markdown.py lines 88–132
def aggregate_lines_to_chunks(self, lines: list[LineType]) -> list[Document]:
    """Combine lines with common metadata into chunks.

    Args:
        lines: Line of text / associated header metadata

    Returns:
        List of `Document` objects with common metadata aggregated.
    """
    aggregated_chunks: list[LineType] = []

    for line in lines:
        if (
            aggregated_chunks
            and aggregated_chunks[-1]["metadata"] == line["metadata"]
        ):
            # If the last line in the aggregated list
            # has the same metadata as the current line,
            # append the current content to the last lines's content
            aggregated_chunks[-1]["content"] += " \n" + line["content"]
        elif (
            aggregated_chunks
            and aggregated_chunks[-1]["metadata"] != line["metadata"]
            # may be issues if other metadata is present
            and len(aggregated_chunks[-1]["metadata"]) < len(line["metadata"])
            and aggregated_chunks[-1]["content"].split("\n")[-1][0] == "#"
            and not self.strip_headers
        ):
            # If the last line in the aggregated list
            # has different metadata as the current line,
            # and has shallower header level than the current line,
            # and the last line is a header,
            # and we are not stripping headers,
            # append the current content to the last line's content
            aggregated_chunks[-1]["content"] += " \n" + line["content"]
            # and update the last line's metadata
            aggregated_chunks[-1]["metadata"] = line["metadata"]
        else:
            # Otherwise, append the current line to the aggregated list
            aggregated_chunks.append(line)

    return [
        Document(page_content=chunk["content"], metadata=chunk["metadata"])
        for chunk in aggregated_chunks
    ]
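The merging rules in the listing above can be tried without installing langchain. The following is a minimal standalone sketch, not the library's implementation: `aggregate`, `SEPARATOR`, and the hand-built `lines` input are hypothetical names introduced for illustration, plain dicts stand in for `LineType` and `Document`, and `strip_headers` is passed as an argument instead of being read from `self`. It also deviates in two small ways that are marked in the comments: it uses `startswith("#")` instead of indexing the first character, and it copies each line instead of mutating the caller's input.

```python
from typing import Any

# Separator copied from the listing above (assumption: reproduced as shown there).
SEPARATOR = " \n"


def aggregate(
    lines: list[dict[str, Any]], strip_headers: bool = True
) -> list[dict[str, Any]]:
    """Sketch of the merging rules using plain dicts (hypothetical helper)."""
    chunks: list[dict[str, Any]] = []
    for line in lines:
        last = chunks[-1] if chunks else None
        if last is not None and last["metadata"] == line["metadata"]:
            # Same header metadata: extend the previous chunk.
            last["content"] += SEPARATOR + line["content"]
        elif (
            last is not None
            and len(last["metadata"]) < len(line["metadata"])
            # Deviation: startswith() avoids an IndexError on an empty last line.
            and last["content"].split("\n")[-1].startswith("#")
            and not strip_headers
        ):
            # Previous chunk ends in a shallower header and headers are kept:
            # fold the current line in and adopt its deeper metadata.
            last["content"] += SEPARATOR + line["content"]
            last["metadata"] = dict(line["metadata"])
        else:
            # Deviation: copy the line so the caller's input is not mutated.
            chunks.append(
                {"content": line["content"], "metadata": dict(line["metadata"])}
            )
    return chunks


# Hand-built input for illustration; in the library, split_text() produces it.
lines = [
    {"content": "# Intro", "metadata": {"Header 1": "Intro"}},
    {"content": "## Setup", "metadata": {"Header 1": "Intro", "Header 2": "Setup"}},
    {"content": "Install it.", "metadata": {"Header 1": "Intro", "Header 2": "Setup"}},
]

merged = aggregate(lines, strip_headers=False)
separate = aggregate(lines, strip_headers=True)
```

With `strip_headers=False`, the `# Intro` header is folded into the deeper `## Setup` section, so `merged` holds a single chunk carrying both header keys. With `strip_headers=True`, the second branch never fires and the same input yields two chunks.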
Frequently Asked Questions
What does aggregate_lines_to_chunks() do?
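aggregate_lines_to_chunks() is a method of MarkdownHeaderTextSplitter, defined in libs/text-splitters/langchain_text_splitters/markdown.py. It merges consecutive lines that share the same header metadata into a single chunk and returns the chunks as a list of Document objects. When strip_headers is false, a chunk whose last line is a header is also merged with a following line that carries more header metadata, and the chunk takes on that line's metadata.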
Where is aggregate_lines_to_chunks() defined?
aggregate_lines_to_chunks() is defined in libs/text-splitters/langchain_text_splitters/markdown.py at line 88.
What calls aggregate_lines_to_chunks()?
aggregate_lines_to_chunks() is called by one function: split_text().