aggregate_lines_to_chunks() — langchain Function Reference

Architecture documentation for aggregate_lines_to_chunks(), a method of MarkdownHeaderTextSplitter in markdown.py from the langchain codebase.

Dependency Diagram

graph TD
  cd7326ce_b97a_382e_8cf3_d4647f3a82a6["aggregate_lines_to_chunks()"]
  6a11b5bb_e2e9_6671_54b0_3ed10f3c9672["MarkdownHeaderTextSplitter"]
  cd7326ce_b97a_382e_8cf3_d4647f3a82a6 -->|defined in| 6a11b5bb_e2e9_6671_54b0_3ed10f3c9672
  b18c92c3_4d24_0e77_6322_b71c795c08ff["split_text()"]
  b18c92c3_4d24_0e77_6322_b71c795c08ff -->|calls| cd7326ce_b97a_382e_8cf3_d4647f3a82a6
  style cd7326ce_b97a_382e_8cf3_d4647f3a82a6 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/markdown.py lines 88–132

    def aggregate_lines_to_chunks(self, lines: list[LineType]) -> list[Document]:
        """Combine lines with common metadata into chunks.

        Args:
            lines: Line of text / associated header metadata

        Returns:
            List of `Document` objects with common metadata aggregated.
        """
        aggregated_chunks: list[LineType] = []

        for line in lines:
            if (
                aggregated_chunks
                and aggregated_chunks[-1]["metadata"] == line["metadata"]
            ):
                # If the last line in the aggregated list
                # has the same metadata as the current line,
                # append the current content to the last line's content
                aggregated_chunks[-1]["content"] += "  \n" + line["content"]
            elif (
                aggregated_chunks
                and aggregated_chunks[-1]["metadata"] != line["metadata"]
                # may be issues if other metadata is present
                and len(aggregated_chunks[-1]["metadata"]) < len(line["metadata"])
                and aggregated_chunks[-1]["content"].split("\n")[-1][0] == "#"
                and not self.strip_headers
            ):
                # If the last line in the aggregated list
                # has different metadata from the current line,
                # and has a shallower header level than the current line,
                # and the last line is a header,
                # and we are not stripping headers,
                # append the current content to the last line's content
                aggregated_chunks[-1]["content"] += "  \n" + line["content"]
                # and update the last line's metadata
                aggregated_chunks[-1]["metadata"] = line["metadata"]
            else:
                # Otherwise, append the current line to the aggregated list
                aggregated_chunks.append(line)

        return [
            Document(page_content=chunk["content"], metadata=chunk["metadata"])
            for chunk in aggregated_chunks
        ]
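
To see how the three branches above behave, the following is a minimal standalone sketch that mirrors the logic using plain dictionaries instead of the library's LineType and Document. The names `Line` and `aggregate` and the sample inputs are hypothetical, chosen only for illustration. Unlike the original, this version copies each line rather than mutating the caller's list.

```python
from typing import TypedDict


class Line(TypedDict):
    """Hypothetical stand-in for LineType: a text line plus header metadata."""

    content: str
    metadata: dict[str, str]


def aggregate(lines: list[Line], strip_headers: bool = True) -> list[Line]:
    """Illustrative re-implementation of the merge rules shown above."""
    chunks: list[Line] = []
    for line in lines:
        last = chunks[-1] if chunks else None
        if last is not None and last["metadata"] == line["metadata"]:
            # Same metadata: extend the previous chunk.
            last["content"] += "  \n" + line["content"]
        elif (
            last is not None
            and len(last["metadata"]) < len(line["metadata"])
            and last["content"].split("\n")[-1][0] == "#"
            and not strip_headers
        ):
            # Previous chunk ends in a header and the new line sits deeper:
            # fold the line in and adopt its metadata.
            last["content"] += "  \n" + line["content"]
            last["metadata"] = dict(line["metadata"])
        else:
            # Anything else starts a new chunk (copied, not aliased).
            chunks.append(
                {"content": line["content"], "metadata": dict(line["metadata"])}
            )
    return chunks


# Headers stripped: lines merge only when their metadata is identical.
merged = aggregate(
    [
        {"content": "Intro text", "metadata": {"Header 1": "Guide"}},
        {"content": "More intro", "metadata": {"Header 1": "Guide"}},
        {
            "content": "Details",
            "metadata": {"Header 1": "Guide", "Header 2": "Setup"},
        },
    ]
)

# Headers kept: a trailing header line is absorbed into the deeper section.
with_headers = aggregate(
    [
        {"content": "# Guide", "metadata": {"Header 1": "Guide"}},
        {
            "content": "## Setup",
            "metadata": {"Header 1": "Guide", "Header 2": "Setup"},
        },
        {
            "content": "Install it.",
            "metadata": {"Header 1": "Guide", "Header 2": "Setup"},
        },
    ],
    strip_headers=False,
)
```

In the first call the two "Guide" lines collapse into one chunk and "Details" starts a second. In the second call all three lines end up in a single chunk carrying the deeper metadata, because each step either matches the second branch or the first.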

Frequently Asked Questions

What does aggregate_lines_to_chunks() do?
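aggregate_lines_to_chunks() is a method of MarkdownHeaderTextSplitter, defined in libs/text-splitters/langchain_text_splitters/markdown.py. It merges consecutive lines that share identical header metadata into a single chunk, joining their content with "  \n", and returns one Document per chunk. When strip_headers is false, a chunk that ends in a header line is also merged into the deeper section that follows it.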
Where is aggregate_lines_to_chunks() defined?
aggregate_lines_to_chunks() is defined in libs/text-splitters/langchain_text_splitters/markdown.py at line 88.
What calls aggregate_lines_to_chunks()?
aggregate_lines_to_chunks() is called by one function: split_text().
