_merge_splits() — langchain Function Reference
Architecture documentation for the _merge_splits() function in base.py from the langchain codebase.
Dependency Diagram
graph TD
    38fe665f_16f3_7590_557b_a39c4678e7f6["_merge_splits()"]
    c86e37d5_f962_cc1e_9821_b665e1359ae8["TextSplitter"]
    38fe665f_16f3_7590_557b_a39c4678e7f6 -->|defined in| c86e37d5_f962_cc1e_9821_b665e1359ae8
    20289806_e8d6_9514_562e_2bd46282553b["_join_docs()"]
    38fe665f_16f3_7590_557b_a39c4678e7f6 -->|calls| 20289806_e8d6_9514_562e_2bd46282553b
    style 38fe665f_16f3_7590_557b_a39c4678e7f6 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/base.py lines 152–194
def _merge_splits(self, splits: Iterable[str], separator: str) -> list[str]:
    # We now want to combine these smaller pieces into medium size
    # chunks to send to the LLM.
    separator_len = self._length_function(separator)

    docs = []
    current_doc: list[str] = []
    total = 0
    for d in splits:
        len_ = self._length_function(d)
        if (
            total + len_ + (separator_len if len(current_doc) > 0 else 0)
            > self._chunk_size
        ):
            if total > self._chunk_size:
                logger.warning(
                    "Created a chunk of size %d, which is longer than the "
                    "specified %d",
                    total,
                    self._chunk_size,
                )
            if len(current_doc) > 0:
                doc = self._join_docs(current_doc, separator)
                if doc is not None:
                    docs.append(doc)
                # Keep on popping if:
                # - we have a larger chunk than in the chunk overlap
                # - or if we still have any chunks and the length is long
                while total > self._chunk_overlap or (
                    total + len_ + (separator_len if len(current_doc) > 0 else 0)
                    > self._chunk_size
                    and total > 0
                ):
                    total -= self._length_function(current_doc[0]) + (
                        separator_len if len(current_doc) > 1 else 0
                    )
                    current_doc = current_doc[1:]
        current_doc.append(d)
        total += len_ + (separator_len if len(current_doc) > 1 else 0)
    doc = self._join_docs(current_doc, separator)
    if doc is not None:
        docs.append(doc)
    return docs
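To make the overlap behavior easier to follow, here is a minimal standalone sketch of the same loop, written as a free function so it can run without a TextSplitter instance. The names `merge_splits` and `_join` and the explicit `chunk_size`, `chunk_overlap`, and `length_function` parameters are illustrative stand-ins, not part of the library; in particular, `_join` only assumes that the real `_join_docs()` joins the pieces and may return None for empty text.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def _join(pieces: list[str], separator: str) -> Optional[str]:
    # Hypothetical stand-in for _join_docs(): join, strip, None if empty.
    text = separator.join(pieces).strip()
    return text or None


def merge_splits(
    splits: Iterable[str],
    separator: str,
    chunk_size: int,
    chunk_overlap: int,
    length_function: Callable[[str], int] = len,
) -> list[str]:
    separator_len = length_function(separator)
    docs: list[str] = []
    current: list[str] = []
    total = 0  # length of the pieces in `current`, separators included
    for piece in splits:
        piece_len = length_function(piece)
        if total + piece_len + (separator_len if current else 0) > chunk_size:
            if total > chunk_size:
                logger.warning(
                    "Created a chunk of size %d, which is longer than the "
                    "specified %d",
                    total,
                    chunk_size,
                )
            if current:
                doc = _join(current, separator)
                if doc is not None:
                    docs.append(doc)
                # Drop pieces from the front until what remains fits in
                # the overlap and leaves room for the incoming piece.
                while total > chunk_overlap or (
                    total + piece_len + (separator_len if current else 0)
                    > chunk_size
                    and total > 0
                ):
                    total -= length_function(current[0]) + (
                        separator_len if len(current) > 1 else 0
                    )
                    current = current[1:]
        current.append(piece)
        total += piece_len + (separator_len if len(current) > 1 else 0)
    doc = _join(current, separator)
    if doc is not None:
        docs.append(doc)
    return docs


pieces = ["aaa", "bbb", "ccc", "ddd"]
with_overlap = merge_splits(pieces, " ", chunk_size=7, chunk_overlap=3)
without_overlap = merge_splits(pieces, " ", chunk_size=7, chunk_overlap=0)
print(with_overlap)     # ['aaa bbb', 'bbb ccc', 'ccc ddd']
print(without_overlap)  # ['aaa bbb', 'ccc ddd']
```

With an overlap of 3, the last piece of each chunk is carried into the next one; with an overlap of 0, the buffer is emptied after every emitted chunk.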
Frequently Asked Questions
What does _merge_splits() do?
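_merge_splits() is a method of the TextSplitter class in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/base.py. It combines an iterable of small text splits into larger chunks, joining pieces with the given separator and measuring them with the splitter's length function so that each chunk stays within the configured chunk size where possible. After emitting a chunk, it drops pieces from the front of the buffer until the remainder fits within the configured chunk overlap, so consecutive chunks can share trailing text. Oversized pieces are not broken up further: when the accumulated length already exceeds the chunk size, the method logs a warning and emits the chunk anyway.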
Where is _merge_splits() defined?
_merge_splits() is defined in libs/text-splitters/langchain_text_splitters/base.py at line 152.
What does _merge_splits() call?
_merge_splits() calls one function: _join_docs().
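The source above checks the result of _join_docs() against None before appending it, which suggests the helper can decline to produce a chunk. The sketch below shows one plausible shape for such a helper. It is an assumption for illustration, not the library's code: the function name `join_docs` and the `strip_whitespace` flag are hypothetical here.

```python
from typing import Optional


def join_docs(
    docs: list[str], separator: str, strip_whitespace: bool = True
) -> Optional[str]:
    # Hypothetical helper: join the buffered pieces with the separator.
    text = separator.join(docs)
    if strip_whitespace:
        text = text.strip()
    # Return None for empty text so the caller can skip empty chunks.
    return text if text else None


joined = join_docs(["a", "b"], " ")
blank = join_docs([" ", ""], " ")
print(joined)  # a b
print(blank)   # None
```

Returning None rather than an empty string lets the caller use a single `is not None` check to keep blank chunks out of the result.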