_create_documents() — langchain Function Reference
Architecture documentation for the _create_documents() function in html.py from the langchain codebase.
Dependency Diagram
graph TD
    create["_create_documents()"]
    splitter["HTMLSemanticPreservingSplitter"]
    process["_process_html()"]
    reinsert["_reinsert_preserved_elements()"]
    further["_further_split_chunk()"]
    create -->|defined in| splitter
    process -->|calls| create
    create -->|calls| reinsert
    create -->|calls| further
    style create fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/html.py lines 991–1013
def _create_documents(
    self, headers: dict[str, str], content: str, preserved_elements: dict[str, str]
) -> list[Document]:
    """Creates Document objects from the provided headers, content, and elements.

    Args:
        headers: The headers to attach as metadata to the `Document`.
        content: The content of the `Document`.
        preserved_elements: Preserved elements to be reinserted into the content.

    Returns:
        A list of `Document` objects.
    """
    content = re.sub(r"\s+", " ", content).strip()

    metadata = {**headers, **self._external_metadata}

    if len(content) <= self._max_chunk_size:
        page_content = self._reinsert_preserved_elements(
            content, preserved_elements
        )
        return [Document(page_content=page_content, metadata=metadata)]
    return self._further_split_chunk(content, metadata, preserved_elements)
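The first two statements of the body normalize the text and build the metadata. The snippet below reproduces just those two steps outside the class so their behavior can be checked in isolation. The helper names normalize_content and merge_metadata are hypothetical, and external_metadata is a plain dict standing in for self._external_metadata.

```python
import re


def normalize_content(content: str) -> str:
    # Collapse every run of whitespace (spaces, tabs, newlines) into a single
    # space, then trim the ends, mirroring the first line of _create_documents().
    return re.sub(r"\s+", " ", content).strip()


def merge_metadata(
    headers: dict[str, str], external_metadata: dict[str, str]
) -> dict[str, str]:
    # Later unpacked mappings win on key collisions, so an external metadata
    # entry overrides a header entry that uses the same key.
    return {**headers, **external_metadata}


text = normalize_content("  Intro\n\n\tto   HTML  ")
meta = merge_metadata({"Header 1": "Intro", "source": "page"}, {"source": "crawler"})
print(text)  # Intro to HTML
print(meta)  # {'Header 1': 'Intro', 'source': 'crawler'}
```

Because of the unpacking order, a key present in both headers and the external metadata always ends up with the external value.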
Frequently Asked Questions
What does _create_documents() do?
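_create_documents() is a method of HTMLSemanticPreservingSplitter in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/html.py. It turns a block of extracted content and its associated headers into Document objects. It first collapses every whitespace run in the content to a single space and builds the metadata by merging the headers with the splitter's external metadata (external values win on key collisions). If the normalized content fits within the maximum chunk size, it reinserts the preserved elements and returns a single Document; otherwise it hands the content to _further_split_chunk() for additional splitting.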
Where is _create_documents() defined?
_create_documents() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 991.
What does _create_documents() call?
_create_documents() calls two methods: _reinsert_preserved_elements(), when the normalized content fits within _max_chunk_size, and _further_split_chunk(), when it does not.
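Which of the two callees runs depends on a single length check against the maximum chunk size, applied to the whitespace-normalized content. The sketch below mirrors that dispatch with hypothetical stand-ins: Doc replaces langchain's Document, reinsert_preserved assumes preserved elements are stored as placeholder-to-original mappings restored by plain string replacement, and further_split uses naive fixed-width slicing. None of these is the langchain implementation of _reinsert_preserved_elements() or _further_split_chunk().

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical minimal stand-in for langchain's Document.
    page_content: str
    metadata: dict[str, str] = field(default_factory=dict)


def reinsert_preserved(content: str, preserved: dict[str, str]) -> str:
    # Assumption: keys are placeholders in the text, values the original markup.
    for placeholder, original in preserved.items():
        content = content.replace(placeholder, original)
    return content


def further_split(
    content: str, metadata: dict[str, str], preserved: dict[str, str], size: int
) -> list[Doc]:
    # Hypothetical splitter: fixed-width slices with placeholders restored.
    # Naive slicing can cut a placeholder in half; this is only a sketch.
    return [
        Doc(reinsert_preserved(content[i : i + size], preserved), dict(metadata))
        for i in range(0, len(content), size)
    ]


def create_documents(
    headers: dict[str, str],
    content: str,
    preserved: dict[str, str],
    external_metadata: dict[str, str],
    max_chunk_size: int,
) -> list[Doc]:
    content = re.sub(r"\s+", " ", content).strip()
    metadata = {**headers, **external_metadata}
    # The length is measured BEFORE placeholders are expanded.
    if len(content) <= max_chunk_size:
        return [Doc(reinsert_preserved(content, preserved), metadata)]
    return further_split(content, metadata, preserved, max_chunk_size)


preserved = {"PH_0": "<table><tr><td>1</td></tr></table>"}
short = create_documents({"Header 1": "Intro"}, "See  PH_0\n now", preserved, {}, 20)
print(short[0].page_content)  # See <table><tr><td>1</td></tr></table> now

long_docs = create_documents({}, "abcdefghij", {}, {}, 4)
print([d.page_content for d in long_docs])  # ['abcd', 'efgh', 'ij']
```

In the first call the normalized text "See PH_0 now" is 12 characters, so it passes the check, yet the restored page_content is longer than the limit of 20. The original method has the same ordering: _reinsert_preserved_elements() runs only after the len(content) <= self._max_chunk_size test, so a chunk can exceed the maximum once preserved elements are put back.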
What calls _create_documents()?
_create_documents() is called by one method: _process_html().