_create_documents() — langchain Function Reference

Architecture documentation for the _create_documents() function in html.py from the langchain codebase.

Dependency Diagram

graph TD
  5c2975ee_08fc_2de6_69ef_e1ab9fb5ded8["_create_documents()"]
  5af47ada_f6e1_33df_ed07_12ca64351fa0["HTMLSemanticPreservingSplitter"]
  5c2975ee_08fc_2de6_69ef_e1ab9fb5ded8 -->|defined in| 5af47ada_f6e1_33df_ed07_12ca64351fa0
  252723d0_ba69_6fd1_f520_2ee9bc89cc3e["_process_html()"]
  252723d0_ba69_6fd1_f520_2ee9bc89cc3e -->|calls| 5c2975ee_08fc_2de6_69ef_e1ab9fb5ded8
  f7ca6eae_27af_591b_5082_e978259ac965["_reinsert_preserved_elements()"]
  5c2975ee_08fc_2de6_69ef_e1ab9fb5ded8 -->|calls| f7ca6eae_27af_591b_5082_e978259ac965
  1ad208c7_864b_e6dd_1344_b5ed70211298["_further_split_chunk()"]
  5c2975ee_08fc_2de6_69ef_e1ab9fb5ded8 -->|calls| 1ad208c7_864b_e6dd_1344_b5ed70211298
  style 5c2975ee_08fc_2de6_69ef_e1ab9fb5ded8 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/text-splitters/langchain_text_splitters/html.py lines 991–1013

    def _create_documents(
        self, headers: dict[str, str], content: str, preserved_elements: dict[str, str]
    ) -> list[Document]:
        """Creates Document objects from the provided headers, content, and elements.

        Args:
            headers: The headers to attach as metadata to the `Document`.
            content: The content of the `Document`.
            preserved_elements: Preserved elements to be reinserted into the content.

        Returns:
            A list of `Document` objects.
        """
        content = re.sub(r"\s+", " ", content).strip()

        metadata = {**headers, **self._external_metadata}

        if len(content) <= self._max_chunk_size:
            page_content = self._reinsert_preserved_elements(
                content, preserved_elements
            )
            return [Document(page_content=page_content, metadata=metadata)]
        return self._further_split_chunk(content, metadata, preserved_elements)
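The control flow above can be tried in isolation with a small standalone sketch. Everything here is an assumption made for illustration: `Document` is a minimal stand-in for the langchain class, `reinsert` is a hypothetical substitute for `_reinsert_preserved_elements()` (simple placeholder replacement), and the oversize branch uses naive fixed-width slicing because the body of `_further_split_chunk()` is not shown in this excerpt.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for the langchain Document class (assumption)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def reinsert(content: str, preserved: dict) -> str:
    """Hypothetical helper: swap each placeholder key for its original text."""
    for placeholder, original in preserved.items():
        content = content.replace(placeholder, original)
    return content


def create_documents(
    headers: dict,
    content: str,
    preserved: dict,
    external_metadata: dict,
    max_chunk_size: int,
) -> list:
    # Collapse every run of whitespace to one space and trim the ends.
    content = re.sub(r"\s+", " ", content).strip()
    # Later keys win, so external metadata overrides a header with the same key.
    metadata = {**headers, **external_metadata}
    # The size check runs on the text that still contains placeholders.
    if len(content) <= max_chunk_size:
        return [Document(reinsert(content, preserved), metadata)]
    # Hypothetical fallback, NOT the library's splitting logic:
    # fixed-width slices, which may cut through a placeholder.
    pieces = [
        content[i : i + max_chunk_size]
        for i in range(0, len(content), max_chunk_size)
    ]
    return [Document(reinsert(piece, preserved), dict(metadata)) for piece in pieces]


docs = create_documents(
    headers={"Header 1": "Intro", "source": "page"},
    content="  Hello\n\n  world  PLACEHOLDER_0 ",
    preserved={"PLACEHOLDER_0": "<code>x = 1</code>"},
    external_metadata={"source": "crawler"},
    max_chunk_size=100,
)
long_docs = create_documents({}, "a" * 25, {}, {}, 10)
```

Because the length comparison happens before the preserved elements are reinserted, the final `page_content` can be longer than the configured maximum once the placeholders expand.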

Frequently Asked Questions

What does _create_documents() do?
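_create_documents() is a method of HTMLSemanticPreservingSplitter, defined in libs/text-splitters/langchain_text_splitters/html.py. It collapses runs of whitespace in the content to single spaces, merges the headers with the splitter's external metadata, and returns a single Document (with preserved elements reinserted) when the content fits within the maximum chunk size. Otherwise it delegates to _further_split_chunk().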
Where is _create_documents() defined?
_create_documents() is defined in libs/text-splitters/langchain_text_splitters/html.py at line 991.
What does _create_documents() call?
_create_documents() calls two functions: _further_split_chunk() and _reinsert_preserved_elements().
What calls _create_documents()?
_create_documents() is called by one function: _process_html().
