
RecursiveJsonSplitter Class — langchain Architecture

Architecture documentation for the RecursiveJsonSplitter class in json.py from the langchain codebase.

Dependency Diagram

graph TD
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048["RecursiveJsonSplitter"]
  c67269cb_3e1f_66bc_89a3_cf12560e7339["json.py"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|defined in| c67269cb_3e1f_66bc_89a3_cf12560e7339
  1e9b95f4_2d2a_fb5c_8df1_21b8bd62d2e7["__init__()"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 1e9b95f4_2d2a_fb5c_8df1_21b8bd62d2e7
  94b9fce8_9d5d_edc5_43e5_d5f5c75c727e["_json_size()"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 94b9fce8_9d5d_edc5_43e5_d5f5c75c727e
  100a8874_86d6_7aab_05cd_e0ff79bf9bed["_set_nested_dict()"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 100a8874_86d6_7aab_05cd_e0ff79bf9bed
  ab35e0b0_5a75_02e6_13a4_6e87c8da8bfc["_list_to_dict_preprocessing()"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| ab35e0b0_5a75_02e6_13a4_6e87c8da8bfc
  d81ccdd5_8091_6de6_425e_3dba9ac5dc9f["_json_split()"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| d81ccdd5_8091_6de6_425e_3dba9ac5dc9f
  e3fb30a1_bf0b_b803_4160_e962db65ecfe["split_json()"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| e3fb30a1_bf0b_b803_4160_e962db65ecfe
  03a7225b_8bf6_9fff_f8d4_0a36c276bded["split_text()"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 03a7225b_8bf6_9fff_f8d4_0a36c276bded
  66dd54d0_0fcc_541a_96cd_6fc946fe44a0["create_documents()"]
  6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 66dd54d0_0fcc_541a_96cd_6fc946fe44a0

Source Code

libs/text-splitters/langchain_text_splitters/json.py lines 12–190

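# json and Any are imported in json.py above line 12 (not part of this excerpt):
import json
from typing import Any


class RecursiveJsonSplitter: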
    """Splits JSON data into smaller, structured chunks while preserving hierarchy.

    This class provides methods to split JSON data into smaller dictionaries or
    JSON-formatted strings based on configurable maximum and minimum chunk sizes.
    It supports nested JSON structures, optionally converts lists into dictionaries
    for better chunking, and allows the creation of document objects for further use.
    """

    max_chunk_size: int = 2000
    """The maximum size for each chunk."""

    min_chunk_size: int = 1800
    """The minimum size for each chunk, derived from `max_chunk_size` if not
    explicitly provided.
    """

    def __init__(
        self, max_chunk_size: int = 2000, min_chunk_size: int | None = None
    ) -> None:
        """Initialize the chunk size configuration for text processing.

        This constructor sets up the maximum and minimum chunk sizes, ensuring that
        the `min_chunk_size` defaults to a value slightly smaller than the
        `max_chunk_size` if not explicitly provided.

        Args:
            max_chunk_size: The maximum size for a chunk.
            min_chunk_size: The minimum size for a chunk.

                If `None`, defaults to the maximum chunk size minus 200, with a lower
                bound of 50.
        """
        super().__init__()
        self.max_chunk_size = max_chunk_size
        self.min_chunk_size = (
            min_chunk_size
            if min_chunk_size is not None
            else max(max_chunk_size - 200, 50)
        )

    @staticmethod
    def _json_size(data: dict[str, Any]) -> int:
        """Calculate the size of the serialized JSON object."""
        return len(json.dumps(data))

    @staticmethod
    def _set_nested_dict(
        d: dict[str, Any],
        path: list[str],
        value: Any,  # noqa: ANN401
    ) -> None:
        """Set a value in a nested dictionary based on the given path."""
        for key in path[:-1]:
            d = d.setdefault(key, {})
        d[path[-1]] = value

    def _list_to_dict_preprocessing(
        self,
        data: Any,  # noqa: ANN401
    ) -> Any:  # noqa: ANN401
        if isinstance(data, dict):
            # Process each key-value pair in the dictionary
            return {k: self._list_to_dict_preprocessing(v) for k, v in data.items()}
        if isinstance(data, list):
            # Convert the list to a dictionary with index-based keys
            return {
                str(i): self._list_to_dict_preprocessing(item)
                for i, item in enumerate(data)
            }
        # Base case: the item is neither a dict nor a list, so return it unchanged
        return data

    def _json_split(
        self,
        data: Any,  # noqa: ANN401
        current_path: list[str] | None = None,
        chunks: list[dict[str, Any]] | None = None,
    ) -> list[dict[str, Any]]:
        """Split json into maximum size dictionaries while preserving structure."""
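        current_path = current_path or []
        # ... (the rest of _json_split and the split_json(), split_text() and
        # create_documents() methods, through line 190, are not shown in this excerpt)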
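
The private helpers in the excerpt are small enough to check in isolation. The standalone sketch below mirrors their logic as module-level functions so the behavior can be run without installing langchain. The names `default_min_chunk_size`, `json_size`, `set_nested_dict` and `list_to_dict` are hypothetical stand-ins for the fallback rule in `__init__`, `_json_size`, `_set_nested_dict` and `_list_to_dict_preprocessing`; they are not part of the library's API.

```python
import json
from typing import Any


# Hypothetical stand-ins for the private helpers in the excerpt (not library API).
def default_min_chunk_size(max_chunk_size: int) -> int:
    # Fallback used by __init__ when min_chunk_size is None: 200 below max, floor 50.
    return max(max_chunk_size - 200, 50)


def json_size(data: Any) -> int:
    # Like _json_size: size is the length of the serialized JSON string.
    return len(json.dumps(data))


def set_nested_dict(d: dict[str, Any], path: list[str], value: Any) -> None:
    # Like _set_nested_dict: create intermediate dicts, then assign the leaf.
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value


def list_to_dict(data: Any) -> Any:
    # Like _list_to_dict_preprocessing: lists become dicts keyed "0", "1", ...
    if isinstance(data, dict):
        return {k: list_to_dict(v) for k, v in data.items()}
    if isinstance(data, list):
        return {str(i): list_to_dict(item) for i, item in enumerate(data)}
    return data  # scalars are returned unchanged


if __name__ == "__main__":
    print(default_min_chunk_size(2000))  # 1800
    print(default_min_chunk_size(100))  # 50
    print(json_size({"a": 1}))  # 8, i.e. len('{"a": 1}')
    nested: dict[str, Any] = {}
    set_nested_dict(nested, ["user", "address", "city"], "Paris")
    print(nested)  # {'user': {'address': {'city': 'Paris'}}}
    print(list_to_dict({"tags": ["x", ["y", "z"]]}))
    # {'tags': {'0': 'x', '1': {'0': 'y', '1': 'z'}}}
```

Because size is the length of the `json.dumps()` output with default separators, the space after each colon and comma counts toward the limit.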

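The excerpt stops at the start of `_json_split`, whose docstring says it splits JSON "into maximum size dictionaries while preserving structure". The sketch below is not that elided body. It is a simplified greedy approach written under our own assumptions: the function name `greedy_json_split` is hypothetical, the top-level input is assumed to be a dict, and lists are assumed to have already been converted to dicts (as `_list_to_dict_preprocessing` does). Each value is placed at its full key path in the current chunk if the result stays within `max_chunk_size`; otherwise a new chunk is started once the current one has reached `min_chunk_size`, and nested dicts that do not fit are split recursively.

```python
from __future__ import annotations

import copy
import json
from typing import Any


def json_size(data: Any) -> int:
    # Size of a chunk = length of its serialized JSON string.
    return len(json.dumps(data))


def set_nested_dict(d: dict[str, Any], path: list[str], value: Any) -> None:
    # Create intermediate dicts along `path`, then assign the leaf value.
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value


def greedy_json_split(
    data: dict[str, Any],
    max_chunk_size: int,
    min_chunk_size: int,
    path: list[str] | None = None,
    chunks: list[dict[str, Any]] | None = None,
) -> list[dict[str, Any]]:
    """Hypothetical greedy splitter; not the elided body of _json_split."""
    path = path or []
    chunks = chunks if chunks is not None else [{}]
    for key, value in data.items():
        new_path = [*path, key]
        # Size the current chunk as it would be with `value` placed at its full path.
        trial = copy.deepcopy(chunks[-1])
        set_nested_dict(trial, new_path, value)
        if json_size(trial) <= max_chunk_size:
            chunks[-1] = trial
            continue
        if json_size(chunks[-1]) >= min_chunk_size:
            # The current chunk is big enough: close it and start a new one.
            chunks.append({})
        if isinstance(value, dict) and value:
            # Spread a nested dict over chunks, keeping its key path in each.
            greedy_json_split(value, max_chunk_size, min_chunk_size, new_path, chunks)
        else:
            # A leaf cannot be split further, so this chunk may exceed the limit.
            set_nested_dict(chunks[-1], new_path, value)
    return chunks


if __name__ == "__main__":
    doc = {"user": {"name": "a" * 10, "bio": "b" * 10}}
    for chunk in greedy_json_split(doc, max_chunk_size=40, min_chunk_size=10):
        print(json.dumps(chunk))
    # {"user": {"name": "aaaaaaaaaa"}}
    # {"user": {"bio": "bbbbbbbbbb"}}
```

Measuring the chunk after a trial insertion counts the enclosing key path toward the size, at the cost of one deep copy per key. A single leaf value larger than `max_chunk_size` still ends up in an oversized chunk, because it cannot be split further.
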
Frequently Asked Questions

What is the RecursiveJsonSplitter class?
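RecursiveJsonSplitter is a class in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/json.py. It splits JSON data into smaller chunks that preserve the original key hierarchy. Chunk sizes are measured as the length of the serialized JSON string and bounded by max_chunk_size (default 2000) and min_chunk_size (default max_chunk_size - 200, but at least 50). It can optionally convert lists into dictionaries with index-based keys before splitting, and it produces the chunks as dictionaries, JSON-formatted strings, or document objects.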
Where is RecursiveJsonSplitter defined?
RecursiveJsonSplitter is defined in libs/text-splitters/langchain_text_splitters/json.py at line 12.
