RecursiveJsonSplitter Class — langchain Architecture
Architecture documentation for the RecursiveJsonSplitter class in json.py from the langchain codebase.
Dependency Diagram
graph TD
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048["RecursiveJsonSplitter"]
    c67269cb_3e1f_66bc_89a3_cf12560e7339["json.py"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|defined in| c67269cb_3e1f_66bc_89a3_cf12560e7339
    1e9b95f4_2d2a_fb5c_8df1_21b8bd62d2e7["__init__()"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 1e9b95f4_2d2a_fb5c_8df1_21b8bd62d2e7
    94b9fce8_9d5d_edc5_43e5_d5f5c75c727e["_json_size()"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 94b9fce8_9d5d_edc5_43e5_d5f5c75c727e
    100a8874_86d6_7aab_05cd_e0ff79bf9bed["_set_nested_dict()"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 100a8874_86d6_7aab_05cd_e0ff79bf9bed
    ab35e0b0_5a75_02e6_13a4_6e87c8da8bfc["_list_to_dict_preprocessing()"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| ab35e0b0_5a75_02e6_13a4_6e87c8da8bfc
    d81ccdd5_8091_6de6_425e_3dba9ac5dc9f["_json_split()"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| d81ccdd5_8091_6de6_425e_3dba9ac5dc9f
    e3fb30a1_bf0b_b803_4160_e962db65ecfe["split_json()"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| e3fb30a1_bf0b_b803_4160_e962db65ecfe
    03a7225b_8bf6_9fff_f8d4_0a36c276bded["split_text()"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 03a7225b_8bf6_9fff_f8d4_0a36c276bded
    66dd54d0_0fcc_541a_96cd_6fc946fe44a0["create_documents()"]
    6fcdab9c_3ecd_51f0_a2ae_3b0c8bfac048 -->|method| 66dd54d0_0fcc_541a_96cd_6fc946fe44a0
Source Code
libs/text-splitters/langchain_text_splitters/json.py lines 12–190
class RecursiveJsonSplitter:
    """Splits JSON data into smaller, structured chunks while preserving hierarchy.

    This class provides methods to split JSON data into smaller dictionaries or
    JSON-formatted strings based on configurable maximum and minimum chunk sizes.
    It supports nested JSON structures, optionally converts lists into dictionaries
    for better chunking, and allows the creation of document objects for further use.
    """

    max_chunk_size: int = 2000
    """The maximum size for each chunk."""

    min_chunk_size: int = 1800
    """The minimum size for each chunk, derived from `max_chunk_size` if not
    explicitly provided.
    """

    def __init__(
        self, max_chunk_size: int = 2000, min_chunk_size: int | None = None
    ) -> None:
        """Initialize the chunk size configuration for text processing.

        This constructor sets up the maximum and minimum chunk sizes, ensuring that
        the `min_chunk_size` defaults to a value slightly smaller than the
        `max_chunk_size` if not explicitly provided.

        Args:
            max_chunk_size: The maximum size for a chunk.
            min_chunk_size: The minimum size for a chunk.
                If `None`, defaults to the maximum chunk size minus 200, with a lower
                bound of 50.
        """
        super().__init__()
        self.max_chunk_size = max_chunk_size
        self.min_chunk_size = (
            min_chunk_size
            if min_chunk_size is not None
            else max(max_chunk_size - 200, 50)
        )

    @staticmethod
    def _json_size(data: dict[str, Any]) -> int:
        """Calculate the size of the serialized JSON object."""
        return len(json.dumps(data))

    @staticmethod
    def _set_nested_dict(
        d: dict[str, Any],
        path: list[str],
        value: Any,  # noqa: ANN401
    ) -> None:
        """Set a value in a nested dictionary based on the given path."""
        for key in path[:-1]:
            d = d.setdefault(key, {})
        d[path[-1]] = value

    def _list_to_dict_preprocessing(
        self,
        data: Any,  # noqa: ANN401
    ) -> Any:  # noqa: ANN401
        if isinstance(data, dict):
            # Process each key-value pair in the dictionary
            return {k: self._list_to_dict_preprocessing(v) for k, v in data.items()}
        if isinstance(data, list):
            # Convert the list to a dictionary with index-based keys
            return {
                str(i): self._list_to_dict_preprocessing(item)
                for i, item in enumerate(data)
            }
        # Base case: the item is neither a dict nor a list, so return it unchanged
        return data

    def _json_split(
        self,
        data: Any,  # noqa: ANN401
        current_path: list[str] | None = None,
        chunks: list[dict[str, Any]] | None = None,
    ) -> list[dict[str, Any]]:
        """Split json into maximum size dictionaries while preserving structure."""
        current_path = current_path or []
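To make the helper logic in the listing above easier to try in isolation, the following sketch restates it as free functions that need only the standard library. The function names (`default_min_chunk_size`, `json_size`, `set_nested_dict`, `list_to_dict`) are hypothetical stand-ins chosen for this illustration; the bodies mirror the expressions shown in `__init__()`, `_json_size()`, `_set_nested_dict()`, and `_list_to_dict_preprocessing()`.

```python
import json
from typing import Any, Dict, List


def default_min_chunk_size(max_chunk_size: int) -> int:
    """Mirror the fallback in __init__: max size minus 200, never below 50."""
    return max(max_chunk_size - 200, 50)


def json_size(data: Dict[str, Any]) -> int:
    """Size is the character count of the default json.dumps() serialization."""
    return len(json.dumps(data))


def set_nested_dict(d: Dict[str, Any], path: List[str], value: Any) -> None:
    """Walk (creating as needed) every key but the last, then assign the value."""
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value


def list_to_dict(data: Any) -> Any:
    """Recursively replace lists with dicts keyed by the stringified index."""
    if isinstance(data, dict):
        return {k: list_to_dict(v) for k, v in data.items()}
    if isinstance(data, list):
        return {str(i): list_to_dict(item) for i, item in enumerate(data)}
    return data  # scalars pass through unchanged


# Small demonstration (hypothetical inputs).
nested: Dict[str, Any] = {}
set_nested_dict(nested, ["a", "b", "c"], 1)   # nested is now {"a": {"b": {"c": 1}}}
converted = list_to_dict({"tags": ["x", {"y": [1, 2]}]})
print(default_min_chunk_size(2000))  # 1800
print(json_size({"a": 1}))           # 8, the length of '{"a": 1}'
print(converted)
```

Note that size is measured in characters of the serialized string, separators and quotes included, so it is not a token count or a byte count.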
Frequently Asked Questions
What is the RecursiveJsonSplitter class?
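RecursiveJsonSplitter is a class in the langchain codebase, defined in libs/text-splitters/langchain_text_splitters/json.py. According to its docstring, it splits JSON data into smaller dictionaries or JSON-formatted strings, bounded by configurable maximum and minimum chunk sizes, while preserving the nested hierarchy.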
Where is RecursiveJsonSplitter defined?
RecursiveJsonSplitter is defined in libs/text-splitters/langchain_text_splitters/json.py at line 12.
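The source listing on this page stops partway through `_json_split()`, so the method body is not reproduced here. The sketch below is only an illustration of one way a size-bounded recursive split over nested dictionaries can be written, assuming the character-count size measure and path-based insertion shown in the listing. The function names `sketch_json_split` and `_set_path`, the parameters, and the sample data are all hypothetical, and the code should not be read as the library's implementation.

```python
import json
from typing import Any, Dict, List, Optional


def _set_path(d: Dict[str, Any], path: List[str], value: Any) -> None:
    """Assign value at a nested key path, creating intermediate dicts."""
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value


def sketch_json_split(
    data: Any,
    max_chunk_size: int,
    min_chunk_size: int,
    path: Optional[List[str]] = None,
    chunks: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Illustrative only: greedily pack key paths into size-bounded dicts."""
    path = path or []
    chunks = chunks if chunks is not None else [{}]
    if isinstance(data, dict):
        for key, value in data.items():
            new_path = [*path, key]
            used = len(json.dumps(chunks[-1]))
            needed = len(json.dumps({key: value}))
            if needed < max_chunk_size - used:
                # The whole subtree fits in the current chunk.
                _set_path(chunks[-1], new_path, value)
            else:
                # Start a new chunk only if the current one is "full enough".
                if used >= min_chunk_size:
                    chunks.append({})
                # Descend so the subtree can be spread over several chunks.
                sketch_json_split(
                    value, max_chunk_size, min_chunk_size, new_path, chunks
                )
    else:
        # A leaf cannot be split further; it goes into the current chunk
        # even if that pushes the chunk past max_chunk_size.
        _set_path(chunks[-1], path, data)
    return chunks


# Hypothetical sample data and limits, chosen small enough to force splits.
example = {"a": {"x": "1" * 20, "y": "2" * 20}, "b": "3" * 20}
example_chunks = sketch_json_split(example, max_chunk_size=40, min_chunk_size=20)
for chunk in example_chunks:
    print(len(json.dumps(chunk)), chunk)
```

In this sketch every chunk repeats the full key path from the root down to its values, which is how the hierarchy stays recoverable after splitting. A single leaf larger than the limit still yields an oversized chunk.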