jsx.py — langchain Source File

Architecture documentation for jsx.py, a Python file in the langchain codebase. 3 imports, 0 dependents.

Dependency Diagram

graph LR
  e969e3be_caa0_f4cc_b1ed_b8ef51787409["jsx.py"]
  67ec3255_645e_8b6e_1eff_1eb3c648ed95["re"]
  e969e3be_caa0_f4cc_b1ed_b8ef51787409 --> 67ec3255_645e_8b6e_1eff_1eb3c648ed95
  8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
  e969e3be_caa0_f4cc_b1ed_b8ef51787409 --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
  5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb["langchain_text_splitters"]
  e969e3be_caa0_f4cc_b1ed_b8ef51787409 --> 5d24a664_4d9b_7491_ea6a_e13ddbcc8eeb
  style e969e3be_caa0_f4cc_b1ed_b8ef51787409 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

"""JavaScript framework text splitter."""

import re
from typing import Any

from langchain_text_splitters import RecursiveCharacterTextSplitter


class JSFrameworkTextSplitter(RecursiveCharacterTextSplitter):
    """Text splitter that handles React (JSX), Vue, and Svelte code.

    This splitter extends `RecursiveCharacterTextSplitter` to handle React (JSX), Vue,
    and Svelte code by:

    1. Detecting and extracting custom component tags from the text
    2. Using those tags as additional separators along with standard JS syntax

    The splitter combines:

    * Custom component tags as separators (e.g. `<Component`, `<div`)
    * JavaScript syntax elements (function, const, if, etc)
    * Standard text splitting on newlines

    This allows chunks to break at natural boundaries in React, Vue, and Svelte
    component code.
    """

    def __init__(
        self,
        separators: list[str] | None = None,
        chunk_size: int = 2000,
        chunk_overlap: int = 0,
        **kwargs: Any,
    ) -> None:
        """Initialize the JS Framework text splitter.

        Args:
            separators: Optional list of custom separator strings to use
            chunk_size: Maximum size of chunks to return
            chunk_overlap: Overlap in characters between chunks
            **kwargs: Additional arguments to pass to parent class
        """
        super().__init__(chunk_size=chunk_size, chunk_overlap=chunk_overlap, **kwargs)
        self._separators = separators or []

    def split_text(self, text: str) -> list[str]:
        """Split text into chunks.

        This method splits the text into chunks by:

        * Extracting unique opening component tags using regex
        * Creating separators list with extracted tags and JS separators
        * Splitting the text using the separators by calling the parent class method

        Args:
            text: String containing code to split

        Returns:
            List of text chunks split on component and JS boundaries
        """
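        # Extract unique opening component tags using regex.
        # The pattern captures the tag name of any `<name ...>` span, which
        # includes self-closing tags such as `<Foo />`. Closing tags such as
        # `</div>` do not match because `/` is not an alphanumeric character.
        opening_tags = re.findall(r"<\s*([a-zA-Z0-9]+)[^>]*>", text)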

        component_tags = []
        for tag in opening_tags:
            if tag not in component_tags:
                component_tags.append(tag)
        component_separators = [f"<{tag}" for tag in component_tags]

        js_separators = [
            "\nexport ",
            " export ",
            "\nfunction ",
            "\nasync function ",
            " async function ",
            "\nconst ",
            "\nlet ",
            "\nvar ",
            "\nclass ",
            " class ",
            "\nif ",
            " if ",
            "\nfor ",
            " for ",
            "\nwhile ",
            " while ",
            "\nswitch ",
            " switch ",
            "\ncase ",
            " case ",
            "\ndefault ",
            " default ",
        ]
        separators = (
            self._separators
            + js_separators
            + component_separators
            + ["<>", "\n\n", "&&\n", "||\n"]
        )
        self._separators = separators
        return super().split_text(text)
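
The tag-extraction step in split_text can be tried on its own, without langchain installed. The sketch below reuses the same regular expression and the same first-seen de-duplication as the listing above. The names TAG_PATTERN, component_separators, and sample are illustrative assumptions and are not part of jsx.py.

```python
import re

# Same pattern as in split_text: captures the tag name of any "<name ...>" span.
TAG_PATTERN = re.compile(r"<\s*([a-zA-Z0-9]+)[^>]*>")


def component_separators(text: str) -> list[str]:
    """Return a "<tag" separator for each distinct tag name, in first-seen order."""
    seen: list[str] = []
    for tag in TAG_PATTERN.findall(text):
        if tag not in seen:
            seen.append(tag)
    return [f"<{tag}" for tag in seen]


# Hypothetical JSX fragment used only for illustration.
sample = '<div className="app"><Header title="x" /><p>hi</p><div>again</div></div>'
separators = component_separators(sample)
print(separators)  # ['<div', '<Header', '<p']
```

In this sample the self-closing `<Header ... />` tag is picked up like any other opening tag, the repeated `<div>` contributes one separator, and the closing tags are skipped.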

Domain

DocumentProcessing

Subdomains

TextSplitters

Classes

JSFrameworkTextSplitter

Dependencies

langchain_text_splitters
re
typing

Frequently Asked Questions

What does jsx.py do?

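jsx.py defines JSFrameworkTextSplitter, a RecursiveCharacterTextSplitter subclass that splits React (JSX), Vue, and Svelte code on component tags and JavaScript keywords. It belongs to the DocumentProcessing domain, TextSplitters subdomain of the langchain codebase.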

What does jsx.py depend on?

jsx.py imports three modules: langchain_text_splitters, re, and typing.

Where is jsx.py in the architecture?

jsx.py is located at libs/text-splitters/langchain_text_splitters/jsx.py (domain: DocumentProcessing, subdomain: TextSplitters, directory: libs/text-splitters/langchain_text_splitters).
