character.py — langchain Source File
Architecture documentation for character.py, a Python file in the LangChain codebase. 3 imports, 0 dependents.
Dependency Diagram
graph LR
    2928a4a1_9408_cbea_fa7c_7f66eab697a2["character.py"]
    67ec3255_645e_8b6e_1eff_1eb3c648ed95["re"]
    2928a4a1_9408_cbea_fa7c_7f66eab697a2 --> 67ec3255_645e_8b6e_1eff_1eb3c648ed95
    8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3["typing"]
    2928a4a1_9408_cbea_fa7c_7f66eab697a2 --> 8e2034b7_ceb8_963f_29fc_2ea6b50ef9b3
    885a8262_5dd0_fc53_460c_b7a8de727b5e["langchain_text_splitters.base"]
    2928a4a1_9408_cbea_fa7c_7f66eab697a2 --> 885a8262_5dd0_fc53_460c_b7a8de727b5e
    style 2928a4a1_9408_cbea_fa7c_7f66eab697a2 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
"""Character text splitters."""

from __future__ import annotations

import re
from typing import Any, Literal

from langchain_text_splitters.base import Language, TextSplitter


class CharacterTextSplitter(TextSplitter):
    """Splitting text that looks at characters."""

    def __init__(
        self,
        separator: str = "\n\n",
        is_separator_regex: bool = False,  # noqa: FBT001,FBT002
        **kwargs: Any,
    ) -> None:
        """Create a new TextSplitter."""
        super().__init__(**kwargs)
        self._separator = separator
        self._is_separator_regex = is_separator_regex

    def split_text(self, text: str) -> list[str]:
        """Split into chunks without re-inserting lookaround separators.

        Args:
            text: The text to split.

        Returns:
            A list of text chunks.
        """
        # 1. Determine split pattern: raw regex or escaped literal
        sep_pattern = (
            self._separator if self._is_separator_regex else re.escape(self._separator)
        )

        # 2. Initial split (keep separator if requested)
        splits = _split_text_with_regex(
            text, sep_pattern, keep_separator=self._keep_separator
        )

        # 3. Detect zero-width lookaround so we never re-insert it
        lookaround_prefixes = ("(?=", "(?<!", "(?<=", "(?!")
        is_lookaround = self._is_separator_regex and any(
            self._separator.startswith(p) for p in lookaround_prefixes
        )

        # 4. Decide merge separator:
        #    - if keep_separator or lookaround -> don't re-insert
        #    - else -> re-insert literal separator
        merge_sep = ""
        if not (self._keep_separator or is_lookaround):
            merge_sep = self._separator

        # 5. Merge adjacent splits and return
        return self._merge_splits(splits, merge_sep)
// ... (744 more lines)
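Steps 1, 3 and 4 of split_text can be tried out in isolation. The following is a minimal sketch, not part of character.py: the function name choose_patterns and the constant LOOKAROUND_PREFIXES are hypothetical, and keep_separator is passed in explicitly here because the listing reads self._keep_separator without assigning it, so it presumably comes from the TextSplitter base class.

```python
from __future__ import annotations

import re

# Same prefixes the listing checks for; the constant name is hypothetical.
LOOKAROUND_PREFIXES = ("(?=", "(?<!", "(?<=", "(?!")


def choose_patterns(
    separator: str, is_regex: bool, keep_separator: bool
) -> tuple[str, str]:
    """Return (split_pattern, merge_separator), mirroring steps 1, 3 and 4."""
    # Step 1: use the regex as-is, or escape a literal separator.
    split_pattern = separator if is_regex else re.escape(separator)

    # Step 3: a regex starting with a lookaround prefix is treated as zero-width.
    is_lookaround = is_regex and separator.startswith(LOOKAROUND_PREFIXES)

    # Step 4: re-insert the separator only when it was consumed by the split.
    merge_sep = "" if (keep_separator or is_lookaround) else separator
    return split_pattern, merge_sep


if __name__ == "__main__":
    print(choose_patterns("\n\n", is_regex=False, keep_separator=False))
    print(choose_patterns(r"(?=\d)", is_regex=True, keep_separator=False))
```

One consequence visible in this sketch: for a regex separator that is not a lookaround, such as `\s+`, the merge separator is the pattern string itself rather than the text it matched, since the listing assigns self._separator directly.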
Domain
- DocumentProcessing
Subdomains
- TextSplitters
Functions
- _split_text_with_regex
Dependencies
- langchain_text_splitters.base
- re
- typing
Source
- libs/text-splitters/langchain_text_splitters/character.py
Frequently Asked Questions
What does character.py do?
character.py is a Python source file in the LangChain codebase. It defines character-based text splitters, including the CharacterTextSplitter class, and belongs to the DocumentProcessing domain, TextSplitters subdomain.
What functions are defined in character.py?
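character.py defines one module-level function, _split_text_with_regex. CharacterTextSplitter.split_text calls it with the text, the split pattern, and a keep_separator flag.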
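The body of _split_text_with_regex falls in the part of the file not shown here, so its exact behavior cannot be confirmed from this page. As a rough illustration of the general technique only, the hypothetical helper below (split_keeping_separator, not a LangChain function) uses re.split with a capturing group so that each separator stays attached to the chunk that follows it. It assumes the pattern contains no capturing groups of its own.

```python
from __future__ import annotations

import re


def split_keeping_separator(text: str, pattern: str) -> list[str]:
    """Hypothetical helper: split on `pattern`, keeping each separator
    at the start of the chunk that follows it."""
    if not pattern:
        # No separator: fall back to one chunk per character.
        return list(text)

    # Wrapping the pattern in a group makes re.split return the separators too,
    # so `parts` alternates: text, sep, text, sep, ..., text.
    parts = re.split(f"({pattern})", text)

    chunks = [parts[0]]
    for i in range(1, len(parts) - 1, 2):
        chunks.append(parts[i] + parts[i + 1])

    # Drop empty strings produced by leading or adjacent separators.
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    print(split_keeping_separator("a\n\nb\n\nc", re.escape("\n\n")))
```

Because the separators remain inside the chunks, rejoining the pieces with an empty string reproduces the input, which matches the listing's choice of an empty merge separator when separators are kept.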
What does character.py depend on?
character.py imports three modules: langchain_text_splitters.base, re, and typing.
Where is character.py in the architecture?
character.py is located at libs/text-splitters/langchain_text_splitters/character.py (domain: DocumentProcessing, subdomain: TextSplitters, directory: libs/text-splitters/langchain_text_splitters).