__init__() — langchain Function Reference
Architecture documentation for the __init__() method of the NLTKTextSplitter class in nltk.py from the langchain codebase.
Dependency Diagram
graph TD
    91006521_b301_e1cb_c4a5_d534eba20af9["__init__()"]
    213eacdd_0034_0c08_d507_c27d0748affc["NLTKTextSplitter"]
    91006521_b301_e1cb_c4a5_d534eba20af9 -->|defined in| 213eacdd_0034_0c08_d507_c27d0748affc
    style 91006521_b301_e1cb_c4a5_d534eba20af9 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/text-splitters/langchain_text_splitters/nltk.py lines 22–55
def __init__(
    self,
    separator: str = "\n\n",
    language: str = "english",
    *,
    use_span_tokenize: bool = False,
    **kwargs: Any,
) -> None:
    """Initialize the NLTK splitter.

    Args:
        separator: The separator to use when combining splits.
        language: The language to use.
        use_span_tokenize: Whether to use `span_tokenize` instead of
            `sent_tokenize`.

    Raises:
        ImportError: If NLTK is not installed.
        ValueError: If `use_span_tokenize` is `True` and separator is not `''`.
    """
    super().__init__(**kwargs)
    self._separator = separator
    self._language = language
    self._use_span_tokenize = use_span_tokenize
    if self._use_span_tokenize and self._separator:
        msg = "When use_span_tokenize is True, separator should be ''"
        raise ValueError(msg)
    if not _HAS_NLTK:
        msg = "NLTK is not installed, please install it with `pip install nltk`."
        raise ImportError(msg)
    if self._use_span_tokenize:
        self._tokenizer = nltk.tokenize._get_punkt_tokenizer(self._language)  # noqa: SLF001
    else:
        self._tokenizer = nltk.tokenize.sent_tokenize
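The order of the checks in the constructor matters: the separator check runs before the NLTK availability check, so an invalid argument combination raises ValueError even on a machine without NLTK. The following is a minimal standalone sketch of that control flow, not the library's code. The function name `validate_splitter_args` and the `has_nltk` parameter are hypothetical stand-ins (the latter for the module-level `_HAS_NLTK` flag), chosen so the example runs without NLTK or langchain installed.

```python
def validate_splitter_args(
    separator: str = "\n\n",
    *,
    use_span_tokenize: bool = False,
    has_nltk: bool = True,  # hypothetical stand-in for the module's _HAS_NLTK flag
) -> str:
    """Mirror the check order of NLTKTextSplitter.__init__ (illustrative only).

    Returns the name of the tokenizer mode the constructor would select.
    """
    # Check 1: span tokenization only works with an empty separator.
    if use_span_tokenize and separator:
        msg = "When use_span_tokenize is True, separator should be ''"
        raise ValueError(msg)
    # Check 2: NLTK must be importable.
    if not has_nltk:
        msg = "NLTK is not installed, please install it with `pip install nltk`."
        raise ImportError(msg)
    # Selection: Punkt tokenizer for span mode, sent_tokenize otherwise.
    return "span_tokenize" if use_span_tokenize else "sent_tokenize"


if __name__ == "__main__":
    print(validate_splitter_args())
    print(validate_splitter_args("", use_span_tokenize=True))
```

Under these assumptions, the default arguments select the sent_tokenize path, while span mode requires passing an empty string as the separator explicitly, because the default separator is "\n\n".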
Frequently Asked Questions
What does __init__() do?
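__init__() is the constructor of NLTKTextSplitter, defined in libs/text-splitters/langchain_text_splitters/nltk.py. It stores the separator, language, and use_span_tokenize settings, raises ValueError if use_span_tokenize is True while the separator is non-empty, raises ImportError if NLTK is not installed, and then selects the tokenizer: the Punkt tokenizer for the given language when use_span_tokenize is True, otherwise nltk.tokenize.sent_tokenize.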
Where is __init__() defined?
__init__() is defined in libs/text-splitters/langchain_text_splitters/nltk.py at line 22.