detect_url() — langchain Function Reference
Architecture documentation for the detect_url() function in _redaction.py from the langchain codebase.
Entity Profile
Dependency Diagram
graph TD a08cac36_7cd2_7bab_390b_6e02470777c2["detect_url()"] be639235_9a02_cc2d_5a7d_637d822fb3b3["_redaction.py"] a08cac36_7cd2_7bab_390b_6e02470777c2 -->|defined in| be639235_9a02_cc2d_5a7d_637d822fb3b3 style a08cac36_7cd2_7bab_390b_6e02470777c2 fill:#6366f1,stroke:#818cf8,color:#fff
Relationship Graph
Source Code
libs/langchain_v1/langchain/agents/middleware/_redaction.py lines 149–206
def detect_url(content: str) -> list[PIIMatch]:
"""Detect URLs in content using regex and stdlib validation.
Args:
content: The text content to scan for URLs.
Returns:
A list of detected URL matches.
"""
matches: list[PIIMatch] = []
# Pattern 1: URLs with scheme (http:// or https://)
scheme_pattern = r"https?://[^\s<>\"{}|\\^`\[\]]+"
for match in re.finditer(scheme_pattern, content):
url = match.group()
result = urlparse(url)
if result.scheme in {"http", "https"} and result.netloc:
matches.append(
PIIMatch(
type="url",
value=url,
start=match.start(),
end=match.end(),
)
)
# Pattern 2: URLs without scheme (www.example.com or example.com/path)
# More conservative to avoid false positives
bare_pattern = (
r"\b(?:www\.)?[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+(?:/[^\s]*)?"
)
for match in re.finditer(bare_pattern, content):
start, end = match.start(), match.end()
# Skip if already matched with scheme
if any(m["start"] <= start < m["end"] or m["start"] < end <= m["end"] for m in matches):
continue
url = match.group()
# Only accept if it has a path or starts with www
# This reduces false positives like "example.com" in prose
if "/" in url or url.startswith("www."):
# Add scheme for validation (required for urlparse to work correctly)
test_url = f"http://{url}"
result = urlparse(test_url)
if result.netloc and "." in result.netloc:
matches.append(
PIIMatch(
type="url",
value=url,
start=start,
end=end,
)
)
return matches
Domain
Subdomains
Source
Frequently Asked Questions
What does detect_url() do?
detect_url() is a function in the langchain codebase, defined in libs/langchain_v1/langchain/agents/middleware/_redaction.py.
Where is detect_url() defined?
detect_url() is defined in libs/langchain_v1/langchain/agents/middleware/_redaction.py at line 149.
Analyze Your Own Codebase
Get architecture documentation, dependency graphs, and domain analysis for your codebase in minutes.
Try Supermodel Free