Home / Function/ detect_url() — langchain Function Reference

detect_url() — langchain Function Reference

Architecture documentation for the detect_url() function in _redaction.py from the langchain codebase.

Entity Profile

Dependency Diagram

graph TD
  a08cac36_7cd2_7bab_390b_6e02470777c2["detect_url()"]
  be639235_9a02_cc2d_5a7d_637d822fb3b3["_redaction.py"]
  a08cac36_7cd2_7bab_390b_6e02470777c2 -->|defined in| be639235_9a02_cc2d_5a7d_637d822fb3b3
  style a08cac36_7cd2_7bab_390b_6e02470777c2 fill:#6366f1,stroke:#818cf8,color:#fff

Relationship Graph

Source Code

libs/langchain_v1/langchain/agents/middleware/_redaction.py lines 149–206

def detect_url(content: str) -> list[PIIMatch]:
    """Detect URLs in content using regex and stdlib validation.

    Args:
        content: The text content to scan for URLs.

    Returns:
        A list of detected URL matches.
    """
    matches: list[PIIMatch] = []

    # Pattern 1: URLs with scheme (http:// or https://)
    scheme_pattern = r"https?://[^\s<>\"{}|\\^`\[\]]+"

    for match in re.finditer(scheme_pattern, content):
        url = match.group()
        result = urlparse(url)
        if result.scheme in {"http", "https"} and result.netloc:
            matches.append(
                PIIMatch(
                    type="url",
                    value=url,
                    start=match.start(),
                    end=match.end(),
                )
            )

    # Pattern 2: URLs without scheme (www.example.com or example.com/path)
    # More conservative to avoid false positives
    bare_pattern = (
        r"\b(?:www\.)?[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
        r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+(?:/[^\s]*)?"
    )

    for match in re.finditer(bare_pattern, content):
        start, end = match.start(), match.end()
        # Skip if already matched with scheme
        if any(m["start"] <= start < m["end"] or m["start"] < end <= m["end"] for m in matches):
            continue

        url = match.group()
        # Only accept if it has a path or starts with www
        # This reduces false positives like "example.com" in prose
        if "/" in url or url.startswith("www."):
            # Add scheme for validation (required for urlparse to work correctly)
            test_url = f"http://{url}"
            result = urlparse(test_url)
            if result.netloc and "." in result.netloc:
                matches.append(
                    PIIMatch(
                        type="url",
                        value=url,
                        start=start,
                        end=end,
                    )
                )

    return matches

Domain

Subdomains

Frequently Asked Questions

What does detect_url() do?
detect_url() is a function in the langchain codebase, defined in libs/langchain_v1/langchain/agents/middleware/_redaction.py.
Where is detect_url() defined?
detect_url() is defined in libs/langchain_v1/langchain/agents/middleware/_redaction.py at line 149.

Analyze Your Own Codebase

Get architecture documentation, dependency graphs, and domain analysis for your codebase in minutes.

Try Supermodel Free