_cosine_similarity() — langchain Function Reference

Architecture documentation for the _cosine_similarity() function in utils.py from the langchain codebase.

Dependency Diagram

graph TD
  4c281b40_1396_20c6_d4c7_0be61771cba1["_cosine_similarity()"]
  530fd015_66ee_ef3b_a35b_3710e1b1764c["utils.py"]
  4c281b40_1396_20c6_d4c7_0be61771cba1 -->|defined in| 530fd015_66ee_ef3b_a35b_3710e1b1764c
  0c88bb05_2797_2c68_7ac3_726357557644["maximal_marginal_relevance()"]
  0c88bb05_2797_2c68_7ac3_726357557644 -->|calls| 4c281b40_1396_20c6_d4c7_0be61771cba1
  style 4c281b40_1396_20c6_d4c7_0be61771cba1 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/core/langchain_core/vectorstores/utils.py lines 35–103

def _cosine_similarity(x: Matrix, y: Matrix) -> np.ndarray:
    """Row-wise cosine similarity between two equal-width matrices.

    Args:
        x: A matrix of shape `(n, m)`.
        y: A matrix of shape `(k, m)`.

    Returns:
        A matrix of shape `(n, k)` where each element `(i, j)` is the cosine similarity
            between the `i`th row of `x` and the `j`th row of `y`.

    Raises:
        ValueError: If the number of columns in `x` and `y` are not the same.
        ImportError: If numpy is not installed.
    """
    if not _HAS_NUMPY:
        msg = (
            "cosine_similarity requires numpy to be installed. "
            "Please install numpy with `pip install numpy`."
        )
        raise ImportError(msg)

    if len(x) == 0 or len(y) == 0:
        return np.array([[]])

    x = np.array(x)
    y = np.array(y)

    # Check for NaN
    if np.any(np.isnan(x)) or np.any(np.isnan(y)):
        warnings.warn(
            "NaN found in input arrays, unexpected return might follow",
            category=RuntimeWarning,
            stacklevel=2,
        )

    # Check for Inf
    if np.any(np.isinf(x)) or np.any(np.isinf(y)):
        warnings.warn(
            "Inf found in input arrays, unexpected return might follow",
            category=RuntimeWarning,
            stacklevel=2,
        )

    if x.shape[1] != y.shape[1]:
        msg = (
            f"Number of columns in X and Y must be the same. X has shape {x.shape} "
            f"and Y has shape {y.shape}."
        )
        raise ValueError(msg)
    if not _HAS_SIMSIMD:
        logger.debug(
            "Unable to import simsimd, defaulting to NumPy implementation. If you want "
            "to use simsimd please install with `pip install simsimd`."
        )
        x_norm = np.linalg.norm(x, axis=1)
        y_norm = np.linalg.norm(y, axis=1)
        # Ignore divide by zero errors run time warnings as those are handled below.
        with np.errstate(divide="ignore", invalid="ignore"):
            similarity = np.dot(x, y.T) / np.outer(x_norm, y_norm)
        if np.isnan(similarity).all():
            msg = "NaN values found, please remove the NaN values and try again"
            raise ValueError(msg) from None
        similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
        return cast("np.ndarray", similarity)

    x = np.array(x, dtype=np.float32)
    y = np.array(y, dtype=np.float32)
    return 1 - np.array(simd.cdist(x, y, metric="cosine"))
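The NumPy fallback branch above can be tried in isolation. The following is a minimal sketch, assuming only that NumPy is installed; `cosine_similarity_sketch` is a hypothetical stand-in written for this page, not the private langchain function, and it skips the simsimd path and the NaN/Inf input warnings.

```python
import numpy as np


def cosine_similarity_sketch(x, y):
    """Illustrative stand-in for the NumPy fallback path (hypothetical name)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[1] != y.shape[1]:
        raise ValueError(f"Column mismatch: X has shape {x.shape}, Y has shape {y.shape}")

    x_norm = np.linalg.norm(x, axis=1)  # one norm per row of x, shape (n,)
    y_norm = np.linalg.norm(y, axis=1)  # one norm per row of y, shape (k,)

    # Zero-norm rows produce 0/0; silence the warnings and clean up afterwards.
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.dot(x, y.T) / np.outer(x_norm, y_norm)

    # As in the listing above, an all-NaN result is treated as an error.
    if np.isnan(similarity).all():
        raise ValueError("NaN values found, please remove the NaN values and try again")

    # Remaining NaN/Inf entries (for example from a zero vector) become 0.0.
    similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
    return similarity


x = [[1.0, 0.0], [0.0, 0.0]]               # second row is a zero vector
y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = cosine_similarity_sketch(x, y)
print(sim.shape)  # (n, k) = (2, 3)
print(sim)
```

The zero vector in `x` yields a row of zeros rather than NaN, which is the effect of the final masking step in the listing.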

Frequently Asked Questions

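What does _cosine_similarity() do?
_cosine_similarity() computes the cosine similarity between every row of one matrix and every row of another. Given x of shape (n, m) and y of shape (k, m), it returns an (n, k) array whose element (i, j) is the cosine similarity between row i of x and row j of y. It raises ValueError if the column counts differ, uses simsimd when that package is available, and otherwise falls back to a NumPy implementation. It is defined in libs/core/langchain_core/vectorstores/utils.py.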
Where is _cosine_similarity() defined?
_cosine_similarity() is defined in libs/core/langchain_core/vectorstores/utils.py at line 35.
What calls _cosine_similarity()?
_cosine_similarity() is called by one function: maximal_marginal_relevance().
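Because the function takes two matrices, a caller that compares a single query vector against k candidates would pass the query as a 1 x m matrix and read row 0 of the result. The sketch below illustrates that calling pattern under stated assumptions: `rank_by_query_similarity` is a hypothetical helper that inlines the NumPy formula, and it is not the code of maximal_marginal_relevance().

```python
import numpy as np


def rank_by_query_similarity(query, candidates):
    """Order candidate rows from most to least similar to `query`.

    Hypothetical helper for illustration only; not part of langchain.
    Assumes no zero vectors, so no division-by-zero handling is needed.
    """
    q = np.asarray(query, dtype=float).reshape(1, -1)  # shape (1, m)
    c = np.asarray(candidates, dtype=float)            # shape (k, m)

    # Same formula as the NumPy fallback: dot products over norm products.
    norms = np.outer(np.linalg.norm(q, axis=1), np.linalg.norm(c, axis=1))
    sims = np.dot(q, c.T) / norms                      # shape (1, k)

    scores = sims[0]                                   # similarities to the query
    order = [int(i) for i in np.argsort(-scores)]      # best match first
    return order, scores


order, scores = rank_by_query_similarity(
    [1.0, 0.0],
    [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
)
print(order)
print(scores)
```

Cosine similarity ignores vector length, so the candidate `[2.0, 0.0]` scores the same as `[1.0, 0.0]` would against this query.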
