BaseDocumentTransformer Class — langchain Architecture

Architecture documentation for the BaseDocumentTransformer class in transformers.py from the langchain codebase.

Dependency Diagram

graph TD
  5a5d7f30_8459_ca4f_038d_31d1ce6c81ac["BaseDocumentTransformer"]
  63e976e6_76e9_64a6_6944_2d428350d5d4["transformers.py"]
  5a5d7f30_8459_ca4f_038d_31d1ce6c81ac -->|defined in| 63e976e6_76e9_64a6_6944_2d428350d5d4
  13cbeb53_c069_a9c3_bd64_993c83bf7832["transform_documents()"]
  5a5d7f30_8459_ca4f_038d_31d1ce6c81ac -->|method| 13cbeb53_c069_a9c3_bd64_993c83bf7832
  38f2b5f1_ac03_e565_797e_89ef8f500ee0["atransform_documents()"]
  5a5d7f30_8459_ca4f_038d_31d1ce6c81ac -->|method| 38f2b5f1_ac03_e565_797e_89ef8f500ee0

Source Code

libs/core/langchain_core/documents/transformers.py lines 16–79

class BaseDocumentTransformer(ABC):
    """Abstract base class for document transformation.

    A document transformation takes a sequence of `Document` objects and returns a
    sequence of transformed `Document` objects.

    Example:
        ```python
        class EmbeddingsRedundantFilter(BaseDocumentTransformer, BaseModel):
            embeddings: Embeddings
            similarity_fn: Callable = cosine_similarity
            similarity_threshold: float = 0.95

            class Config:
                arbitrary_types_allowed = True

            def transform_documents(
                self, documents: Sequence[Document], **kwargs: Any
            ) -> Sequence[Document]:
                stateful_documents = get_stateful_documents(documents)
                embedded_documents = _get_embeddings_from_stateful_docs(
                    self.embeddings, stateful_documents
                )
                included_idxs = _filter_similar_embeddings(
                    embedded_documents,
                    self.similarity_fn,
                    self.similarity_threshold,
                )
                return [stateful_documents[i] for i in sorted(included_idxs)]

            async def atransform_documents(
                self, documents: Sequence[Document], **kwargs: Any
            ) -> Sequence[Document]:
                raise NotImplementedError
        ```
    """

    @abstractmethod
    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        """Transform a list of documents.

        Args:
            documents: A sequence of `Document` objects to be transformed.

        Returns:
            A sequence of transformed `Document` objects.
        """

    async def atransform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        """Asynchronously transform a list of documents.

        Args:
            documents: A sequence of `Document` objects to be transformed.

        Returns:
            A sequence of transformed `Document` objects.
        """
        return await run_in_executor(
            None, self.transform_documents, documents, **kwargs
        )
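
The pattern in the source above (one abstract synchronous method plus an async method that falls back to it) can be sketched with the standard library alone. This is a minimal illustration, not langchain's implementation: `Document`, `SketchTransformer`, and `UppercaseTransformer` are hypothetical stand-ins, and the async fallback is a simplified substitute for the `run_in_executor` helper called in the source. With `langchain_core` installed, a real subclass would instead import `BaseDocumentTransformer` and `Document` from `langchain_core.documents`.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from functools import partial
from typing import Any, Sequence


@dataclass
class Document:
    """Hypothetical stand-in for langchain's Document: text plus metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class SketchTransformer(ABC):
    """Stand-in that mirrors the shape of BaseDocumentTransformer."""

    @abstractmethod
    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        """Subclasses must provide the synchronous transformation."""

    async def atransform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Simplified fallback (an assumption, not the library helper): run the
        # sync method in the loop's default thread pool so the event loop
        # is not blocked while the transformation runs.
        loop = asyncio.get_running_loop()
        call = partial(self.transform_documents, documents, **kwargs)
        return await loop.run_in_executor(None, call)


class UppercaseTransformer(SketchTransformer):
    """Hypothetical transformer that upper-cases each document's text."""

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Build new documents instead of mutating the inputs.
        return [
            Document(doc.page_content.upper(), dict(doc.metadata))
            for doc in documents
        ]


docs = [Document("hello", {"source": "a"}), Document("world")]
transformer = UppercaseTransformer()

# Synchronous path: calls the subclass implementation directly.
sync_result = transformer.transform_documents(docs)

# Asynchronous path: inherited fallback that delegates to the sync method.
async_result = asyncio.run(transformer.atransform_documents(docs))
```

Only `transform_documents` has to be written by the subclass; the async variant is inherited. A subclass whose work is natively asynchronous (for example, one that awaits a network call) would likely override `atransform_documents` rather than rely on the thread-pool fallback.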

Frequently Asked Questions

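What is the BaseDocumentTransformer class?
BaseDocumentTransformer is an abstract base class for document transformation in the langchain codebase, defined in libs/core/langchain_core/documents/transformers.py. A transformer takes a sequence of Document objects and returns a sequence of transformed Document objects. Subclasses must implement transform_documents(); atransform_documents() has a default implementation that runs transform_documents() in an executor.

Where is BaseDocumentTransformer defined?
BaseDocumentTransformer is defined in libs/core/langchain_core/documents/transformers.py at line 16.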
