BaseDocumentTransformer Class — langchain Architecture
Architecture documentation for the BaseDocumentTransformer class in transformers.py from the langchain codebase.
Dependency Diagram
graph TD
    5a5d7f30_8459_ca4f_038d_31d1ce6c81ac["BaseDocumentTransformer"]
    63e976e6_76e9_64a6_6944_2d428350d5d4["transformers.py"]
    5a5d7f30_8459_ca4f_038d_31d1ce6c81ac -->|defined in| 63e976e6_76e9_64a6_6944_2d428350d5d4
    13cbeb53_c069_a9c3_bd64_993c83bf7832["transform_documents()"]
    5a5d7f30_8459_ca4f_038d_31d1ce6c81ac -->|method| 13cbeb53_c069_a9c3_bd64_993c83bf7832
    38f2b5f1_ac03_e565_797e_89ef8f500ee0["atransform_documents()"]
    5a5d7f30_8459_ca4f_038d_31d1ce6c81ac -->|method| 38f2b5f1_ac03_e565_797e_89ef8f500ee0
Source Code
libs/core/langchain_core/documents/transformers.py lines 16–79
# Imports needed to run this excerpt on its own (not part of lines 16–79)
from abc import ABC, abstractmethod
from collections.abc import Sequence
from typing import Any

from langchain_core.documents import Document
from langchain_core.runnables.config import run_in_executor


class BaseDocumentTransformer(ABC):
    """Abstract base class for document transformation.

    A document transformation takes a sequence of `Document` objects and returns a
    sequence of transformed `Document` objects.

    Example:
        ```python
        class EmbeddingsRedundantFilter(BaseDocumentTransformer, BaseModel):
            embeddings: Embeddings
            similarity_fn: Callable = cosine_similarity
            similarity_threshold: float = 0.95

            class Config:
                arbitrary_types_allowed = True

            def transform_documents(
                self, documents: Sequence[Document], **kwargs: Any
            ) -> Sequence[Document]:
                stateful_documents = get_stateful_documents(documents)
                embedded_documents = _get_embeddings_from_stateful_docs(
                    self.embeddings, stateful_documents
                )
                included_idxs = _filter_similar_embeddings(
                    embedded_documents,
                    self.similarity_fn,
                    self.similarity_threshold,
                )
                return [stateful_documents[i] for i in sorted(included_idxs)]

            async def atransform_documents(
                self, documents: Sequence[Document], **kwargs: Any
            ) -> Sequence[Document]:
                raise NotImplementedError
        ```
    """

    @abstractmethod
    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        """Transform a list of documents.

        Args:
            documents: A sequence of `Document` objects to be transformed.

        Returns:
            A sequence of transformed `Document` objects.
        """

    async def atransform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        """Asynchronously transform a list of documents.

        Args:
            documents: A sequence of `Document` objects to be transformed.

        Returns:
            A sequence of transformed `Document` objects.
        """
        # Default async path: delegate to the synchronous transform_documents
        # via run_in_executor (None selects the default executor)
        return await run_in_executor(
            None, self.transform_documents, documents, **kwargs
        )
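The docstring example delegates its work to helpers (`get_stateful_documents`, `_get_embeddings_from_stateful_docs`, `_filter_similar_embeddings`) whose bodies are not part of this excerpt. The following is a minimal standard-library sketch of the filtering idea only: given one embedding per document, keep a document unless its similarity to an already kept document exceeds `similarity_threshold`. The names `cosine_similarity` and `filter_redundant` here are hypothetical stand-ins, the vectors are toy data, and the greedy first-wins strategy is an assumption rather than a description of how langchain's `_filter_similar_embeddings` resolves redundant pairs.

```python
import math
from collections.abc import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_redundant(
    embeddings: Sequence[Vector],
    similarity_fn: Callable[[Vector, Vector], float] = cosine_similarity,
    similarity_threshold: float = 0.95,
) -> list[int]:
    """Return the indices of embeddings to keep, in ascending order.

    Greedy, first-wins pass in input order (an assumption for this sketch):
    an embedding is kept only if its similarity to every previously kept
    embedding is <= similarity_threshold.
    """
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        if all(similarity_fn(emb, embeddings[j]) <= similarity_threshold for j in kept):
            kept.append(i)
    return kept  # already ascending because indices are appended in order


if __name__ == "__main__":
    texts = ["cats purr", "cats purr softly", "stock prices fell"]
    # Hypothetical toy embeddings; a real filter would obtain these from an embeddings model
    vectors = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
    kept = filter_redundant(vectors)
    print([texts[i] for i in kept])  # ['cats purr', 'stock prices fell']
```

Returning indices in ascending order keeps the surviving documents in their original order, which matches the docstring example's `sorted(included_idxs)` step.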
Frequently Asked Questions
What is the BaseDocumentTransformer class?
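BaseDocumentTransformer is an abstract base class in langchain_core, defined in libs/core/langchain_core/documents/transformers.py, that defines the interface for turning a sequence of `Document` objects into a sequence of transformed `Document` objects. Subclasses must implement `transform_documents`. `atransform_documents` has a default implementation that runs `transform_documents` through `run_in_executor`, so a subclass only needs to override it when it has a native asynchronous implementation.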
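To make the contract concrete, here is a standard-library-only sketch that mirrors it: one abstract synchronous method, plus an async method whose default simply runs the synchronous one in a thread pool. `Document`, `DocumentTransformer`, and `StripWhitespace` are hypothetical stand-ins so the example runs without langchain installed. The real class delegates to langchain's `run_in_executor` helper, which may do additional work (for example, propagating context variables) that this sketch omits.

```python
import asyncio
import functools
from abc import ABC, abstractmethod
from collections.abc import Sequence
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for langchain_core's Document (text plus metadata)."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class DocumentTransformer(ABC):
    """Hypothetical mirror of the BaseDocumentTransformer contract."""

    @abstractmethod
    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        """Subclasses must implement the synchronous transformation."""

    async def atransform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Default async path: run the sync method in the event loop's default
        # thread pool. loop.run_in_executor() accepts only positional args,
        # so kwargs are bound with functools.partial.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, functools.partial(self.transform_documents, documents, **kwargs)
        )


class StripWhitespace(DocumentTransformer):
    """Hypothetical transformer: trims page_content and copies metadata."""

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        return [
            Document(doc.page_content.strip(), dict(doc.metadata))
            for doc in documents
        ]


if __name__ == "__main__":
    docs = [Document("  hello  ", {"source": "a.txt"})]
    print(StripWhitespace().transform_documents(docs)[0].page_content)  # hello
    # The async method works without being overridden
    print(asyncio.run(StripWhitespace().atransform_documents(docs))[0].page_content)  # hello
```

The design choice this illustrates: a subclass that implements only `transform_documents` still gets a working `atransform_documents`, while a subclass with a genuinely asynchronous path (for example, one that awaits an async embeddings client) can override it. Trying to instantiate the abstract class directly raises `TypeError`.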
Where is BaseDocumentTransformer defined?
BaseDocumentTransformer is defined in libs/core/langchain_core/documents/transformers.py at line 16.