_get_document_with_hash() — langchain Function Reference
Architecture documentation for the _get_document_with_hash() function in api.py from the langchain codebase.
Dependency Diagram
graph TD
    adeaf2c1_ef58_0e0c_bf53_4534663c6164["_get_document_with_hash()"]
    203188c0_72d6_6932_bc21_edf25c4c00ef["api.py"]
    adeaf2c1_ef58_0e0c_bf53_4534663c6164 -->|defined in| 203188c0_72d6_6932_bc21_edf25c4c00ef
    5721a97d_0581_0694_e3e6_0ae44f2b3fb0["index()"]
    5721a97d_0581_0694_e3e6_0ae44f2b3fb0 -->|calls| adeaf2c1_ef58_0e0c_bf53_4534663c6164
    02b67c59_d093_f33d_633c_d77332eb191e["aindex()"]
    02b67c59_d093_f33d_633c_d77332eb191e -->|calls| adeaf2c1_ef58_0e0c_bf53_4534663c6164
    620ce5e7_2594_a746_99a6_c56af4fd553a["_calculate_hash()"]
    adeaf2c1_ef58_0e0c_bf53_4534663c6164 -->|calls| 620ce5e7_2594_a746_99a6_c56af4fd553a
    style adeaf2c1_ef58_0e0c_bf53_4534663c6164 fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/core/langchain_core/indexing/api.py lines 169–224
def _get_document_with_hash(
    document: Document,
    *,
    key_encoder: Callable[[Document], str]
    | Literal["sha1", "sha256", "sha512", "blake2b"],
) -> Document:
    """Calculate a hash of the document, and assign it to the uid.

    When using one of the predefined hashing algorithms, the hash is calculated
    by hashing the content and the metadata of the document.

    Args:
        document: Document to hash.
        key_encoder: Hashing algorithm to use for hashing the document.
            If not provided, a default encoder using SHA-1 will be used.
            SHA-1 is not collision-resistant, and a motivated attacker
            could craft two different texts that hash to the
            same cache key.

            New applications should use one of the alternative encoders
            or provide a custom and strong key encoder function to avoid this risk.

            When changing the key encoder, you must change the
            index as well to avoid duplicated documents in the cache.

    Raises:
        ValueError: If the metadata cannot be serialized using json.

    Returns:
        Document with a unique identifier based on the hash of the content and metadata.
    """
    metadata: dict[str, Any] = dict(document.metadata or {})

    if callable(key_encoder):
        # If key_encoder is a callable, we use it to generate the hash.
        hash_ = key_encoder(document)
    else:
        # The hashes are calculated separate for the content and the metadata.
        content_hash = _calculate_hash(document.page_content, algorithm=key_encoder)
        try:
            serialized_meta = json.dumps(metadata, sort_keys=True)
        except Exception as e:
            msg = (
                f"Failed to hash metadata: {e}. "
                f"Please use a dict that can be serialized using json."
            )
            raise ValueError(msg) from e
        metadata_hash = _calculate_hash(serialized_meta, algorithm=key_encoder)
        hash_ = _calculate_hash(content_hash + metadata_hash, algorithm=key_encoder)

    return Document(
        # Assign a unique identifier based on the hash.
        id=hash_,
        page_content=document.page_content,
        metadata=document.metadata,
    )
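The predefined-algorithm branch above can be tried out in isolation. The following is a minimal sketch, not the library's implementation: hex_digest() is a hypothetical stand-in for _calculate_hash() (whose body is not shown on this page, so its exact output format is an assumption), and document_hash() is an illustrative name that takes the content and metadata directly instead of a Document.

```python
import hashlib
import json
from typing import Any


def hex_digest(text: str, algorithm: str) -> str:
    """Hypothetical stand-in for _calculate_hash(): hex digest of UTF-8 text."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


def document_hash(
    page_content: str, metadata: dict[str, Any], algorithm: str = "sha256"
) -> str:
    """Hash content and metadata separately, then hash the two digests together."""
    content_hash = hex_digest(page_content, algorithm)
    try:
        # sort_keys=True makes the result independent of key insertion order.
        serialized_meta = json.dumps(metadata, sort_keys=True)
    except Exception as e:
        raise ValueError(f"Failed to hash metadata: {e}.") from e
    metadata_hash = hex_digest(serialized_meta, algorithm)
    return hex_digest(content_hash + metadata_hash, algorithm)


same_a = document_hash("hello", {"source": "a.txt", "page": 1})
same_b = document_hash("hello", {"page": 1, "source": "a.txt"})
changed = document_hash("hello", {"source": "b.txt", "page": 1})

print(same_a == same_b)   # key order does not matter
print(same_a == changed)  # a metadata change produces a different identifier
```

In this sketch the same document hashed with two different algorithms also gets two different identifiers, which is consistent with the docstring's warning that changing the key encoder without changing the index can leave duplicated documents behind.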
Frequently Asked Questions
What does _get_document_with_hash() do?
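_get_document_with_hash() computes an identifier for a Document and returns a new Document that carries that identifier as its id, with the original page_content and metadata. If key_encoder is a callable, its return value is used as the identifier directly. Otherwise the page content and the JSON-serialized metadata are hashed separately with the named algorithm, and the two digests are concatenated and hashed again. It is defined in libs/core/langchain_core/indexing/api.py.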
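In the source listing, the callable branch passes the whole document to key_encoder and uses whatever string comes back as the id. A minimal sketch of such an encoder follows, under stated assumptions: Doc is a hypothetical stand-in for langchain's Document so the example runs without the library, and content_only_encoder() and assign_id() are illustrative names, not part of langchain.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Doc:
    """Hypothetical stand-in for langchain's Document, for illustration only."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def content_only_encoder(doc: Doc) -> str:
    """Example custom encoder: SHA-256 of the text alone, ignoring metadata."""
    return hashlib.sha256(doc.page_content.encode("utf-8")).hexdigest()


def assign_id(doc: Doc, key_encoder: Callable[[Doc], str]) -> str:
    """Mirror the callable branch: the encoder's return value becomes the id."""
    return key_encoder(doc)


id_a = assign_id(Doc("hello", {"source": "a.txt"}), content_only_encoder)
id_b = assign_id(Doc("hello", {"source": "b.txt"}), content_only_encoder)

print(id_a == id_b)  # True: metadata does not influence this encoder
```

The design choice matters: an encoder that ignores metadata treats two documents with identical text as the same record, whereas the predefined algorithms fold the metadata into the identifier.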
Where is _get_document_with_hash() defined?
_get_document_with_hash() is defined in libs/core/langchain_core/indexing/api.py at line 169.
What does _get_document_with_hash() call?
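_get_document_with_hash() calls one function, _calculate_hash(), up to three times per document: once for the content, once for the serialized metadata, and once for the combined digests. When key_encoder is a callable, _calculate_hash() is not called at all.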
What calls _get_document_with_hash()?
_get_document_with_hash() is called by two functions: index() and aindex().