test_awithin_batch_deduplication_counting() — langchain Function Reference

Architecture documentation for the test_awithin_batch_deduplication_counting() function in test_indexing.py from the langchain codebase.

Dependency Diagram

graph TD
  9ab225eb_3637_657a_c080_de4492fd7e51["test_awithin_batch_deduplication_counting()"]
  a9fb4c74_0865_0941_ade3_563a79762cee["test_indexing.py"]
  9ab225eb_3637_657a_c080_de4492fd7e51 -->|defined in| a9fb4c74_0865_0941_ade3_563a79762cee
  style 9ab225eb_3637_657a_c080_de4492fd7e51 fill:#6366f1,stroke:#818cf8,color:#fff

Source Code

libs/core/tests/unit_tests/indexing/test_indexing.py lines 2113–2166

async def test_awithin_batch_deduplication_counting(
    arecord_manager: InMemoryRecordManager, vector_store: VectorStore
) -> None:
    """Test that within-batch deduplicated documents are counted in num_skipped."""
    # Create documents with within-batch duplicates
    docs = [
        Document(
            page_content="Document A",
            metadata={"source": "1"},
        ),
        Document(
            page_content="Document A",  # Duplicate in same batch
            metadata={"source": "1"},
        ),
        Document(
            page_content="Document B",
            metadata={"source": "2"},
        ),
        Document(
            page_content="Document B",  # Duplicate in same batch
            metadata={"source": "2"},
        ),
        Document(
            page_content="Document C",
            metadata={"source": "3"},
        ),
    ]

    # Index with large batch size to ensure all docs are in one batch
    result = await aindex(
        docs,
        arecord_manager,
        vector_store,
        batch_size=10,  # All docs in one batch
        cleanup="full",
        key_encoder="sha256",
    )

    # Should have 3 unique documents added
    assert result["num_added"] == 3
    # Should have 2 documents skipped due to within-batch deduplication
    assert result["num_skipped"] == 2
    # Total should match input
    assert result["num_added"] + result["num_skipped"] == len(docs)
    assert result["num_deleted"] == 0
    assert result["num_updated"] == 0

    # Verify the content
    assert isinstance(vector_store, InMemoryVectorStore)
    ids = list(vector_store.store.keys())
    contents = sorted(
        [document.page_content for document in vector_store.get_by_ids(ids)]
    )
    assert contents == ["Document A", "Document B", "Document C"]
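
The counts asserted in the test above can be reproduced with a small standalone sketch. This is not langchain's implementation: `content_key` and `dedupe_batch` are hypothetical helpers, documents are modeled as plain `(page_content, metadata)` tuples, and the hashing scheme (SHA-256 over a JSON dump) is an assumption chosen only to illustrate how duplicates inside a single batch collapse to one key. It also assumes an empty record store, so every unique document counts as added.

```python
import hashlib
import json


def content_key(page_content: str, metadata: dict) -> str:
    """Hypothetical helper: SHA-256 over the content plus sorted metadata."""
    payload = json.dumps(
        {"page_content": page_content, "metadata": metadata}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe_batch(docs: list[tuple[str, dict]]) -> dict:
    """Keep the first occurrence of each key and count the rest as skipped."""
    seen: set[str] = set()
    unique: list[tuple[str, dict]] = []
    for page_content, metadata in docs:
        key = content_key(page_content, metadata)
        if key in seen:
            continue  # duplicate within this batch
        seen.add(key)
        unique.append((page_content, metadata))
    return {
        "unique": unique,
        # Assumption: nothing was indexed before, so all unique docs are new.
        "num_added": len(unique),
        "num_skipped": len(docs) - len(unique),
    }


# Same five documents as in the test: A and B each appear twice.
docs = [
    ("Document A", {"source": "1"}),
    ("Document A", {"source": "1"}),
    ("Document B", {"source": "2"}),
    ("Document B", {"source": "2"}),
    ("Document C", {"source": "3"}),
]
result = dedupe_batch(docs)
```

With these inputs the sketch keeps three documents and skips two, so `num_added + num_skipped` equals the number of input documents, which is the invariant the test checks.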

Frequently Asked Questions

What does test_awithin_batch_deduplication_counting() do?
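test_awithin_batch_deduplication_counting() is an async unit test in the langchain codebase, defined in libs/core/tests/unit_tests/indexing/test_indexing.py. It passes five documents to aindex() in a single batch, two of which exactly duplicate other documents in that batch, and asserts that three documents are added, two are counted in num_skipped, and none are deleted or updated. It then checks that the vector store contains exactly "Document A", "Document B", and "Document C".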
Where is test_awithin_batch_deduplication_counting() defined?
test_awithin_batch_deduplication_counting() is defined in libs/core/tests/unit_tests/indexing/test_indexing.py at line 2113.