_stream() — langchain Function Reference
Architecture documentation for the _stream() method of the ChatParrotLink example chat model, defined in custom_chat_model.py in the langchain codebase.
Dependency Diagram
graph TD
    ea8bbe3d_c7fc_58bc_8614_aed85432697f["_stream()"]
    827d4990_b3c8_3ba7_bcd4_0a554daa3db4["ChatParrotLink"]
    ea8bbe3d_c7fc_58bc_8614_aed85432697f -->|defined in| 827d4990_b3c8_3ba7_bcd4_0a554daa3db4
    style ea8bbe3d_c7fc_58bc_8614_aed85432697f fill:#6366f1,stroke:#818cf8,color:#fff
Source Code
libs/standard-tests/tests/unit_tests/custom_chat_model.py lines 105–168
def _stream(
    self,
    messages: list[BaseMessage],
    stop: list[str] | None = None,
    run_manager: CallbackManagerForLLMRun | None = None,
    **kwargs: Any,
) -> Iterator[ChatGenerationChunk]:
    """Stream the output of the model.

    This method should be implemented if the model can generate output
    in a streaming fashion. If the model does not support streaming,
    do not implement it. In that case streaming requests will be automatically
    handled by the _generate method.

    Args:
        messages: the prompt composed of a list of messages.
        stop: a list of strings on which the model should stop generating.
            If generation stops due to a stop token, the stop token itself
            SHOULD BE INCLUDED as part of the output. This is not enforced
            across models right now, but it's a good practice to follow since
            it makes it much easier to parse the output of the model
            downstream and understand why generation stopped.
        run_manager: A run manager with callbacks for the LLM.
        **kwargs: Additional keyword arguments.
    """
    _ = stop  # Mark as used to avoid unused variable warning
    _ = kwargs  # Mark as used to avoid unused variable warning
    last_message = messages[-1]
    tokens = str(last_message.content[: self.parrot_buffer_length])
    ct_input_tokens = sum(len(message.content) for message in messages)

    for token in tokens:
        usage_metadata = UsageMetadata(
            {
                "input_tokens": ct_input_tokens,
                "output_tokens": 1,
                "total_tokens": ct_input_tokens + 1,
            }
        )
        ct_input_tokens = 0
        chunk = ChatGenerationChunk(
            message=AIMessageChunk(content=token, usage_metadata=usage_metadata)
        )

        if run_manager:
            # This is optional in newer versions of LangChain
            # The on_llm_new_token will be called automatically
            run_manager.on_llm_new_token(token, chunk=chunk)

        yield chunk

    # Let's add some other information (e.g., response metadata)
    chunk = ChatGenerationChunk(
        message=AIMessageChunk(
            content="",
            response_metadata={"time_in_sec": 3, "model_name": self.model_name},
        )
    )
    if run_manager:
        # This is optional in newer versions of LangChain
        # The on_llm_new_token will be called automatically
        run_manager.on_llm_new_token(token, chunk=chunk)
    yield chunk
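Because the method yields one character per chunk, a caller sees the parroted prefix arrive incrementally through the public .stream() API, which delegates to _stream() when it is implemented. The sketch below shows that call path; the constructor fields model_name and parrot_buffer_length are assumptions inferred from the self. attributes read in the body above, since the full class definition is not shown in this excerpt.

from custom_chat_model import ChatParrotLink  # hypothetical import path

# model_name and parrot_buffer_length are assumed constructor fields,
# inferred from self.model_name / self.parrot_buffer_length above.
model = ChatParrotLink(model_name="parrot-1", parrot_buffer_length=5)

# .stream() converts the string into a message list and delegates to
# _stream(); each yielded AIMessageChunk carries one character.
for chunk in model.stream("hello world"):
    print(chunk.content, end="|")
# -> h|e|l|l|o||  (the trailing empty chunk carries only response_metadata)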
Frequently Asked Questions
What does _stream() do?
_stream() is the streaming implementation for the ChatParrotLink test chat model in the langchain codebase. It parrots back the first parrot_buffer_length characters of the last input message, yielding one ChatGenerationChunk per character; the first chunk carries the full input_tokens count in its usage_metadata (later chunks report 0 so aggregated totals stay correct), and a final empty chunk carries response_metadata such as model_name.
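The reason input_tokens is reported once and then zeroed is that chunk usage metadata is additive: when AIMessageChunk objects are combined with +, their usage_metadata counters are summed. A minimal sketch of that behavior:

from langchain_core.messages import AIMessageChunk

# Adding chunks concatenates content and sums usage_metadata counters,
# so reporting input_tokens on every chunk would overcount the prompt.
first = AIMessageChunk(
    content="h",
    usage_metadata={"input_tokens": 11, "output_tokens": 1, "total_tokens": 12},
)
second = AIMessageChunk(
    content="e",
    usage_metadata={"input_tokens": 0, "output_tokens": 1, "total_tokens": 1},
)
merged = first + second
print(merged.usage_metadata)
# -> {'input_tokens': 11, 'output_tokens': 2, 'total_tokens': 13}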
Where is _stream() defined?
_stream() is defined in libs/standard-tests/tests/unit_tests/custom_chat_model.py at lines 105–168, as a method of the ChatParrotLink class.
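As the docstring above notes, models that cannot stream should simply omit _stream(); BaseChatModel then serves .stream() requests from _generate(), yielding the whole response as a single chunk. The sketch below illustrates that fallback with a hypothetical EchoOnlyModel, not part of the langchain source.

from typing import Any

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.outputs import ChatGeneration, ChatResult

class EchoOnlyModel(BaseChatModel):
    """Hypothetical model implementing _generate but not _stream."""

    @property
    def _llm_type(self) -> str:
        return "echo-only"

    def _generate(
        self,
        messages: list[BaseMessage],
        stop: list[str] | None = None,
        run_manager: CallbackManagerForLLMRun | None = None,
        **kwargs: Any,
    ) -> ChatResult:
        # Echo the last message back in one non-streamed response.
        text = str(messages[-1].content)
        return ChatResult(
            generations=[ChatGeneration(message=AIMessage(content=text))]
        )

# With no _stream() defined, .stream() falls back to _generate() and
# yields the entire output as a single chunk.
for chunk in EchoOnlyModel().stream("hi"):
    print(chunk.content)  # prints "hi" once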