
Hashing

transcript_indexer.hashing

canonical_conversation_text(turns)

Stable text representation of a conversation for content hashing.

Format: one line per turn, "<idx>\t<speaker>\t<text>", followed by a trailing newline. Turns are emitted in the order given (the caller sorts by idx).
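
Because the function emits turns exactly in the order it receives them, the same conversation passed in a different order yields different text. The sketch below repeats the documented one-line body so it runs on its own; the sample turns are hypothetical.

```python
from collections.abc import Iterable


def canonical_conversation_text(turns: Iterable[tuple[int, str, str]]) -> str:
    # Same body as the documented function: one "<idx>\t<speaker>\t<text>" line per turn.
    return "\n".join(f"{idx}\t{speaker}\t{text}" for idx, speaker, text in turns) + "\n"


# Hypothetical turns, deliberately supplied out of order.
turns = [(1, "assistant", "Hi there."), (0, "user", "Hello")]

# The function does not sort; the caller is expected to sort by idx first.
ordered = canonical_conversation_text(sorted(turns, key=lambda t: t[0]))
unordered = canonical_conversation_text(turns)

print(repr(ordered))  # '0\tuser\tHello\n1\tassistant\tHi there.\n'
assert ordered != unordered
```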

Source code in src/transcript_indexer/hashing.py
from collections.abc import Iterable


def canonical_conversation_text(turns: Iterable[tuple[int, str, str]]) -> str:
    """Stable text representation of a conversation for content hashing.

    Format: one line per turn, "<idx>\\t<speaker>\\t<text>", trailing newline.
    Turns are emitted in the order given (caller sorts by idx).
    """
    return "\n".join(f"{idx}\t{speaker}\t{text}" for idx, speaker, text in turns) + "\n"
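
The page does not show how the canonical text is turned into a hash. As a minimal sketch, assuming UTF-8 encoding and SHA-256 (neither is confirmed by the source), a caller might sort by idx and digest the result. `conversation_hash` is a hypothetical helper, not part of `transcript_indexer.hashing`.

```python
import hashlib
from collections.abc import Iterable


def canonical_conversation_text(turns: Iterable[tuple[int, str, str]]) -> str:
    """Copy of the documented function so this sketch runs on its own."""
    return "\n".join(f"{idx}\t{speaker}\t{text}" for idx, speaker, text in turns) + "\n"


def conversation_hash(turns: Iterable[tuple[int, str, str]]) -> str:
    """Hypothetical helper: SHA-256 hex digest of the canonical text.

    Assumes UTF-8 and SHA-256; the real indexer may choose differently.
    """
    # Sort by idx here, since canonical_conversation_text emits turns as given.
    ordered = sorted(turns, key=lambda turn: turn[0])
    text = canonical_conversation_text(ordered)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = [(0, "user", "Hello"), (1, "assistant", "Hi there.")]
b = list(reversed(a))  # same conversation, different input order

# Sorting inside the helper makes the digest independent of input order.
assert conversation_hash(a) == conversation_hash(b)
print(conversation_hash(a)[:12])
```

Fields are joined with tabs and newlines without escaping, so turn text that itself contains those characters can collide with a different conversation: `[(0, "u", "a\n1\tv\tb")]` and `[(0, "u", "a"), (1, "v", "b")]` both produce `"0\tu\ta\n1\tv\tb\n"`. Whether that matters depends on the transcripts being indexed.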