Streaming Message Drafts (stream)
This plugin lets you stream long text messages to Telegram. Any iterator of string snippets can be streamed right into any private chat.
For example, you can make LLM output appear gradually while generating the response.
Quickstart
The plugin installs `ctx.replyWithStream` on the context object.
Streaming messages performs many API calls very rapidly. It is strongly recommended to use the auto-retry plugin alongside the stream plugin.
import { Bot, type Context } from "grammy";
import { autoRetry } from "@grammyjs/auto-retry";
import { stream, type StreamFlavor } from "@grammyjs/stream";
type MyContext = StreamFlavor<Context>;
const bot = new Bot<MyContext>("");
bot.api.config.use(autoRetry()); // strongly recommended!
bot.use(stream());
async function* slowText() {
// emulate slow text generation
yield "This is som";
await new Promise((r) => setTimeout(r, 2000));
yield "e slowly gen";
await new Promise((r) => setTimeout(r, 2000));
yield "erated text";
}
// Telegram only supports streaming in private chats.
bot.chatType("private")
.command("stream", async (ctx) => {
// Stream the message!
await ctx.replyWithStream(slowText());
});
bot.start();
const { Bot } = require("grammy");
const { autoRetry } = require("@grammyjs/auto-retry");
const { stream } = require("@grammyjs/stream");
const bot = new Bot("");
bot.api.config.use(autoRetry()); // strongly recommended!
bot.use(stream());
async function* slowText() {
// emulate slow text generation
yield "This is som";
await new Promise((r) => setTimeout(r, 2000));
yield "e slowly gen";
await new Promise((r) => setTimeout(r, 2000));
yield "erated text";
}
// Telegram only supports streaming in private chats.
bot.chatType("private")
.command("stream", async (ctx) => {
// Stream the message!
await ctx.replyWithStream(slowText());
});
bot.start();
import { Bot, type Context } from "https://deno.land/x/grammy@v1.42.0/mod.ts";
import { autoRetry } from "https://deno.land/x/grammy_auto_retry@v2.0.2/mod.ts";
import {
stream,
type StreamFlavor,
} from "https://deno.land/x/grammy_stream@v1.0.1/mod.ts";
type MyContext = StreamFlavor<Context>;
const bot = new Bot<MyContext>("");
bot.api.config.use(autoRetry()); // strongly recommended!
bot.use(stream());
async function* slowText() {
// emulate slow text generation
yield "This is som";
await new Promise((r) => setTimeout(r, 2000));
yield "e slowly gen";
await new Promise((r) => setTimeout(r, 2000));
yield "erated text";
}
// Telegram only supports streaming in private chats.
bot.chatType("private")
.command("stream", async (ctx) => {
// Stream the message!
await ctx.replyWithStream(slowText());
});
bot.start();
That’s it!
LLM Integration
Most LLM integrations let you stream the output while it is being generated. You can use this plugin to make the LLM output appear gradually in any private chat.
For example, if you use the AI SDK, your setup could look like this:
import { streamText } from "ai";
import { google } from "@ai-sdk/google";
bot.chatType("private")
.command("credits", async (ctx) => {
// Send prompt to LLM:
const { textStream } = streamText({
model: google("gemini-2.5-flash"),
prompt: "How cool are grammY bots?",
});
// Automatically stream response with grammY:
await ctx.replyWithStream(textStream);
});
import { streamText } from "npm:ai";
import { google } from "npm:@ai-sdk/google";
bot.chatType("private")
.command("credits", async (ctx) => {
// Send prompt to LLM:
const { textStream } = streamText({
model: google("gemini-2.5-flash"),
prompt: "How cool are grammY bots?",
});
// Automatically stream response with grammY:
await ctx.replyWithStream(textStream);
});
Make sure to replace gemini-2.5-flash with whatever the latest model is.
Streaming Formatted Messages
This is much harder than you think.
- LLMs generate probabilistic Markdown. It is often correct, but sometimes not. It follows no specific standard. In particular, they do not always generate Telegram-compatible Markdown. This means that trying to send/stream it to Telegram will fail.
- LLMs generate partial Markdown entities. Even if the output is perfectly aligned with Telegram’s MarkdownV2 specification, individual output chunks might be broken. If you open a section of italic text but only close it in the next chunk, the streaming will crash and no message will be sent.
- LLMs sometimes generate formatting that is not supported by Telegram (even if you instruct them not to). For example, most LLMs love tables, bullet points, and enumerations. Telegram clients cannot render these things.
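To make the partial-entity problem concrete, here is a minimal sketch (not part of the plugin) of a re-chunking helper that withholds chunks until simple MarkdownV2 markers such as `*` and `_` are balanced. It only counts markers and ignores escaping and nesting, so a real solution would need an actual parser.

```typescript
// Sketch: only emit chunks once simple MarkdownV2 markers are balanced.
// This ignores escaping and nested entities; a real parser is more involved.
function isBalanced(text: string): boolean {
  // An even count of each marker means every opened entity was closed.
  for (const marker of ["*", "_", "~", "`"]) {
    let count = 0;
    for (const ch of text) if (ch === marker) count++;
    if (count % 2 !== 0) return false;
  }
  return true;
}

// Re-chunk a stream so that every emitted chunk ends on balanced markers.
async function* balancedChunks(
  source: AsyncIterable<string> | Iterable<string>,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const chunk of source) {
    buffer += chunk;
    if (isBalanced(buffer)) {
      yield buffer;
      buffer = "";
    }
  }
  if (buffer.length > 0) yield buffer; // flush the rest, balanced or not
}
```

You could then pass `balancedChunks(textStream)` to `ctx.replyWithStream` instead of the raw stream. Note that this only delays broken chunks until they become valid; it cannot repair Markdown that never closes an entity.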
Telegram also accepts HTML formatting. This has the exact same problems as Markdown. Also, HTML output consumes a lot more tokens, which is needlessly expensive.
So … what now?
Unfortunately, there is no good solution. However, here are some ideas:
- Tell your LLM to output text without formatting
- Hope that your LLM does not make mistakes in generating Markdown, and simply retry with plain text if it fails
- Use HTML formatting and hope that this improves things a bit
- Write a custom transformer function which retries failing requests automatically
- Use a streaming Markdown parser and build your own MessageEntity arrays for formatting each MessageDraft piece
- Stream Markdown in plain text and then use a regular Markdown parser to apply the formatting only after the stream is complete and all messages are sent
- Come up with a genius solution that nobody else has thought of before, and tell us about it in the group chat