Streaming Message Drafts (stream)
This plugin lets you stream long text messages to Telegram. Any iterator of string snippets can be streamed right into any private chat.
For example, you can make LLM output appear gradually while generating the response.
Quickstart
The plugin installs `ctx.replyWithStream` on the context object.
Streaming messages performs many API calls very rapidly. It is strongly recommended to use the auto-retry plugin alongside the stream plugin.
import { Bot, type Context } from "grammy";
import { autoRetry } from "@grammyjs/auto-retry";
import { stream, type StreamFlavor } from "@grammyjs/stream";
type MyContext = StreamFlavor<Context>;
const bot = new Bot<MyContext>("");
bot.api.config.use(autoRetry()); // strongly recommended!
bot.use(stream());
async function* slowText() {
// emulate slow text generation
yield "This is som";
await new Promise((r) => setTimeout(r, 2000));
yield "e slowly gen";
await new Promise((r) => setTimeout(r, 2000));
yield "erated text";
}
// Telegram only supports streaming in private chats.
bot.chatType("private")
.command("stream", async (ctx) => {
// Stream the message!
await ctx.replyWithStream(slowText());
});
bot.start();
const { Bot } = require("grammy");
const { autoRetry } = require("@grammyjs/auto-retry");
const { stream } = require("@grammyjs/stream");
const bot = new Bot("");
bot.api.config.use(autoRetry()); // strongly recommended!
bot.use(stream());
async function* slowText() {
// emulate slow text generation
yield "This is som";
await new Promise((r) => setTimeout(r, 2000));
yield "e slowly gen";
await new Promise((r) => setTimeout(r, 2000));
yield "erated text";
}
// Telegram only supports streaming in private chats.
bot.chatType("private")
.command("stream", async (ctx) => {
// Stream the message!
await ctx.replyWithStream(slowText());
});
bot.start();
import { Bot, type Context } from "https://deno.land/x/grammy@v1.42.0/mod.ts";
import { autoRetry } from "https://deno.land/x/grammy_auto_retry@v2.0.2/mod.ts";
import {
stream,
type StreamFlavor,
} from "https://deno.land/x/grammy_stream@v1.0.1/mod.ts";
type MyContext = StreamFlavor<Context>;
const bot = new Bot<MyContext>("");
bot.api.config.use(autoRetry()); // strongly recommended!
bot.use(stream());
async function* slowText() {
// emulate slow text generation
yield "This is som";
await new Promise((r) => setTimeout(r, 2000));
yield "e slowly gen";
await new Promise((r) => setTimeout(r, 2000));
yield "erated text";
}
// Telegram only supports streaming in private chats.
bot.chatType("private")
.command("stream", async (ctx) => {
// Stream the message!
await ctx.replyWithStream(slowText());
});
bot.start();
That’s it!
LLM Integration
Most LLM integrations let you stream the output while it is being generated. You can use this plugin to make the LLM output appear gradually in any private chat.
For example, if you use the AI SDK, your setup could look like this:
import { streamText } from "ai";
import { google } from "@ai-sdk/google";
bot.chatType("private")
.command("credits", async (ctx) => {
// Send prompt to LLM:
const { textStream } = streamText({
model: google("gemini-2.5-flash"),
prompt: "How cool are grammY bots?",
});
// Automatically stream response with grammY:
await ctx.replyWithStream(textStream);
});
import { streamText } from "npm:ai";
import { google } from "npm:@ai-sdk/google";
bot.chatType("private")
.command("credits", async (ctx) => {
// Send prompt to LLM:
const { textStream } = streamText({
model: google("gemini-2.5-flash"),
prompt: "How cool are grammY bots?",
});
// Automatically stream response with grammY:
await ctx.replyWithStream(textStream);
});
Make sure to replace gemini-2.5-flash with whatever the latest model is.
Streaming Formatted Messages
This is much harder than you think.
- LLMs generate probabilistic Markdown. It is often correct, but sometimes not. It follows no specific standard. In particular, they do not always generate Telegram-compatible Markdown. This means that trying to send/stream it to Telegram will fail.
- LLMs generate partial Markdown entities. Even if the output is perfectly aligned with Telegram’s MarkdownV2 specification, individual output chunks might be broken. If you open a section of italic text but only close it in the next chunk, the streaming will crash and no message will be sent.
- LLMs sometimes generate formatting that is not supported by Telegram (even if you instruct them not to). For example, most LLMs love tables, bullet points, and enumerations. Telegram clients cannot render these things.
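To make the partial-entity problem concrete, here is a minimal sketch (not part of the plugin) of a re-chunking helper that withholds chunks until simple MarkdownV2 markers such as `*` and `_` are balanced. It only counts markers and ignores escaping and nesting, so a real solution would need an actual parser.

```typescript
// Sketch: only emit chunks once simple MarkdownV2 markers are balanced.
// This ignores escaping and nested entities; a real parser is more involved.
function isBalanced(text: string): boolean {
  // An even count of each marker means every opened entity was closed.
  for (const marker of ["*", "_", "~", "`"]) {
    let count = 0;
    for (const ch of text) if (ch === marker) count++;
    if (count % 2 !== 0) return false;
  }
  return true;
}

// Re-chunk a stream so that every emitted chunk ends on balanced markers.
async function* balancedChunks(
  source: AsyncIterable<string> | Iterable<string>,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const chunk of source) {
    buffer += chunk;
    if (isBalanced(buffer)) {
      yield buffer;
      buffer = "";
    }
  }
  if (buffer.length > 0) yield buffer; // flush the rest, balanced or not
}
```

You could then pass `balancedChunks(textStream)` to `ctx.replyWithStream` instead of the raw stream. Note that this only delays broken chunks until they become valid; it cannot repair Markdown that never closes an entity.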
Telegram also accepts HTML formatting. This has the exact same problems as Markdown. Also, HTML output consumes a lot more tokens, which is needlessly expensive.
So … what now?
Unfortunately, there is no good solution. However, here are some ideas:
- Tell your LLM to output text without formatting
- Hope that your LLM does not make mistakes in generating Markdown, and simply retry with plain text if it fails
- Use HTML formatting and hope that this improves things a bit
- Write a custom transformer function which retries failing requests automatically
- Use a streaming Markdown parser and build your own MessageEntity arrays for formatting each MessageDraft piece
- Stream Markdown in plain text and then use a regular Markdown parser to apply the formatting only after the stream is complete and all messages are sent
- Come up with a genius solution that nobody else has thought of before, and tell us about it in the group chat