When developing Telegram bots, correct message formatting is crucial for a pleasant user experience. Telegram's MarkdownV2 offers rich formatting options, but improperly escaped special characters can cause Telegram to reject a message or render it incorrectly. The Python class presented here escapes and formats text according to Telegram's MarkdownV2 rules so that messages are parsed as intended.
MarkdownV2 extends the basic Markdown syntax with additional formatting options and stricter rules for special characters. To maintain message integrity, certain characters must be escaped using a preceding backslash (\). In ordinary text, the characters that require escaping are:
_ * [ ] ( ) ~ ` > # + - = | { } . !
Additionally, within specific contexts like code blocks and URLs, a different and smaller set of characters requires escaping:
Code (inline code and code blocks): ` and \ must be escaped.
URLs (the part of an inline link inside the parentheses): ) and \ must be escaped.
The MarkdownV2Escaper class is designed to handle the escaping of special characters for Telegram's MarkdownV2 formatting in each of these contexts, so that your messages keep their intended formatting without causing parsing errors.
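Before turning to the class itself, the three escaping contexts described above can be checked in isolation. The following is a minimal sketch, not part of the class: the function names (escape_plain, escape_in_code, escape_in_url) are hypothetical, and it uses str.translate with a lookup table for ordinary text instead of a regular expression.

```python
# Characters that must be escaped in ordinary MarkdownV2 text,
# plus the backslash itself.
PLAIN_SPECIALS = "_*[]()~`>#+-=|{}.!\\"

# Translation table mapping each special character to its escaped form.
_PLAIN_TABLE = str.maketrans({ch: "\\" + ch for ch in PLAIN_SPECIALS})


def escape_plain(text: str) -> str:
    """Escape ordinary message text."""
    return text.translate(_PLAIN_TABLE)


def escape_in_code(code: str) -> str:
    """Inside code, only the backslash and the backtick are escaped."""
    return code.replace("\\", "\\\\").replace("`", "\\`")


def escape_in_url(url: str) -> str:
    """Inside the (...) part of a link, only '\\' and ')' are escaped."""
    return url.replace("\\", "\\\\").replace(")", "\\)")


print(escape_plain("1+1=2!"))                      # 1\+1\=2\!
print(escape_in_code("print(`x`)"))                # print(\`x\`)
print(escape_in_url("https://example.com/a_(b)"))  # https://example.com/a_(b\)
```

In each helper the backslash is handled first (or in the same pass), so the backslashes added while escaping are never escaped a second time.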
The class comprises several methods, each responsible for handling different parts of the text formatting process. Below is the complete implementation:
import re
from typing import Dict, List, Optional


class MarkdownV2Escaper:
    """
    A utility class to properly escape strings for use in Telegram's MarkdownV2 formatting.
    """

    # Characters requiring escaping in ordinary MarkdownV2 text (backslash included)
    SPECIAL_CHARACTERS = r"_*[]()~`>#+-=|{}.!\\"

    def escape_text(self, text: str) -> str:
        """
        Escapes all special characters in the text for Telegram MarkdownV2.
        """
        # Use regex to prepend a backslash to each special character
        escaped_text = re.sub(f'([{re.escape(self.SPECIAL_CHARACTERS)}])', r'\\\1', text)
        return escaped_text

    def escape_code(self, code: str) -> str:
        """
        Escapes backticks and backslashes in code blocks for Telegram MarkdownV2.
        """
        # Backslashes go first so the ones added for backticks are not doubled
        escaped_code = code.replace('\\', '\\\\').replace('`', '\\`')
        return escaped_code

    def escape_link(self, url: str) -> str:
        """
        Escapes closing parentheses and backslashes in URLs for Telegram MarkdownV2.
        """
        escaped_url = url.replace('\\', '\\\\').replace(')', '\\)')
        return escaped_url

    def format_bold(self, text: str) -> str:
        """
        Formats text as bold.
        """
        return f"*{self.escape_text(text)}*"

    def format_italic(self, text: str) -> str:
        """
        Formats text as italic.
        """
        return f"_{self.escape_text(text)}_"

    def format_underline(self, text: str) -> str:
        """
        Formats text as underlined.
        """
        return f"__{self.escape_text(text)}__"

    def format_strikethrough(self, text: str) -> str:
        """
        Formats text with strikethrough.
        """
        return f"~{self.escape_text(text)}~"

    def format_spoiler(self, text: str) -> str:
        """
        Formats text as a spoiler.
        """
        return f"||{self.escape_text(text)}||"

    def format_inline_code(self, code: str) -> str:
        """
        Formats text as inline code.
        """
        return f"`{self.escape_code(code)}`"

    def format_code_block(self, code: str, language: str = "") -> str:
        """
        Formats text as a code block with an optional language specifier.
        """
        if language:
            # The language name follows the opening fence directly
            return f"```{language}\n{self.escape_code(code)}\n```"
        return f"```\n{self.escape_code(code)}\n```"

    def format_inline_link(self, text: str, url: str) -> str:
        """
        Formats text as an inline link.
        """
        escaped_url = self.escape_link(url)
        escaped_text = self.escape_text(text)
        return f"[{escaped_text}]({escaped_url})"

    def format_message(self, text: str,
                       code_blocks: Optional[List[str]] = None,
                       inline_links: Optional[Dict[str, str]] = None) -> str:
        """
        Formats an entire message with proper escaping for Telegram MarkdownV2.

        :param text: The main text, as plain (unformatted) text.
        :param code_blocks: Code snippets that occur in the text and should become code blocks.
        :param inline_links: A mapping of link text occurring in the text to its URL.
        :return: The fully formatted and escaped message.
        """
        formatted_text = self.escape_text(text)
        if code_blocks:
            for code in code_blocks:
                escaped_code = self.escape_code(code)
                # The text is already escaped, so search for the escaped form of the snippet
                formatted_text = formatted_text.replace(self.escape_text(code), f"```\n{escaped_code}\n```")
        if inline_links:
            for placeholder, url in inline_links.items():
                escaped_url = self.escape_link(url)
                # Likewise, the link text appears in its escaped form
                label = self.escape_text(placeholder)
                formatted_text = formatted_text.replace(label, f"[{label}]({escaped_url})")
        return formatted_text
escape_text method: Uses a regular expression to identify and escape all special characters in the input text. This ensures that characters like * and _ are treated as literal characters rather than formatting commands.
escape_code method: Specifically targets backticks and backslashes within code snippets, escaping them to maintain the integrity of the code blocks.
escape_link method: Handles the escaping of closing parentheses and backslashes within URLs, which is essential for maintaining valid link syntax in MarkdownV2.
Formatting methods: The format_bold, format_italic, format_underline, format_strikethrough, format_spoiler, format_inline_code, format_code_block, and format_inline_link methods provide convenient ways to apply specific MarkdownV2 formatting to text segments.
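format_message method: Formats a whole message in one call. It escapes the main text, then turns each listed code snippet that occurs in the text into a code block and each listed link text into an inline link. The main text is treated as plain text, so any Markdown syntax it already contains is escaped and shown literally.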
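Because format_message locates code snippets and link text by searching the message for substrings, a snippet or label that also occurs elsewhere in the text is replaced there too. One possible alternative, sketched below, is to build the message from an explicit list of segments so that each part is escaped according to its own context. This is only an illustration under assumptions: the render function and the ("text", ...), ("code", ...), and ("link", ...) tuple layout are hypothetical and not part of the MarkdownV2Escaper class.

```python
import re
from typing import List, Tuple

# Characters that must be escaped in ordinary MarkdownV2 text.
_SPECIALS = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")


def escape_text(text: str) -> str:
    """Prefix every special character with a backslash."""
    return _SPECIALS.sub(r"\\\1", text)


def render(segments: List[Tuple[str, ...]]) -> str:
    """Render ("text", s), ("code", s) and ("link", label, url) segments."""
    parts = []
    for kind, *values in segments:
        if kind == "text":
            parts.append(escape_text(values[0]))
        elif kind == "code":
            # Inside code only backslashes and backticks are escaped.
            code = values[0].replace("\\", "\\\\").replace("`", "\\`")
            parts.append(f"```\n{code}\n```")
        elif kind == "link":
            label, url = values
            # Inside the URL only backslashes and ')' are escaped.
            url = url.replace("\\", "\\\\").replace(")", "\\)")
            parts.append(f"[{escape_text(label)}]({url})")
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return "".join(parts)


message = render([
    ("text", "Run this (v1.0):\n"),
    ("code", "print('hi')"),
    ("text", "\nSee "),
    ("link", "the docs", "http://example.com/a_(b)"),
    ("text", "."),
])
print(message)
```

The trade-off is that the caller has to split the message into segments up front, but no part of the text is ever escaped twice or matched by accident.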
Below is an example demonstrating how to use the MarkdownV2Escaper class to format a message that contains special characters, a code block, and an inline link:
from markdownv2_escaper import MarkdownV2Escaper

escaper = MarkdownV2Escaper()
# Define plain text with special characters; it is escaped as a whole
text = (
    "This is a *bold* text with _italic_ and __underline__.\n"
    "import this\n"
    "Then follow the link."
)
# Define the code snippets in the text that should become code blocks
code_blocks = [
    "import this"
]
# Define which words in the text become inline links
inline_links = {
    "link": "http://example.com"
}
# Format the message
formatted_message = escaper.format_message(text, code_blocks, inline_links)
print(formatted_message)
The code above prints the following MarkdownV2 message. The asterisks, underscores, and periods in the input are escaped, so Telegram shows them as literal characters rather than treating them as formatting:
This is a \*bold\* text with \_italic\_ and \_\_underline\_\_\.
```
import this
```
Then follow the [link](http://example.com)\.
While the MarkdownV2Escaper class covers the majority of scenarios, certain edge cases require special attention to prevent formatting conflicts:
MarkdownV2 allows formatting entities to be nested, but an italic marker placed next to an underline marker is ambiguous. Telegram always reads __ greedily, from left to right, as the start or end of an underline entity, so a closing ___ is parsed as __ followed by _. When italic is the inner entity, this closes the two entities in the wrong order. For example:
Instead of writing:
___italic underline___
Use:
___italic underline_**__
The empty bold entity (**) acts as a separator between the closing italic marker and the closing underline marker, so each entity is closed in the intended order.
When applying multiple formatting styles to the same text segment, keep in mind that every format_* method of the MarkdownV2Escaper class escapes its input. Passing the result of one format_* call to another therefore escapes the inner markers, and Telegram shows them as literal characters. To nest styles, escape the text once with escape_text and add the markers yourself, closing them in the reverse of the order in which they were opened.
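As a rough illustration of nesting, the sketch below escapes the text once and then wraps it in the markers for two styles. The helper names (bold_italic, italic_underline) are hypothetical and not part of the MarkdownV2Escaper class. The second helper inserts an empty bold entity (**) between the closing italic and underline markers, because Telegram otherwise reads a run of three underscores greedily as __ followed by _.

```python
import re

# Characters that must be escaped in ordinary MarkdownV2 text.
_SPECIALS = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")


def escape_text(text: str) -> str:
    """Prefix every special character with a backslash."""
    return _SPECIALS.sub(r"\\\1", text)


def bold_italic(text: str) -> str:
    """Bold on the outside, italic on the inside; the text is escaped once."""
    inner = escape_text(text)
    return f"*_{inner}_*"


def italic_underline(text: str) -> str:
    """Underline on the outside, italic on the inside."""
    inner = escape_text(text)
    # Opening: "__" starts the underline, "_" starts the italic.
    # Closing: "_" ends the italic, "**" is an empty bold separator,
    # and "__" ends the underline.
    return f"___{inner}_**__"


print(bold_italic("Hi!"))        # *_Hi\!_*
print(italic_underline("a_b"))   # ___a\_b_**__
```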
To use the MarkdownV2Escaper class within a Telegram bot, integrate it with your message-sending logic as follows:
# Written against python-telegram-bot v13.x. From v20 on, ParseMode lives in
# telegram.constants and Bot.send_message is a coroutine that must be awaited.
from telegram import Bot, ParseMode
from markdownv2_escaper import MarkdownV2Escaper

# Initialize the bot with your token
bot = Bot(token='YOUR_TELEGRAM_BOT_TOKEN')
# Create an instance of the escaper
escaper = MarkdownV2Escaper()
# Define your message components as plain, unformatted strings
greeting = "Hello, user! Here is a function you can try:"
code_block = "def greet():\n    print('Hello, user!')"
# Escape the text, render the code block, and join the two parts
formatted_message = escaper.escape_text(greeting) + "\n" + escaper.format_code_block(code_block)
# Send the message
bot.send_message(
    chat_id='CHAT_ID',
    text=formatted_message,
    parse_mode=ParseMode.MARKDOWN_V2
)
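If you would rather not depend on a bot library, the same message can be sent by calling the Bot API's sendMessage method over HTTPS with parse_mode set to "MarkdownV2". The sketch below only builds the request using the standard library. The helper name build_send_message_request is hypothetical, the token and chat ID are placeholders, and error handling is left out.

```python
import json
import urllib.request


def build_send_message_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a sendMessage request for MarkdownV2 text."""
    payload = {
        "chat_id": chat_id,
        "text": text,  # must already be escaped for MarkdownV2
        "parse_mode": "MarkdownV2",
    }
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_send_message_request(
    "YOUR_TELEGRAM_BOT_TOKEN", "CHAT_ID", "Hello, user\\!"
)
# To actually send it, pass the request to urllib.request.urlopen(request).
print(request.full_url)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload, for example to confirm that the text is escaped before anything is sent.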
Properly escaping and formatting messages using Telegram's MarkdownV2 is essential for maintaining the readability and professionalism of your bot's communications. The MarkdownV2Escaper class covers the escaping rules for ordinary text, code, and link URLs, and wraps the common formatting entities in small helper methods. By integrating this class into your Telegram bot workflow, you can avoid most parsing errors and keep your messages rendering as intended.