Telegram's MarkdownV2 parse mode offers versatile formatting options for messages sent via the Telegram Bot API. However, correctly escaping special characters is crucial to prevent unintended formatting and ensure that messages appear as intended. This guide provides an in-depth Python class implementation that intelligently handles escaping within MarkdownV2 syntax, ensuring that only non-syntax characters are escaped.
MarkdownV2 is an enhanced version of Markdown supported by Telegram's Bot API, allowing for rich text formatting. It supports various styles such as bold, italic, underline, strikethrough, spoiler, inline links, and code blocks. Proper usage of MarkdownV2 enables developers to create visually appealing and well-structured messages.
Feature | MarkdownV2 Syntax | Example |
---|---|---|
Bold | *bold text* | *bold text* |
Italic | _italic text_ | _italic text_ |
Underline | __underline__ | __underline__ |
Strikethrough | ~strikethrough~ | ~strikethrough~ |
Spoiler | ||spoiler|| | ||spoiler|| |
Inline URL | [text](http://example.com) | [Google](http://google.com) |
Inline Code | `code` | `print("Hello")` |
Code Block | ```code block``` | ```python print("Hello, World!") ``` |
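While MarkdownV2 provides robust formatting capabilities, it complicates message construction because special characters must be escaped. Improper escaping can lead to broken formatting, unintended styles, or the Bot API rejecting the message outright. The characters _ * [ ] ( ) ~ ` > # + - = | { } . ! must each be preceded by a backslash when they appear as literal text. Inside inline code and code blocks only the backtick and the backslash need escaping, and inside the (...) part of an inline link only ) and the backslash do.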
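Telegram's escaping rules depend on context: ordinary text, code, and link targets each reserve a different set of characters. The following is a minimal sketch of those three cases. The helper names `escape_plain`, `escape_code`, and `escape_link_url` are hypothetical and not part of any Telegram library, and treating the backslash as reserved in ordinary text is an assumption made here so that literal backslashes survive.

```python
import re

# Reserved characters in ordinary MarkdownV2 text, plus the backslash itself
# (an assumption of this sketch, so that a literal backslash stays literal).
_PLAIN_SPECIALS = re.compile(r'([\\_*\[\]()~`>#+\-=|{}.!])')


def escape_plain(text: str) -> str:
    """Escape every reserved character in ordinary text."""
    return _PLAIN_SPECIALS.sub(r'\\\1', text)


def escape_code(text: str) -> str:
    """Inside inline code or a code block, only '`' and '\\' are escaped."""
    return re.sub(r'([`\\])', r'\\\1', text)


def escape_link_url(url: str) -> str:
    """Inside the (...) part of an inline link, only ')' and '\\' are escaped."""
    return re.sub(r'([)\\])', r'\\\1', url)


print(escape_plain("Price: 1.5 (approx.)"))
print(escape_code("a`b"))
print(escape_link_url("https://example.com/a_(b)"))
```

Keeping the three contexts in separate functions makes it harder to over-escape code or under-escape plain text by accident.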
Below is a comprehensive Python class named MarkdownV2Escaper designed to handle escaping in MarkdownV2 formatted strings for the Telegram API. This class detects Markdown syntax elements and escapes only the necessary characters outside these elements.
import re


class MarkdownV2Escaper:
    """
    A class to escape text for Telegram MarkdownV2 formatting.
    It ensures that only characters outside MarkdownV2 syntax are escaped.
    """

    def __init__(self):
        # Special characters that need to be escaped in MarkdownV2.
        # The backslash itself is included so literal backslashes stay literal.
        self.special_chars = '\\_*[]()~`>#+-=|{}.!'
        # Regex patterns for MarkdownV2 elements
        self.markdown_patterns = [
            r'\*[^*]+\*',                  # Bold
            r'_[^_]+_',                    # Italic
            r'__[^_]+__',                  # Underline
            r'~[^~]+~',                    # Strikethrough
            r'\|\|[^|]+\|\|',              # Spoiler
            r'\[([^\]]+)\]\(([^)]+)\)',    # Inline URL
            r'`[^`]+`',                    # Inline code
            r'```(?:[^`]*?)```',           # Code blocks
            r'```python\n[\s\S]*?\n```',   # Python code blocks
        ]
        # Combine all patterns into a single regex
        self.combined_pattern = re.compile('|'.join(self.markdown_patterns))

    def escape(self, text: str) -> str:
        """
        Escapes special characters in the text that are not part of MarkdownV2 syntax.

        Args:
            text (str): The input string with MarkdownV2 formatting.

        Returns:
            str: The escaped string safe for Telegram API.
        """
        if not text:
            return text
        # Find all MarkdownV2 syntax matches
        matches = list(self.combined_pattern.finditer(text))
        # Collected output pieces and the end offset of the previous match
        escaped_text = []
        last_end = 0
        for match in matches:
            start, end = match.start(), match.end()
            # Escape non-Markdown text before the current match
            if last_end < start:
                non_markdown_part = text[last_end:start]
                escaped_non_markdown = self._escape_non_markdown(non_markdown_part)
                escaped_text.append(escaped_non_markdown)
            # Append the Markdown syntax without escaping
            escaped_text.append(match.group())
            last_end = end
        # Escape any remaining non-Markdown text after the last match
        if last_end < len(text):
            remaining_text = text[last_end:]
            escaped_remaining = self._escape_non_markdown(remaining_text)
            escaped_text.append(escaped_remaining)
        return ''.join(escaped_text)

    def _escape_non_markdown(self, text: str) -> str:
        """
        Escapes special characters in non-Markdown text.

        Args:
            text (str): Text outside of Markdown syntax.

        Returns:
            str: Escaped text.
        """
        # Prefix each special character with a backslash; copy the rest as-is
        return ''.join(
            '\\' + char if char in self.special_chars else char
            for char in text
        )
The MarkdownV2Escaper class is structured to efficiently process and escape text intended for Telegram messages using MarkdownV2 formatting. Here's a breakdown of its components:
Upon instantiation, the class stores the special characters that require escaping in MarkdownV2 and joins the per-element patterns with | into a single compiled regular expression that matches any supported MarkdownV2 syntax element.
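To illustrate how joining patterns with `|` behaves, here is a small sketch that combines just two of the class's patterns and lists the spans `finditer` reports. The sample sentence is made up for the illustration.

```python
import re

# Two of the article's patterns, joined with '|' the same way __init__ does.
patterns = [
    r'\*[^*]+\*',   # bold
    r'~[^~]+~',     # strikethrough
]
combined = re.compile('|'.join(patterns))

sample = "Hi *there*, that's ~old~ news."

# Each match carries the start/end offsets that the escaper later uses
# to decide which parts of the string are markup and which are plain text.
spans = [(m.start(), m.end(), m.group()) for m in combined.finditer(sample)]
print(spans)
```

The regex engine scans left to right and returns non-overlapping matches, so the earliest element in the text wins regardless of where its pattern sits in the list.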
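The escape method is the core function that processes the input text. It performs the following steps: it finds every MarkdownV2 match with finditer, escapes the plain text between the end of the previous match and the start of the current one, appends the matched syntax unchanged, escapes whatever text remains after the last match, and joins the pieces into the result.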
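The segment walk inside `escape` can be viewed in isolation. The sketch below uses a hypothetical generator, `walk_segments`, that yields each piece of the input together with a flag saying whether it was matched as markup. It is an illustration of the technique, not part of the class.

```python
import re
from typing import Iterator, Tuple


def walk_segments(text: str, pattern: "re.Pattern[str]") -> Iterator[Tuple[str, bool]]:
    """Yield (segment, is_markup) pairs that cover `text` in order."""
    last_end = 0
    for match in pattern.finditer(text):
        # Plain text between the previous match and this one
        if last_end < match.start():
            yield text[last_end:match.start()], False
        # The matched markup itself
        yield match.group(), True
        last_end = match.end()
    # Plain text after the final match
    if last_end < len(text):
        yield text[last_end:], False


bold_only = re.compile(r'\*[^*]+\*')
segments = list(walk_segments("1+1 is *two*.", bold_only))
print(segments)
```

Only the segments flagged `False` would be passed to the escaping step; joining all segments back together reproduces the input exactly.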
The _escape_non_markdown method handles the actual escaping of characters. It iterates through each character in the non-markdown segment and prefixes it with a backslash if it is a special character.
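The same per-character escaping can also be expressed with a translation table. This is a sketch of an alternative, assuming the same character set as the class (Telegram's reserved list plus the backslash); `escape_plain_text` is a hypothetical name. It also shows that the operation is not idempotent, so text should be escaped exactly once.

```python
# Assumed character set: Telegram's reserved list plus the backslash itself.
SPECIAL_CHARS = '\\_*[]()~`>#+-=|{}.!'

# Map each character to its backslash-prefixed form once, up front.
ESCAPE_TABLE = str.maketrans({ch: '\\' + ch for ch in SPECIAL_CHARS})


def escape_plain_text(text: str) -> str:
    """Escape reserved characters in a single pass with str.translate."""
    return text.translate(ESCAPE_TABLE)


once = escape_plain_text("v1.0 (beta)")
twice = escape_plain_text(once)  # escapes the backslashes added by the first pass
print(once)
print(twice)
```

Because the second pass escapes the backslashes introduced by the first, already-escaped input should not be fed through the escaper again.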
from markdown_v2_escaper import MarkdownV2Escaper
escaper = MarkdownV2Escaper()
input_text = "Hello, *world*! Check out [Google](https://www.google.com) and enjoy ~strikethrough~."
escaped_text = escaper.escape(input_text)
print(escaped_text)
Output: Hello, *world*\! Check out [Google](https://www.google.com) and enjoy ~strikethrough~\.
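Once a string is escaped, it is sent with `parse_mode` set to `MarkdownV2`. The sketch below only builds the HTTP request for the Bot API's `sendMessage` method using the standard library and does not send it. The function name `build_send_message_request`, the token, and the chat id are placeholders for illustration.

```python
import json
import urllib.request


def build_send_message_request(token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build (but do not send) a sendMessage request with parse_mode=MarkdownV2."""
    payload = {"chat_id": chat_id, "text": text, "parse_mode": "MarkdownV2"}
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials; substitute a real bot token and chat id before sending.
request = build_send_message_request("123456:TEST-TOKEN", 123456789, "Hello, *world*\\!")
# urllib.request.urlopen(request) would perform the actual call.
```

If the text contains an unescaped reserved character, the Bot API responds with an error instead of delivering the message, which is why escaping happens before this step.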
input_text = (
    "*bold* _italic_ __underline__ ~strikethrough~ ||spoiler|| "
    "*bold _italic bold ~italic bold strikethrough ||italic bold strikethrough spoiler||~ __underline italic bold___ bold* "
    "[inline URL](http://www.example.com/) `inline code` "
    "```\npre-formatted code block\n``` "
    "```python\npre-formatted Python code block\n```"
)
escaped_text = escaper.escape(input_text)
print(escaped_text)
Output: identical to the input. Every special character in this string belongs to a matched MarkdownV2 element, and the text between elements consists only of spaces, so nothing is escaped.
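A regex-based escaper copies matched elements through unchanged, so a reserved character inside an element, such as the period in `*1.5x*`, stays unescaped even though Telegram expects it to be escaped there as well. One alternative is to compose the message from plain-text parts and escape each part as it is wrapped. This is a minimal sketch of that idea; `esc`, `bold`, and `link` are hypothetical helpers, not part of the class above or of any Telegram library.

```python
import re

# Reserved characters in ordinary text, plus the backslash (assumed here).
_RESERVED = re.compile(r'([\\_*\[\]()~`>#+\-=|{}.!])')


def esc(text: str) -> str:
    """Escape reserved characters in plain text."""
    return _RESERVED.sub(r'\\\1', text)


def bold(text: str) -> str:
    """Wrap plain text in bold markers, escaping its contents first."""
    return f"*{esc(text)}*"


def link(label: str, url: str) -> str:
    """Build an inline link; the URL part only needs ')' and '\\' escaped."""
    safe_url = re.sub(r'([)\\])', r'\\\1', url)
    return f"[{esc(label)}]({safe_url})"


message = (
    esc("Speed-up: ")
    + bold("1.5x")
    + esc(" (see ")
    + link("docs", "https://example.com/a")
    + esc(")")
)
print(message)
```

This trades the convenience of post-processing a finished string for knowing, at each step, which context a piece of text is in.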
Properly escaping characters in MarkdownV2 is essential for maintaining the integrity and readability of Telegram messages. The MarkdownV2Escaper class addresses this by separating Markdown syntax from regular text and applying escapes only to the latter, which preserves the intended formatting. Because matched elements are copied through unchanged, reserved characters inside them (for example the period in *1.5x*) are left unescaped and still need to be escaped by the caller.