Use Markdown to format rich content

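This guide gives an overview of how to use Markdown and HTML in TextFields to format chat conversations and to provide basic multi-modal support for images, audio, video, and PDFs.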

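TextField and TextQuestion can render Markdown and HTML when they are created with use_markdown=True. Given the flexibility of HTML, this gives fine control over how data is presented to annotators. The examples below use some out-of-the-box methods for multi-modality and chat templates.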
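As a minimal sketch of where use_markdown=True goes, the snippet below defines settings with one Markdown-enabled field and one Markdown-enabled question. The field name markdown_enabled_field matches the records used later in this guide; the question name comment and the helper build_settings are assumptions made for illustration. The guarded import only lets the sketch load when Argilla is not installed.

```python
try:
    import argilla as rg
except ImportError:  # Argilla not installed; the sketch can still be imported
    rg = None


def build_settings():
    """Return settings with a Markdown/HTML-enabled field and question."""
    if rg is None:
        raise RuntimeError("argilla must be installed to build settings")
    return rg.Settings(
        fields=[
            # The HTML produced by the helpers in this guide goes in this field.
            rg.TextField(name="markdown_enabled_field", use_markdown=True),
        ],
        questions=[
            # Hypothetical question name, chosen for this example.
            rg.TextQuestion(name="comment", use_markdown=True),
        ],
    )
```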

Main methods

image_to_html, audio_to_html, video_to_html, pdf_to_html, chat_to_html

image_to_html("local_image_file.png")

audio_to_html("local_audio_file.mp3")

video_to_html("local_video_file.mp4")

pdf_to_html("local_pdf_file.pdf")

chat_to_html([{"role": "user", "content": "hello"}])

Check the Markdown - Python Reference for the arguments of the rg.markdown methods in more detail.

Tip

You can get quite creative with HTML. For example, think about visualizing graphs and tables. Some Python packages offer useful methods for this, such as pandas.DataFrame.to_html and plotly.io.to_html.
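When pandas is not available, the same idea can be sketched with the standard library alone. The helper rows_to_html_table below is hypothetical (it is not part of Argilla, pandas, or plotly) and only shows the kind of HTML string that can be placed in a Markdown-enabled field.

```python
from html import escape


def rows_to_html_table(headers, rows):
    """Render headers and rows as a plain HTML table, escaping every cell."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


# Example data invented for this sketch.
html = rows_to_html_table(["label", "count"], [["positive", 12], ["negative", 3]])
```

Escaping each cell keeps raw text such as "<b>" from being interpreted as markup when the table is rendered.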

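Multi-modal support: images, audio, video, PDFs, etc.

Argilla offers basic multi-modal support in two ways: local content through DataURLs and hosted content. Each has pros and cons, but both give the same UI experience because both rely on HTML.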

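Local content through DataURLs

A DataURL is a scheme that encodes data as a base64 string that can be embedded directly into HTML. To facilitate this, we offer some functions: image_to_html, audio_to_html, video_to_html, and pdf_to_html. These functions accept either a file path or the file's byte data and return the corresponding HTML to render the media file within the Argilla user interface. You can also set width and height in pixels or as a percentage for video and image (defaults to the original dimensions), and set the autoplay and loop attributes to True for audio and video (defaults to False).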
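To make the DataURL idea concrete, here is a standard-library sketch of the encoding step. It is not Argilla's implementation: bytes_to_data_url and image_bytes_to_img_tag are hypothetical helpers, and the inline style attribute is just one assumed way to apply width and height.

```python
import base64
import mimetypes


def bytes_to_data_url(data, filename):
    """Encode raw bytes as a base64 DataURL, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_bytes_to_img_tag(data, filename, width=None, height=None):
    """Wrap a DataURL in an <img> tag with optional width and height styles."""
    style = "".join(
        f"{name}:{value};"
        for name, value in (("width", width), ("height", height))
        if value
    )
    style_attr = f' style="{style}"' if style else ""
    return f'<img src="{bytes_to_data_url(data, filename)}"{style_attr}>'
```

Because the whole file travels inside the HTML string, no separate media server is needed to display it.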

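Warning

DataURLs increase memory usage beyond the original file size, since base64 encoding makes the payload roughly one third larger. Additionally, different browsers enforce different size limits for rendering DataURLs, which may prevent some users from seeing the content.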
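The size overhead can be estimated before a file is embedded. The sketch below uses only the standard library; base64_size is a hypothetical helper and the 3 MB payload is a stand-in for a real media file.

```python
import base64


def base64_size(n_bytes):
    """Length in characters of the base64 encoding of n_bytes of data."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


payload = bytes(3_000_000)  # stand-in for a 3 MB media file
encoded = base64.b64encode(payload)
overhead = len(encoded) / len(payload)  # ratio of encoded size to original size
```

A check like this can help decide whether a file is better served as hosted content instead.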

Image, audio, video, and PDF examples:

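import argilla as rg
from argilla.markdown import image_to_html

# Encode the local image as a DataURL wrapped in HTML.
html = image_to_html(
    "local_image_file.png",
    width="300px",
    height="300px"
)

# Store the HTML in a field created with use_markdown=True.
rg.Record(
    fields={"markdown_enabled_field": html}
)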

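import argilla as rg
from argilla.markdown import audio_to_html

# Encode the local audio file as a DataURL wrapped in HTML.
html = audio_to_html(
    "local_audio_file.mp3",
    width="300px",
    height="300px",
    autoplay=True,
    loop=True
)

# Store the HTML in a field created with use_markdown=True.
rg.Record(
    fields={"markdown_enabled_field": html}
)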

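import argilla as rg
from argilla.markdown import video_to_html

# Encode the local video file as a DataURL wrapped in HTML.
html = video_to_html(
    "local_video_file.mp4",
    width="300px",
    height="300px",
    autoplay=True,
    loop=True
)

# Store the HTML in a field created with use_markdown=True.
rg.Record(
    fields={"markdown_enabled_field": html}
)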

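import argilla as rg
from argilla.markdown import pdf_to_html

# Encode the local PDF as a DataURL wrapped in HTML.
html = pdf_to_html(
    "local_pdf_file.pdf",
    width="300px",
    height="300px"
)

# Store the HTML in a field created with use_markdown=True.
rg.Record(
    fields={"markdown_enabled_field": html}
)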

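Hosted content

Instead of uploading local files through DataURLs, we can also visualize URLs that link directly to media files such as images, audio, video, and PDFs hosted on a public or private server. In this case, you can use basic HTML to visualize content available on something like Google Drive, or decide to configure a private media server.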

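Warning

When trying to access content from a private media server, you must ensure that the Argilla server has network access to the private media server, which might be done through something like IP whitelisting.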

Image, audio, video, and PDF examples:

import argilla as rg

# Plain <img> tag pointing at a hosted image.
html = "<img src='https://example.com/public-image-file.jpg'>"

rg.Record(
    fields={"markdown_enabled_field": html}
)

import argilla as rg

# <audio> element with playback controls for a hosted MP3 file.
html = """
<audio controls>
    <source src="https://example.com/public-audio-file.mp3" type="audio/mpeg">
</audio>
"""

rg.Record(
    fields={"markdown_enabled_field": html}
)

import argilla as rg

# <video> element with playback controls for a hosted MP4 file.
html = """
<video width="320" height="240" controls>
    <source src="https://example.com/public-video-file.mp4" type="video/mp4">
</video>
"""

rg.Record(
    fields={"markdown_enabled_field": html}
)

import argilla as rg

# <iframe> that embeds a hosted PDF.
html = """
<iframe
    src="https://example.com/public-pdf-file.pdf"
    width="600"
    height="500">
</iframe>
"""

rg.Record(
    fields={"markdown_enabled_field": html}
)
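The hosted-content snippets above can be folded into one small helper that picks the HTML element from the file extension. This is only a sketch: hosted_media_to_html is a hypothetical function, not part of Argilla, and the extension tables are assumptions that cover a few common formats.

```python
import posixpath
from html import escape
from urllib.parse import urlparse

# Assumed extension tables; extend them for the formats you host.
_IMAGE = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
_AUDIO = {".mp3": "audio/mpeg", ".wav": "audio/wav", ".ogg": "audio/ogg"}
_VIDEO = {".mp4": "video/mp4", ".webm": "video/webm"}


def hosted_media_to_html(url):
    """Return an HTML snippet that embeds a hosted media file, chosen by extension."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    src = escape(url, quote=True)  # keep quotes in the URL from breaking the attribute
    if ext in _IMAGE:
        return f'<img src="{src}">'
    if ext in _AUDIO:
        return f'<audio controls><source src="{src}" type="{_AUDIO[ext]}"></audio>'
    if ext in _VIDEO:
        return f'<video controls><source src="{src}" type="{_VIDEO[ext]}"></video>'
    if ext == ".pdf":
        return f'<iframe src="{src}" width="600" height="500"></iframe>'
    raise ValueError(f"unsupported media extension: {ext!r}")
```

The returned string can be used as the value of a Markdown-enabled field in the same way as the handwritten HTML.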

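Chat and conversation support

When working with chat data from multi-turn interactions with a Large Language Model, it helps to visualize the conversation the way common chat interfaces do. To facilitate this, we offer the chat_to_html function, which converts messages from the OpenAI chat format to an HTML-formatted chat interface.

OpenAI chat format

The OpenAI chat format structures a conversation as a list of messages, each with a role and content. Only the following roles are allowed: "user" for human messages, and "assistant", "system" or "model" for model-generated messages.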
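Since only these roles are accepted, it can be useful to check a conversation before converting it. The sketch below is a hypothetical helper, not part of Argilla; the role set comes from the description above, and the requirement that content is a string is an assumption.

```python
ALLOWED_ROLES = {"user", "assistant", "system", "model"}


def validate_chat_messages(messages):
    """Raise ValueError unless every message has an allowed role and string content."""
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {index}: unsupported role {role!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {index}: content must be a string")
    return messages
```

Returning the list unchanged lets the check be chained directly in front of the conversion step.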

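import argilla as rg
from argilla.markdown import chat_to_html

# Conversation in the OpenAI chat format.
messages = [
    {"role": "user", "content": "Hello! How are you?"},
    {"role": "assistant", "content": "I'm good, thank you!"}
]

# Convert the messages into an HTML chat view.
html = chat_to_html(messages)

# Store the HTML in a field created with use_markdown=True.
rg.Record(
    fields={"markdown_enabled_field": html}
)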
