What Is Microsoft's Open-Source Visual ChatGPT? - duidaima 堆代码


Published 2 months ago
Author: 比肩天涯

Microsoft open-sources Visual ChatGPT: chat with ChatGPT, draw, and edit pictures through image-based interaction
GitHub:
https://github.com/microsoft/visual-chatgpt
Paper:
https://arxiv.org/pdf/2303.04671.pdf

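A few days ago, Microsoft open-sourced a high-profile project on GitHub: Visual ChatGPT, which extends ChatGPT so that users can interact with it through images. Over the past several days it has been one of the most popular projects on the GitHub Trending page, and within a single week its star count approached 20,000. So what exactly is Visual ChatGPT? What features and advantages does it have, and what new experiences and possibilities does it open up? This article introduces this novel and interesting project.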

Demo

What is Visual ChatGPT?
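Visual ChatGPT is a system that combines several Visual Foundation Models (VFMs), allowing users to interact with an AI system by sending and receiving both language and images. ChatGPT's interaction is still mainly text-based. It can already write fiction, fix bugs, organize literature, write code, and draft weekly reports, but after using it for a while, many users naturally want a richer way to interact with it.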

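Microsoft's open-source Visual ChatGPT extends ChatGPT's interaction from text alone to text plus images. It connects visual foundation models to ChatGPT so that users can interact with ChatGPT in three ways:
1) sending and receiving not only language but also images
2) posing complex visual questions or visual editing instructions that require several AI models to collaborate over multiple steps
3) giving feedback and asking for the results to be revised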

System architecture of Visual ChatGPT

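VFM stands for Visual Foundation Model; image models such as Stable Diffusion, ControlNet, and BLIP all fall into this category. Acting as the bridge between ChatGPT and the VFMs, the Prompt Manager explicitly tells ChatGPT what each VFM does and specifies the input and output formats it requires.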

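It converts various kinds of visual information (for example PNG images, depth images, and mask matrices) into a language format that ChatGPT can understand. It also manages the history, priority, and conflicts of the different VFMs. With the Prompt Manager, ChatGPT can use the VFMs effectively and receive their feedback iteratively, until the user's request is satisfied or a termination condition is reached.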
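To make the Prompt Manager's role more concrete, here is a minimal Python sketch under stated assumptions. It is not code from the visual-chatgpt repository: the class names, the description strings, and the "[image file: ...]" convention are hypothetical. Each VFM is registered with a plain-language description plus its input and output format, and images reach the language model only as file paths, which is one simple way to turn visual data into text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VisualTool:
    """One VFM as described to the language model (hypothetical structure)."""
    name: str
    description: str            # what the model does, in plain language
    inputs: str                 # required input format, e.g. "image_path"
    outputs: str                # output format, e.g. "image_path" or "text"
    run: Callable[[str], str]   # stand-in for the real model call


class PromptManager:
    """Renders a VFM registry as prompt text and logs image paths (sketch)."""

    def __init__(self) -> None:
        self.tools: Dict[str, VisualTool] = {}
        self.history: List[str] = []  # chronological log of image paths seen

    def register(self, tool: VisualTool) -> None:
        self.tools[tool.name] = tool

    def system_prompt(self) -> str:
        # Tell the LLM what each tool does and which I/O format it expects.
        lines = ["You can call the following visual tools:"]
        for t in self.tools.values():
            lines.append(
                f"- {t.name}: {t.description} (input: {t.inputs}, output: {t.outputs})"
            )
        return "\n".join(lines)

    def image_to_text(self, path: str) -> str:
        # Non-text data (PNG, depth map, mask) is referred to only by its path.
        self.history.append(path)
        return f"[image file: {path}]"


pm = PromptManager()
pm.register(VisualTool("ImageCaptioning", "describe what is in an image",
                       "image_path", "text", lambda p: f"a caption for {p}"))
pm.register(VisualTool("Image2Depth", "estimate a depth map from an image",
                       "image_path", "image_path",
                       lambda p: p.replace(".png", "_depth.png")))
print(pm.system_prompt())
print(pm.image_to_text("image/sofa.png"))  # [image file: image/sofa.png]
```

The tool names reuse two entries from the project's --load list, but the descriptions and the lambda bodies are placeholders; a real system would call the actual models.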

System architecture:

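The figure above is divided into three parts: left, middle, and right.
1) Left:
An example from the project demo, in which the user and ChatGPT exchange three rounds.
Round 1 (Q1 & A1): the user sends a picture of a sofa, and ChatGPT replies "Received."
Round 2 (Q2 & A2): the user asks ChatGPT to replace the sofa in the picture with a table and make the result look like an ink wash painting. ChatGPT accepts the instruction and generates two images.
Round 3 (Q3 & A3): the user asks ChatGPT what color the wall in the image is, and ChatGPT answers "blue."

2) Middle: the Visual ChatGPT workflow. After the model receives a query, it decides whether the query needs to be handled by a VFM.
3) Right: a detailed breakdown of VFM processing, showing how the model processes and answers each kind of instruction it receives.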
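The workflow in the middle of the figure (decide whether a VFM is needed, run it, feed the result back, repeat until done) can be sketched as a simple loop. This is a hypothetical illustration, not the project's implementation: `ReplaceObject` and `InkWashStyle` are made-up tool names, and `decide_next_tool` is a keyword-based stub standing in for ChatGPT's own reasoning.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in VFMs: each takes an image path and returns a new path.
TOOLS: Dict[str, Callable[[str], str]] = {
    "ReplaceObject": lambda path: path.replace(".png", "_table.png"),
    "InkWashStyle": lambda path: path.replace(".png", "_inkwash.png"),
}


def decide_next_tool(query: str, steps_done: List[str]) -> Optional[str]:
    """Stub for ChatGPT's decision step: which VFM, if any, is still needed?

    The real system lets the language model decide; here the plan is derived
    from whole-word keywords in the query purely for illustration.
    """
    words = re.findall(r"[a-z]+", query.lower())
    plan = []
    if "table" in words:
        plan.append("ReplaceObject")
    if "ink" in words:
        plan.append("InkWashStyle")
    remaining = [tool for tool in plan if tool not in steps_done]
    return remaining[0] if remaining else None


def handle_query(query: str, image: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Call VFMs one at a time, feeding each output back in, until none is needed."""
    steps: List[str] = []
    for _ in range(max_steps):  # a step budget acts as the end condition
        tool = decide_next_tool(query, steps)
        if tool is None:  # no further VFM call is needed
            break
        image = TOOLS[tool](image)  # run the VFM on the latest image
        steps.append(tool)
    return image, steps


result, used = handle_query(
    "Replace the sofa with a table and make it look like an ink wash painting",
    "image/sofa.png",
)
print(used)    # ['ReplaceObject', 'InkWashStyle']
print(result)  # image/sofa_table_inkwash.png
```

The step budget mirrors the "until the user's request is satisfied or a termination condition is reached" behavior described for the Prompt Manager, without claiming the project uses this exact mechanism.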

Features and advantages of Visual ChatGPT
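1) It broadens the chatbot's input and output. A conventional chatbot handles only text, whereas Visual ChatGPT handles both text and images and can reply in whichever format the user's request calls for.
2) It makes the chatbot more capable. Conventional chatbots tend to behave intelligently only within a single domain or task, whereas Visual ChatGPT can handle a range of language and vision tasks by switching between different VFMs according to the context.
3) It makes the chatbot more engaging and interactive. Instead of simple, dry exchanges, Visual ChatGPT supports creative, imaginative conversations and can adjust the style of its output to the user's preferences.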

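Usage
Note: this requires a fairly powerful machine, ideally with a GPU. If you have one, you can try it locally; otherwise you can set it up on Google Colab.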

Environment setup:
conda create -n visgpt python=3.8 # create the environment
conda activate visgpt # activate the environment
pip install -r requirements.txt # install the dependencies
bash download.sh # download the models
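Before launching, a quick sanity check can save a failed start. The helper below is hypothetical and not part of the repository: it simply checks that the interpreter matches the python=3.8 environment created above and that OPENAI_API_KEY, which the quick start exports, is set. Treat its messages as warnings rather than hard requirements.

```python
import os
import sys
from typing import List, Mapping, Tuple


def check_environment(env: Mapping[str, str] = os.environ,
                      version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> List[str]:
    """Return a list of warnings; an empty list means the basics look fine."""
    warnings = []
    if version != (3, 8):
        warnings.append(
            f"expected Python 3.8 (the visgpt env), found {version[0]}.{version[1]}"
        )
    if not env.get("OPENAI_API_KEY"):
        warnings.append("OPENAI_API_KEY is not set")
    return warnings


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("warning:", issue)
    if not issues:
        print("environment looks ready")
```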
Quick start:

# clone the repo
git clone https://github.com/microsoft/visual-chatgpt.git
# Go to directory
cd visual-chatgpt
# create a new environment
conda create -n visgpt python=3.8
# activate the new environment
conda activate visgpt
# prepare the basic environment
pip install -r requirements.txt
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}

# Start Visual ChatGPT !
# You can specify the GPU/CPU assignment with "--load"; the parameter indicates which
# Visual Foundation Models to use and where each one will be loaded
# The model and device are separated by an underscore '_', and different models are separated by a comma ','
# The available Visual Foundation Models are listed in the table in the project README
# For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0,
# you can use: "ImageCaptioning_cpu,Text2Image_cuda:0"

# Advice for CPU Users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu
# Advice for 1 Tesla T4 15GB (Google Colab)
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"
# Advice for 4 Tesla V100 32GB
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,ImageEditing_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    Image2Seg_cpu,SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
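The comments in the quick start describe the --load format: model and device joined by an underscore, entries separated by commas. As a sketch of what that format encodes, the hypothetical helpers below (`parse_load` and `group_by_device` are illustrative names, not the script's own code) parse such a string into a model-to-device mapping and group the models per device, which makes it easy to see how a multi-GPU layout like the four-V100 example spreads the load.

```python
from collections import defaultdict
from typing import Dict, List


def parse_load(spec: str) -> Dict[str, str]:
    """Parse "Model_device,Model_device,..." into {model: device}.

    Whitespace and newlines around entries are stripped, so a quoted
    multi-line spec like the 4x V100 example parses the same way.
    """
    mapping: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        model, sep, device = entry.partition("_")  # split at the first underscore
        if not sep or not device:
            raise ValueError(f"expected Model_device, got {entry!r}")
        mapping[model.strip()] = device.strip()
    return mapping


def group_by_device(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert {model: device} into {device: [models]} in insertion order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for model, device in mapping.items():
        groups[device].append(model)
    return dict(groups)


spec = """ImageCaptioning_cuda:0,ImageEditing_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu"""
print(group_by_device(parse_load(spec)))
# {'cuda:0': ['ImageCaptioning', 'ImageEditing'], 'cuda:1': ['Text2Image'], 'cpu': ['Image2Canny']}
```

Splitting at the first underscore works here because none of the model names in the examples contain an underscore themselves.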

For more details, see the GitHub repository linked at the beginning of this article.
