Files
X-Agents/account/admin/skills/image-understander/SKILL.md
2026-03-11 16:26:22 +08:00

85 lines
1.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: openakita/skills@image-understander
description: Analyze images using GPT-4 Vision. Supports image description, OCR text extraction, object recognition, and visual Q&A. When you need to understand image content using OpenAI GPT-4 Vision API.
---
# 图片理解技能 (Image Understander)
## 📋 概述
一个基于 OpenAI GPT-4 Vision 的图片理解工具,支持图片描述、文字识别(OCR)、物体识别和图片问答。
## 🚀 功能
| 功能 | 命令 | 说明 |
|------|------|------|
| 图片描述 | `-m describe` | 详细描述图片内容 |
| 文字提取 | `-m ocr` | 提取图片中的所有文字 |
| 物体识别 | `-m objects` | 识别并列出图片中的物体 |
| 图片问答 | `-m qa` | 针对图片回答问题 |
## 📦 安装
```bash
# 安装依赖
pip install openai pillow requests
```
## 🔧 配置
### 方式一:环境变量
```bash
set OPENAI_API_KEY=sk-your-api-key-here
```
### 方式二:命令行传入
```bash
python scripts/main.py -i photo.jpg -a sk-your-key
```
## 📖 使用方法
### 基本使用
```bash
# 描述图片
python scripts/main.py -i photo.jpg -m describe
# 提取文字OCR
python scripts/main.py -i screenshot.png -m ocr
# 识别物体
python scripts/main.py -i photo.jpg -m objects
# 图片问答
python scripts/main.py -i photo.jpg -m qa -q "这个图片里有什么?"
```
### 完整参数
```bash
python scripts/main.py \
--image PATH_TO_IMAGE \
--mode describe|ocr|objects|qa \
--api-key YOUR_API_KEY \
--prompt "你的问题" \
--output OUTPUT.json \
--verbose
```
## 📁 输出示例
```json
{
"mode": "describe",
"image": "photo.jpg",
"result": "A beautiful sunset over the ocean with orange and purple sky...",
"objects": [],
"text": ""
}
```
## ⚠️ 注意事项
- 需要 OpenAI API Key支持 GPT-4 Vision
- 支持的图片格式PNG、JPG、GIF、BMP
- 图片大小建议小于 20MB