ArtHouse Vision: Feeding Images Back to the AI
You know that feeling when you're building something and you realize you've accidentally solved a fundamental problem with generative AI? The one where the model generates something and then... it's done? You see the output, think "huh, that's not quite right," but the model has no idea what you're thinking?
That's what we solved with ArtHouse.
The Image Feedback Loop Problem
Here's the fundamental issue with most image generation workflows:
1. You type a prompt
2. The model generates an image
3. You look at it
4. You realize it's not quite right
5. You type a new, improved prompt
6. Repeat until satisfied
The problem is steps 3-4. When you look at the generated image and think "hmm, the colors are too warm," the model has no way of knowing what you're seeing.
What if the model could actually see the image it created and help you fix it?
The ArtHouse Solution
ArtHouse flips the script by creating a feedback loop where the image goes back to the vision model. Here's how it works:
User → Chat → Vision Model → Prompt → Image Generation → Image → Vision Model Analysis → Refinement
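Before the walkthrough, here's a minimal sketch of that cycle as a higher-order function. The callables are placeholders for the components described in the rest of this post, not ArtHouse's actual API:

from typing import Awaitable, Callable

# Minimal sketch of the feedback loop; each callable stands in
# for a component described in this post (all names are mine).
async def refine_loop(
    request: str,
    recommend: Callable[[str], Awaitable[str]],       # chat -> optimized prompt
    generate: Callable[[str], Awaitable[bytes]],      # prompt -> image (ComfyUI)
    analyze: Callable[[str, bytes], Awaitable[str]],  # image -> critique
    approve: Callable[[bytes, str], bool],            # human stays in the loop
) -> bytes:
    prompt = await recommend(request)
    while True:
        image = await generate(prompt)
        critique = await analyze(prompt, image)
        if approve(image, critique):
            return image
        # Fold the model's own critique back into the next prompt
        prompt = f"{prompt}\n\nRevise based on this critique: {critique}"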
Let me walk you through it:
A Simple Request
I type: "A mystical forest with glowing mushrooms at twilight"
The vision model asks clarifying questions:
- "Would you like this to be more fantasy-style or photorealistic?"
- "What kind of mood are you going for?"
We chat back and forth, refining the vision. Then I click "Generate Image."
The Magic Happens
The prompt gets sent to ComfyUI, the image is generated, and then something cool happens:
The image goes back to the vision model for analysis.
The vision model looks at the generated image and says: "Hmm, I notice the mushrooms are more orange than the glowing blue you described. Would you like me to adjust the prompt?"
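If you haven't driven ComfyUI programmatically before, the round trip looks roughly like this over its HTTP API: queue a workflow, poll the history, download the output. A hedged sketch, with the workflow graph JSON elided because it's installation-specific:

import time
import requests

# Hedged sketch of a ComfyUI round trip; `workflow` is the node
# graph JSON, which is specific to your installation.
def generate_via_comfyui(base_url: str, workflow: dict) -> bytes:
    # Queue the workflow; ComfyUI answers with a prompt_id
    prompt_id = requests.post(
        f"{base_url}/prompt", json={"prompt": workflow}
    ).json()["prompt_id"]

    # Poll the history endpoint until the job shows up as done
    while True:
        history = requests.get(f"{base_url}/history/{prompt_id}").json()
        if prompt_id in history:
            break
        time.sleep(1)

    # Grab the first output image via the /view endpoint
    images = next(
        node["images"]
        for node in history[prompt_id]["outputs"].values()
        if "images" in node
    )
    img = images[0]
    return requests.get(
        f"{base_url}/view",
        params={
            "filename": img["filename"],
            "subfolder": img["subfolder"],
            "type": img["type"],
        },
    ).content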
The Complete Loop
Now I'm not just guessing at what to change. The AI that generated the image is analyzing it and telling me exactly what needs adjustment. It's like having an art director sitting next to you.
It's the first time I've used a generative AI system where the AI actually understands the gap between what was generated and what was intended.
The Tech That Makes It Work
ArtHouse uses a vision-language model that can both understand text AND analyze images. The key function sends both the conversation history AND the generated image to the model:
import asyncio
import base64

async def chat_with_image(
    self,
    messages: list[dict],
    image_data: bytes,
    img_format: str = "PNG",
) -> str:
    # Convert the raw image bytes to base64 for the data URL
    image_b64 = base64.b64encode(image_data).decode("utf-8")

    # Build the image message content (OpenAI-style vision format)
    image_content = {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/{img_format.lower()};base64,{image_b64}"
        },
    }

    # Append the image as the final user turn in the conversation
    full_messages = messages + [{"role": "user", "content": [image_content]}]

    # The network call is blocking, so run it in a thread pool;
    # assumes an OpenAI-compatible client stored on self.client
    def _call_llm():
        return self.client.chat.completions.create(
            model=self.model, messages=full_messages
        )

    loop = asyncio.get_running_loop()
    response = await loop.run_in_executor(None, _call_llm)
    return response.choices[0].message.content
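Calling it from the chat loop is then straightforward. A hypothetical usage, where `session` is a stand-in for whatever object owns the method:

# Hypothetical usage inside an async handler; `session` is a
# stand-in for the object that owns chat_with_image.
with open("output.png", "rb") as f:
    image_bytes = f.read()

analysis = await session.chat_with_image(
    messages=[{"role": "user", "content": "How close is this to the prompt?"}],
    image_data=image_bytes,
)
print(analysis)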
Three URL prefixes handle different resolutions:
- /square/ - 1328×1328
- /portrait/ - 928×1664
- /landscape/ - 1664×928
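Internally that's just a lookup from prefix to canvas size. A minimal sketch, where the names are mine and the dimensions come from the list above:

# Minimal sketch of a prefix -> resolution lookup; names are
# illustrative, dimensions match the prefixes listed above.
RESOLUTIONS: dict[str, tuple[int, int]] = {
    "square": (1328, 1328),
    "portrait": (928, 1664),
    "landscape": (1664, 928),
}

def resolution_for(path: str) -> tuple[int, int]:
    # "/portrait/forest.png" -> (928, 1664); default to square
    prefix = path.strip("/").split("/", 1)[0]
    return RESOLUTIONS.get(prefix, RESOLUTIONS["square"])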
The system has three key tools:
- recommend_prompt - Generates an optimized prompt from the conversation
- show_prompt_modal - Shows the prompt to the user in a modal for review
- analyze_image - Takes the generated image and suggests improvements
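Declared in an OpenAI-style function-calling schema, they'd look roughly like this. Only the tool names come from ArtHouse; the descriptions and parameter schemas are my assumptions:

# Sketch of the three tools in OpenAI-style function-calling
# format; parameter schemas here are illustrative assumptions.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "recommend_prompt",
            "description": "Generate an optimized image prompt from the conversation.",
            "parameters": {
                "type": "object",
                "properties": {"prompt": {"type": "string"}},
                "required": ["prompt"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "show_prompt_modal",
            "description": "Show the proposed prompt to the user for review.",
            "parameters": {
                "type": "object",
                "properties": {"prompt": {"type": "string"}},
                "required": ["prompt"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "analyze_image",
            "description": "Analyze the generated image against the original prompt.",
            "parameters": {
                "type": "object",
                "properties": {"original_prompt": {"type": "string"}},
                "required": ["original_prompt"],
            },
        },
    },
]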
The analyze_image tool sends the image back with the original prompt and gets back analysis like: "The composition is strong, but the lighting could be more dramatic."
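That step can be as simple as reusing chat_with_image with a critique instruction. A hedged sketch; the instruction wording is mine:

# Hedged sketch of the analysis step, reusing chat_with_image
# from above; the critique instruction wording is illustrative.
async def analyze_image(self, original_prompt: str, image_data: bytes) -> str:
    messages = [{
        "role": "user",
        "content": (
            f"This image was generated from the prompt: '{original_prompt}'. "
            "Compare the result to the intent and suggest concrete adjustments."
        ),
    }]
    return await self.chat_with_image(messages, image_data)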
The Web Interface
The frontend handles the WebSocket connection, modal display, and image rendering. The modal shows the generated prompt and lets you review it before clicking "Generate Image." No cutting and pasting.
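Server-side, the protocol can stay small: a handful of JSON message types over one socket. A speculative sketch, since the post doesn't spell out the message schema; the type names and the session object are my inventions:

import base64
import json

# Sketch of a tiny JSON-over-WebSocket protocol; the message
# type names and the `session` object are assumptions standing
# in for ArtHouse's actual components.
async def handle_client(ws, session) -> None:
    async for raw in ws:  # websockets-style async iteration
        msg = json.loads(raw)
        if msg["type"] == "chat":
            # One vision-model chat turn
            reply = await session.chat(msg["text"])
            await ws.send(json.dumps({"type": "chat", "text": reply}))
        elif msg["type"] == "generate":
            # Run the ComfyUI round trip, then push the result
            # back to the browser as base64
            image = await session.generate(msg["prompt"])
            await ws.send(json.dumps(
                {"type": "image", "data": base64.b64encode(image).decode()}
            ))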
What This Changes
ArtHouse isn't just a cool toy - it demonstrates something fundamental:
Generative AI needs feedback loops to be truly useful.
The model that can see its own outputs and understand how to improve them is fundamentally more capable than the model that just generates and waits.
We're not there yet with text generation - models still can't "see" their previous outputs in a meaningful way. But with vision-language models, that day is coming.
ArtHouse is a glimpse of that future. It's a system where the AI doesn't just generate, but collaborates, analyzes, and improves.
The Future We're Building
We've got multiple AI models working together:
- A fast model for interactive chat
- A powerful model for heavy lifting
- An image generation pipeline
- A vision model for analysis
That's not just a server farm - that's an AI creative studio. And the best part? Every component is modular, independent, and can be upgraded without breaking the whole system.
The future of AI-assisted creativity isn't just about better models. It's about better workflows. It's about closing the loop between generation and feedback.
ArtHouse is a step in that direction. And honestly? I'm pretty excited to see what happens next.
Try it. Describe an image, chat about the details, and watch as the AI helps you refine it until it's exactly what you imagined.