
fail#1 archviz


This is a post documenting my attempt at disrupting V-Ray/Enscape using AI. It failed, but the history is valuable (to me, at least).

Goal + Motivation

The goal of this project was to create a one-button-magic UX to replace V-Ray for interior design renders. I was motivated by advances in AI image models, especially after OpenAI released what would become the gpt-image-1 model.

Architects and interior designers spend a significant amount of time generating photoreal images with V-Ray/Enscape, which requires (i) a robust computer (costly), (ii) specific knowledge of these tools (a scarce resource) and (iii) significant productive downtime (a productivity sink). Thus, leveraging AI that could get ~95% of the work right in seconds instead of hours, without costly equipment or specialized knowledge, seemed like a sure-fire way to generate value.

Attempts

1. ComfyUI

My first attempt used ComfyUI to create specific flows, aiming both at photorealism and at understanding style in an automated way.

For this, I relied mainly on open-source models, whether base models or versions fine-tuned by other people, along with Canny and depth control models to ensure fidelity.
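The control inputs were built along these lines - a rough sketch assuming OpenCV for the Canny edges and a Hugging Face depth-estimation pipeline; the file names, thresholds and the Intel/dpt-large model choice are illustrative, not my exact setup:

```python
# Sketch of the preprocessing fed into ControlNet-style nodes in ComfyUI:
# a Canny edge map for geometry fidelity and a monocular depth map for spatial layout.
import cv2
from PIL import Image
from transformers import pipeline

def make_control_images(path: str):
    # Canny edge map: preserves wall/furniture outlines for the edge control model.
    bgr = cv2.imread(path)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    cv2.imwrite("control_canny.png", edges)

    # Depth map: a rough monocular estimate, used by the depth control model.
    depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
    depth = depth_estimator(Image.open(path))["depth"]
    depth.save("control_depth.png")

make_control_images("interior_draft.png")  # hypothetical input file
```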

Difficulties

I was making good progress with ComfyUI, but the difficulty of iterating quickly on local hardware and of making the models obey style guides made me think this wasn't the right path. It was not, however, an exhausted one.

Input image

ComfyUI's best result

2. gpt-image-1

When OpenAI released 4o image generation in ChatGPT (what would later ship as gpt-image-1), it seemed like the pivotal moment to launch the V-Ray killer. Prompting it with "make this image photorealistic" delivered a very good result, even if not the ~95% match to a V-Ray render I was aiming for.

Input image

ChatGPT first pass

I got to work on a proper frontend flow and on making sure the workflow was seamless. It didn't need a fancy solution or interface, but I wanted something usable for dogfooding purposes (my wife is an architect with a studio that employs people who do renderings with V-Ray). I wired in Google's gemini-2.0-flash-exp model to generate images just to see the flow working end to end, and it did. When OpenAI finally released gpt-image-1 in the API, I was thrilled: I was ~1 day from shipping - just needed to solve auth and the database, iterate a bit on the prompt, and I was done.

Except I wasn't - iterating on the prompt didn't quite solve it. The prompts.txt file shows the different versions I used, and none delivered the quality I was expecting.
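The call itself was never the hard part. A minimal sketch of the edit endpoint behind the "one button" flow looks roughly like this - the prompt here is illustrative, and the real variants live in prompts.txt:

```python
# Minimal gpt-image-1 edit call: send the draft render plus a photorealism prompt.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("input_render.png", "rb") as f:  # hypothetical input file
    result = client.images.edit(
        model="gpt-image-1",
        image=f,
        prompt=(
            "Make this interior render photorealistic. Keep the layout, "
            "furniture, materials and camera angle exactly as in the input."
        ),
    )

# gpt-image-1 returns base64-encoded image data.
with open("output.png", "wb") as out:
    out.write(base64.b64decode(result.data[0].b64_json))
```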

Alongside the prompt iterations, I started working with OpenAI's o3, o4-mini and o4-mini-high to find alternative ways to tame the model. They're excellent coworkers, albeit with an inferred idea of what gpt-image-1 is - they usually treated it as a diffusion model whose prompt strength and creativity I should be able to tweak, paired with Canny, depth and the like. I played with these ideas, to no avail. And I tried a lot of stuff. I learned a lot too (Felzenszwalb mapping is the GOAT, and created by a Brazilian, lfg!), which is the biggest win. I even think I understand a bit more about how gpt-image-1 works when editing images, which makes me aware of its limitations, in spite of not having been able to get around them.
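For the curious, a Felzenszwalb region map is only a few lines with scikit-image. A sketch with illustrative parameters (I tuned scale/min_size per image), not my exact tuning:

```python
# Felzenszwalb segmentation: turn the input into a region map whose boundaries
# can be handed to the image model as structure hints.
from skimage import io, segmentation, img_as_ubyte

image = io.imread("input_render.png")[:, :, :3]  # hypothetical input file

# Graph-based segmentation: larger `scale` favours larger regions.
segments = segmentation.felzenszwalb(image, scale=200, sigma=0.8, min_size=100)

# Overlay the region boundaries on the original image for inspection / as a guide image.
boundaries = segmentation.mark_boundaries(image, segments, color=(1, 0, 0))
io.imsave("felzenszwalb_boundaries.png", img_as_ubyte(boundaries))
print(f"{segments.max() + 1} regions")
```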

Difficulties

And I tried! I paired gpt-image-1 with traditional edge maps - Canny, Sobel, Laplacian. I paired it with SLIC and Felzenszwalb pixel mappings, and with GMM, k-means and mean-shift maps... to little or no avail.
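The guidance maps were built along these lines - a sketch assuming OpenCV and scikit-image, merging a soft Canny with SLIC superpixel boundaries into a single map (parameters and file names are illustrative; the GMM / k-means / mean-shift colour maps followed the same pattern):

```python
# Combine a soft Canny edge map with SLIC superpixel boundaries into one guidance image.
import cv2
import numpy as np
from skimage import segmentation

image = cv2.imread("input_render.png")  # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# A "soft" Canny: low thresholds keep more of the faint interior lines.
edges = cv2.Canny(gray, 30, 90)

# SLIC superpixels; find_boundaries marks the region outlines.
segments = segmentation.slic(rgb, n_segments=300, compactness=10, start_label=1)
slic_boundaries = segmentation.find_boundaries(segments, mode="thick")

# Merge both cues into one black-and-white guidance map.
guidance = np.maximum(edges, slic_boundaries.astype(np.uint8) * 255)
cv2.imwrite("guidance_map.png", guidance)
```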

By then I was iterating with a different image, one that seemed easier than my first.

New input

A soft Canny

My SLIC input that gave me one good result, not reproducible

My average/modal result

My best result, using Canny, depth and SLIC boundaries

The V-Ray benchmark

Final thoughts

Working with cutting-edge models is nice, but making them behave in precise ways - abstracting the interface they actually propose into buttons - is still what I find to be the biggest challenge.

Most of the best AI-wrapper products (Lovable, Cursor) still lean on prompts and leave that mechanic exposed to the user. They tool the models quite a lot and have them work with a bunch of extensions, but there's an interface of control given to users that makes the outcome their co-responsibility. Abstracting this away and being totally responsible for an outcome is still a challenge to be won.

There are attempts I could still make, under the "make an AI SaaS to make money" goal: fine-tune models per style or per studio, expose parameters, lower users' expectations of what the end result should be. But measured against the main purpose of defeating V-Ray, it's a fail.

It was a very cool one to work on, nonetheless.

Here you'll find the GitHub repo with the scripts I used in this attempt, if you want to take a look. If you think you know how to solve this, let's chat! Let's have fun together.

#build in public #dev #fail