An uncanny valley

How images edited by ChatGPT are only recreations of the original.

Øyvind Bugge Solheim https://www.oyvindsolheim.com (Institutt for samfunnsforskning (ISF))https://www.samfunnsforskning.no
2025-04-03

When editing becomes generating

Yesterday, I decided to try ChatGPT’s new image generation and got a taste of the LLM’s parallel world, its own upside-down. I wanted an image of my office with a flying drone in it, for a silly tongue-in-cheek conversation with a colleague (don’t ask). So I took a photo of my office, uploaded it to ChatGPT, and asked it to add a drone. I expected this to work like the first image generators, which developed from tools for removing noise from images but could suddenly start from pure noise and create new images based on a text description.1 That was all wrong.
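To make that intuition concrete (with all the caveats from the footnote), here is a toy sketch in Python of the “start from noise and keep denoising” idea. It is only an illustration, not the technology behind ChatGPT’s generator: a stand-in function nudges the sample toward a fixed target, where a real diffusion model would use a trained neural network guided by the prompt.

```python
import numpy as np

# Toy stand-in for a trained denoiser: a real diffusion model would use a
# neural network that predicts and removes noise, steered by the prompt.
# Here the "denoiser" just nudges the sample toward a fixed clean image.
def toy_denoise(noisy, clean, strength=0.1):
    return noisy + strength * (clean - noisy)

rng = np.random.default_rng(0)

clean_image = rng.uniform(0, 1, size=(8, 8))   # pretend: the image the prompt describes
sample = rng.normal(0, 1, size=(8, 8))         # start from pure noise

# Iteratively "remove noise" until the sample looks like an image.
for step in range(100):
    sample = toy_denoise(sample, clean_image)

print("distance from target:", np.abs(sample - clean_image).mean())
```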

Figure 1: A drone in my office

I sent my photo and got a new one back, and it had a drone in it. The drone looked a bit photoshopped, but it served my purpose. I didn’t think much about it at first, but then I realized that the text on the green part of the background was all scrambled. The Norwegian text had suddenly become illegible. This is a typical problem with image generators, but it was strange if ChatGPT had simply put a drone into the photo of my office. Looking more closely at the generated photo revealed more strangeness. Suddenly everything was off.

The taskbar on my monitor had moved from the left side to the (more common) bottom, and the curtains still had their dots, just slightly different ones. All over the photo, ChatGPT had changed small but very visible elements. The photo was in many ways the same, only a little bit different, like a parallel dimension or the “upside-down”. Below you see the two images side by side.2

Figure 2: The Upside-Down

I guess this is the part where I should explain what happened. Unfortunately, this will only be speculation, as I haven’t researched which image generator is actually implemented or what kind of technology it is built on. I’m also a political scientist, not someone who really understands the technology. To the untrained eye, however, the new image looks like it was created from a very specific prompt describing the first image. Almost all the details are the same: colors, dots, plants, and the two images would be described in very similar terms. It is only when you look at each specific element that the differences become glaring.
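If that speculation is roughly right, the system behaves less like “edit these pixels” and more like “describe the photo, then paint a new one from the description”. A minimal sketch of that idea, with hypothetical describe_image and generate_image placeholders standing in for whatever the real pipeline does:

```python
# Hypothetical sketch of the "describe, then regenerate" behaviour I suspect,
# not the actual OpenAI pipeline. Both functions are placeholders.

def describe_image(photo: str) -> str:
    # A real system would produce a rich internal description of the photo.
    return ("office with green panel with Norwegian text, dotted curtains, "
            "monitor with taskbar on the left, plants by the window")

def generate_image(description: str) -> str:
    # A real system would paint a brand-new image from the description alone;
    # any detail not captured in the description gets reinvented.
    return f"new image matching: {description}, plus a drone"

original = "photo_of_my_office.jpg"
edited = generate_image(describe_image(original))
# The result resembles the original closely, but nothing is pixel-identical:
# scrambled text, slightly different dots, taskbar at the bottom.
print(edited)
```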

In many ways this is similar to my experience with LLMs. If you use an LLM as a copy editor or spell checker, you need to be extremely vigilant not to get back something other than what you put in. The LLM will give you a nicer version of what you wrote, but maybe not one identical to what you entered in the first place. Much like the LLMs’ hallucinations or confabulations, the window of your office will get an extra hatch, the curtains fewer dots, or the taskbar will move. Maybe not the worst thing in the world, but somewhat disturbing for those of us not living in the Matrix.


  1. I know this is probably not a very good technological description.↩︎

  2. Putting them side by side also showed that they have slightly different dimensions.↩︎

Citation

For attribution, please cite this work as

Solheim (2025, April 3). Solheim: An uncanny valley. Retrieved from https://www.oyvindsolheim.com/posts/25_04_02_Uncanny/

BibTeX citation

@misc{solheim2025an,
  author = {Solheim, Øyvind Bugge},
  title = {Solheim: An uncanny valley},
  url = {https://www.oyvindsolheim.com/posts/25_04_02_Uncanny/},
  year = {2025}
}