After the initial excitement subsides, you start to notice some of the subtle issues with generative AI. One issue that I've observed is the difficulty of doing incremental development. It's actually easier with code than with other types of thought-based work, but I think I can better outline the problem using an image generation example.
A while ago, I wanted to create an image for a presentation. I opened a chat window and wrote a detailed prompt that explained exactly what I needed. The system generated an image that was almost, but not quite, right. Seeing this, I modified the prompt to add the missing element, but that didn't work very well. GPT started over and gave me a different "take" on my prompt: a completely new image with different shapes, colors, and style. I wanted it to use the previous image as a base, but it started fresh, as if the first image had never existed.
I know enough about transformers and stable diffusion to appreciate how hard it is to make spot edits of generated images, but that didn't help alleviate my frustration. It was a laborious process, and I came close to giving up. I wanted to refine the thing I was working on, but the tools were not making that easy.
Sculptors have a word for "the thing that you are working on" - roughout (or rough for short). It refers to a work in progress that isn't done yet. Upon reflection, we realize that roughs exist in most areas of work. We have drafts of documents and sections of code that we iterate on and refine. At some point, we say that we are done. Most of us have never needed the word because our process is usually simple: we take an existing piece of code, image, or idea and refine it until it is good enough. Generative AI makes this process more complex in several ways:
Roughs are now easy to create. We can generate as many as we like simply by prompting again.
Roughs are hard to transport. If you build one in a session, you can't easily open another session and continue working on it. The rough isn't just the thing you are working on; it encompasses all of the knowledge pertaining to it that you created in that session.[1]
You can't undo refinements of a rough within a session. In a way, this is understandable; it is hard for humans to un-learn things, too. As developers, though, we do appreciate the ability to roll back our work. (A sketch of one workaround for both of these problems follows this list.)
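When we drive a model through an API instead of a chat UI, the transcript effectively is the session state, which suggests a workaround for both transport and rollback. Below is a minimal sketch; the `Message` shape and the function names are my own assumptions, not any particular vendor's API.

```python
# A minimal sketch: persist the session transcript to move a rough between
# sessions, and snapshot it to roll back a bad refinement. The Message
# shape and function names here are assumptions, not a real vendor's API.
import copy
import json
from pathlib import Path

Message = dict  # e.g., {"role": "user", "content": "..."}

def save_rough(messages: list[Message], path: Path) -> None:
    """Persist the transcript so the rough can move to another session."""
    path.write_text(json.dumps(messages, indent=2))

def load_rough(path: Path) -> list[Message]:
    """Reload the transcript and keep refining where we left off."""
    return json.loads(path.read_text())

def checkpoint(messages: list[Message]) -> list[Message]:
    """Snapshot the transcript so a bad refinement can be rolled back."""
    return copy.deepcopy(messages)
```

The catch is that a transcript only captures what was said. Inside a hosted chat UI that won't export it, the rough stays stranded.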
So, yes, this is all frustrating. It seems that the only lever we can lean on is re-generation. It is a bit like rolling the dice - if I just nudge things the right way, I'll get what I want.
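To see why every roll comes up different, consider a toy version of what happens at each generation step: the model produces a distribution over next tokens, and we sample from it. The vocabulary and scores below are invented for illustration.

```python
# Toy next-token sampling: the same prompt yields the same distribution,
# but each run draws from it differently. Vocabulary and scores are made up.
import math
import random

def sample(scores: dict[str, float], temperature: float = 1.0) -> str:
    """Temperature-scaled softmax sampling over a toy vocabulary."""
    weights = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    tokens = list(weights)
    return random.choices(tokens, weights=[weights[t] for t in tokens])[0]

scores = {"circle": 1.2, "square": 1.0, "spiral": 0.8}
print([sample(scores) for _ in range(5)])  # e.g., ['circle', 'spiral', ...]
```

Small differences at one step compound over hundreds of steps, which is why re-prompting rarely lands anywhere near the previous result.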
The good news for developers is that refining roughs of code is currently much easier than refining roughs of images. I've been able to make spot changes to code in a session over many iterations. Eventually, most LLMs lose the plot, i.e., they start to unlearn knowledge that was developed early in the session, but it is workable.
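The trick, in my experience, is to keep the whole history in play so the model revises the rough instead of replacing it. Here is a minimal sketch, assuming a hypothetical `chat(messages) -> str` completion function as a stand-in for whatever client you use.

```python
# A sketch of spot-editing a code rough within one session. `chat` is a
# hypothetical completion client; each call carries the full history, so
# the model edits the existing code rather than starting fresh.
def refine(chat, code: str, edits: list[str]) -> str:
    messages = [{"role": "user",
                 "content": f"Here is the code we are refining:\n{code}"}]
    for edit in edits:
        messages.append({"role": "user",
                         "content": f"Apply only this change; leave the rest alone: {edit}"})
        code = chat(messages)  # the model's revised rough
        messages.append({"role": "assistant", "content": code})
    return code
```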
That brings us to another issue: the impact on us as we use AI in this way.
Many people have noted that generative AI increases the review load that each of us experiences. Nearly everything we generate requires review, and it is a very fatiguing type of review. We have to set aside what the code looked like a second ago and examine something that is almost exactly what we were just looking at, except different in some large or small way. Then we have to re-reason our way through this nearly equivalent code from zero.
In normal development, we build up our knowledge incrementally. If we've been among the people developing the code, we don't have to start from zero the next time we visit it. Unless there has been a massive change, the knowledge we've built up over time is durable.
This might not all sound great, but I do like to consider the positives. The transport and rollback problems are likely to be solved soon. Their existence points out the usefulness of modularity. We can develop roughs separately and then integrate them. It serves as yet another reminder that organizing our work in a way that fits in our heads is useful. Knowing this, I don't mind our current session limits as much as others might. It’s nice to have reminders.
[1] Yes, you can start a new session and re-prompt it using the same exact prompts, but the stochasticity inherent in LLMs makes it unlikely you’ll end up near the same place.