
How to Write a Good Prompt: Semantic Core

Mastering the subject and the logic of token weights for better AI generations.

The main problem with modern generative neural networks (Midjourney, DALL-E, Stable Diffusion) lies not in the technology but in the interaction interface. Faced with an empty input field, the temptation is strong to describe the task abstractly: "make a beautiful modern interior" or "draw a masterpiece." In most cases, this leads to an averaged, gray (in content, not in color) result.
But we were promised that modern neural networks perfectly understand plain human language...
They do, but what do we mean by "understanding"? For a machine, the words "house" or "girl" carry none of our human meaning, visual experience, or emotion. As you submit a query, tokenization happens: your words turn into mathematical abstractions, sets of numbers and vectors in a multidimensional latent space.
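To make tokenization concrete, here is a toy sketch: a tiny invented vocabulary maps words to integer ids, and each id to a random vector. Real models use learned embeddings over tens of thousands of subword tokens; the vocabulary, sizes, and vectors below are purely illustrative.

```python
# Toy illustration of tokenization: words become integer ids, and each
# id maps to a vector. The vocabulary and vectors here are invented for
# demonstration; real models use learned embeddings, not random ones.
import numpy as np

VOCAB = {"a": 0, "house": 1, "girl": 2, "red": 3}

def tokenize(text: str) -> list[int]:
    """Map whitespace-separated words to integer token ids."""
    return [VOCAB[w] for w in text.lower().split()]

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((len(VOCAB), 4))  # one 4-d vector per token

ids = tokenize("a red house")
vectors = embeddings[ids]  # shape (3, 4): just numbers, no human "meaning"
print(ids)                 # [0, 3, 1]
```

From the model's side, the prompt is exactly this: a sequence of vectors, nothing more.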
Tokenization and Latent Space
When a neural network starts creating a picture, it uses a seed (a numeric value) to form the initial basis of the composition: a canvas of digital noise. It then, in effect, scatters these token-seeds across the canvas, where, through a highly complex calculation of probabilities, they sprout into light and dark areas and acquire shape, color, and texture, until the vector cloud, invisible to the eye, turns into the pixels of the finished picture.
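The role of the seed can be sketched in a few lines: fixing the seed fixes the starting noise canvas, which is why re-running a generation with the same seed and prompt reproduces the same composition. The canvas size here is arbitrary.

```python
# The seed fixes the starting noise canvas: the same seed always yields
# the same noise, a different seed yields a different one. Dimensions
# are illustrative; real pipelines denoise in a latent space.
import numpy as np

def noise_canvas(seed: int, height: int = 64, width: int = 64) -> np.ndarray:
    rng = np.random.default_rng(seed)
    return rng.standard_normal((height, width))

a = noise_canvas(seed=42)
b = noise_canvas(seed=42)
c = noise_canvas(seed=7)

print(np.array_equal(a, b))  # True: same seed, same canvas
print(np.array_equal(a, c))  # False: different seed, different canvas
```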
A neural network possesses no artistic taste; it operates on statistical probabilities, because during training it simply analyzed millions of images and text descriptions. As a result, the process often turns into an endless game of roulette: generation after generation, hoping for a lucky roll.
To get a predictable, high-quality result, you must abandon attempts to "talk" to the AI and move to the systematic construction of the image. The more precise the query, the better the neural network handles the probability calculations.
Probability and Noise Canvas

THE COMPOSITE SKETCH ANALOGY

A simple example is assembling a police composite sketch. Imagine a bulletin: "Male, dark-skinned, average height, dark curly hair." Is it realistic to find a person from such a blurry description? Millions fit it.
But add a distinguishing mark, say, a scar on the right cheek, and the chances rise sharply. Go further and describe the details with forensic precision: an aquiline nose in profile, deep-set almond-shaped eyes, a massive lower jaw, the exact ratio of forehead height to chin, and you have a genuinely useful, working verbal portrait for law enforcement.
A neural network works exactly the same way: the more specific details, parameters, and spatial constraints it receives, the better it can choose the vectors leading from the seeded starting noise to the final pixels.

Rule 1: Think like a professional in the field of visual arts, not like a writer

The principles of visual art have been taking shape for over 500 years. A professional (whether artist, photographer, or director) does not think in adjectives; they think in images and scene parameters. They arrange objects into a visual hierarchy so the viewer immediately understands what is important and what is secondary. They clearly define how the subject interacts with space. They tell a story without words, operating only with visual images.
Much of a professional's creative process happens on a non-verbal level. An artist ponders the composition and makes sketches, but rarely spells it out: "Okay, this is my main object. By the rule of thirds, I will place it here. The horizon line must be high because I want to emphasize..."
When it comes to writing a prompt, even experienced users often overlook the need for a clearly structured query, hoping the neural network will figure it out. As a result, the network is left with too much room for improvisation.
We created DEUTLI to help amateurs and professionals reach a high-quality result faster. We divided the creation of an ideal prompt into logical steps that channel the user's thinking. Moving from field to field, they formulate their ideas about the content and character of the picture more precisely.
We separated the entity (the semantic core, the subject of the image) from how that subject is shown in the scene. This is directly connected to the attention mechanics of neural networks, the so-called token weights. Most modern engines read the query linearly, from beginning to end. Words at the very start of the prompt receive the maximum mathematical weight and exert the strongest influence on the final composition.
This is a plain technical fact: if you open the query with a long description of atmosphere, style, and lighting, and leave the detailed portrait of the main character for the very end, the result will be noticeably worse. The neural network exhausts its "attention budget" on the background and special effects, and the main subject may lose detail or become distorted.
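The front-loading effect can be caricatured with a simple linear decay over token position. Real attention is computed dynamically and is far more complex; this toy model only illustrates why tokens placed early dominate the result.

```python
# Toy model of the "attention budget": phrases earlier in the prompt
# get a higher weight under a simple linear decay over position.
# Real attention is dynamic; this only illustrates the ordering effect.
def token_weights(tokens: list[str]) -> dict[str, float]:
    n = len(tokens)
    return {tok: (n - i) / n for i, tok in enumerate(tokens)}

subject_first = ["fox miller", "wooden crate", "watercolor", "golden light"]
style_first   = ["watercolor", "golden light", "wooden crate", "fox miller"]

print(token_weights(subject_first)["fox miller"])  # 1.0  (maximum weight)
print(token_weights(style_first)["fox miller"])    # 0.25 (minimum weight)
```

The same words, the same length, yet the main character's share of the budget differs fourfold depending on where it stands.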
Look at these images. They are similar, but there are subtle differences that escape a first glance.
Subject First Generation: Fox Miller
The first picture was generated using a structured query (Subject First):
A tired, aging fox miller, wearing a battered straw hat and patched denim overalls,
sitting slumped on a wooden crate, holding a steaming mug of tea, looking weary.
Sharp focus on the fox's face and detailed clothing textures.
Old wooden mill gears visible, massive piles of flour and grain sacks
marked "FLOUR" and "GRAIN", a flickering oil lantern on a shelf.
Whimsical watercolor illustration, soft fairytale ambiance,
warm golden lighting filtering through a dusty window. Medium shot.
Style First Generation: Fox Miller
The second was generated using a chaotic query (Style First):
Whimsical watercolor illustration style, soft fairytale ambiance,
warm golden lighting filtering through a dusty window,
old wooden mill gears visible, massive piles of flour and grain sacks
clearly marked "FLOUR" and "GRAIN", a flickering oil lantern on a shelf.
Medium shot. A tired, aging fox miller, wearing a battered straw hat
and patched denim overalls, sitting slumped on a wooden crate,
holding a steaming mug of tea, looking weary. Focus is diffuse across the scene.
Read as text, they show no particular difference. A fox miller. An old mill. An image style. The same words, the same length.
But look at the main character at full size. Pay attention to the subtle differences in the detailing of the clothes and especially the muzzle. This happened because in the second case (right image), the neural network only learned that a main character was needed at the very end of the query. It honestly tried to fix the prompt's flaw and placed the fox miller where he belonged, but its attention was no longer enough for detailed elaboration: it had already spent its tokens on the environment.
Fox Miller Comparison
This is exactly why the query must be structured. We force the system to work correctly: first the basic geometry and the main object ("what") are strictly fixed, and only then optics, light, and stylistics ("how") are layered on top.
In turn, the semantic core of the prompt is divided into the main text fields and a special Avoid field. Instead of one long line, we suggest separating the visual intent into four fundamental semantic vectors. Within the description of what exactly should be depicted, word order is no longer so critical. Nevertheless, based on our internal tests, which align with the advice of numerous prompt-engineering experts, it is better to follow a simple logic:
Subject: What exactly is in the frame? What are the physical properties, texture, and materials of the main object?
Action: What is happening to the subject or how does it interact with the environment?
Location: Where is the object located? What is the architecture or geometry of the surrounding space?
Atmosphere: What mood and emotional state does the scene convey? The overall feeling of the shot is described here. An important point: in DEUTLI you do not need to type the lighting type or virtual camera parameters as text. To avoid conflicts in the query, these critical parameters are moved to separate functional buttons (presets) on the control panel. You simply snap in the options you need with one click, and the system builds them into the prompt structure itself.

CORE DATA

Subject: A minimalist geometric glass perfume bottle
Action: Resting on a block of rough dark obsidian
Location: A sleek studio setup with shallow water reflections
Atmosphere: Luxurious, serene, high-end commercial, soft ethereal glow
The text input fields act as rigid guides: when creating a prompt in DEUTLI, you physically cannot miss important details simply because you forgot to think about them. The system itself walks you through the required structure. At the same time, the system is reasonably flexible. If atmosphere is not needed for your particular task, simply leave that field empty and the prompt will assemble without it (the same goes for action or location). The subject, however, is critically important; without it there can be no image, even if all you want is a background gradient with a light texture.
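The four fields can be sketched as a small assembly function. The field names come from this article, but the joining logic is our own assumption, not DEUTLI's actual implementation. The sketch also encodes the two rules just described: empty fields are skipped, and the subject is mandatory.

```python
# Hypothetical sketch of assembling the four semantic fields into a
# single Subject-First prompt. Field names are from the article; the
# joining logic is an assumption, not DEUTLI's real implementation.
def build_prompt(subject: str, action: str = "", location: str = "",
                 atmosphere: str = "") -> str:
    if not subject.strip():
        raise ValueError("Subject is required: without it there is no image.")
    # Subject comes first so it receives maximum token weight;
    # empty fields are simply skipped.
    parts = [subject, action, location, atmosphere]
    return ", ".join(p.strip() for p in parts if p.strip())

print(build_prompt(
    subject="A minimalist geometric glass perfume bottle",
    action="Resting on a block of rough dark obsidian",
    location="A sleek studio setup with shallow water reflections",
    atmosphere="Luxurious, serene, high-end commercial, soft ethereal glow",
))
```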
Avoid: labels, text, messy background, plastic, cheap materials, low contrast
The Avoid field (or Negative Prompt) is a critical tool for working with AI and one of the main features of the DEUTLI interface. It works on the principle of exclusion. In everyday speech, we rarely describe what is NOT in a room. With a neural network, everything changes. The machine "hallucinates" details out of the noise, and if you do not strictly set the boundaries of what is NOT allowed, it can add random garbage, artifacts, or unwanted characters to the frame.
The system takes your Avoid tokens and weaves them into the final formula in a specific way, forcing the neural network's attention to skip certain vectors in the multidimensional latent space. This lets you clean the scene of visual noise and undesirable elements with surgical precision. Although the developers of some modern models advise against the direct use of negative prompts, to avoid provoking unpredictable hallucinations, you can always experiment. We actively use the data from this field when automatically improving and balancing the final query with our internal algorithms.

But that is not all. The semantic core defines what is in the image; now it is time to define how it is captured. Learn how to control optics, lighting, and photorealism without typing a single word.

Stop typing. Start snapping.

You know the theory. Now put it into practice. Build your first structured visual formula in seconds and export your .deut file.
Launch DEUTLI