There is no problem unless you insist on reflecting what you had in mind exactly. That requires minute control, and no matter the medium and tools you use, unless you're doing it purely in your own quest for artistic perfection, economic constraints will make you stop short of your idea. There's always a point past which further refinement makes no difference to the audience (who don't have access to the thing in your head to use as a reference), and past which the cost of continuing exceeds any value, monetary or otherwise, you expect to get from the work.
AI or not, no one but you cares about the lower order bits of your idea.
Nobody else really cares about the lower-order bits of the idea, but they do care that those lower-order bits are consistent. The simplest example is color grading: most viewers are ignorant of artistic choices in color palettes unless they're as noticeable as the Netflix blue tint, but a movie whose scenes haven't been color graded consistently is obviously jarring, and even an expensive production can come off as amateur.
GenAI is great at filling in those lower-order bits, but until stuff like ControlNet gets much better precision and UX, I think genAI will be stuck in the uncanny valley, because its outputs are inconsistent between scenes, frames, etc.
Yup, 100% agreed on that, and I mentioned this caveat elsewhere. As you say, people don't pay attention to details (or the lack of them) as long as the details are consistent. Inconsistencies stand out like sore thumbs. Which is why, IMO, it's better to have fewer details than to be inconsistent with them.
>There is no problem unless you insist on reflecting what you had in mind exactly.
Not disagreeing, just noting: this is not how [most?] people's minds work {I don't think you're holding to that opinion particularly, I'm just reflecting on this point}. We have vague ideas until an implementation is shown; then we examine it, latch on to a detail, and decide whether it matches our idea or not. For me, if I'm imagining "a superhero planting vegetables in his garden", I've no idea what they're actually wearing, but when an artist or genAI shows me it's a brown coat, I'll say "no, something more Marvel". Then, when they ultimately show me something that matches the idea I had _and_ matches my current conception of that idea... I'll point out that the fingernails are too long, when in the idea I hadn't even perceived the person had fingers, never mind too-long fingernails!
I'd warrant any actualised artistic work has some delta from the artist's current perception of the work, and a larger delta from their initial perception of it.
I disagree. Even without demanding exactness, getting it to respect any reasonable constraint is near impossible. Ask it to generate a realistic circuit diagram, or a chess board, or anything else where precision matters. Good luck going back and forth trying to get it right.
These are situations with relatively simple logical constraints, but an infinite number of valid solutions.
Keep in mind that we are not requiring any particular configuration of circuit diagram, just any diagram that makes sense. There are an infinite number of valid ones.
That's using the wrong tool for the job :). Asking a diffusion model to give you a valid circuit diagram is like asking a painter to paint you a pixel-perfect 300 DPI image on a regular canvas, using their standard paintbrush. It ain't gonna work.
That doesn't mean it can't work with AI - it's just that you may need to add something extra to the generative pipeline: something that can do circuit diagrams, while the diffusion model supplies style and extra noise (er, beautifying elements).
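To make that split concrete, here's a toy sketch (all function names hypothetical, not any real ControlNet or diffusers API): a deterministic stage owns the hard constraints, and the generative stage is only allowed to restyle the result it's conditioned on, never to change its structure.

```python
def render_schematic(netlist):
    """Deterministic stage: lay out a valid circuit from a netlist.
    Correctness lives here, not in the diffusion model.
    Toy layout: one line per component, connections taken verbatim."""
    return [f"{name}: {a} -> {b}" for name, (a, b) in sorted(netlist.items())]

def stylize(lines, prompt):
    """Generative stage stand-in: in a real pipeline this would be a
    ControlNet-style conditioned diffusion pass that may change the look
    of each element but must not change the topology."""
    return [f"[{prompt}] {line}" for line in lines]

# A tiny RC + buffer circuit as a netlist (component -> (node, node)).
netlist = {"R1": ("vin", "n1"), "C1": ("n1", "gnd"), "U1": ("n1", "vout")}
layout = render_schematic(netlist)      # valid by construction
image = stylize(layout, "hand-drawn, sepia")  # style only, same topology
```

The point of the split: you can verify the first stage's output against the netlist before the generative stage ever touches it, so "going back and forth" with the model only ever negotiates style, not correctness.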
> Keep in mind that we are not requiring any particular configuration of circuit diagram, just any diagram that makes sense. There are an infinite number of valid ones.
On that note. I'm the kind of person that loves to freeze-frame movies to look at markings, labels, and computer screens, and one thing I learned is that humans fail at this task too. Most of the time the problems are big and obvious, ruining my suspension of disbelief, and importantly, they could be trivially solved if the producers grabbed a random STEM-interested intern and asked for advice. Alas, it seems they don't care.
This is just a specific instance of the general problem of "whatever you work with or are interested in, you'll see movies keep getting it wrong". Most of the time it's somewhat defensible - e.g. most movies get guns wrong, but in ways people are used to, and in ways that make the scenes more streamlined and entertaining. But with labels, markings, and computer screens, doing it right isn't any more expensive, nor would it make the movie any less entertaining. It seems the people responsible either don't know better or don't care.
Let's keep that in mind when comparing AI output to the "real deal", so as not to set impossible standards that human productions don't meet, and never did.
The issue isn’t any particular constraint. The issue is the inability to add any constraints at all.
In particular, internal consistency is one of the important constraints that viewers will immediately notice. If you're just using Sora for unrelated 5-second videos, it may be less of an issue, but if you want to do anything interesting you'll need the clips to tie together, which requires internal consistency.