AI's Baffling Spelling Mistakes Explained

AI has advanced rapidly over the past two years, but it still makes some silly mistakes. One of the most frequent mistakes that image generators such as DALL-E make is misspelling common words. For example, if you ask the AI to generate an image of a menu for a Pakistani restaurant, words like "Seekh Kebab", "Pakora", and the like end up misspelled or appear to be written in an alien language.


Image: Bing Image Creator - DIW-Aigen

ChatGPT also frequently makes mistakes like these. When it was asked to write a ten-letter word without the letters A or E in it, it generated "balaclava", a nine-letter word that contains four A's.
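The request in that example is easy to check mechanically, which is what makes the mistake stand out. Here is a minimal sketch of such a check; the function name `satisfies_constraint` and the word "philosophy" are illustrative choices, not something taken from ChatGPT or from the original test.

```python
def satisfies_constraint(word, length, banned_letters):
    """Return True if `word` has exactly `length` letters and none of `banned_letters`."""
    letters = word.lower()
    has_right_length = len(letters) == length
    avoids_banned = not any(ch in banned_letters.lower() for ch in letters)
    return has_right_length and avoids_banned


# The answer reported above fails on both counts: wrong length, banned letter.
print(len("balaclava"), "balaclava".count("a"))      # 9 4
print(satisfies_constraint("balaclava", 10, "ae"))   # False

# An illustrative word that does meet the request.
print(satisfies_constraint("philosophy", 10, "ae"))  # True
```

A few lines of ordinary code get this right every time, whereas a language model has to produce an answer that merely looks like the right kind of word.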

The AI that Instagram created behaves in a similar way. In one instance, an individual tried to get the AI to generate a sticker with the text “new post” on it, but the image it created apparently contained something too adult-oriented to mention.

All of these models have different underlying frameworks, but they still tend to make the same kinds of mistakes. In the case of image generators, this is largely because the models are much better at capturing larger structures than smaller ones.

They use diffusion models that generate images from noise, and the finer details can often get lost in that noise. The models effectively treat the text as not very significant, which makes them less likely to render it accurately.
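One rough way to see why small details lose out to noise is a back-of-envelope signal-to-noise calculation. The sketch below is only an intuition aid under simple assumptions (independent Gaussian noise on every pixel, a feature of uniform contrast), not a description of how DALL-E or any other generator works. The function name `detection_snr` and all of the numbers are hypothetical.

```python
import math


def detection_snr(contrast, n_pixels, noise_sigma):
    """Signal-to-noise ratio of a uniform feature averaged over n_pixels.

    Averaging n independent noise samples shrinks the noise by sqrt(n),
    so features that cover more pixels stand out more clearly.
    """
    return contrast * math.sqrt(n_pixels) / noise_sigma


NOISE_SIGMA = 4.0  # hypothetical per-pixel noise level

# A large shape covering 64 x 64 pixels (illustrative size).
print(detection_snr(1.0, 64 * 64, NOISE_SIGMA))  # 16.0

# A thin letter stroke covering only 16 pixels (illustrative size).
print(detection_snr(1.0, 16, NOISE_SIGMA))       # 1.0
```

Under these assumptions the large shape stands well clear of the noise, while the letter stroke is about as strong as the noise itself.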

As for text generators, they use Large Language Models. These models perform complicated calculations to recognize patterns, and they don’t automatically or intuitively understand things that we take for granted.

Anyone knows that a human hand has five fingers, but the images that diffusion models are trained on don’t always show that many. Some fingers might be obscured by an object, or something else of that sort.


With text generators, the words in the training data are sometimes misspelled, intentionally or otherwise, which is a big part of the reason why the models struggle with spelling.
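To make that concrete, the sketch below counts how often competing spellings of one word appear in a tiny, made-up set of captions. The captions, the variant list, and the function name `spelling_shares` are all hypothetical. The only point is that a system learning from frequency patterns sees several spellings, with nothing marking one of them as correct.

```python
from collections import Counter

# Made-up captions standing in for training text; not real data.
TOY_CAPTIONS = [
    "seekh kebab",
    "chicken kabob",
    "kebab roll",
    "seekh kabab",
    "kebab platter",
]

VARIANTS = ("kebab", "kabob", "kabab")


def spelling_shares(captions, variants):
    """Return the share of each spelling variant among all variant occurrences."""
    counts = Counter(
        word
        for caption in captions
        for word in caption.lower().split()
        if word in variants
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {variant: counts[variant] / total for variant in variants}


print(spelling_shares(TOY_CAPTIONS, VARIANTS))
# {'kebab': 0.6, 'kabob': 0.2, 'kabab': 0.2}
```

In this toy set, two out of every five occurrences use a minority spelling, so a pattern learner has real evidence for each variant.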

There are ways to reduce these issues. Datasets can be changed, and more data can be added to teach AI how hands look and how words are actually supposed to be spelled. This will very likely be the next stage in the evolution of AI, but it will require a lot of work from the people who create these models.

Some models sidestep these problems by avoiding text entirely, as Adobe Firefly does. Prompts involving billboards and flyers will simply generate a blank billboard or flyer. It will be interesting to see how further advancement changes AI’s ability to add the details that people are looking for with their prompts.
