Some of the most prominent LLMs, such as GPT, are trained mostly on English-language sources. One might assume that these models don't really have a first language, but a recent study found that they actually use English internally, even when a prompt arrives in an entirely different language.
The study was conducted at the École Polytechnique Fédérale de Lausanne (EPFL), and its findings are troubling. AI is becoming an ever more prevalent part of how we live our lives, and a model that quietly defaults to English internally could end up creating an implicit bias down the line.
The researchers work at EPFL's Data Science Laboratory (DLAB). Their study analyzed which languages were in use at various points along a model's computational chain. According to the head of DLAB, Professor Robert West, LLMs predict words by assigning numerical values to them.
These numbers, called word vectors, act as a kind of address or set of coordinates for each word. The layers of computation inside an LLM transform those coordinates step by step, and after some 80 transformations are complete, a new vector for the next word is formed. The more layers a model has, the more capable it tends to be and the more accurate its predictions become.
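To make the word-vector idea concrete, here is a toy sketch of that process in Python. Everything in it, including the vocabulary, the dimensions, and the weights, is invented for illustration and is far smaller than anything in a real LLM:

```python
# Toy illustration of word vectors and layer-by-layer transformation.
# All sizes and numbers here are made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["flower", "fleur", "花"]
dim, n_layers = 8, 4                    # real models: thousands of dims, ~80 layers

embeddings = rng.normal(size=(len(vocab), dim))          # one "address" per word
layers = [rng.normal(size=(dim, dim)) / dim**0.5 for _ in range(n_layers)]

vec = embeddings[vocab.index("fleur")]  # start from the input word's coordinates
for i, W in enumerate(layers, 1):
    vec = np.tanh(W @ vec)              # each layer moves the coordinates
    print(f"after layer {i}: first 3 coords = {vec[:3].round(3)}")

# The final vector is compared against every word's vector
# to pick the most likely next word.
scores = embeddings @ vec
print("predicted next word:", vocab[int(scores.argmax())])
```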
The researchers fed words from multiple languages into LLMs and asked the models to predict translations. Even when a model was supposed to translate from French to Chinese, or vice versa, it usually routed through English along the way. This bias could have unforeseen consequences, so more work is needed to ensure it is significantly reduced.
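For readers who want to see what probing intermediate layers looks like in practice, the sketch below applies a "logit lens"-style check with the Hugging Face transformers library: it decodes each layer's vector during a French-to-Chinese translation prompt to see which token, and which language, surfaces first. The model name, the prompt, and the Llama-specific final-norm step are assumptions for illustration, not details taken from the EPFL paper:

```python
# A minimal "logit lens" sketch: inspect which token each intermediate
# layer would predict next. Assumes a Llama-family Hugging Face model;
# the model name and norm step below are assumptions, not the study's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A French word with its Chinese translation requested next.
prompt = 'Français: "fleur" - 中文: "'
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states[0] is the embedding layer; each later entry is the
# output of one transformer layer for every token position.
for layer_idx, hidden in enumerate(out.hidden_states):
    vec = hidden[0, -1]             # coordinates at the next-word position
    vec = model.model.norm(vec)     # final RMSNorm (Llama-specific assumption)
    logits = model.lm_head(vec)     # project coordinates back onto the vocabulary
    token = tokenizer.decode(logits.argmax().item())
    print(f"layer {layer_idx:2d}: most likely next token = {token!r}")
```

If the pattern the study describes holds, the layers in the middle of such a run would tend to favor English tokens before the final layers settle on the target language.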
Image: DIW-Aigen
Read next: Google’s AI-Powered Search Engine Could Cost Company A Loss Of Billions Due To Traffic Decline