Google has confirmed that its SynthID Text watermarking tool is being open-sourced. The company says the goal is to make the technology generally available to developers and businesses.
The tool can be downloaded from the AI platform Hugging Face and is also available through Google's Responsible GenAI Toolkit. The company confirmed the news in a post on X, noting that developers and brands can use it for free, which should help more people identify AI-generated content.
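For developers curious what that looks like in practice, here is a short sketch based on the Hugging Face Transformers integration. It assumes a recent Transformers release that ships SynthIDTextWatermarkingConfig; the model ID, key values, and n-gram length below are placeholders for illustration, not recommendations from Google.

```python
# Sketch: generating watermarked text via the open-sourced SynthID Text
# integration in Hugging Face Transformers (placeholder model and keys).
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

model_id = "google/gemma-2-2b-it"  # any causal LM on the Hub could be used
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The watermarking keys stay private to whoever embeds the watermark.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],
    ngram_len=5,
)

inputs = tokenizer(
    ["Write a short note about token watermarking."], return_tensors="pt"
)
outputs = model.generate(
    **inputs,
    watermarking_config=watermarking_config,
    do_sample=True,
    max_new_tokens=64,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```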
For those still wondering how the tool works, here is a quick breakdown. When a prompt is entered, the model generates its response one token at a time. A token may be a whole word, part of a word, or a single character, and these units are the building blocks the model uses to process and produce text.
At each step, the model assigns every candidate token a score representing how likely it is to appear next in the output. SynthID Text embeds its watermark by subtly modulating these probabilities, nudging the model's token choices without noticeably changing the text itself.
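To make the probability-modulation idea concrete, here is a deliberately simplified sketch of one well-known watermarking technique: boosting a keyed "green list" of tokens before sampling. This is illustrative only and is not Google's actual SynthID algorithm; the key, bias strength, and toy vocabulary are all made up.

```python
# Simplified green-list watermarking: nudge sampling toward a keyed subset
# of tokens so the choice pattern can be recognised later. Illustrative only.
import hashlib
import math
import random

SECRET_KEY = "demo-key"   # hypothetical key, known only to the embedder
GREEN_FRACTION = 0.5      # share of the vocabulary favoured at each step
BIAS = 2.0                # how strongly green-listed tokens are boosted

def in_green_list(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the key
    and the previous token so the split looks random to outsiders."""
    digest = hashlib.sha256(f"{SECRET_KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] < GREEN_FRACTION * 256

def sample_next(prev_token: str, logits: dict) -> str:
    """Sample the next token after boosting the scores of green-listed tokens."""
    adjusted = {tok: score + (BIAS if in_green_list(prev_token, tok) else 0.0)
                for tok, score in logits.items()}
    total = sum(math.exp(v) for v in adjusted.values())
    weights = [math.exp(v) / total for v in adjusted.values()]
    return random.choices(list(adjusted), weights=weights, k=1)[0]

# The model rates "capital" and "city" almost equally; the watermark tips
# the balance toward whichever of them happens to land on the green list.
print(sample_next("the", {"capital": 1.2, "city": 1.1, "dog": -3.0}))
```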
The adjusted scores across the model's word choices combine into a statistical pattern that serves as the watermark. Because this pattern differs between watermarked and unwatermarked text, SynthID can detect whether content was produced by a watermarking model or whether a mix of different sources was used to create the output.
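The detection side of the same simplified scheme would look something like the sketch below: count how often the text's tokens fall on the keyed green list and compare against the share expected from ordinary writing. Again, this is a toy stand-in; SynthID's real scoring is more sophisticated.

```python
# Simplified detection: unwatermarked text hovers around the baseline green
# share, while watermarked text scores noticeably higher. Illustrative only.
import hashlib

SECRET_KEY = "demo-key"   # must match the key used when the text was generated
GREEN_FRACTION = 0.5      # expected green share for ordinary, unwatermarked text

def in_green_list(prev_token: str, token: str) -> bool:
    digest = hashlib.sha256(f"{SECRET_KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] < GREEN_FRACTION * 256

def green_share(tokens: list) -> float:
    """Fraction of adjacent token pairs whose second token is green-listed."""
    hits = sum(in_green_list(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

suspect = "the capital of france is paris and it is a large city".split()
print(f"green-token share: {green_share(suspect):.2f}")
```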
According to Google, the technology has been built into Gemini since the spring, which the company says shows it does not compromise the quality, accuracy, or speed of text generation. It also holds up well on text that has been cropped or paraphrased.
However, the company admits the watermarking approach comes with its fair share of limitations. It is less reliable on short text, on text that has been translated or rewritten from another language, and on responses to factual queries.
With factual prompts, there are fewer opportunities to adjust token probabilities without affecting accuracy. Asking for a country's capital, for instance, or requesting any output that allows little variation, leaves less room to embed the watermark and makes detection less reliable.
Google is not the only company working on AI text watermarking. OpenAI, the maker of ChatGPT, has been experimenting with watermarking methods for some time, though its release has been delayed by commercial and technical considerations.
Experts say this kind of tool could improve AI detection while cutting down on false positives. A growing problem today is AI detectors that wrongly flag human-written content simply because it uses generic phrasing, so wide adoption of this technology could mark a real turning point.
China's government has already made watermarking mandatory for AI-generated content, and the state of California is moving to adopt similar rules; given the urgency of the matter, more nations are likely to follow. Research cited in the EU goes as far as predicting that 90% of all online content could be synthetically generated by 2026.
Read next: AI’s Limitations in Storytelling: What This Study Reveals About Reader Preferences