Gemini, ChatGPT, DeepSeek: The Biggest AI Data Collectors Revealed

Nothing on the internet is truly free, whether you pay for it or not. If you are not paying for an online product or service, you are the product: your information is what is being sold, and it can be used to track you for marketing or other purposes. This is the era of AI tools and models, and it would be naive to assume that AI chatbots do not collect your data. They do, and some collect more than others. Because online safety is a priority for many people, they prefer products and services that collect as little of their data as possible, since data collection and digital surveillance cannot be escaped entirely no matter how hard we try. Many AI apps ask for or record your birthday, phone number, keystrokes and chat history in one way or another. AI chatbots like Copilot, DeepSeek and others collect your data, and in this post we look at which AI apps collect the most user data and which collect the least.

Many social networks like Facebook, Instagram and TikTok, and even alternatives like RedNote and Lemon8, are data-hungry apps, so experts often suggest using their browser versions, which cannot collect as much of our data. Similarly, AI apps like DeepSeek, Copilot, Gemini, ChatGPT and others are data hungry, though not as much as social media apps, and they still collect a good amount of our data for different purposes. To find out how much data these AI apps collect, we analyzed the privacy reports of popular AI-powered chatbots and found that Google’s privacy policy page is longer than those of other platforms.

Image: DIW-Aigen

Google’s Gemini collects the most user data, including texts, emails, videos, images, contact lists, browsing history and search history. On the other hand, Alibaba’s Qwen AI (aka QwenLM on the web) says on the Apple App Store that it only collects data about app interactions and device ID, but Qwen’s privacy page does not match that description. The privacy reports on Google’s Play Store and Apple’s App Store are provided directly by the companies and do not contain third-party privacy information from other organizations or from the app stores themselves. Surprisingly, when we asked Qwenlm.AI for a link to its privacy policy page during our testing, the link it provided led to a 404 page.

Anyone who wants to know which apps have the best privacy policies can usually read the privacy policies on those apps’ websites. The policies differ from one another: OpenAI’s and DeepSeek’s are not very long, while Qwen’s contains some typos. Google’s privacy policy for Gemini is transparent, descriptive and easy to read, and it tells users everything they need to know about their privacy on the app. However, it is so long that average users are unlikely to read all of it. Google says that it uses user data to train Gemini models but only in specific scenarios, and that it does not share user data with advertisers.

Looking across the privacy reports, every app collects a substantial amount of user data except Microsoft’s Copilot. Its privacy report says that Copilot uses minimal user data, does not use it to train AI models and does not give it to advertisers. It is also worth noting that Microsoft and OpenAI use the same GPT models to power their assistants, yet ChatGPT collects more user data than Copilot. The privacy report of ChatGPT states that it uses user data to train its AI models and also shares that data with advertisers. Copilot, on the other hand, meets various privacy standards like HIPAA, FedRAMP and SOC. For a complete comparison, take a look at the table below.

TikTok was recently banned in the US because of its data collection policies and other political matters. DeepSeek has similarly faced bans and restrictions in some places because it collects a lot of user data, and it is collecting data from US users as well. Cliff Steinhauer, Director of Information Security and Engagement at the National Cybersecurity Alliance (NCA), says that DeepSeek has become quite popular, and that this popularity should prompt experts to look closely at its data collection policies. It is important for all AI apps, whether Chinese or US based, to protect user privacy according to international standards.

That raises the question: if AI apps are using so much of our data, how can we protect ourselves? One answer is to stop using AI apps on your smartphone and instead run AI models locally on your computer. With decent hardware and the appropriate software, a model runs on your own machine, so your prompts do not have to be sent to an AI company’s website. You can try open-source AI models, which run on macOS, Linux and Windows. You won’t have access to the exact same model you were using in the apps, but an open-source LLM running on your computer cannot collect nearly as much of your data. Another option is to build your own personalized AI model, which can process your data on the device even without an internet connection and keep your data safe.
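For readers who want to see what "running locally" looks like in practice, here is a minimal sketch in Python. It rests on assumptions that are not part of our analysis: that you have installed a local model server such as Ollama, that it is listening on its usual address (http://localhost:11434/api/generate), and that you have downloaded a model (the name "llama3" below is only a placeholder). The point is simply that the prompt is sent to your own machine rather than to a company's servers.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: a local model server (for example Ollama) is listening here.
LOCAL_ENDPOINT = "http://localhost:11434/api/generate"


def build_local_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a POST request aimed at a model server on this machine.

    The model name is a placeholder; use whichever model you have downloaded.
    """
    host = urlparse(LOCAL_ENDPOINT).hostname
    # Guard: refuse to send the prompt anywhere other than this computer.
    if host not in ("localhost", "127.0.0.1", "::1"):
        raise ValueError(f"refusing to send prompt to non-local host: {host}")
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        LOCAL_ENDPOINT,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_local_request(prompt, model)) as reply:
        return json.loads(reply.read())["response"]
```

With the server running, calling ask_local_model("Summarize this note for me") would return the model's answer as text. Nothing in this sketch contacts a remote service, although the local server software you choose may have its own update or telemetry behavior worth checking.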

| AI Assistant | Data Collected | Uses Data to Train AI? | Shares Data with Advertisers? |
| --- | --- | --- | --- |
| Copilot | Prompts, responses | No | No |
| Gemini | Addresses, contacts, call/message logs, chat transcripts, device info, dialer, feedback, Gemini Apps data, Google Assistant info (smart home, playlists), images, installed apps, IP, location, permissions, language, product usage, screen context | Yes (specific cases only) | No |
| ChatGPT | Account details, audio, browser info, contact details, date of birth, device info, files, general location, images, IP, name, payment info, service interactions, text prompts, transaction history, device/computer type, connection type | Yes | Yes |
| DeepSeek | Advertising data, chat history, crash reports, date of birth, device info, diagnostics, email, feedback, IP, keystroke patterns, OS, password, payment info, performance logs, phone number, prompts, service-related info, system language, text/audio input, uploaded files, username | Yes | Yes |
| Qwen | Audio, browser info, communication logs, computer type, connection type, contact details, content viewed, cookies, country, timestamps, device info, email, feedback, files, images, IP, name, product interactions, text, third-party data, time zone, user agent/version | Yes | Yes |
