Blog Post on Google Data Collection
https://web.archive.org/web/20150211072349/http://moz.com/blog/the-evil-side-of-google-exploring-googles-user-data-collection: Blog post explaining Google's user data collection.
References
Notes
Note: Written in 2008
Google started as a company that manipulated publicly available data better than its competition. When the cost of running a software business decreased, Google had to turn to private data in order to maintain its competitive advantage.
The most cost effective way of doing this for the engines is by collecting data from the users that already use their services. Google has been increasingly serving its users by using their personal data to manipulate public data in individualized ways. These methods are impossible to copy without the necessary personal data.
Methods Google Uses to Get Data
- Click Tracking
- Google logs all the navigational clicks (ads, actions, feature clicks, etc.) of all of its users on all of its services
- Forms
- Along with the data the user enters directly into the forms (username, password, etc.), Google logs the time and date and location of submission
- Cookies
- Google uses cookies on all of its web properties. Additionally, it sets advertising (DoubleClick) cookies to track users' movement around the web. By doing this, Google can track individual users on any page that has either DoubleClick or AdSense ads.
- Server Requests Stored in Log Files
- Every request made to any of Google's servers is stored in log files.
- JavaScript
- Google has small amounts of JavaScript embedded in websites all over the internet. When a user's browser executes the script in the background, Google can learn a great deal about that person's browsing habits.
- Web Beacons
- Google embeds small (1 pixel by 1 pixel) transparent .gifs into many of its checkout screens. Just as with the JavaScript, when a user's browser downloads the invisible image, it sends information about their computer to Google.
- Store
- Google uses an internal database called BigTable spread over approximately one million servers
- Massive Data Analytics
- Permanent Backup
- The final resting place for data at Google is likely in permanent storage. Google's privacy policies hint that some user data can never be completely deleted because of permanent backups.
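As a rough illustration of the cookie, web beacon, and server log methods above, the following minimal sketch shows what a server could record from a single request for an invisible image. It is not Google's implementation: the function name `beacon_log_record`, the `/pixel.gif` path, the `step` parameter, and all sample values are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs


def beacon_log_record(request_path, headers, remote_addr, now=None):
    """Build the record a server could store when a browser fetches a beacon.

    Everything here arrives with an ordinary image request; the page does
    not need to run any extra code for the server to see it.
    """
    now = now or datetime.now(timezone.utc)
    query = parse_qs(urlsplit(request_path).query)
    return {
        "time": now.isoformat(),
        "ip": remote_addr,
        "page": headers.get("Referer"),        # page that embedded the image
        "browser": headers.get("User-Agent"),
        "cookie": headers.get("Cookie"),       # ties the request to earlier visits
        "params": {key: values[0] for key, values in query.items()},
    }


# Hypothetical request, as the browser would send it for a 1x1 image.
record = beacon_log_record(
    "/pixel.gif?step=checkout",
    {
        "Referer": "https://shop.example/cart",
        "User-Agent": "ExampleBrowser/1.0",
        "Cookie": "id=abc123",
    },
    "203.0.113.7",
)
print(record)
```

The cookie value is what makes the record useful for tracking: the same identifier sent from different pages lets the separate requests be linked to one browser.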
Google (Normal Search)
- Search Engine Result Pages
- Country code domain
- Query
- IP address
- Language
- Number of results
- Safe search
- Additional preferences can include:
- Street Address
- City
- State
- Zip/postal code
- Server log
- Query
- URL
- IP address
- Cookie
- Browser
- Date
- Time
- Clicks
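To make the server log fields listed above concrete, here is a small sketch that splits one log entry into those fields. The line format (fields separated by " - "), the function name `parse_log_line`, and the sample values are assumptions made for illustration; the source does not specify how the logs are laid out.

```python
from urllib.parse import urlsplit, parse_qs


def parse_log_line(line):
    """Split a hypothetical ' - ' delimited search log entry into named fields."""
    ip, timestamp, url, browser, cookie = line.split(" - ")
    date, time = timestamp.split(" ")
    # The search query travels in the URL itself, here assumed to be the q parameter.
    query = parse_qs(urlsplit(url).query).get("q", [""])[0]
    return {
        "ip": ip,
        "date": date,
        "time": time,
        "url": url,
        "query": query,
        "browser": browser,
        "cookie": cookie,
    }


# Made-up entry in the assumed format.
SAMPLE = (
    "203.0.113.7 - 2008-06-24 10:15:32 - "
    "http://www.google.com/search?q=cars - ExampleBrowser/1.0 - abc123"
)
entry = parse_log_line(SAMPLE)
print(entry["query"], entry["ip"], entry["cookie"])
```

Even this short entry combines what was searched, when, from which address, and with which browser, all keyed to a persistent cookie.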
Google Personalized Search
- Logs every website visited as a result of a Google search.
- Content analysis of visited websites
Google Account
- Used as resource to compile information on individual users
- Sign up
- Sign up date
- Username
- Password
- Alternate e-mail
- Location (country)
- Personal picture
- Usage
- Friends
- Google Services usage
- Number of logins
Toolbar
- All websites visited
- Unique application number
- Sends all visited 404s to Google
- Toolbar synchronization function
- Stores autofill info with Google account
- Sends structure of web forms to Google
- Safe browsing
- Stores response to security warnings
- Stores autofill forms data
- Spellcheck sends data to Google servers
Web History
- Every website visited from Google SERP
- Date
- Time
- Search query
- Ads clicked
- Which service
Translate
- All text sent to Google servers
Google Finance
- Stock portfolio
- User’s stocks
- Number of shares
- Date/time bought
- Bought at price
Google Checkout
- Buyers
- Full legal name
- Credit card number
- Debit card number
- Card expiration date
- Card Verification Number (CVN)
- Billing address
- Phone number
- E-mail address
- Sellers
- Bank account number
- Personal address
- Business category
- Government-issued identification number
- Social Security Number
- Taxpayer Identification Number
- Sales Volume
- Transaction volume
- Business information from Dun & Bradstreet
- Transactions
- Amount
- Description of product
- Name of seller
- Name of buyer
- Type of payment used
- User trend data
- Web Beacons
- Referrer data
YouTube
- YouTube SERP data
- Registered user data
- Videos uploaded
- Comments posted
- Videos flagged
- Subscriptions
- Channels
- Groups
- Favorites
- Contacts
- All videos watched
- Frequency of data transfers
- Size of data transfers
- Click location data
- Information display data
- Web Beacons for tracking
- E-mail opened or discarded
- Account basics
- Password
- Username
- Location (country)
- Postal code
- Birthdate
- Gender
Gmail
- Stores, processes, and maintains all messages
- Account activity
- Storage usage
- Number of log-ins
- Data displayed
- Links clicked
- Stores all e-mails
- Contact lists
- Spam trends
- Gchat
- All conversations and who they involve.
- When service is used
- Size of contact list
- Contacts communicated with
- Frequency of data transfers
- Size of data transfers
- Clicks
Calendar
- Name
- Default language
- Time zone
- Usage statistics
- How long the service is used for
- Frequency of data transfers
- Size of data transfers
- Number of events
- Number of calendars
- Clicks
- Deletes every 90 days
- All events
- Who is going
- Who was invited
- Comments
- Descriptions
- Date
- Time
Desktop
- Indexes and stores
- Versions of your files
- Computer activity
- E-mails
- Chats
- Web history
- Mixed with web search results
- Content analysis of data on computer for integration into SERPs (opt-in)
- Unique application number
- Application interacts with Google’s servers
- Number of searches and response times
GOOG-411
- Phone number
- Time of call
- Duration of call
- Options selected
- Phone number used as identifier
- Records all voice commands
iGoogle
- Settings stored in Cookies
- Settings linked to Google Account
Blogger
- User photo
- Birth date
- Location
- Frequency of data transfers
- Size of data transfers
- Clicks
- Blogger Mobile
- Phone number
- Associates with Google Account
- Device identifiers
- Hardware Identifiers
Google Docs
- E-mail address
- Number of logins
- Actions taken
- Storage usage
- Clicks
- All collaborators
- All text
- All images
- All changes (previous versions)
Groups
- E-mail password
- Contents of posts
- Contents of custom pages
- Contents of external files
- Account activity
- Groups joined
- Groups managed
- List of members
- List of invitees
- Ratings made
- Preferred settings
Orkut
- Name
- Gender
- Age
- Location
- Occupation
- Religion
- Friend graph
- Hobbies
- Interests
- Photos
- Invites
- Messages
- Orkut Mobile
- Phone number
- Wireless carrier
- Content of message
- Date
- Time
- Everything a user writes
- Every blog post a user reads
Picasa
- Friend graph
- Favorite lists
- Clicks (almost all Google services track all clicks)
- All photos
- Geotags (Exif data)
- People who subscribe to albums
Mobile
- Phone number
- Device type
- Request type
- Carrier
- Carrier user ID
- Content of request
- Maps for mobile
- Location information (GPS)
- Address
- Websites visited if user asks Google to transcode
- Voice commands
Web Accelerator
- Web requests
- Cache of websites before you go to them
DoubleClick/AdWords
- Ads clicked
- Age
- Sex
- Location
- Trends of past visited websites
- IP address
Health
- Medical records
- Doctors
- Conditions
- Prescriptions
- Age
- Sex
- Race
- Blood type
- Weight
- Height
- Allergies
- Procedures
- Test results
- Immunizations
Postini
- E-mail address
- Traffic patterns
- Clicks
GrandCentral
- Credit card
- Credit card expiration date
- Credit card verification number
- Billing address
- Stores, processes, and maintains
- Voicemail messages
- Recorded conversations
- Contact lists
- Storage usage
- Number of log ins
- Data displayed
- Clicks
- Telephony log information
- Calling-party phone number
- Forwarding numbers
- Time of calls
- Date of calls
- Duration of calls
- Types of calls
Google Merchant Search
- Name
- Contact information
- E-mail address
- Phone number
Notebook
- Stores, processes and maintains
- All content in notebook
- Nickname
- Storage usage
- Number of log-ins