Blog Post on Google Data Collection

https://web.archive.org/web/20150211072349/http://moz.com/blog/the-evil-side-of-google-exploring-googles-user-data-collection: Blog post explaining Google's user data collection.


Notes


Note: Written in 2008

Google started as a company that manipulated publicly available data better than its competition. When the cost of running a software business decreased, Google had to turn to private data in order to maintain its competitive advantage.

The most cost-effective way for the search engines to do this is to collect data from the users who already use their services. Google has been increasingly serving its users by using their personal data to manipulate public data in individualized ways. These methods are impossible to copy without the necessary personal data.

Methods Google Uses to Get Data

  • Click Tracking
    • Google logs all the navigational clicks (ads, actions, feature clicks, etc.) of all of its users on all of its services
  • Forms
    • Along with the data the user enters directly into the forms (username, password, etc.), Google logs the date, time, and location of submission
  • Cookies
    • Google uses cookies on all of its web properties. Additionally, it sets advertising (DoubleClick) cookies to track users' movements around the web. By doing this, Google can track individual users on any page that has either DoubleClick or AdSense ads.
  • Server Requests Stored in Log Files
    • Every request made to any of Google's servers is stored in log files
  • JavaScript
    • Google has small amounts of JavaScript embedded in websites all over the internet. When a user's browser executes the script in the background, Google can learn a great deal about that person's browsing habits
  • Web Beacons
    • Google embeds small (1 pixel by 1 pixel) transparent .gifs into many of its checkout screens. Just as with the JavaScript, the user's browser downloads the invisible image and, in doing so, sends information about the user's computer to Google.
  • Store
    • Google uses an internal database called BigTable spread over approximately one million servers
  • Massive Data Analytics
  • Permanent Backup
    • The final resting place for data at Google is likely in permanent storage. Google's privacy policies hint that some user data can never be completely deleted because of permanent backups.
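
As a rough illustration of the web beacon and log file methods above, the following Python sketch shows what a server could record when a browser fetches a 1x1 tracking image. The function name `beacon_log_entry`, the field names, and the example addresses are hypothetical and are not taken from the original post; the only assumption is that an image request carries the usual HTTP headers.

```python
import base64
from datetime import datetime, timezone

# A commonly used 1x1 transparent GIF, base64-encoded (42 bytes once decoded).
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def beacon_log_entry(remote_addr, path, headers, now=None):
    """Build the record a hypothetical beacon server could keep for one image request."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "ip": remote_addr,                         # where the request came from
        "path": path,                              # which beacon was fetched
        "referer": headers.get("Referer"),         # page that embedded the image
        "user_agent": headers.get("User-Agent"),   # browser and operating system
        "cookie": headers.get("Cookie"),           # identifier set on an earlier visit, if any
    }


# Example request, as a server might see it (all values are made up).
entry = beacon_log_entry(
    "203.0.113.7",
    "/pixel.gif?step=checkout",
    {"Referer": "https://shop.example/cart", "User-Agent": "ExampleBrowser/1.0"},
)
```

The image itself carries no information; everything of interest arrives in the request that asks for it, which is why the same idea also works for scripts embedded in third-party pages.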

Google (Normal Search)

  • Search Engine Result Pages
  • Country code domain
  • Query
  • IP address
  • Language
  • Number of results
  • Safe search
  • Additional preferences can include:
    • Street Address
    • City
    • State
    • Zip/postal code
  • Server log
    • Query
    • URL
    • IP address
    • Cookie
    • Browser
    • Date
    • Time
  • Clicks
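
The server log fields listed above can be pictured as one record per search request. The Python sketch below parses a made-up log line into those fields; the ` - ` separated line format, the `SearchLogEntry` class, and the sample values are assumptions for illustration only and do not describe Google's actual log format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse, parse_qs


@dataclass
class SearchLogEntry:
    ip: str
    date: str
    time: str
    url: str
    query: str
    browser: str
    cookie: str


def parse_log_line(line: str) -> SearchLogEntry:
    """Split a hypothetical ' - ' separated log line into the fields listed above."""
    ip, when, url, browser, cookie = line.split(" - ")
    date, time = when.split(" ")
    # The search query is carried in the URL's "q" parameter.
    params = parse_qs(urlparse(url).query)
    query = params.get("q", [""])[0]
    return SearchLogEntry(ip, date, time, url, query, browser, cookie)


# Sample line with invented values.
line = (
    "203.0.113.7 - 25/Mar/2008 10:15:32 - "
    "http://www.example.com/search?q=cars - "
    "Firefox 2.0; Windows NT 5.1 - 740674ce2123e969"
)
entry = parse_log_line(line)
```

Because the cookie value repeats across requests from the same browser, records like this can be grouped into a per-browser search history even when the IP address changes.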

Google Personalized Search

  • Logs every website visited as a result of a Google search.
  • Content analysis of visited websites

Google Account

  • Used as a resource to compile information on individual users
  • Sign up
    • Sign up date
    • Username
    • Password
    • Alternate e-mail
    • Location (country)
  • Personal picture
  • Usage
    • Friends
    • Google Services usage
    • Number of logins

Toolbar

  • All websites visited
  • Unique application number
  • Sends all visited 404s to Google
  • Toolbar synchronization function
    • Stores autofill info with Google account
    • Sends structure of web forms to Google
  • Safe browsing
    • Stores response to security warnings
  • Stores autofill forms data
  • Spellcheck sends data to Google servers

Web History

  • Every website visited from Google SERP
  • Date
  • Time
  • Search query
  • Ads clicked
  • Which service

Translate

  • All text sent to Google servers

Google Finance

  • Stock portfolio
    • User’s stocks
    • Number of shares
    • Date/time bought
    • Bought at price

Google Checkout

  • Buyers
    • Full legal name
    • Credit card number
    • Debit card number
    • Card expiration date
    • Card Verification Number (CVN)
    • Billing address
    • Phone number
    • E-mail address
  • Sellers
    • Bank account number
  • Personal address
  • Business category
    • Government-issued identification number
      • Social Security Number
      • Taxpayer Identification Number
    • Sales Volume
  • Transaction volume
  • Business information from Dun & Bradstreet
  • Transactions
    • Amount
    • Description of product
    • Name of seller
    • Name of buyer
    • Type of payment used
  • User trend data
    • Web Beacons
  • Referrer data

YouTube

  • YouTube SERP data
  • Registered user data
    • Videos uploaded
    • Comments posted
    • Videos flagged
    • Subscriptions
      • Channels
      • Groups
      • Favorites
    • Contacts
    • All videos watched
    • Frequency of data transfers
    • Size of data transfers
    • Click location data
    • Information display data
  • E-mail
    • Web Beacons for tracking
      • E-mail opened or discarded
  • Account basics
    • E-mail
    • Password
    • Username
    • Location (country)
    • Postal code
    • Birthdate
    • Gender

Gmail

  • Stores, processes, and maintains all messages
  • Account activity
    • Storage usage
    • Number of log-ins
  • Data displayed
  • Links clicked
  • Stores all e-mails
  • Contact lists
  • Spam trends
    • Gchat
      • All conversations and who they involve.
      • When service is used
      • Size of contact list
      • Contacts communicated with
  • Frequency of data transfers
  • Size of data transfers
  • Clicks

Calendar

  • Name
  • Default language
  • Time zone
  • Usage statistics
    • How long the service is used for
    • Frequency of data transfers
    • Size of data transfers
    • Number of events
    • Number of calendars
    • Clicks
    • Deletes every 90 days
  • All events
    • Who is going
    • Who was invited
    • Comments
    • Descriptions
    • Date
    • Time

Desktop

  • Indexes and stores
    • Versions of your files
    • Computer activity
      • E-mails
      • Chats
      • Web history
  • Mixed with web search results
  • Content analysis of data on computer for integration into SERPs (opt-in)
  • Unique application number
  • Application interacts with Google’s servers
  • Number of searches and response times

GOOG-411

  • Phone number
  • Time of call
  • Duration of call
  • Options selected
  • Phone number used as identifier
  • Records all voice commands

iGoogle

  • Settings stored in Cookies
  • Settings linked to Google Account

Blogger

  • User photo
  • Birth date
  • Location
  • Frequency of data transfers
  • Size of data transfers
  • Clicks
  • Blogger Mobile
    • Phone number
    • Associates with Google Account
    • Device identifiers
    • Hardware Identifiers

Google Docs

  • E-mail address
  • Number of logins
  • Actions taken
  • Storage usage
  • Clicks
  • All collaborators
  • All text
  • All images
  • All changes (previous versions)

Groups

  • E-mail password
  • Contents of posts
  • Contents of custom pages
  • Contents of external files
  • Account activity
    • Groups joined
    • Groups managed
    • List of members
    • List of invitees
    • Ratings made
    • Preferred settings

Orkut

  • Name
  • Gender
  • Age
  • Location
  • Occupation
  • Religion
  • Friend graph
  • Hobbies
  • Interests
  • Photos
  • Invites
  • Messages
  • Orkut Mobile
    • Phone number
    • Wireless carrier
    • Content of message
    • Date
    • Time
  • Everything a user writes
  • Every blog post a user reads

Picasa

  • Friend graph
  • Favorite lists
  • Clicks (almost all Google services track all clicks)
  • All photos
  • Geotags (Exif data)
  • People who subscribe to albums

Mobile

  • Phone number
  • Device type
  • Request type
  • Carrier
  • Carrier user ID
  • Content of request
  • Maps for mobile
    • Location information (GPS)
    • Address
  • Websites visited if user asks Google to transcode
  • Voice commands

Web Accelerator

  • Web requests
  • Cache of websites before you go to them

DoubleClick/AdWords

  • Ads clicked
  • Age
  • Sex
  • Location
  • Trends of past visited websites
  • IP address
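
To make "trends of past visited websites" concrete, here is a toy Python model of how one advertising cookie can link visits across unrelated sites that carry the same ad network's ads. The `AdServerLog` class, its methods, and the example domains are hypothetical; this is a sketch of the general third-party cookie idea, not of DoubleClick's implementation.

```python
from collections import defaultdict
from urllib.parse import urlparse


class AdServerLog:
    """Toy ad server that sees the same cookie ID on many publisher sites."""

    def __init__(self):
        self.visits = defaultdict(list)

    def record_ad_request(self, cookie_id, referer):
        # Each ad request reveals which page embedded the ad.
        self.visits[cookie_id].append(urlparse(referer).netloc)

    def sites_seen(self, cookie_id):
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        return list(dict.fromkeys(self.visits.get(cookie_id, [])))


log = AdServerLog()
log.record_ad_request("abc123", "https://news.example/story")
log.record_ad_request("abc123", "https://shop.example/cart")
log.record_ad_request("abc123", "https://news.example/sports")
sites = log.sites_seen("abc123")
```

Neither publisher shares data with the other; the link exists only because both pages load an ad from the same server, which receives the same cookie each time.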

Health

  • Medical records
    • Doctors
    • Conditions
    • Prescriptions
    • Age
    • Sex
    • Race
    • Blood type
    • Weight
    • Height
    • Allergies
    • Procedures
    • Test results
    • Immunizations

Postini

  • E-mail address
  • Traffic patterns
  • Clicks

GrandCentral

  • Credit card
  • Credit card expiration date
  • Credit card verification number
  • Billing address
  • Stores, processes, and maintains
    • Voicemail messages
    • Recorded conversations
    • Contact lists
  • Storage usage
  • Number of log-ins
  • Data displayed
  • Clicks
  • Telephony log information
    • Calling-party phone number
    • Forwarding numbers
    • Time of calls
    • Date of calls
    • Duration of calls
    • Types of calls

Google Merchant Search

  • Name
  • Contact information
    • E-mail address
    • Phone number

Notebook

  • Stores, processes and maintains
    • All content in notebook
    • Nickname
    • Storage usage
    • Number of log-ins
