Pro Git

Reading "Pro Git: Everything You Need to Know About Git" by Scott Chacon and Ben Straub. I want to know more about Git so that I can use it more effectively, and this free book was recommended on the Git documentation website.


Chapter 1 - Getting Started


Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. For the examples in this book, you will use software source code as the files being version controlled, but you could use anything.

A Version Control System (VCS) allows you to revert selected files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, see who introduced an issue and when, and more.

Centralized Version Control Systems (CVCSs) have a single server that contains all the versioned files, and a number of clients that check out files from that central place. Local VCSs and CVCSs suffer from the same problem: whenever you have the entire history of the project in a single place, you risk losing everything.

Distributed Version Control Systems (DVCSs) (such as Git, Mercurial, or Darcs) don't just check out the latest snapshot of the files; rather, they fully mirror the repository, including its full history.

Distributed Version Control Diagram

Git was created in 2005 for the maintenance of the Linux kernel. Some of the goals of the system were:

  • Speed
  • Simple design
  • Strong support for non-linear development (thousands of parallel branches)
  • Fully distributed
  • Able to handle large projects like the Linux kernel efficiently (speed and data size)

The major difference between Git and other VCSs is the way Git thinks about data. Most other systems think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control).

Git doesn’t think of or store its data this way. Instead, Git thinks of its data as a stream of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored.

VC as Stream of Snapshots
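
The snapshot idea can be sketched with a toy model. This is only an illustration under simplifying assumptions, not Git's storage code: `ToySnapshotStore`, its `commit` method, and the file names are all hypothetical, and real Git also records directory trees, commit metadata, and compression.

```python
import hashlib


class ToySnapshotStore:
    """Toy model of snapshot-based storage (hypothetical, not Git's real format)."""

    def __init__(self):
        self.objects = {}    # content hash -> file content, stored once
        self.snapshots = []  # each snapshot maps file name -> content hash

    def commit(self, files):
        """Record a snapshot of `files` (a dict of name -> bytes)."""
        snapshot = {}
        for name, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # An unchanged file hashes to an existing key, so nothing new is
            # stored; the snapshot just links to the copy already there.
            self.objects.setdefault(key, content)
            snapshot[name] = key
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1


store = ToySnapshotStore()
store.commit({"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"})
store.commit({"README": b"hello, world\n", "main.c": b"int main(void) { return 0; }\n"})

print(len(store.snapshots))  # 2 snapshots, each listing every file
print(len(store.objects))    # 3 stored contents: main.c is kept only once
```

Every snapshot lists all files, but the unchanged `main.c` is stored a single time and both snapshots point at it.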

Most operations in Git need only local files and resources to operate. If you want to see the changes between a file and that same file a month ago, Git can look up the file a month ago and do a local difference calculation. Everything in Git is check-summed before it is stored and then referred to by that checksum. This means it's impossible to change the contents of any file or directory without Git knowing about it. The mechanism that Git uses for checksumming is called a SHA-1 hash.

In cryptography, SHA-1 (Secure Hash Algorithm 1) is a hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecimal digits.
SHA-1 Wikipedia
  • Example: 24b9da6552252987aa493b52f8696cd6d3b00373

Git stores everything in its database not by file name but by the hash value of its contents.
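
As a rough sketch of how content maps to a hash, the snippet below computes the ID Git gives to a file's content (a "blob"). It assumes a repository using the default SHA-1 object format; the function name `git_blob_hash` is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a file's content (a "blob")."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by
    # the content itself, not the file name.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


digest = git_blob_hash(b"test content\n")
print(digest)       # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(len(digest))  # 40 hexadecimal digits
```

Two files with identical content get the same hash regardless of their names, which is why Git can address its database by content.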

When you do actions in Git, nearly all of them only add data to the Git database. It is very difficult to lose things.

Git has three main states that your files can reside in: modified, staged, and committed.

  • Modified means that you have changed the file but have not committed it to your database yet.
  • Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
  • Committed means that the data is safely stored in your local database.

These states map onto the three main sections of a Git project: the working tree, the staging area, and the Git directory.

Main Stages of Git Project

  • The working tree is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on a disk for you to use or modify.
  • The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. Its technical name in Git parlance is the index.
  • The Git directory is where Git stores the metadata and object database for your project.

The basic Git workflow looks something like:

  1. Modify files in the working tree.
  2. Selectively stage just those changes you want to be part of the next commit, which adds only those changes to the staging area.
  3. Commit, which takes the files as they are in the staging area and stores that snapshot permanently in your Git directory.

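If a particular version of a file is in the Git directory, it's considered committed. If it has been modified and was added to the staging area, it is staged. If it has changed since it was checked out but has not been staged, it is modified.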
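
The workflow and the three states can be mimicked with a small toy model. Everything here (`ToyRepo`, `add`, `commit`, `state`) is a hypothetical simplification: real Git tracks far more, and a real file can be staged and modified at the same time.

```python
class ToyRepo:
    """Toy model of the three sections of a Git project (hypothetical names)."""

    def __init__(self):
        self.working_tree = {}  # files as they are on disk
        self.index = {}         # staging area: what goes into the next commit
        self.commits = []       # Git directory: committed snapshots

    def add(self, name):
        """Stage the current version of one file."""
        self.index[name] = self.working_tree[name]

    def commit(self):
        """Store the staging area, as it is right now, as a new snapshot."""
        self.commits.append(dict(self.index))

    def state(self, name):
        """Classify a tracked file as committed, staged, or modified."""
        last = self.commits[-1] if self.commits else {}
        current = self.working_tree[name]
        if last.get(name) == current:
            return "committed"
        if self.index.get(name) == current:
            return "staged"
        return "modified"


repo = ToyRepo()
repo.working_tree["notes.txt"] = "first draft"
repo.add("notes.txt")
repo.commit()
print(repo.state("notes.txt"))  # committed

repo.working_tree["notes.txt"] = "second draft"
print(repo.state("notes.txt"))  # modified
repo.add("notes.txt")
print(repo.state("notes.txt"))  # staged
repo.commit()
print(repo.state("notes.txt"))  # committed
```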

There are many different ways to use Git. The command line is the only place where you can run all Git commands.

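Installing on Linux (Fedora and other RPM-based distributions; on Debian-based distributions such as Ubuntu, use apt instead of dnf):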

$ sudo dnf install git-all

Installing on Windows

The most official build is available for download on the Git website. Just go to https://git-scm.com/download/win and the download will start automatically. To get an automated installation, you can use the Git Chocolatey package.

First Time Git Setup

After installing git, you'll want to do a few things to customize your Git environment. Git comes with a tool called git config that lets you get and set configuration variables that control all aspects of how Git looks and operates. These variables can be stored in three different places:

  1. [path]/etc/gitconfig file: Contains values applied to every user on the system and all their repositories. If you pass the option --system to git config, it reads and writes from this file specifically. Because this is a system configuration file, you would need administrative or superuser privilege to make changes to it.
  2. ~/.gitconfig or ~/.config/git/config file: Values specific personally to you, the user. You can make Git read and write to this file specifically by passing the --global option, and this affects all repositories you work with on your system.
  3. config file in the Git directory (that is, .git/config) of whatever repository you're currently using: Specific to that single repository. You can force Git to read from and write to this file with the --local option, but that is in fact the default. You need to be located somewhere in a Git repository for this option to work properly.

Each level overrides those values in the previous level. View all your settings and where they are coming from using:

$ git config --list --show-origin
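
The override order between the three levels can be pictured as a layered lookup. This is only a sketch of the precedence rule, not how Git parses its config files; the values in the three dictionaries are invented for the example.

```python
from collections import ChainMap

# Hypothetical values for each level; real Git reads them from the files
# described above.
system_config = {"core.editor": "vi", "init.defaultBranch": "master"}
global_config = {"user.name": "John Doe", "core.editor": "emacs"}
local_config = {"user.name": "Project Bot"}

# ChainMap searches its maps left to right, so the most specific level
# (the repository's own config) is listed first.
effective = ChainMap(local_config, global_config, system_config)

print(effective["user.name"])           # Project Bot (local overrides global)
print(effective["core.editor"])         # emacs (global overrides system)
print(effective["init.defaultBranch"])  # master (only set at system level)
```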

Your Identity

The first thing to do when you install Git is to set your user name and email address. Every Git commit uses this information, and it's immutably baked into the commits you start creating. You need to do this only once if you pass the --global option.

$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com

Your Editor

You can now configure the default text editor that will be used when Git needs you to type in a message.

$ git config --global core.editor emacs

Your Default Branch Name

You can set the default branch name to something other than master:

$ git config --global init.defaultBranch main

You can check what Git thinks a specific key's value is by typing git config <key>:

$ git config user.name

Getting Help

Get the manual page for any of the Git commands:

$ git help <verb>
$ git <verb> --help
$ man git-<verb>
$ git <verb> -h # Gives an abbreviated, concise manual page


Chapter 2 - Git Basics


You can obtain a Git repository in one of two ways:

  1. Take a local directory that is currently not under version control and turn it into a Git repository
$ cd <directory_not_under_vc>
$ git init
  • This creates a new subdirectory in the directory not under version control named .git that contains all of the necessary repository files - a Git repository skeleton. At this point, nothing in your project is tracked yet.
  • If you want to start version-controlling existing files, you should begin tracking those files and do an initial commit.
$ git add *.c
$ git add LICENSE
$ git commit -m "Initial Project version"
  2. You can clone an existing Git repository from elsewhere

If you want to get a copy of an existing Git repository, the command you need is git clone. With this command, Git receives a full copy of nearly all data that the server has. Every version of every file for the history of the project is pulled down by default when you run git clone. You clone a repository with git clone <url>:

$ git clone https://github.com/libgit2/libgit2
  • The command above creates a directory named libgit2, initializes a .git directory inside it, pulls down all the data for that repository, and checks out a working copy of the latest version. If you want to clone the repository into a directory named something other than libgit2, you can specify the new directory name as an additional argument:
$ git clone https://github.com/libgit2/libgit2 mylibgit

Recording Changes to the Repository

At this point you should have a working copy of all repository files in front of you. Each file in the working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot, as well as any newly staged files; they can be unmodified, modified, or staged. Tracked files are files Git knows about. Untracked files are everything else.

Lifecycle of the Status of Your Files

Checking the Status of Your Files

The main tool you use to determine which files are in which state is the git status command. If you run this command directly after a clone, you should see something like:

$ git status 
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

This means that you have a clean working directory, which means that none of your tracked files are modified. Git also doesn't see any untracked files, or they would be listed here. The command tells you what branch you're on and informs you that it has not diverged from the same branch on the server. Untracked basically means that Git sees a file you didn't have in the previous snapshot (commit), and which hasn't yet been staged.

Tracking New Files

In order to begin tracking a new file, you use the command git add:

$ git add README

If you feed git add a directory, it adds all files in that directory recursively.

Staging Modified Files

git add is a multipurpose command - you use it to begin tracking new files, to stage files, and to do other things like marking merge-conflicted files as resolved. It may be helpful to think of it more as "add precisely this content to the next commit" rather than "add this file to the project".

Short Status

While the git status output is pretty comprehensive, it's also quite wordy. Git also has a short status flag so that you can see your changes in a more compact way: git status -s or git status --short gives you a simplified output.

Ignoring Files

Often, you'll have a class of files that you don't want Git to automatically add or even show you as being untracked. In such cases, you can create a file named .gitignore that lists patterns matching the files to ignore. Setting up a .gitignore file for your new repository before you get going is generally a good idea so you don't accidentally commit files that you don't want in your Git repository. Here are the rules for the patterns in a .gitignore file:

  • Blank lines or lines starting with # are ignored
  • Standard glob patterns work, and will be applied recursively throughout the entire working tree
  • You can start patterns with a forward slash / to avoid recursivity
  • You can end patterns with a forward slash / to specify a directory
  • You can negate a pattern by starting with an exclamation point !

Glob patterns are like simplified regular expressions that shells use.

  • An * matches zero or more characters
  • [abc] matches any character inside the brackets
  • A ? matches a single character
  • Bracketed characters separated by a hyphen, such as [0-9], match any character between them
  • Two asterisks match nested directories: a/**/z would match a/z, a/b/z, a/b/c/z and so on
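
The single-segment glob rules above can be tried out with Python's `fnmatch` module. This is only an approximation of .gitignore matching: `fnmatch` has no notion of directories, so it does not implement the `**`, leading-slash, trailing-slash, or `!` rules, and its `*` will also match a `/`. The file names are made up.

```python
from fnmatch import fnmatchcase

print(fnmatchcase("main.c", "*.c"))      # True: * matches "main"
print(fnmatchcase(".c", "*.c"))          # True: * also matches zero characters
print(fnmatchcase("main.h", "*.c"))      # False
print(fnmatchcase("lib.a", "lib.[ao]"))  # True: "a" is inside the brackets
print(fnmatchcase("v1.txt", "v?.txt"))   # True: ? matches exactly one character
print(fnmatchcase("v10.txt", "v?.txt"))  # False: "10" is two characters
print(fnmatchcase("log7", "log[0-9]"))   # True: 7 lies in the range 0-9
```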

Viewing Staged and Unstaged Changes

The git diff command tells you exactly what you changed, not just which files changed. It shows you the exact lines added and removed - the patch, as it were. With no arguments, the command compares what is in your working directory with what is in your staging area, so git diff by itself only shows you unstaged changes. To see what you have staged for the next commit, use git diff --staged (--cached is a synonym).

Committing Your Changes

The simplest way to commit is to type git commit. Doing so launches your editor of choice. Alternatively, you can type your commit message inline with the commit command by specifying it after a -m flag:

$ git commit -m "Story 182: fix benchmarks for speed"

Every time you perform a commit, you're recording a snapshot of your project that you can revert to or compare to later.

Skipping the Staging Area

Adding the -a option to the git commit command makes Git automatically stage every file that is already tracked before doing the commit, letting you skip the git add part:

$ git commit -a -m "Adding new benchmarks"

Removing Files

To remove a file from Git, you have to remove it from your tracked files (more accurately, remove it from your staging area) and then commit. The git rm command does that, and also removes the file from your working directory so you don't see it as an untracked file the next time around.

$ git rm PROJECTS.md

The next time you commit, the file will be gone and no longer tracked. If you modified the file or had already added it to the staging area, you must force the removal with the -f option. Another useful thing you may want to do is to keep the file in your working tree but remove it from your staging area. In other words, you may want to keep the file on your hard drive but not have Git track it anymore. This is particularly useful if you forgot to add something to your .gitignore file and accidentally staged it:

$ git rm --cached README
$ git rm log/\*.log # you can also pass glob patterns

Moving Files

Git doesn't explicitly track file movement. If you rename a file in Git, no metadata is stored in Git that tells it you renamed the file. Even so, Git has a git mv command. If you want to rename a file in Git, you can run something like:

$ git mv file_from file_to

This is equivalent to moving the file with the mv command, then running git rm on the old name and git add on the new name; git mv is one command instead of three.

Viewing the Commit History

The most basic and powerful tool to view existing commit history is the git log command. git log lists the commits made in that repository in reverse chronological order; that is, the most recent commits show up first. The command lists each commit with its SHA-1 checksum, the author's name and email, the date written, and the commit message. There is a huge variety of options for this command:

  • -p or --patch: Shows the difference introduced in each commit
  • You can limit the number of entries shown with the -<int> option
  • --stat: Shows abbreviated stats for each commit (which files changed and how many lines were added and removed)
  • --pretty: changes the log output to formats other than the default
    • --pretty=oneline: Prints each commit on a single line
    • $ git log --pretty=format:"%h - %an, %ar : %s": You can specify format explicitly

Useful Specifiers for git log --pretty=format

    • The oneline and format option values are particularly useful with the --graph option, which adds a nice ASCII graph showing your branch and merge history.

Limiting Log Output

git log takes a number of useful limiting options - options that let you show only a subset of commits. Time limiting options like --since and --until are very useful, e.g., git log --since=2.weeks

The -S option (colloquially known as Git's "pickaxe" option) takes a string and only shows those commits that changed the number of occurrences of that string. If you specify a directory or file name, you can limit the log output to commits that introduced a change to those files:

$ git log -- path/to/file
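
The rule behind the -S "pickaxe" option (keep only commits that change how many times a string occurs) can be sketched as a filter. The `history` list of before/after texts is a hypothetical stand-in for real commits, and `pickaxe` is a made-up helper, not a Git API.

```python
def pickaxe(history, needle):
    """Return ids of commits that changed how many times `needle` occurs.

    `history` is a list of (commit_id, text_before, text_after) tuples; this
    hypothetical structure stands in for the file contents around each commit.
    """
    return [
        commit_id
        for commit_id, before, after in history
        if before.count(needle) != after.count(needle)
    ]


history = [
    ("a1", "", "def parse(): pass\n"),                         # adds parse
    ("b2", "def parse(): pass\n", "def parse(): return 1\n"),  # same count
    ("c3", "def parse(): return 1\n", ""),                     # removes parse
]
print(pickaxe(history, "parse"))  # ['a1', 'c3']
```

Commit `b2` touches the line containing the string but leaves the number of occurrences unchanged, so it is not reported.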

Undoing Things

Undoing things is one of the few areas in Git where you may lose some work if you do it wrong. If you commit too early and forget to add some files, you can redo the commit: stage the changes you forgot, and commit again using the --amend option: $ git commit --amend. This command takes your staging area and uses it for the commit. The same commit-message editor fires up, but it already contains the message of your previous commit. You end up with a single commit - the second commit replaces the results of the first.

Unstaging a Staged File

Use the git reset HEAD <file> command to unstage.

Unmodifying a Modified File

The git checkout -- <file> command reverts the file to what it looked like in your last staged or committed version. This is a dangerous command: any local changes you made to that file are gone, because Git just replaced the file with that version. Use caution with this command.

Undoing things with git restore

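git restore, introduced in Git version 2.23.0, is an alternative to git reset for many undo operations. Use git restore --staged <file> to remove a file from the staging area, and git restore <file> to discard the changes you have made in the working directory. The latter is a dangerous command, like git checkout -- <file>: any local changes you made to that file are gone.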

Working with Remotes

Remote repositories are versions of your project that are hosted on the Internet or network somewhere. You can have several of them, each of which generally is either read-only or read/write for you. Collaborating with others involves managing these remote repositories and pushing and pulling data to and from them when you need to share work.

Showing Your Remotes

To see which remote servers you have configured, you can run the git remote command. It lists the short names of each remote handle you've specified. If you cloned your repository, you should at least see origin - that is the default name Git gives to the server you cloned from. The -v option shows the URLs that Git has stored for each shortname, to be used when reading from and writing to that remote.

Adding Remote Repositories

To add a new remote Git repository as a shortname you can reference easily, run:

$ git remote add <shortname> <url>

Fetching and Pulling from Your Remotes

To get data from your remote projects:

$ git fetch <remote>

The command goes out to that remote project and pulls down all the data from that remote project that you don't have yet. git fetch origin fetches any new work that has been pushed to that server since you cloned it. Note that this command only downloads data - it doesn't automatically merge it with any of your work or modify what you're currently working on.

If your current branch is set up to track a remote branch, you can use the git pull command to automatically fetch and then merge that remote branch into your current branch. Running git pull generally fetches data from the server you originally cloned from and automatically tries to merge it into the code you're currently working on.

Pushing to Your Remotes

When you have your project at a point that you want to share, you have to push it upstream. The command for this is simple:

$ git push <remote> <branch>
This command works only if you cloned from a server to which you have write access and if nobody has pushed in the meantime. If you and someone else clone at the same time and they push upstream and then you push upstream, your push will rightly be rejected. You’ll have to fetch their work first and incorporate it into yours before you’ll be allowed to push.

Inspecting a Remote

If you want more information about a particular remote:

$ git remote show origin

It lists the URL for the remote repository as well as the tracking branch information.

Renaming and Removing Remotes

Run git remote rename to change a remote's shortname:

$ git remote rename pb paul

If you want to remove a remote:

$ git remote remove paul

Tagging

Git has the ability to tag specific points in a repository's history as being important. Typically, people use this functionality to mark release points (v1.0, v2.0 and so on).

$ git tag [-l] [--list] # Lists the existing tags
v1.0
v2.0
$ git tag -l "v1.8.5*" # List for tags that match a particular pattern
v1.8.5
v1.8.5-rc0
v1.8.5-rc1
v1.8.5-rc2
v1.8.5-rc3
v1.8.5.1
v1.8.5.2
v1.8.5.3
v1.8.5.4
v1.8.5.5
$ # Creating an annotated tag:
$ git tag -a v1.4 -m "My version 1.4 - Big Changes to Code"
$ git show v1.4 # See the tag data along with the commit that was tagged
$ git tag -a v1.2 9fceb02 # Tags a past commit, identified by its (abbreviated) checksum
$ # git push doesn't transfer tags to remote servers by default,
$ # so you have to explicitly push tags after creating them
$ git push origin v1.5
$ git tag -d v1.4-lw # Deletes a tag from the local repo
$ git push origin :refs/tags/v1.4-lw # Deletes a tag from the remote server
$ git push origin --delete <tagname> # Also deletes a tag from the remote server
$ git checkout v2.0.0 # Checks out a tag (this puts the repo in "detached HEAD" state)

Git Aliases

$ git config --global alias.co checkout
$ git config --global alias.br branch
$ git config --global alias.ci commit # Instead of typing `git commit`, you only have to type `git ci`
$ git config --global alias.st status


Chapter 3 - Git Branching


Branching means you diverge from the main line of development and continue to do work without messing with that main line. Git encourages workflows that branch and merge often, even multiple times in a day.

Branches in a Nutshell

Git stores data as a series of snapshots. When you make a commit, Git stores a commit object that contains a pointer to the snapshot of the content you staged. This object also contains the author's name and email address, the commit message, and pointers to the commit or commits that came directly before this commit (its parent or parents).

Staging files computes a checksum for each one. (Git refers to stored versions of files as blobs). When you create a commit with git commit, Git checksums each subdirectory and stores them as a tree object in the Git repository. Git then creates a commit object that has the metadata and a pointer to the root project tree so it can re-create that snapshot when needed.

A Commit and Its Tree
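
How blobs, a tree, and commits point at each other can be sketched by computing their IDs by hand. This assumes the default SHA-1 object format and a flat tree of regular files only; the helper names, the README content, the author, and the timestamp are invented for the example.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object as Git does: a '<kind> <size>' header, a NUL byte, the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Build a tree body from (mode, name, object id) entries for regular files."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda entry: entry[1].encode()):
        # Each entry is "<mode> <name>", a NUL byte, then the raw 20-byte hash.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return body


def commit_body(tree, parents, author, message):
    """Build a commit body: tree pointer, parent pointers, metadata, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    return ("\n".join(lines) + "\n").encode()


# Hypothetical file content, author, and timestamp.
blob = object_id("blob", b"hello\n")
tree = object_id("tree", tree_body([("100644", "README", blob)]))
who = "John Doe <johndoe@example.com> 1700000000 +0000"
first = object_id("commit", commit_body(tree, [], who, "Initial commit"))
second = object_id("commit", commit_body(tree, [first], who, "Second commit"))

print(object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty tree)
print(blob, tree, first, second, sep="\n")
```

Because the second commit's body names the first commit as its parent, changing anything earlier in the chain would change every later ID as well.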

The next commit stores a pointer to the commit that came immediately before it.

Commits and Their Parents

A branch in Git is simply a lightweight pointer to one of these commits. The default branch name in Git is master. As you start making commits, you're given a master branch that points to the last commit you made. Every time you commit, the master branch pointer moves forward automatically.

Branch and Commit History

Creating a New Branch

Creating a new branch creates a new pointer for you to move around.

$ git branch testing # Creates a new branch called testing

Two Branches Pointing to the Same Series of Commits

How does Git know what branch you're currently on? It keeps a special pointer called HEAD. In Git, HEAD is a pointer to the local branch you're currently on. The git branch command only creates a new branch - it doesn't switch to that branch.

HEAD Pointing to Branch

git log --decorate shows you where the branch pointers are pointing.

Switching Branches

$ git checkout testing # Switches to the testing branch

Switching Branches

The significance of switching branches can be seen after you make a commit:

$ # Made some changes ...
$ git commit -a -m "Made a change"

HEAD moves forward with commit

$ git checkout master

This command moves the HEAD pointer back to point to the master branch, and it reverts the files in your working directory back to the snapshot that master points to. After making some changes on master and committing, the project history has diverged:

Divergent History

Because a branch in Git is actually a simple file that contains the 40-character SHA-1 checksum of the commit it points to, branches are cheap to create and destroy. Creating a new branch is as quick and simple as writing 41 bytes to a file (40 characters and a newline).
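
The 41-byte claim is easy to check by writing such a file by hand. This sketch only mimics the refs/heads/<branch> layout inside a temporary directory; it does not touch a real repository, the commit hash is just an example value, and real Git may also keep refs in a packed file.

```python
import os
import tempfile

# Example commit hash; any 40-digit SHA-1 would do.
commit_sha = "24b9da6552252987aa493b52f8696cd6d3b00373"

with tempfile.TemporaryDirectory() as git_dir:
    # Mimic the layout <git dir>/refs/heads/<branch name>.
    heads = os.path.join(git_dir, "refs", "heads")
    os.makedirs(heads)
    branch_path = os.path.join(heads, "testing")

    # "Creating a branch" amounts to writing the hash plus a newline.
    with open(branch_path, "w", newline="\n") as ref:
        ref.write(commit_sha + "\n")

    branch_size = os.path.getsize(branch_path)

print(branch_size)  # 41
```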

Basic Branching and Merging

  • The command below is equivalent to creating the branch iss53 and then checking it out:
$ git checkout -b iss53

Note that if your working directory or staging area has uncommitted changes that conflict with the branch you're checking out, Git won't let you switch branches. It's best to have a clean working state when you switch branches.

The git merge command merges a branch into the current branch.

$ git checkout master # Was originally in hotfix
$ git merge hotfix # updates the master branch to point to the current state of hotfix

The term fast-forward in a merge means that Git simply moves the branch pointer forward because there is no divergent work to merge together. Merging hotfix into master in the image below (using the command above) would be an example of a fast-forward.

Git Branch Image
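
The fast-forward condition can be sketched as an ancestry check: if the current branch's commit is an ancestor of the other branch's commit, only the pointer has to move. The commit names, the `parents` map, and both helper functions are hypothetical, and the model only handles single-parent history.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    while commit is not None:
        if commit == ancestor:
            return True
        commit = parents.get(commit)
    return False


def merge_fast_forward(branches, parents, current, other):
    """Fast-forward `current` to `other` if possible; return whether it happened."""
    if is_ancestor(parents, branches[current], branches[other]):
        branches[current] = branches[other]  # only the pointer moves
        return True
    return False  # histories diverged: a real merge commit would be needed


# Hypothetical history C0 <- C1 <- C2, with C3 (iss53) and C4 (hotfix)
# both branching off C2.
parents = {"C0": None, "C1": "C0", "C2": "C1", "C3": "C2", "C4": "C2"}
branches = {"master": "C2", "hotfix": "C4", "iss53": "C3"}

print(merge_fast_forward(branches, parents, "master", "hotfix"))  # True
print(branches["master"])                                         # C4
print(merge_fast_forward(branches, parents, "master", "iss53"))   # False
```

After the first merge, master and iss53 have diverged (C4 versus C3), so the second call cannot simply move the pointer.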

$ git branch -d hotfix  # Deletes the hotfix branch
Deleted branch hotfix (3a0874c).

Branch Management

If you run git branch with no arguments, you get a simple listing of your current branches. The * character prefixes the branch you currently have checked out. git branch -v shows you the last commit of every branch.

Renaming a Branch

$ git branch --move bad-branch-name corrected-branch-name # Renames the branch, this is only local
$ git push --set-upstream origin corrected-branch-name # Pushes the changes to the server
Changing the name of a branch like master/main/mainline/default will break the integrations, services, helper utilities and build/release scripts that your repository uses.

A topic branch is a short-lived branch that you create and use for a single particular feature or related work.

Remote references are references (pointers) in your remote repositories, including branches, tags, and so on. You can get a full list of remote references explicitly with git ls-remote <remote>, or use git remote show <remote> for remote branches as well as more information.

Remote-tracking branches are references to the state of remote branches. They're local references that you can't move; Git moves them for you whenever you do any network communication, to make sure they accurately represent the state of the remote repository. Think of them as bookmarks that remind you where the branches in your remote repositories were the last time you connected to them.

git fetch <remote> updates your local repository with new data from the remote repository (think: you and someone else are working on the same branch of a repo, and they push a commit while you are still working from the last commit you both had).

While the git fetch command will fetch all the changes on the server that you don't have yet, it will not modify your working directory at all. It will simply get the data for you and let you merge it yourself. However, there is a command called git pull which is essentially a git fetch immediately followed by a git merge in most cases.

Rebasing

With the rebase command, you can take all the changes that were committed on one branch and replay them on a different branch.

This operation works by going to the common ancestor of the two branches (the one you’re on and the one you’re rebasing onto), getting the diff introduced by each commit of the branch you’re on, saving those diffs to temporary files, resetting the current branch to the same commit as the branch you are rebasing onto, and finally applying each change in turn.
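
The steps just described can be imitated with a toy commit graph. This is a loose sketch, not Git's rebase: each commit's "diff" is modeled as a plain dictionary of file changes, conflicts are ignored, and `make_commit`, `ancestors`, and `rebase` are hypothetical helpers.

```python
import hashlib


def make_commit(commits, parent, change):
    """Store a commit (parent id plus a dict of file changes) and return its id."""
    payload = repr((parent, sorted(change.items()))).encode()
    commit_id = hashlib.sha1(payload).hexdigest()
    commits[commit_id] = {"parent": parent, "change": change}
    return commit_id


def ancestors(commits, commit_id):
    """List commit ids from `commit_id` back to the root, newest first."""
    chain = []
    while commit_id is not None:
        chain.append(commit_id)
        commit_id = commits[commit_id]["parent"]
    return chain


def rebase(commits, branch_tip, onto_tip):
    """Replay the commits unique to `branch_tip` on top of `onto_tip`."""
    onto_history = set(ancestors(commits, onto_tip))
    # Walk back from the branch tip until the common ancestor is reached,
    # saving each commit that exists only on this branch.
    to_replay = []
    for commit_id in ancestors(commits, branch_tip):
        if commit_id in onto_history:
            break
        to_replay.append(commit_id)
    # Start from the new base and apply each saved change in original order.
    new_tip = onto_tip
    for commit_id in reversed(to_replay):
        new_tip = make_commit(commits, new_tip, commits[commit_id]["change"])
    return new_tip


commits = {}
c1 = make_commit(commits, None, {"README": "v1"})
c2 = make_commit(commits, c1, {"main.c": "v1"})     # the target branch moves on
c3 = make_commit(commits, c1, {"feature.c": "v1"})  # work on the current branch
c4 = make_commit(commits, c3, {"feature.c": "v2"})

new_tip = rebase(commits, branch_tip=c4, onto_tip=c2)
print(len(ancestors(commits, new_tip)))  # 4: two replayed commits, then c2 and c1
print(new_tip != c4)                     # True: replayed commits get new ids
```

The replayed commits carry the same changes but have new ids because their parents changed, which is why rebasing rewrites history.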

Do not rebase commits that exist outside your repository and that people may have based work on.


Chapter 4 doesn't apply to me. It talks about different options for setting up your own Git server. I use GitHub, and I don't see that changing.


Chapter 5 - Distributed Git


This chapter talks about how to contribute code successfully to a project and make it as easy as possible for you and the project maintainer, and how to maintain a project successfully with a number of developers contributing.

Distributed Workflows

Centralized Systems

In centralized systems, there is generally a single collaboration model - the centralized workflow. One central hub, or repository, can accept code, and everyone synchronizes their work with it. A number of developers are nodes - consumers of that hub - and synchronize with that centralized location.

Centralized Workflow

This means that if two developers clone from the hub and both make changes, the first developer to push their changes back up can do so with no problems. The second developer must merge the first one's work before pushing changes up, so as to not overwrite the first developer's changes.

Integration-Manager Workflow

It's possible to have a workflow where every developer has write access to their own public repository and read access to everyone else's. This scenario often includes a canonical repository that represents the "official" project. To contribute to that project, you create your own public clone of the project and push your changes to it. Then, you can send a request to the maintainer of the main project to pull in your changes. The maintainer can then add your repository as a remote, test your changes locally, merge them into their branch, and push back to their repository:

  1. The project maintainer pushes to their public repository.
  2. A contributor clones that repository and makes changes.
  3. The contributor pushes to their own public copy.
  4. The contributor sends the maintainer an email asking them to pull changes.
  5. The maintainer adds the contributor's repository as a remote and merges locally.
  6. The maintainer pushes merged changes to the main repository.

Integration Manager Workflow

Some general guidelines on submitting commits:

  • Avoid whitespace errors in your changes. Running git diff --check before you commit lists possible whitespace errors.
  • Try to make each commit a logically separate changeset. If you can, try to make your changes digestible.
  • Your commit messages should start with a single line of no more than about 50 characters that describes the changeset concisely, followed by a blank line, followed by a more detailed explanation.
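
The message layout from the last tip can be checked mechanically. The function below is a hypothetical helper written for these notes, not part of Git; the 50-character limit is the rule of thumb from the guideline, exposed as a parameter.

```python
def check_commit_message(message, subject_limit=50):
    """Return a list of layout problems found in a commit message."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    problems = []
    if len(lines[0]) > subject_limit:
        problems.append(f"subject is longer than {subject_limit} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems


good = "Fix benchmark timer overflow\n\nUse a 64-bit counter so long runs no longer wrap."
bad = "Fix the benchmark timer overflow that happens on long runs by switching counters\nmore text"

print(check_commit_message(good))  # []
print(check_commit_message(bad))   # two problems: long subject, no blank line
```
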
In software development, the main difference between a branch and a fork is that a branch is isolated within the same repository, while a fork is a separate copy of the repository.

Forking a repository involves creating your own copy of it to push your commits to. You can eventually merge a fork back into the main project if you choose.

Because Git doesn't have monotonically increasing numbers like 'v123' or the equivalent to go with each commit, if you want to have a human readable name to go with a commit, you can run git describe on that commit. In response, Git generates a string consisting of the name of the most recent tag earlier than that commit, followed by the number of commits since that tag, followed by a partial SHA-1 value of the commit being described.
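
The shape of that string can be sketched as a small formatter. The tag name, commit count, and hash below are made-up inputs; in real output the abbreviated hash is prefixed with the letter g, and a commit that is itself tagged is described by the tag name alone.

```python
def describe(tag, commits_since_tag, commit_sha, abbrev=7):
    """Build a describe-style name: <tag>-<commits since tag>-g<abbreviated hash>."""
    if commits_since_tag == 0:
        return tag  # the commit itself is tagged, so the tag name is enough
    return f"{tag}-{commits_since_tag}-g{commit_sha[:abbrev]}"


# Hypothetical tag, distance, and hash.
sha = "24b9da6552252987aa493b52f8696cd6d3b00373"
print(describe("v1.4", 3, sha))  # v1.4-3-g24b9da6
print(describe("v1.4", 0, sha))  # v1.4
```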


Chapter 6 - GitHub


GitHub is the single largest host for Git repositories, and is the central point of collaboration for millions of developers and projects.

If you want to contribute to an existing project to which you don't have push access, you can fork the project. When you fork a project, GitHub will make a copy of the project that is entirely yours; it lives in your namespace, and you can push to it. People can fork a project, push to it, and contribute their changes back to the original repository by creating what's called a Pull Request. This opens up a discussion thread with code review, and the owner and the contributor can then communicate about the change until the owner is happy with it, at which point the owner can merge it in. To fork a project, click the Fork button.

Fork Button

The GitHub Flow

  1. Fork the project
  2. Create a topic branch from master
  3. Make some commits to improve the project
  4. Push this branch to your GitHub project
  5. Open a Pull Request on GitHub
  6. Discuss, and optionally continue committing
  7. The project owner merges or closes the Pull Request
  8. Sync the updated master back to your fork.

This is basically the Integration-Manager Workflow covered in Distributed Git. You can make a Pull Request by clicking the green Compare & Pull Request button on GitHub. You then give your Pull Request a title and description. When you create the Pull Request, the owner of the project you forked will get a notification that someone is suggesting a change, with a link to a page that has all of this information on it. The owner of the project can look at the suggested change and merge it, reject it, or comment on it. Anyone can also leave general comments on the Pull Request.

Most GitHub projects think about Pull Request branches as iterative conversations around a proposed change, culminating in a unified diff that is applied by merging.

If your Pull Request does not merge cleanly, you can either rebase your branch on top of the target branch or merge the target branch into your branch. Most developers on GitHub will choose to do the latter.

GitHub Flavored Markdown

### Task Lists

- [x] Write the Code
- [ ] Write all the tests

### Code Snippets

```java
for (int i = 0; i < 5; i++)
{
    System.out.println("i is : " + i);
}
```
### Quoting

> Whether 'tis Nobler in the mind to suffer
> The Slings and Arrows of outrageous Fortune,

### Emoji

I :eyes: that :bug: and I :cold_sweat:. :trophy: for :microscope: it.

### Images

![git](https://www.domain.com/image.png)

GitHub will render the README file on the landing page of the project. The CONTRIBUTING file is another special file: GitHub links to it when someone starts opening a Pull Request.


Chapter 10 - Git Internals


Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it. When you run git init in a new or existing directory, Git creates the .git directory, which is where almost everything that Git stores and manipulates is located. If you want to back up or clone your repository, copying this single directory elsewhere gives you nearly everything you need. A newly created .git directory typically looks like this:

$ ls -F1
config # contains project-specific configuration options
description
HEAD # points to the branch you currently have checked out
hooks/ # contains your client-side and server-side hook scripts
info/ # keeps a global exclude file for ignored patterns
objects/ # stores all the content for your database
refs/ # stores pointers into commit objects in that data (branches, tags, and more)

Git Objects

Git is a content-addressable filesystem, which means that at the core of Git is a simple key-value data store. You can insert any kind of content into a Git repository, and Git will hand you back a unique key you can use later to retrieve that content.

$ find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

This is how Git stores content initially - a single file per piece of content, named with a SHA-1 checksum of the content and its header. The subdirectory is named with the first two characters of the SHA-1, and the filename is the remaining 38 characters. The object type is called a blob.
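A minimal sketch of how that key can be derived: Git prepends a header of the form `blob <size in bytes>`, followed by a null byte, to the content, then takes the SHA-1 of the result. The function names below are hypothetical, and the input `test content\n` is an assumption about what produced the object in the listing above, since the notes do not say what was stored.

```python
import hashlib


def blob_object_id(content: bytes) -> str:
    """Compute the SHA-1 that Git assigns to a blob holding `content`."""
    # Header: object type, a space, the content length in bytes, then a null byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Map a 40-character object id to its path under .git/objects."""
    # First two characters name the subdirectory, the remaining 38 name the file.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


# Assumed input; a trailing newline is included, as `echo` would add one.
object_id = blob_object_id(b"test content\n")
print(loose_object_path(object_id))
```

The sketch covers only the naming. The file Git actually writes at that path holds the header and content compressed with zlib, which is not reproduced here.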

The tree object solves the problem of storing the filename and also allows you to store a group of files together. Git stores content in a manner similar to a UNIX filesystem, but a bit simplified. All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents.

$ git cat-file -p master^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859 README
100644 blob 8f94139338f9404f26296befa88755fc2598c289 Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0 lib
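To make the directory analogy concrete, the sketch below reads the listing above into structured entries. Each line carries a mode, an object type, the SHA-1 of the object, and the filename, so a tree is essentially a table that maps names to blobs (files) or other trees (subdirectories). The `TreeEntry` type and `parse_tree_listing` function are hypothetical names; this parses the printed text only and does not read Git's binary tree format.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str        # e.g. 100644 for a normal file, 040000 for a directory
    kind: str        # "blob" or "tree"
    object_id: str   # SHA-1 of the referenced object
    name: str        # filename within this tree


def parse_tree_listing(text):
    """Parse the text printed by `git cat-file -p <tree>` into TreeEntry records."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on whitespace at most three times so names containing spaces survive.
        mode, kind, object_id, name = line.split(None, 3)
        entries.append(TreeEntry(mode, kind, object_id, name))
    return entries


listing = """\
100644 blob a906cb2a4a904a152e80877d4088654daad0c859 README
100644 blob 8f94139338f9404f26296befa88755fc2598c289 Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0 lib
"""

for entry in parse_tree_listing(listing):
    label = "subdirectory" if entry.kind == "tree" else "file"
    print(f"{entry.name}: {label} -> {entry.object_id[:7]}")
```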

