// Package index implements encoding and decoding of index format files. // // Git index format // ================ // // == The Git index file has the following format // // All binary numbers are in network byte order. Version 2 is described // here unless stated otherwise. // // - A 12-byte header consisting of // // 4-byte signature: // The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache") // // 4-byte version number: // The current supported versions are 2, 3 and 4. // // 32-bit number of index entries. // // - A number of sorted index entries (see below). // // - Extensions // // Extensions are identified by signature. Optional extensions can // be ignored if Git does not understand them. // // Git currently supports cached tree and resolve undo extensions. // // 4-byte extension signature. If the first byte is 'A'..'Z' the // extension is optional and can be ignored. // // 32-bit size of the extension // // Extension data // // - 160-bit SHA-1 over the content of the index file before this // checksum. // // == Index entry // // Index entries are sorted in ascending order on the name field, // interpreted as a string of unsigned bytes (i.e. memcmp() order, no // localization, no special casing of directory separator '/'). Entries // with the same name are sorted by their stage field. // // 32-bit ctime seconds, the last time a file's metadata changed // this is stat(2) data // // 32-bit ctime nanosecond fractions // this is stat(2) data // // 32-bit mtime seconds, the last time a file's data changed // this is stat(2) data // // 32-bit mtime nanosecond fractions // this is stat(2) data // // 32-bit dev // this is stat(2) data // // 32-bit ino // this is stat(2) data // // 32-bit mode, split into (high to low bits) // // 4-bit object type // valid values in binary are 1000 (regular file), 1010 (symbolic link) // and 1110 (gitlink) // // 3-bit unused // // 9-bit unix permission. Only 0755 and 0644 are valid for regular files. // Symbolic links and gitlinks have value 0 in this field. // // 32-bit uid // this is stat(2) data // // 32-bit gid // this is stat(2) data // // 32-bit file size // This is the on-disk size from stat(2), truncated to 32-bit. // // 160-bit SHA-1 for the represented object // // A 16-bit 'flags' field split into (high to low bits) // // 1-bit assume-valid flag // // 1-bit extended flag (must be zero in version 2) // // 2-bit stage (during merge) // // 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF // is stored in this field. // // (Version 3 or later) A 16-bit field, only applicable if the // "extended flag" above is 1, split into (high to low bits). // // 1-bit reserved for future // // 1-bit skip-worktree flag (used by sparse checkout) // // 1-bit intent-to-add flag (used by "git add -N") // // 13-bit unused, must be zero // // Entry path name (variable length) relative to top level directory // (without leading slash). '/' is used as path separator. The special // path components ".", ".." and ".git" (without quotes) are disallowed. // Trailing slash is also disallowed. // // The exact encoding is undefined, but the '.' and '/' characters // are encoded in 7-bit ASCII and the encoding cannot contain a NUL // byte (iow, this is a UNIX pathname). // // (Version 4) In version 4, the entry path name is prefix-compressed // relative to the path name for the previous entry (the very first // entry is encoded as if the path name for the previous entry is an // empty string). At the beginning of an entry, an integer N in the // variable width encoding (the same encoding as the offset is encoded // for OFS_DELTA pack entries; see pack-format.txt) is stored, followed // by a NUL-terminated string S. Removing N bytes from the end of the // path name for the previous entry, and replacing it with the string S // yields the path name for this entry. // // 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes // while keeping the name NUL-terminated. // // (Version 4) In version 4, the padding after the pathname does not // exist. // // Interpretation of index entries in split index mode is completely // different. See below for details. // // == Extensions // // === Cached tree // // Cached tree extension contains pre-computed hashes for trees that can // be derived from the index. It helps speed up tree object generation // from index for a new commit. // // When a path is updated in index, the path must be invalidated and // removed from tree cache. // // The signature for this extension is { 'T', 'R', 'E', 'E' }. // // A series of entries fill the entire extension; each of which // consists of: // // - NUL-terminated path component (relative to its parent directory); // // - ASCII decimal number of entries in the index that is covered by the // tree this entry represents (entry_count); // // - A space (ASCII 32); // // - ASCII decimal number that represents the number of subtrees this // tree has; // // - A newline (ASCII 10); and // // - 160-bit object name for the object that would result from writing // this span of index as a tree. // // An entry can be in an invalidated state and is represented by having // a negative number in the entry_count field. In this case, there is no // object name and the next entry starts immediately after the newline. // When writing an invalid entry, -1 should always be used as entry_count. // // The entries are written out in the top-down, depth-first order. The // first entry represents the root level of the repository, followed by the // first subtree--let's call this A--of the root level (with its name // relative to the root level), followed by the first subtree of A (with // its name relative to A), ... // // === Resolve undo // // A conflict is represented in the index as a set of higher stage entries. // When a conflict is resolved (e.g. with "git add path"), these higher // stage entries will be removed and a stage-0 entry with proper resolution // is added. // // When these higher stage entries are removed, they are saved in the // resolve undo extension, so that conflicts can be recreated (e.g. with // "git checkout -m"), in case users want to redo a conflict resolution // from scratch. // // The signature for this extension is { 'R', 'E', 'U', 'C' }. // // A series of entries fill the entire extension; each of which // consists of: // // - NUL-terminated pathname the entry describes (relative to the root of // the repository, i.e. full pathname); // // - Three NUL-terminated ASCII octal numbers, entry mode of entries in // stage 1 to 3 (a missing stage is represented by "0" in this field); // and // // - At most three 160-bit object names of the entry in stages from 1 to 3 // (nothing is written for a missing stage). // // === Split index // // In split index mode, the majority of index entries could be stored // in a separate file. This extension records the changes to be made on // top of that to produce the final index. // // The signature for this extension is { 'l', 'i', 'n', 'k' }. // // The extension consists of: // // - 160-bit SHA-1 of the shared index file. The shared index file path // is $GIT_DIR/sharedindex.. If all 160 bits are zero, the // index does not require a shared index file. // // - An ewah-encoded delete bitmap, each bit represents an entry in the // shared index. If a bit is set, its corresponding entry in the // shared index will be removed from the final index. Note, because // a delete operation changes index entry positions, but we do need // original positions in replace phase, it's best to just mark // entries for removal, then do a mass deletion after replacement. // // - An ewah-encoded replace bitmap, each bit represents an entry in // the shared index. If a bit is set, its corresponding entry in the // shared index will be replaced with an entry in this index // file. All replaced entries are stored in sorted order in this // index. The first "1" bit in the replace bitmap corresponds to the // first index entry, the second "1" bit to the second entry and so // on. Replaced entries may have empty path names to save space. // // The remaining index entries after replaced ones will be added to the // final index. These added entries are also sorted by entry name then // stage. // // == Untracked cache // // Untracked cache saves the untracked file list and necessary data to // verify the cache. The signature for this extension is { 'U', 'N', // 'T', 'R' }. // // The extension starts with // // - A sequence of NUL-terminated strings, preceded by the size of the // sequence in variable width encoding. Each string describes the // environment where the cache can be used. // // - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from // ctime field until "file size". // // - Stat data of plumbing.excludesfile // // - 32-bit dir_flags (see struct dir_struct) // // - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file // does not exist. // // - 160-bit SHA-1 of plumbing.excludesfile. Null SHA-1 means the file does // not exist. // // - NUL-terminated string of per-dir exclude file name. This usually // is ".gitignore". // // - The number of following directory blocks, variable width // encoding. If this number is zero, the extension ends here with a // following NUL. // // - A number of directory blocks in depth-first-search order, each // consists of // // - The number of untracked entries, variable width encoding. // // - The number of sub-directory blocks, variable width encoding. // // - The directory name terminated by NUL. // // - A number of untracked file/dir names terminated by NUL. // // The remaining data of each directory block is grouped by type: // // - An ewah bitmap, the n-th bit marks whether the n-th directory has // valid untracked cache entries. // // - An ewah bitmap, the n-th bit records "check-only" bit of // read_directory_recursive() for the n-th directory. // // - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data // is valid for the n-th directory and exists in the next data. // // - An array of stat data. The n-th data corresponds with the n-th // "one" bit in the previous ewah bitmap. // // - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit // in the previous ewah bitmap. // // - One NUL. // // == File System Monitor cache // // The file system monitor cache tracks files for which the core.fsmonitor // hook has told us about changes. The signature for this extension is // { 'F', 'S', 'M', 'N' }. // // The extension starts with // // - 32-bit version number: the current supported version is 1. // // - 64-bit time: the extension data reflects all changes through the given // time which is stored as the nanoseconds elapsed since midnight, // January 1, 1970. // // - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. // // - An ewah bitmap, the n-th bit indicates whether the n-th index entry // is not CE_FSMONITOR_VALID. // // == End of Index Entry // // The End of Index Entry (EOIE) is used to locate the end of the variable // length index entries and the beginning of the extensions. Code can take // advantage of this to quickly locate the index extensions without having // to parse through all of the index entries. // // Because it must be able to be loaded before the variable length cache // entries and other index extensions, this extension must be written last. // The signature for this extension is { 'E', 'O', 'I', 'E' }. // // The extension consists of: // // - 32-bit offset to the end of the index entries // // - 160-bit SHA-1 over the extension types and their sizes (but not // their contents). E.g. if we have "TREE" extension that is N-bytes // long, "REUC" extension that is M-bytes long, followed by "EOIE", // then the hash would be: // // SHA-1("TREE" + + // "REUC" + ) // // == Index Entry Offset Table // // The Index Entry Offset Table (IEOT) is used to help address the CPU // cost of loading the index by enabling multi-threading the process of // converting cache entries from the on-disk format to the in-memory format. // The signature for this extension is { 'I', 'E', 'O', 'T' }. // // The extension consists of: // // - 32-bit version (currently 1) // // - A number of index offset entries each consisting of: // // - 32-bit offset from the beginning of the file to the first cache entry // in this block of entries. // // - 32-bit count of cache entries in this blockpackage index package index