chapter14: Index¶
1 Basic Concepts¶
Index mechanisms used to speed up access to desired data. 索引用来加速查找。
search Key: attribute to set of attributes used to look up records in a file. 属性为用于查找文件中记录的属性集。
An index file consists of records (called index entries) of the form. 索引文件通常是有顺序的
1.1 Index Evaluation Metrics¶
- Access types supported efficiently
- Point query: records with a specified value in the attribute.
点查询
- Range query: records with an attribute value falling in a specified range of values.
范围查询
- Point query: records with a specified value in the attribute.
- Access time
- Insertion time
- Deletion time
- Space overhead
1.2 Ordered Indices¶
Primary index 是很宝贵的,只能有一个,其他都是辅助索引。
Example
- Dense index(稠密索引) — Index record appears for every search-key value in the file. 所有 search-key 都要出现在索引文件里。
- Sparse Index(稀疏索引): contains index records for only some search-key values.
Example
Tip
1.3 Multilevel Index(多级索引)¶
2 B+-Tree Index¶
Example
2.1 Observations about B+-trees¶
!!! note Query on B+-Tree
2.2 Insert of B+-Tree¶
!!! note Insert on B+-Trees
Example
2.3 Delete of B+-Tree¶
!!! note Delete on B+-Tree
Example
!!! warning B+-Tree: height and size estimation
2.4 other issues in indexing¶
- Record relocation and secondary indices
- If a record moves, all secondary indices that store record pointers have to be updated如果一条记录移动,存储记录指针的所有二级索引都必须更新
- Node splits in B+-tree file organizations become very expensive B+树文件组织中的节点拆分变得非常昂贵
- Solution: use primary-index search key instead of record pointer in secondary index解决方案:在二级索引中使用主索引搜索键代替记录指针
- Variable length strings as keys
- Variable fanout
- Prefix compression
- Key values at internal nodes can be prefixes of full key内部节点的键值可以是全键的前缀
- Keep enough characters to distinguish entries in the subtrees separated by the key value保留足够的字符来区分由键值分隔的子树中的条目
2.5 Multiple-Key Access¶
3 Bulk Loading and Bottom-up Build¶
Example
如果要排序的内容较大,无法放下内存,可以使用外部排序。
fanout 可以计算出来。
可以用 level-order 写到磁盘里,便于顺序访问所有索引,此时块是连续的。(便于顺序访问所有数据项)
这里的代价就是建好后,一次 seek 后全部写出去 (9 blocks)
Example2
把刚刚那棵 B+ 树叶子节点(即遍历所有数据)需要 1seek+6blocks. 随后和上面的数据合并后,写回磁盘时需要 1seek+13blocks.
Merge two existing two B+-trees , to create a new B+-tree using the Bottom-UP Build algorithm, as in LSM-tree Index 假设有两棵这样生成的 B+ 树,将他们合并在一起。首先把叶子节点拿出来排序。