redis ziplist

2019-12-06 2233 words 5 minutes

Contents

ziplist

结构

ziplist是一个压缩的双向链表，由一个特殊编码的连续内存块构成。如果没有特殊指定，所有字段都是以小端序来存储的。

1
2
3
4
5
ziplist：
<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

entry：
<prevlen> <encoding> <entry-data>

ziplist：

uint32_t zlbytes：ziplist占据的内存大小，包括zlbytes字段。
uint32_t zltail：最后一个entry（非zlend）的offset，记录了尾节点距离起始地址的字节数，相当于尾指针的作用。
uint16_t zllen：entry的数目，如果元素数目大于2^16-2，那么值为2^16-1，只有通过遍历才能得知，有多少个元素。zlen不计入entry数目。
uint8_t zlend：特殊的entry，ziplist的结尾标记，值为0xFF。

每个entry大小是不固定的：

prevlen：前一个entry的长度，用于从后往前遍历时计算prev entry的起始地址。长度小于254字节和大于等于254字节分别使用两种存储方式。
value prevlen size
< 254 0xbb 1字节
>=254 0xFE xx xx xx xx 5字节
当长度大于等于254时，0xFE代表prevlen值的类型，后面的4个字节才是长度。
这样存储方式刚好可以与zlend的0xFF区分开来，没有一个entry的开头会是0xFF。

value	prevlen	size
< 254	`0xbb`	1字节
>=254	`0xFE xx xx xx xx`	5字节

encoding：记录了entry-data数据类型和长度。

前两个bit代表存储的是字节数组，还是整型。

encoding	size	值的类型
`00pppppp`	1 byte	长度<=63的字节数组
`01pppppp	qqqqqqqq`	2 bytes（14 bits使用大端序存储）
`10000000	qqqqqqqq	rrrrrrrr
`11000000`	1 byte	int16_t
`11010000`	1 byte	int32_t
`11100000`	1 byte	int64_t
`11110000`	1 byte	24 bit signed integer
`11111110`	1 byte	8 bit signed integer
`1111xxxx`	1 byte	没有`content`属性，4个bit存储[0, 12]的值
`11111111`	1 byte	zlend

entry-data：节点的值，字节数组或整形。

插入

ziplist使用了很多宏来实现，主要是进行类型转换和指针运算以便访问相关的字段。

插入分三个情况，

需要分别准备新entry的prevlen，encoding，分配空间。

prevlen 先看插入位置是否在zlen之前，
- 如果不在，p处的prevlen就是新entry的prevlen。
- 如果在，此时无法直接得知prevlen，需要看zlend前是否还有元素。若没有，则zipRawEntryLength(ptail)计算tail元素的长度。
p不是zlend的时候，新entry的len会作为p处entry的prevlen，需要确保p处prevlen空间足够。
encoding 对于encoding，先会尝试编码为整型zipTryEncoding(s,slen,&value,&encoding)，
- 如果可以编码为整型，那么zipIntSize(encoding)得到entry-data的大小。
- 如果无法合法的编码为整型，那么根据slen编码为字节数组zipStoreEntryEncoding(NULL,encoding,slen)，entry-data的大小为slen。
分配空间，计算出各字段所需的空间后，memmove()来完成空间的调整。要注意对源地址的处理，p-nextdiff。
- 如果nextdiff == 0，说明p处entry的prevlen，可以保存新entry的大小。
- 如果nextdiff == 4，那么说明空间不够，此时p处entry的prevlen原大小为1 byte，从p往前数4 bytes，加起来的5 bytes作为新prevlen的存储，因此从p - 4处开始memmove()。
- 如果nextdiff == -4，说明空间多出4 bytes，从p + 4的位置开始memmove()。

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;

    /* Find out prevlen for the entry that is inserted.
     *
     * entry开头的1 byte是不是0xFF
     *
     * 如果插入位置是在zlend处，需要判断zlend前是否有entry，
     * 如果没有，那么新prevlen为0，
     * 如果有，那么需要计算zl+zltail处entry的大小，将其设置为新entry的prevlen
     *
     */
    if (p[0] != ZIP_END) {
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    /* See if the entry can be encoded */
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipStoreEntryEncoding will use the
         * string length to figure out how to encode it. */
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    reqlen += zipStorePrevEntryLength(NULL,prevlen);
    reqlen += zipStoreEntryEncoding(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    int forcelarge = 0;
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    if (nextdiff == -4 && reqlen < 4) {
        nextdiff = 0;
        forcelarge = 1;
    }

    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Apply memory move when necessary and update tail offset. */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        if (forcelarge)
            zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
        else
            zipStorePrevEntryLength(p+reqlen,reqlen);

        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    p += zipStorePrevEntryLength(p,prevlen);
    p += zipStoreEntryEncoding(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}

连锁更新

插入或删除时，可能导致后面entry存储的prevlen发生变化。理论上，扩张和缩小都会有，但redis有意忽略了缩小的情况，避免连续的插入导致频繁的扩容和缩小。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
if (next.prevrawlensize > rawlensize) {
    /* This would result in shrinking, which we want to avoid.
     * So, set "rawlen" in the available bytes. */
    zipStorePrevEntryLengthLarge(p+rawlen,rawlen);
} else {
    zipStorePrevEntryLength(p+rawlen,rawlen);
}

/* Stop here, as the raw length of "next" has not changed. */
break;