redis ziplist

ziplist

结构

ziplist是一个压缩的双向链表,由一个特殊编码的连续内存块构成。如果没有特殊指定,所有字段都是以小端序来存储的。

1
2
3
4
5
ziplist
<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

entry
<prevlen> <encoding> <entry-data>

ziplist:

  • uint32_t zlbytes:ziplist占据的内存大小,包括zlbytes字段。
  • uint32_t zltail:最后一个entry(非zlend)的offset,记录了尾节点距离起始地址的字节数,相当于尾指针的作用。
  • uint16_t zllen:entry的数目,如果元素数目大于2^16-2,那么值为2^16-1,只有通过遍历才能得知,有多少个元素。zlen不计入entry数目。
  • uint8_t zlend:特殊的entry,ziplist的结尾标记,值为0xFF

每个entry大小是不固定的:

  • prevlen:前一个entry的长度,用于从后往前遍历时计算prev entry的起始地址。长度小于254字节和大于等于254字节分别使用两种存储方式。

    valueprevlensize
    < 2540xbb1字节
    >=2540xFE xx xx xx xx5字节

    当长度大于等于254时,0xFE代表prevlen值的类型,后面的4个字节才是长度。

    这样存储方式刚好可以与zlend0xFF区分开来,没有一个entry的开头会是0xFF

  • encoding:记录了entry-data数据类型和长度。

    前两个bit代表存储的是字节数组,还是整型。

    encodingsize值的类型
    00pppppp1 byte长度<=63的字节数组
    `01ppppppqqqqqqqq`2 bytes(14 bits使用大端序存储)
    `10000000qqqqqqqqrrrrrrrr
    110000001 byteint16_t
    110100001 byteint32_t
    111000001 byteint64_t
    111100001 byte24 bit signed integer
    111111101 byte8 bit signed integer
    1111xxxx1 byte没有content属性,4个bit存储[0, 12]的值
    111111111 bytezlend
  • entry-data:节点的值,字节数组或整形。

插入

ziplist使用了很多宏来实现,主要是进行类型转换和指针运算以便访问相关的字段。

插入分三个情况, -w629

需要分别准备新entry的prevlenencoding,分配空间。

  1. prevlen 先看插入位置是否在zlen之前,

    • 如果不在,p处的prevlen就是新entry的prevlen
    • 如果在,此时无法直接得知prevlen,需要看zlend前是否还有元素。若没有,则zipRawEntryLength(ptail)计算tail元素的长度。

    p不是zlend的时候,新entry的len会作为p处entry的prevlen,需要确保pprevlen空间足够。

    -w736

  2. encoding 对于encoding,先会尝试编码为整型zipTryEncoding(s,slen,&value,&encoding)

    • 如果可以编码为整型,那么zipIntSize(encoding)得到entry-data的大小。
    • 如果无法合法的编码为整型,那么根据slen编码为字节数组zipStoreEntryEncoding(NULL,encoding,slen),entry-data的大小为slen
  3. 分配空间, 计算出各字段所需的空间后,memmove()来完成空间的调整。要注意对源地址的处理,p-nextdiff

    • 如果nextdiff == 0,说明p处entry的prevlen,可以保存新entry的大小。

    • 如果nextdiff == 4,那么说明空间不够,此时p处entry的prevlen原大小为1 byte,从p往前数4 bytes,加起来的5 bytes作为新prevlen的存储,因此从p - 4处开始memmove()

      -w614

    • 如果nextdiff == -4,说明空间多出4 bytes,从p + 4的位置开始memmove()

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;

    /* Find out prevlen for the entry that is inserted.
     *
     * entry开头的1 byte是不是0xFF
     *
     * 如果插入位置是在zlend处,需要判断zlend前是否有entry,
     * 如果没有,那么新prevlen为0,
     * 如果有,那么需要计算zl+zltail处entry的大小,将其设置为新entry的prevlen
     *
     */
    if (p[0] != ZIP_END) {
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    /* See if the entry can be encoded */
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipStoreEntryEncoding will use the
         * string length to figure out how to encode it. */
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    reqlen += zipStorePrevEntryLength(NULL,prevlen);
    reqlen += zipStoreEntryEncoding(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    int forcelarge = 0;
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    if (nextdiff == -4 && reqlen < 4) {
        nextdiff = 0;
        forcelarge = 1;
    }

    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Apply memory move when necessary and update tail offset. */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        if (forcelarge)
            zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
        else
            zipStorePrevEntryLength(p+reqlen,reqlen);

        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    p += zipStorePrevEntryLength(p,prevlen);
    p += zipStoreEntryEncoding(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}

连锁更新

插入或删除时,可能导致后面entry存储的prevlen发生变化。理论上,扩张和缩小都会有,但redis有意忽略了缩小的情况,避免连续的插入导致频繁的扩容和缩小。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
if (next.prevrawlensize > rawlensize) {
    /* This would result in shrinking, which we want to avoid.
     * So, set "rawlen" in the available bytes. */
    zipStorePrevEntryLengthLarge(p+rawlen,rawlen);
} else {
    zipStorePrevEntryLength(p+rawlen,rawlen);
}

/* Stop here, as the raw length of "next" has not changed. */
break;
Licensed under CC BY-NC-SA 4.0
comments powered by Disqus