Valgrind Usage

This post covers how to use Valgrind's tools, including Memcheck, Callgrind, Helgrind, and Massif.

Memcheck

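Memcheck detects the following problems:

  1. Accessing memory you shouldn't, e.g. overrunning or underrunning a heap block, overrunning the top of the stack, or accessing memory after it has been freed.
  2. Using undefined values, i.e. values that have not been initialised or that were derived from other undefined values.
  3. Incorrectly freeing heap memory, such as double frees or mismatched use of malloc/new/new[] versus free/delete/delete[].
  4. Passing a negative value as the size to a memory allocation function.
  5. Memory leaks.

A typical invocation (append the program to check and its arguments):

valgrind --tool=memcheck --leak-check=full --track-origins=yes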

Illegal read / Illegal write errors

#include <stdio.h>

int main()
{
    int y = 1;
    /* reads 10 ints past y, outside the object */
    printf("x = %d\n", *(int *)(&y + 10));
    return 0;
}

Memcheck reports:
==25724== Invalid read of size 4
==25724== at 0x400674: main
==25724== Address 0x0 is not stack'd, malloc'd or (recently) free'd

Use of uninitialised values

#include <stdio.h>

int main()
{
    int x;    /* never initialised */
    printf("x = %d\n", x);
    return 0;
}

Memcheck reports (with --track-origins=yes):
==38591== Use of uninitialised value of size 8
==38591== at 0x571A32B: _itoa_word (in /usr/lib64/libc-2.17.so)
==38591== by 0x571E5B0: vfprintf (in /usr/lib64/libc-2.17.so)
==38591== by 0x57254E8: printf (in /usr/lib64/libc-2.17.so)
==38591== by 0x400682: main

==38591== Conditional jump or move depends on uninitialised value(s)
==38591== at 0x571A335: _itoa_word (in /usr/lib64/libc-2.17.so)
==38591== by 0x571E5B0: vfprintf (in /usr/lib64/libc-2.17.so)
==38591== by 0x57254E8: printf (in /usr/lib64/libc-2.17.so)
==38591== by 0x400682: main
==38591== Uninitialised value was created by a stack allocation
==38591== at 0x400667: main

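Memcheck tracks uninitialised data as the program moves it around, but it does not report an error at that point. It complains only when the program uses the uninitialised data in a way that might affect its externally visible behaviour. In this example, x is uninitialised. Memcheck sees the value being passed to printf and on to vfprintf, yet reports nothing. The error appears only when vfprintf has to examine the value of x in order to convert it to an ASCII string.

Setting --track-origins=yes makes Memcheck report where the uninitialised data came from. Memcheck runs more slowly with this option, but the root cause becomes much easier to find.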

Use of uninitialised or unaddressable values in system calls

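Memcheck checks all parameters passed to system calls:

  1. It checks that every direct parameter is initialised.
  2. If a system call needs to read from a buffer provided by the program, Memcheck checks that the entire buffer is addressable and that its contents are initialised.
  3. If a system call needs to write to a buffer supplied by the user, Memcheck checks that the buffer is addressable.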

Illegal frees

Memcheck keeps track of every block allocated with malloc or new, so it knows whether the argument to free or delete is legitimate. The example below frees the same block twice.

C

#include <stdlib.h>

int main()
{
    void *x = malloc(10);
    free(x);
    free(x);    /* second free of the same block */
}

As with illegal read/write errors, Memcheck tries to make sense of the address being freed.

==27728== Invalid free() / delete / delete[] / realloc()
==27728== at 0x4C2B06D: free (vg_replace_malloc.c:540)
==27728== by 0x4006E4: main
==27728== Address 0x5ab1c80 is 0 bytes inside a block of size 10 free'd
==27728== at 0x4C2B06D: free (vg_replace_malloc.c:540)
==27728== by 0x4006D8: main
==27728== Block was alloc'd at
==27728== at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==27728== by 0x4006C8: main

Note that freeing a pointer into the interior of a heap block produces a similar error.

#include <stdlib.h>

int main()
{
    char *x = malloc(10);
    free(x + 1);    /* pointer into the block, not its start */
}

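This time the message reads "is 1 bytes inside ... alloc'd". The block is described as alloc'd rather than free'd, which shows that the problem is not a double free.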

==31870== Invalid free() / delete / delete[] / realloc()
==31870== at 0x4C2B06D: free (vg_replace_malloc.c:540)
==31870== by 0x4006DC: main
==31870== Address 0x5ab1c81 is 1 bytes inside a block of size 10 alloc'd
==31870== at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==31870== by 0x4006C8: main

C++

int main()
{
    int *x = new int[2]{1, 2};
    delete[] x;
    delete[] x;
}

Memcheck reports:
==12625== Invalid free() / delete / delete[] / realloc()
==12625== at 0x4C2BB8F: operator delete[](void*) (vg_replace_malloc.c:651)
==12625== by 0x400705: main
==12625== Address 0x5ab1c80 is 0 bytes inside a block of size 8 free'd
==12625== at 0x4C2BB8F: operator delete[](void*) (vg_replace_malloc.c:651)
==12625== by 0x4006F2: main
==12625== Block was alloc'd at
==12625== at 0x4C2AC38: operator new[](unsigned long) (vg_replace_malloc.c:433)
==12625== by 0x4006C8: main

Passing delete[] a pointer into the middle of the array gives the interior-pointer variant of the error:

int main()
{
    int *x = new int[2]{1, 2};
    delete[] (x + 1);
}

Memcheck reports:
==18227== Invalid free() / delete / delete[] / realloc()
==18227== at 0x4C2BB8F: operator delete[](void*) (vg_replace_malloc.c:651)
==18227== by 0x4006FC: main
==18227== Address 0x5ab1c84 is 4 bytes inside a block of size 8 alloc'd
==18227== at 0x4C2AC38: operator new[](unsigned long) (vg_replace_malloc.c:433)
==18227== by 0x4006C8: main

When a heap block is freed with an inappropriate deallocation function

int main()
{
    int *x = new int[2]{1, 2};
    delete x;    // should be delete[] x
}

Memcheck reports:
==29865== Mismatched free() / delete / delete []
==29865== at 0x4C2B6DF: operator delete(void*, unsigned long) (vg_replace_malloc.c:595)
==29865== by 0x400710: main
==29865== Address 0x5ab1c80 is 0 bytes inside a block of size 8 alloc'd
==29865== at 0x4C2AC38: operator new[](unsigned long) (vg_replace_malloc.c:433)
==29865== by 0x4006E8: main

In C++ the deallocation function must match the way the block was allocated:

  1. If allocated with malloc, calloc, realloc, valloc or memalign, you must deallocate with free.
  2. If allocated with new, you must deallocate with delete.
  3. If allocated with new[], you must deallocate with delete[].

The worst part is that on Linux mixing these up apparently does no harm, yet the same program may crash on another platform such as Solaris.

Overlapping source and destination blocks

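For memcpy, strcpy, strncpy, strcat, and strncat, the blocks pointed to by src and dst must not overlap.

Somewhat surprisingly, the following code does not trigger this error when compiled normally.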

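#include <cstdlib>
#include <cstring>

int main()
{
    int *x = new int[3]{1, 2, 3};
    memcpy(x + 1, x, 2);    // no overlap: x + 1 is sizeof(int) bytes past x
    char *y = static_cast<char *>(malloc(10));
    memset(y, 0, 10);
    memcpy(y + 1, y, 2);    // overlap: dst is 1 byte past src
}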

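The reason is that gcc replaces the memcpy call with its builtin, so Memcheck never sees a call to memcpy. Compiling with -fno-builtin-memcpy disables this, after which Memcheck reports: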

==15974== Source and destination overlap in memcpy(0x5ab1c81, 0x5ab1c80, 2)
==15974== at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==15974== by 0x40075E: main
==15974==

Fishy argument values

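All memory allocation functions take a size argument, which must be non-negative and is normally not excessively large. For example, it is extremely unlikely that a program asks for 2**63 bytes on a 64-bit machine. Such a size usually results from a faulty size calculation, typically a negative number reinterpreted as unsigned, and is called a fishy value. Memcheck checks the size argument of malloc, calloc, realloc, memalign, new, and new[].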

#include <stdlib.h>

int main()
{
    void *x = malloc(-2);
}

Memcheck reports:
==27571== Argument 'size' of function malloc has a fishy (possibly negative) value: -2
==27571== at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==27571== by 0x40067A: main

Note that the compiler also emits a warning for this code:

warning: argument 1 value ‘18446744073709551614’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
void * x = malloc(-2);

Memory leak detection

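Memcheck keeps track of every heap block the program allocates.
With --leak-check enabled, for each block that has not been freed when the program exits, Memcheck checks whether the block can be reached from the root set. The root set consists of:

  1. General-purpose registers.
  2. Initialised, aligned, pointer-sized data words in all accessible memory, including the stacks.

There are two ways to reach a block:

  1. A start-pointer, i.e. a pointer to the start of the block.
  2. An interior-pointer, i.e. a pointer into the middle of the block.

How can an interior-pointer arise?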

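  1. It may originally have been a start-pointer that the program later moved forward, deliberately or not.
    This happens, for example, when the program uses tagged pointers. Because of alignment, the lowest few bits of a pointer are normally 0, so they can be used to store extra information. That information moves the pointer forward.
  2. It may be a random junk value in memory.
  3. [stdstring] It may be a pointer to the char array held inside a std::string.
    For example, some compilers place three fields at the head of a std::string, holding the length, the capacity, and the reference count, and put the actual char array after them. The pointer that is handed out, however, points to the char array. This resembles the SDS implementation in Redis.
  4. [length64] Some code might allocate a block of memory, and use the first 8 bytes to store (block size - 8) as a 64bit number. sqlite3MemMalloc does this.
  5. [newarray] It may be a pointer into an array T[], where T is a C++ class with a user-defined destructor and the array is allocated with new[] and released with delete[].
    In this case some compilers store a magic cookie holding the array length in front of the pointer they return.
  6. [multipleinheritance] It might be a pointer to an inner part of a C++ object using multiple inheritance.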

You can optionally activate heuristics to use during the leak search to detect the interior pointers corresponding to the stdstring, length64, newarray and multipleinheritance cases. If the heuristic detects that an interior pointer corresponds to such a case, the block will be considered as reachable by the interior pointer. In other words, the interior pointer will be treated as if it were a start pointer.

The chart below illustrates the possible leak cases. It uses these abbreviations:

  1. DR: Directly reachable
  2. IR: Indirectly reachable
  3. DL: Directly lost
  4. IL: Indirectly lost
  5. (y)XY: it’s XY if the interior-pointer is a real pointer
  6. (n)XY: it’s XY if the interior-pointer is not a real pointer
  7. (_)XY: it’s XY in either case

Case  Pointer chain            AAA Leak Case   BBB Leak Case
----  -------------            -------------   -------------
(1)   RRR ------------> BBB                    DR
(2)   RRR ---> AAA ---> BBB    DR              IR
(3)   RRR               BBB                    DL
(4)   RRR      AAA ---> BBB    DL              IL
(5)   RRR ------?-----> BBB                    (y)DR, (n)DL
(6)   RRR ---> AAA -?-> BBB    DR              (y)IR, (n)DL
(7)   RRR -?-> AAA ---> BBB    (y)DR, (n)DL    (y)IR, (n)IL
(8)   RRR -?-> AAA -?-> BBB    (y)DR, (n)DL    (y,y)IR, (n,y)IL, (_,n)DL
(9)   RRR      AAA -?-> BBB    DL              (y)IL, (n)DL

Pointer chain legend:
- RRR: a root set node or DR block
- AAA, BBB: heap blocks
- --->: a start-pointer
- -?->: an interior-pointer

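The first four cases are straightforward.
In case 5, BBB is directly reachable if the interior-pointer is a real pointer, and directly lost if it is not.
Case 6 is essentially cases 2 and 5 combined, nothing special.
Case 7 is essentially cases 5 and 1 combined, nothing special.
Case 8 splits into three sub-cases, where -?y-> marks an interior-pointer that is a real pointer and -?n-> one that is not.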

Case  Pointer chain              AAA Leak Case   BBB Leak Case
----  -------------              -------------   -------------
(8)   RRR -?-> AAA -?n-> BBB     (y)DR, (n)DL    DL
(8)   RRR -?y-> AAA -?y-> BBB    DR              IR
(8)   RRR -?n-> AAA -?y-> BBB    DL              IL

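Memcheck does not report these nine cases individually, however. It merges them into the following categories:

  1. Still reachable: cases 1 and 2.
  2. Definitely lost: case 3.
  3. Indirectly lost: cases 4 and 9.
  4. Possibly lost: cases 5 to 8.
    Here a chain of one or more pointers to the block exists, but at least one pointer in the chain is an interior-pointer. That pointer could be nothing more than a random value in memory that happens to point into the block.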

Details of Memcheck’s checking machinery

This section describes how Memcheck works.

Valid-value (V) bits

It is simplest to think of Memcheck implementing a synthetic CPU which is identical to a real CPU, except for one crucial detail. Every bit (literally) of data processed, stored and handled by the real CPU has, in the synthetic CPU, an associated “valid-value” bit, which says whether or not the accompanying bit has a legitimate value. In the discussions which follow, this bit is referred to as the V (valid-value) bit.

Each byte in the system therefore has 8 V bits which follow it wherever it goes. For example, when the CPU loads a word-size item (4 bytes) from memory, it also loads the corresponding 32 V bits from a bitmap which stores the V bits for the process’ entire address space. If the CPU should later write the whole or some part of that value to memory at a different address, the relevant V bits will be stored back in the V-bit bitmap.

In short, each bit in the system has (conceptually) an associated V bit, which follows it around everywhere, even inside the CPU. Yes, all the CPU’s registers (integer, floating point, vector and condition registers) have their own V bit vectors. For this to work, Memcheck uses a great deal of compression to represent the V bits compactly.

Copying values around does not cause Memcheck to check for, or report on, errors. However, when a value is used in a way which might conceivably affect your program’s externally-visible behaviour, the associated V bits are immediately checked. If any of these indicate that the value is undefined (even partially), an error is reported.

Valid-address (A) bits

Combining V bits and A bits

Debugging MPI Parallel Programs with Valgrind

Callgrind

Callgrind is a profiling tool that records the call history among the functions of a program as a call graph.

Cachegrind

Cachegrind profiles how a program uses the CPU caches.

Helgrind

Helgrind detects synchronisation errors, such as data races, in multithreaded programs.

Massif

Massif profiles a program's heap usage and, optionally, its stack usage.

Reference

  1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.errormsgs