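Since my employer's major projects tikv, tidb, and tiflash jump back and forth between Rust, Go, and C++, learning Rust has been put on my agenda.

This article is nicknamed Rust: ACPPPP. Its main purpose is to discuss how Rust resembles and differs from C++ in certain respects, not to introduce the language. It is therefore organized by topic, with a lot of interleaving; for example, structs come up directly in the discussion of ownership.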

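rustup: the toolchain manager

Install the nightly toolchain

rustup toolchain install nightly

Then activate it

rustup default nightly

Override

The toolchain is chosen in the following order of precedence:

On the command line, e.g. cargo +beta
The RUSTUP_TOOLCHAIN environment variable
rustup override set, which overrides the setting for the current directory and its subdirectories
rustup show and rustup override unset display and remove an override
rust-toolchain.toml or rust-toolchain
The default toolchain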

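Cargo: the package manager

workspace, crate, and mod

C++ has no real package management. To use something, you declare it in code and make sure the linker can see the definition. And because of templates, a lot of code lives in headers, so a plain include is enough.

Accessing a mod

Within a crate

src/main.rs and src/lib.rs are called the crate roots.

A crate contains several mods, and the members of each mod are listed in the mod.rs of the corresponding mod directory.
For example, the declaration below looks for hello.rs in the current directory, or for mod.rs in the hello directory.

// mod.rs
mod hello;

The location of a mod can be specified with #[path = "foo.rs"]. This form can also be used inline inside a function. See the linked code for concrete usage.

Across crates

Accessing another crate requires the crate name or alias defined in Cargo.toml.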

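rustc and crate

rustc accepts a single source file and produces a single crate.

rustc hello.rs --crate-type=lib

workspace

Workspaces cannot be nested. So if there are two Cargo projects and project A depends on project B, the better layout is to place the two side by side and declare the dependency in dependencies.

virtual workspace

Internal organization of a crate

mod.rs
A common.rs holds shared dependencies; mod.rs notes that it serves only this mod
One or more xxx_engine.rs files define the main functionality the mod exposes
In the mod, these are declared with pub mod and pub use.
No pub use.
One or more yyy_impls.rs files assist the implementation of xxx_engine.rs
They may need to use some items from xxx_engine.rs.
Conversely, xxx_engine.rs will use yyy_impls::*.
No pub use.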

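Compiling and linking

Debug information

The debug-info level can be set with -C debuginfo, where 0 (false), 1, and 2 (true) mean none, line information only, and full information respectively. With 0, line numbers will most likely be missing for some of the code.
It can also be changed through a [profile] section in Cargo.toml.

Conditional compilation

features

Features support conditional compilation and optional dependencies.
At build time, a feature is enabled with --features.
For example, in the Cargo.toml below, webp is a feature that enables no other feature, while the ico feature enables two others, bmp and png.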

[features]
# Defines a feature named `webp` that does not enable any other features.
webp = []
ico = ["bmp", "png"]

In code, either of the two forms below makes code take effect only when webp is enabled,

// 1
#[cfg(feature = "webp")]
// 2
if cfg!(feature = "webp") {

}
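
The difference between the two forms can be sketched as follows. This is a minimal illustration, not code from any real crate: decoder_name and webp_enabled are hypothetical helpers, and the feature name webp is simply reused from the Cargo.toml above. The attribute form removes an item from the build entirely, while cfg! only produces a boolean, so both branches of the if are always compiled.

```rust
// Selected at compile time: exactly one of these two definitions exists.
#[cfg(feature = "webp")]
fn decoder_name() -> &'static str {
    "webp decoder"
}

#[cfg(not(feature = "webp"))]
fn decoder_name() -> &'static str {
    "no webp decoder"
}

// cfg! expands to a plain `true` or `false`, so both branches of an `if`
// that uses it are always type-checked.
fn webp_enabled() -> bool {
    cfg!(feature = "webp")
}

fn main() {
    if webp_enabled() {
        println!("built with {}", decoder_name());
    } else {
        println!("built without webp: {}", decoder_name());
    }
}
```

Because the attribute form deletes code, it is the one to use when the disabled branch would not even compile, for instance because it refers to an optional dependency.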

By default all features are disabled, but a feature can be enabled by default by adding it to default.

default = ["webp"]

At build time, --no-default-features disables the default features.

dependency features

Features can also be specified when declaring a dependency.
For example, the configuration below disables the default features of flate2 but additionally enables its zlib feature.

[dependencies]
flate2 = { version = "1.0.3", default-features = false, features = ["zlib"] }

optional dependency

The following declares that gif is not pulled in as a dependency by default

[dependencies]
gif = { version = "0.11.1", optional = true }

It implicitly defines the following feature

[features]
gif = ["dep:gif"]

cfg(feature = "gif") tests whether the dependency is enabled, and --features gif enables it explicitly.

In the configuration below, the avif feature enables the ravif and rgb dependencies. Because dep:ravif and dep:rgb are written explicitly, Cargo does not implicitly generate features named ravif and rgb.

[dependencies]
ravif = { version = "0.6.3", optional = true }
rgb = { version = "0.8.25", optional = true }

[features]
avif = ["dep:ravif", "dep:rgb"]

Propagating features

Suppose the default-member of a Cargo project is x, and x has a feature f. With edition 2018, passing --features f does not work; the feature has to be forwarded at the workspace level, written as f = ["x/f"].

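Reading Cargo.toml

[dependencies]
Third-party packages the crate depends on
[dev-dependencies]
Third-party packages needed only by tests/examples/benchmarks
[features]
Supports conditional compilation and optional dependencies
[lib]

[[test]]
Double brackets denote an array of tables, so it can be written like this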

[[test]]
path = ""
name = ""
[[test]]
path = ""
name = ""

[package]
[workspace]
In contrast to a package, a workspace is a set of packages that share the same Cargo.lock and output directory.
It contains a members array.
[profile]
[patch.crates-io]

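Reading Cargo.lock

Cargo.lock records the exact version each crate resolves to. For example, the entry below depends on version 0.1.0 of the azure_core library, but which rev does that version correspond to? On GitHub, master is already at 0.2.1, so we are clearly not using master.

azure_core = { version = "0.1.0", git = "https://github.com/Azure/azure-sdk-for-rust"}

Cargo.lock then contains an entry like the one below, which pins the exact git commit for 0.1.0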

[[package]]
name = "azure_core"
version = "0.1.0"
source = "git+https://github.com/Azure/azure-sdk-for-rust#b3c53f4cec4a6b541e49388b51e696dc892f18a3"
dependencies = [
 "async-trait",
 ...
]

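A workspace has a single Cargo.lock, in its root directory. This ensures that all crates use exactly the same versions of their dependencies.
If the rand crate is added to both Cargo.toml and add-one/Cargo.toml, Cargo resolves both to the same version and records it in the single Cargo.lock.

Common Cargo problems

An error such as failed to authenticate when downloading repository usually appears when interacting with GitHub. The following resolves it

eval `ssh-agent -s`
ssh-add
cargo build

An error such as Blocking waiting for file lock on the registry index can usually be resolved by deleting the lock file: rm $CARGO_HOME/.package-cache.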

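Ownership and lifetimes

To check whether you have a basic grasp of Rust ownership, try implementing a doubly linked list yourself.

Bindings and mutability

let and let mut

let x = y binds (assigns) the value y to the variable x. Because a let binding is immutable, x cannot be modified, that is, assigned to again. To be able to re-bind or re-assign, the form let mut x = y is required.

For a struct, if the binding is immutable then all of its fields are immutable as well. In C++ a class member can be declared mutable so that it can be modified even in a const object, but Rust does not allow this at the field level.

This also gives rise to the two kinds of reference, &mut and &. A value bound with let mut can be borrowed mutably or immutably, but a value bound with let can only be borrowed immutably.

Pattern Matching

The statements below all try to define an a of type &mut {integer}, but the third does not compile. The reason is that it triggers Rust's pattern matching: the pattern &mut a is matched against the integer 5.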

let ref mut a = 5;
let a = &mut 5;
let &mut a = 5;
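// This statement is guaranteed not to compile, but the error reveals the actual type of a, so it is a common trick for getting the compiler's type inference for free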
let _: () = a;

Pattern matching on enum types such as Option and Result is familiar. Here are some less familiar uses; for example, a struct can be pattern matched, somewhat like structured bindings in C++.

struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 0, y: 7 };
    let Point { x: a, y: b } = p;
    assert_eq!(0, a);
    assert_eq!(7, b);
}

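With @, a pattern can state the expected value and at the same time bind the matched value to a local variable, somewhat like the usage in Haskell.

// Copy lets `p` be matched by value more than once below
#[derive(Debug, Clone, Copy)]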
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point {x: 1, y: 2};
    match p {
        Point { x: xv @ 1, y: yv @ 2} => println!("matched x {:?} y {:?}", xv, yv),
        _ => println!("no match"),
    }

    match p {
        pt @ Point { .. } => println!("matched pt {:?}", pt),
        _ => println!("no match"),
    }

    // Bindings after an `@` were unstable before Rust 1.56
    // https://github.com/rust-lang/rust/issues/65490
    match p {
        pt @ Point { x, y } => println!("matched pt {:?} x {:?} y {:?}", pt, x, y),
        _ => println!("no match"),
    }
}

How can a pattern match borrow instead of move? In the code below, g is an owned value rather than a mutable borrow. The solution is to match on v.intention_mut() directly.

let intention = v.intention_mut();
match intention {
    vehicle::Intention::Die => {
    },
    vehicle::Intention::Goto(g) => {
    },
}
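
A self-contained sketch of the borrowing variant follows. Vehicle, Intention, and redirect are hypothetical stand-ins for the types in the snippet above, assumed only for illustration. When the scrutinee is a &mut Intention, the binding g becomes a &mut String, so the payload can be edited in place without moving it out.

```rust
#[derive(Debug)]
enum Intention {
    Die,
    Goto(String),
}

struct Vehicle {
    intention: Intention,
}

impl Vehicle {
    fn intention_mut(&mut self) -> &mut Intention {
        &mut self.intention
    }
}

// Rewrites the target of a Goto in place; Die is left untouched.
fn redirect(v: &mut Vehicle, target: &str) {
    // Matching on a `&mut Intention` makes `g` a `&mut String`,
    // so nothing is moved out of the vehicle.
    match v.intention_mut() {
        Intention::Die => {}
        Intention::Goto(g) => {
            g.clear();
            g.push_str(target);
        }
    }
}

fn main() {
    let mut v = Vehicle { intention: Intention::Goto("home".to_owned()) };
    redirect(&mut v, "garage");
    println!("{:?}", v.intention);
}
```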

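Variable shadow

Rust has the following practice, called variable shadowing. A question naturally arises: given let mut, why is this needed at all?

let x = 1;
let x = 2;

Shadowing means that the lifetime of the old variable is unchanged; it merely can no longer be reached by its old name. With let mut, in contrast, the old value is dropped as soon as a new one is assigned. As a further example, what does the following program print?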

struct S {
    x: i32
}

impl Drop for S{
    fn drop(&mut self) {
        println!("drop {}", self.x)
    }
}

fn main() {
    {
        let a = S { x: 1 };
        let a = S { x: 2 };
    }
    {
        let mut a = S { x: 1 };
        a = S { x: 2 };
    }
}

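The output is

drop 2
drop 1
drop 1
drop 2

Why? In the first case the name a is re-bound, but S { x: 1 } is only shadowed and is not dropped immediately. In the second case, S { x: 1 } is dropped at the moment of re-assignment.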

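Move and borrow

Every way of using a value falls into one of three kinds: copy, move, and reference (or pointer):

Copying wastes space and time.
Moving changes the address of many variables, which causes a lot of trouble for FFI: some values have to be allocated at a fixed heap address with Box/Pin and handed out as raw pointers.
References can be NULL, and avoiding NULL requires mechanisms such as lifetime annotations. Moreover, even with move semantics, multiple threads can still reach the same value through references, causing concurrency problems.

A move in Rust may come with a change of memory address. Clearly, when an object is moved by a call from function A into function B, it ends up in a different stack frame and its address changes, so this must be guarded against. A move in C++ is more like a cicada shedding its shell: the contents of the old object are taken out and used to build the new one.

Move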

// Can compile
let x = 1;
let y = x;
println!("Result: {}", x);

// Can not compile
let vx = vec![1];
let vy = vx;
println!("Result: {}", vx[0]);

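Reference

How do references and borrowing relate? Creating a reference is called borrowing. While a mutable borrow is live, the owned value cannot be accessed; doing so produces a use of borrowed xxx error.

In C++ a reference must be bound when it is defined, and neither a mutable reference T& nor an immutable reference const T& can be re-bound. This is painful, and std::reference_wrapper cannot be used everywhere either. None of this is a problem in Rust; for example, the code below works fine.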

let mut a: &i32;
a = &1;
a = &2;

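Either one mutable borrow, or many immutable borrows

Consider the conditions for a race condition:

Several pointers access the same data
At least one pointer is used to modify the data
There is no synchronization

Rust's answer is that at any time there may be either one mutable borrow or several immutable borrows. But what if the owner writes while a mutable reference is also writing, or while an immutable reference is reading?
For calls to methods on an object, this situation cannot arise. As shown below, methods take &self or &mut self, so x.push_str itself needs a mutable borrow of x and is rejected while y is still in use.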

let mut x = "123".to_string();
let y = &mut x;

x.push_str("456");

println!("y = {}", y);

What about primitive types? Running the code below produces the error "use of borrowed aaa", which means the owned value cannot be accessed while it is borrowed; it has been lent out, after all.

let mut aaa: i32 = 1;
let bbb = &mut aaa;
aaa += 1;
println!("bbb {:?}", *bbb);

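Note that the code below gives the false impression that a borrow and the owned value can be used at the same time, but that is not the case. The borrow of aaa by change_aaa ends when the call returns, so no borrow is outstanding by the time aaa = 2 runs.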

fn change_aaa(bbb: &mut i32){
    *bbb = 2;
}

fn main() {
    let mut aaa: i32 = 1;
    // TODO: can this be made to run asynchronously somehow?
    change_aaa(&mut aaa);
    aaa = 2;
}

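demo

Destruction by moving

The std::mem::drop function destroys an object of type T, and it is an application of move semantics: calling drop moves ownership into _x. Of course, if the type implements Copy, drop has no effect on the original.

pub fn drop<T>(_x: T) { }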
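
The effect of moving into drop, and the Copy exception, can be seen in a small sketch. Noisy and copy_survives_drop are hypothetical names introduced only for this illustration.

```rust
struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

// Returns the value that is still readable after drop(i).
fn copy_survives_drop() -> i32 {
    let i = 42_i32;
    drop(i); // i32 is Copy: only a copy is moved into drop
    i
}

fn main() {
    let n = Noisy("n");
    drop(n); // n is moved into drop and destroyed when drop returns
    // println!("{}", n.0); // would not compile: use of moved value
    println!("{}", copy_survives_drop());
}
```

Recent compilers warn about calling drop on a Copy value, precisely because the call does nothing useful.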

Borrowing demo

When a function takes a reference as a parameter, the caller has to borrow explicitly, unlike in C++.

fn fn_takes_ref(i: &i32) {
    println!("{}", i);
}
// Error
fn_takes_ref(1);
// Ok
fn_takes_ref(&1);

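Clone and Copy

Copy

Rust has a special trait called std::marker::Copy. It has no methods, so it is essentially a marker for the compiler. If a type implements Copy, assignment uses copy semantics rather than move semantics.

Rust does not allow a type to implement Copy if the type itself, or any part of it, implements Drop. That sounds odd at first, but it makes sense once you know that Copy is implemented as a bitwise copy. So Copy can be roughly understood as applying only to what C++ calls trivial objects.

Clone

What if you want to copy a non-trivial object? One way is to implement the Clone trait, which can be thought of as the C++ copy constructor.

If all that is needed is a deep copy, that amounts to recursively calling .clone() on every field, which is equivalent to the code below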

#[derive(Clone)]
struct S {
}

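Note, however, that Copy requires Clone, and once a type is Copy its Clone is expected to mean the same bitwise memcpy. So we usually write #[derive(Copy,Clone)] to have both generated automatically.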
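
A short sketch of the two derives side by side; Point, Polyline, and clone_is_independent are hypothetical names assumed for illustration. Point is trivially copyable, while Polyline owns heap data and can therefore only be Clone.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Owns a String and a Vec, so it can derive Clone but not Copy.
#[derive(Debug, Clone, PartialEq)]
struct Polyline {
    name: String,
    points: Vec<Point>,
}

// Returns the point counts of the original and of the modified clone.
fn clone_is_independent() -> (usize, usize) {
    let p = Point { x: 1, y: 2 };
    let q = p; // bitwise copy; p stays valid
    let a = Polyline { name: "a".to_owned(), points: vec![p, q] };
    let mut b = a.clone(); // deep copy: clones the String and the Vec
    b.name.push('2');
    b.points.push(Point { x: 3, y: 4 });
    (a.points.len(), b.points.len())
}

fn main() {
    println!("{:?}", clone_is_independent());
}
```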

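Ownership-related facilities

This part covers the basic traits Borrow (.borrow()), BorrowMut (.borrow_mut()), AsRef (.as_ref()), AsMut (.as_mut()), and ToOwned (.to_owned()).

as_ref/as_mut and borrowing

When should as_ref/as_mut be used? As the code below shows, to borrow the object held by an Option container we cannot unwrap first and then take &mut; we should call as_mut first and then unwrap.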

struct S {
    a: i32,
    b: i32,
}

fn main() {
    let mut a: Option<S> = Some(S {a: 1, b: 2});
    let b = &mut a;
    // Error
    let c = &mut b.unwrap();
    // Ok
    let c = b.as_mut().unwrap();
}

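What is the difference between Borrow and AsRef?

The definitions of the AsRef and Borrow traits are, if not very similar, then outright identical. So why are there two of them?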

pub trait AsRef<T: ?Sized> {
    fn as_ref(&self) -> &T;
}

pub trait Borrow<Borrowed: ?Sized> {
    fn borrow(&self) -> &Borrowed;
}

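This question is evidently common. The usual answer is that Borrow is stricter and is meant for borrowing, while AsRef supports a wider range of types and is meant for type conversion. Frankly, that still leaves one in a fog. An article works through an example, summarized as follows:

HashMap stores (K, V) pairs and can look up the corresponding &mut V from a supplied &K. Because entries are written by K and read by &K, the two must behave consistently.
[Q] Why does HashMap look up by &K?
Meanwhile, we can implement a CaseInsensitiveString struct, which can be seen as a String that compares case-insensitively.
Now the question: we have impl Borrow<str> for String, so can we implement impl Borrow<str> for CaseInsensitiveString?
The answer is no, because it would break HashMap's consistency. For example, two strings that differ only in case compare equal as s: CaseInsensitiveString but unequal as s.borrow().
But is that all? Can a CaseInsensitiveString not be converted to &str? Of course it can, and that is what AsRef is for.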
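
The argument above can be sketched in code. This CaseInsensitiveString is a hypothetical minimal version written for this illustration (ASCII-only case folding is an assumption), not the one from the referenced article. It implements Hash and Eq case-insensitively and offers AsRef<str>, but deliberately not Borrow<str>.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug)]
struct CaseInsensitiveString(String);

impl PartialEq for CaseInsensitiveString {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}
impl Eq for CaseInsensitiveString {}

impl Hash for CaseInsensitiveString {
    // Hash the lowercased bytes so that equal keys hash equally.
    fn hash<H: Hasher>(&self, state: &mut H) {
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
    }
}

// Conversion is fine. Borrow<str> is deliberately NOT implemented,
// because str hashes and compares case-sensitively.
impl AsRef<str> for CaseInsensitiveString {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

// Looks up a key that differs from the stored one only in case.
fn lookup_ignoring_case() -> Option<i32> {
    let mut map: HashMap<CaseInsensitiveString, i32> = HashMap::new();
    map.insert(CaseInsensitiveString("Key".to_owned()), 1);
    map.get(&CaseInsensitiveString("kEY".to_owned())).copied()
}

fn main() {
    // String: Borrow<str>, so a HashMap<String, _> can be queried with &str.
    let mut by_name: HashMap<String, i32> = HashMap::new();
    by_name.insert("Key".to_owned(), 1);
    assert_eq!(by_name.get("Key"), Some(&1));

    assert_eq!(lookup_ignoring_case(), Some(1));
    let s = CaseInsensitiveString("Key".to_owned());
    let view: &str = s.as_ref();
    println!("{}", view);
}
```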

cannot infer type for type parameter `Borrowed` declared on the trait `BorrowMut`

let a = Box::new(RefCell::new(1));
(*a.borrow_mut().get_mut()) = 2;
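
One likely cause of this error, assumed here since the surrounding imports are not shown, is that std::borrow::BorrowMut is in scope, so a.borrow_mut() on the Box resolves to the trait method instead of RefCell::borrow_mut. A minimal sketch of a way around it is to dereference the Box first, so that the receiver is the RefCell itself. write_through_box is a hypothetical helper.

```rust
use std::cell::RefCell;

// Writes `v` into a boxed RefCell and reads it back.
fn write_through_box(v: i32) -> i32 {
    let a = Box::new(RefCell::new(1));
    // (*a) is the RefCell, so this is RefCell::borrow_mut and there is
    // no trait type parameter left to infer.
    *(*a).borrow_mut() = v;
    let out = *(*a).borrow();
    out
}

fn main() {
    println!("{}", write_through_box(2));
}
```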

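Why can't Clone be called on `&mut`?

The implementations below show that the standard library provides no Clone for &mut, because that would produce two &mut pointing to the same location.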

impl<T: ?Sized> Clone for *const T {
    fn clone(&self) -> Self {
        *self
    }
}

impl<T: ?Sized> Clone for *mut T {
    fn clone(&self) -> Self {
        *self
    }
}

impl<T: ?Sized> Clone for &T {
    fn clone(&self) -> Self {
        *self
    }
}

impl<T: ?Sized> !Clone for &mut T {}

In the code below, cloning a MyStruct2 would produce several &'a mut MyStruct pointing to the same address

#[derive(Clone)]
struct MyStruct {
    val: usize,
}

#[derive(Clone)]
struct MyStruct2<'a> {
    struct_reference: &'a mut MyStruct
}

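Note, however, that the target of clone is T rather than &T. The example above fails because cloning MyStruct2 recursively requires cloning a &'a mut MyStruct. Calling clone directly on a &mut T causes no compile error, because the call dereferences to T's clone, as shown below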

#[derive(Clone)]
struct DoClone{
    x: i32
}
let mut dc = DoClone{x:1};
let mdc = &mut dc;
mdc.clone();

What is the difference between ToOwned and Clone?

pub trait ToOwned {
    type Owned: Borrow<Self>;
    fn to_owned(&self) -> Self::Owned;

    fn clone_into(&self, target: &mut Self::Owned) { ... }
}

The following example is a classic. "123" is a &str; calling clone on it still yields a &str, whereas calling to_owned yields a String.

let become_str = "123".clone();
let become_String = "123".to_owned();

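Exceptions and error handling

Option

How should an Option be handled?

unwrap plus if
match, handling Some(e) and None
unwrap_or
The map and and_then combinators
?
Inside a function that returns Option, ? returns None early. Obtaining a Result<T, NoneError> this way relied on an unstable feature; on stable Rust, convert with ok_or first.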
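
The options listed above can be sketched side by side. first_char_upper and describe are hypothetical helpers written for this illustration.

```rust
// `?` returns None early when `s` is None or the string is empty.
fn first_char_upper(s: Option<&str>) -> Option<char> {
    let c = s?.chars().next()?;
    Some(c.to_ascii_uppercase())
}

// match handles both variants explicitly.
fn describe(age: Option<u32>) -> String {
    match age {
        Some(a) => format!("{} years", a),
        None => "unknown".to_owned(),
    }
}

fn main() {
    let port: Option<u16> = None;
    assert_eq!(port.unwrap_or(8080), 8080);
    assert_eq!(Some(2).map(|x| x * 10), Some(20));
    assert_eq!(Some("42").and_then(|s| s.parse::<i32>().ok()), Some(42));
    assert_eq!(first_char_upper(Some("rust")), Some('R'));
    assert_eq!(first_char_upper(None), None);
    // Bridging from Option to Result on stable Rust.
    let r: Result<u16, &str> = port.ok_or("no port configured");
    assert!(r.is_err());
    println!("{}", describe(Some(3)));
}
```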

Result

How should a Result be handled?

try! (the legacy macro, superseded by ?)
?
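
A sketch of ? on Result, including the error conversion it performs through From. ConfigError and parse_port are hypothetical names assumed for this illustration.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum ConfigError {
    Missing,
    BadNumber(ParseIntError),
}

impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    // ok_or turns the Option into a Result so that `?` can be applied.
    let text = raw.ok_or(ConfigError::Missing)?;
    // `?` converts the ParseIntError into ConfigError through From.
    let port = text.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    println!("{:?}", parse_port(Some(" 8080 ")));
    println!("{:?}", parse_port(Some("eighty")));
    println!("{:?}", parse_port(None));
}
```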

Common combinators

filter_map
If F returns None, the element is skipped
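
For instance, filter_map can parse a list of strings and silently skip the ones that fail. parse_all is a hypothetical helper for this sketch.

```rust
// Keeps only the words that parse as i32; the rest are skipped.
fn parse_all(words: &[&str]) -> Vec<i32> {
    words
        .iter()
        .filter_map(|w| w.parse::<i32>().ok())
        .collect()
}

fn main() {
    println!("{:?}", parse_all(&["1", "x", "3"]));
}
```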

panic

macro_rules! panic {
    ($($arg:tt)*) => { ... };
}

A panic makes the current thread panic.
The unwrap methods of Option and Result, for instance, panic if the value is None or Err.

Panic 和 thread

let thread_join_handle = std::thread::spawn(move || {
    panic!("Pan");
});
let res = thread_join_handle.join();
assert!(res.is_err());

As we can see, if a thread panics, join returns an Err.

thread '<unnamed>' panicked at 'Pan', src\mod_thread\mod.rs:4:9
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/f83e0266cf7aaa4b41505c49a5fd9c2363166522\library\std\src/panicking.rs:588:12
   1: learn::mod_thread::test_join_panic::{{closure}}
             at .\src\mod_thread\mod.rs:4:9
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Process finished with exit code 0

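Catching a panic

A panic can be handled either by aborting directly or by unwinding. A panic that unwinds can be caught with std::panic::catch_unwind. In Rust 1.0 a panic could only be caught by the parent thread, so catching one meant starting a new thread for the code that might panic; catch_unwind eases this problem.

catch_unwind is typically used at FFI boundaries to catch every panic, but it gives no access to information about the panic such as the backtrace. For that, panic::set_hook can be used.

In other words, panic::set_hook is the other way to intercept a panic.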
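
A minimal sketch of the two mechanisms together; run_guarded is a hypothetical helper. catch_unwind turns the unwinding panic into an Err carrying the payload, while the hook runs at the panic site and can see the location.

```rust
use std::panic;

// Runs a computation that may panic and converts the panic into Err.
fn run_guarded(divisor: i32) -> Result<i32, String> {
    let result = panic::catch_unwind(|| {
        if divisor == 0 {
            panic!("division by zero");
        }
        100 / divisor
    });
    result.map_err(|payload| {
        // The payload is a Box<dyn Any + Send>; try the two usual types.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            (*msg).to_owned()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_owned()
        }
    })
}

fn main() {
    // The hook is called when the panic starts, before unwinding.
    panic::set_hook(Box::new(|info| {
        eprintln!("hook saw: {}", info);
    }));
    println!("{:?}", run_guarded(4));
    println!("{:?}", run_guarded(0));
}
```

This only works when panics unwind; with panic = "abort" the process terminates and catch_unwind never returns an Err.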

Exception Safety

Consider the code below, where clone may panic inside the unsafe block. If it does, set_len has already been called, so uninitialized data may be read afterwards.

impl<T: Clone> Vec<T> {
    fn push_all(&mut self, to_push: &[T]) {
        self.reserve(to_push.len());
        unsafe {
            // can't overflow because we just reserved this
            self.set_len(self.len() + to_push.len());

            for (i, x) in to_push.iter().enumerate() {
                self.ptr().add(i).write(x.clone());
            }
        }
    }
}

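double panic

To avoid a double panic, safe_panic can be used. A double panic can arise as follows:

If a thread is already panicking and the Drop of some object that runs afterwards also contains panic logic, the result may be a panic while panicking
For example, the paths Drop->stop->cancel_applying_snap->check_applying_snap and check_snap_status->check_applying_snap here.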

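Pointers and smart pointers

C++ allocates objects on the heap to get past the limits of stack allocation; Rust has a further reason to create objects on the heap, namely to avoid moves. C++ does no resource management for pointers, and the standard library only later added a few smart pointers piecemeal, whereas Rust aims to be more thorough.

Rust has the following pointers:

*mut T/*const T
These are C-style raw pointers
Box
Pin
Rc
Arc
Atomic reference counting
Ref
RefCell
Cow
Cell
NonNull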

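raw pointer

The Rust book describes how raw pointers differ from Rust's other pointers:

They are not guaranteed to point to valid memory, and they may be null.
Unlike Box, they are not cleaned up automatically; memory has to be managed by hand.
Box::from_raw can take ownership of a raw pointer and manage the memory automatically.
The pointee can also be destroyed with ptr::drop_in_place. This handles trait objects, where we cannot read the object out with ptr::read and then destroy it. drop_in_place is implemented as a compiler built-in.
They are POD, which means they carry no ownership. Unlike with Box, the Rust compiler does not check for situations such as use-after-free.
Unlike &, a raw pointer has no lifetime, so the compiler cannot detect a dangling pointer.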
no guarantees about aliasing or mutability other than mutation not being allowed directly through a *const T。

Box

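trait Deref/DerefMut

Deref is the trait behind the dereference operator *, as in *v. When *v is used as a value:

For a type that implements Copy, it yields a copy
For a type that does not implement Copy, it tries to take ownership, that is, to move the value out (the compiler only allows this for Box)

As shown below, if a smart pointer type U such as Box implements U: Deref<Target=T>, then Deref can obtain a &T from it. In the implementation, a &Box<T> is dereferenced twice to reach T, and then &T is returned. More abstractly, implementing Deref turns &U into &T; put differently, *x has the effect of *Deref::deref(&x). The benefit is that all the odd references to smart pointers are converted into &T.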

impl<T: ?Sized> Deref for &T {
    type Target = T;
    fn deref(&self) -> &T {
        *self
    }
}

impl<T: ?Sized, A: Allocator> Deref for Box<T, A> {
    type Target = T;
    fn deref(&self) -> &T {
        &**self
    }
}

DerefMut, shown below, lets us obtain a &mut T from a smart pointer. It follows that a smart pointer that does not implement DerefMut is effectively immutable.

pub trait DerefMut: Deref {
    /// Mutably dereferences the value.
    #[stable(feature = "rust1", since = "1.0.0")]
    fn deref_mut(&mut self) -> &mut Self::Target;
}
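
To make the two traits concrete, here is a sketch of a toy smart pointer. MyBox and shout are hypothetical; unlike the real Box, MyBox stores its value inline rather than on the heap, which is enough to show how Deref and DerefMut are used.

```rust
use std::ops::{Deref, DerefMut};

struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> DerefMut for MyBox<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.0
    }
}

fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let mut b = MyBox(String::from("hi"));
    // Deref coercion: &MyBox<String> -> &String -> &str
    assert_eq!(shout(&b), "HI");
    // Method calls go through DerefMut when they need &mut String.
    b.push('!');
    // *b is a place of type String, reached through deref_mut.
    *b = String::from("bye");
    assert_eq!(b.len(), 3);
}
```

Removing the DerefMut impl makes the last three statements fail to compile, which is the sense in which a pointer without DerefMut is immutable.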

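Rc/Arc

As the figure shows, of the three lists, b and c share ownership of a. This can be described with Rc.

Note that although Rc::clone(&a) is equivalent to a.clone(), Rc::clone is recommended because it makes explicit that only the reference count is incremented.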

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};
use std::rc::Rc;

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    let b = Cons(3, Rc::clone(&a));
    let c = Cons(4, Rc::clone(&a));
}

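Read-only sharing and &mut

Note that Rc only allows read-only sharing among its owners. If every owner could modify the value, a data race would be possible. In other words, Rc/Arc do not implement AsMut or DerefMut, which rules out mutable borrows. In effect, the immutable-borrow rule for Rc is checked at compile time.

For example, holding an FnMut in an Rc leads to the error "cannot borrow data in an Rc as mutable"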

let mut a = 10;
let r = Rc::new(|| a += 1);
r();

Even so, Rc offers a way to get a &mut T through Rc::get_mut. At run time it uses Rc::is_unique to check whether this is the only reference, and returns Some(&mut T) or None accordingly.
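
A small sketch of that run-time check; try_increment is a hypothetical helper. get_mut succeeds only while the Rc is the sole owner.

```rust
use std::rc::Rc;

// Increments the value if `rc` is the only owner; reports whether it did.
fn try_increment(rc: &mut Rc<i32>) -> bool {
    match Rc::get_mut(rc) {
        Some(v) => {
            *v += 1;
            true
        }
        None => false,
    }
}

fn main() {
    let mut a = Rc::new(5);
    assert!(try_increment(&mut a)); // sole owner: allowed
    let b = Rc::clone(&a);
    assert!(!try_increment(&mut a)); // shared with b: refused
    drop(b);
    assert!(try_increment(&mut a)); // unique again
    assert_eq!(*a, 7);
}
```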

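Interior mutability

As is well known, Rust requires that an object has either:

Several immutable references (aliasing)
One mutable reference (mutability)

Cell and RefCell make it possible to have "several mutable references". They do so by allowing the object to be mutated through a &.

RefCell

RefCell is a wrapper used much like Box, but unlike references and Box it checks borrows at run time. Specifically, RefCell checks at run time that:

At any moment there is only one &mut or any number of &
The object the reference points to exists

As one would expect, the implementation of RefCell must contain unsafe blocks in order to bypass the compile-time mutable/immutable borrow check and delay it to run time. It still requires that at any time there are either several immutable borrows or one mutable borrow. If the run-time check fails, it panics, as shown below.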

// panic: already borrowed: BorrowMutError
let y1 = b.borrow_mut();
let y2 = b.borrow_mut();

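A RefCell can modify its inner value while itself being immutable; this is interior mutability. It can be likened to a mutable member inside a const object in C++. RefCell is thus useful in similar situations, such as state machines that need to store intermediate state, mockers, and so on.

The & and &mut borrows of a RefCell correspond to the .borrow() and .borrow_mut() methods respectively.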
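
The run-time rule can be observed without panicking by using the try_ variants. This is a minimal sketch; shared_then_mutable is a hypothetical helper.

```rust
use std::cell::RefCell;

// Returns whether a mutable borrow was refused while shared borrows were
// alive, and the length of the vector after a later successful push.
fn shared_then_mutable() -> (bool, usize) {
    let cell = RefCell::new(vec![1, 2]);
    let refused;
    {
        let r1 = cell.borrow();
        let r2 = cell.borrow(); // any number of shared borrows is fine
        refused = cell.try_borrow_mut().is_err(); // but no mutable one
        assert_eq!(r1.len() + r2.len(), 4);
    } // r1 and r2 are dropped here, so the borrow count is zero again
    cell.borrow_mut().push(3);
    let len = cell.borrow().len();
    (refused, len)
}

fn main() {
    println!("{:?}", shared_then_mutable());
}
```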

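Is RefCell Send/Sync? That is covered in the Sync/Send chapter.

borrow_mut and get_mut of RefCell

As described above, a RefCell is accessed through the borrow family of methods, but there is also a get_mut method. What is it for?
According to the documentation, this method obtains a &mut T from the RefCell directly at compile time. As shown below, it follows the compile-time checks: two live &mut T are rejected by the compiler instead of panicking at run time, which feels a bit like not using it as a RefCell at all.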

// OK
let x1 = b.get_mut();
let x2 = b.get_mut();
// Compile Error
x1.store(false, std::sync::atomic::Ordering::SeqCst);
x2.store(true, std::sync::atomic::Ordering::SeqCst);

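Mixing the two is also a compile-time error, and the message is rather interesting: it says that b.get_mut() is a mutable borrow while b.borrow_mut() is an immutable borrow. Why is it not a conflict between two mutable borrows? The reason is simple: borrow_mut takes &self, because interior mutability is the whole point of RefCell, so as far as Rust is concerned it is an immutable borrow.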

let x1 = b.get_mut();
let y1 = b.borrow_mut();
x1.store(false, std::sync::atomic::Ordering::SeqCst);
y1.store(false, std::sync::atomic::Ordering::SeqCst);

Rc+RefCell

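use std::cell::RefCell;
use std::rc::Rc;

// List definition assumed here: the Rc example's List, with each node
// holding an Rc<RefCell<i32>> instead of an i32.
#[derive(Debug)]
enum List {
    Cons(Rc<RefCell<i32>>, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn make_value(i: i32) -> Rc<RefCell<i32>> {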
    Rc::new(RefCell::new(i))
}

fn main() {
    let value = make_value(5);
    let a = Rc::new(Cons(Rc::clone(&value), Rc::new(Nil)));
    let b = Cons(make_value(1), Rc::clone(&a));
    let c = Cons(make_value(2), Rc::clone(&a));
    *value.borrow_mut() += 10;
    let z = value.borrow_mut();
    // *z = 11; // error
    println!("a after = {:?}", a);
    println!("b after = {:?}", b);
    println!("c after = {:?}", c);
}

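Cell

Unlike RefCell, Cell achieves mutability by moving the held object out or in. So it does not operate on a mutable reference &mut to the object, but on the object itself.

For types that implement Copy, get returns the current value.
For types that implement Default, take removes the current value and puts Default::default() in its place.
For all types
1. replace swaps in a new value and returns the old one.
2. into_inner consumes the Cell and returns the inner value.
3. set is like replace but simply drops the old value.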
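
The methods listed above in one sketch; Counter and cell_demo are hypothetical names. Note that every method on Counter takes &self and still changes state.

```rust
use std::cell::Cell;

struct Counter {
    hits: Cell<u32>,     // Copy type: get/set
    label: Cell<String>, // non-Copy type: replace/take/into_inner
}

impl Counter {
    fn hit(&self) -> u32 {
        let n = self.hits.get() + 1;
        self.hits.set(n);
        n
    }

    // Swaps in a new label and returns the old one.
    fn rename(&self, new: &str) -> String {
        self.label.replace(new.to_owned())
    }
}

fn cell_demo() -> (u32, String, String, String) {
    let c = Counter { hits: Cell::new(0), label: Cell::new("old".to_owned()) };
    c.hit();
    let hits = c.hit();
    let old = c.rename("new");
    let taken = c.label.take(); // leaves String::default() behind
    let rest = c.label.into_inner(); // consumes the Cell
    (hits, old, taken, rest)
}

fn main() {
    println!("{:?}", cell_demo());
}
```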

Can Mutex be combined with RefCell/Cell?

Mutex comes with interior mutability of its own. That is understandable: if a value could not be mutated, why protect it with a Mutex?

let a = Mutex::new(1);
let mut lock = a.lock().unwrap();
*(lock.deref_mut()) = 2;

let a = Box::new(RefCell::new(1));
(*a.deref().borrow_mut().deref_mut()) = 2;

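At the same time, neither RefCell nor Cell implements Sync, which means they are not thread-safe.

GhostCell

See GhostCell.

Pin

[Understanding Deref and DerefMut is recommended before studying Pin]
An async fn produces a self-referential structure, AsyncFuture, which therefore must not be moved. The first step in making an object immovable is to allocate it on the heap, which Box can do. But that is not enough because, as shown below, std::mem::swap can move the object inside a Box: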

// Note we don't use &TestNUnpin
let mut rb = Box::new(TestNUnpin{b: "b".to_owned()});
let mut rb2 = Box::new(TestNUnpin{b: "a".to_owned()});
std::mem::swap(rb.as_mut(), rb2.as_mut());
println!("{} {}", rb.b, rb2.b); // Should be `a b`

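On the other hand, a lot of FFI passes pointers across the language boundary, which likewise requires addresses to stay put. Hence Pin. A Pin wraps a pointer, as in Pin<&mut T>, Pin<&T>, and Pin<Box<T>>, and guarantees that the T it points to will not be moved.
C++ actually has self-referential structures too, and they cause the same problem. See Boss Fu's article.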

struct X {
    int data;
    int & ref;
    explicit X(int v) : data(v), ref(data) {}
};

void test() {
    std::vector<X> vec;
    for (int i = 0; i < 10; ++i) {
        vec.emplace_back(i);
    }
    for (const auto & x: vec) {
        std::cout << x.ref << std::endl; // boom!
    }
}

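The reasoning behind Pin is that functions like std::mem::swap can move things because they can obtain a &mut T. So restricting mutable borrows is enough to pin an object on the heap. And restricting access to a mutable reference is simple: just do not hand one out, for example by not implementing DerefMut or AsMut.

Unpin, !Unpin, and PhantomPinned

Most types implement the Unpin trait, meaning they can be moved freely.

A value that can actually be pinned needs to be !Unpin. In Rust an impl with a ! like this is called a negative impl, and Rust's support for it has not been stabilized. The more common approach is therefore to have the struct hold a PhantomPinned marker.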

#[derive(Debug)]
struct StructCanBePinned {
    a: String,
    _marker: PhantomPinned,
}

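std::marker::PhantomPinned is !Unpin, and it makes any struct that holds it !Unpin as well, so that the struct cannot be moved once pinned.

The above is easy to follow, but why do Unpin and !Unpin exist at all? The reason is that types need to be classified so that their behavior before and after pinning can be discussed. That is certainly hard to grasp, so let us first look at how a Pin is created and then come back to this.

How Pin objects are created

In the code below, for a type Target that implements the Unpin trait, a Pin<P> object can be produced directly with Pin::new.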

impl<P: Deref<Target: Unpin>> Pin<P> {
    pub const fn new(pointer: P) -> Pin<P> {
        unsafe { Pin::new_unchecked(pointer) }
    }
}
pub const unsafe fn new_unchecked(pointer: P) -> Pin<P> {
    Pin { pointer }
}

For a !Unpin object, however, Pin::new fails with "error[E0277]: PhantomPinned cannot be unpinned", or with "the trait Unpin is not implemented for TestNUnpin" and "note: consider using Box::pin". This can be checked by uncommenting lines in the code below.

use std::pin::Pin;

#[derive(Default, Debug)]
struct TestUnpin {
    a: String,
}
#[derive(Default, Debug)]
struct TestNUnpin {
    b: String,
}
// Negative impls need a nightly compiler with #![feature(negative_impls)]
impl !Unpin for TestNUnpin {}

fn main() {
    let rp = Pin::new(&mut TestUnpin::default());
    // let rnp = Pin::new(&mut TestNUnpin::default());
    // let rnp2 = Pin::new(&TestUnpin::default()); // error[E0277]: `PhantomPinned` cannot be unpinned
    let rnb = Box::pin(TestNUnpin::default());
}
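
On stable Rust, the same distinction can be sketched with PhantomPinned instead of impl !Unpin. Movable, Immovable, mutate_unpin, and mutate_pinned are hypothetical names for this illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Movable {
    a: String,
}

// Holding PhantomPinned makes the whole struct !Unpin.
struct Immovable {
    a: String,
    _pin: PhantomPinned,
}

fn mutate_unpin() -> String {
    let mut m = Movable { a: "a".to_owned() };
    // Pin::new is available because Movable: Unpin.
    let pinned: Pin<&mut Movable> = Pin::new(&mut m);
    // get_mut also requires Unpin; it hands the &mut back.
    pinned.get_mut().a.push('!');
    m.a
}

fn mutate_pinned() -> String {
    let mut p: Pin<Box<Immovable>> =
        Box::pin(Immovable { a: "b".to_owned(), _pin: PhantomPinned });
    // Pin::new(&mut Immovable { .. }) and p.as_mut().get_mut() would be
    // rejected by the compiler because Immovable is !Unpin.
    // SAFETY: the field is modified in place; the Immovable is never moved.
    unsafe {
        p.as_mut().get_unchecked_mut().a.push('!');
    }
    p.a.clone() // shared access through Deref is still allowed
}

fn main() {
    println!("{} {}", mutate_unpin(), mutate_pinned());
}
```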

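Using the unsafe new_unchecked

Pin::new_unchecked can create a Pin for a !Unpin object. But it is unsafe, because nothing guarantees that the data behind the pointer: P passed in is actually pinned. Using this method requires that the P::Deref/DerefMut implementations do not move anything out of self, because Pin's as_mut and as_ref call P's deref(_mut).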

pub fn as_mut(&mut self) -> Pin<&mut P::Target> {
    // SAFETY: see documentation on this function
    unsafe { Pin::new_unchecked(&mut *self.pointer) }
}

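We can construct an evil, broken case. In DerefMut we move the original value of b out, and then pass EvilNUnpin in as a pointer to a String. What gets printed is already wrong. The fix is simple: if T must not be moved, always pin &mut T.

In addition, the object the pointer points to must never be moved again. In particular it must not be moved through a &mut P::Target, for example with the mem::swap mentioned earlier.
Notably, Pin has to guarantee that the pointee is never moved again, **not even after the Pin itself has been destroyed**, but this is hard to verify at compile time. As the code below shows, once the two Pin objects have been dropped, x1 and x2 can be moved again.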

fn test_new_unchecked() {
    // We can even swap !Unpin objects, with Pin::new_unchecked
    let mut x1 = TestNUnpin{ b: "1".to_owned() };
    let mut x2 = TestNUnpin{ b: "2".to_owned() };
    let ptr1 = &x1 as *const _ as isize;
    let ptr2 = &x2 as *const _ as isize;
    unsafe {
        let _pin1 = Pin::new_unchecked(&x1);
        let _pin2 = Pin::new_unchecked(&x2);
    }
    std::mem::swap(&mut x1, &mut x2);
    unsafe {
        let n1 = &*(ptr1 as *const TestNUnpin);
        let n2 = &*(ptr2 as *const TestNUnpin);
        assert_eq!(n1.b, "2");
        assert_eq!(n2.b, "1");
    }
}

new_unchecked is not safe with Rc either

As shown below, we can obtain a &mut T and wreak havoc again.

use std::rc::Rc;
use std::pin::Pin;

let mut x = Rc::new(TestNUnpin{ b: "1".to_owned() });
let pinned = unsafe { Pin::new_unchecked(Rc::clone(&x)) };
{
    let p = pinned.as_ref();
}
drop(pinned);
// We can get &mut T now.
assert!(Rc::get_mut(&mut x).is_some());

Using the safe Box::pin

Box::pin produces a Pin<Box<T>>.

let mut x1 = TestNUnpin{ b: "1".to_owned() };
let mut x2 = TestNUnpin{ b: "2".to_owned() };
let ptr1 = &x1 as *const _ as isize;
let ptr2 = &x2 as *const _ as isize;

let mut bx1 = Box::pin(x1);
let mut bx2 = Box::pin(x2);
std::mem::swap(&mut bx1, &mut bx2);

unsafe {
    let n1 = &*(ptr1 as *const TestNUnpin);
    let n2 = &*(ptr2 as *const TestNUnpin);
    // Should still be 1 and 2.
    assert_eq!(n1.b, "1");
    assert_eq!(n2.b, "2");
}

Why can Box::pin pin a !Unpin value?
Look at the implementation of Box::pin. It takes a T, creates a Box<T>, and pins it right away.

pub fn pin(x: T) -> Pin<Box<T>> {
    (box x).into()
}

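Before pinning, T cannot be moved because there can only be one mutable borrow &mut T at a time.
After pinning, T cannot be moved because Box is implemented as owned and unique. See the code below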

let mut t = TestNUnpin{b: "b".to_owned()};
let mt = &mut t;
let b = Box::pin(&mut t);
let mut t2 = TestNUnpin{b: "a".to_owned()};
std::mem::swap(mt, &mut t2);
println!("{} {}", t.b, t2.b);

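Using the safe pin_utils

pin_utils::pin_mut! can be used as well. For the code below, consider the guarantees listed above for the safety of new_unchecked:

Controlling Deref(Mut)
What is pinned is &mut T rather than T.
&mut T cannot be taken out
This is simple, because the original $x has been shadowed.
T cannot be moved again
Same as above.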

#[macro_export]
macro_rules! pin_mut {
    ($($x:ident),* $(,)?) => { $(
        // Move the value to ensure that it is owned
        let mut $x = $x;
        // Shadow the original binding so that it can't be directly accessed
        // ever again.
        #[allow(unused_mut)]
        let mut $x = unsafe {
            $crate::core_reexport::pin::Pin::new_unchecked(&mut $x)
        };
    )* }
}

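The shadowing here is essential, as the following example illustrates. Here xp does not shadow x, so after xp is dropped, x can be mutably borrowed again. This is why the implementation of pin_mut shadows the original binding.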

fn test_shadow() {
    // How pin_mut! takes effect.
    let mut x = TestNUnpin { b: "b".to_owned() };
    let mut xp = unsafe { Pin::new_unchecked(&mut x) };
    drop(xp);
    assert_eq!(x.b, "b");
    let mut x2 = TestNUnpin { b: "b2".to_owned() };
    std::mem::swap(&mut x, &mut x2);
    assert_eq!(x.b, "b2");
}

The let mut $x = $x here is important too; it is what makes the following code compile

fn main() {
    let mut x = 5;
    let x_mut = &mut x;
    pin_mut!(x_mut);
    **x_mut = 10;
    print!("{}", x);
}

It also lets the macro reject

let mut foo = Foo { ... };

{
    pin_mut!(foo);
    let _: Pin<&mut Foo> = foo;
}

// Woops we now have an unprotected Foo when its supposed to be pinned and
// thus can break the guarantees of Pin::new_unchecked
let foo_ref: &mut Foo = &mut foo;

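Summary

To sum up a few questions:

Why can Pin::new be called directly on an Unpin object?
Because Pin makes no guarantee for objects whose type implements Unpin.
Why can't Pin::new be called directly on a !Unpin object?
Because that would be unsafe; the Pin has to be created either unsafely with Pin::new_unchecked or through a safe method such as Box::pin.

If T is Unpin, can the &mut inside the Pin be obtained?
Yes, through Pin::get_mut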

let mut p = TestUnpin { a: "a".to_owned() };
let mut p2 = TestUnpin { a: "b".to_owned() };
// Pin::new is safe for Unpin targets, so no unsafe block is needed
let rp = Pin::new(&mut p);
let rp2 = Pin::new(&mut p2);
std::mem::swap(Pin::get_mut(rp), Pin::get_mut(rp2));
println!("{} {}", p.a, p2.a); // Prints `b a`: the contents were swapped

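But if the type is !Unpin, Pin::get_mut cannot be called.

Now to answer why Unpin and !Unpin exist. An Unpin type gives Pin an assurance: moving this type is harmless, so Pin does not need to block access to &mut for it. Types that are !Unpin, such as those holding PhantomPinned, truly must not be moved; this takes more than Pin, since these types must themselves provide a suitable interface for creating a Pin from them.

Pin and interior mutability

Does a pinned object have to give up interior mutability? Consider the simpler object below: modifying the value of a changes no address, so this object could well have interior mutability.
Pin makes the code below fail to compile, with the error "trait DerefMut is required to modify through a dereference, but it is not implemented for Pin<&mut SimpleNUnPin>".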

struct SimpleNUnPin {
    a: u64,
}

impl !Unpin for SimpleNUnPin {}

fn main()
{
    let x = SimpleNUnPin { a: 1 };
    pin_utils::pin_mut!(x);
    x.as_mut().a = 2;
}

So can we obtain interior mutability through RefCell? That is actually not safe.

NonNull

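Lifetimes

A lifetime is the construct the compile-time borrow checker uses to verify that all borrows are valid.
When a struct holds a reference, a lifetime must be specified to prevent dangling references.

Computing lifetimes

The code below shows the difference between a lifetime and a scope.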

fn main() {
    let i = 3; // Lifetime for `i` starts. ────────────────┐
    //                                                     │
    { //                                                   │
        let borrow1 = &i; // `borrow1` lifetime starts. ──┐│
        //                                                ││
        println!("borrow1: {}", borrow1); //              ││
    } // `borrow1` ends. ─────────────────────────────────┘│
    //                                                     │
    //                                                     │
    { //                                                   │
        let borrow2 = &i; // `borrow2` lifetime starts. ──┐│
        //                                                ││
        println!("borrow2: {}", borrow2); //              ││
    } // `borrow2` ends. ─────────────────────────────────┘│
    //                                                     │
}   // Lifetime ends. ─────────────────────────────────────┘

As shown below, a move takes place and there is only one destruction

struct S {
    a: u64,    
}

impl Drop for S {
    fn drop(&mut self) {
        println!("drop!");
    }
}

fn main() {
    let s = {
        let s = S {
            a: 1
        };
        s
    };
}

The code below, however, fails with "error[E0597]: s.a does not live long enough". This is presumably a consequence of Rust's limited support for self-referential structs.

use std::cell::RefCell;

struct S<'a> {
    a: u64,
    ra: RefCell<Option<&'a u64>>,
}

// impl<'a> Drop for S<'a> {
//     fn drop(&mut self) {
//         println!("drop!");
//     }
// }

fn main() {
    let s = {
        let s = S {
            a: 1,
            ra: RefCell::new(None),
        };
        *s.ra.borrow_mut() = Some(&s.a);
        s
    };
    let b = Box::into_raw(Box::new(s));
    println!("{}", (*b).a);
}

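Lifetime annotations

The following says that foo has lifetime parameters 'a and 'b, and that the lifetime of foo cannot exceed that of either 'a or 'b.

foo<'a, 'b>
// `foo` has lifetime parameters `'a` and `'b`

Functions

Leaving elision aside:

Every reference parameter needs a lifetime parameter
A returned reference must either be 'static or have the same lifetime as an input

The code below fails to compile. Here 'a must live longer than the function, that is, 'a should still be alive after the function has finished running. But the String that &String points to is destroyed before the function returns, so it cannot satisfy the constraint of 'a.

fn invalid_output<'a>() -> &'a String { &String::from("foo") }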
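
By way of contrast, here is a sketch of signatures that do compile; longest, first_word, and owned_output are hypothetical helpers. A returned reference is tied to an input, and a value created inside the function is returned by ownership instead.

```rust
// The result borrows from one of the inputs, so it is valid for as long
// as both of them are.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

// With a single reference parameter, elision supplies the same annotation.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// A value created inside the function must be returned by ownership.
fn owned_output() -> String {
    String::from("foo")
}

fn main() {
    let a = String::from("hello world");
    println!("{} {} {}", longest(&a, "hi"), first_word(&a), owned_output());
}
```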

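Methods

The two snippets below differ only in the lifetime for which self is borrowed. The second, with a fresh 'b, is what lifetime elision gives for &self, so if the borrow of self should last for the impl's lifetime, 'a has to be written explicitly as in the first.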

impl<'a> Foo<'a> {
    fn foo(&'a self, path: &str) -> Boo<'a> { /* */ }
}

impl<'a> Foo<'a> {
    fn foo<'b>(&'b self, path: &str) -> Boo<'b> { /* */ }
}

Of course, the impl does not necessarily need a lifetime parameter

struct Owner(i32);

impl Owner {
    // Annotate lifetimes as in a standalone function.
    fn add_one<'a>(&'a mut self) { self.0 += 1; }
    fn print<'a>(&'a self) {
        println!("`print`: {}", self.0);
    }
}

fn main() {
    let mut owner = Owner(18);

    owner.add_one();
    owner.print();
}

Elision

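Covariance and contravariance of lifetimes

'a is a subtype of 'b; in lifetime annotations this means 'a is longer than 'b, that is, the subtype's lifetime is greater than or equal to the supertype's.

'a : 'b

From the Liskov substitution principle we get the definitions:

Covariance
A subtype can stand in for its supertype, that is, a parameter expecting a shorter lifetime accepts an argument with a longer one. Most pointers are either covariant or invariant.
Contravariance

On variance:

&'a T
Covariant in 'a, covariant in T.
&'a mut T
Covariant in 'a, invariant in T.
fn(T) -> U
Contravariant in T, that is, a function that accepts arguments with a shorter lifetime can stand in for one that requires a longer lifetime.
Covariant in U.

Let us discuss the points above.
Why is &'a mut T invariant in T? Consider T being &'static str and &'a str respectively; clearly the former is a subtype of the latter. If T were covariant, the former could stand in for the latter. Now consider the overwrite function below. If covariance held, string, which has the shorter lifetime, could in effect be assigned to forever_str, which has the longer one.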

fn overwrite<T: Copy>(input: &mut T, new: &mut T) {
    *input = *new;
}

fn test_varaint_mut() {
    let mut forever_str: &'static str = "hello";
    {
        let string = String::from("world");
        // *string is str
        // &*string is &'a str
        overwrite(&mut forever_str, &mut &*string);
    }
}
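
For comparison, covariance of &'a T in 'a is what makes the following compile. This is a minimal sketch; pick and covariance_demo are hypothetical names. A &'static str is accepted where a shorter-lived &'a str is expected, because a shared reference only ever reads.

```rust
// Both arguments must share one lifetime 'a.
fn pick<'a>(a: &'a str, _b: &'a str) -> &'a str {
    a
}

fn covariance_demo() -> usize {
    let forever: &'static str = "hello";
    let local = String::from("world");
    // `forever` is shrunk from 'static to the lifetime of `&local`.
    let chosen = pick(forever, &local);
    chosen.len()
}

fn main() {
    println!("{}", covariance_demo());
}
```

The overwrite example above is rejected for the opposite reason: through &mut T a value can be written, so T may not be shrunk behind the caller's back.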

Closures and functions

Fn/FnMut/FnOnce

For a call such as a(b,c,d), Rust has something a little like $ in Haskell, whose purpose is to overload "calling a function".

Fn::call(&a, (b, c, d))
FnMut::call_mut(&mut a, (b, c, d))
FnOnce::call_once(a, (b, c, d))

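FnOnce takes ownership of the free variables and can be called only once; the call consumes the closure itself.
FnMut borrows the free variables mutably.
Fn borrows the free variables immutably.
Both FnMut and Fn can be called multiple times.

The code below can be used to determine which traits a given function implements; only the calls for traits that are implemented compile.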

fn is_fn <A, R>(_x: fn(A) -> R) {}
fn is_Fn <A, R, F: Fn(A) -> R> (_x: &F) {}
fn is_FnMut <A, R, F: FnMut(A) -> R> (_x: &F) {}
fn is_FnOnce <A, R, F: FnOnce(A) -> R> (_x: &F) {}

Looking at the source, the three traits form an inheritance chain Fn : FnMut : FnOnce.

pub trait FnOnce<Args> {
    /// The returned type after the call operator is used.
    #[lang = "fn_once_output"]
    #[stable(feature = "fn_once_output", since = "1.12.0")]
    type Output;

    /// Performs the call operation.
    #[unstable(feature = "fn_traits", issue = "29625")]
    extern "rust-call" fn call_once(self, args: Args) -> Self::Output;
}

pub trait FnMut<Args>: FnOnce<Args> {
    /// Performs the call operation.
    #[unstable(feature = "fn_traits", issue = "29625")]
    extern "rust-call" fn call_mut(&mut self, args: Args) -> Self::Output;
}

pub trait Fn<Args>: FnMut<Args> {
    /// Performs the call operation.
    #[unstable(feature = "fn_traits", issue = "29625")]
    extern "rust-call" fn call(&self, args: Args) -> Self::Output;
}

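Why this inheritance relationship? An answer explains it.

Fn, FnMut, and FnOnce could in principle be combined freely in 7 ways, but only three of the combinations are meaningful:

Fn/FnMut/FnOnce
FnMut/FnOnce
FnOnce

This is because any problem that can be solved with &self can also be solved with &mut self, and any problem that can be solved with &mut self can also be solved with self. The converse does not necessarily hold.

So self is the big-brother figure: heavy-handed when used, but able to solve every problem. That makes it the most base trait rather than the most derived one.

How closures implement the three traits

Every closure implements FnOnce
If a closure moves ownership of a captured variable out, it implements only FnOnce
If a closure does not move captured variables out but modifies them, it implements FnMut
If a closure neither moves captured variables out nor modifies them, it implements Fn

The move keyword does not change which traits a closure implements; it only affects how variables are captured, which the next section discusses.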
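
The rules above can be checked with three helper functions, one per trait bound. This is a sketch; call_fn, call_fn_mut, call_fn_once, and closure_demo are hypothetical names.

```rust
fn call_fn<F: Fn() -> usize>(f: F) -> usize {
    f() + f()
}

fn call_fn_mut<F: FnMut()>(mut f: F) {
    f();
    f();
}

fn call_fn_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

fn closure_demo() -> (usize, i32, String, usize) {
    let text = String::from("abc");
    // Only reads `text`: the closure implements Fn.
    let len_twice = call_fn(|| text.len());

    let mut count = 0;
    // Modifies `count`: FnMut, but not Fn.
    call_fn_mut(|| count += 1);

    // Moves `text` out when called: FnOnce only.
    let moved = call_fn_once(move || text);

    // `move` only changes how `n` is captured; the closure is still Fn.
    let n = 3_usize;
    let still_fn = call_fn(move || n);

    (len_twice, count, moved, still_fn)
}

fn main() {
    println!("{:?}", closure_demo());
}
```

Passing the count closure to call_fn, or the move || text closure to call_fn_mut, fails to compile, which matches the list above.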

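Capture

The previous section introduced the three traits a closure may implement; this one explains how a closure captures variables from its environment.

Problems with capture in C++

In C++, returning a closure that has captured a local variable is unsafe; see get_f().
For a method of a class, if the this pointer is captured, even via [=], and the closure is passed out, it is likewise broken once the object has been destroyed; see print_this_proxy().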

#include <iostream>
#include <functional>
struct S {
    int x_ = 0;
    S(int x) : x_(x) {}
    ~S() {
        printf("Bye\n");
    }
    std::function<void()> print_this_proxy() {
        return [=](){
            // Capture this->x_
            printf("x_ %d", x_);
        };
    }
};

std::function<void()> get_f() {
    S s(1);
    auto f = [&](){
        printf("S %d\n", s.x_);
    };
    return f;
}

int main(){ 
    auto f = get_f();
    f(); // Not safe

    std::function<void()> proxy;
    {
        S s(2);
        proxy = s.print_this_proxy();
    }
    proxy(); // Not safe
}

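Capture in Rust

Compared with C++, capture in Rust is rather confusing. First, there is no place to specify which variables are captured; second, there is the move keyword; and finally, copy and move semantics come into play as well.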

|| 42;
|x| x + 1;
|x:i32| x + 1;
|x:i32| -> i32 { x + 1 };
move |x:i32| -> i32 { x + 1 };

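How a closure captures a variable depends on how we intend to use it after capture.

Let us look at an example, starting with the struct below. get_number, inc_number, and destructor take an immutable reference, a mutable reference, and a value respectively.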

struct MyStruct {
    text: &'static str,
    number: u32,
}
impl MyStruct {
    fn new (text: &'static str, number: u32) -> MyStruct {
        MyStruct {
            text: text,
            number: number,
        }
    }
    fn get_number (&self) -> u32 {
        self.number
    }
    fn inc_number (&mut self) {
        self.number += 1;
    }
    fn destructor (self) {
        println!("Destructing {}", self.text);
    }
}

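The code below shows the fn-like case: the closure captures no free variables, so it can even be used as a plain fn pointer, and the code compiles and runs.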

let obj1 = MyStruct::new("Hello", 15);
let obj2 = MyStruct::new("More Text", 10);
let closure1 = |x: &MyStruct| x.get_number() + 3;
assert_eq!(closure1(&obj1), 18);
assert_eq!(closure1(&obj2), 13);

is_fn(closure1); 
is_Fn(&closure1);
is_FnMut(&closure1);
is_FnOnce(&closure1);

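The next code shows the Fn case: closure2 captures a reference to obj1. The code afterwards verifies that obj1.get_number() can still borrow immutably, while obj1.inc_number(), which needs a mutable borrow, no longer compiles.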

let obj1 = MyStruct::new("Hello", 15);
let obj2 = MyStruct::new("More Text", 10);
// obj1 is borrowed by the closure immutably.
let closure2 = |x: &MyStruct| x.get_number() + obj1.get_number();
assert_eq!(closure2(&obj2), 25);
// We can borrow obj1 again immutably...
assert_eq!(obj1.get_number(), 15);
// But we can't borrow it mutably.
// obj1.inc_number();               // ERROR

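In effect a closure is syntactic sugar: the captured context is packed into a Context that is passed to the real unit of execution. The code below rewrites closure2 as a free function func2. It takes a Context object holding an immutable reference whose lifetime 'a bounds the lifetime of the Context.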

struct Context<'a>(&'a MyStruct);
let obj1 = MyStruct::new("Hello", 15);
let obj2 = MyStruct::new("More Text", 10);
let ctx = Context(&obj1);
fn func2 (context: &Context, x: &MyStruct) -> u32 {
    x.get_number() + context.0.get_number()
}
assert_eq!(func2(&ctx, &obj2), 25);
// We can borrow obj1 again immutably...
assert_eq!(obj1.get_number(), 15);
// But we can't borrow it mutably.
// obj1.inc_number(); // ERROR

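In the case above, obj1.inc_number() cannot be called because obj1 is not declared mut; changing it to let mut obj1 = ... makes the call compile.
But then, are ctx (which func2 reads through) and obj1.inc_number() not holding an immutable and a mutable borrow of obj1 at the same time? Adding one more call to func2 at the end does produce the error. Rust is fairly smart here: ctx borrows obj1 immutably, but if ctx is never used again, the borrow has already ended and does not conflict with obj1.inc_number().

...
assert_eq!(func2(&ctx, &obj2), 25);
obj1.inc_number();                  // ERROR once the line below is added
assert_eq!(func2(&ctx, &obj2), 26);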

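The code below shows the FnMut case. The closure now borrows obj1 mutably. While the closure is still in use, obj1 can be borrowed neither mutably nor immutably outside it; either one fails to compile.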

let mut obj1 = MyStruct::new("Hello", 15);
let obj2 = MyStruct::new("More Text", 10);
// obj1 is borrowed by the closure mutably.
let mut closure3 = |x: &MyStruct| {
    obj1.inc_number();
    x.get_number() + obj1.get_number()
};
assert_eq!(closure3(&obj2), 26);
assert_eq!(closure3(&obj2), 27);
assert_eq!(closure3(&obj2), 28);
// We can't borrow obj1 mutably or immutably
// assert_eq!(obj1.get_number(), 18);   // ERROR
// obj1.inc_number();                   // ERROR

The code below shows the FnOnce case.

let obj1 = MyStruct::new("Hello", 15);
let obj2 = MyStruct::new("More Text", 10);
// obj1 is owned by the closure
let closure4 = |x: &MyStruct| {
    obj1.destructor();
    x.get_number()
};

Checking with the four helper functions shows that the first three checks fail to compile, so closure4 is not a plain fn and implements neither Fn nor FnMut.

// Does not compile:
// is_fn(closure4);
// is_Fn(&closure4);
// is_FnMut(&closure4);
// Compiles successfully:
is_FnOnce(&closure4);

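So closures try capturing in the order &T -> &mut T -> T, which mirrors the Fn -> FnMut -> FnOnce hierarchy. The idea is to send the junior first and call in the boss only when the junior cannot handle it.

Of course, the move keyword forces the boss to step in right away.

With move / async move capture, if the closure uses a variable such as p and p still needs to be accessed after the closure is called, clone it before creating the closure, for example into pp.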

#[derive(Debug,Clone)]
struct Point {
    x: i32,
    y: i32,
}

fn foo() {
    let p = Point {x: 1, y: 2};
    let pp = p.clone();
    let total_price = move | price: i32| {
        p.x * p.y * price
    };
    let price = total_price(10);
    println!("p {:?} price {}", pp, price);
}

Can a closure be used by multiple threads?

https://stackoverflow.com/questions/36211389/can-a-rust-closure-be-used-by-multiple-threads

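Concurrency and async

This chapter covers only the principles, not specific frameworks.

Send and Sync

The Send trait means that values of the type can be moved between threads. Most Rust types are Send; some are not:

Rc<T> can only be used within a single thread, so it cannot be Send.
Raw pointers are not Send either.

A new type composed entirely of Send types is itself Send.

The Sync trait means that references to a value of the type may be held by multiple threads. In other words, for any type T, T is Sync if and only if &T is Send. In practice Sync tends to be the more demanding requirement.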

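Send and Sync support in some common types

Rc

Rc is not Send. All clones of an Rc share one reference-count block, and updates to the count are not atomic. If two threads cloned it at the same time, they could update the count concurrently and race.

Arc

If T is Send and Sync, then Arc<T> is Send and Sync.

unsafe impl<T: ?Sized + Sync + Send> Send for Arc<T> {}
unsafe impl<T: ?Sized + Sync + Send> Sync for Arc<T> {}

At first this looks odd: is Arc not used precisely to get concurrency safety? Why does Arc in turn require a thread-safe T? As in C++, the thread safety of a smart pointer has two layers: the smart pointer itself (mainly its reference count), and the data it points to:

Compared with Rc, Arc only makes the reference counting thread-safe.
If the inner type is not thread-safe, Arc is usually combined with RwLock, Mutex and similar types.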
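The two layers can be seen in a minimal sketch: Arc shares ownership across threads, and Mutex protects the data itself. The function name parallel_count is made up for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: several threads increment one shared counter.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Arc makes the reference count thread-safe; Mutex makes the usize thread-safe.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```

Replacing Arc with Rc here is rejected at compile time, because the spawned closure would no longer be Send.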

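RefCell

By its definition, RefCell is Send (when T is Send) but not Sync. This is easy to see: if references to an object with interior mutability were held by several threads, they could all modify it without any synchronization.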

#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<T: ?Sized> Send for RefCell<T> where T: Send {}
#[stable(feature = "rust1", since = "1.0.0")]
impl<T: ?Sized> !Sync for RefCell<T> {}

References

&T is Send when T is Sync, which matches the definition above.
&mut T is Send when T is Send.

unsafe impl<T: Sync + ?Sized> Send for &T {}
unsafe impl<T: Send + ?Sized> Send for &mut T {}

Raw pointers

Neither kind of raw pointer is Send.

impl<T: ?Sized> !Send for *const T {}
impl<T: ?Sized> !Send for *mut T {}

Of course, a thin wrapper can be used to obtain a raw pointer that is Send or Sync.

struct MyBox(*mut u8);

unsafe impl Send for MyBox {}
unsafe impl Sync for MyBox {}

In the same way, negative_impls can be used to opt a type out of the Send/Sync it would otherwise get automatically.

#![feature(negative_impls)]

// I have some magic semantics for some synchronization primitive!
struct SpecialThreadToken(u8);

impl !Send for SpecialThreadToken {}
impl !Sync for SpecialThreadToken {}

Mutex and RwLock

Mutex<T> requires T: Send but not T: Sync.

#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<T: ?Sized + Send> Send for Mutex<T> {}
#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<T: ?Sized + Send> Sync for Mutex<T> {}

For RwLock<T> to be Sync, T must be Sync in addition to Send, because readers share immutable references to the data concurrently.

#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<T: ?Sized + Send> Send for RwLock<T> {}
#[stable(feature = "rust1", since = "1.0.0")]
unsafe impl<T: ?Sized + Send + Sync> Sync for RwLock<T> {}

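Future

Ownership of a Future may move between threads, so why is a Future not necessarily Send?

Multithreading

Because Rust currently has no variadic parameter packs, a thread can only be created by passing a closure to spawn.

If a child thread panics, other threads are unaffected, except when:

Some thread joins the panicked thread. join then returns a Result holding Err, and calling unwrap on it directly panics.
A thread panics while holding a lock. This is called poisoning.
After that, mutex.lock() returns a PoisonError and mutex.is_poisoned() returns true.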
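Both exceptions can be reproduced with the standard library alone. The following is a minimal sketch; poison_demo is a hypothetical name, and the worker's panic message is printed to stderr when it runs.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: returns (join failed?, mutex poisoned?, recovered value).
fn poison_demo() -> (bool, bool, i32) {
    let data = Arc::new(Mutex::new(1));
    let data2 = Arc::clone(&data);

    let handle = thread::spawn(move || {
        let mut guard = data2.lock().unwrap();
        *guard = 2;
        // Panicking while the guard is alive poisons the mutex.
        panic!("worker failed while holding the lock");
    });

    // join() reports the child's panic as Err instead of propagating it.
    let join_failed = handle.join().is_err();
    let poisoned = data.is_poisoned();

    // The data stays reachable if the caller explicitly accepts the poison.
    let value = match data.lock() {
        Ok(guard) => *guard,
        Err(poison) => *poison.into_inner(),
    };

    (join_failed, poisoned, value)
}

fn main() {
    println!("{:?}", poison_demo());
}
```

Calling unwrap() on the join result, or on data.lock(), would instead propagate the failure into the calling thread.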

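Synchronization between threads

Communication between threads

Threads can communicate through channels similar to Go's, following the motto "Do not communicate by sharing memory; instead, share memory by communicating".
Here mpsc stands for multiple producer, single consumer.
The send method returns a Result: if the receiving end has already been dropped, there is nowhere to deliver the value, so the send returns an error.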

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let val = String::from("hi");
        tx.send(val).unwrap();
    });
}

Since there can be multiple producers, tx.clone() creates another producer.
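A minimal sketch of both points, multiple producers via tx.clone() and the error returned once the receiver is gone. The function names are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: each producer thread sends one value.
fn collect_from_producers(producers: u32) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();

    for id in 0..producers {
        let tx = tx.clone(); // one Sender per producer thread
        thread::spawn(move || {
            tx.send(id * 10).unwrap();
        });
    }
    // Drop the original sender so the receiver's iterator can terminate
    // once every producer has finished.
    drop(tx);

    let mut received: Vec<u32> = rx.iter().collect();
    received.sort(); // arrival order is not deterministic
    received
}

// Sending after the receiver has been dropped returns an error.
fn send_after_receiver_dropped() -> bool {
    let (tx, rx) = mpsc::channel::<i32>();
    drop(rx);
    tx.send(1).is_err()
}

fn main() {
    println!("{:?}", collect_from_producers(3));
    println!("{}", send_after_receiver_dropped());
}
```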

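Async

async and await

The code below shows two ways to write async code:

The ordinary async functions initial and plus_two.
For an async function, the compiler eventually lowers it to the async-block form and rewrites the function signature.
The async block inside plus_one.

The example illustrates two ideas:

plus_one and plus_two are two equivalent ways of writing the same kind of function.
async fn can be understood as syntactic sugar: it can be viewed as an ordinary function returning impl Future<Output = ...>.
Here impl Future is the impl Trait syntax, meaning the type implements the Future trait. Output is the type the Future actually yields once it is Ready, i32 in this case.
The async block in plus_one is itself a Future.
This is an important correction. Many people associate async with callbacks, but the basic unit of async is the future, not the function. For example, the argument x of tokio::spawn(x) can simply be an async block; no function needs to be passed.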

use std::future::Future;

async fn initial() -> i32 {
    1
}
fn plus_one() -> impl Future<Output = i32> {
    async {
        initial().await + 1
    }
}
async fn plus_two() -> i32 {
    initial().await + 2
}

plus_one_res and plus_two_res are both Futures; block_on can be used to obtain their results.

fn main() {
    let plus_one_res = plus_one();
    let plus_two_res = plus_two();
    println!("{}", futures::executor::block_on(plus_one_res));
    println!("{}", futures::executor::block_on(plus_two_res));
}

Multiple Futures can also be awaited together with join.

futures::executor::block_on(async {
    let j = futures::future::join(plus_one_res, plus_two_res).await;
    println!("{}", j.0);
    println!("{}", j.1);
});

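select

select supports default and complete branches:

default runs when none of the selected futures has completed yet.
complete handles the case where all futures have completed and no further progress is possible.

trait Future

This refers to std::future::Future. Earlier there was also futures::future::Future, a community implementation. The Future trait was later moved into the standard library, and the remaining functionality went into pub trait FutureExt: Future, which provides combinators such as map/then. There are other extensions of the Future trait as well, for example in async-std. But first, let's look at the basic Future trait.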

pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

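poll returns a Poll, an enum with two variants, Ready(T) and Pending. This does not imply busy waiting: when a poll returns Pending, the future arranges for the waker obtained from cx.waker() to be called once it can make progress, which notifies the executor to poll it again.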
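To make the poll/waker contract concrete, here is a minimal sketch that uses only the standard library. CountDown, ThreadWaker and this block_on are hypothetical names written for the illustration (this is not the futures crate's executor): the future wakes itself before returning Pending, and the toy executor parks the thread until it is woken.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A hand-written future that needs `n` extra polls before it completes.
struct CountDown(u32);

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            // Register interest: ask to be polled again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A waker that unparks the thread running the executor.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// A toy single-future executor: poll, and sleep while the future is Pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    println!("{}", block_on(CountDown(3)));
}
```

A real executor keeps a queue of tasks instead of parking one thread, but the contract is the same: Pending means "I will call the waker when polling again is worthwhile".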

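Pin<&mut Self> is really a pointer; it exists to solve the problem of self-referential structs.

impl Future

Many types implement the Future trait. For example, combinators such as the Map produced by f.map and the Then produced by f.then are all Futures.

For example, the code below implements Future for the Map type.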

pub struct Map<A, F> where A: Future {
    future: A,
    f: Option<F>,
}

// in futures-0.1.31, src/future/map.rs
// Note: this is an older version, so the signature of Future::poll differs from std's. In futures-0.3.15 this implementation moved to futures-util.
impl<U, A, F> Future for Map<A, F>
    where A: Future,
          F: FnOnce(A::Item) -> U,
{
    type Item = U;
    type Error = A::Error;

    fn poll(&mut self) -> Poll<U, A::Error> {
        let e = match self.future.poll() {
            Ok(Async::NotReady) => return Ok(Async::NotReady),
            Ok(Async::Ready(e)) => Ok(e),
            Err(e) => Err(e),
        };
        e.map(self.f.take().expect("cannot poll Map twice"))
         .map(Async::Ready)
    }
}

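The e.map().map() chain here is distinctive. The first map applies self.f to the value inside e and empties self.f (via take), making it a one-shot call. The second wraps the mapped result in Async::Ready.

async lifetimes

Unlike ordinary functions, an async fn that takes references or other non-'static arguments returns a Future bounded by the lifetimes of those arguments: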

// This function:
async fn foo(x: &u8) -> u8 { *x }

// Is equivalent to this function:
fn foo_expanded<'a>(x: &'a u8) -> impl Future<Output = u8> + 'a {
    async move { *x }
}

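This means the future returned by such an async fn must be .awaited while its non-'static arguments are still valid.

async blocks and closures accept the move keyword, just like ordinary closures. An async move block takes ownership of the variables it references, so it can outlive the current scope, at the cost of no longer sharing those variables with other code.

In the following example, several async blocks access the same local variable my_string, as long as they all run within the variable's scope.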

async fn blocks() {
    let my_string = "foo".to_string();

    let future_one = async {
        // ...
        println!("{my_string}");
    };

    let future_two = async {
        // ...
        println!("{my_string}");
    };

    // Run both futures to completion, printing "foo" twice:
    let ((), ()) = futures::join!(future_one, future_two);
}

In the following example, the async move block takes ownership of my_string and moves it into the Future generated by the block. This allows the Future to outlive the original scope of my_string.

fn move_block() -> impl Future<Output = ()> {
    let my_string = "foo".to_string();
    async move {
        // ...
        println!("{my_string}");
    }
}

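Send-ness of async

The following code produces a future that is not Send, so it fails to compile wherever a Send future is required (for example when spawning it on a multi-threaded executor):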

async fn foo() {
    let x = NotSend::default();
    bar().await;
}

It needs to be changed to:

async fn foo() {
    {
        let x = NotSend::default();
    }
    bar().await;
}

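The reason is that in the first version x stays alive until the end of the function body, so it is stored in the future's state across the .await. The compiler's analysis is also conservative here: historically even an explicit drop(x) before the .await was not recognized, which is why the variable is confined to an inner block.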
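The scoped version can be checked at compile time without any executor. In this sketch Rc stands in for NotSend, and require_send is a hypothetical helper whose only job is to demand a Send bound.

```rust
use std::rc::Rc;

async fn bar() {}

// Rc is not Send, so it plays the role of `NotSend` here.
async fn scoped() {
    {
        let x = Rc::new(1);
        let _ = *x;
    } // x is dropped here, before the await point
    bar().await;
}

// Hypothetical helper: compiles only if the argument's type is Send.
fn require_send<T: Send>(_value: &T) -> bool {
    true
}

fn main() {
    let fut = scoped();
    // This line compiles because no Rc lives across the `.await`.
    println!("{}", require_send(&fut));
}
```

Moving the Rc out of the inner block, so that it is still alive at bar().await, makes the require_send call fail to compile.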

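How async is implemented

The implementation of async produces self-referential structs, which is why Pin is needed. What is a self-referential struct, and why does async involve one? We have to start from how async is implemented.

The ordinary case

In the code below, the two awaits on f1 and f2 run sequentially. How does the compiler generate code for f.await?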

let f = async move {
    f1.await;
    f2.await;
};
f.await;

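With the Then approach this would be implemented with callbacks, but Rust uses a state machine instead: the compiler generates code similar to the following. AsyncFuture is effectively a state machine that records which sub-future is currently being awaited.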

struct AsyncFuture {
    fut_one: FutOne,
    fut_two: FutTwo,
    state: State,
}

enum State {
    AwaitingFutOne,
    AwaitingFutTwo,
    Done,
}

impl Future for AsyncFuture {
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        loop {
            match self.state {
                State::AwaitingFutOne => match self.fut_one.poll(..) {
                    Poll::Ready(()) => self.state = State::AwaitingFutTwo,
                    Poll::Pending => return Poll::Pending,
                }
                State::AwaitingFutTwo => match self.fut_two.poll(..) {
                    Poll::Ready(()) => self.state = State::Done,
                    Poll::Pending => return Poll::Pending,
                }
                State::Done => return Poll::Ready(()),
            }
        }
    }
}

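Why does the implementation of async involve self-referential structs?

The ordinary case of compiling async was discussed above. Now consider the code below: how should it be compiled?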

async {
    let mut x = [0; 128];
    let read_into_buf_fut = read_into_buf(&mut x);
    read_into_buf_fut.await;
    println!("{:?}", x);
}

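Because a thread switch may happen at an await, x also has to be moved into the generated AsyncFuture. read_into_buf then produces a reference to x. If one field of a struct is a reference to another field of the same struct, the struct is self-referential.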

struct ReadIntoBuf<'a> {
    buf: &'a mut [u8], // points to `x` below
}

struct AsyncFuture {
    x: [u8; 128],
    read_into_buf_fut: ReadIntoBuf<'what_lifetime?>,
}

Self-referential structs

In the self-referential struct below, b is a reference to a.

struct Test<'a> {
    a: String,
    b: &'a String,
}

Unfortunately, Rust currently does not support self-referential structs, so the code below is rejected, as mentioned earlier in the lifetime section.

fn main() {
    let a = String::from("Hello");
    let _test = Test { a, b: &a };
}

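As a workaround we fall back to raw pointers. But a raw pointer holds an absolute address: when Test is moved, the address stored in b does not change. It is like the fable of marking the boat to find the sword, in reverse: we want b to be a mark carved on the boat Test, but it is actually a GPS coordinate.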

struct Test {
    a: String,
    b: *const String,
}
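The "GPS coordinate" behaviour can be observed by comparing addresses only, without ever dereferencing the stale pointer. A minimal sketch follows; pointer_is_stale_after_move is a hypothetical name, and moving into a Box is just one convenient way to force the struct to a new address.

```rust
struct Test {
    a: String,
    b: *const String,
}

// Returns true if `b` pointed at `a` before the move and no longer does afterwards.
fn pointer_is_stale_after_move() -> bool {
    let mut t = Test {
        a: String::from("Hello"),
        b: std::ptr::null(),
    };
    t.b = &t.a; // b now holds the current (stack) address of a
    let valid_before_move = std::ptr::eq(t.b, &t.a);

    // Moving the struct to the heap changes the address of `a`,
    // but `b` still holds the old address.
    let moved = Box::new(t);
    let stale_after_move = !std::ptr::eq(moved.b, &moved.a);

    valid_before_move && stale_after_move
}

fn main() {
    println!("{}", pointer_is_stale_after_move());
}
```

Pin exists to rule out exactly this kind of move for values that contain such pointers.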

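Using then

Without async and await, the then family of combinators can be used.

Cancellation safety

https://developerlife.com/2024/07/10/rust-async-cancellation-safety-tokio/

In async Rust, any .await can be cancelled. When a Future is dropped (for example on a task timeout or when it loses the race in select!), its execution stops immediately. Here "stops" means:

The executor never polls it again.
The state machine made up of the future and all of its inner .await points is destroyed immediately.
The Drop code inserted by the compiler releases resources all the way down.

This is a major difference between futures in C++ and Rust. A C++ future is a handle to the result of a computation; discarding it does not stop the computation. A Rust Future is the asynchronous computation itself; dropping it terminates the task early.

Cancellation safety means that when a wait is cancelled, internal state (for example that of a lock) is not corrupted and no deadlock or resource leak results.

Take select! as an example: the first future to become ready wins, and the other futures are dropped and therefore stop executing.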

loop {
    tokio::select!{
        branch_1_result = future_1 => {
            // handle branch_1_result
        },
        branch_2_result = future_2 => {
            // handle branch_2_result
        },
        // and so on
    }
}

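Suppose future_1 is a timeout and future_2 is some async I/O task. If future_1 wins while future_2 is still in progress, or is awaiting some other event, there may be a problem: whatever partial progress future_2 had made is lost when it is dropped.

The tokio library

tokio::RwLock

RwLock is built on a Semaphore.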

pub struct RwLock<T: ?Sized> {
    // maximum number of concurrent readers
    mr: u32,
    //semaphore to coordinate read and write access to T
    s: Semaphore,
    //inner data T
    c: UnsafeCell<T>,
}

tokio::Semaphore

The Semaphore maintains a linked list of waiters, plus an atomic counter recording the number of available permits. If the semaphore does not hold enough permits, the tasks add their wakers to the end of the queue. When new permits become available, they are assigned starting from the head of the queue.

Because wakers are added at the tail and popped from the head, the semaphore is fair. Tasks trying to acquire large numbers of permits at a time will always be woken eventually, even if many other tasks are acquiring smaller numbers of permits. This means that in a use-case like tokio’s read-write lock, writers will not be starved by readers.

// semaphore.rs
pub struct Semaphore {
    /// The low level semaphore
    ll_sem: ll::Semaphore,
}

// batch_semaphore.rs
pub(crate) struct Semaphore {
    waiters: Mutex<Waitlist>,
    /// The current number of available permits in the semaphore.
    permits: AtomicUsize,
}

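Type system

Arrays and slices

An array has the type [T; N]. As in C++, the array type includes its size, which is a compile-time constant. Whether an array is Copy/Clone depends on its element type, but when an array is created with [x; N], the type of x must be Copy (or x must be a constant). An array reference &[T; N] can be converted to a slice reference &[T].

The counterparts are the slices &[T] and &mut [T].

Slice methods

Sized, !Sized, ?Sized and DST

A dynamically sized type (DST) is a type whose size cannot be determined at compile time.

If a type's size is known at compile time and fixed, it automatically implements the Sized trait. Some types have no compile-time size, for example:

The size of str is unknown,
so it is usually accessed through &str.
The size of a trait object type (dyn Trait) is unknown as well.

If the size of a type T is unknown, its use is restricted: for example Vec<T> cannot be written, and T has to be put behind a Box instead, giving Vec<Box<T>>.

Fat pointers

A fat pointer is a reference or pointer to a DST.
A slice is a DST, so a pointer to a slice is a fat pointer.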

assert_eq!(size_of::<u32>(), 4);
assert_eq!(size_of::<usize>(), 8);
assert_eq!(size_of::<[u32; 2]>(), 8);
assert_eq!(size_of::<&u32>(), 8);
assert_eq!(size_of::<&[u32; 2]>(), 8);
assert_eq!(size_of::<&[u32]>(), 16);
assert_eq!(size_of::<&mut[u32]>(), 16);

As shown, &[u32] is twice the size, because it also stores the length, as sketched below.

struct SliceRef { 
    ptr: *const u32, 
    len: usize,
}

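A variable cannot be bound directly to a DST, because the compiler cannot work out how much memory to allocate. For example, &str is used all the time, while a bare str is rarely seen.

Note in particular that raw pointers to slices are fat as well. Considering that C++ allows destroying a POD array directly with delete, this is partly consistent with C++.

assert_eq!(size_of::<*mut [u32]>(), 16);
assert_eq!(size_of::<*const [u32]>(), 16);

Besides slices, trait objects are also DSTs; a pointer to one additionally carries a vptr, which is discussed later.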

struct TraitObjectRef {
    data_ptr: *const (),
    vptr: *const (),
}
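The sizes can be checked portably by measuring pointers in machine words instead of hard-coding 8 and 16. A small sketch, where pointer_widths is a hypothetical helper and Debug is used only as a convenient object-safe trait:

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Returns pointer sizes measured in machine words (usize).
fn pointer_widths() -> (usize, usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&u32>() / word,           // thin: address only
        size_of::<&[u32]>() / word,         // fat: address + length
        size_of::<&dyn Debug>() / word,     // fat: address + vtable pointer
        size_of::<Box<dyn Debug>>() / word, // owning pointers to DSTs are fat too
    )
}

fn main() {
    println!("{:?}", pointer_widths());
}
```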

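Rust's fat pointers and C++'s thin pointers

In terms of memory layout:

Rust puts the vtable pointer in the pointer itself.
The access pattern has two levels: vtable_ptr -> vtable.

struct FatPtr<T> {
    pub vtable_ptr: *const VTable,
    pub t: *mut T,
}

C++ puts the vtable pointer in the object instance.
The access pattern has three levels: ptr -> vtable_ptr -> vtable.

template <typename T>
struct DynamicStruct {
    const VTable * vtable;
    T* t;
};

In terms of memory overhead:

Pointers are copied and stored in many places more readily than objects, so Rust's fat pointers cost more memory per pointer.
On the other hand, C++ keeps the vtable pointer farther from the accessor: as soon as a class defines a virtual function, every instance has to store a vtable pointer (I am not sure whether compilers optimize this). Rust carries the vtable pointer only where dynamic dispatch is actually needed.

In terms of functionality:

The C++ approach supports multiple inheritance,
because an object can conveniently hold several vtable pointers.

struct B1 { virtual void f1(); long b1; };  
struct B2 { virtual void f2(); long b2; };  
struct D : B1, B2 { void f1() override; void f2() override; long d; };

The corresponding object layout:

D object (40 B on a typical 64-bit ABI)
+--------------------+
| vptr for B1        |  → points to the D-in-B1 vtable
+--------------------+
| B1::b1             |
+--------------------+
| vptr for B2        |  → points to the D-in-B2 vtable
+--------------------+
| B2::b2             |
+--------------------+
| D::d               |
+--------------------+

With the Rust approach, a struct can remain POD.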

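Strings in Rust

Rust strings are a good way to compare arrays and slices. Like C++, Rust has two kinds of strings:

str
str is Rust's primitive string type. Because it is a DST, it usually appears as &str.
String
A String can change its length and contents at any time.

str

The methods of &str are implemented in impl str in str.rs. .as_ptr() converts it to a *const u8 pointer, and .len() returns its length in bytes.

The type of a string literal is &'static str.

&str and &[u8] can be converted into each other: as_bytes() goes from &str to &[u8], and std::str::from_utf8() goes back, failing if the bytes are not valid UTF-8.

String

Omitted.

struct

tuple struct

Int is an alias, while Integer is a new type. This form is called a tuple struct.

type Int = i32;
struct Integer(u32);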

ZST

In C++ we come across types like this:

struct ZST {
};

sizeof(ZST) // 1

assert(&ZST() != &ZST())

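In Rust:

struct A;
println!("{}", std::mem::size_of::<A>()); // 0

What, then, is the address of a ZST instance in Rust?

The never type and !

Expressions such as return, break, continue, panic!() and an endless loop do not produce a value; put differently, their type is never, written !, which corresponds to the bottom type in type theory. The never type coerces to any other type, which is why code like the following can appear in a match arm without a type error:

None => panic!(),

The diverging block in the following function likewise produces no value, so it has the never type (naming ! in a let binding currently requires the nightly never_type feature):

fn foo() -> u32
{
    let x: ! = {
        return 123;
    };
}

Type inference

The turbofish helps inference; some examples follow.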

x.parse::<i32>()
[
    AdminCmdType::CompactLog,
    AdminCmdType::ComputeHash,
    AdminCmdType::VerifyHash,
]
.iter()
.cloned()
.collect::<std::collections::HashSet<AdminCmdType>>()
// can also use std::collections::HashSet<_>

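trait

A trait is similar to a typeclass in Haskell.

Traits and ad hoc polymorphism

See the notes.

Associated types

An associated type is a way of attaching a type placeholder (the type Output below) to a trait.

Suppose a type implements trait Add: it then accepts a right-hand operand of type RHS and returns a result of type Output.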

pub trait Add<RHS, Output> {
    fn my_add(self, rhs: RHS) -> Output;
}
impl Add<u32, u32> for u32 {
    fn my_add(self, rhs: u32) -> u32 {
        self + rhs
    }
}

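However, a type may accept several RHS types for Add (for example, String could accept both String and &str), while the Output type for each of them is fixed. So Output can be chosen by the implementor instead of the user, by declaring an associated type type Output.

pub trait Add<RHS = Self> {
    type Output;
    fn add(self, rhs: RHS) -> Self::Output;
}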
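As an illustration with the real std::ops::Add, one left-hand type can implement Add for two right-hand types, each impl fixing its own Output. Meters is a hypothetical type invented for this sketch.

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

// RHS defaults to Self: Meters + Meters.
impl Add for Meters {
    type Output = Meters;
    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

// A second impl with a different RHS: Meters + f64.
impl Add<f64> for Meters {
    type Output = Meters;
    fn add(self, rhs: f64) -> Meters {
        Meters(self.0 + rhs)
    }
}

fn main() {
    println!("{:?}", Meters(1.0) + Meters(2.0));
    println!("{:?}", Meters(1.0) + 0.5);
}
```

The caller chooses RHS through the operand type, while Output is decided once by each impl.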

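Trait inheritance

Structs cannot inherit, but traits can.

This raises a question about generic bounds: is the impl below over the intersection or the union of the two Fathers?

trait Son: Father1 + Father2 {
}
impl <T: Father1 + Father2> Son for T {}

Here Father1 and Father2 are intersected: Son is implemented for every T that implements both Father1 and Father2.

Orphan rule

Note that the orphan rule concerns trait impls. An inherent impl and its struct cannot live in different crates at all; attempting it fails with "cannot define inherent impl for a type outside of the crate where the type is defined".

Generic bounds

Suppose we implement a sum function that only accepts a generic parameter T implementing trait Add. It can be written as:

fn sum<T: Add<T, Output=T>>

Because the RHS parameter of Add defaults to Self, this can be shortened to:

fn sum<T: Add<Output=T>>

When there are many bounds, they can be pulled out and written in a where clause.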
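A sketch of the where-clause form. The body and the extra Copy bound are assumptions made so the example is complete; the original text only shows the start of the signature.

```rust
use std::ops::Add;

// Sums a slice; returns None for an empty slice because no zero value is assumed.
fn sum<T>(items: &[T]) -> Option<T>
where
    T: Add<Output = T> + Copy,
{
    let (first, rest) = items.split_first()?;
    Some(rest.iter().fold(*first, |acc, x| acc + *x))
}

fn main() {
    println!("{:?}", sum(&[1, 2, 3]));
    println!("{:?}", sum(&[1.5, 2.5]));
}
```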

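Static and dynamic dispatch

Static and dynamic dispatch are discussed with respect to traits.
Below is static dispatch: separate code is generated for fly_static::<Pig> and fly_static::<Duck>. This is similar to template instantiation in C++.

fn fly_static<T: Fly>(s: T) -> bool {
    // ...
}

Below is dynamic dispatch: the method for the concrete type behind fly_dyn's &dyn Fly argument is looked up at run time, so passing &Duck or &Pig behaves differently. This is similar to dynamic binding in C++ and has a run-time cost.

fn fly_dyn(s: &dyn Fly) -> bool {
    // ...
}

So what is &dyn Fly (written &Fly before the 2018 edition)? It is a trait object, which is discussed below.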
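A complete version of the two functions might look like the sketch below. The Fly trait's single method, the Duck and Pig bodies and count_flyers are assumptions filled in for the illustration.

```rust
trait Fly {
    fn fly(&self) -> bool;
}

struct Duck;
struct Pig;

impl Fly for Duck {
    fn fly(&self) -> bool {
        true
    }
}

impl Fly for Pig {
    fn fly(&self) -> bool {
        false
    }
}

// Static dispatch: one copy of this function is generated per concrete T.
fn fly_static<T: Fly>(s: T) -> bool {
    s.fly()
}

// Dynamic dispatch: one copy; the method is found through the vtable at run time.
fn fly_dyn(s: &dyn Fly) -> bool {
    s.fly()
}

// Trait objects allow mixing concrete types in one collection.
fn count_flyers(animals: &[Box<dyn Fly>]) -> usize {
    animals.iter().filter(|a| a.fly()).count()
}

fn main() {
    println!("{} {}", fly_static(Duck), fly_dyn(&Pig));
}
```

The heterogeneous Vec<Box<dyn Fly>> accepted by count_flyers is something static dispatch alone cannot express.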

Common questions about traits

Calling a trait's default implementation from an impl

Implementing a dynamic cast

In short, PSElementWriteBatch implements trait ElementWriteBatch. A PSElementWriteBatch knows how to merge another PSElementWriteBatch, but it can hardly know how to merge a Box<dyn ElementWriteBatch>, unless the trait provides an as_any() so that merge can manually downcast the Any to PSElementWriteBatch.
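The as_any() pattern can be sketched with std::any::Any. The trait and types below (WriteBatch, PsBatch, OtherBatch) are simplified stand-ins invented for this example, not the actual ElementWriteBatch API.

```rust
use std::any::Any;

// Simplified stand-in for the write-batch trait described above.
trait WriteBatch {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
    // Returns false if `other` is not a batch type this one can merge.
    fn merge(&mut self, other: &dyn WriteBatch) -> bool;
}

struct PsBatch {
    keys: Vec<String>,
}

struct OtherBatch;

impl WriteBatch for PsBatch {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.keys.len()
    }
    fn merge(&mut self, other: &dyn WriteBatch) -> bool {
        // Manual downcast: only another PsBatch can be merged.
        match other.as_any().downcast_ref::<PsBatch>() {
            Some(ps) => {
                self.keys.extend(ps.keys.iter().cloned());
                true
            }
            None => false,
        }
    }
}

impl WriteBatch for OtherBatch {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        0
    }
    fn merge(&mut self, _other: &dyn WriteBatch) -> bool {
        false
    }
}

fn main() {
    let mut a = PsBatch { keys: vec!["k1".to_string()] };
    let b: Box<dyn WriteBatch> = Box::new(PsBatch { keys: vec!["k2".to_string()] });
    println!("{} {}", a.merge(&*b), a.len());
}
```

The downcast is checked at run time, so a mismatched batch type is reported through the return value rather than causing undefined behavior.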

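Traits as existential types

An existential type cannot be instantiated directly; each of its instances is an instance of some concrete type.
For an existential type, the compiler does not know its concrete behavior or size at compile time. Rust currently handles existential types with trait objects and impl Trait.

trait object

As shown below, the &dyn Fly parameter of fly_dyn is a trait object.

fn fly_dyn(s: &dyn Fly) -> bool {
    // ...
}

A TraitObject can be viewed as having the following structure: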

#[repr(C)]
#[derive(Copy, Clone)]
#[allow(missing_debug_implementations)]
pub struct TraitObject {
    pub data: *mut (),
    pub vtable: *mut (),
}

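The vtable contains the object's destructor, size, alignment and methods (that is, the virtual function pointers).

Rust has a common error, E0038, related to the object safety of traits. Only an object-safe trait can be used as a trait object; otherwise it can only be used as an ordinary trait.
Concretely:

The trait must not require Self: Sized.
Every method of the trait must be object safe, meaning that either the method is bounded by Self: Sized, or all of the following hold:
1. It has no generic type parameters.
2. Its first parameter (the receiver) is of type Self or a type that dereferences to Self.
3. Self does not appear anywhere other than the first parameter, including the return type.

The documentation for E0038 lists several cases that violate these rules.
For example, if T defines a method get_value_cf with a type parameter F (the type of the callback cb), T cannot be used as a trait object, because the vtable of the trait object cannot enumerate the method for every possible type F.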

pub trait T {
    fn get_value_cf<F>(&self, cf: &str, key: &[u8], cb: F)
    where
        F: FnOnce(Result<Option<&[u8]>, String>);
}

// Error like the following
// for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically

For now cb has to be changed to a dynamically dispatched form, for example one of:

fn get_value_cf(&self, cf: &str, key: &[u8], cb: Box<dyn FnOnce(Result<Option<&[u8]>, String>)>);
// Calling a dyn FnOnce needs ownership, so a borrowed callback is normally typed as FnMut:
fn get_value_cf(&self, cf: &str, key: &[u8], cb: &mut dyn FnMut(Result<Option<&[u8]>, String>));

This raises another problem. A value s of a type S: T can be passed as &s to a parameter of type &dyn T. The reverse does not work: a &dyn T or Box<dyn T> cannot be passed to a parameter of type impl T.

trait T {}

struct S {}

impl T for S {}

fn call_x(t: impl T) {}

fn call_y(t: &dyn T) {}

fn main() {
    let s = S {};
    let t : &dyn T = &s;
    // call_x(t); // ERROR: `&dyn T` does not implement `T`
    call_y(&s);
}

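In practice we also ran into the error "Cast requires that variable is borrowed for 'static". In the demo, removing the + '_ from impl RaftStoreProxy and impl RaftStoreProxyEngineTrait for RaftStoreProxyEngine reproduces it; adding + '_ fixes it. The reason is that in positions such as Box<dyn Trait>, dyn Trait defaults to dyn Trait + 'static, so another lifetime has to be specified explicitly.

impl Trait

Before Rust 1.75, a trait method could not return impl Trait, so the code below failed to compile with the error shown and a trait object had to be used instead. Since 1.75, return-position impl Trait in traits is stable.

pub trait Vehicle {
    fn next(&self) -> impl Vehicle;
}

// `impl Trait` not allowed outside of function and inherent method return types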

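Standard library

Linking

By default Rust links the standard library when compiling; adding the #![no_std] attribute turns this off (the crate then links only core).

Strings

Conversions between string-related types

These include str, String, &[u8] and Vec<u8>.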
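The conversions between these four types can be summarized in one sketch. The function name conversions is made up; every method used is part of the standard library.

```rust
// Walks one string through str, String, &[u8] and Vec<u8>.
fn conversions(input: &str) -> (String, Vec<u8>, String) {
    let owned: String = input.to_owned(); // &str -> String (allocates)
    let borrowed: &str = owned.as_str(); // String -> &str (no copy)
    let bytes: &[u8] = borrowed.as_bytes(); // &str -> &[u8] (no copy)
    let vec: Vec<u8> = bytes.to_vec(); // &[u8] -> Vec<u8> (allocates)

    // &[u8] -> &str is fallible: the bytes must be valid UTF-8.
    let checked: &str = std::str::from_utf8(&vec).expect("valid UTF-8");
    let from_slice = checked.to_owned();

    // Vec<u8> -> String is fallible too; String -> Vec<u8> is into_bytes().
    let from_vec = String::from_utf8(vec.clone()).expect("valid UTF-8");

    (from_slice, owned.into_bytes(), from_vec)
}

fn main() {
    println!("{:?}", conversions("hi"));
}
```

The directions that go from bytes to text return a Result, because arbitrary bytes need not be valid UTF-8.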

Macros

Importing and exporting macros

A macro has two kinds of scope, textual scope and path-based scope. "Path" has a specific definition here; think of something like crate::a::b or super::a::b.

use lazy_static::lazy_static; // Path-based import.

macro_rules! lazy_static { // Textual definition.
    (lazy) => {};
}

lazy_static!{lazy} // Textual lookup finds our macro first.
self::lazy_static!{} // Path-based lookup ignores our macro, finds imported one.

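After a macro is defined with macro_rules!, it enters textual scope until the enclosing scope ends, much like a variable defined with let. If it is defined several times, the older macro is shadowed.

As the code shows, after m is declared in mod.rs and followed by pub mod a, m can also be used inside a. This is textual scope: a also sits under the mod_macro scope. In other words, textual scope extends into child modules, even across files.

#[macro_use] exposes the macros of mod inner to the code outside it. #[macro_use] can even import specific macros, or all macros, from another crate, as shown below: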

#[macro_use(lazy_static)] // Or #[macro_use] to import all macros.
extern crate lazy_static;

lazy_static!{}
// self::lazy_static!{} // Error: lazy_static is not defined in `self`

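#[macro_use] on an extern crate only imports macros that were exported with #[macro_export]. #[macro_export] places the macro's declaration in the crate root, so it can be accessed as crate::macro_name.
In the example code, helped is defined in mod mod_macro but exported to the crate root, so it can be accessed as crate::helped.

Macro syntax

Basic definitions

According to the grammar, a MacroRule pairs a pattern to match with its expansion, and the matcher supports three kinds of delimiters.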

MacroRules :
   MacroRule ( ; MacroRule )* ;?

MacroRule :
   MacroMatcher => MacroTranscriber

MacroMatcher :
      ( MacroMatch* )
   | [ MacroMatch* ]
   | { MacroMatch* }

So code like the following can be written:

macro_rules! add {
    {$a:expr,$b:expr,$c:expr} => {
        $a+$b+$c
    };
    [$a:expr,$b:expr] => {
        $a+$b
    };
    ($a:expr) => {
        $a
    }
}

pub fn main() {
    println!("{}", add!{1, 2, 3});
    println!("{}", add![1, 2]);
    println!("{}", add!(1));
}

Repetition

The code below shows two ways to sum a list.
In the MacroTranscriber, the construct $(+$a)* puts a + in front of every element of the list.

macro_rules! add_list {
    ($($a:expr),*) => {
        0
        $(+$a)*
    }
}

macro_rules! add_list2 {
    ($a:expr) => {
        $a
    };
    ($a:expr,$($b:expr),+) => {
        $a
        $(+$b)*
    };
}

pub fn main() {
    println!("{}", add_list2!(1,2,3));
}

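TT munchers

A TT muncher is a construct like $($tail:tt)*, which can always capture whatever the macro has not processed yet. It allows a macro to call itself "recursively".
This gives a third way to sum a list.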

macro_rules! add_list3 {
    ($a:expr) => {
        $a
    };
    ($a:expr,$($tail:tt)*) => {
        $a+add_list3!($($tail)*)
    };
}
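The same munching technique applies beyond summation. As a further sketch, the hypothetical macro count_tts! below counts token trees by peeling off one tt per recursive call.

```rust
// Counts token trees: consume the head, recurse on the tail.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // A parenthesized group is a single token tree.
    println!("{}", count_tts!(a b c));
    println!("{}", count_tts!((a b) c));
}
```

Each recursion level adds to the macro expansion depth, so very long inputs can hit the compiler's recursion limit.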

More on MacroMatch

MacroMatch :
      Token except $ and delimiters
   | MacroMatcher
   | $ ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ ) : MacroFragSpec
   | $ ( MacroMatch+ ) MacroRepSep? MacroRepOp

What can MacroRepSep be? It is defined as:

MacroRepSep :
   Token except delimiters and MacroRepOp

The delimiters are the three kinds of brackets, so a Token can be almost anything. But can we write ($($a:expr)>>*) or ($($a:expr)%*)? No; the reason is covered in "Follow-set Ambiguity Restrictions".

metavariable

item
Constructs such as mod, extern crate, fn, struct, enum, union, trait and macro are all items.
expr
block
pat (Pattern)
path
Something like crate::a::b.
tt (TokenTree)

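unsafe

unsafe operations

Which operations in Rust need to be wrapped in unsafe?

Dereferencing a *mut T (or *const T).
Note that taking the reference or creating the raw pointer itself is safe.
Accessing a global static mut object.
Reading the fields of a union.

These correspond to two of Rust's mechanisms: ownership (which rules out raw pointers) and concurrency safety.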
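The three operations can be shown side by side. This is a minimal sketch; IntOrFloat, COUNTER, raw_pointer_and_union and bump are hypothetical names.

```rust
union IntOrFloat {
    i: u32,
    f: f32,
}

static mut COUNTER: u32 = 0;

fn raw_pointer_and_union() -> (i32, u32) {
    let mut value = 41;
    let p: *mut i32 = &mut value; // creating a raw pointer is safe
    unsafe { *p += 1 }; // dereferencing it is not

    let u = IntOrFloat { f: 1.0 }; // initializing a union is safe
    let bits = unsafe { u.i }; // reading a field is not: it reinterprets the bytes

    (value, bits)
}

fn bump() -> u32 {
    // A mutable static can be raced on by several threads,
    // so every access must be wrapped in unsafe.
    unsafe {
        COUNTER += 1;
        COUNTER
    }
}

fn main() {
    println!("{:?} {}", raw_pointer_and_union(), bump());
}
```

0x3f80_0000 is the IEEE 754 bit pattern of 1.0f32, which is what reading u.i yields here.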

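FFI

Common errors

Pure virtual function called. Usually the object was destroyed too early, so its vtable pointer is no longer valid. It often alternates with invalid memory reference.

Fat pointers and FFI

Rust's fat pointers include the following kinds; once passed through an FFI interface as a plain pointer, everything except the address is lost:

Slices
A slice carries length information, but only the start address survives the FFI boundary; the length is lost.
trait objects
Covered separately below.

trait objects and FFI

Of the fat pointers that meet FFI, trait objects are the most painful.
Consider the following scenario. RaftStoreProxyPtr sits on the FFI boundary and passes the context RaftStoreProxy to the C++ side. The C++ side holds a RaftStoreProxyFFIHelper object and calls into Rust through its fn_handle_get_proxy_status interface, passing a RaftStoreProxyPtr as context. On receiving the call, the Rust side converts the RaftStoreProxyPtr from a pointer back to a RaftStoreProxy and calls its methods.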

pub trait RaftStoreProxyFFI: Sync {
...
}

struct RaftStoreProxy {
...
}

impl RaftStoreProxyFFI for RaftStoreProxy {
...
}

#[repr(C)]
#[derive(Debug)]
pub struct RaftStoreProxyPtr {
    pub inner: *const ::std::os::raw::c_void,
}

impl RaftStoreProxyPtr {
    pub unsafe fn as_ref(&self) -> &RaftStoreProxy {
        &*(self.inner as *const RaftStoreProxy)
    }
    ...
}

#[repr(C)]
#[derive(Debug)]
pub struct RaftStoreProxyFFIHelper {
    pub proxy_ptr: RaftStoreProxyPtr,
    pub fn_handle_get_proxy_status: ::std::option::Option<
        unsafe extern "C" fn(
            arg1: RaftStoreProxyPtr,
        ) -> RaftProxyStatus,
    >,
    ...
}

impl RaftStoreProxyFFIHelper {
    // forward fn_handle_get_proxy_status to ffi_handle_get_proxy_status
}

pub extern "C" fn ffi_handle_get_proxy_status(proxy_ptr: RaftStoreProxyPtr) -> RaftProxyStatus {
    unsafe {
        let r = proxy_ptr.as_ref().status().load(Ordering::SeqCst);
        std::mem::transmute(r)
    }
}

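Now we want to call methods of RaftStoreProxyFFI through RaftStoreProxyPtr and have RaftStoreProxyFFI dispatch dynamically to RaftStoreProxy. The goal is decoupling: RaftStoreProxy involves many implementation details, and we want a cleaner FFI layer. But we quickly hit problems:

RaftStoreProxyPtr cannot be dereferenced to RaftStoreProxyFFI.
Evidently, this is because a *const c_void cannot be converted into a trait object.

error[E0277]: the trait bound `c_void: RaftStoreProxyFFI` is not satisfied
the trait `RaftStoreProxyFFI` is implemented for `RaftStoreProxy`
required for the cast from `c_void` to the object type `dyn RaftStoreProxyFFI`

RaftStoreProxyFFI cannot be implemented directly for RaftStoreProxyPtr.
That would amount to implementing a trait for *const c_void. Compiler errors aside, the pointer alone gives us no context to forward calls to.
One could try defining RaftStoreProxyPtr directly as a TraitObject, but that is not FFI-safe. There are some hacky options, such as defining an FFI-safe object that mirrors the trait object.

The final approach is to avoid trait objects at the FFI boundary and use a simple object such as RaftStoreProxy, which in turn holds a trait object, RaftStoreProxyEngineTrait.

Exceptions

C++ exceptions cannot cross the language boundary; attempting it produces a SIGABRT.
In v1.cpp we have: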

#include <exception>
#include <stdexcept>

extern "C" {
    void c_func() {
        throw std::runtime_error("error!!!");
    }
}

In Rust's main.rs we have:

#[link(name = "v1", kind = "dylib")]
extern {
    fn c_func();
}

fn main() {
    unsafe {
        c_func();
    }
}

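Testing shows that a C++ exception produces the following error at the language boundary:

fatal runtime error: Rust cannot catch foreign exceptions

Layout

In FFI it is important to keep struct layouts identical on the C++ and Rust sides. Our practice is to use a code generation tool rather than managing them by hand.
In the code in question no code generation tool was used, and as a result SSTReaderPtr did not match its declaration in ProxyFFI.h: the struct used on the C++ side had one more field than the Rust one. What we observed was that the code ran normally on x86 but failed on arm: the arguments were correct when passed from C++ but wrong when parsed on the Rust side. A possible reason is that the x86 build adds some padding, which masked the error.

On different architectures c_char may be defined as i8 or as u8, which is an easy trap.

FFI and libraries

Consider a Rust dynamic library linked into a C++ program, where both may depend on the same library in a "diamond dependency". The usual options are:

The Rust dynamic library renames its own dependency, or bundles it statically with local symbols.
The Rust dynamic library loads these libraries dynamically.

For libraries such as jemalloc, however, only the second option works.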

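Testing

cargo test

Compared with the assortment of test libraries in C++, Cargo integrates testing directly as cargo test. Tests generally fall into two kinds:

Unit tests
Usually a #[cfg(test)] mod inside some mod.
Integration tests
Usually a separate crate named tests.

Speaking of the test cfg, there is a pitfall. Consider integration tests with two crates, tests and raftstore. Neither #[cfg(test)] enabled in the tests crate nor the test cfg that cargo test sets is propagated to raftstore. If needed, it has to be passed through a custom testexport feature:

If raftstore needs to know it is under test, it defines a testexport feature in its own Cargo.toml.
The Cargo.toml of tests enables raftstore/testexport.

For raftstore's own unit tests testexport is not needed, which is why we often see code like:

#[cfg(any(test, feature = "testexport"))]

Which tests are run is determined by:

package selection

--package tests only the given package, while --workspace runs the tests of every package in the workspace.
Without any option, selection is based on --manifest-path; if that is not given either, the current working directory is used.
If the working directory is the root of a workspace, the tests of all default members, that is the projects listed under default-members in [workspace], are run. If none are listed, a virtual workspace runs the tests of all workspace members, while a non-virtual one runs only the root crate's tests. This is somewhat counterintuitive; arguably a virtual workspace should run none. For instance, if I turn a workspace into a virtual one some day, the existing cargo test script may suddenly run many more tests.

target selection

If no target selection is specified, TODO

cargo test output

test_snap_append_restart-0 2022/11/15 16:02:38.526 thread.rs:396: [DEBG] tid 42023 thread name is region-worker::test_snap_append_restart-0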

Demo

A thread-safe doubly linked list

Reference

Rust编程之道 (The Tao of Rust) by 张汉东
https://course.rs/
Rust 语言圣经 (Rust Course)
https://learnku.com/docs/rust-async-std/translation-notes/7132
Learning async Rust
https://huangjj27.github.io/async-book/01_getting_started/03_state_of_async_rust.html
Another async tutorial
https://huangjj27.github.io/async-book/02_execution/02_future.html
An explanation of how Future is implemented
https://kangxiaoning.github.io/post/2021/04/writing-an-os-in-rust-01/
A tutorial on writing an operating system in Rust; this part covers removing the standard library
https://www.cnblogs.com/praying/p/14179397.html
Implementing futures, independent of async, including various combinators
https://cloud.tencent.com/developer/article/1628311
An explanation of Pin
https://folyd.com/blog/rust-pin-unpin/
An explanation of Pin
https://doc.rust-lang.org/std/pin/
Official documentation for pin
https://www.zhihu.com/question/470049587
An explanation of AsRef/Borrow/Deref
https://dengjianping.github.io/2019/03/05/%E8%B0%88%E4%B8%80%E8%B0%88Fn,-FnMut,-FnOnce%E7%9A%84%E5%8C%BA%E5%88%AB.html
The differences between Fn, FnOnce and FnMut
https://zhuanlan.zhihu.com/p/341815515
A discussion of closures
https://medium.com/swlh/understanding-closures-in-rust-21f286ed1759
An explanation of closures
https://stackoverflow.com/questions/59593989/what-will-happen-in-rust-if-create-mutable-variable-and-mutable-reference-and-ch
Can the owner and a &mut modify the value at the same time?
https://doc.rust-lang.org/cargo/reference/features.html
Documentation on features
https://danielkeep.github.io/tlborm/book/pat-incremental-tt-munchers.html
TT munchers
