My Experience of Building a Hybrid Rust/C++ Project

Since April 2025, I have been actively contributing to a new Rust–C++ project. Through this work, I have gained many valuable insights. Although I cannot disclose most project details, there are numerous technical challenges worth discussing.

One of the most notable aspects of this project is that it has been developed alongside the rapid evolution of AI agents, which led us to encounter many pitfalls when practicing vibe coding.

About Vibe Coding: The benefits and the pitfalls

Pitfalls of Vibe Coding

In early 2025, at the initial stage of our project, one of our core contributors quickly prototyped a demo using Cursor, covering multiple modules such as the read scheduler, index writer, and meta service.

Traces of this early implementation can still be found in the following pull requests:

Prompt is the code itself

We can treat prompt as code. This is not a new idea in the industry, but many teams apply it unevenly. A common practice is to store prompts as markdown or templates under version control, so they can be reviewed, diffed, and rolled back like any other artifact. Some teams go further and build a “prompt registry” or config service to version prompts outside the codebase, and pair it with evaluation suites that act like unit tests for prompts (golden outputs, A/B runs, regression checks). Others embed prompts directly in application code as constants, which makes deployment easy but tends to hide intent and lose reviewability. The shared direction is clear: treat prompts as first-class assets with explicit structure, reviews, and tests.

I just wrote a PR, https://github.com/pingcap-inc/tici/pull/692/files, where the core file is prompts/0001-gc-cdc.md. That file is the prompt itself, and it is committed with the PR, so anyone can start a session by loading the same prompt. It becomes versioned, diffable, and reviewable like real code, and the team no longer depends on a hidden chat history. We can also divide the overall goal into several sub-goals and let AI agents implement them sequentially or in parallel, without having to restate the context every time.

I think this can effectively keep the AI agent focused on what it needs to do, so it generates better code with fewer resources.

Reviewers can write feedback directly in the Reviews section of the prompt doc, and I can pull that back into the next round of vibe. This workflow makes context length much less of a concern because the prompt is the canonical spec. A minimal shape looks like this:

# Goal

...

## Sub goal 1

In this sub goal, you need to ...

### Programming Style

### Musts

### Tests

## Reviews

Rust as the language of the “Vibe Coding Era”

As Rust becomes increasingly adopted in the “vibe coding era”, it does offer stronger guarantees against concurrency and memory errors. However, it is still too early to say that Rust is THE ONE.

Such an AI-agent-native language will likely consist of at least three distinct sub-languages:

  • One for expressing intent
    This is the most critical language, because developers need a more efficient way to understand what AI agents have actually done.
  • One for validating correctness
    This language corresponds to the intent language and is designed to describe test workflows more efficiently. Developers can fine-tune this part of the code to guide AI agents toward correct handling of corner cases.
  • One for concrete implementation
    This language is responsible for concrete implementation. Many AI agents, such as Codex, can already handle this layer well, as they rarely make trivial mistakes. Developers do not need to review this part of the code frequently, as other AI agents can handle the review instead.

Only with this separation can we truly balance readability, robustness, and long-term maintainability. Furthermore, many existing libraries could be rewritten to be more friendly to AI agents.

Use Skills

I introduced several SKILLs into our new project. For example, ManualTest enables AI agents to execute manual test cases automatically.

Implement tests

Hierarchy of tests

The problem

Because each TiDB component is maintained in a separate repository, breaking changes in one component require coordinated adaptations across multiple repositories. Unfortunately, such adaptations cannot be performed atomically and are often non-trivial. While compilation flags or configuration options can sometimes be used to temporarily disable new features, this strategy is not always applicable. In particular, interface changes such as FFI definitions may break compatibility immediately.

In our project, an end-to-end (e2e) test starts a full cluster and asserts it from a client’s perspective, which in our case means sending SQL queries to the database service. Some of these tests are included in our CI pipeline. However, CI-based e2e tests cannot reliably detect adaptation issues. This creates a classic catch-22: resolving an adaptation problem requires updating all related components, yet the e2e tests cannot pass while you are still fixing the first component. As a result, most e2e tests are deferred to what we call the “daily tests”.

Nevertheless, we still need a subset of e2e tests in the CI pipeline. Although these tests may occasionally produce false positives due to compilation or adaptation issues, they provide valuable systematic checks to ensure that a new commit in one component does not break existing rules or behaviors. Deferring such checks to daily tests would be disastrous, as it makes bugs significantly harder to triage. When issues accumulate over time, the project can easily fall into a “bug jail,” where fixing new problems becomes increasingly expensive.

It is also worth noting that integration tests cannot practically detect all logical bugs. In many cases, module owners write integration tests mainly to verify that their own modules work with others, while overlooking the impact their changes may have on the system as a whole. This issue becomes even more critical when AI agents are used to refactor our code, as we need safeguards to ensure that unexpected behavior does not compromise the foundation of the project.

During the development and PoC stage of our project, several critical issues occurred because the tests were not correctly implemented, including:

  • Module A uses the API of Module B in a wrong way. There is neither an integration test for Module A, nor is this scenario covered by the e2e tests.
  • Module C fails to verify a corner case, which is later caught by my embedded e2e test (introduced below). This kind of error is easy to overlook, because the code passes every test except one. However, that single test protects our system from an availability failure caused by a deadlock in Module C.
  • Another component changed its convention for constructing a field in an RPC request without informing us, which caused the system to malfunction at the SQL layer and made the issue difficult to investigate. This problem was also detected by my embedded e2e test.

The embedded e2e test

This idea is based on the observation that a component’s behavior is defined by how it communicates with other components, through RPC, FFI, shared memory, and similar mechanisms.

Therefore, mocking these communications in integration tests provides the following benefits:

  • We don’t need to start a full cluster, so we won’t face the adaptation problem.
  • If an adaptation issue occurs, it can be easily reproduced at this level. This not only simplifies the debugging process, but also increases our confidence in the code.
  • This test treats our program as a black box, which makes it easier to implement because we do not need to understand how each module is implemented. These tests are expected to remain stable unless the interfaces or communication frameworks change.

Tests as the Backbone of Vibe Coding

In a Vibe Coding workflow, tests become the primary communication channel between intention and code. Among all types of tests, the embedded end-to-end (e2e) tests play an increasingly important role.

Unlike unit tests, which specify local behavior, or integration tests, which usually verify a limited subsystem, my embedded e2e tests define system-level behavioral contracts. They describe what the system should do rather than how it should do it. This makes them naturally aligned with Test-Driven Development (TDD): they serve as executable specifications that drive the implementation.

Systematic choices

Thread or coroutine?

Benefits of using tokio:

  • Lower memory cost, so we can create more coroutines.
  • Context switches are faster because no syscall is involved.

Pitfalls of using tokio:

  • We cannot control the scheduling strategy of tokio’s runtime. For example, we cannot assign a priority to a specific task, nor can we limit the CPU quota of a particular class of tasks.
  • Switching to async code is often painful, as even the simplest function may become suspendable due to the use of tokio::sync locks.
  • It is hard to investigate deadlock / starvation problems.
  • It is hard to use itertools with async code. futures::stream can help, but it produces complex types.

Use separate Runtimes for different task pools?

A tokio Runtime can only be created outside the “async context” of another Runtime. So if we need tuned Runtimes, we have to create them in advance, which requires a lot of refactoring.

Propagate the panic outward

We must pay attention to panics inside the actor’s message loop: the handler, whether a thread or a coroutine, will only surface the panic when it is eventually joined, by which time the failure may have gone unnoticed for too long. What I recommend is to:

  • Employ the panic_hook to capture the exact scene where things go wrong.

    panic::set_hook(Box::new(|info| {
        eprintln!("Task panicked: {}", info);
        println!("Task panicked: {}", info);
    }));
  • Eliminate unwraps and expects

    #![cfg_attr(not(test), deny(clippy::unwrap_used))]
    #![cfg_attr(not(test), deny(clippy::expect_used))]

Shared Memory or Actor model?

If we use the coroutine runtime, we may need to decide how to handle race conditions.

Why are “deadlocks” so hard to diagnose when using coroutines?

  1. There is no wait-for graph, neither in the coroutine runtime nor in the OS
    await does not block a thread, so gdb/strace/perf show nothing useful.
    Meanwhile, these “deadlocks” are hard to detect because there is no CPU usage, no blocked thread, and the program sits in a “vegetative state”.
    Coroutine frameworks like tokio provide some observability (o11y) tools; however, they are hard to use and add performance overhead.
  2. No actual “deadlock”
    These stalls are mostly “waiting for a train at a bus stop” errors. For example, we may read from a channel that will never be written to, which is an easy mistake to make when we bail on an error without calling .send() first.
    So we recommend sending a Result<T>, and implementing a Drop guard that automatically sends Err(Error::DropWithoutReport) as a last-minute remedy.
  3. No actual “stack”
    Coroutines don’t carry a real stack. When they hit an await they yield a continuation, and that continuation may be resumed on the same or a different thread.

tokio::RwLock or std::sync::Mutex?

There is a common belief that we must always use tokio locks in asynchronous code. However, according to the tokio documentation, it is fine, and often better, to use synchronous locks such as std::sync::Mutex or parking_lot::Mutex, as long as the lock is never held across an .await point.

I’d like to refer to these cases as “atomic access structures”, because they all follow this pattern:

struct Wrapped {
    inner: Mutex<String>,
}

impl Wrapped {
    pub fn change_inner(&self, s: String) {
        // Deref the guard to overwrite the protected value; the lock
        // is released as soon as the guard goes out of scope.
        *self.inner.lock().expect("lock poisoned") = s;
    }
}

The key point of this code is to avoid exposing the lock itself: external callers must not be able to access it, and the lock must be released immediately after the protected value is mutated. The underlying rationale is that a coroutine must never “sleep” while holding the lock; this guarantees that no deadlocks will occur, because:

  • If a coroutine holds the lock, it will not “sleep”, because the code change_inner is structured to avoid calling .await while the lock is held. Moreover, the executor thread will not sleep either, since it is not waiting on any condition.
  • If a coroutine does not hold the lock, it can eventually acquire it, because the current holder will release the lock promptly. And of course, the lock is released before any suspension point.

Implementing an “incomplete” actor mode

In the traditional actor model, each actor node encapsulates its own private data. However, this model is difficult to implement because:

  • To rebalance data across nodes, we must introduce new message types and corresponding handlers.
  • Inspecting the internal state of actor nodes is difficult.

So, as a simpler alternative, we can:

  • Use a concurrent hash map to store all data, with each actor node mutating a portion of the map.
  • Allow other components to read or inspect entries in the concurrent hash map. Such inspectors cannot mutate the entries, and their access must be atomic.

A preferred candidate for the hash map is DashMap. Although this structure frees us from requiring &mut self, most of its methods return a Ref or RefMut that holds a lock guard, so incorrect usage can lead to deadlocks. The following code shows a simple example.

#[test]
fn test_dashmap() {
    let map = DashMap::new();
    map.insert(1, 1);
    map.insert(2, 2);

    for entry in map.iter() {
        println!("{} -> {}", entry.key(), entry.value());
        // `entry` keeps a shard lock alive; inserting into the same
        // shard here may deadlock.
        map.insert(3, 3);
    }

    println!("test end");
}

There is a simple yet effective way to detect potential issues in our code: use #[tokio::test] instead of #[tokio::test(flavor = "multi_thread")]. With the single-threaded runtime, the program will deadlock immediately if a coroutine “sleeps” while holding a lock, which makes such bugs easy to reproduce.

The linking problem

FFI

TODO

How to support TLS?

TODO

Online config change

There are some ways to update configs without restarting the program:

  • For every actor, introduce a new UpdateConfig event, and handle it in the message loop.
  • Use arc_swap.

I don’t think the service itself should persist the updated configuration to the config file. Instead, this should be handled by the operator.