<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Calvin&#39;s Marbles</title>
  
  
  <link href="http://www.calvinneo.com/atom.xml" rel="self"/>
  
  <link href="http://www.calvinneo.com/"/>
  <updated>2026-02-24T13:36:23.908Z</updated>
  <id>http://www.calvinneo.com/</id>
  
  <author>
    <name>Calvin Neo</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>呼和浩特和大同游记</title>
    <link href="http://www.calvinneo.com/2026/02/24/meet-in-hohhot-datong/"/>
    <id>http://www.calvinneo.com/2026/02/24/meet-in-hohhot-datong/</id>
    <published>2026-02-23T17:20:33.000Z</published>
    <updated>2026-02-24T13:36:23.908Z</updated>
    
    <content type="html"><![CDATA[<p>过年期间去了呼和浩特和大同。这是第一次依照 codex 做的旅游规划来玩的行程。</p><a id="more"></a><h1 id="D0"><a href="#D0" class="headerlink" title="D0"></a>D0</h1><p>我的行程在 <a href="https://github.com/CalvinNeo/CalvinSchedule" target="_blank" rel="noopener">https://github.com/CalvinNeo/CalvinSchedule</a> 中。</p><h1 id="D1-初一"><a href="#D1-初一" class="headerlink" title="D1 初一"></a>D1 初一</h1><p>因为我对象年三十值班，所以我们是定了初一下午的飞机飞呼和浩特。之所以去呼和浩特，是因为去长春的飞机票买晚了，于是在从大连出发玩东北，和从呼和浩特出发玩呼和浩特和大同之间权衡，最后选择了后者。</p><p>事实证明，呼和浩特是一个挺好的地方，首先它去大同比太原还要方便，其次，它的机场到高铁站很方便，所以我们直接定一个高铁站旁边的酒店就非常舒服了。不过据说呼和浩特的新机场要启用了，因此后面它就没太原爽了。</p><p>因为我们值机比较靠前，所以下飞机是比较快的，然后就直往地铁站跑，赶上了倒数第二班地铁。事实证明坐地铁还是对的，因为后面我们点了个鼎鼎眼镜烧烤，发现等了好久都没有骑手接单，最后还是加钱人家才送的。</p><h1 id="D2-初二"><a href="#D2-初二" class="headerlink" title="D2 初二"></a>D2 初二</h1><p>今天起来还是很感冒，还是流鼻涕。</p><p>早上 8 点，我们的小团才接我。这也是因为我们住在呼市最东边的缘故，所以不用起那么早。车上除了我全是女的，所以我就坐在副驾驶了。这个副驾驶的头枕非常靠后，所以我睡起来很不方便。</p><p>第一站是辉腾草原上面的马场。这个马场除了骑马和卡丁车（实际上没人玩）啥都没有，所以尽管我们没有定骑马的套餐，但逼到最后还是只能骑马。那边报价 140，而淘宝只要 120，所以我们用淘宝价还价骑了马，然后又加了 120 骑了快马。<br>大概流程就是先上一匹小马，然后几匹马一起被牵着走到跑马场。到那边老板就问要不要骑快马，如果骑，就换到大马身上。然后老板就会同时拉着你的缰绳和自己的缰绳把马跑起来，嘴上还嘚嘚嘚嘚的。一开始马是小跑，这个时候你得用腿用力夹住马的肚子。不过后面马就会撒开来快跑了，这个时候，就是颠屁股了，感觉整个人都要散架，必须要拉着马鞍才会稳一点，然后就跑完了。</p><p>总的来讲感觉价格偏贵。体验上的话，快马还是可以的，不过时间也很短。当然，再长一点，我呛风也受不了了。然后这地方的旱厕也是堪称一绝，里面的屎冻成了棍子，我也是第一次见。导游说你就路边上尿吧，我寻思这么大的风，不吹到身上么。</p><p>然后就是去吃饭，开车到了右翼中旗县城里面，就开了那么几家店。我们选了一家，我对象点了个羊杂，点了个驼饼。我吃了下，感觉羊杂还挺香的，驼饼感觉就是普通的肉饼吧，然后查了下，说是骆驼肉。</p><p>吃完饭，就去火山了，这个也是要开一大段路的。那一块其实有很多火山，有的火山形状像草帽山，我把它叫土帽山。不过我们实际去逛的那个叫南炼丹炉，是一个平顶的火山。相比我们在冰岛看到的，这个火山确实有点火山的感觉。</p><p>从火山回来，就是长达三个小时的回程了。我们让司机帮我们送到宽巷子，这样就可以直接吃东西了。下车就能看到杨老大焙子店，结果没开门。旁边的星月排满了队，所以我们就去清和园了。这边买芝士奶饼好多都是几十个买的，我们只买了两个感觉很亏。然后我们就去珠萨拉买了几个酸奶糕，是冰着的，感觉挺好吃。</p><p>吃完就准备打车去泽成冰煮羊了，这个店在附近有两家。我决定去万象城那一家，因为那边等的时候至少可以逛一逛。事实上感觉也是对的，因为那天晚上我感冒贼难受，加上又很干，嗓子不舒服。</p><h1 id="D3-初三"><a href="#D3-初三" class="headerlink" title="D3 初三"></a>D3 
初三</h1><p>今天需要早起去大同。因为我们在火车站旁边住，所以提前一个小时起床绰绰有余了。内蒙古的火车会说什么内蒙古好羊肉，内蒙古羊肉好，非常洗脑。反正一路上也是比较困，睡睡醒醒的。<br>从一路上看，内蒙古和山西确实不太一样，首先，内蒙古真的有很多风车。然后，内蒙古感觉是草原的感觉，是有黄草的，但山西的话会更有黄土高原的感觉。</p><p>下了大同站，想要拉屎。我对象不在车上拉，结果下了车就要排队。其实始发站的火车，我一般是喜欢在火车上拉屎的，因为一般都很干净。拉完屎还有五分钟，就跑到直通车那里，准备去云冈石窟。上了车，开个大概有一个多小时才到，真的是非常非常堵了。</p><p>云冈石窟人非常多，主要体现在：首先，它景区检票口开了好几个，但居然都要排队。其次，是它的讲解也要排队，我们想想还是旅途随身听算了。反正进去之后，就绕了一圈，看了一下它的一个寺庙的前中后殿，然后过了桥，就是石窟的主体。</p><p>因为我之前在车上已经预习过了，所以我大概知道这些石窟大概分为几个系列，然后哪些是比较精华的。反正一开始进去的第一窟和第二窟都是比较惨烈的，因为它们的开口很小，所以基本人只能站在石阶上往里面伸着头看，我似乎就看到几个石柱子，感觉没啥意思。</p><p>第五窟和第六窟是一起的，实际上我们排最长的也是第六窟，因为第五窟当时已经维修了。保安说要排一个多小时，但实际排了四十几分钟就进去了。这个窟实际上是有前窟和后窟的，云冈石窟大部分也都是这个式样。并且相比于莫高窟或者龙门石窟，云冈石窟的颜色没怎么掉，所以看起来更鲜艳，据说这个可能是因为之前有人补过颜色吧。</p><p>琥珀知道我们到了大同，就想跟我们面面基。不过他显然对大同的人流没有预期，搞了半天说要吃紫泥，结果我们去古城紫泥拿了个号，发现要等 290+ 桌。不过我们倒是在古城的凯鸽也排了个号，只有 29。不过考虑到它一直不叫号，并且古城也没啥可以逛的（人特别多），所以我们就准备出去找点东西吃。经过一系列的电话咨询，我们发现紫泥在南环路的一家店还是可以取号的，所以就骑自行车过去。另外，我们一进古城就发现东门排着队，应该是上城墙准备去看花灯的，所以我们也果断不去看灯展了。</p><p>骑车过去的路也是比较奇葩，主要是出了古城城门需要往右拐，结果交警让我们走上面走，结果骑着骑着就骑到了永泰门广场上了。这个广场是被拦着的，自行车没法进，我们也下不去。结果我们只能闯过一片烂泥地强行进去，然后又在前面的栅栏的地方，把自行车搬了出去。骑到紫泥那里，是个直梯，上去之后果然全是人，反正是取了个号，然后打算先去找点东西垫垫肚子。先骑车去旁边的和笙财，点了个沙棘冰淇淋，这个冰淇淋是真好吃。然后发现旁边就是老柴削面的总店，但是我对象还是想吃喜晋道，所以就骑车去旁边的喜晋道，中途路过大同一中，感觉这个学校真的是富丽堂皇啊。进去喜晋道，他家是一个非常大的门面，装修非常漂亮，但是一进去，就说没位置了，并且都不发号了。于是就灰溜溜排老柴削面。再过去，就发现老柴削面外面都全是人了。</p><h1 id="D4-初四"><a href="#D4-初四" class="headerlink" title="D4 初四"></a>D4 初四</h1><p>今天继续是早起的一天，我们需要去恒山景区。从酒店到南站有一定距离，还是扫了个车骑了过去。然后，发现百度导航真的是糟糕，直通车明明是车站的另一端，它给标错了。总之赶到了直通车那，说恒山的车堵在高架上，可能要晚点。然后我就去上了个厕所，果然是上厕所定律了，上到一半，车就来了，结果我赶快跑了回去。在车上迷迷糊糊睡到恒山，到了那个游客中心附近又开始堵车了。总之到了下面，就要坐摆渡车，我也看到了之前在小红书上看到的排队标志，所幸今天的人没有昨天多，不过悬空寺方向还是挺长的，所以我们决定先去爬恒山了。</p><p>去恒山的车会经过悬空寺，在经过一个隧道之前，能看到悬空寺，并且这是一个很好的从上往下看的机位，建议不要错过，因为返程的时候，需要倒着头向后来看，不是很方便。</p><p>到了恒山脚下，我决定还是坐摆渡车上山，爬上去，然后索道下来。中间我对象买了个帽子，然后就又去排上山的队。今天可能景区被骂优化过了，所以我们在排摆渡车之前就检票了，没有等很久。一辆车走了之后，等了一会，然后突然开过来五辆车，都并排停了。我们上了第一辆车的最后一排，我对象不太乐意，因为她有点晕车。</p><p>上了摆渡车，很快就到真武庙了，从这里就往上爬。我带了个 insta360 
的相机，可以看到，全程都很简单，我觉得都没有南京紫金山难爬。中间有个庙，爬上去的石阶比较陡，感觉应该是最困难的部分了，但其实也就那样。然后从庙出来，就是冲顶阶段了，过了一个叫氵麦极门的地方之后，就看到小红书上堵人的地方，我们不出意料也开始堵了。感觉就是从这里开始，台阶变得高了点，所以有的人就要歇歇了，加上石阶又变窄了，所以交通就阻塞了，好在这一段不是很长，很快就到了一个亭子那。那是一个三岔路，往左是索道下山方向，往右是登顶方向。我们登顶，好在那里石阶就宽很多了，基本都可以跑起来，所以跑了大概十几分钟，我也登顶了。总共算上等的时间是六十几分钟吧。总的来说，恒山其实没啥意思，主打一个五岳打卡。然后确实很多人都在峰顶的那个地理标识那边排队拍照，说那边一个人只有 20s 的拍照时间。</p><p>下到缆车的地方，看到人也不是很多，就打算坐缆车，结果下来之后发现别有洞天，还是排了不少的。不过其实是可以忍受的，实际上也就等了不到二十分钟。中间前面的京爷因为不知道什么事情吵起来了，然后又打起来了，然后又吵起来了，反正我们就翻栏杆排到了他们前面，感觉也是个乐子。下山就直奔摆渡车站，去古城的在排队，但是去悬空寺的不需要排队，我们就直接上了。</p><h1 id="D5-初五"><a href="#D5-初五" class="headerlink" title="D5 初五"></a>D5 初五</h1><p>今天算是比较奇妙的一天。首先，昨天晚上睡觉前跟我对象讨论了，说今天早上去应县木塔吧。然后晚一点我就想把直通车的票买了，结果一看，9:30 的票只有一张了。因为木塔下午人多，并且，从木塔回古城更顺，所以我们又只能买早上的，就很尴尬。我还不信邪，又刷了几遍，中间还错买了去恒山的，总归坐直通车是不行了。灵机一动，决定去搜下有没有其他的巴士，结果 902 好像要两个多小时，不过中途发现，应县是有高铁站的，并且高铁站也是有巴士去木塔景区的，所以我就决定买火车票了。结果 12306 晚上关门，没法买票，于是我只能定了早上的闹钟。早上起来，发现买票是需要在系统里面排队的，我等了几十秒发现还没刷出来，就又小眯了会。几分钟又醒了，发现刷出了两个不挨着的座位，果断付了款，结果再一开，火车票就卖空了。</p><p>结果上了高铁，发现这趟高铁也不算特别挤啊，不知道在哪里票都卖光了。不过我们一站就到了应县西站，然后门口就是蓝色巴士，五块钱就到木塔了。然后发现这个巴士似乎也接路上的村民，并且送到木塔之后，也会继续往前开的。</p><p>因为我们是高铁来的，所以应该比大部队要早，到的时候，木塔没啥人，跟之前小红书上看到的完全不一样。我们进去之后，逛了两次一层，分别是从左边和右边走的，所以对里面的雕塑和浮雕看的都比较清楚。实际上我们去的时候，木塔二层上是有工作人员在跑动的，只是不知道他们在做什么。站在木塔下面，如果角度不对，其实是不太看的出来木塔已经歪了的。但是确实能看到它的暗层，也确实能看到之前被敲掉的泥墙被换成的窗户，也理解那边的本地人为什么要把泥墙砸掉。</p><p>准备离开木塔的时候，风突然大了起来，并且沙子也多了。身边的游客在感叹，这地方还真的是太干旱了，风沙多，结果后来才知道，这是多年没见过的沙尘暴，是从蒙古国吹过来的。总而言之，离开的时候，回望木塔已经是灰黄笼罩的一片了。</p><p>前面那条街还在表演，看了会，就往回走，结果那妖风是越来越大了。走到南门出来的那个巷子中，推开一个铁门，然后就走到一片很乡下的地方。前面几个人躲着不肯走，我们过去一看，好家伙，前面黄沙都飞起来了。</p><p>在排队的时候，我要上厕所，结果用百度找了几个厕所，一个都没有用，还是问的别人。后面立即下了高德，发现高德是准的。上完厕所，就直接去喜晋道了，当时已经快排到我们了。我们点了个肉末的面，一个番茄面，几个小菜，鸡爪，串和沙棘汁。小菜方面，我觉得山西的醋确实香，配上大蒜末，所以那个蒜泥肘花就很好吃了。鸡爪感觉就是茶叶蛋的味道。凉拌黄花算是山西特色了，和其他店里没啥区别。肉末面和昨天吃的老柴总店也是比较类似的，不过没有那么咸，更有点汤面的感觉。番茄面比之前在全季早饭吃的稍微好点，主要它不太是那种面是面，浇头是浇头的感觉，相对来说入味点，番茄泥也弄得比较干净。</p><p>因为这两天走得屁股都肿了，所以吃完走到东门打了个车，就直接回去了。</p><h1 id="D6-初六"><a href="#D6-初六" class="headerlink" title="D6 初六"></a>D6 
初六</h1><p>今天在大同逛一下博物馆，就可以回呼和浩特了。早上起来退房，打车去博物馆。我们没有抢到预约，所以只能买特展的票。一开始我以为可以在门口给大爷看完就退了，结果发现这只是一个预检票，后面还有一道需要刷二维码的闸机呢。</p><p>进去之后，逛了一会一楼，上了个厕所，出来然后一个女的就问我们要不要拼团讲解，说是 100 块钱，看我们可能要急着走，就说 80 块钱算了，结果我们就定了。中间想先去看一半穆夏的特展，结果人家说只能进去一次，所以就放弃了，又下来。不过好在那讲解员拉客能力比较强，很快就齐活了，她先照着大厅里面的那个壁画讲了一遍，后面我们才知道，这个壁画并不被公开展示。然后我们就又从恐龙那边开始逛了。不过不同的讲解员的路线略有区别，所以我们这次就没有讲那个编织壶。我问了下，那个恐龙为什么那么完整，讲解员说这个叫什么天镇恐龙，当时挖出来也不是完整的，说是考古学家拼的。反正没懂我意思，因为有些恐龙是会缺几个骨头的，但是这几个恐龙是完整的。</p><p>二楼是比较精华的部分，即北魏展厅，因为大同曾是北魏的都城。其实它有两个展厅，讲解员只带我们逛了第一个。第一个中又分为两个主题，第一个主题是一个墓葬里面的发掘物，包含了漆画屏风以及它的附属，包括挂它的架子、柱子、石墩子都展示出来了。然后还有对应的墓志铭，以及一些陪葬的陶俑。第二个主题是丝绸之路，包含一些陶俑、壁画以及玻璃制品。这里的玻璃制品还是很漂亮的，是非常剔透的蓝色。</p><p>三楼是辽金和明清展厅，大同是辽金的陪都。然后，看到了一个非常大的鸱吻，说是从某个大殿上拿下来的。说鸱吻是龙和鲸鱼的后代，它长得就很像鲸鱼的尾巴。</p><p>讲解团解散之后，就去逛了下穆夏的特展。这个人是画版画的，不过眼睛画得很传神。另外，他也会画油画。</p><h1 id="D7-初七"><a href="#D7-初七" class="headerlink" title="D7 初七"></a>D7 初七</h1><p>今天早上打车去了大召寺。这个寺是藏传佛教的，我不太懂，感觉没啥意思。有一个乃琼庙，里面都是骷髅头，很吓人。</p><p>从大召寺出来，就是塞上老街，我们又去吃了泽成冰煮羊，吃完，就打车去内蒙古博物馆。这个车是真离谱，调个头就花了十几分钟，堵得要死，结果我们等了有 20min 才上车。</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;过年期间去了呼和浩特和大同。这是首个依照 codex 做的旅游规划玩的活动。&lt;/p&gt;</summary>
    
    
    
    
    <category term="游记" scheme="http://www.calvinneo.com/tags/游记/"/>
    
  </entry>
  
  <entry>
    <title>Vibe 一个桌游模拟器</title>
    <link href="http://www.calvinneo.com/2026/01/30/vibe-open-board-game/"/>
    <id>http://www.calvinneo.com/2026/01/30/vibe-open-board-game/</id>
    <published>2026-01-30T15:09:06.000Z</published>
    <updated>2026-02-06T12:44:47.458Z</updated>
    
    <content type="html"><![CDATA[<p>作为一个桌游爱好者，我打算用 Codex 去 Vibe 一个桌游模拟器，这样我可以尝试自定义规则和 Bot 强度。这篇文章我会介绍我 Vibe 的经验。</p><p>我的项目是 <a href="https://github.com/CalvinNeo/OpenBoardGame" target="_blank" rel="noopener">https://github.com/CalvinNeo/OpenBoardGame</a>。</p><a id="more"></a><h1 id="D0"><a href="#D0" class="headerlink" title="D0"></a>D0</h1><p>作为 Demo 实现了一个掼蛋游戏。</p><p>发现问题：</p><ul><li>AI 对规则理解非常不正确，例如缺少对三带二、同花顺炸弹的支持，并且也不能正确限制顺子的长度和一手牌数量的上限。</li><li>AI 对空间感不熟悉，一些提示文字和牌重合。</li></ul><h1 id="Jan28-Jan29"><a href="#Jan28-Jan29" class="headerlink" title="Jan 28 - Jan 29"></a>Jan 28 - Jan 29</h1><p>在这 2 天中，我大概用了 45 刀左右的额度。完成了 Cabo、骷髅牌、你画我猜、璀璨宝石四个游戏逻辑和 Bot 的开发。并且，我还支持了断开重连、房间管理等机制。我还优化了 UI 的美观度和便捷度。</p><p>AI 会理解错一些点。例如 <a href="https://github.com/CalvinNeo/OpenBoardGame/commit/ab26a9c00cd5d9720b39bf3e248b672881cb52ed" target="_blank" rel="noopener">https://github.com/CalvinNeo/OpenBoardGame/commit/ab26a9c00cd5d9720b39bf3e248b672881cb52ed</a> 这个修复 commit 就展示了 AI 对璀璨宝石最大数量的多次理解问题：</p><ol><li>一开始，它根本没有实现这个限制。</li><li>后面，它实现为只有超过 10 才不能拿，但是从 9 到 12 这个行为是被它允许的。</li><li>最后，才修改对了。</li></ol><p>AI 会漏掉一些情况。例如 <a href="https://github.com/CalvinNeo/OpenBoardGame/commit/c8e82736975325f0b9300b471524ec86b005129e#diff-794e220aafafcfac193a89abd6fb142d92255d544728f217d64e7a1162c79e28" target="_blank" rel="noopener">https://github.com/CalvinNeo/OpenBoardGame/commit/c8e82736975325f0b9300b471524ec86b005129e#diff-794e220aafafcfac193a89abd6fb142d92255d544728f217d64e7a1162c79e28</a> 这个修复 commit 展示了 AI 对用户离开规则的遗漏：</p><ol><li>先前，AI 处理了 in game 的情况。此时如果最后一个活人玩家离开游戏，那么这个房间可以被手动清理掉。</li><li>但是，它漏掉了 in lobby 的情况。此时房间并没有开始游戏，那么玩家离开房间（比如创建一个新的房间）不会导致该房间处于可以被清理的状态。</li></ol><p>经验：</p><ul><li>让 agent 缩小阅读的范围。例如可以告诉它“这是完全的前端修改，你不需要看后端代码或者其他游戏的代码”，这样它就可以更快解决问题，并且能节省不少额度。随着项目增大，这一点尤为有效，因为从 thinking 中可以发现它有倾向去学习其他代码是怎么做的。</li><li>让 AI 先整理信息，然后再写一个 design 征求意见非常重要。因为 AI 在实现的时候还是偏向于漏点东西的，这也可能是出于对问题的不正确理解。</li><li>逻辑比较独立的部分，可以让 AI 整理出测试。虽然目前也没看到 AI 会主动修改代码从而 break 掉测试，但这样会更有自信。</li></ul><h1 
id="Jan-30-Jan-31"><a href="#Jan-30-Jan-31" class="headerlink" title="Jan 30 - Jan 31"></a>Jan 30 - Jan 31</h1><p>这几天主要实现了出包魔法师、猜狐狸、截码战三个游戏。</p><h1 id="Feb-1-Feb-3"><a href="#Feb-1-Feb-3" class="headerlink" title="Feb 1 - Feb 3"></a>Feb 1 - Feb 3</h1><p>这几天主要实现了角斗士棋、Store&amp;Load、印象花语、AI 画物语四个游戏。主要是由 Gemini 生成游戏说明书，再由 Codex 生成 design。等我 Review 了之后，再实现代码。</p><p>角斗士棋和印象花语中都涉及到了拖拽旋转对象的设计。我发现 AI 在适配手机端上的拖拽是相对比较蠢的，需要手动告诉它怎么搞。</p><p>角斗士棋实现起来很简单，但是其实要打磨的地方很多：</p><ul><li>手机版如何精准地把方块放到格子里面就很困难。因为每个方块的焦点并不是方块的重心。后来加了上下左右移动的键，但是感觉还是很蠢。</li><li>还有一些奇怪的 bug。</li></ul><p>印象花语这个游戏，我对实体版的一些物理效果进行了模拟：</p><ul><li>因为印章可以印得深浅不一，所以我增加了点按时长来决定透明度的机制。并且我也增加了根据色彩学原理来模拟印章叠加的效果。</li><li>因为可以倾斜印章，使得只盖一半。所以我增加了一个 Mask 机制，可以去 Clip 印章。这个就好像我在按之前垫了一张废纸，这样章可以只盖一半在画布上。</li></ul><p>我觉得这一部分涉及到物理世界的原理，应该只有人类能意识到并且提出来。</p><p>Store&amp;Load 是非常有意思的点，AI 当时给我提供了一个完全不一样的解决方案。例如它执着于解决 Session 的问题，如何处理同名玩家和同名 Room。但是我提出了一个非常简单的做法，就是完全新建一个 Room，然后新玩家可以进去 Claim 老玩家的 Seat。这样做能够减少很多的复杂度。另外，我还发现新创建的房间并不支持 Auto Save 选项了，这个我觉得也说明 AI 本身缺乏引申的能力。</p><p>AI 画物语的实现是跟其他桌游完全不一样的，因为要涉及生成图。所以我开了 <a href="https://github.com/CalvinNeo/aidixitgen" target="_blank" rel="noopener">https://github.com/CalvinNeo/aidixitgen</a> 这个 repo。</p><p>一开始它是用的 pollinations 做的。然后我发现这个工具经常生成水母，然后问了它，它说是提示词的问题：</p><blockquote><p>为什么这个代码里面的 prompt 这么喜欢生成 jellyfish？<br>这是一个非常经典且有趣的 AI 生成现象，通常被称为 “模式坍塌” (Mode Collapse) 或 “安全区偏差”。出现大量水母（Jellyfish）、热气球、云朵、漂浮的岛屿，原因主要有三点：<br>关键词的“刻板印象”： 你的代码里用了 surreal（超现实）、dreamlike（梦幻）、whimsical（异想天开）这些词。 在 AI 的训练数据（Latent Space）里，水母是“梦幻”和“超现实”的超级代表物。它们透明、发光、漂浮，长得就像外星生物，完美符合 AI 对“梦境”的理解。只要你还要“唯美”和“超现实”，AI 就会懒惰地选择水母。<br>Prompt 缺乏约束： 我们在请求 AI 写创意时，给的指令太宽泛了（”Generate ONE single short… description”）。当指令太宽泛时，AI 会倾向于输出它认为“最稳妥、最符合题意”的高概率词汇。对于 Dixit 风格，水母就是那个“最稳妥”的答案。<br>风格修饰词的引导： 我们在画图时强制加了 soft colors（柔和色彩）和 floating（虽然没写但 dreamlike 隐含了漂浮感）。这也极大地增加了生成海洋生物或云端生物的概率。</p></blockquote><p>然后，我又发现它生成的东西太容易重复了，例如几乎生成了两个只是风格不同的钟。所以我后面又设计了双主体的方案，结果最后生成的结果是比较理想的了。</p><p>后来发现 pollinations 一直 502 了，所以就换 Hugging Face 了。</p><h1 
id="Feb-4-Feb-6"><a href="#Feb-4-Feb-6" class="headerlink" title="Feb 4 - Feb 6"></a>Feb 4 - Feb 6</h1><p>这几天主要实现了前端美化、Flip 7、德国心脏病、绝妙误解。</p><p>主要是由 Gemini 生成游戏说明书以及设计。然后由 Codex 去实现。但是我要求 Codex 在实现前先就不清楚的地方问我，而不是自己随便实现一版。事实证明，让 Codex 去问一下自己不知道的，而不代替我做决定，是很重要的。</p><p>AI 前端的主要问题：</p><ul><li>UI 直白<br>  例如用一个列表表示玩家信息。用一个表格表示当前状态。这对玩家而言体感不好，感觉是在上班。</li><li>没有交互设计<br>  特别是手机端玩家，操作的时候需要翻来翻去。</li><li>UI 可能存在 Bug<br>  例如在手机端会发现 Flip 7 的 Game 面板会非常小。</li></ul><p>德国心脏病的开发是非常典型的。主要包含几点：</p><ul><li>得到的开发计划是经典版的 Halli Galli，也就是牌上的水果一定是相同的。这个跟我们玩的不一样，所以后面让它开发了一个 DLC 一样的东西。</li><li>电脑根本没有给翻牌和按铃的等待时间，所以加上 bot 之后，基本 bot 都是秒按铃，秒翻牌，根本没法玩。即使没有 bot，我们也需要考虑人类的反应时间，以及各个网络的延迟。所以我这里要求加了等待 3s 的按铃时间，以及在点击翻牌后，有一个 1s 的倒计时，方便大家准备看新水果。</li><li>AI 生成的界面依然是列表，这个我让 AI 改成了围成一个圆。</li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;作为一个桌游爱好者，我打算用 Codex 去 Vibe 一个桌游模拟器，这样我可以尝试自定义规则和 Bot 强度。这篇文章我会介绍我 Vibe 的经验。&lt;/p&gt;
&lt;p&gt;我的项目是 &lt;a href=&quot;https://github.com/CalvinNeo/OpenBoardGame%E3%80%82&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://github.com/CalvinNeo/OpenBoardGame。&lt;/a&gt;&lt;/p&gt;</summary>
    
    
    
    
    <category term="数据结构" scheme="http://www.calvinneo.com/tags/数据结构/"/>
    
    <category term="VibeCoding" scheme="http://www.calvinneo.com/tags/VibeCoding/"/>
    
  </entry>
  
  <entry>
    <title>3FS 学习</title>
    <link href="http://www.calvinneo.com/2026/01/15/study-3fs/"/>
    <id>http://www.calvinneo.com/2026/01/15/study-3fs/</id>
    <published>2026-01-15T15:09:06.000Z</published>
    <updated>2026-01-15T09:26:56.813Z</updated>
    
    <content type="html"><![CDATA[<p>学习 3FS。</p><a id="more"></a><h1 id="设计"><a href="#设计" class="headerlink" title="设计"></a>设计</h1><p>主要概括了 design notes。</p><h2 id="需要解决的问题"><a href="#需要解决的问题" class="headerlink" title="需要解决的问题"></a>需要解决的问题</h2><p>OSS 方面：</p><ul><li>现在的 OSS 并不支持原子地移动一系列文件或者一整个目录，或者递归地删除整个目录。而 3FS 的场景（实际上数据库的场景也是这样）涉及要创建一个临时目录，然后对这个目录写入数据，最后将这个目录 move 到最终的位置。</li><li>3FS 需要广泛使用 symbolic 或者 hard 链接</li><li>提供一个熟悉的文件接口，我理解这也是 3FS 选择 FUSE 的原因</li></ul><p>FUSE 方面：</p><ul><li>在 <a href="/2025/03/09/learn-fuse/">Fuse 学习</a>一文中介绍了为什么 FUSE 不支持 Zero Copy。这也是 FUSE 的缺点之一。</li><li>FUSE 使用一个由 spin lock 保护的多线程共享的队列。3FS 团队的测试显示，在 400K 4KiB reads per second 的负载下，spin lock 上的 lock contention 成为了瓶颈。</li><li>Linux 5.x 上的 fuse 不支持对一个文件的并发写。所以很多需要更大带宽的程序会并发写多个文件。</li><li>对小的随机非对齐读性能不好，SSD 和 RDMA 网络的带宽没有被利用充分。</li></ul><p>将 client 实现为一个 VFS 内核模块则能解决上面说的问题，但也更有挑战性。内核的 bug 更难定位和修复。此外，升级的时候需要停掉所有访问这个 fs 的进程，或者重启。</p><p>因此，3FS 选择在 FUSE daemon 里面设计一套原生的 client，由它来支持异步的 Zero copy IO。其中，File meta operation 例如 open、close 等还是被 FUSE daemon 处理。但是在 open 的时候会把拿到的 fd 通过 native API 注册。然后就可以通过 native client 去读取数据了。</p><p>这个 API 类似于 io_uring，其中关键结构如下：</p><ul><li>Iov<br>  user process 和 native client 共享的内存</li><li>Ior<br>  user process 和 native client 通过这个 ring buffer 进行交互。具体方式类似于 io_uring。<br>  请求会被按照 io_depth 攒批执行，不同的 batch 的执行是并行的。</li></ul><h2 id="Metadata-存储"><a href="#Metadata-存储" class="headerlink" title="Metadata 存储"></a>Metadata 存储</h2><h3 id="chunk-的分布"><a href="#chunk-的分布" class="headerlink" title="chunk 的分布"></a>chunk 的分布</h3><p>文件以 chunk 为粒度，被条带化到多个 replication chain 中。<br>创建新文件的时候，会根据 stripe size，使用 round robin 的方式，选择一系列的 chain。选出来的这些 chain，会随机分给不同的 chunk 写入。</p><h3 id="存储-file-atributes"><a href="#存储-file-atributes" class="headerlink" title="存储 file attributes"></a>存储 file attributes</h3><p>为什么 3FS 的 inode 里的 length 会不准确？因为写路径走的是 CRAQ，而 inode 是在 metadata service 里面的。如果每次写操作完都更新一下 metadata，那么会多一次 metadata RTT，写放大严重，吞吐和延迟都会变差。<br>但是如果 metadata 迟迟不更新，例如是 100MB，而客户端写到 120MB
就挂了，此时，虽然数据已经通过 CRAQ 持久化到 chunk 存储了，但因为读的时候从 metadata 获得的长度偏小，所以还是在效果上丢失数据。<br>一种方式是按照 interval 上报更新 metadata，但这就存在不一致窗口。不过先考虑容灾问题，大概有两个方案：</p><ol><li>重启之后，从 chunk 存储中恢复数据，并由此更新 metadata。但是从 chunk 扫描数据恢复的代价很大</li><li>3FS 的设计是由 client 按照 interval 上报 max writer position，因为 client 上报的 position 一定是已经被 tail 提交了的，所以是安全的。但是如果 client 长期丢失，那么 gap 就得通过第一种方式补齐了。</li></ol><h2 id="Chunk-存储"><a href="#Chunk-存储" class="headerlink" title="Chunk 存储"></a>Chunk 存储</h2><p>Suppose there are 6 nodes: A, B, C, D, E, F. Each node has 1 SSD. Create 5 storage targets on each SSD: 1, 2, … 5. Then there are 30 targets in total: A1, A2, A3, …, F5. If each chunk has 3 replicas, a chain table is constructed as follows.</p><table><thead><tr><th align="center">Chain</th><th align="center">Version</th><th align="center">Target 1 (head)</th><th align="center">Target 2</th><th align="center">Target 3 (tail)</th></tr></thead><tbody><tr><td align="center">1</td><td align="center">1</td><td align="center"><code>A1</code></td><td align="center"><code>B1</code></td><td align="center"><code>C1</code></td></tr><tr><td align="center">2</td><td align="center">1</td><td align="center"><code>D1</code></td><td align="center"><code>E1</code></td><td align="center"><code>F1</code></td></tr><tr><td align="center">3</td><td align="center">1</td><td align="center"><code>A2</code></td><td align="center"><code>B2</code></td><td align="center"><code>C2</code></td></tr><tr><td align="center">4</td><td align="center">1</td><td align="center"><code>D2</code></td><td align="center"><code>E2</code></td><td align="center"><code>F2</code></td></tr><tr><td align="center">5</td><td align="center">1</td><td align="center"><code>A3</code></td><td align="center"><code>B3</code></td><td align="center"><code>C3</code></td></tr><tr><td align="center">6</td><td align="center">1</td><td align="center"><code>D3</code></td><td align="center"><code>E3</code></td><td align="center"><code>F3</code></td></tr><tr><td align="center">7</td><td 
align="center">1</td><td align="center"><code>A4</code></td><td align="center"><code>B4</code></td><td align="center"><code>C4</code></td></tr><tr><td align="center">8</td><td align="center">1</td><td align="center"><code>D4</code></td><td align="center"><code>E4</code></td><td align="center"><code>F4</code></td></tr><tr><td align="center">9</td><td align="center">1</td><td align="center"><code>A5</code></td><td align="center"><code>B5</code></td><td align="center"><code>C5</code></td></tr><tr><td align="center">10</td><td align="center">1</td><td align="center"><code>D5</code></td><td align="center"><code>E5</code></td><td align="center"><code>F5</code></td></tr></tbody></table><p>这里的 Version 是配置的 Version，节点下线会导致这个增大。</p><p>这里的一个 Chain 类似于一个 Raft Group 的概念。但是它也不是像 TiKV 一样跟某一段数据绑定的。一个 Chain 可以被多个 chain table 包含。引入 chain table 的概念，这样对于每一个 file，metadata service 就可以为它选一个 chain table，并根据这个 table 中的 Chain 去 strip 这个 file 的所有 chunk。</p><h3 id="Balanced-traffic-during-recovery"><a href="#Balanced-traffic-during-recovery" class="headerlink" title="Balanced traffic during recovery"></a>Balanced traffic during recovery</h3><p>如果一个节点 A 故障了，就需要由 Chain 中的其他节点来承担原来 A 的流量。而之前的 Chain table 中，A 节点基本上只和 B、C 玩。</p><p>在新的架构中，A 在 Chain 2 里和 B/D 在一起，在 Chain 5 里和 C/F 在一起。</p><h3 id="Data-replication"><a href="#Data-replication" class="headerlink" title="Data replication"></a>Data replication</h3><p>一个 Write request 可能是从 client 或者 Chain 的前驱发送出来的。一个节点收到 Write request 后的处理：</p><ul><li>校验 write request 中的 chain version。</li><li>通过 RDMA Read 去 pull 写入的数据。如果 client 或者前驱挂掉了，导致拿不到数据。写入就 abort。</li><li>Once the write data is fetched into local memory buffer, a lock for the chunk to be updated is acquired from a lock manager. Concurrent writes to the same chunk are blocked. 
All writes are serialized at the head target.</li><li>读取这个 Chunk 的 committed version，对它 apply change，然后将更新后的版本存储为 pending version。版本号是单调连续递增的。</li><li>If the service is the tail, the committed version is atomically replaced by the pending version and an acknowledgment message is sent to the predecessor. Otherwise, the write request is forwarded to the successor. When the committed version is updated, the current chain version is stored as a field in the chunk metadata.</li><li>When an acknowledgment message arrives at a storage service, the service replaces the committed version with the pending version and continues to propagate the message to its predecessor. The local chunk lock is then released.</li></ul><h1 id="实现"><a href="#实现" class="headerlink" title="实现"></a>实现</h1><h2 id="Chain-replication"><a href="#Chain-replication" class="headerlink" title="Chain replication"></a>Chain replication</h2><p>CRAQ 最大的特点是：复制链路是固定的，而 Raft 的复制链路因为存在 Quorum 是随机的。因此，对于单个 entry 的复制，Raft 的延迟可能会比 CRAQ 好，但是 CRAQ 的延迟和吞吐更稳定，可以被预测。</p><p>延迟维度的对比：</p><ol><li>Raft 最优<br> a. 条件：follower 延迟几乎一致、没有网络问题<br> b. commit latency 约等于 RTT</li><li>Raft 最劣<br> a. 条件：quorum 中某个 follower 抖动，例如出现了 Write Stall<br> b. commit latency 等于 max(quorum follower latency)，出现了木桶效应</li><li>CRAQ 最优<br> a. 条件：链上每个 node 的 hop_latency 相同，管道用满<br> b. commit latency 约等于 chain_length * hop_latency。因为 CRAQ 在 Tail 返回确认，所以 3 副本也是两次数据传输，和 Raft 的 1 个 RTT 是接近的。但是链长了，CRAQ 就会变慢。</li><li>CRAQ 最劣<br> a. 条件：链上每个 node 变慢<br> b. 存在木桶效应，每个 entry 的提交都被固定变慢。</li></ol><p>吞吐维度的对比：</p><ol><li>Raft 最优<br> a. 条件：同延迟。<br> b. Throughput ≈ follower replication rate</li><li>Raft 最劣<br> a. 条件：同延迟。<br> b. Throughput 约等于 1 / avg(max(follower latency))，这里取 avg 是因为抖动不均匀，每次的多数派不一定相同。</li><li>CRAQ 最优<br> a. 条件：链上每个 node 的 hop_latency 相同<br> b. Throughput ≈ min(hop throughput)，这个很简单，木桶效应嘛。</li><li>CRAQ 最劣<br> a. 条件：存在一个很慢的 node<br> b. 同上，但是木桶效应被放大很明显。</li></ol><p>CRAQ 的其他特点：</p><ol><li>没有 quorum 容错</li><li>failover 不需要重新选主，但需要重构链。需要根据挂的是哪一个来讨论。</li></ol><h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul><li><a href="https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md" target="_blank" rel="noopener">https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;学习 3FS。&lt;/p&gt;</summary>
    
    
    
    
    <category term="数据库" scheme="http://www.calvinneo.com/tags/数据库/"/>
    
    <category term="aiinfra" scheme="http://www.calvinneo.com/tags/aiinfra/"/>
    
  </entry>
  
  <entry>
    <title>用英文写作计算机博客</title>
    <link href="http://www.calvinneo.com/2026/01/12/write-blogs-in-english/"/>
    <id>http://www.calvinneo.com/2026/01/12/write-blogs-in-english/</id>
    <published>2026-01-12T14:42:32.000Z</published>
    <updated>2026-01-18T19:06:49.071Z</updated>
    
    <content type="html"><![CDATA[<p>介绍下用英文写作计算机博客的一些经验。</p><a id="more"></a><h1 id="常见的表达"><a href="#常见的表达" class="headerlink" title="常见的表达"></a>常见的表达</h1><h2 id="避免直接翻译汉语词"><a href="#避免直接翻译汉语词" class="headerlink" title="避免直接翻译汉语词"></a>避免直接翻译汉语词</h2><p>少用汉语式的名词化表达，例如“执行 xx 行动”，“处理 xx 过程”。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">the release of the lock is performed</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">the lock is released</span><br></pre></td></tr></table></figure><p>不要想当然地把动词形容词化。如下，adaptive 的意思是自适应的，和我们实际要指代的“适配代码的行为”是不对应的。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">We need to **adapt** the code to the new behavior. However, this **adaptive** work is not easy.</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">... 
However, the adaptation work is not easy.</span><br></pre></td></tr></table></figure><h3 id="实词虚化、具体词抽象化"><a href="#实词虚化、具体词抽象化" class="headerlink" title="实词虚化、具体词抽象化"></a>实词虚化、具体词抽象化</h3><p>汉语中，很多虚词功能是通过复用实词来的。但是在英语中，如果有对应的虚词，就不要直接翻译汉语中的实词了。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">Based on this reason, ...</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">As a result, ...</span><br><span class="line">For this reason, ...</span><br></pre></td></tr></table></figure><p>又例如下面的“带来麻烦”，这里的带来没必要用 bring 这个实词</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">brings lots of trouble to investigate ...</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">made the issue difficult to investigate ...</span><br><span class="line">// Or</span><br><span class="line">significantly complicated the investigation to ...</span><br></pre></td></tr></table></figure><p>又例如下面的“定位到问题”</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">The problem is located by ...</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span 
class="line">The problem was also detected by ...</span><br></pre></td></tr></table></figure><h3 id="虚词实化"><a href="#虚词实化" class="headerlink" title="虚词实化"></a>虚词实化</h3><p>但是，一些汉语中的虚词，在英语中要实化。例如，“这会导致难以理解的代码”，就是</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">It could result in confusing codes.</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">It could generate confusing codes.</span><br></pre></td></tr></table></figure><p>这个原则甚至不限于词，对于任何表达都是这样。<code>This may cause problems</code>，建议直接具体一点说是什么 problems。如</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">This behavior can cause a deadlock when two tasks hold the mutex across an await point.</span><br><span class="line">// Or</span><br><span class="line">This design is problematic in terms of concurrency safety, because it allows a task to hold a mutex across an await point.</span><br></pre></td></tr></table></figure><h2 id="需要调整句子结构"><a href="#需要调整句子结构" class="headerlink" title="需要调整句子结构"></a>需要调整句子结构</h2><p>尽量避免使用形式主语 It is 或者 there is 等来拖长表达</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">It is hard to inspect the state.</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">Inspecting the state is 
difficult.</span><br></pre></td></tr></table></figure><p>但注意，非形式主语是可以用 it 的，如下所示，这好过说 “I think” 等</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">It indicates ...</span><br><span class="line">This implies ...</span><br><span class="line">The result shows ...</span><br></pre></td></tr></table></figure><p>从下面的例子中，能感觉到动名词前置的用法，更干练</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">I don&apos;t think the service itself should persist the updated configuration to the config file. Instead, this should be handled by the operator.</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">Persisting the updated configuration should be the responsibility of the operator, not the service itself.</span><br></pre></td></tr></table></figure><h2 id="语序"><a href="#语序" class="headerlink" title="语序"></a>语序</h2><p>副词顺序应该稳定在动词后。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">We must immediately do this.</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">We must do this immediately.</span><br></pre></td></tr></table></figure><h2 id="其他"><a href="#其他" class="headerlink" title="其他"></a>其他</h2><p>下面的表达，相比更能体现出不仅不能做之前说的，也不能做现在说的。而 Also 显得更像一个连接。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">Also, A can&apos;t do sth.</span><br><span class="line"></span><br><span class="line">===&gt;</span><br><span class="line"></span><br><span class="line">Moreover, A cannot do sth either.</span><br><span class="line">We can&apos;t ..., nor can we ... .</span><br></pre></td></tr></table></figure><h1 id="使用更恰当的单词"><a href="#使用更恰当的单词" class="headerlink" title="使用更恰当的单词"></a>使用更恰当的单词</h1><h2 id="使用更精确的词"><a href="#使用更精确的词" class="headerlink" title="使用更精确的词"></a>使用更精确的词</h2><p>避免使用含义过于宽泛的词，换用更精确的词，例如：</p><p>possible -&gt; feasible</p><h2 id="使用恰当的搭配"><a href="#使用恰当的搭配" class="headerlink" title="使用恰当的搭配"></a>使用恰当的搭配</h2><p>注意动词和名词的习惯搭配。</p><h2 id="辩证具体含义的差别"><a href="#辩证具体含义的差别" class="headerlink" title="辨析具体含义的差别"></a>辨析具体含义的差别</h2><p>如：</p><ul><li><a href="https://english.stackexchange.com/questions/277073/which-is-correct-confident-in-or-confident-of?newreg=42e56658dfef4cb1b187d36e10d24f6d" target="_blank" rel="noopener">be confident of 和 be confident in</a></li></ul><h1 id="宏观写法"><a href="#宏观写法" class="headerlink" title="宏观写法"></a>宏观写法</h1><p>先给结论，再给解释（Top-down writing）。</p><h1 id="特定场景的表达"><a href="#特定场景的表达" class="headerlink" title="特定场景的表达"></a>特定场景的表达</h1><h2 id="偏向数据分析"><a href="#偏向数据分析" class="headerlink" title="偏向数据分析"></a>偏向数据分析</h2><p>表达“A使用的内存占A的调用者的比重，相比B使用的内存占B调用者的比重是相近的”，下面几种说法从书面到口语排列。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">// 这里的 footprint 更专业术语一点。</span><br><span class="line">A’s memory footprint relative to its callers is similar 
to B’s.</span><br><span class="line"></span><br><span class="line">A and B exhibit similar memory-to-caller ratios.</span><br><span class="line"></span><br><span class="line">The proportion of memory used by A relative to its callers is similar to that of B.</span><br><span class="line"></span><br><span class="line">A and B have similar memory usage ratios with respect to their callers.</span><br></pre></td></tr></table></figure>]]></content>
    
    
    <summary type="html">&lt;p&gt;介绍下用英文写作计算机博客的一些经验。&lt;/p&gt;</summary>
    
    
    
    
    <category term="English" scheme="http://www.calvinneo.com/tags/English/"/>
    
  </entry>
  
  <entry>
    <title>太原游记</title>
    <link href="http://www.calvinneo.com/2026/01/04/traval-in-taiyuan/"/>
    <id>http://www.calvinneo.com/2026/01/04/traval-in-taiyuan/</id>
    <published>2026-01-03T18:20:13.000Z</published>
    <updated>2026-01-09T07:37:06.530Z</updated>
    
    <content type="html"><![CDATA[<p>总的来说，太原是类似于西安的城市，类似的气候，类似的历史文化。但就游玩体验而言，太原总体略优于西安，主要是：</p><ul><li>西安商业化太浓重了，例如大唐不夜城、城墙上骑车等。当然这也不是坏事，但如果玩的多了，就会觉得商业化严重的景点如同预制菜一样，不能说不好吃，但感觉容易腻</li><li>西安人太多了</li></ul><p>但太原的问题主要是：</p><ul><li>交通很不方便。虽然也看到它有专门的旅游线路，并且感觉是用心的，但体验上确实还有欠缺。</li></ul><a id="more"></a><h1 id="D1"><a href="#D1" class="headerlink" title="D1"></a>D1</h1><p>我们是早上 7.55 的飞机，所以定了个 5.10 的送机，相比打车要贵一点，但好在不需要担心到时候没有车。正好前一天是跨年夜，我们就去雨花台吃了点饭，逛了下雨花万象，就住在雨花了。因为我经典的失眠，导致那一天我实际上没怎么睡。</p><p>在机场躺大座睡了一个小时，然后登机门开了，风吹进来冷得要死，没法继续睡了。好在也快登机了。这鬼飞机是摆渡车，真的冷死，外面还下着雪。</p><p>到了机场，出门就能打车，打车秒打，司机就在停车场，很快就上了车。太原机场离市区是真的近，离太原南站也很近，所以是非常适合中转的城市，这很类似之前去过的张家界。</p><p>北齐壁画博物馆分为三个展厅：</p><ul><li>第一展厅是娄叡墓的壁画。比较有印象的一个是狩猎图，一个是升天图。有印象的细节一个是马一边跑一边被吓得拉屎。另一个是雷公，长得特别奇怪。</li><li>第二展厅徐显秀墓。因为整个博物馆实际上就是盖在这个墓上面的，所以比较有趣。我们可以看到当时发现这个墓的土丘上的盗洞。值得一提的是，虽然我们没法下到最下面的墓里面去，但是旁边有个清晰度贼高的 VR 可以看，不仅壁画很清楚，而且能看到总共五个盗洞。我想，为什么我们看不到棺椁，可能就是这么多盗洞把它们都偷走了吧。另外，后面在山西博物院中，我们能看到这个墓里面的壁画，但不知道是仿制品，还是移过来的。</li><li>第三展厅是拼好展，里面展示了不同的墓的壁画。</li></ul><p>从北齐壁画博物馆出来，打车非常难打。索性看到有个公交车，就上去，发现好像是个旅游专线，只到双塔公园和火车站。用微信乘车码就可以刷码，坐上去之后，司机一个人发了一张门票一样的东西，实际上就是旅游专线的车票，感觉挺用心的。</p><p>考虑了下，我们觉得先回宾馆比较好。一来，双塔公园离着纯阳宫和宾馆都很远，后面两者反而比较近。二来，我那个包确实太重了。于是打了辆车，直奔宾馆。</p><p>到了宾馆，准备出门，发现山西博物院突然放了一千多张票。于是打电话问，答复说只要是今天的时间预约的就可以来。接线的是个女人，用非常热情的语气表达了对我们的欢迎，非常点赞。所以，我们临时改了计划，去看山西博物院。</p><p>山西博物院就靠着我们的亚朵，中间隔了自然博物馆和图书馆。走过去的时候，门口就排了个小队了。进去之后，发现博物馆里面全是人。这个博物馆主要是四层，重点的是在 2/3 两层，另外有几个重要展物被放到了第 1 层和第 4 层。山西博物馆主要就是看各种青铜器。四楼有很多主体展厅，比如玉、应县木塔、钱币等。总体感觉这博物馆年代比陕历博要老，基本上全是青铜器。比较有印象的一个是那个猫头鹰，还有就是大蜗牛上骑着小蜗牛，以及博物馆徽标上的大鸟上骑小鸟的。有个叫雁鱼铜灯的文物好像被国博借走了，直接在下面放了个不知道是啥的青铜器掩耳盗铃。</p><p>从山西博物馆出来，快五点了，我们赶快打车去河东颐祥阁。事实证明这决定很正确，因为我们到的时候饭店还没开门。但是我们刚坐下来，一堆人就陆陆续续进来了，然后他们就要排队。我们在饭店点了非常多的东西，在此点评一下：</p><ul><li>涮肚 我老婆说这个麻辣烫很好，因为里面的麻酱没有麻酱味，她能接受。我觉得也不错。</li><li>芥末凉粉 很爽口。</li><li>风葫芦 感觉是炸的一个脆皮球，然后里面是很嫩的鸡蛋。我对象很喜欢，我觉得还行，但没有特别好吃。</li><li>黄米排骨 这个是最贵的，但吃起来感觉一般。</li><li>麻辣串 实际上就是豆干，但是加了他那个口感很椒麻胡辣的汤汁，很好吃。我对象不喜欢胡椒和辣口味，她不怎么喜欢。</li><li>羊腿 我觉得很好吃，我对象也觉得。而且她觉得不太膻。</li><li>槐花 
小学里课文学过，但实际上是第一次吃槐花，感觉挺好吃的。没想到是咸的，感觉有点吃野菜的感觉，但并不像野菜那样有茎的感觉。</li><li>野菜丸子 感觉就是北方的那种丸子。</li><li>绿豆糕 挺甜的，但是很好吃。</li><li>烧饼 有点像肉夹馍的馍，但是是很热的，吃起来很香。</li></ul><p>从颐祥阁出来，已经打包了一堆东西，我们打算去鼓楼街再去看看。太原的地铁在长风街这一段特别稀疏，但吃的又都在这里，所以我们走了好远上了地铁。从府西街站下来，我们拐进了帽儿巷。</p><p>刚去的时候，人是一般多，我老婆还说这个离长沙差远了。我们随便走走，就看到一家要排队的卖麻花的，说特别有名。买了黑芝麻和蜂蜜的，感觉就是小时候的味道，有点类似于我奶奶买的那个火腿肠面包。我记得那个面包皮我特别喜欢，就是这个麻花的味道。</p><p>我们后面又买了枣糕和碗托。碗托的那个面感觉挺好吃的，是糯糯的。而很早之前我对象在淘宝买的感觉就很干，跟那种硬胶水一样。不过我当时已经又累又撑，不太吃的下了。</p><p>我准备往地铁站走准备回去了，但是我老婆说，这个鼓楼街，我们还没见到鼓楼呢。于是走到横过来的一条街上，这条街就洋气很多了，都是一些洋人的建筑。再往前走，还有一个巨大的钟楼。这里人就开始特别多了，有种长沙的感觉。</p><p>在认一力吃了几道菜：</p><ul><li>羊肉水饺 点了两种，但感觉都一个味道。我老婆说羊肉味特别大，根本没法吃。我觉得还行，但是感觉很咸。感觉羊肉馅里面加了很多葱姜蒜。另外，第二天我从冰箱里面拿出来的时候，感觉它们散发着一股呕吐味。但加热之后，又能吃了，感觉还行。</li><li>沙棘醪糟 感觉挺好吃的，很清淡，不是特别的酸，也不怎么甜。但是很爽口。</li><li>头脑 头脑里面的羊肉挺好吃的，炖的烂。但是那个粥一样的东西就是一言难尽了。吃起来感觉就好像是熬的很浓的羊油和粥一起煮成的粘稠物。但是呢，它本身又不具备酸甜苦辣咸这些基础的味道，所以吃在嘴里就是一股腥味。</li></ul><p>从认一力出来，人就开始比肩接踵了，于是我们准备往回走了。当时我已经又撑又困到神志不清了，终于走到了柳南地铁站。坐地铁到太原理工大学站，发现到酒店还要走好远。</p><p>晚上回来困得要死，倒头就睡。中途 23 点的时候被老婆叫起来看，原来是酒店对面就放起了烟花。话说山西今年不禁止放烟花，所以大家过年都在放，感觉挺热闹的。汾河上的那条龙今天晚上也是一直亮着灯，还挺好看的。</p><h1 id="D2"><a href="#D2" class="headerlink" title="D2"></a>D2</h1><p>早上起的有点晚，睡了大概 10 
个多小时。吃了个酒店的自助餐，顺便把昨天的水饺热了下。</p><p>打车去晋祠，这一路挺远的，司机开得贼快。太原的早晨全是雾霾，路过还有几个大烟囱在冒着白气，可能是供暖工厂吧。到了晋祠，寻思着买了个天龙山的门票。晋祠一进去是非常大的免费公园，里面做了一堆假山雕塑啥的，有个唐太宗的雕塑后面还出现在太原的城市宣传画上，但是感觉雕的人脸都一样。</p><p>晋祠博物馆是挺有意思的，原本我以为它就是一个祠堂，拍照打卡走人这样，但里面挺大的，而且建筑的排布很好看，基本上处处是景。</p><p>游览这个景点，必须要有个讲解。我当时还是用的旅途随身听，感觉讲的是足够了。总而言之，这个晋祠原来是纪念一个儿子的，后来慢慢的，妈妈名气反而更大了，所以改为了主要祭奠妈妈，结果导致儿子的祠堂偏安在整个晋祠的最角落。</p><p>一进门就是晋祠最网红的孙悟空同款了，但其实它并不是祠堂，而是一个戏台，叫水镜台。很多人在正面拍照合影，走到背面有个康熙题的匾额。</p><p>然后，我们就按照旅途随身听的导览逛了，先逛了一遍关帝庙、岳飞庙等，我也认识了一堆歇山顶、硬山顶等屋顶的类型，从而判断规格。然后就到了儿子的唐叔虞的祠堂。这哥们属实惨，真的就在最角落，门口还贼小。要不是随身听，我可能都不会进去。</p><p>从唐叔虞祠堂出来，就看到前面一堆人。这里应该就是晋祠最核心的地方，也就是圣母堂了。圣母堂有几点比较独特：</p><ul><li>堂前的十字桥，据说是首创。这个桥在晋祠外围的免费公园中也有个拙劣的模仿。</li><li>七开间的规格，每个柱子上都有龙盘旋，这些龙长得都不一样，有的还很抽象。</li><li>圣母堂中的宋代雕塑，惟妙惟肖。</li><li>圣母堂旁边有个周朝的柏树，然后它被另一棵柏树架着。</li></ul><p>从圣母堂转了一圈出来，旁边还有一个苗裔娘娘堂，再绕出来，就到了难老泉。这个说是晋阳第一泉，感觉就是比较谦虚了，毕竟无锡还有个天下第二泉呢。这个亭子是北齐建的，所以可以看到斗拱是非常的大。虽然后面在嘉靖年返修了，但仍然是使用了北齐的手法。而它对面的亭子，斗拱就很小，一看就是明清的建筑。其实在晋祠中，分布着不同朝代的建筑，我们都可以从斗拱的大小，以及房顶上鸱吻头和尾巴的比例来分辨年代。晋祠中比较主要的建筑如圣母殿等都是在唐左右建设的。</p><p>难老泉会通过一个龙头流到旁边的一条小溪里面，然后很多人排队在那里接水。我们绕过那群人，就可以走到子乔祠、董寿平美术馆那一带，不过那些就不是重点了。</p><p>我去逛了逛美术馆，我老婆根本就没去逛，而是再绕回来到正面，到了圣母堂之前的一些建筑，如会仙桥、献殿等。比较有意思的是金人台，上面有个非常小的小楼，不知道那是干嘛的。献殿上面有个万历四年的匾额，我对象说是不是那个万历四年春，我说那是庆历四年春。</p><p>从晋祠出来，下一站就是天龙山。但我们必须走很长的路才能穿过外围的免费公园。我们还路过了晋文公艺术博物馆，不过没进去看，不知道里面怎么样。博物馆附近有很多大湖环绕，里面可以划船，但是现在是冬天，这些湖面都结冰了。博物馆前按照先秦的特色，搞了点夯土柱子。绕过博物馆，就到了出口，在这里进行了一些简单的修整，买了个烤红薯和上海阿姨，就去找景区交通。</p><p>公交站排队等了比较长的时间，感觉得有至少十分钟，车来了。因为我们觉得不太妙，所以当时就站的比较靠前，所以有座位坐，但没想到这个公交车是不给站的。所以我们眼睁睁看着后面的人上不来，司机跟他们说等下一班吧。这公交车开了，没一会就上了盘山公路。这路应该太原花了很多钱搞了个旅游公路，很多盘山公路被搞成了类似南浦大桥那样的展线，所谓网红桥。这些展线上停了一堆小轿车，人们在那里往低处拍照片。到了龙门站，我们下车，其实已经花了几十分钟了。</p><p>这天龙山其实很坑，它主要是看石窟的，分为东峰和西峰，但是我们去的时候东峰全封掉了，所以我们就直接往上走回去了，也没再去下面的天龙寺。具体到西峰石窟，里面也是丢的丢，乱涂乱画的也有。后面打了个车去国宝馆，里面看到了东峰石窟中第 8 
窟的佛头，说是被日本人之前抢走了的，后来被中国又追回来了。国宝馆旁边还有个数字馆，就是给你看看一些石窟的宣传片。</p><p>从国宝馆直接坐车到晋祠，下了车就打车去植物园。太原植物园很大，一进去是一个湖，湖面已经结冰。围湖种了一圈银杏，叶子很漂亮，不过走近一看发现是假的。因为是冬天，主要能逛的就是几个温室。这几个温室设计得都很好，造景很有一套，并且有高低不同的步道，可以方便游客游览观看不同高度的植物。热带雨林馆里面有个瀑布，大家挤在那里拍照。不过我最喜欢的还是沙生植物馆，一进门是三个大黄柱子，加上远处落日，夕阳照在上面的感觉，给我一种置身于沙漠或者火星表面的感觉。另外还有一个蝴蝶馆，以及一个园艺馆，反正都挺好看的。太原植物园还有萤火虫看，不过冬天就暂停了。</p><p>最终也没等到植物园那个网红天花板亮灯，而是五点二十不到就打车走了。当时打车很容易，不过刚开到市区就发现植物园门口堵红了。我觉得这里的交通设计有问题，汽车要去植物园门口，实际上要走很远到前面掉头才行。这也是太原的一个特点，快速路很多，但是跨越高速路，或者掉头，就很麻烦了。</p><p>去吃东北爱情麻辣拌，在一个非常荒芜的小巷中的破落小平房，里面墙上天花板上都是之前顾客的留言。只能选择加料，以及辣度。我加了麻花和什么的，我对象加了另外一种。这麻辣拌全是素的，全是碳水，要不就是豆制品。旁边一个京爷在吹牛逼说自己老爹是个煤矿里面的啥科长，然后后面北京就不让挖煤了 blabla，说房子贵得要死，幸亏买得早。</p><p>从麻辣拌店出来，不远就能走到附近的上帝炸鸡，因为我觉得尽管有外卖，但是没必要一定吃总店。路上还遇到一个卖碗托的路边摊子，不过这次卖的是保德碗托。味道很不一样，我看到她加了一种黄色的酱汁，我想这个应该导致了我们最终吃到的是有点酸味的。加上我们没有要很多辣油，就导致碗托并不腻。不过它的面感觉就和淘宝的荞麦面碗托一样，不如昨天的那个有糯劲。卖碗托的斜对面就是上帝炸鸡，我们点了一份大份的，吃起来感觉就是那种老派炸鸡的感觉，外皮特别脆，但是里面很多汁。不过我觉得得加点辣粉更好吃。</p><p>买完上帝炸鸡，就去利源沾片子。中途路过似乎是太原比较繁华的体育路街区，有个盒马。我们继续往北走就到了沾片子，在一个灯光很暗的小街上。我老婆先去的，然后出来告诉我要排位，留了个电话号码。然后我们就寻思附近转转，这附近可真没啥能转的。一些很拉胯的咖啡厅，泰山啤酒，一堆成人用品店。最后，我们回到那个店，发现人都换了一遍，显然老板看生意好，就没有叫我们。所以我们被迫在店里面站着等位，等的时候，后面最多来了四队人。老板让我们提前点菜，我们点了一堆：</p><ul><li>沾片子 吃起来比较素。我比较喜欢豇豆的那个，感觉保留了蔬菜的本味。</li><li>炒茄子 很好吃，茄子很脆，酱汁很香。</li><li>莜面</li><li>清徐灌肠 实际上也是素的，感觉挺油的</li></ul><p>这家店不光等位慢，做得也慢，出来了就九点多了。打车不太打得到，所以就准备坐地铁，路上还买了一盒草莓，大个的，才 30 块钱。好不容易坐上了地铁，然后下错站了，索性直接打车算了。不过这样晚上在汾河边散步的计划就泡汤了，实际上那天晚上步道的灯光包括那条龙也没开，所以就算了。</p><h1 id="D3"><a href="#D3" class="headerlink" title="D3"></a>D3</h1><p>早上七点五十就起来了，先把昨天的炸鸡拿到宾馆餐厅里面热热，顺便再吃吃他的刀削面和豆花。太钢汽水也是尝了尝别的味道，感觉都还可以。</p><p>吃完饭，就去汾河边散散步。这次见识到了传说中的自行车道，其实挺窄的，一个方向只能容纳一个人骑。并且这个自行车道的下匝道有点少，很多地方是高架。要散步的话，可以穿过自行车道再往河边走。</p><p>逛完回去已经快九点半了，赶快打车去晋商博物馆。晋商博物馆实际上就是原来的巡抚衙门，在民国时期也是山西的中枢所在，到了新中国，一度是省委和省政府的办公地，一直到 2017 
年改建为博物馆。这个博物馆确实透露着政府办公室的气息，特别是中间一栋苏联式样的办公楼，一进去就能闻到浓烈的复写纸的味道。</p><p>我发现山西这边的博物馆有一个喜好，就是搞大全套收集。例如山西博物院就搞了个钱币展，非常夸张地收集了先秦时期各个国家的钱币，以及后面历朝历代不同年号的钱币。晋商博物馆中更是收集了七八套大小不一的编钟，收集了鼎、盆、壶、斛等各种器皿。最特别的是，我第一次看到了一个东西叫灶。晋商博物馆挺长的，进去先是一个高楼，里面存了据说是山西第一巡抚诺敏的匾额。后面是2号楼和3号楼，两栋楼通过走廊连在一起。在后面是个会议室，和一个苏联样式的高楼。再往后有一个钟楼，但是不让过去了。比较幸运的是，东花园原来在修缮，但是这一次对我们开放了。进去可以看到民国时候的一些陈列，以及五几年的时候省委书记的住房。</p><p>从晋商博物院出来，就去吃了一诺铜火锅。我老婆照例点了一辈子都吃不完的菜，点评如下：</p><ul><li>铜锅 感觉一般</li><li>黄米凉糕 很好吃，但是不要把上面的糖拌进去，不然就太甜了</li><li>丸子串 一般</li><li>羊肉串 很好吃</li><li>涮肚 感觉很好吃，我老婆觉得麻酱很多，但是我觉得跟颐祥阁一模一样</li><li>蒜泥茄条 挺好吃的，酸酸甜甜</li><li>小酥肉 肉感觉一般，但是旁边的蘑菇还不错</li><li>烤饼 感觉很好吃，也带回去了</li><li>鸡包豆腐 感觉就是千页豆腐？一般</li><li>过油肉 很好吃</li><li>酸辣白菜 酸酸甜甜的很好吃</li></ul><p>吃完饭，才 12.40，感觉还有一段时间，因此就准备去纯阳宫。因为我在打车过来的路上就看到纯阳宫了，所以就直接骑车过去，风风火火，花了不到十分钟就骑到了。进去之后，我老婆终于要上厕所了，我先进去逛。它的展厅都好没意思，啥陈列都没有，就干放视频。走到最后，是个主题展，里面陈列了一个常阳天尊像，居然是一级国宝，并且是 195 文物。这玩意也没有被框起来，能够近距离观赏感觉挺好，大家也都很有素质，不上手摸。出门，上楼梯，二层的展览就很蠢了，有个展厅里面全是文物已借出。二楼可以通到前面的九宫八卦院，有很多人在那边拍照。</p><p>往回走，打开了旅途随身听，介绍了九宫八卦院，这个建筑布局很有意思，据说是全国唯一的。再往回，就是吕洞宾殿，也就是纯阳宫的主要建筑。再往回就是弥勒佛铜像。我老婆跟我说这里面有个西汉的石狮子，还有明代的铜狮子，居然就直接摆在那里，也没有保护，感觉山西还是富，人们和文物生活在一起。</p><p>再往回走，就快到了出口。此时左边是一个假山，上面是一个关羽像。关羽像没有胡子，因为这个像是在明朝的，而关羽有胡子的说法是明末根据三国演义才有的。右边是个碑廊，然后里面放着另一个 195 文物，涅槃变相碑。这玩意也没有被保护起来，我觉得还是不太好，毕竟这个是室外嘛，还是要有点防护为好。</p><p>我在看涅槃变相碑的时候，因为空间狭小，我的书包还把后面的一个壁画的说明牌给蹭下来了，幸亏没有伤着壁画，不过这也说明了这个陈列比较拥挤，容易碰到。</p><p>从纯阳宫出来立马打车赶往机场，不得不再次羡慕太原机场离市区实在是近。</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;总的来说，太原是类似于西安的城市，类似的气候，类似的历史文化。但在游玩体验来讲，太原总体略优于西安，主要是：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;西安商业化太浓重了，例如大唐不夜城、城墙上骑车等。当然这也不是坏事，但如果玩的多了，就会觉得商业化严重的景点如同预制菜一样，不能说不好吃，但感觉容易腻&lt;/li&gt;
&lt;li&gt;西安人太多了&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;但太原的问题主要是：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;交通很不方便。虽然也看到它有专门的旅游线路，并且感觉是用心的，但体验上确实还有欠缺。&lt;/li&gt;
&lt;/ul&gt;</summary>
    
    
    
    
    <category term="游记" scheme="http://www.calvinneo.com/tags/游记/"/>
    
  </entry>
  
  <entry>
    <title>TiCI 上线过程</title>
    <link href="http://www.calvinneo.com/2025/11/30/tici-go-online/"/>
    <id>http://www.calvinneo.com/2025/11/30/tici-go-online/</id>
    <published>2025-11-30T03:57:20.000Z</published>
    <updated>2026-01-09T08:11:04.729Z</updated>
    
    <content type="html"><![CDATA[<p>记录了 TiCI 上线过程中遇到的一些问题：</p><ul><li>针对这些问题的技术性解法和运维性的解法<br>  涉及到某些内部知识的将不予公开。</li><li>对于问题严重程度应该如何判断</li><li>如何为了达成上线的既定目标，设计临时性的缓解措施</li></ul><a id="more"></a><h1 id="POC-on-TiDB-X-for-Customer-U-late-Nov-2025"><a href="#POC-on-TiDB-X-for-Customer-U-late-Nov-2025" class="headerlink" title="POC on TiDB-X for Customer U, late Nov 2025"></a>POC on TiDB-X for Customer U, late Nov 2025</h1><h2 id="主要问题"><a href="#主要问题" class="headerlink" title="主要问题"></a>主要问题</h2><ol><li>支持 Keyspace</li><li>运维手段<br> 包含 Shard、Reader 和 Importer</li><li>支持 Security</li><li>openssl 的编译问题</li><li>arm 编译的问题</li><li>上线后出现的影响可用性的 bug</li><li>上线后出现的影响性能的 bug</li></ol><h2 id="Nov-21"><a href="#Nov-21" class="headerlink" title="Nov 21"></a>Nov 21</h2><p>讨论 import into 场景下，如果某个 worker 因为 OOM 重启，则因为目前缺少 Heartbeat 机制和 reschedule 机制，整个任务会因为这个小任务失败而停止。</p><p>另外，也提到了和 import into 流控相关的问题。</p><h2 id="Nov-23"><a href="#Nov-23" class="headerlink" title="Nov 23"></a>Nov 23</h2><p>遇到了 tokio Sender 报错 channel closed 问题。当时没有空查。</p><p>这个问题实际上就是协程因为哪里 unwrap 而 panic 了，而因为协程池是跑的消息循环，所以 panic 了不 join 也不知道。一直没找到是哪里，后来我加了点 panic hook，才在 Nov 30 定位到是服务发现的问题。</p><h2 id="Nov-26"><a href="#Nov-26" class="headerlink" title="Nov 26"></a>Nov 26</h2><ol><li>TLS options are only supported with HTTPS URLs<br> 这个错误很 plain</li><li>Client had asked for TLS connection but TLS support is disabled. 
Please enable one of the following features: [“native-tls-tls”, “rusttls-tls”]<br> 我们不能用 native，只能用内置的 rustls</li><li>undefined symbol: pthread_atfork<br> ARM 上的编译问题，有专门<a href="/2025/11/25/pthread_atfork/">文章</a>介绍</li><li>TLS error error:0A000086<br> 这个就是要启动的时候指定下用 ring 还是 aws_lc_rs</li></ol><h2 id="Nov-27"><a href="#Nov-27" class="headerlink" title="Nov 27"></a>Nov 27</h2><ol><li>排查 ARM 上的问题，这里做了两套方案，先设法用 x86 跑起来。事实证明这也是正确的，白天我们发现了更多的问题，最终因为 TLS 的问题也没有跑起来。但是晚上我们把 ARM 验证了下，发现不报原先的错了。</li><li>遇到一个新问题是我们的云上 operator 机制不支持改参数，我们挂的又是 ro 文件系统，导致很难验证</li><li>etcd client unavailable or unhealthy, attempting reconnect<br> 这个是 <code>kebab-case</code> 的锅，Rust 和 C++ 的格式不一样，写配置的同学把 ca-path 写成 ca_path 了。</li><li>后来发现，TiFlash CN 还是绕不开 TLS 的问题。所以还是得兼容。</li></ol><h2 id="Nov-28"><a href="#Nov-28" class="headerlink" title="Nov 28"></a>Nov 28</h2><ol><li>get meta channel failed times<br> 这个错误还是 etcd client 的 ca-path 没传对</li><li>到这里为止，整体看上去能跑了，但是查询报错，看日志感觉核心服务没起来<br> 首先 lsof 看了一下，发现服务器都没起来。<br> 因为没有 panic，直觉是哪里有个报错死在协程里面没传播出去。进而发现是 rust 的 grpc server 不能在 dns 格式的 url 上启动。不得不说我几天前的 advertise_addr 的改动很有先见之明 <a href="https://github.com/pingcap-inc/tici/pull/515" target="_blank" rel="noopener">https://github.com/pingcap-inc/tici/pull/515</a></li></ol><h2 id="Nov-29"><a href="#Nov-29" class="headerlink" title="Nov 29"></a>Nov 29</h2><ol><li>白天是在 import into 导入数据，晚上出现了一堆问题。</li><li>首先，是之前的那个 panic 导致协程退出的阴魂不散又来了。我紧急去掉了一堆 unwrap，并且加了 panic hook 打印错误日志。</li><li>然后，是 writer 那边没有对 meta 返回的错误码进行处理，但是加上了处理之后，一个集成测试不通过了。大家觉得要不就把这个测试禁用了吧，但是我很反对，因为这个测试是唯一一个带上真实的 worker 和 reader 跑完 e2e 的测试。并且我看了日志发现一个 writer node 被超时移出了，所以尽管 PR owner 认为他并没有改动到这一块的逻辑，我仍然认为他的修改导致一个严重问题暴露了。因此，我们应该先查问题。原因是即使我们强行合并了，也不能拿这个 commit 去跑生产。后来，确实发现了是在处理 heartbeat 的时候会等待一个异步任务结束，所以导致后面 heartbeat 消息循环直接卡死了。所以，这个问题确实会导致 writer 因为丢失心跳从而被全部 failover，进而整个集群没有 writer 可用的 critical 问题。不 approve 这个 PR 是完全正确的。</li></ol><h2 id="Nov-30"><a href="#Nov-30" class="headerlink" title="Nov 30"></a>Nov 30</h2><p>今天主要切换到性能方面的支持上：</p><ul><li>warmup 
的速度太慢了，所以想把这一块改成并发执行<br>  在这个处理之后，8 个节点，24 个并发，大概是不到两个小时就处理完毕了。</li><li>因为 worker 出现了异常重启，导致重启后 meta 向它发送了大量的 add shard message，导致触发了 grpc 的消息大小上限。临时通过增加上限来 workaround 了。实际上是可以通过拆成多个 heartbeat response 来解决。另外同事也提到，可以只发部分信息，其他的后续让主动请求，但这个可能会产生很多的 grpc 调用。</li></ul><h2 id="Dec-1"><a href="#Dec-1" class="headerlink" title="Dec 1"></a>Dec 1</h2><ul><li>发现查询速度比较低，原因是串行访问的所有 shard</li></ul><h2 id="Dec-2"><a href="#Dec-2" class="headerlink" title="Dec 2"></a>Dec 2</h2><ul><li>发现查询并发比较低，原因是查询的 filter 不能通过主键进行过滤，所以导致每次查询需要访问所有的 shard</li></ul><h2 id="Dec-3"><a href="#Dec-3" class="headerlink" title="Dec 3"></a>Dec 3</h2><ul><li>关于昨天的并发问题，认为一个 shard 中有多个 fragment，并且一个 fragment 中又有多个 segment，因此会影响查询的速度。通过 Manual compaction 和调大内存的方式，使得 fragment 和 segment 都变少。</li><li>因为目前没有自动 balance 机制，所以还需要 Manual reschedule。</li></ul><h2 id="Dec-5"><a href="#Dec-5" class="headerlink" title="Dec 5"></a>Dec 5</h2><p>功能侧：</p><ul><li>主要处理 duplicate fragment 的问题。我认为因为 apply compaction 需要根据 frag path 来定位，重名的 frag 会导致问题，所以应该由 meta 来禁用。</li></ul><p>测试侧：</p><ul><li>执行了 Compaction，并且后续自动执行了 Merge，发现查询 QPS 提高了 50%，但依然是比较差的</li><li>因此决定用 (token0_address, ts) 作为新的主键，和 TiKV 解耦</li></ul><h2 id="Dec-6"><a href="#Dec-6" class="headerlink" title="Dec 6"></a>Dec 6</h2><p>这一轮测试发现调整了新的主键之后，QPS 增加了很多，可以看出列存中使用不同的 Sharding key 的重要性：</p><ul><li>静态数据 QPS 2.6k+，P999 61.6 ms，且仍有弹性。其中 TiFlash CPU 是 2300%，内存是 8 * 91G。<br>  <a href="https://github.com/CalvinNeo/ue-bench/blob/master/src/main.rs" target="_blank" rel="noopener">https://github.com/CalvinNeo/ue-bench/blob/master/src/main.rs</a>  <figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">SELECT</span> * <span class="keyword">FROM</span> ... <span class="keyword">WHERE</span> token0_address = ? <span class="keyword">AND</span> platform = ? <span class="keyword">AND</span> ts &lt; ? 
<span class="keyword">LIMIT</span> <span class="number">5</span></span><br></pre></td></tr></table></figure></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;记录了 TiCI 上线过程中遇到的一些问题：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;针对这些问题的技术性解法和运维性的解法&lt;br&gt;  涉及到某些内部知识的将不予公开。&lt;/li&gt;
&lt;li&gt;对于问题严重程度应该如何判断&lt;/li&gt;
&lt;li&gt;如何为了达成上线的既定目标，设计临时性的缓解措施&lt;/li&gt;
&lt;/ul&gt;</summary>
    
    
    
    
    <category term="数据库" scheme="http://www.calvinneo.com/tags/数据库/"/>
    
    <category term="Rust" scheme="http://www.calvinneo.com/tags/Rust/"/>
    
  </entry>
  
  <entry>
    <title>undefined symbol pthread_atfork</title>
    <link href="http://www.calvinneo.com/2025/11/25/pthread_atfork/"/>
    <id>http://www.calvinneo.com/2025/11/25/pthread_atfork/</id>
    <published>2025-11-24T18:20:13.000Z</published>
    <updated>2025-12-03T08:21:21.643Z</updated>
    
    <content type="html"><![CDATA[<p>在 x86 上可以跑，但是在 arm linux 上就报这个错误。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/tiflash/tiflash: symbol lookup error: /tiflash/libtici_search_lib.so: undefined symbol: pthread_atfork</span><br></pre></td></tr></table></figure><a id="more"></a><p>首先，ldd 看到，链接的是本地的 <code>/lib64/libpthread.so.0</code>。</p><p>可以通过 <code>strings /lib64/libpthread.so.0 | grep &#39;^GLIBC_&#39;</code> 命令查询 GLIBC 的版本。</p><p>然后，nm 了一下 /tiflash/libtici_search_lib.so，结果是：</p><ol><li><p>x86 开发机</p> <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">nm libtici_search_lib.so | grep pthread</span><br><span class="line">0000000003c97e70 t __pthread_atfork</span><br><span class="line">0000000003c97e70 t pthread_atfork</span><br><span class="line">                 U pthread_attr_destroy</span><br><span class="line">                 U pthread_attr_getguardsize</span><br><span class="line">                 U pthread_attr_getstack</span><br><span class="line">                 U pthread_attr_init</span><br><span class="line">                 U pthread_attr_setstacksize</span><br></pre></td></tr></table></figure></li><li><p>arm tiflash:v2025.8.10-2-gc9e3144-centos7 镜像<br> <img src="/img/pthread_atfork/arm.jpg"></p></li><li><p>x86 tiflash:v2025.8.10-2-gc9e3144-centos7 镜像<br> 这个是 multi arch 镜像，但是 glibc 版本不一样</p></li></ol><p>那么这个符号在 <code>/lib64/libpthread.so.0</code> 里面有么？nm 了一下：</p><ul><li>x86 的版本是 2.34，显示这个 so 是没有 Debug info 的。难道被 trim 了么？ls 了一下这个文件，发现只有 15KiB 左右。后来了解到，在较新的 GLIBC 中，pthread 相关的被整合到了 libc.so 中，我 nm 了 libc.so 确实可以看到。</li><li>arm 版本是 
2.17，nm 了可以看到其他 pthread 符号，但是看不到 pthread_atfork。</li></ul><p><img src="/img/pthread_atfork/arm33.png"></p><p>原因是在大多数 Linux 发行版（使用 glibc 的系统）中，pthread_atfork 的符号并不在常规的共享库如 libpthread.so.0 中，而是通过链接器脚本特殊处理，其具体实现位于 libpthread_nonshared.a 这个静态归档文件中，而非 .so 文件里。</p><p>解决方案参考 <a href="https://github.com/pingcap/tiflash/pull/10571" target="_blank" rel="noopener">https://github.com/pingcap/tiflash/pull/10571</a>，强制使用 <code>-pthread</code> 而不是 <code>-lpthread</code> 即可。</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;在 x86 上可以跑，但是在 arm linux 上就报这个错误。&lt;/p&gt;
&lt;figure class=&quot;highlight plain&quot;&gt;&lt;table&gt;&lt;tr&gt;&lt;td class=&quot;gutter&quot;&gt;&lt;pre&gt;&lt;span class=&quot;line&quot;&gt;1&lt;/span&gt;&lt;br&gt;&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;line&quot;&gt;/tiflash/tiflash: symbol lookup error: /tiflash/libtici_search_lib.so: undefined symbol: pthread_atfork&lt;/span&gt;&lt;br&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/figure&gt;</summary>
    
    
    
    
    <category term="C++" scheme="http://www.calvinneo.com/tags/C/"/>
    
    <category term="Rust" scheme="http://www.calvinneo.com/tags/Rust/"/>
    
    <category term="GLIBC" scheme="http://www.calvinneo.com/tags/GLIBC/"/>
    
  </entry>
  
  <entry>
    <title>tokio channel 实现</title>
    <link href="http://www.calvinneo.com/2025/11/23/tokio_channel_src/"/>
    <id>http://www.calvinneo.com/2025/11/23/tokio_channel_src/</id>
    <published>2025-11-23T15:09:06.000Z</published>
    <updated>2025-12-12T14:50:31.126Z</updated>
    
    <content type="html"><![CDATA[<p>基于 tokio 1.46.0 版本</p><a id="more"></a><h1 id="mpsc"><a href="#mpsc" class="headerlink" title="mpsc"></a>mpsc</h1><p>mpsc 有 bounded 和 unbounded 两种形式。通过不同的 semaphore 来区别。</p><h2 id="Chan-结构"><a href="#Chan-结构" class="headerlink" title="Chan 结构"></a>Chan 结构</h2><p>对于 unbounded semaphore，其最低的 bit 表示这个 channel 有没有关闭。</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span 
class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// ===== impl Semaphore for (::Semaphore, capacity) =====</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">impl</span> Semaphore <span class="keyword">for</span> bounded::Semaphore &#123;</span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">add_permit</span></span>(&amp;<span class="keyword">self</span>) &#123;</span><br><span class="line">        <span class="keyword">self</span>.semaphore.release(<span class="number">1</span>);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">add_permits</span></span>(&amp;<span class="keyword">self</span>, n: <span class="built_in">usize</span>) &#123;</span><br><span class="line">        <span class="keyword">self</span>.semaphore.release(n)</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">is_idle</span></span>(&amp;<span class="keyword">self</span>) -&gt; <span class="built_in">bool</span> &#123;</span><br><span class="line">        <span class="keyword">self</span>.semaphore.available_permits() == <span class="keyword">self</span>.bound</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">close</span></span>(&amp;<span class="keyword">self</span>) &#123;</span><br><span class="line">        <span 
class="keyword">self</span>.semaphore.close();</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">is_closed</span></span>(&amp;<span class="keyword">self</span>) -&gt; <span class="built_in">bool</span> &#123;</span><br><span class="line">        <span class="keyword">self</span>.semaphore.is_closed()</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// ===== impl Semaphore for AtomicUsize =====</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">impl</span> Semaphore <span class="keyword">for</span> unbounded::Semaphore &#123;</span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">add_permit</span></span>(&amp;<span class="keyword">self</span>) &#123;</span><br><span class="line">        <span class="keyword">let</span> prev = <span class="keyword">self</span>.<span class="number">0</span>.fetch_sub(<span class="number">2</span>, Release);</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> prev &gt;&gt; <span class="number">1</span> == <span class="number">0</span> &#123;</span><br><span class="line">            <span class="comment">// Something went wrong</span></span><br><span class="line">            process::abort();</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">add_permits</span></span>(&amp;<span class="keyword">self</span>, n: <span class="built_in">usize</span>) &#123;</span><br><span class="line">        <span class="keyword">let</span> prev = <span class="keyword">self</span>.<span class="number">0</span>.fetch_sub(n 
&lt;&lt; <span class="number">1</span>, Release);</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> (prev &gt;&gt; <span class="number">1</span>) &lt; n &#123;</span><br><span class="line">            <span class="comment">// Something went wrong</span></span><br><span class="line">            process::abort();</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">is_idle</span></span>(&amp;<span class="keyword">self</span>) -&gt; <span class="built_in">bool</span> &#123;</span><br><span class="line">        <span class="keyword">self</span>.<span class="number">0</span>.load(Acquire) &gt;&gt; <span class="number">1</span> == <span class="number">0</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">close</span></span>(&amp;<span class="keyword">self</span>) &#123;</span><br><span class="line">        <span class="keyword">self</span>.<span class="number">0</span>.fetch_or(<span class="number">1</span>, Release);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">fn</span> <span class="title">is_closed</span></span>(&amp;<span class="keyword">self</span>) -&gt; <span class="built_in">bool</span> &#123;</span><br><span class="line">        <span class="keyword">self</span>.<span class="number">0</span>.load(Acquire) &amp; <span class="number">1</span> == <span class="number">1</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>无论是 unbounded 还是 bounded，最终都是到 chan::Tx 和 chan::Rx 这两个结构里面。这两个类都持有一个 <code>Arc&lt;Chan&lt;T, S&gt;&gt;</code>。</p><p>我们会看到有两个 
Tx：</p><ol><li>chan::Tx 是一个 <code>Arc&lt;Chan&lt;T, S&gt;&gt;</code>，它实际上封装了 Chan，更上层一点</li><li>list::Tx 是一个 Block 的链表。它是 <code>Chan&lt;T, S&gt;</code> 的 field，更底层一点</li></ol><p>list::Tx 是一个无锁队列。这个队列的内存是以 Block 为基础分配的，每个 block 能装 <code>const BLOCK_CAP: usize = 32;</code> 个 Value。所以，Chan 中实现了解耦：</p><ul><li>list 模块只负责无锁队列的实现</li><li>Chan 的其他部分负责容量控制、notify/waker 的逻辑</li></ul><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">pub</span>(<span class="keyword">super</span>) <span class="class"><span class="keyword">struct</span> <span class="title">Chan</span></span>&lt;T, S&gt; &#123;</span><br><span class="line">    <span class="comment">/// Handle to the push half of the lock-free list.</span></span><br><span class="line">    tx: CachePadded&lt;list::Tx&lt;T&gt;&gt;,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// Receiver waker. 
Notified when a value is pushed into the channel.</span></span><br><span class="line">    rx_waker: CachePadded&lt;AtomicWaker&gt;,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// Notifies all tasks listening for the receiver being dropped.</span></span><br><span class="line">    notify_rx_closed: Notify,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// Coordinates access to channel's capacity.</span></span><br><span class="line">    semaphore: S,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// Tracks the number of outstanding sender handles.</span></span><br><span class="line">    <span class="comment">///</span></span><br><span class="line">    <span class="comment">/// When this drops to zero, the send half of the channel is closed.</span></span><br><span class="line">    tx_count: AtomicUsize,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// Tracks the number of outstanding weak sender handles.</span></span><br><span class="line">    tx_weak_count: AtomicUsize,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// Only accessed by `Rx` handle.</span></span><br><span class="line">    rx_fields: UnsafeCell&lt;RxFields&lt;T&gt;&gt;,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="block-的实现"><a href="#block-的实现" class="headerlink" title="block 的实现"></a>block 的实现</h3><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#[cfg(all(target_pointer_width = <span class="meta-string">"64"</span>, not(loom)))]</span></span><br><span class="line"><span class="keyword">const</span> BLOCK_CAP: <span class="built_in">usize</span> = <span class="number">32</span>;</span><br><span class="line"></span><br><span class="line"><span class="meta">#[repr(transparent)]</span></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">Values</span></span>&lt;T&gt;([UnsafeCell&lt;MaybeUninit&lt;T&gt;&gt;; BLOCK_CAP]);</span><br><span class="line"></span><br><span class="line"><span class="keyword">pub</span>(<span class="keyword">crate</span>) <span class="class"><span class="keyword">struct</span> <span class="title">Block</span></span>&lt;T&gt; &#123;</span><br><span class="line">    <span class="comment">/// The header fields.</span></span><br><span class="line">    header: BlockHeader&lt;T&gt;,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// Array containing values pushed into the block. 
Values are stored in a</span></span><br><span class="line">    <span class="comment">/// continuous array in order to improve cache line behavior when reading.</span></span><br><span class="line">    <span class="comment">/// The values must be manually dropped.</span></span><br><span class="line">    values: Values&lt;T&gt;,</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">/// Extra fields for a `Block&lt;T&gt;`.</span></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">BlockHeader</span></span>&lt;T&gt; &#123;</span><br><span class="line">    <span class="comment">/// The start index of this block.</span></span><br><span class="line">    <span class="comment">///</span></span><br><span class="line">    <span class="comment">/// Slots in this block have indices in `start_index .. start_index + BLOCK_CAP`.</span></span><br><span class="line">    start_index: <span class="built_in">usize</span>,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// The next block in the linked list.</span></span><br><span class="line">    next: AtomicPtr&lt;Block&lt;T&gt;&gt;,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// Bitfield tracking slots that are ready to have their values consumed.</span></span><br><span class="line">    ready_slots: AtomicUsize,</span><br><span class="line"></span><br><span class="line">    <span class="comment">/// The observed `tail_position` value *after* the block has been passed by</span></span><br><span class="line">    <span class="comment">/// `block_tail`.</span></span><br><span class="line">    observed_tail_position: UnsafeCell&lt;<span class="built_in">usize</span>&gt;,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>看 Block::grow，就是一个无锁链表的实现 </p><figure class="highlight rust"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">...</span><br><span class="line">        <span class="keyword">let</span> <span class="keyword">mut</span> curr = next;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// <span class="doctag">TODO:</span> Should this iteration be capped?</span></span><br><span class="line">        <span class="keyword">loop</span> &#123;</span><br><span class="line">            <span class="keyword">let</span> actual = <span class="keyword">unsafe</span> &#123; curr.as_ref().try_push(&amp;<span class="keyword">mut</span> new_block, AcqRel, Acquire) &#125;;</span><br><span class="line"></span><br><span class="line">            curr = <span class="keyword">match</span> actual &#123;</span><br><span class="line">                <span class="literal">Ok</span>(()) =&gt; &#123;</span><br><span class="line">                    <span class="keyword">return</span> next;</span><br><span class="line">                &#125;</span><br><span class="line">                <span class="literal">Err</span>(curr) =&gt; curr,</span><br><span class="line">            &#125;;</span><br><span class="line"></span><br><span class="line">            crate::loom::thread::yield_now();</span><br><span class="line">        &#125;</span><br><span class="line">...</span><br></pre></td></tr></table></figure><h4 id="为什么是关于-Block-的无锁队列？"><a href="#为什么是关于-Block-的无锁队列？" 
class="headerlink" title="为什么是关于 Block 的无锁队列？"></a>为什么是关于 Block 的无锁队列？</h4><blockquote><p>为什么是关于 Block 的无锁队列，而不是关于 Value 的呢？</p></blockquote><ol><li>减少锁竞争和原子操作 (Reduced Contention and Atomic Operations)<br> 集中操作 (Batch Operations): 在高并发场景下，如果每次发送一个 Value 就需要对共享队列（如链表或原子指针）进行一次原子操作（例如 CAS - Compare-and-Swap）来添加节点，那么多个发送者 (Multi-Producer) 之间的竞争会非常激烈，导致性能瓶颈。<br> Block 机制: 使用 Block（块），每个 Block 中可以存放多个 Value。这样，发送者可以一次性分配一个 Block，并填充多个 Value。在将这个 Block 链接到队列末尾时，只需要进行一次原子操作来更新队列的尾指针。这大大减少了对核心共享数据结构（队列头/尾）的原子操作次数，从而降低了锁竞争和系统开销。<br> 局部性 (Locality): 一旦一个发送者成功获取并填充了一个 Block，它就可以在没有竞争的情况下，向该 Block 中写入若干条消息，这利用了 CPU 缓存的局部性原理。</li><li>优化内存分配 (Optimized Memory Allocation)<br> 批量分配: 每次发送一个 Value 就进行一次内存分配是低效的。Block 允许一次性分配一块较大的内存，用于存储多个 Value。<br> 更少的元数据: 如果每条 Value 都是一个独立的链表节点，那么每个节点都需要存储一个指针（指向下一个节点）作为元数据。在 Block 方案中，只有 Block 之间有链接指针，一个 Block 内部的多个 Value 可以紧凑存储，减少了内存开销。</li></ol><h2 id="channel-创建"><a href="#channel-创建" class="headerlink" title="channel 创建"></a>channel 创建</h2><h2 id="send-实现"><a href="#send-实现" class="headerlink" title="send 实现"></a>send 实现</h2><h3 id="unbounded"><a href="#unbounded" class="headerlink" title="unbounded"></a>unbounded</h3><p>unbounded 的 send 的实现，可以看到，最终调用了 Chan 的 send，后面会详细介绍。</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">pub</span> <span class="function"><span class="keyword">fn</span> <span class="title">send</span></span>(&amp;<span class="keyword">self</span>, message: T) -&gt; <span class="built_in">Result</span>&lt;(), SendError&lt;T&gt;&gt; &#123;</span><br><span class="line">    <span class="keyword">if</span> !<span class="keyword">self</span>.inc_num_messages() &#123;</span><br><span class="line"> 
       <span class="keyword">return</span> <span class="literal">Err</span>(SendError(message));</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">self</span>.chan.send(message);</span><br><span class="line">    <span class="literal">Ok</span>(())</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>inc_num_messages 的逻辑如下。消息计数的最低位被用作“接收端已关闭”的标志位（为 1 时发送直接失败），所以计数每次增加 2，不会影响这个标志位</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> curr &amp; <span class="number">1</span> == <span class="number">1</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">.compare_exchange(curr, curr + <span class="number">2</span>, AcqRel, Acquire)</span><br></pre></td></tr></table></figure><h3 id="bounded"><a href="#bounded" class="headerlink" title="bounded"></a>bounded</h3><p>bounded 的 send 则要先从 semaphore 申请容量，对应下面的 reserve_inner：</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">async <span class="function"><span 
class="keyword">fn</span> <span class="title">reserve_inner</span></span>(&amp;<span class="keyword">self</span>, n: <span class="built_in">usize</span>) -&gt; <span class="built_in">Result</span>&lt;(), SendError&lt;()&gt;&gt; &#123;</span><br><span class="line">    crate::trace::async_trace_leaf().await;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> n &gt; <span class="keyword">self</span>.max_capacity() &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">Err</span>(SendError(()));</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">match</span> <span class="keyword">self</span>.chan.semaphore().semaphore.acquire(n).await &#123;</span><br><span class="line">        <span class="literal">Ok</span>(()) =&gt; <span class="literal">Ok</span>(()),</span><br><span class="line">        <span class="literal">Err</span>(_) =&gt; <span class="literal">Err</span>(SendError(())),</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">pub</span> <span class="function"><span class="keyword">fn</span> <span class="title">max_capacity</span></span>(&amp;<span class="keyword">self</span>) -&gt; <span class="built_in">usize</span> &#123;</span><br><span class="line">    <span class="keyword">self</span>.chan.semaphore().bound</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">pub</span>(<span class="keyword">crate</span>) <span class="function"><span class="keyword">fn</span> <span class="title">acquire</span></span>(&amp;<span class="keyword">self</span>, num_permits: <span class="built_in">usize</span>) -&gt; Acquire&lt;<span class="symbol">'_</span>&gt; &#123;</span><br><span class="line">    Acquire::new(<span class="keyword">self</span>, num_permits)</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="Chan-的-send-实现"><a href="#Chan-的-send-实现" class="headerlink" title="Chan 的 send 实现"></a>Chan 的 send 实现</h3><p>Chan 的 send 的实现如下</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">fn</span> <span class="title">send</span></span>(&amp;<span class="keyword">self</span>, value: T) &#123;</span><br><span class="line">    <span class="comment">// Push the value</span></span><br><span class="line">    <span class="keyword">self</span>.tx.push(value);</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Notify the rx task</span></span><br><span class="line">    <span class="keyword">self</span>.rx_waker.wake();</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>这个 rx_waker 是一个 AtomicWaker 对象。</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">pub</span>(<span class="keyword">crate</span>) <span class="class"><span class="keyword">struct</span> <span class="title">AtomicWaker</span></span> &#123;</span><br><span class="line">    state: AtomicUsize,</span><br><span class="line">    waker: UnsafeCell&lt;<span class="built_in">Option</span>&lt;std::task::Waker&gt;&gt;,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1>]]></content>
    
    
    <summary type="html">&lt;p&gt;基于 tokio 1.46.0 版本&lt;/p&gt;</summary>
    
    
    
    
    <category term="Rust" scheme="http://www.calvinneo.com/tags/Rust/"/>
    
  </entry>
  
  <entry>
    <title>“良定义”的状态</title>
    <link href="http://www.calvinneo.com/2025/11/20/well-defined-state/"/>
    <id>http://www.calvinneo.com/2025/11/20/well-defined-state/</id>
    <published>2025-11-20T15:09:06.000Z</published>
    <updated>2025-11-21T15:52:10.678Z</updated>
    
    <content type="html"><![CDATA[<p>我觉得“良定义”的状态需要具备：</p><ul><li>完备性</li><li>唯一性</li></ul><a id="more"></a><h1 id="Case-by-case"><a href="#Case-by-case" class="headerlink" title="Case by case"></a>Case by case</h1><h2 id="如何表示一个空区间？"><a href="#如何表示一个空区间？" class="headerlink" title="如何表示一个空区间？"></a>如何表示一个空区间？</h2><p>假设有一些 string，可以用 [s1, s2) 圈出其中的一部分。正因为是左闭右开区间，空集难以表示。这里只能通过</p><h2 id="如何表示“和上次一样”"><a href="#如何表示“和上次一样”" class="headerlink" title="如何表示“和上次一样”"></a>如何表示“和上次一样”</h2><h2 id="Option-lt-Vec-gt-还是-Vec？"><a href="#Option-lt-Vec-gt-还是-Vec？" class="headerlink" title="Option&lt;Vec&gt; 还是 Vec？"></a>Option&lt;Vec&lt;T&gt;&gt; 还是 Vec&lt;T&gt;？</h2><p>Vec 是 Option&lt;Vec&gt; 的上位类型。</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;我觉得“良定义”的状态需要具备：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;完备性&lt;/li&gt;
&lt;li&gt;唯一性&lt;/li&gt;
&lt;/ul&gt;</summary>
    
    
    
    
    <category term="数据结构" scheme="http://www.calvinneo.com/tags/数据结构/"/>
    
    <category term="编程思想" scheme="http://www.calvinneo.com/tags/编程思想/"/>
    
  </entry>
  
  <entry>
    <title>Efficient IO with io_uring 学习</title>
    <link href="http://www.calvinneo.com/2025/10/30/efficient-io-uring/"/>
    <id>http://www.calvinneo.com/2025/10/30/efficient-io-uring/</id>
    <published>2025-10-30T14:20:13.000Z</published>
    <updated>2025-11-06T15:16:59.866Z</updated>
    
    <content type="html"><![CDATA[<p>通过 <a href="https://kernel.dk/io_uring.pdf" target="_blank" rel="noopener">https://kernel.dk/io_uring.pdf</a> 简单学习下 io_uring。</p><a id="more"></a><h1 id="1-0-Introduction"><a href="#1-0-Introduction" class="headerlink" title="1.0 Introduction"></a>1.0 Introduction</h1><p>Linux 的读写 API 经历了：</p><ul><li><p>read</p></li><li><p>pread：增加了 offset</p></li><li><p>preadv：buffer 改为 iovec 数组的形式，支持分散读（scatter read）</p>  <figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">iovec</span></span></span><br><span class="line"><span class="class">&#123;</span></span><br><span class="line">    <span class="keyword">void</span> __user *iov_base;</span><br><span class="line">    <span class="keyword">__kernel_size_t</span> iov_len;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure></li><li><p>preadv2：加了 flags<br>  可以参考 <a href="https://www.man7.org/linux/man-pages/man2/preadv2.2.html" target="_blank" rel="noopener">https://www.man7.org/linux/man-pages/man2/preadv2.2.html</a></p></li></ul><p>但上述的 API 都是同步的。POSIX 有 <code>aio_</code> 系列的 API 标准，但是没啥人用，性能也不好。</p><p>Linux 有个 libaio，它和 POSIX 的 <code>aio_</code> 系列不是一个东西。但它也有问题：</p><ul><li><p>它要求 O_DIRECT，不然就和同步调用没啥区别。而 O_DIRECT 会 bypass cache，并且有严格的对齐要求，所以用途受限制。</p></li><li><p>即使满足 async 的所有条件，最终也不一定是 async 的。比如：</p><ul><li>如果要修改元数据，可能会 block</li><li>storage device 的 request slots 的数量是固定的<br>  这里的 request slots 表示 storage device 同时可以处理的并发数。<br>  传统存储协议如 SATA、SAS 中，只有一个命令队列，存放未完成的 io，它的长度就是 io depth。如果下层 storage device 的 request slots 数量小于 io depth，那么 io 请求就可能在 io 队列中等待。<br>  NVMe SSD 支持多个 Submission Queues (SQ) 和 Completion Queues (CQ)，每个 SQ 条目可对应一个正在执行的 I/O 命令。比如有 64 个 queue，每个 queue 深度是 
1024，那么理论上最多可并行执行 64 × 1024 = 65536 个命令。</li></ul></li><li><p>提交一个 io 需要复制 64 + 8 bytes。完成一个 io 需要复制 32 bytes。</p>  <figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">iocb</span> &#123;</span></span><br><span class="line">   __u64   aio_data;</span><br><span class="line">   __<span class="function">u32   <span class="title">PADDED</span><span class="params">(aio_key, aio_rw_flags)</span></span>;</span><br><span class="line">   __u16   aio_lio_opcode;</span><br><span class="line">   __s16   aio_reqprio;</span><br><span class="line">   __u32   aio_fildes;</span><br><span class="line">   __u64   aio_buf;</span><br><span class="line">   __u64   aio_nbytes;</span><br><span class="line">   __s64   aio_offset;</span><br><span class="line">   __u64   aio_reserved2;</span><br><span class="line">   __u32   aio_flags;</span><br><span class="line">   __u32   aio_resfd;</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">io_event</span> &#123;</span></span><br><span class="line">    __u64   data;</span><br><span class="line">    __u64   obj;</span><br><span class="line">    __s64   res;</span><br><span 
class="line">    __s64   res2;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><p>  Depending on your IO size, this can definitely be noticeable.<br>  IO always requires at least two system calls (submit + wait-for-completion), which in these post spectre/meltdown days is a serious slowdown.</p></li></ul><h1 id="2-0-Improving-the-status-quo"><a href="#2-0-Improving-the-status-quo" class="headerlink" title="2.0 Improving the status quo"></a>2.0 Improving the status quo</h1><p>一开始有一些改良 libaio 的工作：</p><ul><li>If you can extend and improve an existing interface, that’s preferable to providing a new one.</li><li>It’s a lot less work in general.</li></ul><p>libaio 主要有三个接口：</p><ul><li>io_setup</li><li>io_submit 用来提交一个 io</li><li>io_getevents 用来等待完成，并收获结果</li></ul><p>后面觉得，这种改良会把接口改得非常复杂，而且只能解决上面列出的一个问题。</p><h1 id="3-0-New-interface-design-goals"><a href="#3-0-New-interface-design-goals" class="headerlink" title="3.0 New interface design goals"></a>3.0 New interface design goals</h1><ul><li>Easy to use, hard to misuse.</li><li>Extendable. 希望这个接口不止支持 block oriented IO。对于网络，和非块存储设备，它都能适用。</li><li>Feature rich. Linux aio caters to a subset (of a subset) of applications. I did not want to create yet another<br>interface that only covered some of what applications need, or that required applications to reinvent the same<br>functionality over and over again (like IO thread pools).</li><li>Efficiency. While storage IO is mostly still block based and hence at least 512b or 4kb in size, efficiency at those<br>sizes is still critical for certain applications. Additionally, some requests may not even be carrying a data payload.<br>It was important that the new interface was efficient in terms of per-request overhead.</li><li>Scalability. While efficiency and low latencies are important, it’s also critical to provide the best performance<br>possible at the peak end. For storage in particular, we’ve worked very hard to deliver a scalable infrastructure. 
A<br>new interface should allow us to expose that scalability all the way back to applications.</li></ul><h1 id="4-0-Enter-io-uring"><a href="#4-0-Enter-io-uring" class="headerlink" title="4.0 Enter io_uring"></a>4.0 Enter io_uring</h1><p>首先是摘录作者的感言，性能必须从一开始，在<strong>设计接口</strong>的时候就考虑。</p><blockquote><p>Despite the ranked list of design goals, the initial design was centered around efficiency. Efficiency isn’t something that can be an afterthought, it has to be designed in from the start - you can’t wring it out of something later on once the interface is fixed.</p></blockquote><p>作者认为，新的设计要避免 submission 和 completion 事件在内核和用户空间之间的复制，也要避免 indirection，所以他由浅及深得出了下面几点：</p><ol><li>内核和用户空间需要 share 这些结构</li><li>因此，这些结构应该在内核和用户的共享内存中</li><li>因此，必须要去维护这里面的同步关系</li><li>如果要用锁，那么就肯定会有系统调用，系统调用肯定 overhead 就大了</li><li>因此，single producer single consumer ring buffer 是适合的</li></ol><p>考虑到对于 submission 事件，用户是生产者，内核是消费者；而 completion 事件则相反。所以需要两个队列：SQ 和 CQ。</p><h2 id="4-1-DATA-STRUCTURES"><a href="#4-1-DATA-STRUCTURES" class="headerlink" title="4.1 DATA STRUCTURES"></a>4.1 DATA STRUCTURES</h2><p>cqe 的后缀表示 Completion Queue Event。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">io_uring_cqe</span> &#123;</span></span><br><span class="line">   <span class="comment">// 从 submission 中透传过来</span></span><br><span class="line">   __u64 user_data;</span><br><span class="line">   __s32 res;</span><br><span class="line">   __u32 flags;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><p>sqe 则复杂很多</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">io_uring_sqe</span> &#123;</span></span><br><span class="line">   <span class="comment">// 操作类型，例如 IORING_OP_READV 表示向量读</span></span><br><span class="line">   __u8 opcode;</span><br><span class="line">   __u8 flags;</span><br><span class="line">   __u16 ioprio;</span><br><span class="line">   __s32 fd;</span><br><span class="line">   __u64 off;</span><br><span class="line">   <span class="comment">// 指向内存地址，如果是向量读写，则指向一个 iovec array 的地址</span></span><br><span class="line">   __u64 addr;</span><br><span class="line">   <span class="comment">// 表示长度，或者 iovec array 的长度</span></span><br><span class="line">   __u32 len;</span><br><span class="line">   <span class="keyword">union</span> &#123;</span><br><span class="line">      <span class="keyword">__kernel_rwf_t</span> rw_flags;</span><br><span class="line">      __u32 fsync_flags;</span><br><span class="line">      __u16 poll_events;</span><br><span class="line">      __u32 sync_range_flags;</span><br><span class="line">      __u32 msg_flags;   </span><br><span class="line">   &#125;;</span><br><span class="line">   __u64 user_data;</span><br><span class="line">   <span 
class="keyword">union</span> &#123;</span><br><span class="line">      __u16 buf_index;</span><br><span class="line">      <span class="comment">// 64 bytes 对齐</span></span><br><span class="line">      __u64 __pad2[<span class="number">3</span>];</span><br><span class="line">   &#125;;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><h2 id="4-2-COMMUNICATION-CHANNEL"><a href="#4-2-COMMUNICATION-CHANNEL" class="headerlink" title="4.2 COMMUNICATION CHANNEL"></a>4.2 COMMUNICATION CHANNEL</h2><p>SQ 和 CQ 的 indexing 是不太一样的，先从简单的 CQ 开始。</p><p>cqe 是一个内核和用户共享的 ring buffer，内核写会更新 tail，用户读会更新 head。ring buffer 的大小是 2 的幂，它的好处我在 <a href="/2018/07/23/redis_learn_object/">Redis底层对象实现原理分析</a>中有所解析。</p><p>如下所示，head 是可以自然溢出的。当然，正如我在 <a href="/2017/12/05/libutp%E6%BA%90%E7%A0%81%E7%AE%80%E6%9E%90/">libutp源码简析</a>或者<a href="https://github.com/calvinneo/atp" target="_blank" rel="noopener">ATP</a>中的实现那样，当 tail 比 head 小的时候，我们也可以认为发生了溢出。<br><code>cqring-&gt;cqes</code> 是被共享的结构。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">unsigned</span> head;</span><br><span class="line">head = cqring-&gt;head;</span><br><span class="line">read_barrier();</span><br><span class="line"><span class="keyword">if</span> (head != cqring-&gt;tail) &#123;</span><br><span class="line">   <span class="class"><span class="keyword">struct</span> <span class="title">io_uring_cqe</span> *<span 
class="title">cqe</span>;</span></span><br><span class="line">   <span class="keyword">unsigned</span> index;</span><br><span class="line">   index = head &amp; (cqring-&gt;mask);</span><br><span class="line">   cqe = &amp;cqring-&gt;cqes[index];</span><br><span class="line">   <span class="comment">/* process completed cqe here */</span></span><br><span class="line">   ...</span><br><span class="line">   <span class="comment">/* we've now consumed this entry */</span></span><br><span class="line">   head++;</span><br><span class="line">&#125;</span><br><span class="line">cqring-&gt;head = head;</span><br><span class="line">write_barrier();</span><br></pre></td></tr></table></figure><p>SQ 这边，就是用户生产，内核消费了。之前说到，SQ 的 indexing 不一样，它是有个 indirection 的。submission 的 ring buffer 中存放了 index，索引到 sqe 中的位置。例如下面的例子中，提交顺序是：sqe5 → sqe2 → sqe3。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">SQ array: [5, 2, 3]</span><br><span class="line">SQEs:     [sqe0, sqe1, sqe2, sqe3, sqe4, sqe5]</span><br></pre></td></tr></table></figure><p>在文章中，作者提出一个好处是可以将 request units 放到 internal structure 中，我理解就是后面看到的自定义的 <code>app_sq_ring</code>。另外，也能允许在一个操作中提交多个 sqe。我理解就是如下代码所示，先 fill sqe，再写 array 的操作。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">io_uring_sqe</span> *<span class="title">sqe</span>;</span></span><br><span 
class="line"><span class="keyword">unsigned</span> tail, index;</span><br><span class="line">tail = sqring-&gt;tail;</span><br><span class="line">index = tail &amp; (*sqring-&gt;ring_mask);</span><br><span class="line">sqe = &amp;sqring-&gt;sqes[index];</span><br><span class="line"><span class="comment">/* this call fills in the sqe entries for this IO */</span></span><br><span class="line">init_io(sqe);</span><br><span class="line"><span class="comment">/* fill the sqe index into the SQ ring array */</span></span><br><span class="line">sqring-&gt;<span class="built_in">array</span>[index] = index;</span><br><span class="line">tail++;</span><br><span class="line">write_barrier();</span><br><span class="line">sqring-&gt;tail = tail;</span><br><span class="line">write_barrier();</span><br></pre></td></tr></table></figure><p>只要 sqe 被内核消费了，application 就可以复用 sqe entry，即使内核还没有完全处理完毕，内核会在需要的时候复制这个结构。</p><p>这样，sqe 的生命周期就比较短，而 application 可能会发送更多的 submission，从而导致 CQ ring 可能溢出。所以默认下的 CQ ring 的大小是 SQ ring 的两倍。</p><p>Completion events 可能以任意顺序到达，它和 submission 的顺序是没有关系的。SQ 和 CQ 两个 ring 是独立运行的。但是每个 submission 事件和每个 completion 事件都能一一对应。</p><h1 id="5-0-io-uring-interface"><a href="#5-0-io-uring-interface" class="headerlink" title="5.0 io_uring interface"></a>5.0 io_uring interface</h1><p>下面介绍的是 io_uring 的“裸”接口，即 system call。</p><h2 id="io-uring-setup"><a href="#io-uring-setup" class="headerlink" title="io_uring_setup"></a>io_uring_setup</h2><p>entries 的取值是 1..=4096，表示 sqe 的数量，必须是 2 的幂。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">io_uring_setup</span><span class="params">(<span class="keyword">unsigned</span> entries, struct io_uring_params *params)</span></span>;</span><br></pre></td></tr></table></figure><p>params 如下</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">io_uring_params</span> &#123;</span></span><br><span class="line">   <span class="comment">// 由内核填写，表示支持多少个 sqe</span></span><br><span class="line">   __u32 sq_entries;</span><br><span class="line">   <span class="comment">// 由内核填写，表示支持多少个 cqe</span></span><br><span class="line">   __u32 cq_entries;</span><br><span class="line">   __u32 flags;</span><br><span class="line">   __u32 sq_thread_cpu;</span><br><span class="line">   __u32 sq_thread_idle;</span><br><span class="line">   __u32 resv[<span class="number">5</span>];</span><br><span class="line">   <span class="class"><span class="keyword">struct</span> <span class="title">io_sqring_offsets</span> <span class="title">sq_off</span>;</span></span><br><span class="line">   <span class="class"><span class="keyword">struct</span> <span class="title">io_cqring_offsets</span> <span class="title">cq_off</span>;</span></span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">io_sqring_offsets</span> &#123;</span></span><br><span class="line"> 
  __u32 head; <span class="comment">/* offset of ring head */</span></span><br><span class="line">   __u32 tail; <span class="comment">/* offset of ring tail */</span></span><br><span class="line">   __u32 ring_mask; <span class="comment">/* ring mask value */</span></span><br><span class="line">   __u32 ring_entries; <span class="comment">/* entries in ring */</span></span><br><span class="line">   __u32 flags; <span class="comment">/* ring flags */</span></span><br><span class="line">   __u32 dropped; <span class="comment">/* number of sqes not submitted */</span></span><br><span class="line">   __u32 <span class="built_in">array</span>; <span class="comment">/* sqe index array */</span></span><br><span class="line">   __u32 resv1;</span><br><span class="line">   __u64 resv2;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><p>io_uring_setup 返回的 int，实际上是一个 fd。如之前所说，对这个 fd 做 mmap，就能得到内核和用户共享的内存。而 sq_off 和 cq_off 则给出了 SQ 和 CQ 的各个字段在这块共享内存中的偏移。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">define</span> IORING_OFF_SQ_RING          0ULL</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> IORING_OFF_CQ_RING  0x8000000ULL <span class="comment">// 偏移 128MB 处是 CQ ring</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> IORING_OFF_SQES    0x10000000ULL</span></span><br></pre></td></tr></table></figure><p>用户可以自定义 sq ring 的结构，这个结构中的每个字段都是一个指向共享内存中对应位置的指针</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">app_sq_ring</span> &#123;</span></span><br><span class="line">   <span class="keyword">unsigned</span> *head;</span><br><span class="line">   <span class="keyword">unsigned</span> *tail;</span><br><span class="line">   <span class="keyword">unsigned</span> *ring_mask;</span><br><span class="line">   <span class="keyword">unsigned</span> *ring_entries;</span><br><span class="line">   <span class="keyword">unsigned</span> *flags;</span><br><span class="line">   <span class="keyword">unsigned</span> *dropped;</span><br><span class="line">   <span class="keyword">unsigned</span> *<span class="built_in">array</span>;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><p>如下面的 setup 所示，可以看到自定义的 sring 是如何通过 ptr 和 sq_off 组装起来的。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">struct app_sq_ring <span class="title">app_setup_sq_ring</span><span class="params">(<span class="keyword">int</span> ring_fd, struct io_uring_params *p)</span></span></span><br><span class="line"><span class="function"></span>&#123;</span><br><span class="line">   <span class="class"><span class="keyword">struct</span> <span class="title">app_sq_ring</span> <span class="title">sqring</span>;</span></span><br><span class="line">   <span 
class="keyword">void</span> *ptr;</span></span><br><span class="line">   ptr = mmap(<span class="literal">NULL</span>, p-&gt;sq_off.<span class="built_in">array</span> + p-&gt;sq_entries * <span class="keyword">sizeof</span>(__u32), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,</span><br><span class="line">   ring_fd, IORING_OFF_SQ_RING);</span><br><span class="line">   sqring.head = ptr + p-&gt;sq_off.head;</span><br><span class="line">   sqring.tail = ptr + p-&gt;sq_off.tail;</span><br><span class="line">   sqring.ring_mask = ptr + p-&gt;sq_off.ring_mask;</span><br><span class="line">   sqring.ring_entries = ptr + p-&gt;sq_off.ring_entries;</span><br><span class="line">   sqring.flags = ptr + p-&gt;sq_off.flags;</span><br><span class="line">   sqring.dropped = ptr + p-&gt;sq_off.dropped;</span><br><span class="line">   sqring.<span class="built_in">array</span> = ptr + p-&gt;sq_off.<span class="built_in">array</span>;</span><br><span class="line">   <span class="keyword">return</span> sqring;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="io-uring-enter"><a href="#io-uring-enter" class="headerlink" title="io_uring_enter"></a>io_uring_enter</h2><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">io_uring_enter</span><span class="params">(</span></span></span><br><span class="line"><span class="function"><span class="params">   <span class="keyword">unsigned</span> <span class="keyword">int</span> fd, <span class="comment">// io_uring_setup 返回的那个 fd</span></span></span></span><br><span class="line"><span class="function"><span class="params">   <span class="keyword">unsigned</span> <span 
class="keyword">int</span> to_submit, <span class="comment">// tells the kernel that there are up to that amount of sqes ready to be consumed and submitted</span></span></span></span><br><span class="line"><span class="function"><span class="params">   <span class="keyword">unsigned</span> <span class="keyword">int</span> min_complete, <span class="comment">// asks the kernel to wait for completion of that amount of requests.</span></span></span></span><br><span class="line"><span class="function"><span class="params">   <span class="keyword">unsigned</span> <span class="keyword">int</span> flags, </span></span></span><br><span class="line"><span class="function"><span class="params">   <span class="keyword">sigset_t</span> *sig)</span></span>;</span><br></pre></td></tr></table></figure><p>可以发现，这个 syscall 可以同时 submit 和 wait for completion，这个也对应了本文作者之前提到的对 aio 的批评之一。</p><p>flags 中有一个 IORING_ENTER_GETEVENTS 标志位，设置它，则内核会 actively wait for min_complete events to be available。简单来说，如果希望 wait for completion，则必须设置这个 flag。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">define</span> IORING_ENTER_GETEVENTS (1U &lt;&lt; 0)</span></span><br></pre></td></tr></table></figure><h2 id="5-1-SQE-ORDERING"><a href="#5-1-SQE-ORDERING" class="headerlink" title="5.1 SQE ORDERING"></a>5.1 SQE ORDERING</h2><p>这一节主要讲了如何实现 fsync/fdatasync。</p><p>因为之前提到 SQ 和 CQ 是完全独立的，所以这样的机制需要额外的设计。并且因为写入是乱序的，所以我们在乎的是确定所有的写入已经完成。</p><p>io_uring 的机制是，支持 draining the submission side queue，直到之前的 completion 事件都已经结束。在这之前，application 会将后续写入入队。</p><p>通过 IOSQE_IO_DRAIN 这个 flag 来实现这个特性，它会 stall 住整个 SQ。因此，application 可以考虑使用多个 io_uring context，来保证不相关的写是并行的。</p><blockquote><p>io_uring supports draining the submission side queue until all previous completions have finished. 
This allows the application to queue the above mentioned sync operation and know that it will not start before all previous commands have completed.</p></blockquote><h2 id="5-2-LINKED-SQES"><a href="#5-2-LINKED-SQES" class="headerlink" title="5.2 LINKED SQES"></a>5.2 LINKED SQES</h2><p>所有连续的指定了 IOSQE_IO_LINK 的 io 请求会被串联起来执行，这些请求一定是按照顺序执行的。但是它们和没有指定 IOSQE_IO_LINK 这个 flag 的请求之间的关系是不确定的。</p><h2 id="5-3-TIMEOUT-COMMANDS"><a href="#5-3-TIMEOUT-COMMANDS" class="headerlink" title="5.3 TIMEOUT COMMANDS"></a>5.3 TIMEOUT COMMANDS</h2><h1 id="6-0-Memory-ordering"><a href="#6-0-Memory-ordering" class="headerlink" title="6.0 Memory ordering"></a>6.0 Memory ordering</h1><p>在 <a href="/2017/12/28/Concurrency-Programming-Compare/">并发编程重要概念及比较</a> 中，我们知道 memory order 主要是考虑读-写和写-写问题，如下所示：</p><blockquote><p>read_barrier(): Ensure previous writes are visible before doing subsequent memory reads.<br>write_barrier(): Order this write after previous writes.</p></blockquote><p>我们也知道，不同的 CPU 架构的乱序执行逻辑是不一样的，所以这里只是讨论概念。</p><p>考虑用户侧写入一个 sqe，并且通知 kernel 可以去消费了。这就包含了两个过程：</p><ul><li>填写 sqe 中的字段，并且将 sqe index 写入 SQ ring array</li><li>更新 SQ ring 队列的 tail</li></ul><p>这个操作可以简化成下面的伪代码，每一行代表一个内存操作。如果没有合适的 memory order，CPU 是有理由进行乱序执行的。也就是说，无法保证 write 7 是在最后执行的。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">1: sqe→opcode = IORING_OP_READV;</span><br><span class="line">2: sqe→fd = fd;</span><br><span class="line">3: sqe→off = 0;</span><br><span class="line">4: sqe→addr = &amp;iovec;</span><br><span class="line">5: sqe→len = 1;</span><br><span class="line">6: sqe→user_data = some_value;</span><br><span class="line">7: sqring→tail = sqring→tail + 1;</span><br></pre></td></tr></table></figure><p>所以，需要添加如下的 write 
barrier</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">1: sqe→opcode = IORING_OP_READV;</span><br><span class="line">2: sqe→fd = fd;</span><br><span class="line">3: sqe→off = 0;</span><br><span class="line">4: sqe→addr = &amp;iovec;</span><br><span class="line">5: sqe→len = 1;</span><br><span class="line">6: sqe→user_data = some_value;</span><br><span class="line"> write_barrier(); /* ensure previous writes are seen before tail write */</span><br><span class="line">7: sqring→tail = sqring→tail + 1;</span><br><span class="line"> write_barrier(); /* ensure tail write is seen */</span><br></pre></td></tr></table></figure><h1 id="7-0-liburing-library"><a href="#7-0-liburing-library" class="headerlink" title="7.0 liburing library"></a>7.0 liburing library</h1><p>通过这个库，可以：</p><ul><li>不需要写一堆 boiler plate code</li><li>不需要考虑 memory ordering 的问题</li><li>不需要考虑自己维护 ring buffer 的问题</li></ul><h2 id="7-1-LIBURING-IO-URING-SETUP"><a href="#7-1-LIBURING-IO-URING-SETUP" class="headerlink" title="7.1 LIBURING IO_URING SETUP"></a>7.1 LIBURING IO_URING SETUP</h2><h1 id="8-0-Advanced-use-cases-and-features"><a href="#8-0-Advanced-use-cases-and-features" class="headerlink" title="8.0 Advanced use cases and features"></a>8.0 Advanced use cases and features</h1><h2 id="8-1-FIXED-FILES-AND-BUFFERS"><a href="#8-1-FIXED-FILES-AND-BUFFERS" class="headerlink" title="8.1 FIXED FILES AND BUFFERS"></a>8.1 FIXED FILES AND BUFFERS</h2><h2 id="8-2-POLLED-IO"><a href="#8-2-POLLED-IO" class="headerlink" title="8.2 POLLED IO"></a>8.2 POLLED IO</h2><h2 id="8-3-KERNEL-SIDE-POLLING"><a href="#8-3-KERNEL-SIDE-POLLING" class="headerlink" title="8.3 KERNEL SIDE 
POLLING"></a>8.3 KERNEL SIDE POLLING</h2><h1 id="9-0-Performance"><a href="#9-0-Performance" class="headerlink" title="9.0 Performance"></a>9.0 Performance</h1><h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul><li><a href="https://man7.org/linux/man-pages/man2/io_submit.2.html" target="_blank" rel="noopener">https://man7.org/linux/man-pages/man2/io_submit.2.html</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;通过 &lt;a href=&quot;https://kernel.dk/io_uring.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;https://kernel.dk/io_uring.pdf&lt;/a&gt; 简单学习下 io_uring。&lt;/p&gt;</summary>
    
    
    
    
    <category term="Linux" scheme="http://www.calvinneo.com/tags/Linux/"/>
    
    <category term="FileSystem" scheme="http://www.calvinneo.com/tags/FileSystem/"/>
    
  </entry>
  
  <entry>
    <title>LLM 基础概念和核心问题整理</title>
    <link href="http://www.calvinneo.com/2025/10/25/on-llm/"/>
    <id>http://www.calvinneo.com/2025/10/25/on-llm/</id>
    <published>2025-10-24T18:20:13.000Z</published>
    <updated>2026-01-09T18:08:41.278Z</updated>
    
    <content type="html"><![CDATA[<p>主要关注：</p><ul><li>LLM 的基础原理</li><li>KVCache</li></ul><a id="more"></a><h1 id="Attention-机制"><a href="#Attention-机制" class="headerlink" title="Attention 机制"></a>Attention 机制</h1><p>NLP 对 token 序列 X 的三种编码方式：</p><ul><li><p>RNN<br>  RNN 是递归的结构，所以只能串行计算。</p>  <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">y_t = f(y_&#123;t-1&#125;, x_t)</span><br></pre></td></tr></table></figure></li><li><p>CNN<br>  CNN 能够并行计算，但是因为引入了窗口，所以只能看到局部信息。</p>  <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">y_t = f(x_&#123;t-1&#125;, x_t, x_&#123;t+1&#125;)</span><br></pre></td></tr></table></figure></li><li><p>Attention</p></li></ul><h2 id="Q、K、V"><a href="#Q、K、V" class="headerlink" title="Q、K、V"></a>Q、K、V</h2><p>从定义上看，对于 token 流</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">x1, x2, x3, ... 
xn</span><br></pre></td></tr></table></figure><p>每个 xi 通过三组线性变换生成：</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">Qi = xi * Wq</span><br><span class="line">Ki = xi * Wk</span><br><span class="line">Vi = xi * Wv</span><br></pre></td></tr></table></figure><p>考虑下面的句子</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">The animal didn’t cross the street because it was too tired.</span><br></pre></td></tr></table></figure><p>句子中的每一个 token，都有一个自己的 Q。用 <code>Wq</code> 可以提取出这个 Q，如下：</p><ul><li>animal → 它在问：后面有没有补充信息？</li><li>because → 它在问：因果关系是什么？</li><li>it → 它在问：我指的是谁？</li><li>tired → 它在问：谁在 tired？</li></ul><p>对于 K，则告诉了这个 token 可以回答什么样的 Q：</p><ul><li>animal → 我是一个名词、可能是指代目标</li><li>street → 我是一个地点名词</li><li>because → 我是因果连接词</li><li>tired → 我是状态形容词</li><li>cross → 我是动作</li></ul><p>对于 V，承载了语义信息的本体：</p><ul><li>animal → 动物这个实体的语义</li><li>street → 街道的概念</li><li>tired → 疲劳的状态语义</li><li>cross → 穿越动作</li></ul><h2 id="Self-Attention"><a href="#Self-Attention" class="headerlink" title="Self-Attention"></a>Self-Attention</h2><p>Self-Attention 指的是 Q、K、V 都来自同一组 token 的 attention。即</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">X → Q = XWq</span><br><span class="line">X → K = XWk</span><br><span class="line">X → V = XWv</span><br></pre></td></tr></table></figure><p>然后</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Attention(Q, K, V)</span><br></pre></td></tr></table></figure><p>为什么叫 
Self？</p><ul><li>不是去外部文档查</li><li>不是去另一段文本查</li><li>而是在自己这段序列内部相互对齐</li></ul><p>对应的是 Cross-Attention：</p><ul><li>Q 来自 Decoder</li><li>K/V 来自 Encoder</li></ul><p>容易发现，Cross-Attention 更适合翻译或者对齐另一段文本。而 Self-Attention 更适合理解一句话内部关系。</p><h2 id="Multi-Head-Attention"><a href="#Multi-Head-Attention" class="headerlink" title="Multi-Head Attention"></a>Multi-Head Attention</h2><p>如果一个 token 同时想问多种不同类型的问题怎么办？引入多组问题呗。所以上面的三个矩阵会变成三组矩阵。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">Wq¹ Wk¹ Wv¹</span><br><span class="line">Wq² Wk² Wv²</span><br><span class="line">Wq³ Wk³ Wv³</span><br><span class="line">...</span><br></pre></td></tr></table></figure><p>还是对上面的例子而言</p><p>对 it 这个 token：</p><ul><li>Head1 问“我指代谁？”，会关注 animal（指代）</li><li>Head2 问“是否有因果关系？”，会关注 because（因果）</li><li>Head3 问“我与哪个动词相关？”，会关注 cross（动作）</li></ul><h1 id="KVCache"><a href="#KVCache" class="headerlink" title="KVCache"></a>KVCache</h1><h2 id="Why"><a href="#Why" class="headerlink" title="Why"></a>Why</h2><p>Token 是模型处理文本的最小离散单位。所以 LLM 并不是直接处理文字，而是直接处理 token。Token 是通过分词器从文本切出来的子串单位。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&quot;Hello world&quot; → [15496, 2159]</span><br></pre></td></tr></table></figure><p>但是</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">&quot;refund&quot; = 1 token</span><br><span class="line">&quot;refunding&quot; = [&quot;refund&quot;, &quot;ing&quot;]</span><br></pre></td></tr></table></figure><p>不同的模型能够接受不同的上下文长度，因此，它们的 KVCache 也要更大：</p><ul><li>GPT-3.5 4k tokens</li><li>GPT-4   8k / 32k</li><li>Claude  100k</li></ul><p>那么 token 是不是特指用户 prompt输入的 token 
或者模型输出的 token 呢？其实根据下面的自回归生成，这两个是一个东西。</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">prompt1 → prompt2 → prompt3 → output1 → output2 ...</span><br></pre></td></tr></table></figure><p>自回归生成：模型按顺序生成 token，每个 token 都只依赖之前已经生成的 token。<br>例如，下面的 token 序列中，x1 到 x3 是用户的 prompt 输入</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">x1 → x2 → x3 → x4 → ... → xT</span><br></pre></td></tr></table></figure><p>则 xT 生成的方式是</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">P(x_T | x_1, x_2, ..., x_&#123;T-1&#125;)</span><br></pre></td></tr></table></figure><p>由此可见，自回归生成导致推理是串行的。为什么要自回归生成呢？原因比较深入，可以理解为：</p><ul><li>语言本质是序列，唯一通用可行的分解方式就是 chain rule</li><li>自回归训练极其稳定<br>  输入是前缀，目标是预测下一个 token，loss 是交叉熵。不需要强化学习。</li><li>从历史演化角度来看，n-gram、RNN、LSTM 等都是自回归的</li></ul><p>因为自回归生成，所以导致了 KVCache 的出现。KVCache 把 O(n²) 变成 O(n)。</p><p>KVCache 具有如下的形式：</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">KVCache[layer][token_index] = (K, V)</span><br></pre></td></tr></table></figure><ul><li><p>layer 是 Transformer 架构的层<br>  现在的 LLM 大都是 Decoder only 的架构，所以这里的层如下所示。</p>  <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">Input 
tokens</span><br><span class="line">  ↓</span><br><span class="line">Embedding</span><br><span class="line">  ↓</span><br><span class="line">Decoder Block 1</span><br><span class="line">  ↓</span><br><span class="line">Decoder Block 2</span><br><span class="line">  ↓</span><br><span class="line">...</span><br><span class="line">  ↓</span><br><span class="line">Decoder Block N</span><br><span class="line">  ↓</span><br><span class="line">LM Head</span><br></pre></td></tr></table></figure><p>  不同层的 schema 都不同：</p><ul><li>第一层：按字形索引</li><li>中间层：按语法索引</li><li>高层：按语义索引</li></ul><p>  每一层内部包含：</p><ul><li>Masked Self-Attention<br>  让每个 token 从历史 token 中选择性地读取信息。<br>  当前 token 只能看到自己和之前的 token，不能看到未来的 token。</li><li>FFN<br>  这里就是前馈神经网络，目的是在单个 token 维度上做语义升维和重映射。可以看成是在理解输入的 token。</li><li>Residual / Norm</li></ul></li><li><p>token_index 表示这是第几个 token<br>  因为 attention 在第 t 步需要：当前 token 的 Q、对比所有历史 token 的 K、加权读取所有历史 token 的 V。所以这些 K 和 V 需要按照 token_index 来存储。</p></li><li><p>K 和 V<br>  K 是我提供什么信息给别人关注。<br>  V 是别人关注我时能读到什么内容。</p></li><li><p>为什么不需要缓存 Q？<br>  因为 Q 只在当前步骤被使用。<br>  在第 t 步，会用 <code>Q_t</code> 去访问 <code>K_{0..t-1}</code>, <code>V_{0..t-1}</code>。但是在未来，不会再去访问 <code>Q_t</code> 了。</p></li></ul><p>如果没有 KVCache，每生成一个新 token 需要重新计算所有历史 token 的 K/V 复杂度是 O(n²)</p><h1 id="相关场景"><a href="#相关场景" class="headerlink" title="相关场景"></a>相关场景</h1><ul><li>训练 infra<br>  分布式训练、参数同步、checkpoint、通信优化<br>  DeepSpeed, Megatron-LM, FSDP, NCCL, ZeRO</li><li>推理 infra<br>  模型加载、KV Cache 管理、动态批处理、并发调度<br>  vLLM, TensorRT-LLM, TGI, Ray Serve</li><li>模型存储与加载<br>  权重分片、lazy loading、权重格式<br>  Safetensors, GGUF, Tensor Parallel</li><li>向量检索<br>  向量数据库、索引结构、量化 FAISS, Milvus, ScaNN, HNSW, IVF</li><li>资源编排与调度<br>  相比传统 k8s 多了 GPU 调度、混部、弹性伸缩。<br>  相关技术：K8S, Ray, RunPod, vGPU</li><li>数据管线与特征存储<br>  数据清洗、分片、版本控制<br>  Petastorm, Delta Lake, Feature Store</li></ul><h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul><li><a 
href="https://zh.d2l.ai/chapter_attention-mechanisms/attention-cues.html" target="_blank" rel="noopener">https://zh.d2l.ai/chapter_attention-mechanisms/attention-cues.html</a></li><li><a href="https://transformers.run/c1/attention/" target="_blank" rel="noopener">https://transformers.run/c1/attention/</a></li><li><a href="https://zhuanlan.zhihu.com/p/338817680" target="_blank" rel="noopener">https://zhuanlan.zhihu.com/p/338817680</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;主要关注：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM 的基础原理&lt;/li&gt;
&lt;li&gt;KVCache&lt;/li&gt;
&lt;/ul&gt;</summary>
    
    
    
    
    <category term="机器学习" scheme="http://www.calvinneo.com/tags/机器学习/"/>
    
    <category term="LLM" scheme="http://www.calvinneo.com/tags/LLM/"/>
    
  </entry>
  
  <entry>
    <title>Persistent data structures</title>
    <link href="http://www.calvinneo.com/2025/10/06/persistent-data-structures/"/>
    <id>http://www.calvinneo.com/2025/10/06/persistent-data-structures/</id>
    <published>2025-10-05T18:20:13.000Z</published>
    <updated>2025-11-21T17:13:07.931Z</updated>
    
    <content type="html"><![CDATA[<p>在 rust 中，immutable 的数据结构的性质是非常好的。在大部分函数式语言中，都不允许存在 mutable 的数据。</p><p>如果要在不可变数据结构上进行修改，就需要 clone 一份出来。因此：</p><ul><li>对于一些较大的结构，希望能够尽量复用</li><li>如果此时只有一份引用，则可以直接获取 mut 引用就地修改</li></ul><p>所以有了 Persistent data structures 的概念：</p><ul><li>每一次修改该结构，都会保留之前的版本</li><li>历史的版本可以被查询</li><li>如果历史版本的数据也支持修改，则称为 Full persistence，否则称为 Partial persistence</li></ul><a id="more"></a><h1 id="实现方案"><a href="#实现方案" class="headerlink" title="实现方案"></a>实现方案</h1><h2 id="Copy-on-Write"><a href="#Copy-on-Write" class="headerlink" title="Copy on Write"></a>Copy on Write</h2><p>用一个数组存放所有的历史版本，非常 bruteforce。</p><h2 id="Fat-node"><a href="#Fat-node" class="headerlink" title="Fat node"></a>Fat node</h2><p>为每一个 field 维护历史记录。例如</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">Node</span> &#123;</span></span><br><span class="line">    <span class="keyword">int</span> value;</span><br><span class="line">    Node* left;</span><br><span class="line">    Node* right;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>会变成</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">FatNode</span> &#123;</span></span><br><span class="line">    <span class="built_in">vector</span>&lt;Pair&lt;version, value&gt;&gt; value_history;</span><br><span class="line">    <span class="built_in">vector</span>&lt;Pair&lt;version, Node*&gt;&gt; 
left_history;</span><br><span class="line">    <span class="built_in">vector</span>&lt;Pair&lt;version, Node*&gt;&gt; right_history;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>查询的过程就有点像是 MVCC 了。给定一个 version，去 lower_bound 找到最大的小于 version 的修改。</p><p>容易看到，Fat node 不支持 Full persistence。</p><h2 id="Split"><a href="#Split" class="headerlink" title="Split"></a>Split</h2><p>Fat Node 不能无限制增长，否则：</p><ul><li>历史太长</li><li>查询复杂度上升</li><li>节点缓存局部性变差</li></ul><p>因此，需要将节点 split 为两个节点：</p><ul><li>新节点记录最新版本的值</li><li>老节点保留早期历史</li><li>结构中指向该节点的指针，也在相应版本中被更新为指向新的节点</li></ul><h2 id="Path-copying"><a href="#Path-copying" class="headerlink" title="Path copying"></a>Path copying</h2><h1 id="常见的结构实现"><a href="#常见的结构实现" class="headerlink" title="常见的结构实现"></a>常见的结构实现</h1><h2 id="List"><a href="#List" class="headerlink" title="List"></a>List</h2><h2 id="Vec"><a href="#Vec" class="headerlink" title="Vec"></a>Vec</h2><h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul><li><a href="https://github.com/orium/rpds" target="_blank" rel="noopener">https://github.com/orium/rpds</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;在 rust 中，immutable 的数据结构的性质是非常好的。在大部分函数式语言中，都不允许存在 mutable 的数据。&lt;/p&gt;
&lt;p&gt;如果要在不可变数据结构上进行修改，就需要 clone 一份出来。因此：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;对于一些较大的结构，希望能够尽量复用&lt;/li&gt;
&lt;li&gt;如果此时只有一份引用，则可以直接获取 mut 引用就地修改&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;所以有了 Persistent data structures 的概念：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;每一次修改该结构，都会保留之前的版本&lt;/li&gt;
&lt;li&gt;历史的版本可以被查询&lt;/li&gt;
&lt;li&gt;如果历史版本的数据也支持修改，则称为 Full persistence，否则称为 Partial persistence&lt;/li&gt;
&lt;/ul&gt;</summary>
    
    
    
    
    <category term="Rust" scheme="http://www.calvinneo.com/tags/Rust/"/>
    
    <category term="数据结构" scheme="http://www.calvinneo.com/tags/数据结构/"/>
    
  </entry>
  
  <entry>
    <title>Zero-Copy 技术</title>
    <link href="http://www.calvinneo.com/2025/09/30/zero-copy/"/>
    <id>http://www.calvinneo.com/2025/09/30/zero-copy/</id>
    <published>2025-09-30T15:07:22.000Z</published>
    <updated>2026-01-14T18:35:35.431Z</updated>
    
    <content type="html"><![CDATA[<p>介绍 Linux 中的零拷贝技术。从 <a href="/2025/03/09/learn-fuse/">Fuse 学习</a> 中独立出来。</p><a id="more"></a><h1 id="read、write-接口"><a href="#read、write-接口" class="headerlink" title="read、write 接口"></a>read、write 接口</h1><p>从普通文件 read，涉及两次复制：</p><ul><li>从磁盘通过 DMA 读到内核的 page cache<br>  这里的 page cache 机制也是一种 kernel buffer，但专门提供给磁盘文件的。</li><li>从内核的 page cache 复制到 user buffer</li></ul><p>从套接口读数据：</p><ul><li>从网卡通过 DMA 直接写入 kernel buffer</li><li>从 kernel buffer 复制到 user buffer</li></ul><p>注意，在使用 DMA 之前，磁盘读出来的数据会放到一个寄存器里面，然后通过中断通知 CPU 把数据写到临时的内存中攒批，最后写到 page cache 中。但是该方式性能太差，早已经淘汰了。</p><p>读数据过程：</p><ul><li>调用 read() 函数陷入内核，第一次 context switch</li><li>DMA 控制器将数据从磁盘拷贝到 kernel buffer，这是第一次 DMA 拷贝</li><li>CPU 将数据从 kernel buffer 复制到 user buffer，这是第一次 CPU 拷贝</li><li>CPU 完成拷贝之后，read() 函数返回到用户态，第二次 context switch</li></ul><p>写过程类似。</p><h1 id="mmap"><a href="#mmap" class="headerlink" title="mmap"></a>mmap</h1><p>把 kernel space 的页映射到 user space，所以可以避免从 kernel space 到 user space 的一次复制。<br>关于 mmap 可以见 <a href="/2025/01/03/memory-context-knowledge/">内存领域知识</a>。</p><h1 id="sendfile"><a href="#sendfile" class="headerlink" title="sendfile"></a>sendfile</h1><h2 id="原始-sendfile"><a href="#原始-sendfile" class="headerlink" title="原始 sendfile"></a>原始 sendfile</h2><p>sendfile 将数据从磁盘读到内核的 page cache，然后将 page cache 复制到 socket 的 buffer 中。</p><p>它的好处是减少了 syscall 的次数。将 read + write 或者 mmap + write 打包了。<br>但是，仍然需要 2 次 DMA 拷贝和 1 次 CPU 拷贝。</p><h2 id="sendfile-DMA-优化"><a href="#sendfile-DMA-优化" class="headerlink" title="sendfile + DMA 优化"></a>sendfile + DMA 优化</h2><p>将从 page cache 到 socket buffer 的那一次 CPU 拷贝去掉了。DMA 可以直接从 page cache 拷贝数据到网卡里面。</p><h1 id="splice"><a href="#splice" class="headerlink" title="splice"></a>splice</h1><p>限制是 fd_in 和 fd_out 中，至少有一个是 pipe：</p><ul><li>如果 fd_in 是 pipe，那么 off_in 必须是 NULL</li><li>如果 fd_in 不是 pipe，且 off_in 是 NULL，那么 bytes are read from fd_in starting from the file offset, and the file offset is adjusted appropriately.</li><li>如果 fd_in 不是 pipe，且 
off_in 不是 NULL，off_in must point to a buffer which specifies the starting offset from which bytes will be read from fd_in; in this case, the file offset of fd_in is not changed, and the offset pointed to by off_in is adjusted appropriately instead.</li></ul><p>这里解释一下什么是 linux 中的管道：</p><ul><li>匿名管道（anonymous pipe）<br>  由父进程创建，用在具有亲缘关系的进程之间通信。<br>  通过 pipe() 系统调用创建，返回一对文件描述符：一个用于写，一个用于读。<br>  只存在于内存中，它不是一个磁盘上的文件，不能用 ls 查看，也没有 inode 号。</li><li>命名管道（named pipe，也叫 FIFO）<br>  具有名字的管道，可以存在于文件系统中，有路径。文件类型是 p，代表 pipe。<br>  通过 mkfifo 命令或者 mkfifo() 系统调用创建。<br>  可以实现非亲缘进程之间的通信。</li></ul><p>所有的匿名管道都支持 splice，通常借助匿名管道来实现 zero copy。此时，pipefd 就起到了中转管道的作用，它连接了两个彼此之间不支持零拷贝的 fd。我觉得是一个比较有意思的设计，通过匿名管道的中介，减少了不同 fd 之间实现相互 zero copy 的复杂度。</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">splice(file_fd, <span class="literal">NULL</span>, pipefd[<span class="number">1</span>], <span class="literal">NULL</span>, len, <span class="number">0</span>);</span><br><span class="line">splice(pipefd[<span class="number">0</span>], <span class="literal">NULL</span>, socket_fd, <span class="literal">NULL</span>, len, <span class="number">0</span>);</span><br></pre></td></tr></table></figure><p>一些命名管道也支持 splice，但是可能只是可读写，非零拷贝中转。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">ssize_t</span> splice(<span class="keyword">int</span> fd_in, <span class="keyword">off_t</span> *_Nullable off_in,</span><br><span class="line">              <span class="keyword">int</span> fd_out, <span class="keyword">off_t</span> *_Nullable off_out,</span><br><span class="line">              <span class="keyword">size_t</span> size, <span class="keyword">unsigned</span> <span class="keyword">int</span> 
flags);</span><br></pre></td></tr></table></figure><p>Flag 如下：</p><ul><li>SPLICE_F_MOVE<br>  Attempt to move pages instead of copying. 这里的 move 指的是内核页缓存中的物理页面的引用在 fd 之间进行转移。而不需要读出、复制到 user space、写入这样的流程了。<br>  注意，这个 flag 只是一个 hint。如果内核无法移动（例如 pipe buffer 并不指向整个页面），则还是需要复制。<br>  The initial implementation of this flag was buggy: therefore starting in Linux 2.6.21 it is a no-op (but is still permitted in a splice() call); in the future, a correct implementation may be restored.</li><li>SPLICE_F_NONBLOCK<br>  Do not block on I/O. This makes the splice pipe operations nonblocking, but splice() may nevertheless block because the file descriptors that are spliced to/from may block (unless they have the O_NONBLOCK flag set).</li><li>SPLICE_F_MORE<br>  More data will be coming in a subsequent splice. This is a helpful hint when the fd_out refers to a socket (see also<br>  the description of MSG_MORE in send(2), and the description of TCP_CORK in tcp(7)).</li><li>SPLICE_F_GIFT<br>  Unused for splice(); see vmsplice(2).</li></ul><h1 id="vmsplice"><a href="#vmsplice" class="headerlink" title="vmsplice"></a>vmsplice</h1><p>splice 主要是服务内核空间中的数据传输，原因是指定的都是 fd 或者 pipe，并不包含用户空间中内存的信息。<br>而 vmsplice 主要服务用户空间和管道之间的数据读写，它们都能实现零拷贝。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">define</span> _GNU_SOURCE         <span class="comment">/* See feature_test_macros(7) */</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;fcntl.h&gt;</span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">ssize_t</span> vmsplice(<span class="keyword">int</span> fd, <span class="keyword">const</span> struct iovec 
*iov,</span><br><span class="line">                <span class="keyword">size_t</span> nr_segs, <span class="keyword">unsigned</span> <span class="keyword">int</span> flags);</span><br></pre></td></tr></table></figure><p><code>iov</code> 是一个长度为 <code>nr_segs</code> 的数组，表示用户内存中的多段可能不连续的 buffer。</p><p>参数：</p><ul><li><p>SPLICE_F_MOVE<br>  Unused for vmsplice(); see splice(2).</p></li><li><p>SPLICE_F_NONBLOCK<br>  Do not block on I/O; see splice(2) for further details.</p></li><li><p>SPLICE_F_MORE<br>  Currently has no effect for vmsplice(), but may be implemented in the future; see splice(2).</p></li><li><p>SPLICE_F_GIFT<br>  The user pages are a gift to the kernel.<br>  表示用户程序不会修改这段 buffer，否则，page cache 和磁盘中的数据就可能不一致。<br>  将 pages gifting 给内核意味着后面的 splice SPLICE_F_MOVE 能够成功移动 pages。如果不指定，则后续的 splice SPLICE_F_MOVE 必须复制。<br>  数据必须要 page aligned。我理解这里指的是：</p><ul><li><code>iovec[i].iov_base</code> 需要对齐到页</li><li><code>iov_len</code> 需要是页大小的整数倍</li></ul><p>  如果不满足，则退化到 copy 的行为。</p></li></ul><h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul><li><a href="https://man7.org/linux/man-pages/man2/splice.2.html" target="_blank" rel="noopener">https://man7.org/linux/man-pages/man2/splice.2.html</a></li><li><a href="https://man7.org/linux/man-pages/man2/pipe.2.html" target="_blank" rel="noopener">https://man7.org/linux/man-pages/man2/pipe.2.html</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;介绍 Linux 中的零拷贝技术。从 &lt;a href=&quot;/2025/03/09/learn-fuse/&quot;&gt;Fuse 学习&lt;/a&gt; 中独立出来。&lt;/p&gt;</summary>
    
    
    
    
    <category term="Linux" scheme="http://www.calvinneo.com/tags/Linux/"/>
    
    <category term="FileSystem" scheme="http://www.calvinneo.com/tags/FileSystem/"/>
    
  </entry>
  
  <entry>
    <title>My Experience of Building a Hybrid Rust/C++ Project</title>
    <link href="http://www.calvinneo.com/2025/07/21/start-new-rust-project/"/>
    <id>http://www.calvinneo.com/2025/07/21/start-new-rust-project/</id>
    <published>2025-07-21T15:07:22.000Z</published>
    <updated>2026-01-29T19:11:06.591Z</updated>
    
<content type="html"><![CDATA[<p>Since April 2025, I have been actively contributing to a new Rust–C++ project. Through this work, I have gained many valuable insights. Although I cannot disclose most project details, there are numerous technical challenges worth discussing.</p><p>One of the most notable aspects of this project is that it has been developed alongside the rapid evolution of AI agents, which led us to encounter many pitfalls when practicing vibe coding.</p><a id="more"></a><h1 id="About-Vide-Coding-The-benefits-and-the-pitfalls"><a href="#About-Vide-Coding-The-benefits-and-the-pitfalls" class="headerlink" title="About Vibe Coding: The benefits and the pitfalls"></a>About Vibe Coding: The benefits and the pitfalls</h1><h2 id="Pitfalls-of-Vibe-Coding"><a href="#Pitfalls-of-Vibe-Coding" class="headerlink" title="Pitfalls of Vibe Coding"></a>Pitfalls of Vibe Coding</h2><p>In early 2025, at the initial stage of our project, one of our core contributors quickly prototyped a demo using Cursor, covering multiple modules such as the read scheduler, index writer, and meta service.</p><p>Traces of this early implementation can still be found in the following pull requests:</p><ul><li><a href="https://github.com/pingcap-inc/tici/pull/48" target="_blank" rel="noopener">https://github.com/pingcap-inc/tici/pull/48</a></li><li><a href="https://github.com/pingcap-inc/tici/pull/673" target="_blank" rel="noopener">https://github.com/pingcap-inc/tici/pull/673</a></li></ul><h2 id="Make-AI-agent-more-focused"><a href="#Make-AI-agent-more-focused" class="headerlink" title="Make the AI agent more focused"></a>Make the AI agent more focused</h2><h3 id="Why-focused-attention-matters"><a href="#Why-focused-attention-matters" class="headerlink" title="Why focused attention matters"></a>Why focused attention matters</h3><p>Agents have a limited attention budget. When a prompt blends background, implementation details, and review notes, attention is spread thin and the model drifts. 
Treating the prompt as code and slicing the work into sub goals keeps the highest-signal spec in focus, which improves determinism, reduces requirement misses, and lowers evaluation cost.</p><h3 id="How-to-draw-attention-of-AI-agent"><a href="#How-to-draw-attention-of-AI-agent" class="headerlink" title="How to draw the attention of an AI agent"></a>How to draw the attention of an AI agent</h3><p>We can treat prompts as code. This is not a new idea in the industry, but many teams apply it unevenly. A common practice is to store prompts as markdown or templates under version control, so they can be reviewed, diffed, and rolled back like any other artifact. Some teams go further and build a “prompt registry” or config service to version prompts outside the codebase, and pair it with evaluation suites that act like unit tests for prompts (golden outputs, A/B runs, regression checks). Others embed prompts directly in application code as constants, which makes deployment easy but tends to hide intent and lose reviewability. The shared direction is clear: treat prompts as first-class assets with explicit structure, reviews, and tests.</p><p>I wrote a PR, <a href="https://github.com/pingcap-inc/tici/pull/692/files" target="_blank" rel="noopener">https://github.com/pingcap-inc/tici/pull/692/files</a>, where the core file is <code>prompts/0001-gc-cdc.md</code>. That file is the prompt itself, and it is committed with the PR, so anyone can start a session by loading the same prompt. It becomes versioned, diffable, and reviewable like real code, and the team no longer depends on a hidden chat history. Also, we can divide the whole goal into several sub goals and let AI agents implement different sub goals sequentially or in parallel. 
And we don’t need to tell the AI the context every time.</p><p>I think this can effectively make the AI agent more focused on what it needs to do, so as to generate better code with fewer resources.</p><p>Reviewers can write feedback directly in the <code>Reviews</code> section of the prompt doc, and I can pull that back into the next round of vibe. This workflow makes context length much less of a concern because the prompt is the canonical spec. A minimal shape looks like this:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"># Goal</span><br><span class="line"></span><br><span class="line">...</span><br><span class="line"></span><br><span class="line">## Sub goal 1</span><br><span class="line"></span><br><span class="line">In this sub goal, you need to ...</span><br><span class="line"></span><br><span class="line">### Programming Style</span><br><span class="line"></span><br><span class="line">### Musts</span><br><span class="line"></span><br><span class="line">### Tests</span><br><span class="line"></span><br><span class="line">## Reviews</span><br></pre></td></tr></table></figure><h2 id="Rust-as-the-language-of-“Vide-Coding-Era”"><a href="#Rust-as-the-language-of-“Vide-Coding-Era”" class="headerlink" title="Rust as the language of the “Vibe Coding Era”"></a>Rust as the language of the “Vibe Coding Era”</h2><p>As Rust becomes increasingly adopted in the “vibe coding era”, it does offer stronger guarantees against concurrency and memory 
errors. However, it is still too early to say that Rust is THE ONE.</p><p>Such an AI-agent-native language will likely consist of at least three distinct sub-languages:</p><ul><li>One for expressing intent<br>  This is the most critical language, because developers need a more efficient way to understand what AI agents have actually done.</li><li>One for validating correctness<br>  This language corresponds to the intent language, and is designed to describe test workflows more efficiently. Developers can fine-tune this part of the code to guide AI agents to generate correct code for corner cases.</li><li>One for concrete implementation<br>  This language is responsible for the concrete implementation. Many AI agents, such as Codex, can already handle this layer well, as they rarely make trivial mistakes. Developers do not need to review this part of the code frequently, as other AI agents can handle the review instead.</li></ul><p>Only with this separation can we truly balance readability, robustness, and long-term maintainability. Furthermore, many of the current libraries can be rewritten to be more friendly to AI agents.</p><h2 id="Use-Skills"><a href="#Use-Skills" class="headerlink" title="Use Skills"></a>Use Skills</h2><p>I introduced several SKILLs into our new project. 
For example, <a href="https://gist.github.com/CalvinNeo/76811a9fbdd58d1bd271f17004051160" target="_blank" rel="noopener">ManualTest</a> enables AI agents to execute manual test cases automatically.</p><h1 id="Implement-tests"><a href="#Implement-tests" class="headerlink" title="Implement tests"></a>Implement tests</h1><h2 id="Hierarchy-of-tests"><a href="#Hierarchy-of-tests" class="headerlink" title="Hierarchy of tests"></a>Hierarchy of tests</h2><h3 id="The-problem"><a href="#The-problem" class="headerlink" title="The problem"></a>The problem</h3><p>Because each TiDB component is maintained in a separate repository, breaking changes in one component require coordinated adaptations across multiple repositories. Unfortunately, such adaptations cannot be performed atomically and are often non-trivial. While compilation flags or configuration options can sometimes be used to temporarily disable new features, this strategy is not always applicable. In particular, interface changes such as FFI definitions may break compatibility immediately. </p><p>In our project, an end-to-end (e2e) test starts a full cluster and asserts it from a client’s perspective, which in our case means sending SQL queries to the database service. Some of these tests are included in our CI pipeline. However, CI-based e2e tests cannot reliably detect adaptation issues. This creates a classic catch-22: resolving an adaptation problem requires updating all related components, yet the e2e tests cannot pass while you are still fixing the first component. As a result, most e2e tests are deferred to what we call the “daily tests”.</p><p>Nevertheless, we still need a subset of e2e tests in the CI pipeline. Although these tests may occasionally produce false positives due to compilation or adaptation issues, they provide valuable systematic checks to ensure that a new commit in one component does not break existing rules or behaviors. 
Deferring such checks to daily tests would be disastrous, as it makes bugs significantly harder to triage. When issues accumulate over time, the project can easily fall into a “bug jail,” where fixing new problems becomes increasingly expensive.</p><p>It is also worth noting that integration tests cannot practically detect all logical bugs. In many cases, module owners write integration tests mainly to verify that their own modules work with others, while overlooking the impact their changes may have on the system as a whole. This issue becomes even more critical when AI agents are used to refactor our code, as we need safeguards to ensure that unexpected behavior does not compromise the foundation of the project.</p><p>During the development and PoC stage of our project, several critical issues occurred because the tests were not correctly implemented, including:</p><ul><li>Module A uses the API of Module B in the wrong way. There is neither an integration test for Module A, nor is this scenario covered in the e2e tests.</li><li>Module C fails to verify a corner case, which is later caught by my <strong>embedded e2e test</strong> (introduced later). This kind of error is easy to ignore, because it passes all tests except one. However, that single test protects our system from an availability failure caused by a deadlock in Module C.</li><li>Another component changed its convention for constructing a field in an RPC request without informing us, which caused the system to malfunction at the SQL layer and made the issue difficult to investigate. This problem was also detected by my <strong>embedded e2e test</strong>.</li></ul><h3 id="The-layers-of-tests"><a href="#The-layers-of-tests" class="headerlink" title="The layers of tests"></a>The layers of tests</h3><p>We can organize the tests for Component A into several layers. 
Each layer targets specific categories of bugs, and issues detected at lower layers should not propagate to higher layers.</p><ul><li>Daily regression: full system tests for cross-component compatibility only. However, no bugs originating from Component A itself should reach this level. We must proactively investigate daily test failures to avoid falling into a bug jail.</li><li>E2e tests with real components: We may occasionally allow skipping tests at this layer, because cross-component checks can fail due to upstream changes, as discussed earlier. However, bugs that originate within Component A itself must not propagate to this level.</li><li>Integration tests:<ul><li>Embedded e2e: at the component boundary, using mocked RPC/status/FFI calls; required when interface semantics change; validates external behavior and isolates compatibility issues.</li><li>Module integration: per-module behavior with or without mocks; required for new features or refactors; may need test framework enhancements.</li></ul></li><li>Unit tests: unit or local integration tests within a single module; most of these can be handled by AI agents.</li></ul><h3 id="The-embeded-e2e-test"><a href="#The-embeded-e2e-test" class="headerlink" title="The embedded e2e test"></a>The embedded e2e test</h3><p>This idea is based on the observation that a component’s behavior is defined by how it communicates with other components, through RPC, FFI, shared memory, and similar mechanisms.</p><p>Therefore, mocking these communications in integration tests provides the following benefits:</p><ul><li>We don’t need to start a full cluster, so we won’t face the adaptation problem.</li><li>If an adaptation issue occurs, it can be easily reproduced at this level. This not only simplifies the debugging process, but also increases our confidence in the code.</li><li>This test treats our program as a black box, which makes it easier to implement because we do not need to understand how each module is implemented. 
These tests are expected to remain stable unless the interfaces or communication frameworks change.</li></ul><h3 id="Tests-as-the-Backbone-of-Vibe-Coding"><a href="#Tests-as-the-Backbone-of-Vibe-Coding" class="headerlink" title="Tests as the Backbone of Vibe Coding"></a>Tests as the Backbone of Vibe Coding</h3><p>In a Vibe Coding workflow, tests become the primary communication channel between intention and code. Among all types of tests, the embedded end-to-end (e2e) tests play a more and more important role.</p><p>Unlike unit tests, which specify local behavior, or integration tests, which usually verify a limited subsystem, my embedded e2e tests define system-level behavioral contracts. They describe what the system should do rather than how it should do it. This makes them naturally aligned with Test-Driven Development (TDD): they serve as executable specifications that drive the implementation.</p><h1 id="Systematic-choices"><a href="#Systematic-choices" class="headerlink" title="Systematic choices"></a>Systematic choices</h1><h2 id="Thread-or-coroutine"><a href="#Thread-or-coroutine" class="headerlink" title="Thread or coroutine?"></a>Thread or coroutine?</h2><p>Benefits of using tokio:</p><ul><li>Smaller memory cost, so we can create more coroutines.</li><li>Context switch is faster because there is no syscall.</li></ul><p>Pitfalls of using tokio:</p><ul><li>We cannot control the scheduling strategy of tokio’s runtime. For example, we cannot assign a priority to a specific task, nor can we limit the CPU quota of a particular class of tasks.</li><li>Switching to async code is often painful, as even the simplest function may become suspendable due to the use of <code>tokio::sync</code> locks.</li><li>It is hard to investigate deadlock / starvation problems.</li><li>Hard to use itertools. 
<code>futures::stream</code> can help, but it generates complex types.</li></ul><h3 id="Use-seperated-Runtime-for-different-task-pool"><a href="#Use-seperated-Runtime-for-different-task-pool" class="headerlink" title="Use separated Runtimes for different task pools?"></a>Use separated Runtimes for different task pools?</h3><p><code>Runtime</code> can only be created outside the “async context” of tokio. So if we need to use tuned <code>Runtime</code>s, we have to create them in advance. This involves a lot of refactoring.</p><h3 id="Propagate-the-panic-outward"><a href="#Propagate-the-panic-outward" class="headerlink" title="Propagate the panic outward"></a>Propagate the panic outward</h3><p>We must pay attention to panics inside the actor’s message loop: the handler, whether a thread or a coroutine, will only surface the panic when it is eventually joined, by which time the failure may have gone unnoticed for too long. What I recommend is to:</p><ul><li><p>Employ the <code>panic_hook</code> to capture the exact scene where things go wrong.</p>  <figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">panic::set_hook(<span class="built_in">Box</span>::new(|info| &#123;</span><br><span class="line">    eprintln!(<span class="string">"Task panicked: &#123;&#125;"</span>, info);</span><br><span class="line">    <span class="built_in">println!</span>(<span class="string">"Task panicked: &#123;&#125;"</span>, info);</span><br><span class="line">&#125;));</span><br></pre></td></tr></table></figure></li><li><p>Eliminate <code>unwrap</code>s and <code>expect</code>s</p>  <figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#![cfg_attr(not(test), deny(clippy::unwrap_used))]</span></span><br><span class="line"><span class="meta">#![cfg_attr(not(test), deny(clippy::expect_used))]</span></span><br></pre></td></tr></table></figure></li></ul><h2 id="Shared-Memory-or-Actor-model"><a href="#Shared-Memory-or-Actor-model" class="headerlink" title="Shared Memory or Actor model?"></a>Shared Memory or Actor model?</h2><p>If we use the coroutine runtime, we may need to decide how to handle race conditions.</p><h3 id="Why-are-“deadlock”s-so-hard-to-diagnose-when-using-coroutines"><a href="#Why-are-“deadlock”s-so-hard-to-diagnose-when-using-coroutines" class="headerlink" title="Why are “deadlock”s so hard to diagnose when using coroutines?"></a>Why are “deadlock”s so hard to diagnose when using coroutines?</h3><ol><li>There is neither a wait-for graph in the coroutine runtime nor one in the OS<br> <code>await</code> does not block a thread, so we can’t find anything with gdb/strace/perf.<br> Meanwhile, these “deadlocks” are hard to detect, because it appears that there is no CPU usage and no blocked thread, and the program is in a “vegetative state”.<br> Coroutine frameworks like <code>tokio</code> provide some o11y tools; however, they are hard to use and have performance overhead.</li><li>No actual “deadlock”<br> These stalls are mostly “waiting for a train at a bus stop” errors. For example, we may read from a channel which will never be written, which is an easy mistake to make when we bail on an error without calling <code>.send()</code> first.<br> So we recommend sending a <code>Result&lt;T&gt;</code>, and implementing a <code>Drop</code> trait that automatically sends <code>Err(Error::DropWithoutReport)</code> as a last-minute remedy.</li><li>No actual “stack”<br> Coroutines don’t carry a real stack. 
When they hit an await they yield a continuation, and that continuation may be resumed on the same or a different thread.</li></ol><h3 id="tokio-RwLock-or-std-sync-Mutex"><a href="#tokio-RwLock-or-std-sync-Mutex" class="headerlink" title="tokio::RwLock or std::sync::Mutex?"></a>tokio::RwLock or std::sync::Mutex?</h3><p>There is a common belief that we must always use tokio locks in asynchronous code. However, according to the <a href="https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use" target="_blank" rel="noopener">reference</a> of tokio, it is acceptable, and often better, to use synchronous locks such as <code>std::sync::Mutex</code> or <code>parking_lot::Mutex</code>.</p><p>I’d like to refer to these cases as “atomic access structures”, because they all follow this pattern:</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">Wrapped</span></span> &#123;</span><br><span class="line">    inner: Mutex&lt;<span class="built_in">String</span>&gt;,</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">impl</span> Wrapped &#123;</span><br><span class="line">    <span class="keyword">pub</span> <span class="function"><span class="keyword">fn</span> <span class="title">change_inner</span></span>(&amp;<span class="keyword">self</span>, s: <span class="built_in">String</span>) &#123;</span><br><span class="line">        *<span class="keyword">self</span>.inner.lock().expect(<span class="string">""</span>) = s;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The key point of this code is to avoid directly exposing the lock itself: we must not allow external callers to access it, and we must atomically release the lock after mutating the protected value. The underlying rationale is that we must not allow a coroutine to “sleep” while holding the lock, as this guarantees that no deadlocks will occur, because:</p><ul><li>If a coroutine holds the lock, it will not “sleep”, because the code <code>change_inner</code> is structured to avoid calling <code>.await</code> while the lock is held. Moreover, the executor thread will not sleep either, since it is not waiting on any condition.</li><li>If a coroutine does not hold the lock, it can eventually acquire it, because the current holder will release the lock promptly. And of course, the lock is released before any suspension point.</li></ul><p>For synchronous locks:</p><ul><li><code>std::sync::Mutex</code> supports poisoning. If a thread panics while holding the lock, future <code>lock()</code> calls return a <code>PoisonError</code>, forcing the caller to acknowledge that the protected state may be inconsistent. However, in most cases such an error can’t be handled meaningfully, so it will eventually lead to a panic, which is not elegant. There are also other choices: simply accepting the risk is similar to using <code>parking_lot::Mutex</code>, and resetting the state is not hazardous if there are other threads waiting for this lock.</li><li><code>parking_lot::Mutex</code> is faster and smaller in many workloads (especially uncontended or lightly contended), but it does not poison. This often turns a hard failure into a latent, harder-to-debug corruption.</li></ul><p>The lack of poisoning is exactly why <code>parking_lot::Mutex</code> can be unsafe at the <em>logic</em> level. If a panic happens after partially mutating the protected value, the invariant is already broken. 
So, unless you can guarantee panic-free critical sections or have a clear recovery path, prefer the standard mutex to make invariant breaks visible. I think the best practice is to abort if any thread panics, which can be easily done with panic hooks.</p><h3 id="Implementing-an-“incomplete”-actor-mode"><a href="#Implementing-an-“incomplete”-actor-mode" class="headerlink" title="Implementing an “incomplete” actor mode"></a>Implementing an “incomplete” actor mode</h3><p>In the traditional actor model, each actor node encapsulates its own private data. However, this model is difficult to implement because:</p><ul><li>To rebalance data across nodes, we must introduce new message types and corresponding handlers.</li><li>Inspecting the internal state of actor nodes is difficult.</li></ul><p>So, as a simpler alternative, we can:</p><ul><li>Use a concurrent hash map to store all data, with each actor node mutating a portion of the map.</li><li>Allow other components to read or inspect entries in the concurrent hash map. Such inspectors cannot mutate the entries, and their access must be atomic.</li></ul><p>A preferred candidate for the hash map is <code>DashMap</code>. Although this structure frees us from requiring <code>&amp;mut self</code>, most of its methods return a <code>Ref</code> or <code>RefMut</code> that holds a lock guard, so incorrect usage can lead to deadlocks. 
The following code shows a simple example.</p><figure class="highlight rust"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#[test]</span></span><br><span class="line"><span class="function"><span class="keyword">fn</span> <span class="title">test_dashmap</span></span>() &#123;</span><br><span class="line">    <span class="keyword">let</span> map = DashMap::new();</span><br><span class="line">    map.insert(<span class="number">1</span>, <span class="number">1</span>);</span><br><span class="line">    map.insert(<span class="number">2</span>, <span class="number">2</span>);</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> entry <span class="keyword">in</span> map.iter() &#123;</span><br><span class="line">        <span class="built_in">println!</span>(<span class="string">"&#123;&#125; -&gt; &#123;&#125;"</span>, entry.key(), entry.value());</span><br><span class="line">        map.insert(<span class="number">3</span>, <span class="number">3</span>);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="built_in">println!</span>(<span class="string">"test end"</span>);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Here, <code>iter()</code> holds a shard lock while <code>insert</code> may try to acquire a write lock on the same shard, so the loop can deadlock.</p><p>There is a simple yet effective way to detect potential issues in our code: use <code>#[tokio::test]</code> instead of <code>#[tokio::test(flavor = &quot;multi_thread&quot;)]</code>. 
With the single-threaded runtime, the program will fail immediately if a coroutine “sleeps” while holding a lock.</p><h2 id="The-linking-problem"><a href="#The-linking-problem" class="headerlink" title="The linking problem"></a>The linking problem</h2><h3 id="FFI"><a href="#FFI" class="headerlink" title="FFI"></a>FFI</h3><p>TODO</p><h3 id="How-to-support-TLS"><a href="#How-to-support-TLS" class="headerlink" title="How to support TLS?"></a>How to support TLS?</h3><p>TODO</p><h2 id="Online-config-change"><a href="#Online-config-change" class="headerlink" title="Online config change"></a>Online config change</h2><p>There are some ways to update configs without restarting the program:</p><ul><li>For every actor, introduce a new <code>UpdateConfig</code> event, and handle it in the message loop.</li><li>Using <code>arc_swap</code>.</li></ul><p>I don’t think the service itself should persist the updated configuration to the config file. Instead, this should be handled by the operator.</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;Since April 2025, I have been actively contributing to a new Rust–C++ project. Through this work, I have gained many valuable insights. Although I cannot disclose most project details, there are numerous technical challenges worth discussing.&lt;/p&gt;
&lt;p&gt;One of the most notable aspects of this project is that it has been developed alongside the rapid evolution of AI agents, which led us to encounter many pitfalls when practicing vibe coding.&lt;/p&gt;</summary>
    
    
    
    
    <category term="C++" scheme="http://www.calvinneo.com/tags/C/"/>
    
    <category term="arch" scheme="http://www.calvinneo.com/tags/arch/"/>
    
    <category term="Rust" scheme="http://www.calvinneo.com/tags/Rust/"/>
    
    <category term="Articles" scheme="http://www.calvinneo.com/tags/Articles/"/>
    
    <category term="VibeCoding" scheme="http://www.calvinneo.com/tags/VibeCoding/"/>
    
  </entry>
  
  <entry>
    <title>乒乓球训练纪实</title>
    <link href="http://www.calvinneo.com/2025/06/11/table-tennis/"/>
    <id>http://www.calvinneo.com/2025/06/11/table-tennis/</id>
    <published>2025-06-11T15:07:22.000Z</published>
    <updated>2026-02-12T06:53:56.197Z</updated>
    
    <content type="html"><![CDATA[<p>因为五一节打羽毛球把膝盖打出问题了，现在主要学习乒乓球了</p><a id="more"></a><h1 id="Day-1"><a href="#Day-1" class="headerlink" title="Day 1"></a>Day 1</h1><p>主要修正了下反手动作。</p><p>这里，我最重要的是不要架肘。打的时候，可以用左手稍微按着一点左边的大臂。<br>然后，乒乓球反手是小臂带动大臂，大臂基本上不用特别发力，而是让小臂画 1/4 的圆弧。<br>在打完之后，需要还原。击球时，击球点不要太到台内，仿佛把整个手都要伸出去一般。<br>击球的是高点期。注意，不要急，而是等球到位之后再打，实际上质量更好。<br>当球落点朝着左边或者右边偏斜的时候，可以考虑靠只移动上半身来接。<br>作为初学者，不需要手腕有特别大的扭转。</p><p>-&gt; 初学者，可以采用下蹲马步，然后再站起来这样。但是最终，是要一直处于一种扎马步的状态，主要是前脚掌着地。然后用一些垫步还是啥的完成重心转换。</p><h1 id="Day-2"><a href="#Day-2" class="headerlink" title="Day 2"></a>Day 2</h1><p>主要纠正了下正手动作。</p><p>我正手有几点问题：</p><ul><li>撅手腕，教练说撅着手腕可能是因为击打点太靠前，以至于要“等球：</li><li>大臂架着</li><li>大臂还喜欢架着往后拉，实际上应该是转腰，而不是动大臂</li><li>握拍应该虎口直接对着拍子的“侧棱”，类似于羽毛球一样，我可能会喜欢转一点</li></ul><h1 id="Day-3"><a href="#Day-3" class="headerlink" title="Day 3"></a>Day 3</h1><p>继续是正手和反手的练习，下雨，只有 1.5h。</p><p>提到了一些细节：</p><ul><li>反手可以尝试蹲起，然后站立这种击打方式</li></ul><p>录制了一些视频，视频中可以发现，还是喜欢撅着手腕。</p><h1 id="Day-4"><a href="#Day-4" class="headerlink" title="Day 4"></a>Day 4</h1><p>继续是正手和反手的练习，有事，只有 1.5h。</p><h1 id="Aug-21"><a href="#Aug-21" class="headerlink" title="Aug 21"></a>Aug 21</h1><h1 id="Sept-12"><a href="#Sept-12" class="headerlink" title="Sept 12"></a>Sept 12</h1><h1 id="Sept-19"><a href="#Sept-19" class="headerlink" title="Sept 19"></a>Sept 19</h1><p>因为重新看了下医生，说暂时不要运动，因此这次就练了下搓球。</p><p>搓球的话主要几点：</p><ul><li>腿要伸进去</li><li>重心一定要压低压低！可以感受到脸要贴着桌子的感觉</li><li>击球点要低，基本上是二跳要落下，就要完蛋的那个时候搓。但尽管如此，人是要先上去的，所以这里有一个停顿</li></ul><h1 id="Sept-26"><a href="#Sept-26" class="headerlink" title="Sept 26"></a>Sept 26</h1><p>本周停止</p><h1 id="Oct-22"><a href="#Oct-22" class="headerlink" title="Oct 22"></a>Oct 22</h1><h1 id="Oct-30"><a href="#Oct-30" class="headerlink" title="Oct 30"></a>Oct 30</h1><ul><li>握拍得贴着虎口。因为虎口很大，所以是得靠大拇指一侧，而不是食指一侧</li><li>搓球的时候，中心前压，不是说要驼背，而是腹部发力</li><li>打球的时候，可以尝试只用前脚掌，把重心放到前脚掌上</li><li>摆速的时候，手不要拉</li><li>搓球的时候，中心要低，但不要趴桌上</li></ul><h1 id="Nov-13"><a href="#Nov-13" class="headerlink" title="Nov 13"></a>Nov 
13</h1><p>正手虽然动作大，但是击球点靠后，所以空中时间长。</p><p>搓球亮板子是为了接触面积大。</p><h1 id="Nov-21"><a href="#Nov-21" class="headerlink" title="Nov 21"></a>Nov 21</h1><ul><li>搓球的时候，不要靠大臂往前戳。而是要以肘为轴心，小臂往前，大臂是不需要往前伸的</li></ul><h1 id="Dec-11"><a href="#Dec-11" class="headerlink" title="Dec 11"></a>Dec 11</h1><p>正手攻球打球有几点：</p><ul><li>上旋球，如果弧线比较低，也不需要提，直接打过去就行</li><li>攻球的时候，不仅要注意手不要翘，不要拉手。手腕最好跟搓球一样，能拱一点起来。整体来讲，就是要接触面积大。</li></ul><p>反手攻球，发现有时候会打下网，原因还是我手腕没有偏拱，而是有点把手翘起来的感觉。我在微信里面备注了个图，可以参考下。</p><p>另外，进一步看了正手搓球：</p><ul><li>重心还是要低，这里强调了一下，肘部也要低。我打球的时候，可能是提着肘，然后手在下面。其实应该肘和手都在下面。</li><li>可以想象，准备姿势，重心压低。这个时候，手臂是很贴近台面的。所以，正手搓球的时候，只需要把手臂打横就行了。</li><li>搓球也需要往前送，正手搓球不要往自己身体收前臂，而是往前送。</li><li>其实搓球搓起来，球不太容易很贴网的。</li></ul><h1 id="Jan-7"><a href="#Jan-7" class="headerlink" title="Jan 7"></a>Jan 7</h1><p>中间放假缺了一节课，另外和 fzh 又打了几场球，感觉正手搓球有点差了。主要体现在：</p><ul><li>伸手太靠近身体</li><li>身体的发力太多了，手上的动作少了</li><li>小臂不要翘上去，搞得像一个 V 字一样，还是要放下来一点</li></ul><p>在攻球方面，感觉膝盖好了不少。</p><p>然后初步学习一下反手拉球，感觉我自己的问题是：</p><ul><li>手肘架太高了，更像拧</li><li>可能是怕刮到球台，自己退台太多了</li></ul><p>这里的一个重点在于：</p><ul><li>手腕一定要引拍，让球拍对着自己</li><li>拉球完毕后，手腕应该恢复类似正常的攻球状态，不要往上翘。其实往上翘也是我之前攻球常见的一个错误问题</li></ul><p>我理解反手拉球随着你引拍是在身体左边、中间还是右边，动作的大小和幅度不太一样，但是框架是一样的，就是手腕要制造摩擦。</p><p>Jan 8 的时候又和 fzh 语音交流了一下。他的意思是反手拉球其实最重要的是能拉到别人搓过来的比较快并且贴身体的球。基本上退台是不会反手拉的。</p><h1 id="Jan-30"><a href="#Jan-30" class="headerlink" title="Jan 30"></a>Jan 30</h1><p>今天又是只有我一个人。</p><p>搓球：</p><ul><li>正手搓球不要在身体前面，而是要打开手臂，在身体右前方击球，原因是因为要有一致性。这样，站在那个位置，处理方式会有很多，不会给人感觉你就一定是搓球。不过打开手臂之后，还是要往前的，不能往身体方向收手臂。这个刻意注意一下就行。</li><li>另一种注意的办法是，搓球的时候可以有个停顿。这里的停顿不是说你就把拍子放在那里等球来了，我们肯定是要向着球制造摩擦的。但是就和羽毛球那样，得有一种“定”的感觉。或者可以认为伸手臂和往前搓实际上可以分解为两个动作。</li></ul><p>拉球：</p><ul><li>反手拉球直接跨步调整位置，不需要滑步。这是因为反手的地方就那么一块，滑步的动作太大了。</li></ul><p>另外，我觉得拉球我的问题主要是：</p><ul><li>不要边动边打球。脚步站好的时候，最好重心也调整好，比如该蹲那时候就蹲了，不要一边挥拍一边往下蹲，这样很难瞄准。</li><li>拍子要在下面，从下往上挥动。</li><li>拍子不要往上走，往上是通过手腕那个旋转来做的。手臂就是放松，自然展开就行。</li></ul><h1 id="Feb-12"><a href="#Feb-12" class="headerlink" title="Feb 12"></a>Feb 
12</h1><p>学了下劈长。感觉这个很讲究瞬间发力。</p><ul><li>如果小臂移动太多，那么就容易出界</li><li>如果手腕移动太多，比如最后往上翘了，那么也容易出界</li></ul><p>另外今天体验了一下摆短。我确实可以摆起来，但是是卸力把球托过去的，并没有搓很转，所以没有什么威胁。</p><p>今天尝试了下拉球，就是彻底把手放下来，然后就发现有个明显的好处，就是如果我手自然下垂，那么我就没必要很刻意地去内旋我的手腕了，反而更轻松。</p>]]></content>
    
    
    <summary type="html">&lt;p&gt;因为五一节打羽毛球把膝盖打出问题了，现在主要学习乒乓球了&lt;/p&gt;</summary>
    
    
    
    
    <category term="运动" scheme="http://www.calvinneo.com/tags/运动/"/>
    
  </entry>
  
  <entry>
    <title>关西2</title>
    <link href="http://www.calvinneo.com/2025/04/08/kensai-2/"/>
    <id>http://www.calvinneo.com/2025/04/08/kensai-2/</id>
    <published>2025-04-08T12:06:11.000Z</published>
    <updated>2025-04-12T08:20:49.168Z</updated>
    
    <content type="html"><![CDATA[<p>趁着清明节又去了一趟关西。本以为是度假，但实际上累得要死。</p><a id="more"></a><h1 id="D1"><a href="#D1" class="headerlink" title="D1"></a>D1</h1><p>这一次从南京直飞大阪。机票照例没有提前多久定，但是两个人往返也才 4k 出头，相当便宜。宾馆就是贵得离谱了。京都的宾馆单人 1.5k，双人接近 4k 感觉简直在抢钱。我先在大阪定了 2 天 700 左右的。然后又订了一天姬路和大阪的。然后最后一天住哪不清楚，大概先这样。</p><p>关西机场入国变得麻烦多了，这次虽然没有坐小火车（吉祥航空），但入关排队感觉就花了将近一个小时了。在飞机上被发了入境单要填，我觉得挺麻烦的，现在都是电子化了啊。结果到了入境口发现有 abcd 四个 route，但完全不知道区别是什么。一个国际机场居然没有英文的说明，工作人员也只说日语和简单的英语。Anyway，他有个扫描护照的机器，感觉挺方便的。我用机器扫完，然后就再往前排队等人工。走到一半发现大家还有个 QR code 也不知道是啥，但是这些人扫完 QR 之后又要填一遍纸单子。。。</p><p>反正轮到我，我就说 I have no QR code but I’ve already finished the note, and I’ve already registered on that machine. 然后那人把我的纸收了就直接贴入境单了，很丝滑很快。这次不去京都，就直接走南海电车了，更便宜。坐到 namba 才 900。</p><p>吃完饭，就去附近的 apple 店买个表。现在日本的 apple 店居然不能退税了，只有部分非官方 retailer 可以退税，但是我又不知道哪里靠谱。46mm 的要 3200，国内才 2600。但是因为国内啥啥都没，没快充慢的一批，所以还是买了。总不能为这个专门跑一次 hk 吧。</p><p>去 711 给交通卡充值，发现只能用现金。幸亏我带了 15000 jpy 来，本意是上次玩没用掉，这次结果救了命。</p><p>晚上出去吃饭回来，发现大阪真的冷，幸亏穿了两件，不然真的冻死。</p><p>回来发现日本的窗子真的好隔音，薄薄的平开窗，毫无特点的铝合金，居然这么猛。</p><h1 id="D2-京都"><a href="#D2-京都" class="headerlink" title="D2 京都"></a>D2 京都</h1><p>因为地铁卡充值要现金感觉不方便，所以我今天就打算不用地铁卡，用 iPhone Wallet 了。这鬼东西又不能用信用卡，我弄了半天才弄了个储蓄卡上去。上地铁又刷坏了，墨迹了半天。到了大阪梅田站，她出不了站，刷了半天不知道怎么就出去了。结果到了 JR 大阪站发现进不去，问工作人员说这个是 osaka subway，我觉得很奇怪，因为 icoca 是通用的，工作人员也说我的 iPhone 是可以的。于是我觉得肯定是被锁了。无论如何，这里可以支付宝买纸票，于是我们就去京都了。</p><p>京都的公交车排队排了感觉有半个多小时，后面上了个临时的加班车。这边还不让带行李上公交车，幸亏我们都放在了大阪，就带了个小包。公交车里面非常拥挤，但我们进去的早，所以有个位置。公交车开的特别慢，感觉甚至不如走路。开到清水寺附近感觉花了大概二十多分钟。我们试图在五年阪下车，结果被堵住了，然后司机就不让我们下了。然后后面一个傻逼老外就说 you come here late, you have to follow their rules, it’s not your country 啥的。非常 
offensive。</p><p>清水寺我们没进去，我对象说没啥意思，我之前去过，感觉也没啥意思，人还特别多。门口的御守他也觉得贼丑。然后我们就顺着三年阪二年阪往下走，去找法观寺。我们都已经看到高台寺公园了，发现法观寺走过了，又绕回去找。结果法观寺就是我们来的路上的那个我觉得一般的唐代风格的塔，好像是京都最老的塔，也不让进去。然后我们又走回到高台寺。</p><p>高台寺的垂樱是挺好看的。</p><p>高台寺出来很快就到了八阪神社，里面和小吃街一样，我们找了下垂樱在哪里，就去吃那个鳗鱼饭了。</p><p>鳗鱼饭吃完出来，就在木屋町那条小路那边拍樱花，感觉比鸭川的樱花好看。</p><p>然后又绕回去看了下花见小路，照旧非常无聊。路尽头是什么建仁寺的，大家都没有兴趣看。</p><p>然后又走了一大段路去看顶法寺。顶法寺不要钱，里面的樱花很漂亮。还有几个小和尚的雕塑也很有意思。</p><p>晚上实在走不动了，就坐了京阪电车，因为阪急还要多走 300m。至于京都 JR，因为坐公交体验太差，完全不考虑了。我们坐的是 18:59 的京阪电车去的大阪。其实以后真的可以坐京阪电车到京都东边，比 JR 方便很多，还便宜。但这次京阪电车属实是个坑货。它终点站在大阪是 yodoyabashi 淀屋桥，但是又有一些车是到一个叫 yodo 淀的地方。然后我就被这个车丢在了 yodo。后面上了个准急的，几乎就是站站乐了，感觉总共花了大概一个多小时才到大阪。</p><p>回到大阪，找大阪地铁说明了情况，列车长问我们要不要 refund。我说不要，他就说 OK。然后就解锁了，结果发现之前坐的 namba 的南海的钱居然也没被扣。南海真的血亏啊。</p><h1 id="D3-姬路"><a href="#D3-姬路" class="headerlink" title="D3 姬路"></a>D3 姬路</h1><p>起来去姬路。这次发现他们电车同一个方向有两个轨道，一边是 local 一边是 express。上车前可以看站台上的屏幕，上车时可以看车的屏幕确认。上车前可以看 Google 或者 Apple 的 map 到达的时间确定自己是不是对应班次的车。上车后也可以听播报。基本上 Rapid、Express 的车都比较推荐，大阪、神户市区的大站基本都停。Limited Express 特急要特急券我没见过。</p><p>姬路是个小城市，我们的酒店在车站南边一点。应该是此行中比较大的酒店了。酒店不能提前办理入住，但是可以预先寄存。</p><p>姬路站北面正对姬路城，通过一个大道可以直接走过去。大道两侧不少店铺比较出名，我们喝了个咖啡，然后就立即前往姬路城了。</p><p>城里面樱花很漂亮，右边走还有个动物园。登城口在左边，要排队，但是队伍很快，等了大概半小时就进去了。买票 +50 yen 就能得到一个旁边的花园的门票。花园里面有个餐厅，我们去的时候已经关门了。姬路城主要就是两块，一个是西之丸庭院，可以脱掉鞋子登上去，类似于一个走廊，有多个城橹。从化妆橹可以下来，据说这个是给城主夫人化妆用的地方。西之丸庭院相比大阪城比较小巧，里面也有不少樱花。</p><p>从庭院出来就可以走到天守的口了。然后就是穿过一道道什么 yi 之门、wa 之门、ni 之门，然后走到天守的下面的小庭院内。然后就走过水之 X 门走到大天守里面。大天守有 6 层，第 2 层开始大排长队，人贼多。</p><h1 id="D4-神户"><a href="#D4-神户" class="headerlink" title="D4 神户"></a>D4 神户</h1><p>神户的酒店在三宫门口，应该是本次我找的最近、最便宜的酒店了，才 500。住宿体验很舒适。</p><p>放完东西，去神户动物园。我们应该是顶门到的，进去刷票的时候，我对象把票根弄掉了，检票员说必须要把票找回来。</p><p>在那个破落商店里面有一家后来知道叫 Yellow Submarine 的桌游超市，还挺大的。回来我去问了下有没有那个骰子游戏，他带我去找，然后翻了下，说 sold out 了。我以为附近南京町还有一家，走过去发现那是个别的地方，已经关门了。</p><h1 id="D5-奈良-大阪"><a href="#D5-奈良-大阪" class="headerlink" title="D5 奈良 - 大阪"></a>D5 奈良 - 大阪</h1><p>下了奈良站，在 5 号口附近找到一个柜子，600 yen 就可以放两个小箱子和一个包了。不过只接受 100 yen 
的硬币，我还要去旁边换。</p><p>出去吃了那个口水麻薯，一堆老外在那边拍照。去看了兴福寺，要钱，没意思没进去。兴福寺下来就能看到鹿，鹿很现实，看到我们没买饼就不磕头了。我们到最后也没买，主打白嫖。</p><p>旁边的奈良博物馆关门了。</p><p>我印象里上次去若草山有个大坡可以坐着休息，但这次去好像就是一些大草坪可以走，有椅子可以坐。大草坪也许是封起来了吧，因为当时看到大草坪上没有人，而且往若草山方向有被拦住。</p><p>在大阪这最后一晚住的酒店是最拉胯的。</p><h1 id="D6-大阪"><a href="#D6-大阪" class="headerlink" title="D6 大阪"></a>D6 大阪</h1>]]></content>
    
    
    <summary type="html">&lt;p&gt;趁着清明节又去了一趟关西。本以为是度假，但实际上累得要死。&lt;/p&gt;</summary>
    
    
    
    
    <category term="游记" scheme="http://www.calvinneo.com/tags/游记/"/>
    
  </entry>
  
  <entry>
    <title>Database paper part 7</title>
    <link href="http://www.calvinneo.com/2025/03/15/database-paper-7/"/>
    <id>http://www.calvinneo.com/2025/03/15/database-paper-7/</id>
    <published>2025-03-15T13:33:22.000Z</published>
    <updated>2025-03-15T14:00:06.214Z</updated>
    
    <content type="html"><![CDATA[<p>包含：</p><ul><li>We Ain’t Afraid of No File Fragmentation: Causes and Prevention of Its Performance Impact on Modern Flash SSDs</li></ul><a id="more"></a><h1 id="We-Ain’t-Afraid-of-No-File-Fragmentation-Causes-and-Prevention-of-Its-Performance-Impact-on-Modern-Flash-SSDs"><a href="#We-Ain’t-Afraid-of-No-File-Fragmentation-Causes-and-Prevention-of-Its-Performance-Impact-on-Modern-Flash-SSDs" class="headerlink" title="We Ain’t Afraid of No File Fragmentation: Causes and Prevention of Its Performance Impact on Modern Flash SSDs"></a>We Ain’t Afraid of No File Fragmentation: Causes and Prevention of Its Performance Impact on Modern Flash SSDs</h1><p><a href="https://www.usenix.org/system/files/fast24-jun.pdf" target="_blank" rel="noopener">https://www.usenix.org/system/files/fast24-jun.pdf</a></p><h2 id="Abstract"><a href="#Abstract" class="headerlink" title="Abstract"></a>Abstract</h2><blockquote><p>The primary cause of the degraded performance is <strong>not due to request splitting</strong> but stems from a significant increase in <strong>die-level collisions</strong>.</p></blockquote><p>如果在写连续的 file block 中，有其他的写入过来，那么这些 file block 就不会在连续的 die 上，从而产生 random die allocation。这种情况比如发生在 file overwrite 的时候。</p><blockquote><p>In SSDs, when other writes come between writes of neighboring file blocks, the file blocks are not placed on consecutive dies, resulting in random die allocation. This randomness escalates the chances of die-level collisions, causing deteriorated read performance later. We also reveal that this may happen when a file is overwritten.</p></blockquote><p>Evaluations with commercial SSDs and an SSD emulator indicate that our approach effectively curtails the read performance drop arising from both fragmentation and overwrites, all without the need for defragmentation. 
Representatively, when a 162 MB SQLite database was fragmented into 10,011 pieces, our approach limited the performance drop to 3.5%, while the conventional system experienced a 40% decline.</p><h2 id="Introduction"><a href="#Introduction" class="headerlink" title="Introduction"></a>Introduction</h2><p>To prevent performance degradation caused by fragmentation, file systems utilize various techniques [35], such as delayed allocation [23] and preallocation of data blocks [2], to maintain continuity among data blocks. </p><p>SSD 中没有磁头的物理移动，所以减少了顺序读和随机读之间的性能 gap。但 [4] 中说，SSD 上读 fragmented 的文件，也有 2-5 倍的性能损失。诸如 [13, 31, 42] 的文件只是认为这些性能损失的原因是 request splitting in the kernel I/O path due to fragmentation。</p><p>这篇文章指出 fragmentation 导致的性能损失实际上根因是 die-level collisions。而 die-level collisions 会减少 SSD 内部的并发度。</p><blockquote><p><strong>An SSD’s firmware allocates its flash memory pages in a round-robin manner across the flash memory dies based on the order in which they are written.</strong></p></blockquote><p>所以，如果发生了 fragmentation，那么 the pages storing contiguous file blocks 不能被放置在 contiguous dies 上，而是被分配在任意的 dies 上。</p><p>这个论文修改了 nvme 的协议，让 write 命令指定 page-to-die mapping。</p><blockquote><p>With these hints, the page for an appending write is mapped to the die following the die where the previous file block’s page was assigned to. 
In addition, the page for an overwrite operation to an existing file block, which also disrupts the page-to-die mapping pattern, is mapped to the same die where the original page was located.</p></blockquote><h2 id="Background-and-Motivation"><a href="#Background-and-Motivation" class="headerlink" title="Background and Motivation"></a>Background and Motivation</h2><h3 id="Old-Wisdom-on-File-Fragmentation"><a href="#Old-Wisdom-on-File-Fragmentation" class="headerlink" title="Old Wisdom on File Fragmentation"></a>Old Wisdom on File Fragmentation</h3><blockquote><p>In the HDD era, the primary and direct cause of performance degradation from file fragmentation was <strong>the seek time</strong> between dispersed sectors of the file.</p></blockquote><p>Fragmentation 对读取的影响更大，因为读取必须要等待完成，而写入则可以被 buffer。</p><p>Fragmentation 在三个层面影响性能：</p><ul><li>kernel I/O path<br>  Only a single command is required for the host to instruct the storage device to perform read or write operations on contiguous storage space.<br>  Thus, when a sequential read occurs for a file, the Linux kernel reads the data block mapping in the file’s inode, and for each contiguous data block region, it creates a bio (block I/O) data structure. This data structure is used to create the corresponding request data structure to be passed to the device driver, which then issues the command for the request to the device.<br>  Through this process, a single sequential file access may be <strong>split into multiple <code>bio</code>s</strong> and corresponding requests to the storage device, depending on the degree of file fragmentation.</li><li>storage device interface<br>  This request splitting is known to increase I/O execution time, as it increases the number of data structure creations and calls to underlying functions, including the device driver code.<br>  Specifically, the frequency of fetching, decoding, translating commands into storage media operations, and queuing media access operations increases. 
Therefore, file fragmentation also delays the processing time of the storage device controller.</li><li>storage media access</li></ul><p><img src="/img/dbpaper/ssd-die-collision/1.png"></p><p>ext4 为了减少 fragment 产生的优化：</p><ul><li>The <strong>delayed allocation</strong> technique used in the <strong>ext4</strong> file system performs data block allocation not at the write system call handling but <strong>at the time of page flush</strong>.</li><li>In addition, ext4 reserves a predefined window of free data blocks for each file’s inode. These reserved free blocks will be actually allocated to the file for its successive append writes.</li></ul><p>defragmentation 的手段：</p><ul><li>【Sato】Allocates contiguous free blocks to a temporary inode, copies the fragmented file data to the temporary inode, deletes the original file, and renames the temporary inode to the original’s.</li></ul><h3 id="File-Fragmentation-in-SSD-Era"><a href="#File-Fragmentation-in-SSD-Era" class="headerlink" title="File Fragmentation in SSD-Era"></a>File Fragmentation in SSD-Era</h3><p>很多学者和厂商说 SSD 不受 fragmentation 的影响，defragmentation 反而可能会损害 SSD 的寿命。</p><p>SSDs offer significantly higher performance than a single flash memory die (chip) because they operate multiple flash dies in parallel. 
</p><p>NVMe 有 65535 个命令队列，每个队列能 queue 65536 个 commands。总共约 65535 × 65536 ≈ 43 亿个，可以说非常大了。</p><blockquote><p>Specifically, NVMe SSDs offer 65,535 command queues, each capable of queueing 65,536 commands.</p></blockquote><blockquote><p>Even when fragmentation leads to smaller request sizes that cannot fully utilize die-level parallelism, smaller flash operations in the command queues can still be processed out-of-order, allowing most dies to be fully utilized.</p></blockquote><p>因此，很多学者认为 kernel I/O path 和 storage device interface 中的 request splitting 是影响性能的关键。</p><h3 id="Internals-of-Modern-Flash-SSDs"><a href="#Internals-of-Modern-Flash-SSDs" class="headerlink" title="Internals of Modern Flash SSDs"></a>Internals of Modern Flash SSDs</h3><blockquote><p>a die can only process one request at a time.</p></blockquote><p>FTL 会将需要写入的 page 存储到尽可能多的 die 中。</p><blockquote><p>To prevent die-level collisions for read operations, the flash translation layer (FTL) of an SSD’s firmware must perform physical page allocation in a manner that distributes the physical pages storing contiguous logical pages across as many dies as possible.</p></blockquote><p>所以，是 round-robin 地选择 die，而不是一股脑全写到一个 die 里面。</p><blockquote><p>For this purpose, the FTL of most modern SSDs selects a die in a round-robin manner when allocating a flash page for processing an incoming page write request.</p></blockquote><blockquote><p>Additionally, modern FTLs perform the valid page copy within the die where the page resides during the garbage collection (GC) process if the die has a sufficient number of free pages.</p></blockquote><p>For example, in Fig. 2, File A is evenly distributed across four dies since its four pages were written without interference. Thus, a sequential read of File A will be performed simultaneously on these four dies, resulting in a bandwidth of up to four times the flash die performance. </p><p>In contrast, assume that the writes to File B and File C were interleaved. 
As the die for storing a logical page is assigned in a round-robin manner according to the order of writes performed within the SSD, both the third and last pages of File B ended up being allocated to Die 3. As a result, the time to read File B is twice as long as that for reading an ideally-placed file of the same size, such as File A.</p><p><img src="/img/dbpaper/ssd-die-collision/2.png"></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;包含：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We Ain’t Afraid of No File Fragmentation: Causes and Prevention of Its Performance Impact on Modern Flash SSDs&lt;/li&gt;
&lt;/ul&gt;</summary>
    
    
    
    
    <category term="数据库" scheme="http://www.calvinneo.com/tags/数据库/"/>
    
    <category term="论文阅读" scheme="http://www.calvinneo.com/tags/论文阅读/"/>
    
  </entry>
  
  <entry>
    <title>法语学习纪要</title>
    <link href="http://www.calvinneo.com/2025/03/13/french/"/>
    <id>http://www.calvinneo.com/2025/03/13/french/</id>
    <published>2025-03-13T15:09:06.000Z</published>
    <updated>2025-09-29T16:49:02.536Z</updated>
    
    <content type="html"><![CDATA[<p>在小绿鸟上学法语。</p><a id="more"></a><h1 id="常见变位"><a href="#常见变位" class="headerlink" title="常见变位"></a>常见变位</h1><ul><li>aller：去<ul><li>je vais</li><li>tu vas</li><li>il/elle va</li><li>nous allons</li><li>vous allez</li><li>ils/elles vont</li></ul></li><li>avoir：有<ul><li>j’ai</li><li>tu as</li><li>il/elle a</li><li>nous avons</li><li>vous avez</li><li>ils/elles ont</li></ul></li><li>être：是<ul><li>je suis</li><li>tu es</li><li>il/elle est</li><li>nous sommes</li><li>vous êtes</li><li>ils/elles sont</li></ul></li><li>venir：来<ul><li>je viens</li><li>tu viens</li><li>il/elle vient</li><li>nous venons</li><li>vous venez</li><li>ils/elles viennent</li></ul></li><li>faire：做<ul><li>je fais</li><li>tu fais</li><li>il/elle fait</li><li>nous faisons</li><li>vous faites</li><li>ils/elles font</li></ul></li><li>falloir：必须，这是个绝对无人称动词<ul><li>il faut</li></ul></li><li>devoir：必须<ul><li>je dois</li><li>tu dois</li><li>il/elle doit</li><li>nous devons</li><li>vous devez</li><li>ils/elles doivent</li></ul></li></ul><ul><li>aimer：喜欢<ul><li>j’aime</li><li>tu aimes</li><li>il/elle aime</li><li>nous aimons</li><li>vous aimez</li><li>ils/elles aiment</li></ul></li><li>parler: 说<ul><li>je parle</li><li>tu parles</li><li>il/elle parle</li><li>nous parlons</li><li>vous parlez</li><li>ils/elles parlent</li></ul></li><li>prendre：take<ul><li>je prends</li><li>tu prends</li><li>il/elle prend</li><li>nous prenons</li><li>vous prenez</li><li>ils/elles prennent</li></ul></li><li>sortir：出<ul><li>je sors</li><li>tu sors</li><li>il/elle sort</li><li>nous sortons</li><li>vous sortez</li><li>ils/elles sortent</li></ul></li><li>rentrer：come back in<ul><li>je rentre</li><li>tu rentres</li><li>il/elle rentre</li><li>nous rentrons</li><li>vous rentrez</li><li>ils/elles rentrent</li></ul></li><li>écouter：听<ul><li>j’écoute</li><li>tu écoutes</li><li>il/elle écoute</li><li>nous écoutons</li><li>vous écoutez</li><li>ils/elles écoutent</li></ul></li><li>lire：读<ul><li>je 
lis</li><li>tu lis</li><li>il/elle lit</li><li>nous lisons</li><li>vous lisez</li><li>ils/elles lisent</li></ul></li><li>écrire：写<ul><li>j’écris</li><li>tu écris</li><li>il/elle écrit</li><li>nous écrivons</li><li>vous écrivez</li><li>ils/elles écrivent</li></ul></li></ul><ul><li>payer：付费<ul><li>je paye</li><li>tu payes</li><li>il/elle paye</li><li>nous payons</li><li>vous payez</li><li>ils/elles payent</li></ul></li><li>finir：完成<ul><li>je finis</li><li>tu finis</li><li>il/elle finit</li><li>nous finissons</li><li>vous finissez</li><li>ils/elles finissent</li></ul></li><li>courir：跑<ul><li>je cours</li><li>tu cours</li><li>il/elle court</li><li>nous courons</li><li>vous courez</li><li>ils/elles courent</li></ul></li></ul><h1 id="人称代词"><a href="#人称代词" class="headerlink" title="人称代词"></a>人称代词</h1><p>重读人称代词：</p><ul><li>在 C’est 后作表语<br>  C’est moi.</li><li>在介词之后<br>  Je suis avec toi.</li><li>用于表强调</li><li>命令式<br>  Regarde-toi.</li></ul><ul><li>第一人称单数<ul><li>je</li><li>所有格的阳性 mon</li><li>所有格的阴性 ma</li><li>所有格的复数 mes</li><li>间接宾语 me</li><li>重读 moi</li></ul></li><li>第二人称单数<ul><li>tu</li><li>所有格的阳性 ton</li><li>所有格的阴性 ta</li><li>所有格的复数 tes</li><li>间接宾语 te</li><li>重读 toi</li></ul></li><li>第三人称单数<ul><li>il/elle</li><li>所有格的阳性 son</li><li>所有格的阴性 sa</li><li>所有格的复数 ses</li><li>间接宾语 lui</li><li>重读 lui/elle</li></ul></li><li>第一人称复数<ul><li>nous</li><li>所有格单数 notre</li><li>所有格复数 nos</li><li>间接宾语 nous</li><li>重读 nous</li></ul></li><li>第二人称复数<ul><li>vous</li><li>所有格单数 votre</li><li>所有格复数 vos</li><li>间接宾语 vous</li><li>重读 vous</li></ul></li><li>第三人称复数<ul><li>ils/elles</li><li>所有格单数 leur</li><li>所有格复数 leurs</li><li>间接宾语 leur</li><li>重读 eux/elles</li></ul></li><li>泛指代词<ul><li>on</li><li>重读 soi</li></ul></li></ul><h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul><li><a href="https://www.collinsdictionary.com/zh/conjugation/french-conjugation/aller" target="_blank" 
rel="noopener">https://www.collinsdictionary.com/zh/conjugation/french-conjugation/aller</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;在小绿鸟上学法语。&lt;/p&gt;</summary>
    
    
    
    
    <category term="法语" scheme="http://www.calvinneo.com/tags/法语/"/>
    
  </entry>
  
  <entry>
    <title>Fuse 学习</title>
    <link href="http://www.calvinneo.com/2025/03/09/learn-fuse/"/>
    <id>http://www.calvinneo.com/2025/03/09/learn-fuse/</id>
    <published>2025-03-09T14:48:32.000Z</published>
    <updated>2026-01-14T19:04:38.681Z</updated>
    
    <content type="html"><![CDATA[<p>看下 FUSE 的相关知识。</p><p>Filesystem In Userspace 也就是 fuse，是 linux 的一个内核模块。</p><a id="more"></a><p>Fuse 的优势在于：</p><ul><li>允许非管理员权限去 mount 一个文件系统，例如 overlayfs</li><li>允许不通过内核实现一个文件系统</li></ul><p>但是 Fuse 并不是在用户态访问文件系统，在调用文件系统时，依然需要陷入内核访问 VFS，并通过内核完成 IO。Fuse 在用户态做的是将内核传递的 fuse request 处理，并调用 VFS。</p><h1 id="To-FUSE-or-not-to-FUSE-Analysis-and-Performance-Characterization-of-the-FUSE-User-Space-File-System-Framework"><a href="#To-FUSE-or-not-to-FUSE-Analysis-and-Performance-Characterization-of-the-FUSE-User-Space-File-System-Framework" class="headerlink" title="To FUSE or not to FUSE - Analysis and Performance Characterization of the FUSE User-Space File System Framework"></a>To FUSE or not to FUSE - Analysis and Performance Characterization of the FUSE User-Space File System Framework</h1><h2 id="2"><a href="#2" class="headerlink" title="2"></a>2</h2><p>fuse 的内核部分是 fuse.ko，会注册三个文件系统的类型：fuse、fuseblk、fusectl，它们在 /proc/filesystems 中都可见。fuse 类型的文件系统并不需要下层的块设备，而 fuseblk 类型的文件系统则需要。</p><p><code>fuseblk</code> provides the following features:</p><ol><li>Locking the block device on mount and unlocking on release;</li><li>Sharing the file system for multiple mounts;</li><li>Allowing swap files to bypass the file system in accessing the underlying device;</li></ol><p>fuse 和 fuseblk 都是一些不同的 FUSE 文件系统的 proxy，所以后面统称为 FUSE。</p><p>几点说明：</p><ul><li>FUSE 文件系统的名字一般是 <code>[fuse|fuseblk].$NAME</code>。</li><li>/dev/fuse 是一个字符设备，被用来支持用户态的 FUSE daemon 和内核之间的通信。简单来说，用户态的 daemon 会从 /dev/fuse 中读出请求，进行处理，然后再写回到 /dev/fuse 中。</li></ul><p>这个经典的图，是 FUSE 的链路<br><img src="/img/dbpaper/fuse/fuseornot/2.1.png"></p><blockquote><p>When a user application performs some operation on a mounted FUSE file system, the VFS routes the operation to FUSE’s kernel (file system) driver. The driver allocates a FUSE request structure and puts it in a FUSE queue. 
At this point, the<br>process that submitted the operation is usually put in a wait state.<br>FUSE’s user-level daemon then picks the request from the kernel queue by reading from <code>/dev/fuse</code> and processes the request. Processing the request might require <strong>re-entering</strong> the kernel again: for example, in case of a stackable FUSE file system, the daemon submits operations to the underlying file system (e.g., Ext4); or in case of a block-based FUSE file system, the daemon reads or writes from the block device; and in case of a network or in-memory file system, the FUSE daemon might still need to re-enter the kernel to obtain certain system services (e.g., create a socket or get the time of day).<br>When done with processing the request, the FUSE daemon writes the response back to <code>/dev/fuse</code>; FUSE’s kernel driver then marks the request as completed, and wakes up the original user process which submitted the request.</p></blockquote><p>一些文件系统的操作并不需要和用户态的 FUSE daemon 交流，就可以完成。例如，读一个之前读过的文件，因为它的 page 已经被存在 page cache 里面了，所以就不需要再把请求 forward 给 FUSE driver 了。这里可能被 cache 的不仅包括 data，还包括一些 meta data。例如 stat 查询 inode 和 dentry 的信息，它们被存在 Linux 的 dcache 中，可以直接在内核态处理，而不需要调用 FUSE daemon 了。</p><p><img src="/img/dbpaper/fuse/fuseornot/t2.1.png"></p><h3 id="2-2-User-Kernel-Protocol"><a href="#2-2-User-Kernel-Protocol" class="headerlink" title="2.2 User-Kernel Protocol"></a>2.2 User-Kernel Protocol</h3><h3 id="2-3-Library-and-API-Levels"><a href="#2-3-Library-and-API-Levels" class="headerlink" title="2.3 Library and API Levels"></a>2.3 Library and API Levels</h3><p>High-level 的 API 允许开发者跳过 path-to-inode 映射。或者说 inode 在 high level API 中根本不存在了，high level API 只操作路径。FORGET inode method 根本就不需要了。</p><p>无论是 high 还是 low level，反正大概都是要实现 42 个方法。</p><p><img src="/img/dbpaper/fuse/fuseornot/2.2.png"></p><h3 id="2-4-Queues"><a href="#2-4-Queues" class="headerlink" title="2.4 Queues"></a>2.4 Queues</h3><p>一个请求在同一时间只能属于下面五个队列的其中一个。</p><p><img 
src="/img/dbpaper/fuse/fuseornot/2.3.png"></p><p>FORGET requests are sent when the inode is evicted, and these requests would queue up together with regular file system requests, if a separate forgets queue did not exist. 如果有大量的 FORGET 请求，就无法处理其他的文件系统请求了。这个行为是在一个有 3200 万个 inode 的节点上看到的，当所有的 inode 被从 icache evict 的时候，系统可能会 hang 大概 30min。</p><p>pending queue 中是同步请求。</p><p>当 daemon 从 /dev/fuse 中读取时，会按下面的顺序：</p><ul><li>最高优先级会处理 Interrupts 队列中的请求</li><li>FORGET 和非 FORGET 请求会被公平地选择，处理完 8 个非 FORGET 请求，会再处理 16 个 FORGET 请求。这也保障了 FORGET 请求不会堆积起来。</li></ul><p>请求处理的流程是：</p><ul><li>pending queue 中最老的请求会被传送到 user space，然后被 processing queue 立即处理。<br>  INTERRUPT 和 FORGET 请求并不会从 user daemon 获得回复，所以当 daemon 读取这些请求的时候，它们就终止了。</li><li>如果 pending queue 上没有请求，那么 FUSE daemon 就会阻塞在一个 read 调用上。</li><li>如果 daemon 回复了，对应的请求就会从 processing queue 中被移除，这个请求就完成了。同时，blocked user processes (e.g., the ones waiting for READ to complete) are notified that they can proceed.</li></ul><p>background queue 是对异步请求的。默认情况下，只有读是异步的，因为可以 read ahead。如果开启 writeback cache，则 write 也会走到 background queue 中。开启 writeback cache 后，从 user process 来的 write 会先聚集在 page cache 中，然后 bdflush 线程会醒来，去刷脏页。</p><p>background queue 中的请求会一点点汇到 pending queue 中，在 pending queue 中的异步请求的数量是根据 max_background 参数（默认 12）来调整的。目的是：</p><ul><li>避免异步请求影响同步请求</li><li>开启 multi-threaded 选项后，限制 user daemon 线程的数量</li></ul><h3 id="2-5-Splicing-and-FUSE-Buffers"><a href="#2-5-Splicing-and-FUSE-Buffers" class="headerlink" title="2.5 Splicing and FUSE Buffers"></a>2.5 Splicing and FUSE Buffers</h3><p>在初始设置中，FUSE daemon 需要从 /dev/fuse 中读取请求，并且把回复也写到这个设备中。这样在内核和用户态之间复制内存，对 read 和 write 请求是有害的，因为通常它们会包含很多数据。因此，FUSE 会使用 Linux kernel 提供的 splice 技术。</p><p>splice() 系列的系统调用可以在两个 in-kernel memory buffer 之间传递数据，这样就不需要到用户态拷贝一次了。</p><blockquote><p>例如 sendfile、mmap、splice 都是 Linux 中的零拷贝技术，主要是消除一些不需要的拷贝</p></blockquote><p>FUSE 将它的 buffer 表示为下面两种形式之一：</p><ul><li>The regular memory region identified by a pointer in the user daemon’s address space, or</li><li>The 
kernel-space memory pointed by a file descriptor of a pipe where the data resides.</li></ul><p>If a user-space file system implements the <code>write_buf()</code> method (in the low-level API), then FUSE first splices the data from <code>/dev/fuse</code> to a Linux pipe and then passes the data directly to this method as a buffer containing a file descriptor of the pipe. FUSE splices only <code>WRITE</code> requests and only the ones that contain <strong>more than a single page of data</strong>.</p><p>Similar logic applies to the replies to READ requests if the <code>read_buf()</code> method is implemented. However, the <code>read_buf()</code> method is only present in the high-level API; for the low-level API, the file-system developer has to differentiate between splice and non-splice flows inside the read method itself.</p><blockquote><p>下面这段话讲解了为什么 header 肯定得 copy</p></blockquote><p>If the library is compiled with splice support, the kernel supports it, and appropriate commandline parameters are set, then splice() is always called for every request (including the request’s header). However, the header of every single request needs to be examined, for example to identify the request’s type and size. This examination is not possible if the FUSE buffer has only the file descriptor of a pipe where the data resides. So, for every request the header is then read from the pipe using regular read() calls (i.e., small, at most 80 bytes, memory copying is always performed). FUSE then splices the requested data if its size is larger than a single page (excluding the header): therefore only big writes are spliced. 
For reads, replies larger than two pages are spliced.</p><h4 id="扩展"><a href="#扩展" class="headerlink" title="扩展"></a>扩展</h4><p>注意，上文中得到的那个 pipe，实际上是可以方便地把数据通过 pipe 的形式往下游传递的。应该是对应了第二种 buffer 的形式。</p><p><a href="https://man7.org/linux/man-pages/man2/vmsplice.2.html" target="_blank" rel="noopener">vmsplice</a> 并不能把管道里面的数据移动到用户态内存中，所以如果需要在用户态对数据进行处理，例如需要加解密或者加解压，则得 Copy。</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/** Write contents of buffer to an open file</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> * Similar to the write() method, but data is supplied in a</span></span><br><span class="line"><span class="comment"> * generic buffer.  
Use fuse_buf_copy() to transfer data to</span></span><br><span class="line"><span class="comment"> * the destination.</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> * Unless FUSE_CAP_HANDLE_KILLPRIV is disabled, this method is</span></span><br><span class="line"><span class="comment"> * expected to reset the setuid and setgid bits.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="keyword">int</span> (*write_buf) (<span class="keyword">const</span> <span class="keyword">char</span> *, struct fuse_bufvec *buf, <span class="keyword">off_t</span> off,</span><br><span class="line">          struct fuse_file_info *);</span><br></pre></td></tr></table></figure><h3 id="2-6-Notifications"><a href="#2-6-Notifications" class="headerlink" title="2.6 Notifications"></a>2.6 Notifications</h3><p>FUSE 可以通过回复内核的 request 向内核传递消息。但有时候，它需要主动向内核传递消息，比如 poll 调用时，如果事件发生了，FUSE 需要主动通知内核。</p><p><img src="/img/dbpaper/fuse/fuseornot/t2.2.png"></p><h3 id="2-7-Multithreading"><a href="#2-7-Multithreading" class="headerlink" title="2.7 Multithreading"></a>2.7 Multithreading</h3><p>如果 pending queue 里面有多个请求，FUSE 会启动新的线程。<strong>每个线程处理一个请求</strong>，我理解这里是内核线程。处理完后，会检查是否有超过 10 个线程，如果是，则线程退出。</p><p>There is no explicit upper limit on the number of threads created by the FUSE library.<br>The implicit limit arises due to two factors.</p><ul><li>First, by default, only 12 asynchronous requests (max background parameter) can be in the pending queue at one time.</li><li>Second, the number of synchronous requests in the pending queue is constrained by the number of user processes that submit requests.</li><li>In addition, for every INTERRUPT and FORGET requests, a new thread is invoked. 
</li></ul><p>Therefore, the total number of FUSE daemon threads is at most (12 + number of processes with outstanding I/O + number of interrupts + number of forgets).</p><h3 id="2-8-Linux-Write-back-Cache-and-FUSE"><a href="#2-8-Linux-Write-back-Cache-and-FUSE" class="headerlink" title="2.8 Linux Write-back Cache and FUSE"></a>2.8 Linux Write-back Cache and FUSE</h3><h4 id="2-8-1-Linux-Page-Cache"><a href="#2-8-1-Linux-Page-Cache" class="headerlink" title="2.8.1 Linux Page Cache"></a>2.8.1 Linux Page Cache</h4><p>Page cache 的作用是减少 disk IO，行为是将数据存在 RAM 里面。</p><p>对于读，就是很简单的 cache。对于写，是三个策略：</p><ul><li>no-write 直接写 disk，invalidate cache</li><li>write-through 写 disk 也写 cache</li><li>write-back 只写 cache，后续异步写 disk。这些中间状态的 page 也被称为 dirty page</li></ul><p>刷 dirty page 发生在三种情况：</p><ul><li>free memory 低于阈值</li><li>dirty data 的存在时间超过某个阈值</li><li>用户调用了 sync 或者 fsync</li></ul><p>All of the above three tasks are performed by the group of flusher threads. First, flusher threads flush dirty data to disk when the amount of free memory in the system drops below a threshold value.<br>This is done by a flusher thread calling a function <code>bdi_writeback_all</code>, which continues to write data to disk until the following two conditions are true:</p><ol><li>The specified number of pages has been written out; and</li><li>The amount of free memory is above the threshold.</li></ol><p>在 2.6 内核前，内核中有 bdflush 和 kupdated 两个线程，它们的作用和现在的 flusher 一样：</p><ul><li>bdflush 负责在 free memory 变少的时候，后台 writeback dirty page</li><li>kupdated 负责周期性地 writeback dirty page</li></ul><blockquote><p>The major disadvantage in bdflush was that it consisted of one thread. This led to congestion during heavy page writeback where the single bdflush thread blocked on a single slow device. </p></blockquote><p>在 2.6 内核中，引入了 pdflush 这种线程，它们会根据 io load 调整为 2 到 8 之间的数量。The pdflush threads were not associated with any specific disk, but instead they were global to all disks. 
The downside of pdflush was that it can easily bottleneck on congested disks, starving other devices from getting service.</p><p>Therefore, a per-spindle flushing method was introduced to improve performace. The flusher threads replaced the pdflush threads in the 2.6.32 kernel.</p><p>The 2.6.32 kernel solved this problem by enabling multiple flusher threads to exists where each thread individually flushes dirty pages to a disk, allowing different threads to flush data at different rates to different disks. This also introduced the concept of per-backing device info (BDI) structure which maintains the per-device (disk) information like dirty list, read ahead size, flags, and B.D.Mn.R and B.D.Mx.R which are discussed in the next section.</p><h4 id="2-8-2-Page-Cache-Parameters"><a href="#2-8-2-Page-Cache-Parameters" class="headerlink" title="2.8.2 Page Cache Parameters"></a>2.8.2 Page Cache Parameters</h4><p>Global Background Ratio (G.B.R): The percentage of Total Available Memory filled with dirty pages at which the background kernel flusher threads wake up and start writing the dirty pages out. The processes that generate dirty pages are not throttled at this point. G.B.R can be changed by the user at /proc/sys/vm/dirty_background_ratio. By default this value is set to 10%.</p><p>Global Dirty Ratio (G.D.R): The percentage of Total Available Memory that can be filled with dirty pages before the system starts to throttle incoming writes. When the system gets to this point, all new I/O’s get blocked and the dirty data is written to disk until the amount of dirty pages in the system falls below this G.D.R. This value can be changed by the user at /proc/sys/vm/dirty ratio. By default this value is set to 20%.</p><p>Global Background Threshold (G.B.T): The absolute number of pages in the system that, when crossed, the background kernel flusher thread will start writing out the dirty data. 
This is obtained from the following formula:<br>G.B.T = TotalAvailableMemory × G.B.R</p><p>Global Dirty Threshold (G.D.T): The absolute number of pages that can be filled with dirty pages before the system starts to throttle incoming writes. This is obtained from the following formula:<br>G.D.T = TotalAvailableMemory × G.D.R</p><p>The next two parameters mainly limit the share of the page cache that different devices may use:</p><p>BDI Min Ratio (B.Mn.R): Generally, each device is given a part of the page cache that relates to its current average write-out speed in relation to the other devices. This parameter gives the minimum percentage of the G.D.T (page cache) that is available to the file system. This value can be changed by the user at <code>/sys/class/bdi/&lt;bdi&gt;/min_ratio</code> after the mount, where <code>&lt;bdi&gt;</code> is either a device number for block devices, or the value of st_dev on non-block-based file systems which set their own BDI information (e.g., a FUSE file system). By default this value is set to 0%.</p><p>BDI Max Ratio (B.Mx.R): The maximum percentage of the G.D.T that can be given to the file system (100% by default). This limits the particular file system to use no more than the given percentage of the G.D.T. It is useful in situations where we want to prevent one file system from consuming all or most of the page cache.</p><p>BDI Dirty Threshold (B.D.T): The absolute number of write-back cache pages that can be allotted to a particular device. This is similar to the G.D.T but for a particular BDI device. As a system runs, B.D.T fluctuates between the lower limit (G.D.T × B.Mn.R) and the upper limit (G.D.T × B.Mx.R).</p><p>BDI Background Threshold (B.B.T): When this absolute number of pages, a per-device fraction of the G.D.T, is crossed, the background kernel flusher thread starts writing out the data. This is similar to the G.B.T but for a particular file system using BDI.
B.B.T = B.D.T × G.B.T / G.D.T</p><p>NR_FILE_DIRTY: The total number of pages in the system that are dirty. This parameter is incremented/decremented by the VFS (page cache).</p><p>NR_WRITEBACK: The total number of pages in the system that are currently under write-back. This parameter is incremented/decremented by the VFS (page cache).</p><p>BDI_RECLAIMABLE: The total number of pages belonging to all the BDI devices that are dirty. A file system that supports BDI is responsible for incrementing/decrementing the value of this parameter.</p><p>BDI_WRITEBACK: The total number of pages belonging to all the BDI devices that are currently under write-back. A file system that supports BDI is responsible for incrementing/decrementing the value of this parameter.</p><h2 id="Implementations"><a href="#Implementations" class="headerlink" title="Implementations"></a>Implementations</h2><p>The implementation is mainly a stackfs.</p><p>It covers the following aspects:</p><ul><li>inode</li><li>lookup</li><li>session information<br>  mainly the per-FUSE-session/connection data management</li><li>directories</li><li>file create and open</li></ul><blockquote><p>Stackfs assigns its inode the number equal to the address of the inode structure in memory (by type-casting)</p></blockquote><p>That is, a Stackfs inode number is numerically equal to the address of its inode struct, whereas FUSE's high-level API has to maintain a mapping from FUSE inode numbers to the locations of the inode structs.</p><h2 id="Methodology"><a href="#Methodology" class="headerlink" title="Methodology"></a>Methodology</h2><p>FUSE has evolved significantly over the years and added several useful optimizations: a writeback cache, zero-copy via splicing, and multi-threading.
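</p><p>One optimization evaluated below raises the maximum size of a FUSE request from 4 KiB to 128 KiB. Some quick arithmetic shows why that matters, since each request costs a user/kernel round trip through /dev/fuse (the numbers here are assumptions for illustration, not measurements from the paper):</p>

```python
# Back-of-the-envelope request counts for writing a large file through
# FUSE at two maximum request sizes. Illustrative only.
FILE_SIZE = 1 * 1024 * 1024 * 1024  # assume a 1 GiB sequential write

def requests_needed(max_request_bytes, file_size=FILE_SIZE):
    # Each WRITE request carries at most max_request_bytes of payload,
    # and each one is a round trip through /dev/fuse.
    return -(-file_size // max_request_bytes)  # ceiling division

legacy = requests_needed(4 * 1024)       # 4 KiB requests -> 262144 trips
optimized = requests_needed(128 * 1024)  # 128 KiB requests -> 8192 trips
```

<p>Raising the request size cuts the number of round trips 32-fold for large sequential I/O, which is one reason a fully optimized configuration can outperform the baseline.</p><p>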
In our experience, some in the storage community tend to pre-judge FUSE’s performance—assuming it is poor—mainly due to not having enough information about the improvements FUSE has made over the years.</p><p>To compare the FUSE optimizations of recent years, two configurations are used:</p><ul><li>StackfsBase</li><li>StackfsOpt -&gt; includes all the FUSE optimizations<ul><li>writeback cache</li><li>maximum size of a single FUSE request raised from 4KiB to 128KiB</li><li>the user daemon runs multi-threaded, i.e. fuse_session_loop_mt</li><li>splicing is activated for all operations, which should refer to the splice zero-copy technique</li></ul></li></ul><h1 id="一些补充"><a href="#一些补充" class="headerlink" title="一些补充"></a>Additional Notes</h1><h2 id="FUSE-和-zero-copy"><a href="#FUSE-和-zero-copy" class="headerlink" title="FUSE 和 zero copy"></a>FUSE and Zero Copy</h2><p>FUSE communicates as shown in the earlier figure. Some steps in FUSE can be zero-copied, as the chapter Splicing and FUSE Buffers explains:</p><ul><li>for WRITE, splice is enabled, with restrictions, for data larger than one page</li><li>for READ, reading the header directly still cannot be avoided</li></ul><p>FUSE cannot achieve full zero copy, mainly because everything has to go through /dev/fuse.</p><p>The /dev/fuse device behaves much like an RPC channel: the kernel writes commands such as READ and OPEN into it, the daemon reads those commands out, and the daemon writes its responses back into /dev/fuse.</p><p>A command looks roughly like:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[ header | payload ]</span><br></pre></td></tr></table></figure><p>Why mmap cannot be used for zero copy: the FUSE daemon is an ordinary user process, so by design it is allowed to crash, be attacked, or hang. With mmap, the kernel would find it hard to control lifetimes; for example, when the FUSE daemon dies, someone has to clean up the mmapped files.</p><h1 id="一些知识"><a href="#一些知识" class="headerlink" title="一些知识"></a>Background Knowledge</h1><h2 id="零拷贝"><a href="#零拷贝" class="headerlink" title="零拷贝"></a>Zero Copy</h2><p><a href="/2025/09/30/zero-copy/">See Zero-Copy Techniques</a></p><h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul><li><a href="https://www.fsl.cs.sunysb.edu/docs/fuse/bharath-msthesis.pdf" target="_blank" rel="noopener">https://www.fsl.cs.sunysb.edu/docs/fuse/bharath-msthesis.pdf</a><br>  the full thesis</li><li><a href="https://github.com/0voice/kernel_awsome_feature/blob/main/Fuse/FUSE.pdf" 
target="_blank" rel="noopener">https://github.com/0voice/kernel_awsome_feature/blob/main/Fuse/FUSE.pdf</a></li><li><a href="https://github.com/libfuse/libfuse/blob/master/example/passthrough.c" target="_blank" rel="noopener">https://github.com/libfuse/libfuse/blob/master/example/passthrough.c</a><br>  a minimal demo</li><li><a href="https://github.com/0voice/kernel_awsome_feature/blob/main/%E8%87%AA%E5%88%B6%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F%20%E2%80%94%2003%20Go%E5%AE%9E%E6%88%98%EF%BC%9Ahello%20world%20%E7%9A%84%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F.md" target="_blank" rel="noopener">https://github.com/0voice/kernel_awsome_feature/blob/main/%E8%87%AA%E5%88%B6%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F%20%E2%80%94%2003%20Go%E5%AE%9E%E6%88%98%EF%BC%9Ahello%20world%20%E7%9A%84%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F.md</a></li></ul>]]></content>
    
    
    <summary type="html">&lt;p&gt;看下 FUSE 的相关知识。&lt;/p&gt;
&lt;p&gt;Filesystem In Userspace 也就是 fuse，是 linux 的一个内核模块。&lt;/p&gt;</summary>
    
    
    
    
    <category term="Linux" scheme="http://www.calvinneo.com/tags/Linux/"/>
    
    <category term="数据库" scheme="http://www.calvinneo.com/tags/数据库/"/>
    
    <category term="FileSystem" scheme="http://www.calvinneo.com/tags/FileSystem/"/>
    
  </entry>
  
  <entry>
    <title>Excerpt from Yes Minister</title>
    <link href="http://www.calvinneo.com/2025/02/15/from-ym/"/>
    <id>http://www.calvinneo.com/2025/02/15/from-ym/</id>
    <published>2025-02-15T15:42:32.000Z</published>
    <updated>2025-06-11T13:49:27.241Z</updated>
    
    <content type="html"><![CDATA[<p>Excerpts from the novel of Yes Minister, including metaphors, grammar notes and funny paragraphs.</p><a id="more"></a><h1 id="Editor’s-Note"><a href="#Editor’s-Note" class="headerlink" title="Editor’s Note"></a>Editor’s Note</h1><blockquote><p>Years of political training and experience had taught Hacker to use twenty words where one would do, to dictate millions of words where mere thousands would suffice, and to use language to blur and fudge issues and events so that they became incomprehensible to others. Incomprehensibility can be a haven for some politicians, for therein lies temporary safety.</p></blockquote><blockquote><p>But his natural gift for the misuse of language, though invaluable to an active politician, was not an asset to a would-be author.</p></blockquote><h1 id="OPEN-GOVERMENT"><a href="#OPEN-GOVERMENT" class="headerlink" title="OPEN GOVERMENT"></a>OPEN GOVERNMENT</h1><p>The TV series adds a scene in which a BBC interview call is picked up by mistake, which I think works very well.</p><blockquote><p>‘Then why don’t you marry him?’ she asked. ‘I now pronounce you man and political adviser. Whom politics has joined let no wife put asunder.’</p></blockquote><h1 id="THE-OFFICIAL-VISIT"><a href="#THE-OFFICIAL-VISIT" class="headerlink" title="THE OFFICIAL VISIT"></a>THE OFFICIAL VISIT</h1><blockquote><p>‘Shall we scramble?’ he said.<br>‘Where to?’ I said, then felt rather foolish as I realised what he was talking about. Then I realised it was another of Bernard’s draft suggestions: what’s the point of scrambling a phone conversation about something that’s just been on the television news?</p></blockquote><p>Scramble can mean to clamber or climb, but it can also mean to encode a radio or phone signal so that only the intended listeners can make out the call.</p><blockquote><p>And if the new president is Marxist-backed, who better to win him over to our side than Her Majesty?</p></blockquote><p>That is: if the new president has Marxist backing, who better than Her Majesty to talk him round to our side?</p><blockquote><p>there was one 747 that belonged to nine different African airlines in one month.
They called it the mumbo-jumbo.</p></blockquote><h1 id="THE-ECONOMY-DRIVE"><a href="#THE-ECONOMY-DRIVE" class="headerlink" title="THE ECONOMY DRIVE"></a>THE ECONOMY DRIVE</h1><blockquote><p>I was forced to move on to the next two white elephants.</p></blockquote><p>That is: I had to move on to the next two expensive, useless undertakings.</p><blockquote><p>‘Government buildings do not need fire safety clearance’<br>‘Why?’<br>‘Perhaps,’ Humphrey offered, ‘because Her Majesty’s Civil Servants are not easily inflamed.’</p></blockquote><blockquote><p>Frank chimed in eagerly, ‘Yes, that would get rid of ninety civil servants at a stroke.’<br>‘Or indeed,’ said Sir Humphrey, ‘at a strike.’</p></blockquote><p>Later comes the classic smile at versus smile on exchange.</p><h1 id="BIG-BROTHER"><a href="#BIG-BROTHER" class="headerlink" title="BIG BROTHER"></a>BIG BROTHER</h1><blockquote><p>The local party, the constituency, my family, all of them are proud of me for getting into the Cabinet – yet they are all resentful that I have less time to spend on them and are keen to remind me that I’m nothing special, just their local MP, and that I mustn’t get ‘too big for my boots’. They manage both to grovel and patronise me simultaneously. It’s hard to know how to handle it.</p></blockquote><blockquote><p>And he assumes, rightly, that the Minister has too much else to do. [The whole process is called Creative Inertia – Ed.]</p></blockquote><blockquote><p>He also warned me of the ‘Three Varieties of Civil Service Silence’, which would be Humphrey’s last resort if completely cornered:<br>The silence when they do not want to tell you the facts: Discreet Silence.<br>The silence when they do not intend to take any action: Stubborn Silence.<br>The silence when you catch them out and they haven’t a leg to stand on. They imply that they could vindicate themselves completely if only they were free to tell all, but they are too honourable to do so: Courageous Silence.</p></blockquote><blockquote><p>I explained to her that the Opposition aren’t really the opposition.
They’re just called the Opposition. But, in fact, they are the opposition in exile. The Civil Service are the opposition in residence.</p></blockquote><blockquote><p>In the second place, if there had been investigations, which there haven’t or not necessarily, or I am not at liberty to say if there have, there would have been a project team which, had it existed, on which I cannot comment, would now be disbanded if it had existed and the members returned to their original departments, had there indeed been any such members.</p></blockquote><p>A classic string of subjunctives.</p><blockquote><p>But they’ve convinced me that they can. Indeed my Permanent Secretary is staking his reputation on it. And, if not, heads will roll.</p></blockquote><p>Hacker understands the press: this is the classic tactic of using the media to force someone’s hand. In the later transport supremo affair, the PM uses the same trick to push Hacker into the job.</p><h1 id="THE-WRITING-ON-THE-WALL"><a href="#THE-WRITING-ON-THE-WALL" class="headerlink" title="THE WRITING ON THE WALL"></a>THE WRITING ON THE WALL</h1><blockquote><p>Woe betide any Minister who lifts the phone to try to sort out a foreign trade deal, for instance.</p></blockquote><p>‘Woe betide X’ means things will go badly for X; betide means to happen or befall.</p><blockquote><p>‘With respect, Minister,’ countered Sir Humphrey (untruthfully), ‘how do you know it says the opposite if it is totally unintelligible?’</p></blockquote><p>This time it is Humphrey rather than Bernard who picks at the logic.</p><blockquote><p>Hacker was beginning to understand Civil Service code language. Other examples are:<br>‘I think we have to be very careful.’ Translation: We are not going to do this.<br>‘Have you thought through all the implications?’ Translation: You are not going to do this.<br>‘It is a slightly puzzling decision.’ Translation: Idiotic!<br>‘Not entirely straightforward.’ Translation: Criminal.<br>‘With the greatest possible respect, Minister . . 
.’ Translation: Minister, that is the silliest idea I’ve ever heard</p></blockquote><p>This is the Editor’s summary, and I find it very amusing.</p><blockquote><p>If a purely hypothetical Minister were to be unhappy with a departmental draft of evidence to a committee, and if the hypothetical Minister were to be planning to replace it with his own hypothetical draft worked out with his own political advisers at his party HQ, and if this Minister was planning to bring in his own draft so close to the final date for evidence that there would be no time to redraft it, and if the hypothetical Private Secretary were to be aware of this hypothetical draft – in confidence – should the hypothetical Private Secretary pass on the information to the Perm. Sec. of the hypothetical Department?</p></blockquote><p>A classic passage of hypotheticals.</p><blockquote><p>‘We shall always support you as your standard-bearer, Minister but not as your pall-bearer.’</p></blockquote><blockquote><p>‘If you must do this damn silly thing,’ he said, ‘don’t do it in this damn silly way.’</p></blockquote><p>Two rare occasions of Humphrey speaking completely plainly.</p><blockquote><p>Bernard assured me that I didn’t really need to know much about the proposal because his information on the grapevine, through the Private Office network, was that the proposal would go through on the nod.</p></blockquote><p>Why does grapevine also mean hearsay?</p><blockquote><p>Donald Hughes, rubbing salt in the wound, apparently described it as ‘approbation, elevation and castration, all in one stroke’. It seems he suggested that I should take the title Lord Hacker of Kamikaze.</p></blockquote><blockquote><p>I told Humphrey I was appalled.<br>‘You’re appalled?’ he said. ‘I’m appalled.’<br>Bernard said he was appalled, too. And, there’s no doubt about it, the situation is appalling.</p></blockquote><blockquote><p>Industrial Harmony. 
That means strikes.</p></blockquote><p>The Editor adds a note here, and it is quite amusing.</p><blockquote><p>You’ll probably spend the rest of your career in the Vehicle Licensing Centre in Swansea.</p></blockquote><p>Quite a grudge against Swansea.</p><blockquote><p>Then Humphrey proposed that we work together on this. This was a novel suggestion, to say the least.</p></blockquote><blockquote><p>‘I’m awfully sorry to quibble again, Minister, but you can’t actually stop things before they start,’ intervened Bernard, the wet-hen-in-chief. He’s really useless in a crisis.</p></blockquote><blockquote><p>‘Same reason,’ came the reply. ‘It’s just like the United Nations. The more members it has, the more arguments you can stir up, and the more futile and impotent it becomes.’</p></blockquote><p>Rather insightful, I think.</p><blockquote><p>Then I had an idea. I suddenly realised that Martin will be on my side. I can’t imagine why I didn’t think of it before. He’s Foreign Secretary – and, to my certain knowledge, Martin is genuinely pro Europe. (Humphrey calls him ‘naïf’). Also I ran his campaign against the PM, and he only stands to lose if I’m squeezed out.</p></blockquote><blockquote><p>I agreed, and remarked that this Europass thing is the worst disaster to befall the government since I was made a member of the Cabinet. [We don’t think that Hacker actually meant what he seems to be saying here Ed.]</p></blockquote><blockquote><p>It’s awarded to the statesman who has made the biggest contribution to European unity since Napoleon. 
[That’s if you don’t count Hitler – Ed.]</p></blockquote><p>The Editor’s notes in these passages are all great fun.</p><blockquote><p>when you’ve got them by the balls, their hearts and minds will follow.</p></blockquote><h1 id="THE-RIGHT-TO-KNOW"><a href="#THE-RIGHT-TO-KNOW" class="headerlink" title="THE RIGHT TO KNOW"></a>THE RIGHT TO KNOW</h1><blockquote><p>Sir Humphrey replied that I need not look far – Private Secretaries who could not occupy their Ministers were a threatened species.</p></blockquote><blockquote><p>‘Almost anything can be attacked as a loss of amenity and almost anything can be defended as not a significant loss of amenity. One must appreciate the significance of significant.’</p></blockquote><blockquote><p>Humphrey suggested I look inside them. I did, and to my utter astonishment I saw that there were a handful of signatures in each book, about a hundred altogether at the most. A very cunning ploy – a press photo of a petition of six fat books is so much more impressive than a list of names on a sheet of Basildon Bond.</p></blockquote><blockquote><p>Those civil servants are always kowtowing to Daddy, but they never take any real notice of him.</p></blockquote><p>Kowtow comes from the Chinese 磕头, to knock one’s head to the ground in respect.</p><blockquote><p>She told me she’d been out with the trots. I was momentarily sympathetic and suggested she saw the doctor. Then I realised she meant the Trotskyites. I’d been slow on the uptake because I didn’t know she was a Trotskyite. Last time we talked she’d been a Maoist.</p></blockquote><blockquote><p>I noted that Lucy was giving out the press release at five p.m. Very professional. Misses the evening papers, which not too many people read, and therefore makes all the dailies. 
She’s learned something from being a politician’s daughter.</p></blockquote><h1 id="JOBS-FOR-THE-BOYS"><a href="#JOBS-FOR-THE-BOYS" class="headerlink" title="JOBS FOR THE BOYS"></a>JOBS FOR THE BOYS</h1><blockquote><p>you scratch my back, I’ll scratch yours.</p></blockquote><h1 id="THE-COMPASSIONATE-SOCIETY"><a href="#THE-COMPASSIONATE-SOCIETY" class="headerlink" title="THE COMPASSIONATE SOCIETY"></a>THE COMPASSIONATE SOCIETY</h1><p>I am rather curious why the DAA is held responsible for a hospital’s extravagance.</p><blockquote><p>I informed Bernard that most of our journalists are so amateur that they would have grave difficulty in finding out that today is Thursday.<br>‘It’s actually Wednesday, Minister,’ he said.</p></blockquote><blockquote><p>Sir Humphrey preferred to write in margins where possible, but, if not possible, simulated margins made him feel perfectly comfortable.</p></blockquote><blockquote><p>We can infer from this note that Mr Bernard Woolley – as he then was – mentioned the matter of St Edward’s Hospital to Sir Humphrey, although when we challenged Sir Bernard – as he now is – on this point he had no recollection of doing so – Ed.</p></blockquote><h1 id="THE-DEATH-LIST"><a href="#THE-DEATH-LIST" class="headerlink" title="THE DEATH LIST"></a>THE DEATH LIST</h1><h1 id="DOING-THE-HONOURS"><a href="#DOING-THE-HONOURS" class="headerlink" title="DOING THE HONOURS"></a>DOING THE HONOURS</h1><p>I am also curious why the DAA gets to meddle in education matters.</p><blockquote><p>Chat over the port and walnuts</p></blockquote><p>This usually describes a relaxed social occasion where people chat over port and walnuts after dinner.</p><blockquote><p>He explained that home students were to be avoided at all costs! Anything but home students.</p></blockquote><blockquote><p>Sir William Guthrie, OM, FRS, FBA, Ph.D, MC, MA (Oxon)<br>Group Captain Christopher Venables, DSC, MA<br>Sir Humphrey Appleby, KCB, MVO, MA (Oxon)<br>Bernard Woolley, MA (Cantab)<br>The Rt Hon. James Hacker, PC, MP, BSc. 
(Econ)<br>Sir Arnold Robinson, GCMG, CVO, MA (Oxon)</p></blockquote><p>Cantab refers to Cambridge.</p><blockquote><p>In fact, the only time a civil servant is known to have refused a knighthood was in 1496. This was because he already had one.</p></blockquote><blockquote><p>Just as incomes policies have always been manipulated by those that control them: for instance, the 1975 Pay Policy provided exemptions for Civil Service increments and lawyers’ fees. Needless to say, the policy was drafted by civil servants and parliamentary draftsmen, i.e. lawyers.</p></blockquote><blockquote><p>Quis custodiet ipsos custodes?</p></blockquote><blockquote><p>And how did the civil servants get away with creating these remarkably favourable terms of service for themselves? Simply by keeping a low profile. They have somehow managed to make people feel that discussing the matter at all is in rather poor taste.</p></blockquote><p>A very interesting observation: keep a low profile, and make any discussion of the matter feel in rather poor taste.</p><blockquote><p>Cut no ice with me</p></blockquote><p>An idiom: something has no effect on me.</p><blockquote><p>The penny dropped</p></blockquote><p>An idiom: someone finally, or suddenly, understands something they had not grasped before.</p><blockquote><p>‘There is no reason,’ he said, stabbing the air with his finger, ‘to change a system which has worked well in the past.’<br>‘But it hasn’t,’ I said.<br>‘We have to give the present system a fair trial,’ he stated. This seemed quite reasonable on the face of it. But I reminded him that the Most Noble Order of the Garter was founded in 1348 by King Edward III. ‘Surely it must be getting towards the end of its trial period?’ I said.<br>So Humphrey tried a new tack. He said that to block honours pending economies might create a dangerous precedent. What he means by ‘dangerous precedent’ is that if we do the right thing now, then we might be forced to do the right thing again next time. 
And on that reasoning nothing should ever be done at all.</p></blockquote><blockquote><p>‘How do they award the Thistle?’ I asked.<br>‘A committee sits on it,’ said Bernard</p></blockquote><p>A pun: the committee ‘sits on it’.</p><blockquote><p>‘As you know,’ he said, ‘the letters JB are the highest honour in the Commonwealth.’<br>I didn’t know.<br>Humphrey eagerly explained. ‘Jailed by the British. Gandhi, Nkrumah, Makarios, Ben-Gurion, Kenyatta, Nehru, Mugabe – the list of world leaders is endless and contains several of our students.’</p></blockquote><blockquote><p>although the Cabinet Secretary is theoretically primus inter pares he is in reality very much primus. It seems that all Permanent Secretaries are equal but some are more equal than others.</p></blockquote><blockquote><p>thin end of the wedge</p></blockquote><blockquote><p>Perhaps Appleby is not an absolutely first-rank candidate to succeed one as Cabinet Secretary. Not really able in every department. Might do better in a less arduous job, such as chairman of a clearing bank or as an EEC official.</p></blockquote><blockquote><p>‘Of course,’ said Bernard, ‘but it’s years and years since the Department of Transport had a Permanent Secretary from Cambridge.’</p></blockquote><p>It seems Bernard is the only Cambridge man.</p><h1 id="THE-GREASY-POLE"><a href="#THE-GREASY-POLE" class="headerlink" title="THE GREASY POLE"></a>THE GREASY POLE</h1><p>“The Greasy Pole” is an idiomatic expression used to describe the difficult and often slippery route to advancement in one’s career or profession. It is particularly used in contexts where success is hard to achieve and the path to the top is fraught with challenges and obstacles.</p><blockquote><p>‘Simple, Minister,’ he explained. ‘It means “with” or “after”, or sometimes “beyond” – it’s from the Greek, you know.’<br>[Like all Permanent Secretaries, Sir Humphrey Appleby was a generalist. Most of them studied classics, history, PPE or modern languages. 
Of course you might expect the Permanent Secretary at the Department of Administrative Affairs to have a degree in business administration, but of course you would be wrong – Ed.]<br>Then he went on to explain that metadioxin means ‘with’ or ‘after’ dioxin, depending on whether it’s with the accusative or the genitive: with the accusative it’s ‘beyond’ or ‘after’, with the genitive it’s ‘with’ as in Latin, where the ablative is used for words needing a sense of with to precede them.<br>Bernard added – speaking for the first time in the whole meeting – that of course there is no ablative in Greek, as I would doubtless recall.<br>I told him I recalled no such thing, and later today he wrote me a little memo, explaining all the above Greek and Latin grammar.</p></blockquote><blockquote><p>‘Well,’ he said eventually, ‘inert means that . . . it’s not . . . ert.’</p></blockquote><blockquote><p>I searched desperately for an analogy, ‘It’s like Littler and Hitler,’ I explained. ‘We’re not saying that you’re like Hitler because your name sounds similar.’</p></blockquote><blockquote><p>Stage two: Discredit the evidence that you are not publishing<br>This is, of course, much easier than discrediting evidence that you do publish. You do it indirectly, by press leaks. You say:<br>(a) that it leaves important questions unanswered<br>(b) that much of the evidence is inconclusive<br>(c) that the figures are open to other interpretations<br>(d) that certain findings are contradictory<br>(e) that some of the main conclusions have been questioned<br>Points (a) to (d) are bound to be true. In fact, all of these criticisms can be made of a report without even reading it. There are, for instance, always some questions unanswered – such as the ones they haven’t asked. As regards (e), if some of the main conclusions have not been questioned, question them! 
Then they have.</p></blockquote><p>Here the book reproduces two BBC reports, one supporting and one opposing the same thing, but they are too blurry to make out.</p><blockquote><p>‘The public,’ said Sir Humphrey, ‘are ignorant and misguided.’<br>‘What do you mean?’ I demanded. ‘It was the public who elected me.’</p></blockquote><blockquote><p>He was very bitter. And very insulting. ‘Must you always be so concerned with climbing the greasy pole?’<br>I faced the question head on. ‘Humphrey,’ I explained, ‘the greasy pole is important. I have to climb it.’<br>‘Why?’<br>‘Because,’ I said, ‘it’s there.’</p></blockquote><blockquote><p>I asked him how it felt, going from the Commons to the Lords.<br>‘It’s like being moved from the animals to the vegetables,’ he replied.</p></blockquote><h1 id="THE-DEVIL-YOU-KNOW"><a href="#THE-DEVIL-YOU-KNOW" class="headerlink" title="THE DEVIL YOU KNOW"></a>THE DEVIL YOU KNOW</h1><blockquote><p>He sighed. ‘Well, Minister, I’m afraid that this is the penalty we have to pay for trying to pretend that we are Europeans. Believe me, I fully understand your hostility to Europe.’</p></blockquote><blockquote><p>I reminded Humphrey that the typical Common Market official is said to have the organising capacity of the Italians, the flexibility of the Germans and the modesty of the French. He tops all that up with the imagination of the Belgians, the generosity of the Dutch, and the intelligence of the Irish. Finally, for good measure, he has the European spirit of Mr Anthony Wedgwood Benn.</p></blockquote><h1 id="THE-BED-OF-NAILS"><a href="#THE-BED-OF-NAILS" class="headerlink" title="THE BED OF NAILS"></a>THE BED OF NAILS</h1><p>Another expression for 如坐针毡 (sitting on pins and needles): on tenterhooks.</p><h1 id="常见单词"><a href="#常见单词" class="headerlink" title="常见单词"></a>Common Words</h1>]]></content>
    
    
    <summary type="html">&lt;p&gt;The novel of Yes Minister. Including metaphors, grammar issues and funny paragraphs.&lt;/p&gt;</summary>
    
    
    
    
    <category term="文学" scheme="http://www.calvinneo.com/tags/文学/"/>
    
  </entry>
  
</feed>
