C++ 的 std::string 有什么缺点？

发布时间：

2025-04-16 15:31

阅读量：

std::string 与其说是 string，不如说是 ByteArray。事实上许多 C++ 基础组件都把 std::string 当做 ByteArray 来使用，最典型的，比如在 leveldb/rocksdb 中，比如在 snappy 压缩算法中。

这些基础组件不仅把 std::string 当做 ByteArray 使用，还把它当做 OutputByteStream 使用，例如 rocksdb 中就有大量这样的用法，其中一个典型场景是构建 WriteBatch 时，举例来说：

Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id, const Slice& key, const Slice& value) { // 略 .... 说明：b->rep_ 是 std::string if (column_family_id == 0) { b->rep_.push_back(static_cast<char>(kTypeValue)); } else { b->rep_.push_back(static_cast<char>(kTypeColumnFamilyValue)); PutVarint32(&b->rep_, column_family_id); } PutLengthPrefixedSlice(&b->rep_, key); PutLengthPrefixedSlice(&b->rep_, value); // 略 .... } // 其中： inline void PutVarint32(std::string* dst, uint32_t v) { char buf[5], *ptr = EncodeVarint32(buf, v); dst->append(buf, static_cast<size_t>(ptr - buf)); } inline void PutLengthPrefixedSlice(std::string* dst, const Slice& value) { PutVarint32(dst, static_cast<uint32_t>(value.size())); dst->append(value.data(), value.size()); }

每次 std::string::append 或 push_back 需要：
　① 检查 capacity 是否够用，不够的话扩容(扩容是重量级操作)；
　② 追加数据，更新 size 字段；
　③ 设置末尾的 '\0'，必备操作，必须能当 c_str() 来用。

在这个 WriteBatchInternal::Put 中，需要调用一次 push_back，再加 4 次或 5 次 append，如果 key value 比较长的话还无所谓，如果 key value 比较短，这个相对开销是很大的，并且额外开销甚至远大于 memcpy key value 的开销。

那么，我们能不能减少这些开销？很遗憾，正常途径无法做到，通过公开接口得到的 string 状态必须是合法的，我们没有机会去象 C 语言一样在未初始化的内存上执行操作！但是，Hacker 们的探索精神是无止境的，C++23 为 std::string 增加了一个方法 resize_and_overwrite，用法如下：

std::string stream; size_t oldsize = stream.size(); size_t newsize = oldsize + ComputeIncSize(...); stream.resize_and_overwrite(newsize, [oldsize,&](char* mem, size_t newsize) { char* ptr = mem + oldsize; // oldsize 要捕获，newsize 会通过参数传进来 ptr = mempcpy(ptr, part1, len1); // mempcpy 返回 ptr + len1 ptr = mempcpy(ptr, part2, len2); // mempcpy 返回 ptr + len2 ptr = mempcpy(ptr, part3, len3); // mempcpy 返回 ptr + len3 // 更多 .... return newsize; });

这样一来，用在前面的 WriteBatchInternal::Put 中，在外面预先 ComputeIncSize，然后只需要调用一次 resize_and_overwrite，只需要对 std::string 的状态做一次改变！稍微美中不足的是，如果有一个 string::append_with_op(inc_len, op) 函数就更好了。

更惊险的来了

C++23 并非广泛可用，如果没有 C++23，那怎么办？至少在 GCC 中，我们还是有办法的，当然，这就属于黑活了：

void hack_set_size(std::string* s, size_t n); template<class FuncType, FuncType std::string::*Func> struct string_hack { friend void hack_set_size(std::string* s, size_t n) { (s->*Func)(n); } }; // 以下 string::_M_length 是 private 的用来设置 string 的 size 的函数， // private 成员一般是不能外部被访问的，这就是我们 hack 的地方 template struct string_hack<void(size_t), &std::string::_M_length>; void string_resize_no_touch_memory(std::string* str, size_t new_size) { if (new_size > str->capacity()) { size_t old_size = str->size(); // 以下 103/64 略小于 1.618 size_t cap = std::max(new_size, old_size * 103/64); str->reserve(cap); } hack_set_size(str, new_size); }

注意，这个函数，我们甚至没有设置末尾的 '\0'，由调用方来设置（是有原因的）。

以这样的方式，我们就可以在 GCC C++ 17 中优化 WriteBatchInternal::Put 了：

Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id, const Slice& key, const Slice& value) { // 略... size_t old_size = b->rep_.size(); size_t inc_size = 1 + (column_family_id ? VarUint32Length(column_family_id) : 0) + VarUint32Length(uint32_t(key.size())) + key.size() + VarUint32Length(uint32_t(value.size())) + value.size(); string_resize_no_touch_memory(&b->rep_, old_size + inc_size); char* ptr = b->rep_.data(); EncodeFixed32(ptr + 8, DecodeFixed32(ptr + 8) + 1); // Update Batch Count ptr += old_size; if (column_family_id == 0) { ptr[0] = static_cast<char>(kTypeValue); ptr += 1; } else { ptr[0] = static_cast<char>(kTypeColumnFamilyValue); ptr = EncodeVarint32(ptr + 1, column_family_id); // 原地操作，避免 memcpy } ptr = EncodeVarint32(ptr, key.size()); // 原地操作，避免 memcpy memcpy(ptr, key.data(), key.size()); ptr = EncodeVarint32(ptr + key.size(), value.size()); // 原地操作，避免 memcpy memcpy(ptr, value.data(), value.size()); ptr[value.size()] = '\0'; // end of str // 略... }

优化后：

在 RocksDB 中，因为整体性能较差，这个地方慢一些，对全局影响不大，因为其它环节更慢。但是在 ToplingDB 中，我们大幅提升了整体性能(比如最近的改进)，每个环节的消耗，我们需要逐一优化，这个环节也不例外。

在整个数据库中，WriteBatch::Put 占的比例很低，我们把它的性能提高了三五倍，对整体的贡献虽然不高，但也有好几个百分点了。

【完】

上一篇：我只想老婆对我态度好一点，她为什么想分开？

下一篇：雷军的每一分钱都是干净的吗？

END