Introducing sqlite-wasm-rs
Project: https://github.com/Spxg/sqlite-wasm-rs
SQLite is highly portable. It can run on most platforms, including the Web. The official project provides sqlite-wasm, which is based on wasm32-unknown-emscripten, exposes JS APIs for Web applications, and includes persistent VFS support.
On the Rust side, there is the wasm32-unknown-unknown target. Together with wasm-bindgen, it lets Rust programs run on the Web. But after all these years, SQLite still had not been compiled to this target, mainly for two reasons:
- It does not have a C standard library or headers.
- Its C ABI was not standard-compliant, specifically for the Rust target rather than LLVM.
The first point means we cannot directly use cc to compile SQLite without extra work. This is the main problem to solve. The second point does not affect SQLite directly, but it introduces a lot of uncertainty.
Fortunately, starting from 1.89.0, its C ABI became standards-compliant, so we can now safely interoperate with C.
Completing the headers
To compile SQLite, we need to provide the headers required by the build. At first, the compilation looks like this:
$ clang sqlite3.c -target wasm32-unknown-unknown
sqlite3.c:15667:10: fatal error: 'stdio.h' file not found
15667 | #include <stdio.h>
| ^~~~~~~~~
1 error generated.The good news is that, because the ABI is compatible, we can copy the required musl source code from emscripten. Before that, however, we need to set the SQLITE_OS_OTHER=1 and SQLITE_THREADSAFE=0 compile options. The former tells SQLite that it is being compiled for a custom platform and that we need to implement our own VFS. The latter disables multithreading support. Together, these two options minimize SQLite’s symbol dependencies. For example, pthread is no longer required.
Completing the symbols
After completing the headers, the code can compile, but many required symbols are still missing, such as strcmp. There are two options in front of us: implement them in Rust, or continue copying musl source code. I chose the latter. musl’s implementation has been battle-tested and is undoubtedly the better choice. Implementing these symbols in Rust would introduce some problems, because we would need to declare an extern "C" symbol:
#[unsafe(no_mangle)]
unsafe extern "C" strcmp(s1: *const c_char, s2: *const c_char) -> c_int;When compiled to wasm, this causes strcmp to be exported blindly. I want the final artifact to be clean, without unnecessary external symbols.
There are still some symbols that need to be implemented with Rust and wasm-bindgen, such as malloc, free, and localtime.
Renaming symbols
After implementing the symbols, we also need to rename them. The idea was inspired by zstd-sys. For example, strcmp is renamed to rust_sqlite_wasm_strcmp. Since the wasm32-unknown-unknown target does not have a C standard library, any other library that also needs to interoperate with C will likely need to implement these symbols as well. If everyone uses the original symbol names and the libraries are compiled together, the final link step will run into conflicts.
Renaming symbols is straightforward. We only need to introduce a global header file, such as wasm-shim.h:
#define strcmp rust_sqlite_wasm_strcmp
int rust_sqlite_wasm_strcmp(const char *l, const char *r);Then compile SQLite with clang -include wasm-shim.h. This automatically includes our declared header before compiling each source file, allowing the macro to rename all symbols.
VFS implementation
We compile with SQLITE_OS_OTHER=1, which means we need to implement our own VFS and declare the sqlite3_os_init symbol to initialize it. For the Web, FileSystemSyncAccessHandle is the best choice. It can read and write files synchronously. The downside is that creating files is asynchronous. I will not go into too much detail here; see the SQLite official documentation for more information. In short, persistent storage has all kinds of environment-specific limitations. Among the available options, opfs-sahpool looks like the best solution to me, so I ported it to Rust.
The only option without environment-specific limitations is Memory VFS. As the name suggests, it stores files in memory and has very high IO performance. Persistence can be handled by exporting and importing the database, but it cannot write a large amount of data because it is limited by the Web page’s memory. Based on SQLite’s behavior, I designed a structure to reduce memory allocation: SQLite reads and writes the main database in units of PAGE_SIZE except for the first 100 bytes used to read the header. We can determine the block size on the first write and then read and write by block:
pub struct MemChunksFile {
chunks: Vec<Vec<u8>>,
chunk_size: Option<usize>,
file_size: usize,
}
impl MemChunksFile {
fn read(&self, buf: &mut [u8], offset: usize) {
if let Some(chunk_size) = self.chunk_size {
buf.copy_from_slice(&self.chunks[offset / chunk_size]);
} else {
buf.fill(0);
}
}
fn write(&mut self, buf: &[u8], offset: usize) {
let chunk_size = if let Some(chunk_size) = self.chunk_size {
chunk_size
} else {
let size = buf.len();
self.chunk_size = Some(size);
size
};
for _ in self.chunks.len()..offset / chunk_size {
self.chunks.push(vec![0; chunk_size]);
}
if let Some(buffer) = self.chunks.get_mut(offset / chunk_size) {
buffer.copy_from_slice(buf);
} else {
self.chunks.push(buf.to_vec());
}
}
}Compared with directly using Vec<u8>, this avoids additional memory allocation when the existing space is insufficient. That kind of reallocation is disastrous for large writes. If 1 GB of data has already been written and the space is no longer enough, a new buffer that is usually at least 2 GB will be allocated, and the existing data will be copied over. Considering that WASM currently only has the memory.grow instruction and no memory.shrink, memory will keep growing. A typical Web page’s memory limit is 4 GB, and wasm32 can only address up to 4 GB anyway, so the page can easily crash.
Implementing a VFS requires dozens of interfaces and many edge cases, so I abstracted the methods. Now it is much easier to implement a VFS.
Summary
Like libsqlite3-sys, sqlite-wasm-rs provides SQLite bindings. It has already been integrated into diesel and rusqlite, which means SQLite can now be used out of the box on the Web. After all, using the C API directly is still a bit tedious.
I also built a RustPlayground-like website. You can try it right here; it is not a screenshot.