
feat(new-compiler): migrate metadata storage from lockfile to LMDB#1955

Open
AndreyHirsa wants to merge 7 commits into main from feat/metadata-lmdb-storage

Conversation

AndreyHirsa (Contributor) commented Feb 3, 2026

Summary

Migrate metadata storage from lockfile-guarded JSON files to an LMDB database for improved concurrent access and to eliminate ELOCKED errors.

Changes

  • Replace proper-lockfile with LMDB: Use lmdb-js for metadata storage with MVCC (Multi-Version Concurrency Control) for safe concurrent access across bundler workers (see the sketch after this list)
  • Remove proper-lockfile dependency: No longer needed with LMDB's built-in concurrency handling
  • Update related docs
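To make the MVCC point above concrete, here is a minimal lmdb-js sketch (the path and entry shape are illustrative, not the PR's exact code): each bundler worker opens the same database directory, reads never block, and LMDB serializes writes internally, so no application-level lockfile is needed.

import { open } from "lmdb";

// Any worker process can open the same directory independently.
const db = open({ path: ".lingo/metadata-dev", compression: true });

// Batch related writes into a single commit; a lone put() would also be safe,
// just committed through the async write queue.
db.transactionSync(() => {
  db.put("a1b2c3", { hash: "a1b2c3", sourceText: "Hello" });
});

await db.close();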

Testing

Business logic tests added:

  • Basic operations with db
  • Stats accuracy tracking
  • Cleanup logic
  • Error handling
  • All tests pass locally

Visuals

N/A - No UI changes

Checklist

  • Changeset added
  • Tests cover business logic
  • No breaking changes (or documented below)

Closes #1636

Summary by CodeRabbit

  • New Features

    • Metadata storage migrated from a JSON file to an LMDB-backed database with separate dev/build metadata directories and short-lived access.
    • Server runtime API updated to accept translation hashes and return { t, locale, translations }.
  • Documentation

    • Compiler, architecture, transformation pipeline, and runtime docs updated to reflect new storage layout and APIs.
  • Tests

    • Added comprehensive metadata tests covering creation, save/load, cleanup, large batches, and concurrency-like scenarios.
  • Chores

    • Removed file-locking dependency in favor of LMDB-backed persistence.

coderabbitai bot (Contributor) commented Feb 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Replaces JSON+file-lock metadata with LMDB-backed storage, removing the MetadataManager class and proper-lockfile; adds stateless functions (loadMetadata, saveMetadata, cleanupExistingMetadata, getMetadataPath), new .lingo/metadata-dev and .lingo/metadata-build locations, and tests/docs updates.
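As a rough illustration of the directory layout above, the path helper can be thought of as follows (a hypothetical reconstruction; the real getMetadataPath takes the config with sourceRoot, lingoDir, and environment described later in this thread):

import path from "node:path";

// Dev and build metadata live in separate LMDB directories so `dev` and
// `build` runs never share an environment.
function getMetadataPath(config: {
  sourceRoot: string;
  lingoDir: string; // e.g. ".lingo"
  environment: "dev" | "build";
}): string {
  return path.join(
    config.sourceRoot,
    config.lingoDir,
    `metadata-${config.environment}`,
  );
}

// getMetadataPath({ sourceRoot: ".", lingoDir: ".lingo", environment: "dev" })
// resolves to ".lingo/metadata-dev".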

Changes

  • Metadata Core
    Files: packages/new-compiler/src/metadata/manager.ts, packages/new-compiler/src/metadata/manager.test.ts, packages/new-compiler/src/types.ts
    Summary: Switched metadata from a JSON file to an LMDB DB; replaced the class-based MetadataManager with exported functions loadMetadata, saveMetadata, cleanupExistingMetadata, and getMetadataPath; added LMDB lifecycle helpers, transactional writes, cleanup behavior, and comprehensive tests; changed MetadataSchema to Record<string, TranslationEntry>.
  • Plugin Integration & Runtime
    Files: packages/new-compiler/src/plugin/unplugin.ts, packages/new-compiler/src/plugin/next-compiler-loader.ts, packages/new-compiler/src/plugin/next.ts, packages/new-compiler/src/plugin/build-translator.ts
    Summary: Replaced MetadataManager usage with direct saveMetadata calls and updated imports/usages; adjusted metadata shape handling (flat map vs. entries object); minor formatting tweaks.
  • Translation Service & Translators
    Files: packages/new-compiler/src/translators/translation-service.ts, packages/new-compiler/src/translators/pluralization/service.ts, packages/new-compiler/src/translation-server/translation-server.ts
    Summary: Updated code paths to treat metadata as a flat map (Object.keys(metadata) / metadata[hash]) instead of metadata.entries; adapted related loops, lookups, and initialization to the new schema.
  • Docs & Architecture
    Files: packages/new-compiler/README.md, packages/new-compiler/docs/TRANSLATION_ARCHITECTURE.md, packages/new-compiler/src/plugin/transform/TRANSFORMATION_PIPELINE.md, packages/new-compiler/src/translation-server/README.md
    Summary: Replaced references to .lingo/metadata.json with LMDB directories (.lingo/metadata-dev/, .lingo/metadata-build/); documented the new LMDB-backed APIs and runtime shape changes (e.g., getServerTranslations usage examples).
  • Tests & Test Docs
    Files: packages/new-compiler/src/plugin/transform/TESTING.md
    Summary: Updated test expectations to check for LMDB metadata directories instead of a single metadata.json file.
  • Dependencies & Changeset
    Files: .changeset/common-teeth-reply.md, packages/new-compiler/package.json
    Summary: Added the lmdb dependency, removed proper-lockfile (and its types), and added a patch-level changeset file.
  • Transformation Pipeline Docs
    Files: packages/new-compiler/src/plugin/transform/TRANSFORMATION_PIPELINE.md
    Summary: Documented the data model migration and concurrency notes (LMDB multi-reader, single-writer); updated the runtime fetch pattern to accept hashes and return { t, locale, translations }.
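For orientation, the updated runtime shape could be consumed like this (a sketch only; the argument packaging and the declared signature are assumptions, while the { t, locale, translations } result comes from the summary above):

// Assumed signature, for illustration only.
declare function getServerTranslations(args: {
  locale: string;
  hashes: string[];
}): Promise<{
  t: (hash: string) => string;
  locale: string;
  translations: Record<string, string>;
}>;

const { t, locale, translations } = await getServerTranslations({
  locale: "de",
  hashes: ["a1b2c3d4", "e5f6a7b8"], // translation hashes emitted by the compiler
});

// `translations` maps hash -> translated string for the resolved locale.
console.log(locale, translations["a1b2c3d4"], t("e5f6a7b8"));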

Sequence Diagram

sequenceDiagram
    participant Compiler as Compiler (Plugin)
    participant SaveFunc as saveMetadata(dbPath, entries)
    participant LoadFunc as loadMetadata(dbPath)
    participant DB as LMDB Database

    rect rgba(100,150,200,0.5)
    Note over Compiler,SaveFunc: Save metadata flow
    Compiler->>SaveFunc: saveMetadata(dbPath, entries)
    SaveFunc->>DB: openDatabase(dbPath)
    DB-->>SaveFunc: db handle
    SaveFunc->>DB: write entries (transaction)
    SaveFunc->>DB: update __stats__ key
    DB-->>SaveFunc: commit
    SaveFunc->>DB: close connection
    SaveFunc-->>Compiler: confirmation
    end

    rect rgba(150,200,100,0.5)
    Note over Compiler,LoadFunc: Load metadata flow
    Compiler->>LoadFunc: loadMetadata(dbPath)
    LoadFunc->>DB: openDatabase(dbPath)
    DB-->>LoadFunc: db handle
    LoadFunc->>DB: read all entries & __stats__
    DB-->>LoadFunc: entries + stats
    LoadFunc->>DB: close connection
    LoadFunc-->>Compiler: MetadataSchema
    end
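Read alongside the diagram, the two flows reduce to roughly the following sketch (illustrative only: the TranslationEntry type is simplified here, and the __stats__ bookkeeping shown in the diagram was dropped later in the review):

import { open } from "lmdb";

type TranslationEntry = { hash: string; sourceText: string };

const STATS_KEY = "__stats__";

export async function saveMetadata(dbPath: string, entries: TranslationEntry[]) {
  const db = open({ path: dbPath, compression: true });
  try {
    // Batch all puts into a single commit.
    db.transactionSync(() => {
      for (const entry of entries) db.put(entry.hash, entry);
      db.put(STATS_KEY, { lastUpdated: new Date().toISOString() });
    });
  } finally {
    await db.close(); // short-lived connection, released before returning
  }
}

export async function loadMetadata(dbPath: string) {
  const db = open({ path: dbPath, compression: true });
  try {
    const entries: Record<string, TranslationEntry> = {};
    for (const { key, value } of db.getRange()) {
      if (key !== STATS_KEY) entries[String(key)] = value as TranslationEntry;
    }
    return entries;
  } finally {
    await db.close();
  }
}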

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • vrcprl

Poem

🐰 Hopped from files to LMDB delight,
Short-lived connections, transactions tight.
No locks to burrow, just keys that gleam,
Metadata snug in a speedy stream.
Translation carrots grown in a dream 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)

  • Linked Issues check (⚠️ Warning): Issue #1636 describes a 'traverse is not a function' error with React Router v7, but this PR only addresses metadata storage migration to LMDB and does not appear to fix the traverse issue mentioned in the linked issue. Resolution: verify that this PR actually resolves issue #1636 or clarify the connection; if unrelated, consider removing the issue link or splitting into separate PRs.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 38.89%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The description follows the template with Summary, Changes, Testing, and Checklist sections, but the 'No breaking changes' checkbox is left unchecked despite the PR changing the MetadataSchema structure from an interface to a type alias, which is a significant API change. Resolution: clarify whether the MetadataSchema change constitutes a breaking change and document any migration steps for consumers of this API.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title accurately describes the main change: migrating metadata storage from lockfile (proper-lockfile) to LMDB, which is the primary objective of this PR.
  • Out of Scope Changes check (✅ Passed): All changes are focused on migrating metadata storage from file-based JSON with lockfile to LMDB; no out-of-scope changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/metadata-lmdb-storage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/new-compiler/src/metadata/manager.test.ts`:
- Around line 309-314: The test is incorrectly treating the synchronous
MetadataManager constructor as async; update the "should throw descriptive error
for invalid path" test to directly assert the synchronous throw by replacing the
await expect(async () => new MetadataManager(invalidPath)).rejects.toThrow()
pattern with a synchronous assertion such as expect(() => new
MetadataManager(invalidPath)).toThrow() (or toThrowError/... if you want to
assert message content), referencing the MetadataManager constructor in that
test.
🧹 Nitpick comments (4)
packages/new-compiler/src/metadata/manager.ts (1)

205-226: Avoid O(n) full scans when updating stats.
Counting all keys on every save scales with database size and can slow large builds. Consider incrementally updating totalEntries per write inside the transaction.

♻️ Suggested refactor
-      for (const entry of entries) {
-        this.db.put(entry.hash, entry);
-      }
-
-      // Count entries explicitly (excluding stats key) for clarity
-      let entryCount = 0;
-      for (const { key } of this.db.getRange()) {
-        if (key !== STATS_KEY) {
-          entryCount++;
-        }
-      }
-
-      const stats = {
-        totalEntries: entryCount,
-        lastUpdated: new Date().toISOString(),
-      };
+      const prevStats =
+        (this.db.get(STATS_KEY) as MetadataSchema["stats"] | undefined) ?? {
+          totalEntries: 0,
+          lastUpdated: new Date().toISOString(),
+        };
+      let entryCount = prevStats.totalEntries;
+
+      for (const entry of entries) {
+        const existed = this.db.get(entry.hash) !== undefined;
+        this.db.put(entry.hash, entry);
+        if (!existed) entryCount++;
+      }
+
+      const stats = {
+        totalEntries: entryCount,
+        lastUpdated: new Date().toISOString(),
+      };
packages/new-compiler/src/metadata/manager.test.ts (3)

106-108: Awaiting a synchronous function is unnecessary.

cleanupExistingMetadata returns void (synchronous), so await is a no-op here. While this doesn't break functionality, it's misleading and inconsistent with the function signature.

Suggested fix
   afterEach(async () => {
-    await cleanupExistingMetadata(testDbPath);
+    cleanupExistingMetadata(testDbPath);
   });

380-384: Avoid returning values from forEach callbacks.

worker.kill() returns a boolean, triggering the Biome lint warning. Use for...of loop instead for cleaner semantics when you don't need the return values.

Suggested fix
       } finally {
-        workers.forEach((w) => w.kill());
+        for (const w of workers) {
+          w.kill();
+        }
       }

237-258: Same issue: awaiting synchronous cleanupExistingMetadata calls.

Lines 243, 303, 304, and 357 also await the synchronous cleanupExistingMetadata function. Consider removing await from all these call sites for consistency.

Also applies to: 286-307

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/new-compiler/src/metadata/manager.test.ts`:
- Around line 213-217: The test "should throw descriptive error for invalid
path" is flaky because it uses a hard-coded /root/... path; change it to create
a guaranteed-failing target using the tmp directory: create a temporary file
(e.g., via fs.writeFileSync or fs.mkdtempSync under os.tmpdir()) and pass that
file path to loadMetadata so mkdirSync will deterministically fail; update the
test code around the it block (the test function invoking loadMetadata) to
create and clean up the temp file and assert that loadMetadata(tmpFilePath)
rejects.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/new-compiler/src/plugin/transform/TRANSFORMATION_PIPELINE.md`:
- Around line 164-167: Replace the unsupported hard numbers and the "built-in"
compression claim in the TRANSFORMATION_PIPELINE.md bullet list: remove or
replace the "~1M ops/sec write speed vs ~50K for SQLite" line with a
conservative, sourced or generic statement such as "orders of magnitude faster
than SQLite for non-transactional writes" (or include specific benchmark
citations if you prefer exact figures), and change "Built-in LZ4 compression" to
"Optional LZ4 compression (enabled in this implementation)" to reflect that
compression is not enabled by default in lmdb-js; update the bullets containing
the exact strings "~1M ops/sec write speed vs ~50K for SQLite" and "Built-in LZ4
compression" accordingly.
🧹 Nitpick comments (2)
packages/new-compiler/src/plugin/transform/TRANSFORMATION_PIPELINE.md (2)

204-204: Clarify the concurrency model for bundler workers.

The statement "multi-reader, single-writer" is accurate for LMDB's MVCC model, but needs clarification in the context of multiple bundler workers. The PR summary mentions "MVCC for safe concurrent access across bundler workers," but:

  • How does LMDB handle concurrent write attempts from multiple worker processes?
  • Are writes automatically serialized, or do workers need to retry on conflicts?
  • Does each worker process get its own write transaction, or is there cross-process coordination?

Consider adding a sentence or two explaining how concurrent writes from multiple bundler workers are handled to help readers understand the practical implications.


268-268: Consider clarifying the distinction between metadata and cache storage.

The document mentions two different storage locations:

  • Metadata storage: .lingo/metadata-{env}/ (LMDB database for source strings - Line 160)
  • Translation cache: .lingo/cache/{locale}.json (JSON files for translated strings - Line 268)

While this separation makes sense architecturally, readers might benefit from a brief clarification about when each is used. Consider adding a note explaining that:

  • Metadata stores source strings and context extracted at build time
  • Cache stores the actual translations generated for each locale at runtime

This would help readers understand why there are two different storage systems.

Comment on lines +164 to +167
- ~1M ops/sec write speed vs ~50K for SQLite
- Zero-copy reads from memory-mapped files
- Built-in LZ4 compression
- Simple key-value API without SQL overhead
coderabbitai bot (Contributor) commented Feb 6, 2026

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

lmdb-js LZ4 compression support documentation

💡 Result:

lmdb-js supports optional off-thread LZ4 compression for values written via its async write path. It’s off by default and is enabled with the compression option when opening the DB. [1]

Enable compression

import { open } from 'lmdb';

const db = open('my-db', { compression: true });

[1]

Configure compression

import fs from 'node:fs';
import { open } from 'lmdb';

const db = open('my-db', {
  compression: {
    threshold: 500,                 // only compress values > 500 bytes
    dictionary: fs.readFileSync('dict.txt') // optional shared dictionary (Buffer)
  }
});

Supported settings documented:

  • threshold (default 1000 when compression is enabled)
  • dictionary (Buffer; defaults to an internal shared dictionary) [1][2]

Important notes from the docs

  • Decompression happens automatically on reads (including get / getRange). [1][2]
  • If you enable compression for a DB, you must always open it with the same compression setting so data can be decompressed correctly. [1]
  • Child DBs opened from a root DB inherit the root DB’s compression settings by default. [1]

Sources: lmdb-js README “Compression” section [1] and the lmdb npm “Compression” docs [2].

References:
[1] https://github.com/kriszyp/lmdb-js (README → “Compression”)
[2] https://www.npmjs.com/package/lmdb (Compression section)


🌐 Web query:

LMDB write performance benchmark SQLite comparison ops per second

💡 Result:

Benchmarks that report write throughput (ops/sec) for LMDB vs SQLite

1) LumoSQL speedtest-style benchmark (SQLite B-tree vs “SQLite+LMDB” storage backend)

LumoSQL publishes example output (timings) for the same benchmark run across targets, including vanilla SQLite and SQLite built with an LMDB backend. From the example table: “1000 INSERTs” and “25000 INSERTs in a transaction” times are shown for each target. Converting those to inserts/sec:

Test (from LumoSQL example) | SQLite B-tree time | SQLite B-tree ops/sec | SQLite+LMDB time | SQLite+LMDB ops/sec
1000 INSERTs | 2.789 s | ~359 ops/s | 1.484–1.429 s | ~674–700 ops/s
25000 INSERTs in a transaction | 0.085 s | ~294k ops/s | 0.094–0.096 s | ~266k–260k ops/s

Source timings: LumoSQL benchmark-filter example output. [1]

Key takeaway from this specific example: LMDB-backend helps a lot for many small autocommit INSERTs, while batched INSERTs inside one transaction are already extremely fast on SQLite’s default B-tree and are in the same ballpark (or faster in that sample) than SQLite+LMDB. [1]


2) LMDB-only “typical comparison” numbers (contextual reference)

Some LMDB docs/libraries cite typical insert rates such as ~350k random inserts/sec and ~580k sequential inserts/sec (workload- and hardware-dependent). This is not a SQLite comparison, but it provides an order-of-magnitude reference for LMDB writes. [2]


Notes for interpreting “ops/sec”

  • SQLite write throughput varies wildly depending on whether each INSERT is its own transaction (slow) vs explicit batching in a transaction / WAL / sync settings (often orders-of-magnitude faster). LumoSQL’s example illustrates this directly (hundreds ops/s vs hundreds of thousands ops/s). [1]

Sources:
[1] LumoSQL “benchmark-filter” example output showing per-test times for SQLite vs SQLite+LMDB targets.
[2] zlmdb LMDB examples page showing “Random Insert” and “Sequential Insert” ops/sec figures.


🌐 Web query:

lmdb-js npm library features compression

💡 Result:

lmdb-js (published on npm as lmdb) supports optional native, off-main-thread compression using LZ4—compression is performed in the same worker thread that batches/commits async writes, so it adds minimal main-thread overhead, and decompression is very fast (used on reads like get / range reads). [1][2]

How to enable/configure compression

  • Compression is off by default; enable it by passing compression: true (or an options object) when opening the database. [3]
  • Options include:
    • threshold: only compress values larger than this many bytes (default ~1000 when enabled). [3]
    • dictionary: provide a shared dictionary buffer to improve compression ratios for small/JSON-like values. [3]

Algorithm: LZ4. [1][2]

Example

import { open } from 'lmdb';

const db = open({
  path: './mydb',
  compression: { threshold: 500 } // compress values > 500 bytes
});

[3]

Sources

  1. npm package docs for lmdb (lmdb-js) describing off-main-thread LZ4 compression. [1][2]
  2. lmdb-store docs (older package name) detailing compression, threshold, and dictionary options. [3]

🏁 Script executed:

cd packages/new-compiler && find . -name "*.md" -o -name "*.js" -o -name "*.ts" | grep -i transform | head -20

Repository: lingodotdev/lingo.dev

Length of output: 479


🏁 Script executed:

cat -n packages/new-compiler/src/plugin/transform/TRANSFORMATION_PIPELINE.md | sed -n '160,196p'

Repository: lingodotdev/lingo.dev

Length of output: 1070


🏁 Script executed:

rg -i "compression|lmdb" packages/new-compiler/src/plugin/transform/ -A 3 -B 3

Repository: lingodotdev/lingo.dev

Length of output: 3947


🏁 Script executed:

rg -i "compression|open.*lmdb" packages/new-compiler/src -A 5 -B 2

Repository: lingodotdev/lingo.dev

Length of output: 3314


🏁 Script executed:

grep -r "lmdb" packages/new-compiler/package.json

Repository: lingodotdev/lingo.dev

Length of output: 85


Update performance claims and clarify compression feature status.

  1. Performance numbers lack sources and don't align with published benchmarks. The claim of "~1M ops/sec vs ~50K for SQLite" is unsupported:

    • SQLite write throughput varies widely based on transaction mode (~294k ops/sec in transactions, ~359-700 ops/sec for individual commits)
    • LMDB typical throughput is 350k–580k sequential inserts/sec, not 1M
    • Either provide specific benchmark sources or replace with general statements ("orders of magnitude faster than SQLite for non-transactional writes").
  2. LZ4 compression is optional, not "built-in." While the codebase does enable it with compression: true, lmdb-js ships with compression disabled by default. Clarify: "Optional LZ4 compression (enabled in this implementation)" rather than "Built-in LZ4 compression."

🤖 Prompt for AI Agents
In `@packages/new-compiler/src/plugin/transform/TRANSFORMATION_PIPELINE.md` around
lines 164 - 167, Replace the unsupported hard numbers and the "built-in"
compression claim in the TRANSFORMATION_PIPELINE.md bullet list: remove or
replace the "~1M ops/sec write speed vs ~50K for SQLite" line with a
conservative, sourced or generic statement such as "orders of magnitude faster
than SQLite for non-transactional writes" (or include specific benchmark
citations if you prefer exact figures), and change "Built-in LZ4 compression" to
"Optional LZ4 compression (enabled in this implementation)" to reflect that
compression is not enabled by default in lmdb-js; update the bullets containing
the exact strings "~1M ops/sec write speed vs ~50K for SQLite" and "Built-in LZ4
compression" accordingly.

Collaborator:

Interesting 🤔 I do not have an opinion on LMDB vs Sqlite myself, we do not care about >100k ops per sec I believe, the easier the setup and the better it handles concurrent writes, the better

coderabbitai bot (Contributor):

@AleksandrSl, that makes perfect sense—LMDB is actually an excellent fit for your priorities! Here's why it aligns well with your requirements:

Concurrent writes across bundler workers: LMDB uses MVCC and allows multiple processes to access the database simultaneously, with write transactions serialized internally while readers never block writers (or vice versa). This is exactly what you need for multiple Vite/Webpack workers writing metadata without ELOCKED errors.

Simple setup: lmdb-js is straightforward—just npm install lmdb and call open(). No database server, no configuration files, no manual schema setup. It uses copy-on-write so there's no recovery needed after crashes, which means less operational overhead.

For your docs: I'd suggest reframing the "Why LMDB?" section to emphasize the concurrency benefits that matter for your use case:

  • "Handles concurrent writes from multiple bundler workers via MVCC"
  • "Zero-setup embedded database (no server required)"
  • "Crash-safe with no recovery procedures needed"

And update the compression line to: "Optional LZ4 compression (enabled in this implementation)" since compression is off by default in lmdb-js.

The raw performance numbers don't really matter for your scale—the key win is eliminating those lockfile errors while keeping setup simple! 🎯

AleksandrSl (Collaborator) left a comment

Looks reasonable if it passes the real-world case scenario.
In the near future I'd also suggest adding an e2e test; I think we can mimic the high load and see that the page loads fine.

sourceText?: string;
} = {},
): TranslationEntry {
const hash = overrides.hash ?? `hash_${Math.random().toString(36).slice(2)}`;
Collaborator:

nit: we probably can use a real hash function here, to avoid extra mocks?

Contributor (Author):

Switched to using the generateTranslationHash - no more mocks needed.

context: { filePath: "test.tsx", componentName: "TestComponent" },
location: { filePath: "test.tsx", line: 1, column: 1 },
...overrides,
} as TranslationEntry;
Collaborator:

Why does it not comply with the type, so that we have to cast?

Contributor (Author):

Right, that was ugly. Refactored.

});

it("should load and preserve all entry fields", async () => {
const entry: TranslationEntry = {
Collaborator:

nit: createTestEntry?

Contributor (Author):

Done.

expect(updated.entries["entry-1"].sourceText).toBe("v2");

// Empty array should not change anything
await saveMetadata(testDbPath, []);
Collaborator:

nit: I think this test is a bit too big in terms of responsibility; it tests several things (there is even an "and" in the name 😄)

Contributor (Author):

Fair, split it into separate tests.

it("should maintain data integrity after many operations", async () => {
// Many saves with overlapping keys
for (let i = 0; i < 10; i++) {
await saveMetadata(testDbPath, [
Collaborator:

Let's make the saves here and above concurrent? They won't be fully parallel, but we can at least test asynchronous work.
I mean: run a lot of saves in a Promise.all().

Collaborator:

Ah, lol, I see they are below. I think we can keep only the concurrent versions then, to keep the number of tests reasonable, since the concurrent one adds more values and there is nothing specific to catch in the synchronous test.

Contributor (Author):

Makes sense. Dropped the sequential test and kept only the Promise.all version
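For reference, the concurrent variant agreed on here reduces to something like this (a sketch assuming vitest plus the createTestEntry / saveMetadata / loadMetadata helpers discussed in this thread, not the exact test):

it("keeps all entries when saves run concurrently", async () => {
  // Fire overlapping saves at once; LMDB serializes the writes internally.
  await Promise.all(
    Array.from({ length: 10 }, (_, i) =>
      saveMetadata(testDbPath, [createTestEntry({ sourceText: `value ${i}` })]),
    ),
  );

  // After the refactor discussed above, loadMetadata returns a flat record keyed by hash.
  const metadata = await loadMetadata(testDbPath);
  // Assumes each distinct sourceText produces a distinct hash.
  expect(Object.keys(metadata)).toHaveLength(10);
});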

* Closing before the loader returns ensures the handle is released even if the
* worker is killed right after.
*/
function openDatabaseConnection(dbPath: string): RootDatabase {
Collaborator:

As a small improvement: since we mostly open and close the database in every method rather than keeping the connection alive, we can create a method like runWithDatabaseConnect or something similar which does

function runWithDbConnection(f: (db: Database) => void) {
  const db = open();
  try {
    f(db);
  } finally {
    db.close();
  }
}

That way we simplify the work for anyone adding new methods; they won't forget to close the DB.
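A possible call site for the helper above, just to show the intended ergonomics (the names here follow this suggestion, not the final code):

// New methods only express the DB work; opening and closing is handled once.
runWithDbConnection((db) => {
  db.putSync("some-hash", { sourceText: "Hello" });
});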

Contributor (Author):

Good point. Refactored.

*
* @param config - Config with sourceRoot, lingoDir, and environment
* @returns Absolute path to metadata file
* Uses transactionSync to batch all writes into a single commit. Async puts
Collaborator:

I think we can decrease the level of detail here and only leave the things that may affect the user; e.g., knowing that this function calls transactionSync means little.

Contributor (Author):

Refactored.

await fsPromises.unlink(tmpPath).catch(() => {});
}
const totalKeys = db.getKeysCount();
const entryCount =
Collaborator:

We can check if we use entryCount anywhere and drop it entirely. I think it was added by an LLM, and it was almost free to count before; now that we have to get stats and then do a -1 (why, by the way?), it's time to consider the pros of this field.

Contributor (Author):

Yep, there is no real value in keeping this. Statistics have been removed. Metadata is now just a Record<string, TranslationEntry>


framework: currentFramework,
errorType: "transform",
errorMessage: error instanceof Error ? error.message : "Unknown transform error",
errorMessage:
Collaborator:

Interesting that it formats the code another way; did I forget to set up Prettier 🤔

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/new-compiler/src/metadata/manager.ts`:
- Around line 97-118: The catch block in cleanupExistingMetadata currently
checks for ENOENT which is unreachable because fs.rmSync(..., { force: true })
already silences missing-path errors; remove the ENOENT branch and simplify the
catch to always compute the error message (using error instanceof Error ?
error.message : String(error)) and call logger.warn with that message (retaining
the existing `Failed to cleanup metadata database:` text) so any real errors are
still reported; reference the cleanupExistingMetadata function and the catch
block surrounding fs.rmSync(metadataDbPath, { recursive: true, force: true }).
🧹 Nitpick comments (1)
packages/new-compiler/src/metadata/manager.ts (1)

50-61: fn callback typed as synchronous — consider guarding against accidental async usage.

runWithDbConnection accepts fn: (db: RootDatabase) => T. If an async callback were ever passed, the finally block would await closeDatabaseConnection(…) before the returned promise settles, closing the DB while the operation is still in flight. This is fine today since all callers are synchronous, but the signature doesn't prevent misuse.

Optional: add an overload or a runtime guard

A lightweight approach — add a JSDoc note:

 /**
  * Opens a database connection, runs the callback, and ensures the connection
  * is closed afterwards.
+ *
+ * ⚠️ `fn` MUST be synchronous. An async callback would cause the DB to close
+ *    before its work completes.
  */
 async function runWithDbConnection<T>(

Comment on lines +97 to 118
export function cleanupExistingMetadata(metadataDbPath: string): void {
  logger.debug(`Cleaning up metadata database: ${metadataDbPath}`);

  try {
    fs.rmSync(metadataDbPath, { recursive: true, force: true });
    logger.info(`🧹 Cleaned up metadata database: ${metadataDbPath}`);
  } catch (error) {
    const code =
      error instanceof Error && "code" in error
        ? (error as NodeJS.ErrnoException).code
        : undefined;
    const message = error instanceof Error ? error.message : String(error);

    if (code === "ENOENT") {
      logger.debug(
        `Metadata database already deleted or doesn't exist: ${metadataDbPath}`,
      );
      return;
    }

    logger.warn(`Failed to cleanup metadata database: ${message}`);
  }
}
coderabbitai bot (Contributor):

⚠️ Potential issue | 🟡 Minor

ENOENT branch in the catch is dead code — force: true already suppresses it.

fs.rmSync with { force: true } silently ignores ENOENT/ENOTDIR, so the explicit ENOENT check on line 110 is unreachable.

Proposed simplification
 export function cleanupExistingMetadata(metadataDbPath: string): void {
   logger.debug(`Cleaning up metadata database: ${metadataDbPath}`);
 
   try {
     fs.rmSync(metadataDbPath, { recursive: true, force: true });
     logger.info(`🧹 Cleaned up metadata database: ${metadataDbPath}`);
   } catch (error) {
     const message = error instanceof Error ? error.message : String(error);
-    const code =
-      error instanceof Error && "code" in error
-        ? (error as NodeJS.ErrnoException).code
-        : undefined;
-
-    if (code === "ENOENT") {
-      logger.debug(
-        `Metadata database already deleted or doesn't exist: ${metadataDbPath}`,
-      );
-      return;
-    }
-
     logger.warn(`Failed to cleanup metadata database: ${message}`);
   }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export function cleanupExistingMetadata(metadataDbPath: string): void {
  logger.debug(`Cleaning up metadata database: ${metadataDbPath}`);

  try {
    fs.rmSync(metadataDbPath, { recursive: true, force: true });
    logger.info(`🧹 Cleaned up metadata database: ${metadataDbPath}`);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    logger.warn(`Failed to cleanup metadata database: ${message}`);
  }
}
🤖 Prompt for AI Agents
In `@packages/new-compiler/src/metadata/manager.ts` around lines 97 - 118, The
catch block in cleanupExistingMetadata currently checks for ENOENT which is
unreachable because fs.rmSync(..., { force: true }) already silences
missing-path errors; remove the ENOENT branch and simplify the catch to always
compute the error message (using error instanceof Error ? error.message :
String(error)) and call logger.warn with that message (retaining the existing
`Failed to cleanup metadata database:` text) so any real errors are still
reported; reference the cleanupExistingMetadata function and the catch block
surrounding fs.rmSync(metadataDbPath, { recursive: true, force: true }).



Development

Successfully merging this pull request may close these issues.

Lingo.dev compiler fails with “TypeError: traverse is not a function” in React Router v7 + Vite

2 participants