Quick clone
The repository URL or slug may differ between forks or reorganizations. Replace the example below with the canonical repo URL for your fork or organization.
Examples:
HTTPS:
bashgit clone https://github.com/<owner>/<repo>.git cd <repo>SSH:
bashgit clone git@github.com:<owner>/<repo>.git cd <repo>
If you are following a public guide or ticket, verify the owner/repo values match the project you intend to inspect (for example, doctype/doctypedev or your fork). You can also use the repository URL from your project's README.
Overview
This repository implements a Retrieval-Augmented Generation (RAG) pipeline used by the CLI. The key components to check when things break are:
RetrievalService.indexProject()— incremental indexer (packages/cli/src/services/rag/retrieval-service.ts).EmbeddingService— embedding generation (packages/cli/src/services/rag/embedding-service.ts).VectorStoreService(LanceDB) — persistent vector store under the.sintesiworkspace.- Native core / GitBinding — Rust NAPI bindings (crates/core) used for fast git diffs.
- Tests under
packages/cli/__tests__(e.g.,changeset.test.ts,indexing-state.test.ts).
Below are focused troubleshooting steps, diagnostics, and where to look in the codebase and on disk.
Diagnostic flow (mermaid)
Where to look first
- Indexing logic:
packages/cli/src/services/rag/retrieval-service.ts- Git SHA and working tree checks use
execSync('git rev-parse HEAD')andexecSync('git status --porcelain'). - If previous SHA differs, it calls
new GitBinding(this.projectRoot).analyzeChanges(lastSha)(Rust binding). - Fallback logic uses file timestamps via
statSync. - Deletion operations call
this.vectorStore.deleteChunks(...). - Embedding is performed by
this.embeddingService.embedDocuments(...)in batches.
- Git SHA and working tree checks use
- Embeddings:
packages/cli/src/services/rag/embedding-service.ts- Reads
process.env.OPENAI_API_KEYand optionallyprocess.env.HELICONE_API_KEY. - Default model: the code typically selects
text-embedding-3-smallunless overridden. NOTE: the exact model string can be determined at runtime (see the embedding service implementation) — inspectembedding-service.tsto confirm whether the model is a hard-coded constant, derived from an env var, or selected from provider settings. - Errors are logged via
this.logger.error(...).
- Reads
- Vector store and persistence:
- Vector store backed by LanceDB and persisted in the
.sintesiworkspace (see RAG README:VectorStoreServiceuses.sintesi/lancedb). - Index state is managed by
IndexingStateManager(tests atpackages/cli/__tests__/indexing-state.test.ts) and stored under./.sintesi.
- Vector store backed by LanceDB and persisted in the
- Native Rust binding (GitBinding):
crates/coreand its NAPI exports referenced incrates/core/src/napi/mod.rs.- If the native binding is missing or incompatible, the code logs a warning and falls back to timestamp-based indexing.
Common problems & step-by-step fixes
1) Indexing reports "Index is up to date" but new changes are ignored
Why it happens:
RetrievalServicecomparescurrentShato the last persisted SHA. If they match and the workspace is clean, it exits early. Diagnostics:- Run the git checks the indexer uses:bash
git rev-parse HEAD git status --porcelain - Check index state (look for files under
./.sintesi). Where to look: packages/cli/src/services/rag/retrieval-service.tsIndexingStateManagerimplementation and tests:packages/cli/__tests__/indexing-state.test.tsFixes:- Ensure files are committed if you expect the comparator to use commit SHAs (or force re-index).
- If you intentionally want to re-index, either remove the saved state under
./.sintesior run a command that forces indexing (the CLI exposes a--forcestyle behavior in code paths — consult the CLI entrypoint you use).
2) Git diff via GitBinding fails (Rust native binding) and you see fallback behavior
Why it happens:
- Native binding may not be present, incompatible, or throw in
analyzeChanges. Diagnostics: - Inspect logs for the exact warning emitted by the catch where
GitBinding.analyzeChanges()is called. Openpackages/cli/src/services/rag/retrieval-service.tsand find the try/catch aroundanalyzeChanges()to see the exact message the code logs. Example search terms you can use:- Search for the exact logger call in the source:
- grep for "analyzeChanges" or search for
GitBindingusage inretrieval-service.ts.
- grep for "analyzeChanges" or search for
- Search logs for phrases similar to:
- Failed to compute git diff using Rust binding
- Falling back to timestamp check
- Search for the exact logger call in the source:
- IMPORTANT: the exact log string can change between commits. If you rely on a log string for automation or debugging, copy the literal string from the code (the logger call in the catch block). Where to look:
packages/cli/src/services/rag/retrieval-service.tslogs the catch and fallback.- Native artifacts in
crates/coreand platform-specific npm packages (seecrates/core/README.md). Fixes: - Rebuild native core following
crates/core/README.mdbuild steps. - If rebuilding isn’t possible, timestamp fallback still works — ensure file timestamps change (e.g.,
touchfiles) to trigger reindexing.
3) Embedding generation fails or returns empty vectors
Why it happens:
- Missing or invalid
OPENAI_API_KEY. - Helicone proxy incorrectly configured; Helicone fallback may throw.
- External API rate limits or network errors. Diagnostics:
- Check logs for:
- a warning that the API key is missing (the exact text is emitted from
EmbeddingServiceconstructor — inspectembedding-service.tsfor the exact message). - a debug line indicating embedding model and batch size (inspect the file to confirm the literal message; the code commonly logs something like
Generating embeddings for X chunks using text-embedding-3-small..., but confirm the running code). Embedding generation failed: <message>(or similar exact error logged by the catch).
- a warning that the API key is missing (the exact text is emitted from
- The code path to inspect:
packages/cli/src/services/rag/embedding-service.ts. Where to look: EmbeddingServiceconstructor for warnings aboutOPENAI_API_KEYandHELICONE_API_KEY.- Calls to
embedMany(...)and its catch where errors are logged and thrown. Fixes: - Ensure
OPENAI_API_KEYis set in your environment:bashexport OPENAI_API_KEY="sk_..." - If you use Helicone, confirm
HELICONE_API_KEYis valid and that any proxybaseURL/headers are allowed. - Re-run indexer. If embedding still fails, inspect network access and provider limits.
Notes about the embedding model default:
- The code commonly uses
text-embedding-3-smallas the default model. However, the exact effective model may be:- a hard-coded constant in
embedding-service.ts, or - selected at runtime based on env vars or provider configuration (for example, an env var such as
OPENAI_EMBEDDING_MODELor provider-specific settings).
- a hard-coded constant in
- Always verify the exact default by opening
packages/cli/src/services/rag/embedding-service.tsand locating the model selection logic if you need deterministic behavior.
4) Vector store operations fail (add/delete chunks)
Why it happens:
- Disk permission issues under
./.sintesior corrupted LanceDB files. Diagnostics: - Look for logs around
this.vectorStore.addChunks(...)andthis.vectorStore.deleteChunks(...)inretrieval-service.ts. - Inspect the
.sintesidirectory:ls -la .sintesiandls -la .sintesi/lancedb. Where to look: packages/cli/src/services/ragfor vector store wrapper implementation and RAG README noting.sintesi/lancedb. Fixes:- Ensure the CLI process user has write permission to the repository workspace and
.sintesi. - Backup and (if acceptable) remove or recreate
.sintesi/lancedbto force a fresh vector store. - Re-run indexing to rebuild the store.
5) Partial embedding success: some batches embed, others fail
Why it happens:
- Per-batch network or API intermittent errors;
embedManymight fail for a batch and the code logs batch-level failures. Diagnostics: - Search logs for the exact batch-failure messages emitted by
retrieval-service.ts(open the file and copy the literal text used inlogger.errorfor batch failures). Where to look: - The embedding loop and
catchinretrieval-service.ts. Fixes: - Re-run indexing; intermittent errors often succeed on retry.
- Decrease batch size in code if you have a very constrained API quota.
Batching constant note:
- The code uses a batching constant to control how many texts are sent to the embedding API per request. Commonly this is named
TEXTS_PER_BATCHand is set to 20 in many examples, but the exact name and value may change. To avoid confusion:- Open
packages/cli/src/services/rag/retrieval-service.tsand confirm the constant name and value used in your branch (search forTEXTS_PER_BATCH,BATCH_SIZE, or similar). - Edit that constant in-place to tune the batching behavior for your environment.
- Open
6) Tests failing related to native bindings or dependency resolution
What to inspect:
- Tests mock
@sintesi/corein several places to avoid native binding issues (seepackages/cli/__tests__/changeset.test.ts). changeset.test.tsvalidates thatchangesetCommandchecks for@changesets/cliusingcreateRequireand handles missing dependency by returningsuccess: falseand an error mentioning@changesets/cli not installed. Where to look:packages/cli/__tests__/changeset.test.tspackages/cli/__tests__/indexing-state.test.tspackages/cli/__tests__/markdown-builder.test.tspackages/cli/__tests__/package-selector.test.tsDiagnostics & fixes:- If tests fail with missing native modules, ensure any test-time mocks are correct. The repository intentionally mocks
@sintesi/corein tests to avoid native binding issues. - Run a single test file for faster debugging:bash
npx vitest packages/cli/__tests__/changeset.test.ts - Inspect test mocks: tests often mock
child_process.execSync,module.createRequire, or@sintesi/core. Failures may indicate the mock setup doesn’t match runtime expectations.
CI vs local / environment requirements:
- Unit tests and many CI job configurations commonly use mocks for
@sintesi/coreand AI provider clients. As a result:- Unit tests may run successfully in CI without a real
OPENAI_API_KEYor without a built native@sintesi/corebinding. - End-to-end or integration runs (running the indexer against a real provider or running code paths that call the native binding) require:
- A valid
OPENAI_API_KEY(and/or other provider keys) in the environment for real embedding calls. - The native
@sintesi/coremodule built for your platform if you want the GitBinding path (followcrates/core/README.md).
- A valid
- If you are replicating an issue locally, confirm whether the failing test is a unit test with mocks (no API key required) or an integration path that needs the API keys and/or native binding.
- Unit tests may run successfully in CI without a real
CLI flags and command args (references present in tests)
| Flag / Arg | Type | Behavior (from tests / code) |
|---|---|---|
baseBranch | string | Branch used as base for operations (example: 'main' or 'develop') |
forceFetch | boolean | When true, test expects a git fetch origin <baseBranch> call in changesetCommand |
noAI | boolean | Used in tests to skip AI-dependent paths (speeds up tests) |
Configuration & environment (table)
| Property / Env | Required? | Purpose / Where used |
|---|---|---|
OPENAI_API_KEY | yes (for embeddings in E2E/integration runs) | Used by EmbeddingService to call OpenAI embeddings (commonly text-embedding-3-small unless overridden in code or env). |
HELICONE_API_KEY | optional | If present, EmbeddingService and AI providers use Helicone proxy for observability/caching. |
COHERE_API_KEY | optional | Reranking (RAG) – documented in RAG README. |
.sintesi/lancedb | n/a | Vector store persisted by VectorStoreService (LanceDB). |
./.sintesi | n/a | Index state managed by IndexingStateManager (see indexing-state tests). |
Useful commands & checks
- Check git SHA and status (mirrors indexer logic):
git rev-parse HEAD
git status --porcelain- Inspect
.sintesiworkspace:
ls -la .sintesi
ls -la .sintesi/lancedb- Run a quick indexing example (from RAG README) — create
scripts/run-index.ts:
// scripts/run-index.ts
import { Logger } from './packages/cli/src/utils/logger';
import { RetrievalService } from './packages/cli/src/services/rag/retrieval-service';
(async () => {
const logger = new Logger(true); // enable debug if logger supports it
const svc = new RetrievalService(logger, process.cwd());
await svc.indexProject();
})();Run it with:
node --loader ts-node/esm scripts/run-index.ts(Adjust runtime invocation to your environment and TypeScript setup.)
Logging and increasing verbosity
- The indexer, embedding service, and AI provider use
this.logger.*calls (info,debug,warn,error,success). Search the codebase forlogger.debug/logger.warn/logger.errorto find useful log points. - Many tests construct
new Logger(false)ornew Logger(true)(seepackages/cli/__tests__/package-selector.test.ts) — if the constructor accepts a boolean, passingtruemay enable debug logging (inspectpackages/cli/src/utils/loggerfor exact API).
When relying on log messages for debugging or automation:
- Prefer copying the exact string from the source files (e.g., the message passed to
logger.warninretrieval-service.ts) and then search logs for that literal string. - If you only have a rough fragment, search for the code site (file and line) and confirm the current literal message used by the running version.
Recovery patterns
- If indexing stalls or embedding fails repeatedly:
- Confirm
OPENAI_API_KEYand network connectivity (for real embedding calls). - Stop processes that may lock
.sintesi/lancedb. - Backup
.sintesiand remove it to rebuild from scratch. - Re-run indexing to rebuild embeddings and state.
- Confirm
- If GitBinding (Rust) is missing:
- Rebuild the native crate following
crates/core/README.md. - If rebuild not possible, rely on timestamp fallback (ensure files’
mtimechanges).
- Rebuild the native crate following
When to open an issue
Include the following in your report:
- Exact error log lines (copy the literal messages from the run; examples to look for in the code are messages emitted around embedding failures, GitBinding fallbacks, and batch failures — copy them from the source or logs).
- Full stdout/stderr from the failing run.
- A brief description of whether you ran indexing in a dirty workspace or on a clean commit.
- Contents (listing) of
./.sintesiand./.sintesi/lancedb. - Results of
git rev-parse HEADandgit status --porcelain. - Which tests fail and their output when running
npx vitest <path-to-test>. - Whether the failing run was a unit test (which may use mocks) or an integration/e2e run (which may require API keys and native bindings).
Quick references
- Indexing core logic:
packages/cli/src/services/rag/retrieval-service.ts - Embeddings:
packages/cli/src/services/rag/embedding-service.ts - RAG README:
packages/cli/src/services/rag/README.md - Indexing state tests:
packages/cli/__tests__/indexing-state.test.ts - Changeset tests (dependency resolution &
git fetchbehavior):packages/cli/__tests__/changeset.test.ts - Native core (NAPI/Rust):
crates/core/src/napi/mod.rsandcrates/core/README.md
If you cannot resolve an issue after following these steps, prepare logs and the details requested above and open an issue in the repository with the 📁 .sintesi directory listing and the exact failing command or test.