| commit | 1eb393bb076ac59d11c9b01881220d8cdff5141c | |
|---|---|---|
| author | Andrea Arcangeli <aarcange@redhat.com> | Mon Dec 15 10:17:46 2025 +0100 |
| committer | Andrea Arcangeli <aarcange@redhat.com> | Mon Dec 15 10:18:16 2025 +0100 |
| tree | 3243e40ffa4f294f9662015a279ae5e150f7a588 | |
| parent | 28b72b725426106f106c77cd56572737289b1323 | |
Handle context size errors in API responses

When the endpoint returns a context size error, the request should not be retried, as it indicates the request is too large for the model's context window.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
# AI-powered conflict resolution for Git
synthmerge is a minimalistic command-line tool that leverages AI to automatically resolve conflicts arising from Git commands. Built on the research of the Patchpal project, it provides a pure AI inference layer that seamlessly integrates with your existing Git workflow. While the AI generates code solutions, all code reviews and approvals remain within your favorite code editor.
- **Specialized AI Layer**: Dedicated AI inference system that complements Git without duplicating its core functionality
- **Git Integration**: Leverages Git's diff3 conflict markers as the foundation (requires `git config merge.conflictStyle diff3`)
- **Editor Agnostic**: Compatible with any development environment (VS Code, Emacs, Vim, etc.)
- **Universal Git Operation Support**: Seamlessly integrates with all Git operations that create conflicts: `cherry-pick`, `merge`, `rebase`, `revert`, `stash pop`
- **Model Flexibility**: No fine-tuning required; any instruct large language model can be used
- **Parallel Multi-AI Endpoint Support**: Simultaneously queries multiple AI models to resolve conflicts
- **Parameter Variants Support**: Each AI endpoint can be configured with multiple parameter variants to run multiple inference strategies
- **Results Deduplication**: Consolidates identical solutions and displays model and/or parameter-variant agreement
- **Review Using Your Workflow**
- **Fail-Safe Design**
- **Benchmark**: Built-in benchmarking tool (`synthmerge_bench`) for evaluating model accuracy on conflict resolution tasks
- **Context Lines Configuration**: Configurable context lines for code, diff, and patch to control the amount of surrounding information provided to AI models
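The deduplication step described above can be sketched in a few lines of Python. This is an illustrative sketch of the idea (group identical candidate resolutions and report which endpoints/variants agree), not synthmerge's actual implementation; the endpoint names and resolutions are made up:

```python
from collections import defaultdict

def dedup_solutions(candidates):
    """Group identical conflict resolutions and record which
    model/variant produced each one (illustrative sketch)."""
    groups = defaultdict(list)
    for source, resolution in candidates:
        groups[resolution].append(source)
    # Most-agreed-upon resolution first
    return sorted(groups.items(), key=lambda kv: -len(kv[1]))

# Hypothetical answers from three endpoint/variant queries
candidates = [
    ("Claude (default)", "return send_request_once(req);"),
    ("Qwen3 (default)",  "return send_request_once(req);"),
    ("Qwen3 (no_diff)",  "return retry_request(req);"),
]
ranked = dedup_solutions(candidates)
```

Agreement between independent models/variants is a useful reviewing signal: a resolution produced by several queries is more likely correct than a singleton.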
1. **Git sets up conflicts**

   ```shell
   git config merge.conflictStyle diff3  # Must be set
   git cherry-pick -x <commit>           # Git detects conflicts
   ```

2. **synthmerge analyzes conflicts** (diff3 conflict markers)
3. **AI resolves conflict**
4. **Git gets updated**

✅ Works also for `git rebase`, `revert`, and `merge` conflict resolutions.
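For reference, with `merge.conflictStyle diff3` each conflicted hunk carries three sections — ours, the common ancestor, and theirs — which gives the AI the base version to reason from. The code and labels below are illustrative:

```
<<<<<<< HEAD
    return retry_request(req);
||||||| merged common ancestors
    return send_request(req);
=======
    return send_request_once(req);
>>>>>>> theirs
```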
```shell
# Ensure Git is configured for diff3 conflict style
git config merge.conflictStyle diff3

# Attempt cherry-pick (will leave conflicts unresolved)
git cherry-pick -x <commit>

# Resolve conflicts with AI
synthmerge

# Review synthmerge resolved conflicts in each unmerged file ...
git diff --name-only --diff-filter=U

# ... or linearized in a single buffer to edit with ripgrep-edit
rg-edit -E vim -U -e '(?s)^<<<<<<<+ .*?^>>>>>>>+ '
rg-edit -E emacsclient -U -e '(?s)^<<<<<<<+ .*?^>>>>>>>+ '
```
Create `~/.config/synthmerge.yaml` based on `synthmerge.yaml`:
```yaml
endpoints:
  - name: "Claude Sonnet 4.5"
    url: "https://api.anthropic.com/v1/messages"
    type: "anthropic"
    x_api_key_file: "~/.keys/anthropic.api-key"
    json:
      max_tokens: 20000
      model: "claude-sonnet-4-5"
      temperature: 0
    headers:
      anthropic-version: "2023-06-01"
    variants:
      - name: "default"
      - name: "no_diff"
        context:
          no_diff: true
  - name: "Vertex Claude Sonnet 4.0"
    url: "https://host/path"
    type: "anthropic"
    api_key_file: "~/.keys/claude.api-key"
    json:
      anthropic_version: "something-YYYY-MM-DD"
      max_tokens: 20000
      temperature: 0
    variants:
      - name: "default"
      - name: "no_diff"
        context:
          no_diff: true
    # Optional root certificate for HTTPS endpoints
    # root_certificate_pem: "~/.ssl/corp-ca.pem"
  - name: "Patchpal AI"
    type: "patchpal"
    url: "http://patchpal.usersys.redhat.com:9080/v1"
  - name: "Gemini 2.5 Flash"
    url: "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"
    type: "openai"
    api_key_file: "~/.keys/gemini.api-key"
    json:
      model: "gemini-2.5-flash"
      # "none" (only available with Flash) works better with default layout
      reasoning_effort: "none"
    variants:
      - name: "default"
      - name: "no_diff"
        context:
          no_diff: true
  - name: "Gemini 2.5 Pro"
    url: "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"
    type: "openai"
    api_key_file: "~/.keys/gemini.api-key"
    json:
      model: "gemini-2.5-pro"
      reasoning_effort: "low"
    context:
      # reasoning_effort != none needs the prompt at the top of system_message
      layout:
        system_message:
          - prompt
        user_message:
          - training
          - diff
    variants:
      - name: "default"
      - name: "no_diff"
        context:
          no_diff: true
  - name: "Gemini 3 Pro preview"
    url: "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"
    type: "openai"
    api_key_file: "~/.keys/gemini.api-key"
    json:
      model: "gemini-3-pro-preview"
      reasoning_effort: "low"
    context:
      layout:
        system_message:
          - prompt
        user_message:
          - training
          - diff
    variants:
      - name: "default"
      - name: "no_diff"
        context:
          no_diff: true
  - name: "llama.cpp vulkan minimal" # requires --no-jinja
    url: "http://localhost:8811/v1/chat/completions"
    type: "openai"
  - name: "llama.cpp vulkan" # requires --no-jinja
    url: "http://localhost:8811/v1/chat/completions"
    #timeout: 600000
    #retries: 10
    #delay: 1000
    #max_delay: 600000
    #wait: 1000
    type: "openai"
    json:
      #temperature: 0.7
      #top_p: 0.8
      #top_k: 20
      #min_p: 0
      # n_probs: 1 provides the probability of the lowest probability
      # token in the resolved conflict
      n_probs: 1
      # n_probs: 2 same as n_probs: 1 but it also provides two more
      # beams with the perplexity search algorithm of synthmerge
      # applied to the logprobs, which is a client side only
      # approximated beam search
      #n_probs: 2
    variants:
      # one query for each entry in the variants list
      - name: "default"
      - name: "no_diff"
        context:
          no_diff: true
      #- name: "min_p"
      #  json:
      #    temperature: 0.3
      #    top_p: 1.0
      #    top_k: 0
      #    min_p: 0.9
  - name: "llama.cpp vulkan no_chat" # requires --no-jinja
    url: "http://localhost:8811/v1/completions"
    type: "openai"
    no_chat: true
    context:
      no_training: true
```
| Endpoint Type | Example Configuration | Notes |
|---|---|---|
| Patchpal-backend | `type: "patchpal"` | Fine-tuned for patch resolution |
| OpenAI protocol | `type: "openai"` | Self-hosted LLMs (e.g., llama.cpp) and Gemini |
| Anthropic protocol | `type: "anthropic"` | Claude models |
✅ Gemini exposes an OpenAI-compatible endpoint
✅ Models work with stock weights – the prompt engineering simulates Patchpal's fine-tuned behavior.
The `context: layout:` configuration allows fine-grained control over how information is structured in an LLM request.
Models running without reasoning (`reasoning_effort: none`) perform best when the most important directives are closest to the generation; models with `reasoning_effort != none` require the prompt explaining the challenge at hand to be at the top of the system message.

Layout components:

- `prompt`: The high-level prompt explaining the challenge
- `training`: The synthetic training examples
- `diff`: The full git diff showing all other changes of the commit

Context flags:

- `no_diff`: Disable diff inclusion in context
- `no_training`: Disable training examples in context

```yaml
# Set layout at endpoint level
context:
  layout:
    system_message:
      - prompt
    user_message:
      - training
      - diff

# Override layout in a variant
variants:
  - name: "no_diff"
    context:
      no_diff: true
```
The layout can be configured either at the endpoint level or in individual variants, but not both simultaneously in the same endpoint.
A Fedora Copr package is available:
Install Synthmerge:
```shell
sudo dnf copr enable vittyvk/synthmerge
sudo dnf install synthmerge
```
Configuration:
```shell
cp -a /usr/share/synthmerge/synthmerge.yaml ~/.config/
$EDITOR ~/.config/synthmerge.yaml
```
Install Synthmerge (from source):
```shell
git clone https://gitlab.com/aarcange/synthmerge.git
cd synthmerge
cargo build --release
sudo cp target/release/synthmerge /usr/local/bin/
```
Configuration:
```shell
cp synthmerge.yaml ~/.config/
$EDITOR ~/.config/synthmerge.yaml
```
The following statistics were generated using the `synthmerge_bench` tool on a C language dataset to evaluate model performance on conflict resolution tasks. These results may vary depending on prompt, context, and other variables.
- **Accuracy** checks whether the AI-resolved conflict is an exact match, including all spaces, tabs, and newlines.
- **Accuracy (aligned)** checks equality of whitespace patterns up to the first non-whitespace character, ignoring differences in lines without non-whitespace characters and whitespace variations after the first non-whitespace character (i.e. Python equivalence).
- **Accuracy (stripped)** compresses all whitespace and newlines into a single space (i.e. C/C++/Rust/JavaScript equivalence).
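The three equivalence levels can be illustrated with a small Python sketch. This is an approximation of the definitions above, not `synthmerge_bench`'s actual code:

```python
import re

def exact(a, b):
    # Exact match, including all spaces, tabs, and newlines
    return a == b

def aligned(a, b):
    # Keep leading whitespace per line; skip lines without
    # non-whitespace characters; collapse whitespace variations
    # after the first non-whitespace character
    def norm(text):
        out = []
        for line in text.splitlines():
            if not line.strip():
                continue  # line without non-whitespace characters
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + re.sub(r"\s+", " ", line.strip()))
        return out
    return norm(a) == norm(b)

def stripped(a, b):
    # Collapse all whitespace and newlines into single spaces
    return " ".join(a.split()) == " ".join(b.split())
```

Under these definitions, changing internal spacing breaks only exact accuracy, while changing indentation also breaks aligned accuracy (which matters for Python), but neither affects stripped accuracy (which is enough for C/C++/Rust/JavaScript).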
This measurement used only new test data never exposed to the model during the fine-tuning process.
Claude Sonnet 4.5 and Gemini 3 Pro preview not done yet.

```
Model: Claude Sonnet 4.0 (default)
Accuracy: 66.70% (753/1129)
Accuracy (aligned): 70.42% (795/1129)
Accuracy (stripped): 73.34% (828/1129)
Error Rate: 0.00% (0/1129)
Average tokens: 5730.47
Average duration: 7.03 s

Model: Claude Sonnet 4.0 (no_diff)
Accuracy: 65.19% (736/1129)
Accuracy (aligned): 68.29% (771/1129)
Accuracy (stripped): 71.48% (807/1129)
Error Rate: 0.00% (0/1129)
Average tokens: 1184.14
Average duration: 6.34 s

# only the Patchpal Beam 0 is comparable to the non Patchpal models
Model: Patchpal AI
Accuracy: 64.57% (729/1129)
Accuracy (aligned): 68.47% (773/1129) # might be duplicate with other beams
Accuracy (stripped): 71.12% (803/1129) # might be duplicate with other beams
Error Rate: 0.44% (5/1129)

Model: Gemini 2.5 Pro (high) # reasoning_effort: high
Accuracy: 55.18% (623/1129)
Accuracy (aligned): 60.67% (685/1129)
Accuracy (stripped): 63.42% (716/1129)
Error Rate: 0.00% (0/1129)

Model: Gemini 2.5 Flash (none no_diff) # reasoning_effort: none
Accuracy: 53.06% (599/1129)
Accuracy (aligned): 63.24% (714/1129)
Accuracy (stripped): 66.25% (748/1129)
Error Rate: 3.28% (37/1129)
Average tokens: 1036.06
Average duration: 1.18 s

# context: layout: system_message: [ prompt ] user_message: [ training, diff ]
Model: Gemini 2.5 Pro (low userctx) # reasoning_effort: low
Accuracy: 52.44% (592/1129)
Accuracy (aligned): 56.95% (643/1129)
Accuracy (stripped): 59.70% (674/1129)
Error Rate: 5.49% (62/1129)
Average tokens: 6014.82
Average duration: 9.68 s

# context: layout: system_message: [ prompt ] user_message: [ training, diff ]
Model: Gemini 2.5 Pro (low no_diff) # reasoning_effort: low
Accuracy: 51.99% (587/1129)
Accuracy (aligned): 55.36% (625/1129)
Accuracy (stripped): 58.02% (655/1129)
Error Rate: 2.92% (33/1129)
Average tokens: 1931.27
Average duration: 9.11 s

# temperature: 0.7 top_k: 20 top_p: 0.8 min_p: 0
# llama.cpp vulkan Q6_K
Model: Qwen3-Coder-30B-A3B-Instruct (default)
Accuracy: 49.69% (561/1129)
Accuracy (aligned): 54.21% (612/1129)
Accuracy (stripped): 56.78% (641/1129)
Error Rate: 0.09% (1/1129)
Average tokens: 4252.31
Average duration: 9.18 s
Average prob: 33.1% (+- 35.4)
Average prob (incorrect): 16.3% (+- 40.7)
Average prob (stripped): 56.7% (+- 27.4)
Average prob (aligned): 58.0% (+- 27.2)
Average prob (correct): 61.6% (+- 25.9)

Model: Gemini 2.5 Flash (none default) # reasoning_effort: none
Accuracy: 49.60% (560/1129)
Accuracy (aligned): 60.41% (682/1129)
Accuracy (stripped): 63.42% (716/1129)
Error Rate: 6.20% (70/1129)
Average tokens: 5069.04
Average duration: 1.15 s

# context: layout: system_message: [ prompt ] user_message: [ training, diff ]
Model: Gemini 2.5 Flash (low no_diff userctx) # reasoning_effort: low
Accuracy: 48.72% (550/1129)
Accuracy (aligned): 58.19% (657/1129)
Accuracy (stripped): 62.00% (700/1129)
Error Rate: 2.66% (30/1129)
Average tokens: 1916.70
Average duration: 4.62 s

# temperature: 0.7 top_k: 20 top_p: 0.8 min_p: 0
# llama.cpp vulkan Q6_K
Model: Qwen3-Coder-30B-A3B-Instruct (no_diff)
Accuracy: 46.94% (530/1129)
Accuracy (aligned): 51.02% (576/1129)
Accuracy (stripped): 53.76% (607/1129)
Error Rate: 0.00% (0/1129)
Average tokens: 904.89
Average duration: 4.37 s
Average prob: 37.1% (+- 35.1)
Average prob (incorrect): 24.0% (+- 39.1)
Average prob (stripped): 53.8% (+- 29.1)
Average prob (aligned): 57.3% (+- 27.9)
Average prob (correct): 62.6% (+- 26.5)

# context: layout: system_message: [ prompt ] user_message: [ training, diff ]
Model: Gemini 2.5 Flash (low default userctx) # reasoning_effort: low
Accuracy: 42.52% (480/1129)
Accuracy (aligned): 52.70% (595/1129)
Accuracy (stripped): 55.98% (632/1129)
Error Rate: 13.82% (156/1129)
Average tokens: 5942.75
Average duration: 4.22 s

# if Beam 0 is wrong, Beam 1 is right 10.54% of the time
Model: Patchpal AI #1
Accuracy: 10.54% (119/1129)
Accuracy (aligned): 21.17% (239/1129) # might be duplicate with other beams
Accuracy (stripped): 30.03% (339/1129) # might be duplicate with other beams
Error Rate: 0.53% (6/1129)

Model: Gemini 2.5 Flash (low default) # reasoning_effort: low
Accuracy: 7.97% (90/1129)
Accuracy (aligned): 9.57% (108/1129)
Accuracy (stripped): 10.27% (116/1129)
Error Rate: 85.56% (966/1129) # default layout fails with Gemini thinking mode
Average tokens: 3719.80
Average duration: 0.51 s

# this is comparable to Patchpal AI #1
Model: Qwen3-Coder-30B-A3B-Instruct (no_diff#1) # perplexity beam #1
Accuracy: 7.71% (87/1129)
Accuracy (aligned): 11.87% (134/1129) # might be duplicate with other beams
Accuracy (stripped): 16.56% (187/1129) # might be duplicate with other beams
Error Rate: 0.18% (2/1129)
Average tokens: 910.68
Average duration: 1.17 s # kvcached

# if Beam 0 and Beam 1 are wrong, Beam 2 is right 3.37% of the time
Model: Patchpal AI #2
Accuracy: 3.37% (38/1129)
Accuracy (aligned): 16.21% (183/1129) # might be duplicate with other beams
Accuracy (stripped): 23.83% (269/1129) # might be duplicate with other beams
Error Rate: 0.44% (5/1129)

# this is comparable to Patchpal AI #2
Model: Qwen3-Coder-30B-A3B-Instruct (default#2) # perplexity beam #2
Accuracy: 1.95% (22/1129)
Accuracy (aligned): 6.91% (78/1129) # might be duplicate with other beams
Accuracy (stripped): 11.87% (134/1129) # might be duplicate with other beams
Error Rate: 0.09% (1/1129)
Average tokens: 913.69
Average duration: 1.18 s # kvcached
```
Aggregate accuracy represents the combined performance when multiple models/variants/beams are used in parallel: a conflict is considered successfully resolved if at least one model/variant/beam produces a correct solution.
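As a sketch of that definition, aggregate accuracy is a per-conflict OR across configurations. The Python below is illustrative, with hypothetical per-conflict outcomes:

```python
def aggregate_accuracy(results):
    """results: dict mapping configuration name -> list of per-conflict
    booleans (True = correct resolution). A conflict counts as resolved
    if at least one configuration got it right."""
    per_conflict = zip(*results.values())  # one tuple per conflict
    solved = [any(outcomes) for outcomes in per_conflict]
    return sum(solved) / len(solved)

# Hypothetical outcomes over four conflicts
results = {
    "model A": [True,  False, True,  False],
    "model B": [False, True,  True,  False],
}
rate = aggregate_accuracy(results)  # 3 of 4 conflicts solved -> 0.75
```

This is why the aggregate rows in the table below exceed every individual configuration: different models and beams fail on different conflicts.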
| Configuration | Accuracy | Accuracy (aligned) | Accuracy (stripped) |
|---|---|---|---|
| Qwen3-Coder-30B (default) | 49.69% | 54.21% | 56.78% |
| Qwen3-Coder-30B (no_diff) | 46.94% | 51.02% | 53.76% |
| Aggregate: Qwen3-Coder-30B (default + no_diff) | 55.80% | 60.50% | 63.33% |
| (Perplexity) beams added to Qwen3-Coder-30B | 63.24% | 69.18% | 71.83% |
| Claude Sonnet 4.0 (default) | 66.70% | 70.42% | 73.34% |
| Qwen3-Coder-30B + Claude Sonnet 4.0 | 75.02% | 78.39% | 80.96% |
| Gemini 2.5 Flash (none) | 49.60% | 60.41% | 63.42% |
| Gemini 2.5 Pro (low) | 52.44% | 56.95% | 59.70% |
| Qwen3-Coder-30B + beams + Claude Sonnet 4.0 + Gemini 2.5 Flash + Gemini 2.5 Pro | 79.98% | 82.82% | 84.68% |
| Patchpal AI (Beam 0) | 64.57% | 68.47% | 71.12% |
| Aggregate: Patchpal AI (3 beams) | 78.39% | 81.05% | 82.46% |
| ✅ All models + all variants + all beams | 84.85% | 87.51% | 88.66% |