Add case data, benchmark embeddings, and bug report
Add cases symlink, Google Vision extraction and benchmark embedding data, and Paperclip bug report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
65
paperclip-bug-report.md
Normal file
@@ -0,0 +1,65 @@
# Bug: Skill import from Gitea — wrong raw URL format causes empty SKILL.md
**File at:** https://github.com/paperclipai/paperclip/issues/new
## Title
Skill import from Gitea: wrong raw URL format causes empty SKILL.md
## Body
### Bug Summary
When importing skills from a self-hosted **Gitea** instance, Paperclip fetches the git tree successfully via the `/api/v3/` endpoint (which Gitea supports), but then uses the **wrong raw file URL format** to download the `SKILL.md` content. The download returns a 404, and an almost-empty stub is saved instead.
### Environment
- Paperclip server: `@paperclipai/server@2026.403.0`
- Gitea instance: self-hosted
### Steps to Reproduce
1. Host a skill repo on a Gitea instance containing a `SKILL.md` (32 KB or larger) plus `scripts/` and `references/` directories
2. Import the skill via URL: `https://my-gitea.example.com/org/skill-name.git`
3. Observe that only a stub SKILL.md (~283 bytes) is saved, and subdirectories are missing
### Root Cause
In `server/dist/services/github-fetch.js`, the `resolveRawGitHubUrl()` function builds:
```
https://{hostname}/raw/{owner}/{repo}/{ref}/{file}
```
This format works for **GitHub Enterprise**, but **not for Gitea**. Gitea expects:
```
https://{hostname}/{owner}/{repo}/raw/branch/{ref}/{file}
```
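
The difference between the two layouts can be sketched as a pair of small builders. This is only an illustration, not Paperclip's actual code: the names `encodePath`, `githubStyleRawUrl`, and `giteaRawUrl` and the parameter object are hypothetical, and the `branch` segment assumes `ref` is a branch name (Gitea uses `raw/tag/` and `raw/commit/` for other ref types).

```javascript
// Hypothetical helpers for illustration; none of these names come from Paperclip.

// Encode each path segment but keep the "/" separators intact.
function encodePath(file) {
  return file.split("/").map(encodeURIComponent).join("/");
}

// Layout that resolveRawGitHubUrl() produces, according to the report.
function githubStyleRawUrl({ hostname, owner, repo, ref, file }) {
  return `https://${hostname}/raw/${owner}/${repo}/${ref}/${encodePath(file)}`;
}

// Layout Gitea expects when `ref` is a branch name (assumption: branch refs only).
function giteaRawUrl({ hostname, owner, repo, ref, file }) {
  return `https://${hostname}/${owner}/${repo}/raw/branch/${ref}/${encodePath(file)}`;
}

const target = {
  hostname: "my-gitea.example.com",
  owner: "org",
  repo: "skill-repo",
  ref: "main",
  file: "SKILL.md",
};

console.log(githubStyleRawUrl(target));
console.log(giteaRawUrl(target));
```

The two printed URLs correspond to the two `curl` targets used in the Proof section.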
### Proof
```bash
# Paperclip's URL format -> 404
$ curl -s -o /dev/null -w "%{http_code}" "https://my-gitea.example.com/raw/org/skill-repo/main/SKILL.md"
404
# Correct Gitea format -> 200
$ curl -s -o /dev/null -w "%{http_code}" "https://my-gitea.example.com/org/skill-repo/raw/branch/main/SKILL.md"
200
```
### Secondary Issue
When `SKILL.md` is at the repository root, `path.posix.dirname("SKILL.md")` returns `"."`. The inventory filter `entry.startsWith("./")` then misses all sibling directories (`scripts/`, `references/`), because repo-relative tree paths carry no `./` prefix. Even if the raw URL worked, subdirectories would still be excluded from the file inventory.
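
The filter behavior can be reproduced in isolation. The sketch below is reconstructed from the description above rather than copied from Paperclip, so the real filter may differ in detail; the function names and the example file names are made up for illustration.

```javascript
const path = require("path");

// Example tree paths as a git tree listing reports them: repo-relative,
// with no "./" prefix. The file names are invented for this sketch.
const tree = ["SKILL.md", "scripts/run.sh", "references/notes.md"];

// Filter as described in the report: it breaks when skillDir is ".".
function inventoryAsReported(entries, skillPath) {
  const skillDir = path.posix.dirname(skillPath); // "." for a root-level SKILL.md
  return entries.filter((entry) => entry.startsWith(`${skillDir}/`));
}

// One possible fix: treat "." as "the whole repository".
function inventoryFixed(entries, skillPath) {
  const skillDir = path.posix.dirname(skillPath);
  if (skillDir === ".") return entries.slice();
  return entries.filter((entry) => entry.startsWith(`${skillDir}/`));
}

console.log(inventoryAsReported(tree, "SKILL.md")); // []
console.log(inventoryFixed(tree, "SKILL.md"));
```

For a nested skill such as `skills/a/SKILL.md`, both variants behave the same, so the special case only affects root-level skills.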
### Suggested Fix
1. **Detect Gitea** vs GitHub Enterprise (e.g., check for the `/api/v1/` endpoint, which is Gitea-specific, rather than `/api/v3/`)
2. **Use the correct raw URL format** per platform:
   - GitHub/GHE: `https://{hostname}/raw/{owner}/{repo}/{ref}/{file}`
   - Gitea: `https://{hostname}/{owner}/{repo}/raw/branch/{ref}/{file}`
3. **Fix root-level SKILL.md inventory**: when `skillDir === "."`, include all files instead of filtering by `entry.startsWith("./")`
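
Step 1 could be approached along these lines. This is a hedged sketch, not a tested patch: `detectForge` and its injectable `fetchImpl` parameter are hypothetical names, the probe assumes the instance answers `GET /api/v1/version` with a JSON body containing a `version` string, and the default `fetch` assumes Node 18 or later. Instances that restrict anonymous API access may respond differently, in which case the helper falls back to the GitHub layout.

```javascript
// Hypothetical detection helper. `fetchImpl` is injectable so the logic can be
// exercised without network access; it defaults to the global fetch.
async function detectForge(hostname, fetchImpl = fetch) {
  try {
    // Gitea serves its native API under /api/v1/.
    const res = await fetchImpl(`https://${hostname}/api/v1/version`);
    if (res.ok) {
      const body = await res.json();
      if (body && typeof body.version === "string") return "gitea";
    }
  } catch {
    // Network or JSON parsing errors fall through to the default below.
  }
  return "github";
}

// Stubbed responses (made-up values) standing in for real servers.
const giteaLike = async () => ({ ok: true, json: async () => ({ version: "0.0.0-example" }) });
const notGitea = async () => ({ ok: false, json: async () => ({}) });

detectForge("my-gitea.example.com", giteaLike).then((forge) => console.log(forge)); // gitea
detectForge("ghe.example.com", notGitea).then((forge) => console.log(forge)); // github
```

The returned label can then select the raw URL layout from step 2. Caching the result per hostname would avoid probing on every import.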
### Workaround
Manually clone the repo into `~/.paperclip/instances/default/skills/{company_id}/{slug}/`, then update the `company_skills` table directly with the correct markdown content and `file_inventory`.