# Bug: Skill import from Gitea — wrong raw URL format causes empty SKILL.md

**File at:** https://github.com/paperclipai/paperclip/issues/new

## Title

Skill import from Gitea: wrong raw URL format causes empty SKILL.md

## Body

### Bug Summary

When importing skills from a self-hosted **Gitea** instance, Paperclip fetches the git tree successfully via the `/api/v3/` endpoint (which Gitea supports). It then uses the **wrong raw file URL format** to download the `SKILL.md` content. The request returns a 404, and an almost-empty stub is saved instead.

### Environment

- Paperclip server: `@paperclipai/server@2026.403.0`
- Gitea instance: self-hosted

### Steps to Reproduce

1. Host a skill repo on a Gitea instance containing a `SKILL.md` (32KB+) plus `scripts/` and `references/` directories
2. Import the skill via URL: `https://my-gitea.example.com/org/skill-name.git`
3. Observe that only a stub `SKILL.md` (~283 bytes) is saved, and the subdirectories are missing

### Root Cause

In `server/dist/services/github-fetch.js`, the `resolveRawGitHubUrl()` function builds:

```
https://{hostname}/raw/{owner}/{repo}/{ref}/{file}
```

This format works for **GitHub Enterprise**, but **not for Gitea**. Gitea expects:

```
https://{hostname}/{owner}/{repo}/raw/branch/{ref}/{file}
```

### Proof

```bash
# Paperclip's URL format -> 404
$ curl -s -o /dev/null -w "%{http_code}" "https://my-gitea.example.com/raw/org/skill-repo/main/SKILL.md"
404

# Correct Gitea format -> 200
$ curl -s -o /dev/null -w "%{http_code}" "https://my-gitea.example.com/org/skill-repo/raw/branch/main/SKILL.md"
200
```

### Secondary Issue

When `SKILL.md` is at the repository root, `path.posix.dirname("SKILL.md")` returns `"."`, causing the inventory filter `entry.startsWith("./")` to miss all sibling directories (`scripts/`, `references/`). This means even if the raw URL worked, subdirectories would still be excluded from the file inventory.

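The root-level behavior can be reproduced in isolation. The snippet below is a minimal sketch, not Paperclip's actual code: `inventoryFor` and the sample `entries` list are hypothetical, and it assumes tree entries are repository-relative paths without a leading `./`.

```javascript
const path = require("node:path");

// Hypothetical tree entries (repository-relative paths, no leading "./").
const entries = ["SKILL.md", "scripts/run.sh", "references/api.md", "README.md"];

// Hypothetical helper: select the files belonging to the skill whose
// SKILL.md lives at `skillPath`.
function inventoryFor(skillPath, allEntries) {
  const skillDir = path.posix.dirname(skillPath);
  // A root-level SKILL.md yields ".", and no entry starts with "./",
  // so treat "." as "the whole repository".
  if (skillDir === ".") return allEntries.slice();
  return allEntries.filter((entry) => entry.startsWith(skillDir + "/"));
}

// The filter described in this report matches nothing at the root:
console.log(path.posix.dirname("SKILL.md"));
console.log(entries.filter((entry) => entry.startsWith("./")));

// With the special case, root-level siblings are kept:
console.log(inventoryFor("SKILL.md", entries));
```

Special-casing `"."` keeps the prefix filter unchanged for skills that live in a subdirectory.
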
### Suggested Fix

1. **Detect Gitea** vs GitHub Enterprise (e.g., check for `/api/v1/` endpoint which is Gitea-specific, vs `/api/v3/`)
2. **Use the correct raw URL format** per platform:
   - GitHub/GHE: `https://{hostname}/raw/{owner}/{repo}/{ref}/{file}`
   - Gitea: `https://{hostname}/{owner}/{repo}/raw/branch/{ref}/{file}`
3. **Fix root-level SKILL.md inventory**: when `skillDir === "."`, include all files instead of filtering by `entry.startsWith("./")`

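Steps 1 and 2 might look roughly like the sketch below. Both `buildRawUrl` and `detectPlatform` are hypothetical names, not existing Paperclip functions. The probe assumes Gitea answers `GET /api/v1/version` with a JSON body containing a `version` string, and the Gitea URL assumes `ref` is a branch name.

```javascript
// Hypothetical helper: build the raw-file URL for a given platform.
function buildRawUrl(platform, { hostname, owner, repo, ref, file }) {
  const cleanFile = file.replace(/^\/+/, ""); // drop any leading slashes
  if (platform === "gitea") {
    // Assumption: `ref` is a branch. Tags and commits use raw/tag/ and raw/commit/.
    return `https://${hostname}/${owner}/${repo}/raw/branch/${ref}/${cleanFile}`;
  }
  // GitHub Enterprise layout, as described in this report.
  return `https://${hostname}/raw/${owner}/${repo}/${ref}/${cleanFile}`;
}

// Hypothetical probe: treat the host as Gitea if /api/v1/version responds
// with a JSON version string, otherwise fall back to GitHub Enterprise.
// `fetchImpl` is injectable so the probe can be exercised without a network.
async function detectPlatform(hostname, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`https://${hostname}/api/v1/version`);
    if (res.ok) {
      const body = await res.json();
      if (body && typeof body.version === "string") return "gitea";
    }
  } catch {
    // Network or parse failure: fall through to the default.
  }
  return "github-enterprise";
}

const target = { hostname: "my-gitea.example.com", owner: "org", repo: "skill-repo", ref: "main", file: "SKILL.md" };
console.log(buildRawUrl("gitea", target));
console.log(buildRawUrl("github-enterprise", target));
```

Defaulting to the GitHub Enterprise layout when the probe fails keeps the current behavior for hosts that are not Gitea.
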
### Workaround

Manually clone the repo into `~/.paperclip/instances/default/skills/{company_id}/{slug}/`, then update the `company_skills` table directly with the correct markdown content and `file_inventory`.