legal-ai/paperclip-bug-report.md
Chaim b409f1c7eb Add case data, benchmark embeddings, and bug report
Add cases symlink, Google Vision extraction and benchmark
embedding data, and Paperclip bug report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:20:40 +00:00

# Bug: Skill import from Gitea — wrong raw URL format causes empty SKILL.md
**File at:** https://github.com/paperclipai/paperclip/issues/new
## Title
Skill import from Gitea: wrong raw URL format causes empty SKILL.md
## Body
### Bug Summary
When importing skills from a self-hosted **Gitea** instance, Paperclip fetches the git tree successfully via the `/api/v3/` endpoint (which Gitea supports), but then uses the **wrong raw file URL format** to download the `SKILL.md` content. The request returns a 404, and an almost-empty stub is saved instead.
### Environment
- Paperclip server: `@paperclipai/server@2026.403.0`
- Gitea instance: self-hosted Gitea
### Steps to Reproduce
1. Host a skill repo on a Gitea instance containing a `SKILL.md` (32 KB+) plus `scripts/` and `references/` directories
2. Import the skill via URL: `https://my-gitea.example.com/org/skill-name.git`
3. Observe that only a stub `SKILL.md` (~283 bytes) is saved, and subdirectories are missing
### Root Cause
In `server/dist/services/github-fetch.js`, the `resolveRawGitHubUrl()` function builds:
```
https://{hostname}/raw/{owner}/{repo}/{ref}/{file}
```
This format works for **GitHub Enterprise**, but **not for Gitea**. Gitea expects:
```
https://{hostname}/{owner}/{repo}/raw/branch/{ref}/{file}
```
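The two layouts above can be sketched as a pair of URL builders. This is a minimal illustration, not Paperclip's code: `encodePath`, `gheRawUrl`, and `giteaRawUrl` are hypothetical names, and the Gitea variant assumes `ref` is a branch name, as in this report.

```javascript
// Hypothetical helpers for illustration; none of these names come from Paperclip.

// Encode each path segment separately so "/" separators are preserved.
function encodePath(p) {
  return p.split("/").map(encodeURIComponent).join("/");
}

// GitHub Enterprise layout: https://{hostname}/raw/{owner}/{repo}/{ref}/{file}
function gheRawUrl({ hostname, owner, repo, ref, file }) {
  return `https://${hostname}/raw/${owner}/${repo}/${encodePath(ref)}/${encodePath(file)}`;
}

// Gitea layout: https://{hostname}/{owner}/{repo}/raw/branch/{ref}/{file}
// Assumption: `ref` names a branch.
function giteaRawUrl({ hostname, owner, repo, ref, file }) {
  return `https://${hostname}/${owner}/${repo}/raw/branch/${encodePath(ref)}/${encodePath(file)}`;
}

const parts = {
  hostname: "my-gitea.example.com",
  owner: "org",
  repo: "skill-repo",
  ref: "main",
  file: "SKILL.md",
};

console.log(gheRawUrl(parts));
// https://my-gitea.example.com/raw/org/skill-repo/main/SKILL.md
console.log(giteaRawUrl(parts));
// https://my-gitea.example.com/org/skill-repo/raw/branch/main/SKILL.md
```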
### Proof
```bash
# Paperclip's URL format -> 404
$ curl -s -o /dev/null -w "%{http_code}" "https://my-gitea.example.com/raw/org/skill-repo/main/SKILL.md"
404
# Correct Gitea format -> 200
$ curl -s -o /dev/null -w "%{http_code}" "https://my-gitea.example.com/org/skill-repo/raw/branch/main/SKILL.md"
200
```
### Secondary Issue
When `SKILL.md` is at the repository root, `path.posix.dirname("SKILL.md")` returns `"."`, so the inventory filter becomes `entry.startsWith("./")`. Tree paths are repository-relative and carry no `./` prefix, so the filter matches nothing and all sibling directories (`scripts/`, `references/`) are dropped. Even if the raw URL worked, subdirectories would still be excluded from the file inventory.
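The behaviour can be reproduced in isolation. The sketch below assumes the filter is built as a `skillDir + "/"` prefix check, which is inferred from this report rather than copied from Paperclip; the `tree` listing and `inventoryFor` are hypothetical.

```javascript
const path = require("node:path");

// Hypothetical tree listing: repo-relative paths, no "./" prefix.
const tree = ["SKILL.md", "scripts/run.sh", "references/notes.md"];

// For a root-level SKILL.md, dirname yields "." rather than "".
const skillDir = path.posix.dirname("SKILL.md");

// Filter as described in the report: the prefix is "./", which no entry has.
const buggy = tree.filter((entry) => entry.startsWith(`${skillDir}/`));

// One possible fix: treat "." as the repository root and keep every entry.
function inventoryFor(dir, entries) {
  if (dir === ".") return entries.slice();
  return entries.filter((entry) => entry.startsWith(`${dir}/`));
}

console.log(skillDir);                           // .
console.log(buggy.length);                       // 0
console.log(inventoryFor(skillDir, tree).length); // 3
```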
### Suggested Fix
1. **Detect Gitea** vs GitHub Enterprise (e.g., probe for the `/api/v1/` endpoint, which is Gitea-specific, rather than relying on `/api/v3/`)
2. **Use the correct raw URL format** per platform:
   - GitHub/GHE: `https://{hostname}/raw/{owner}/{repo}/{ref}/{file}`
   - Gitea: `https://{hostname}/{owner}/{repo}/raw/branch/{ref}/{file}`
3. **Fix root-level SKILL.md inventory**: when `skillDir === "."`, include all files instead of filtering by `entry.startsWith("./")`
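
Step 1 could be sketched as a small probe. This is only an illustration under assumptions: `detectForge` and `fetchImpl` are hypothetical names, the probe targets Gitea's `/api/v1/version` endpoint, and it assumes that endpoint is reachable without authentication on the instance in question. Anything that does not answer like Gitea is treated as GitHub Enterprise.

```javascript
// Hypothetical probe; not part of Paperclip. `fetchImpl` is injectable for testing
// and defaults to the global fetch available in Node 18+.
async function detectForge(hostname, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`https://${hostname}/api/v1/version`);
    if (res.ok) {
      const body = await res.json();
      // Gitea answers with a JSON object carrying a "version" string.
      if (body && typeof body.version === "string") return "gitea";
    }
  } catch {
    // Network failures and non-JSON bodies fall through to the default.
  }
  // Assumption: any host that does not look like Gitea is handled as GHE.
  return "github-enterprise";
}
```

The result could then select which raw URL format to use, so existing GitHub Enterprise imports keep their current behaviour.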
### Workaround
Manually clone the repo into `~/.paperclip/instances/default/skills/{company_id}/{slug}/`, then update the `company_skills` table directly with the correct Markdown content and `file_inventory`.