| name | description |
|---|---|
| pdf-to-markdown-mineru | Convert local PDF files, especially academic papers, into Markdown via MinerU and rewrite extracted local image references to hosted URLs on an R2-compatible object store. |
PDF to Markdown via MinerU
Use this skill when the user wants a local PDF converted into Markdown and the final Markdown should keep working across machines by replacing extracted local image paths with hosted URLs.
Included files
- `scripts/convert_pdf_to_markdown.py`: standalone CLI for MinerU submission, polling, download, unzip, image upload, and Markdown rewrite.
- `scripts/requirements.txt`: minimal Python dependencies for the CLI.
- `.env`: bundled MinerU and R2 configuration so the skill can run directly in this workspace.
Workflow
- Confirm the source PDF path and choose an output `.md` path.
- Ensure Python dependencies are installed. Prefer `uv pip install -r <skill-dir>/scripts/requirements.txt` or `python -m pip install -r <skill-dir>/scripts/requirements.txt`.
- This skill first loads `.env` from the skill root, then falls back to the current working directory or an explicit `--env-file`.
- Ensure these environment variables are available before running:
  - Required: `MINERU_API_TOKEN`, `R2_BASE_URL`, `R2_BEARER_TOKEN`
  - Optional: `R2_PREFIX`, `R2_PUBLIC_BASE_URL`, `POLL_INTERVAL_SECONDS`, `TIMEOUT_SECONDS`
- Run the converter: `python scripts/convert_pdf_to_markdown.py /path/to/paper.pdf -o /path/to/paper.md`
- For scanned PDFs, add `--ocr`. Disable extraction features with `--disable-table` or `--disable-formula` if needed.
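
The configuration steps above can be checked before a run with a small pre-flight script. The sketch below is not taken from `convert_pdf_to_markdown.py`; the function names `parse_env_file`, `resolve_env_file`, and `missing_required` are hypothetical, and the lookup order (skill root, then working directory, then the explicit file) is an assumption based on the wording of the workflow, so the real script may give `--env-file` a different priority.

```python
from pathlib import Path

# Required variable names as listed in the workflow above.
REQUIRED = ("MINERU_API_TOKEN", "R2_BASE_URL", "R2_BEARER_TOKEN")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_env_file(skill_root, cwd, explicit=None):
    """Return the first existing candidate file, or None.

    Assumed order: skill root, current working directory, explicit path.
    """
    candidates = [Path(skill_root) / ".env", Path(cwd) / ".env"]
    if explicit:
        candidates.append(Path(explicit))
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def missing_required(env):
    """List required variables that are absent or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]
```

Calling `missing_required(parse_env_file(path))` on the resolved file gives a list that should be empty before the converter is started.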
Operational notes
- The script requires outbound network access to MinerU and the R2-compatible object store.
- Progress messages are written to stderr. The final Markdown path is written to stdout.
- Only local image references are uploaded and rewritten. Existing `http`, `https`, and `data:` image URLs are left unchanged.
- If the caller wants Markdown without any image hosting step, this skill is the wrong default; adjust the script first instead of running it as-is.
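
The rule for which image references get rewritten can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the script's actual code: `rewrite_local_images` is a hypothetical name, `upload` stands for any callable that stores a local file and returns its hosted URL, and only plain Markdown image syntax without titles is matched (HTML `<img>` tags are ignored here).

```python
import re

# Matches ![alt](target); image titles and HTML <img> tags are not handled.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")

# Targets with these prefixes are already portable and are left unchanged.
REMOTE_PREFIXES = ("http://", "https://", "data:")


def rewrite_local_images(markdown, upload):
    """Replace each local image target with the URL returned by upload(path)."""

    def replace(match):
        alt, target = match.group(1), match.group(2)
        if target.lower().startswith(REMOTE_PREFIXES):
            return match.group(0)  # keep hosted and inline images as they are
        return f"![{alt}]({upload(target)})"

    return IMAGE_PATTERN.sub(replace, markdown)
```

Passing the uploader in as a callable keeps the rewrite step testable without network access; a stand-in function that maps a path to a fixed URL is enough to exercise it.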