---
name: pdf-to-markdown-mineru
description: Convert local PDF files, especially academic papers, into Markdown via MinerU and rewrite extracted local image references to hosted URLs on an R2-compatible object store.
---

# PDF to Markdown via MinerU

Use this skill when the user wants a local PDF converted into Markdown and the final Markdown should keep working across machines by replacing extracted local image paths with hosted URLs.

## Included files

- `scripts/convert_pdf_to_markdown.py`: standalone CLI for MinerU submission, polling, download, unzip, image upload, and Markdown rewrite.
- `scripts/requirements.txt`: minimal Python dependencies for the CLI.
- `.env`: bundled MinerU and R2 configuration so the skill can run directly in this workspace.

## Workflow

1. Confirm the source PDF path and choose an output `.md` path.
2. Ensure Python dependencies are installed. Prefer `uv pip install -r /scripts/requirements.txt` or `python -m pip install -r /scripts/requirements.txt`.
3. This skill first loads `.env` from the skill root, then falls back to the current working directory or an explicit `--env-file`.
4. Ensure these environment variables are available before running:
   - Required: `MINERU_API_TOKEN`, `R2_BASE_URL`, `R2_BEARER_TOKEN`
   - Optional: `R2_PREFIX`, `R2_PUBLIC_BASE_URL`, `POLL_INTERVAL_SECONDS`, `TIMEOUT_SECONDS`
5. Run the converter:

   ```bash
   python scripts/convert_pdf_to_markdown.py /path/to/paper.pdf -o /path/to/paper.md
   ```

6. For scanned PDFs, add `--ocr`. Disable extraction features with `--disable-table` or `--disable-formula` if needed.

## Operational notes

- The script requires outbound network access to MinerU and the R2-compatible object store.
- Progress messages are written to stderr. The final Markdown path is written to stdout.
- Only local image references are uploaded and rewritten. Existing `http`, `https`, and `data:` image URLs are left unchanged.
- If the caller wants Markdown without any image hosting step, this skill is the wrong default; adjust the script first instead of running it as-is.
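
The required environment variables listed in the workflow can be checked before the converter is launched. The following is a minimal sketch of such a pre-flight check, not the logic used by `convert_pdf_to_markdown.py` itself; the helper name `missing_required_vars` is hypothetical, and treating an empty value as missing is an assumption.

```python
import os

# Required variable names as listed in the workflow section.
REQUIRED_VARS = ("MINERU_API_TOKEN", "R2_BASE_URL", "R2_BEARER_TOKEN")


def missing_required_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: `env` defaults to the process environment but
    accepts any mapping, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_required_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Running this before the converter surfaces configuration problems without first submitting a PDF to MinerU.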
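
The rewrite rule described in the notes above (local image references are replaced, while `http`, `https`, and `data:` URLs are left alone) can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the script's actual implementation: the function name `rewrite_local_images`, the `hosted_urls` mapping from local path to uploaded URL, and the simplified image regex are all hypothetical.

```python
import re

# Simplified pattern for Markdown images: ![alt](target)
# Assumption: targets contain no spaces or closing parentheses.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")

# Targets with these prefixes are already hosted or inline and are skipped.
REMOTE_PREFIXES = ("http://", "https://", "data:")


def rewrite_local_images(markdown, hosted_urls):
    """Replace local image targets with URLs from `hosted_urls`.

    `hosted_urls` is a hypothetical dict mapping each local path, as it
    appears in the Markdown, to the URL returned by the upload step.
    """
    def replace(match):
        alt, target = match.group(1), match.group(2)
        if target.lower().startswith(REMOTE_PREFIXES):
            return match.group(0)  # remote or inline image: unchanged
        hosted = hosted_urls.get(target)
        if hosted is None:
            return match.group(0)  # no upload recorded: unchanged
        return f"![{alt}]({hosted})"

    return IMAGE_PATTERN.sub(replace, markdown)


if __name__ == "__main__":
    text = "![fig](images/a.png) and ![logo](https://example.com/l.png)"
    mapping = {"images/a.png": "https://cdn.example.com/papers/a.png"}
    print(rewrite_local_images(text, mapping))
```

Leaving unmatched local paths untouched keeps the rewrite safe to run even when an upload has failed, at the cost of possibly leaving a broken reference for the caller to notice.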