fix(agent): require explicit image file reading in paper-image-extractor

The subagent was only reading text descriptions about images instead of actually using the read tool on image files. This caused poor quality reproductions based on guessed data rather than visual analysis.

Changes:
- Add CRITICAL instruction to use read tool on each image file
- Add Step 4: Verification step to compare generated vs original
- Add 'Extracting Data from Images' section with specific guidance
- Update guidelines to emphasize visual over textual extraction
- Allow scipy dependency for interpolation
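
For context on the scipy dependency mentioned above: the updated guidance recommends `scipy.interpolate.PchipInterpolator` for turning sparse points read off a figure into a smooth curve. The snippet below is a minimal sketch of that idea, not code from this commit. The arrays `x_read` and `y_read` are hypothetical values standing in for points estimated from an image, and the names `curve`, `x_dense`, and `y_dense` are illustrative only.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical values, standing in for points estimated visually from a figure.
# In a real reference_plots.py these would be "extracted from paper figure"
# and treated as reference only, not ground truth.
x_read = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
y_read = np.array([2.5, 1.4, 0.9, 0.7, 0.6, 0.55])

# PCHIP passes through every given point and preserves monotonicity,
# so a decreasing set of points does not produce overshoot or wiggles.
curve = PchipInterpolator(x_read, y_read)

# Evaluate on a dense grid to get a smooth curve suitable for plotting.
x_dense = np.linspace(x_read[0], x_read[-1], 200)
y_dense = curve(x_dense)
```

PCHIP is a plausible fit here because estimated points from a training curve are usually monotone over long stretches, and a shape-preserving interpolant avoids inventing bumps that are not in the original figure.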
2026-03-31 20:29:04 +08:00
commit 3533e15995
parent 5d5aee1f83
1 changed file with 70 additions and 13 deletions
--- a/.opencode/agents/paper-image-extractor.md
+++ b/.opencode/agents/paper-image-extractor.md
@@ -32,12 +32,29 @@ matches = re.findall(pattern, paper_content)
 # Returns: [(alt_text, image_path), ...]
 ```

-### Step 2: Analyze Each Image
+### Step 2: Read and Analyze Each Image
+
+**CRITICAL**: You MUST use the `read` tool on each image file to visually analyze it.

 For each image found:
-1. Read the image file
-2. Analyze with vision capabilities
-3. Generate corresponding Python plotting code
+1. **Use the `read` tool on the image file path** - This returns the image for visual analysis
+2. Analyze what you **SEE** in the image (not what the paper text says about it)
+3. Extract precise data points, colors, line styles, axis ranges from the actual image
+4. Generate corresponding Python plotting code based on your visual analysis
+
+**Example workflow:**
+```
+# First, use read tool on the image
+read(filePath="path/to/figure1.png")  
+
+# Then analyze what you SEE:
+# - How many curves/bars/elements?
+# - What are the axis labels and ranges?
+# - What are the approximate data values at key points?
+# - What colors and line styles are used?
+```
+
+**DO NOT** rely solely on text descriptions in the paper. The paper text may be incomplete or ambiguous. Your understanding must come from **SEEING** the actual images.

 ### Step 3: Generate Outputs

@@ -45,6 +62,39 @@ Create two outputs in `analysis/` directory:
 1. `image_understanding.md` - Brief descriptions
 2. `reference_plots.py` - Self-contained plotting script

+### Step 4: Verify Your Understanding
+
+After generating `reference_plots.py`:
+1. Run the script: `python analysis/reference_plots.py`
+2. Open and compare your generated images with the originals
+3. If they don't match (wrong chart type, missing curves, wrong trends), **re-read the original images** and fix your code
+4. Repeat until your reproductions capture the essential structure
+
+## Extracting Data from Images
+
+When you read an image file with the `read` tool, you see it visually. Extract data by:
+
+### For Line Plots
+- Count the number of curves and identify each by color/style
+- Estimate Y values at regular X intervals (e.g., every 10 units)
+- Note the axis ranges and labels
+- Use `scipy.interpolate.PchipInterpolator` for smooth curves from sparse points
+
+### For Bar Charts
+- Read the exact bar heights from the Y-axis
+- Note category labels on X-axis
+- Count number of groups and bars per group
+
+### For Architecture Diagrams
+- List all components/blocks
+- Note the connections and data flow direction
+- Extract any dimension annotations (e.g., "B×T×D")
+
+### For Scatter Plots
+- Estimate cluster centers and spread
+- Note any trend lines or boundaries
+- Identify different marker types/colors
+
 ## Required Outputs

 ### 1. image_understanding.md
@@ -163,33 +213,40 @@ if __name__ == "__main__":

 ## Guidelines for Plot Generation

+**Key Principle**: Extract data from what you SEE in the image, not from paper text.
+
 ### For Training Curves
-- Extract approximate data points from the image
-- Use numpy to generate smooth curves matching the trend
+- Read the image first, count the curves, identify colors
+- Extract approximate data points at regular intervals from the image
+- Use `scipy.interpolate.PchipInterpolator` for smooth interpolation
 - Include axis labels matching the paper

 ### For Architecture Diagrams
 - Create simplified block diagrams showing data flow
-- Label input/output shapes
+- Label input/output shapes as seen in the figure
 - Show key components (attention, FFN, etc.)

 ### For Bar Charts / Tables
-- Extract the numerical values
+- Extract the numerical values by reading from the axis in the image
 - Recreate using matplotlib bar plots
+- Match the grouping and colors

 ### For Scatter Plots / Comparisons
-- Approximate the data distribution
+- Estimate data point positions from the image
 - Maintain relative positions and trends
+- Match marker styles and colors

 ## Important Notes

-1. **Minimal prompting**: When analyzing images, let the multimodal model understand naturally. Avoid over-specifying what to look for.
+1. **READ THE IMAGES**: Use the `read` tool on every image file. Do not skip this step. Your analysis quality depends on actually seeing the images.

-2. **Approximate is OK**: The goal is to verify understanding, not pixel-perfect reproduction. Trends and key values matter more than exact matches.
+2. **Visual over textual**: If the paper text says "Figure 3 shows X" but you see Y in the image, trust what you SEE.

-3. **Self-contained script**: The reference_plots.py must run without external dependencies beyond numpy/matplotlib.
+3. **Approximate is OK**: The goal is to verify understanding, not pixel-perfect reproduction. Trends and key values matter more than exact matches.

-4. **Data source labels**: Always note in comments that values are "extracted from paper figure" - this flags them as reference only, not ground truth.
+4. **Self-contained script**: The reference_plots.py must run without external dependencies beyond numpy/matplotlib/scipy.
+
+5. **Data source labels**: Always note in comments that values are "extracted from paper figure" - this flags them as reference only, not ground truth.

 ## Quality Checklist