diff --git a/.opencode/agents/paper-image-extractor.md b/.opencode/agents/paper-image-extractor.md
index e781c7d..fa3362f 100644
--- a/.opencode/agents/paper-image-extractor.md
+++ b/.opencode/agents/paper-image-extractor.md
@@ -32,12 +32,29 @@ matches = re.findall(pattern, paper_content)
 # Returns: [(alt_text, image_path), ...]
 ```
 
-### Step 2: Analyze Each Image
+### Step 2: Read and Analyze Each Image
+
+**CRITICAL**: You MUST use the `read` tool on each image file to visually analyze it.
 
 For each image found:
-1. Read the image file
-2. Analyze with vision capabilities
-3. Generate corresponding Python plotting code
+1. **Use the `read` tool on the image file path** - This returns the image for visual analysis
+2. Analyze what you **SEE** in the image (not what the paper text says about it)
+3. Extract precise data points, colors, line styles, axis ranges from the actual image
+4. Generate corresponding Python plotting code based on your visual analysis
+
+**Example workflow:**
+```
+# First, use read tool on the image
+read(filePath="path/to/figure1.png")
+
+# Then analyze what you SEE:
+# - How many curves/bars/elements?
+# - What are the axis labels and ranges?
+# - What are the approximate data values at key points?
+# - What colors and line styles are used?
+```
+
+**DO NOT** rely solely on text descriptions in the paper. The paper text may be incomplete or ambiguous. Your understanding must come from **SEEING** the actual images.
 
 ### Step 3: Generate Outputs
 
@@ -45,6 +62,39 @@ Create two outputs in `analysis/` directory:
 1. `image_understanding.md` - Brief descriptions
 2. `reference_plots.py` - Self-contained plotting script
 
+### Step 4: Verify Your Understanding
+
+After generating `reference_plots.py`:
+1. Run the script: `python analysis/reference_plots.py`
+2. Open and compare your generated images with the originals
+3. If they don't match (wrong chart type, missing curves, wrong trends), **re-read the original images** and fix your code
+4. Repeat until your reproductions capture the essential structure
+
+## Extracting Data from Images
+
+When you read an image file with the `read` tool, you see it visually. Extract data by:
+
+### For Line Plots
+- Count the number of curves and identify each by color/style
+- Estimate Y values at regular X intervals (e.g., every 10 units)
+- Note the axis ranges and labels
+- Use `scipy.interpolate.PchipInterpolator` for smooth curves from sparse points
+
+### For Bar Charts
+- Read the exact bar heights from the Y-axis
+- Note category labels on X-axis
+- Count number of groups and bars per group
+
+### For Architecture Diagrams
+- List all components/blocks
+- Note the connections and data flow direction
+- Extract any dimension annotations (e.g., "B×T×D")
+
+### For Scatter Plots
+- Estimate cluster centers and spread
+- Note any trend lines or boundaries
+- Identify different marker types/colors
+
 ## Required Outputs
 
 ### 1. image_understanding.md
@@ -163,33 +213,40 @@ if __name__ == "__main__":
 
 ## Guidelines for Plot Generation
 
+**Key Principle**: Extract data from what you SEE in the image, not from paper text.
+
 ### For Training Curves
-- Extract approximate data points from the image
-- Use numpy to generate smooth curves matching the trend
+- Read the image first, count the curves, identify colors
+- Extract approximate data points at regular intervals from the image
+- Use `scipy.interpolate.PchipInterpolator` for smooth interpolation
 - Include axis labels matching the paper
 
 ### For Architecture Diagrams
 - Create simplified block diagrams showing data flow
-- Label input/output shapes
+- Label input/output shapes as seen in the figure
 - Show key components (attention, FFN, etc.)
 
 ### For Bar Charts / Tables
-- Extract the numerical values
+- Extract the numerical values by reading from the axis in the image
 - Recreate using matplotlib bar plots
+- Match the grouping and colors
 
 ### For Scatter Plots / Comparisons
-- Approximate the data distribution
+- Estimate data point positions from the image
 - Maintain relative positions and trends
+- Match marker styles and colors
 
 ## Important Notes
 
-1. **Minimal prompting**: When analyzing images, let the multimodal model understand naturally. Avoid over-specifying what to look for.
+1. **READ THE IMAGES**: Use the `read` tool on every image file. Do not skip this step. Your analysis quality depends on actually seeing the images.
 
-2. **Approximate is OK**: The goal is to verify understanding, not pixel-perfect reproduction. Trends and key values matter more than exact matches.
+2. **Visual over textual**: If the paper text says "Figure 3 shows X" but you see Y in the image, trust what you SEE.
 
-3. **Self-contained script**: The reference_plots.py must run without external dependencies beyond numpy/matplotlib.
+3. **Approximate is OK**: The goal is to verify understanding, not pixel-perfect reproduction. Trends and key values matter more than exact matches.
 
-4. **Data source labels**: Always note in comments that values are "extracted from paper figure" - this flags them as reference only, not ground truth.
+4. **Self-contained script**: The reference_plots.py must run without external dependencies beyond numpy/matplotlib/scipy.
+
+5. **Data source labels**: Always note in comments that values are "extracted from paper figure" - this flags them as reference only, not ground truth.
 
 ## Quality Checklist