Alt Text Generation
An AI-powered pattern that extracts author intention from page context to create equivalent experiences through accessibility metadata
Capabilities
- Extracts author intention from page structure and context
- Classifies images as decorative, simple informative, or complex informative
- Generates contextually appropriate alt text optimized for brevity
- Creates structured alternatives for complex data visualizations
- Performs epistemological translation between visual and non-visual modalities
- Includes both comprehensive prompts and prompt chains for different model sizes
This is my methodology for creating alt text, encoded as a ladder—expert knowledge turned into a tool anyone can use. After years as an accessibility expert, I’ve mapped the line of questioning I use when writing alt text into prompts that extract author intention from page structure—the implicit human variables that were previously locked behind expert judgment.
How this is different: Traditional alt text generators analyze only the image pixels. This pattern analyzes the entire page context first, then the image. It uses the same questioning process I use: What’s the page purpose? Why is this image here? What would someone miss without it? Which of the metaphorical “1000 words” an image contains should actually be in the alt text?
The same photo needs completely different descriptions on a product page (focus on features), news article (focus on context), or portfolio (focus on technique). By extracting context from DOM structure, headings, and surrounding text, this pattern identifies which description serves the author’s intent.
How Expert Questioning Becomes Automated Analysis
When I analyze an image for alt text, I ask four strategic questions:
- “What’s the purpose of this page?” → Reveals communication context
- “Why is this image positioned here?” → Indicates functional role
- “What would someone miss without this image?” → Identifies essential information
- “How much detail serves the author’s intent?” → Determines description depth
These questions extract implicit variables from explicit page structure:
// Programmatically available signals
const pageTitle = document.title;
const headings = document.querySelectorAll('h1, h2, h3');
const surroundingText = getSiblingContent(image);
// LLM interprets these signals through expert questioning
// DOM heading + adjacent chart → Business performance intent
// Hero image + minimal text → Primary visual communication
// Small inline image + detailed text → Check for redundancy
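The getSiblingContent() call above is shorthand for "grab the text around the image." Here's a minimal sketch of what that helper might look like using standard DOM APIs; the name and the exact selection logic are illustrative, not prescribed by the pattern:
// Illustrative only: pulls the caption plus the text immediately
// before and after the image (or its figure wrapper).
function getSiblingContent(image) {
  const trim = (text) => (text || '').trim().replace(/\s+/g, ' ');
  const anchor = image.closest('figure') || image;
  const caption = anchor.querySelector('figcaption')?.textContent || '';
  const before = anchor.previousElementSibling?.textContent || '';
  const after = anchor.nextElementSibling?.textContent || '';
  return trim(`${caption} ${before} ${after}`);
}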
The breakthrough: Author intent and page purpose aren’t hidden—they’re embedded in programmatically accessible content. DOM structure, headings, surrounding text, and metadata all contain the contextual signals needed for meaningful alt text. By extracting these signals and feeding them to an LLM with the right questioning framework, we can generate alt text that serves the author’s actual communication intent.
The Five-Step Methodology
The following is a process I’ve designed for LLMs to follow, based on my accessibility expertise. Each step extracts specific programmatic signals and transforms them into meaningful descriptions that assist in writing contextual alt text:
Checklist Overview
- Extract page context → Decode author intent from structure
- Analyze surrounding content → Determine functional placement
- Classify image type → Apply systematic decision criteria
- Generate alt text → Create description serving author’s intent
- Validate output → Confirm accuracy and screen reader UX
Step 1: Extract Context to Decode Intention
Analyze page structure, headings, and metadata to understand why this image was chosen. The same photo needs different descriptions in different contexts—product page vs. news article vs. portfolio.
Step 2: Analyze Surrounding Content for Functional Role
Examine immediate text context, visual prominence, and placement to determine what information gap the image fills.
Step 3: Classify Function Using Decision Criteria
Apply systematic classification (a quick DOM-based pre-filter sketch follows this list):
- Decorative: No unique information → Empty alt text
- Functional: Buttons, links, controls → Alt text describes the action (“Submit form”, “Download PDF”)
- Text Image: Image contains text → Alt text includes the text content
- Simple Informative: Essential info that can be conveyed concisely
- Complex Informative: Data/relationships → Alt text summary + structured alternative (table, list)
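Some of these cases can be caught from markup alone before an LLM is involved. Here's a minimal pre-filter sketch; the helper name and the decision to defer everything else to the LLM are assumptions of this example, not requirements of the pattern:
// Heuristic pre-filter: resolve the easy cases from DOM signals,
// and hand everything else to the context-aware classification step.
function preClassifyImage(img) {
  const role = img.getAttribute('role');
  // Explicitly presentational or hidden images are decorative
  if (role === 'presentation' || role === 'none') return 'DECORATIVE';
  if (img.closest('[aria-hidden="true"]')) return 'DECORATIVE';
  // Images inside links or buttons act as controls: describe the action
  if (img.closest('a, button')) return 'FUNCTIONAL';
  // Everything else needs context-aware judgment
  return 'NEEDS_LLM_CLASSIFICATION';
}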
Step 4: Generate Optimized Alt Text
Create functional descriptions that serve as true text alternatives:
- Keep it concise: Aim for brevity—screen reader users can’t skim or navigate within alt text like regular text. Most descriptions work well under ~150-250 characters.
- Lead with purpose: Convey function and meaning, not visual appearance
- Serve author’s intent: What would someone miss without this image?
- Avoid redundancy: Never include “image of”, “picture of”, or “graphic of”—screen readers already announce the image role
Step 5: Validate for Screen Reader UX
Run through these checks before finalizing:
- Classification match: Does the alt text format match the image type? (Empty for decorative, action for functional, etc.)
- No redundancy: Does it repeat information already in adjacent text or captions?
- No hallucination: Does it only describe what’s actually visible or supported by page context?
- No filler phrases: No “image of”, “picture showing”, “graphic depicting”
- Serves the page purpose: Would a screen reader user get the same takeaway as a sighted user?
- Appropriate length: Concise enough to not overwhelm, detailed enough to not omit essentials
For complex visuals, confirm the structured alternative (table/list) is provided alongside the summary alt text.
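A few of these checks are mechanical and can be automated; the rest (redundancy with surrounding text, serving the page purpose) still need human or LLM judgment. Here's a minimal sketch of the mechanical subset, with illustrative names:
// Mechanical subset of the Step 5 checks.
function validateAltText(classification, altText) {
  const issues = [];
  if (classification === 'DECORATIVE' && altText !== '') {
    issues.push('Decorative images should have empty alt text');
  }
  if (/\b(image of|picture of|picture showing|graphic of|graphic depicting)\b/i.test(altText)) {
    issues.push('Remove filler phrases; screen readers already announce the image role');
  }
  if (classification === 'SIMPLE_INFORMATIVE' && altText.length > 250) {
    issues.push('Alt text exceeds 250 characters; shorten it or reclassify as complex');
  }
  return { passed: issues.length === 0, issues };
}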
Good Alt Text vs Bad Alt Text
Poor: “Image of a graph with blue and red bars showing different heights representing data points across time periods with labels and a legend”
Good: “Quarterly sales up 40%, mobile revenue leading growth”
The difference: Lead with meaning, not appearance. Every word should earn its place.
The Prompts
I’ve encoded my methodology into two formats:
Option 1: Comprehensive Prompt (for Claude, ChatGPT, Gemini)
Use this when you have a powerful model that can handle complex multi-step reasoning.
You’ll notice that much of the prompt is wrapped in XML tags. Language models speak the conventions of programming languages just as well as natural language. The XML tags give us a clear grammar for establishing consistent symbols for semi-structured data.
Role: Accessibility expert specializing in converting visual images into accessible textual formats compliant with WCAG standards.
Checklist:
1. Extract page purpose, author intent, intended audience, and domain
2. Analyze surrounding content and image display context
3. Classify image
4. Generate output in strict XML-like structure
5. Include error handling if information is insufficient
<inputs>
<page_context>
<!-- Extracted page metadata, title, headings, purpose -->
{{PAGE_CONTEXT}}
</page_context>
<surrounding_content>
<!-- Text immediately before/after the image -->
{{SURROUNDING_CONTENT}}
</surrounding_content>
<raw_image>
<!-- Attached: image file showing the image in isolation -->
</raw_image>
<contextual_image>
<!-- Attached: screenshot showing image within page layout -->
</contextual_image>
</inputs>
Instructions:
- Begin with the checklist above for each image.
- Analyze the provided image and its full page context to generate accurate alt text.
- Follow the specified multi-step analysis procedure to ensure contextually appropriate image classification and description:
1. Extract main purpose, communication goal, intended audience, and domain from <page_context>.
2. Evaluate <surrounding_content> and <contextual_image> for the image's role, visual prominence, and textual associations.
3. Classify the image as DECORATIVE, SIMPLE_INFORMATIVE, or COMPLEX_INFORMATIVE using the provided explicit criteria:
- **DECORATIVE**: Purely aesthetic or redundant with text, no information lost if removed
- **SIMPLE_INFORMATIVE**: Conveys specific, essential information in ≤250 characters
- **COMPLEX_INFORMATIVE**: Contains data, relationships, or processes requiring structured alternative
4. Generate alt text and rationale according to classification:
- For **DECORATIVE**: alt_text = ""
- For **SIMPLE_INFORMATIVE**: Alt description ≤250 characters
- For **COMPLEX_INFORMATIVE**: Concise summary plus "Full data table follows." and structured alternative (markdown table, list, or detailed breakdown)
- For **insufficient context**: Output error in all required fields, classification = "UNDETERMINED"
Output Policy:
- Always use the required structured XML-like output format below.
- Never generate <structured_alternative> for images classified as DECORATIVE or SIMPLE_INFORMATIVE.
- For ambiguous or incomplete information, supply error messages in designated fields and set classification to 'UNDETERMINED.'
After generating the output, validate that each required output field is present, corresponds with the image classification, and that no <structured_alternative> is included except for COMPLEX_INFORMATIVE classifications. If validation fails, self-correct and return a revised output.
Output Format:
<output>
<classification>DECORATIVE | SIMPLE_INFORMATIVE | COMPLEX_INFORMATIVE | UNDETERMINED</classification>
<author_intent>Why this image appears in this location / error message if unknown</author_intent>
<alt_text>Concise and contextually appropriate description / error message</alt_text>
<rationale>Justification for your classification and alt text / error message</rationale>
[<structured_alternative>Markdown table, list, or detailed breakdown when image is COMPLEX_INFORMATIVE only</structured_alternative>]
</output>
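To use it, substitute the two text placeholders with your extracted context and attach both images through your model interface. Here's a minimal sketch of the substitution step, assuming the prompt above is stored as a template string; the function and variable names are illustrative, not part of the pattern:
// Fill the {{...}} placeholders before sending the prompt.
// pageContext comes from the extraction script in Supporting Tools;
// surroundingContent is the text immediately before/after the image.
function buildComprehensivePrompt(template, pageContext, surroundingContent) {
  return template
    .replace('{{PAGE_CONTEXT}}', pageContext)
    .replace('{{SURROUNDING_CONTENT}}', surroundingContent);
}
// The raw image and in-context screenshot are attached separately,
// using whatever attachment mechanism your model provides.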
Option 2: Prompt Chain for Small/Local Models
Small and local models often cannot hold complex state across multiple steps. Instead of one long prompt that tries to guide them through everything, break the work into five focused prompts that chain together. Each prompt does ONE thing well, then passes its output to the next, and you can review and correct at each natural decision point.
Parallelization Note: Prompts 1 and 2 can run simultaneously since they analyze different inputs (page context vs. image). Their outputs then feed into Prompt 3, so running them in parallel shortens the overall chain, as the orchestration sketch below shows.
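Here's a minimal orchestration sketch, assuming a hypothetical runPrompt() helper that sends one of the prompts below to whatever model you're using and returns its text output:
// Prompts 1 and 2 analyze independent inputs, so they can run in parallel;
// Prompts 3-5 each depend on earlier outputs, so they run in sequence.
async function runAltTextChain(pageData, imageInputs) {
  const [contextAnalysis, visualAnalysis] = await Promise.all([
    runPrompt('prompt-1-context', pageData),
    runPrompt('prompt-2-visual', imageInputs),
  ]);
  const classification = await runPrompt('prompt-3-classify', { contextAnalysis, visualAnalysis });
  const altText = await runPrompt('prompt-4-generate', { contextAnalysis, visualAnalysis, classification });
  return runPrompt('prompt-5-validate', { contextAnalysis, classification, altText });
}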
Prompt 1: Extract Page Context
ROLE: Context analyst specializing in understanding page purpose and author intent as they relate to an image.
INPUT DEFINITIONS:
- PAGE_TITLE: The browser tab title (from <title> tag)
- KEY_HEADINGS: The h1, h2, h3 headers that structure the page
- PAGE_URL: The web address showing domain and path
- TEXT_NEAR_IMAGE: Paragraphs immediately before/after the image location
INPUT:
<page_data>
<title>{{PAGE_TITLE}}</title>
<headings>{{KEY_HEADINGS}}</headings>
<url>{{PAGE_URL}}</url>
<surrounding_text>{{TEXT_NEAR_IMAGE}}</surrounding_text>
</page_data>
TASK: Extract the page's essential context to understand its purpose and how it may relate to the image we're critically analyzing.
OUTPUT exactly this structure:
<context_analysis>
<purpose>{{WHY_THIS_PAGE_EXISTS}}</purpose>
<audience>{{WHO_THIS_IS_FOR}}</audience>
<image_placement_reason>{{WHY_AN_IMAGE_IS_HERE}}</image_placement_reason>
</context_analysis>
Prompt 2: Analyze Visual Content
ROLE: Visual analyst specializing in systematic image description.
DEFINITIONS:
- MAIN_SUBJECTS: The primary objects, people, or elements visible
- TEXT_IN_IMAGE: Actual words/labels that appear within the image itself
- DATA_PRESENT: Whether the image shows charts, graphs, or data visualizations
- VISUAL_COMPLEXITY: Simple (few elements) or Complex (many elements/relationships)
[Attach image]
TASK: Describe what you see factually, without interpretation.
OUTPUT exactly this structure:
<visual_analysis>
<main_subjects>{{WHAT_IS_IN_THE_IMAGE}}</main_subjects>
<text_in_image>{{ANY_TEXT_VISIBLE}}</text_in_image>
<data_present>{{yes|no}}</data_present>
<visual_complexity>{{simple|complex}}</visual_complexity>
</visual_analysis>
Prompt 3: Classify Image Function
ROLE: Accessibility expert determining image classification.
DEFINITIONS:
- DECORATIVE: Image adds no information beyond what text already provides
- SIMPLE_INFORMATIVE: Image conveys essential info that fits in 250 characters
- COMPLEX_INFORMATIVE: Image contains data/relationships requiring detailed description
INPUTS from previous steps:
<context>{{PROMPT_1_OUTPUT}}</context>
<visual>{{PROMPT_2_OUTPUT}}</visual>
DECISION TREE:
1. Would removing this image lose information?
NO + text explains it = DECORATIVE
YES → Continue
2. Can essential info fit in 250 characters?
YES = SIMPLE_INFORMATIVE
NO = COMPLEX_INFORMATIVE
OUTPUT exactly:
<classification>
<type>{{DECORATIVE|SIMPLE_INFORMATIVE|COMPLEX_INFORMATIVE}}</type>
<reasoning>{{WHY_THIS_CLASSIFICATION}}</reasoning>
</classification>
Prompt 4: Generate Alt Text
ROLE: Alt text writer creating screen reader-optimized descriptions.
INPUTS:
<context>{{PROMPT_1_OUTPUT}}</context>
<visual>{{PROMPT_2_OUTPUT}}</visual>
<classification>{{PROMPT_3_OUTPUT}}</classification>
RULES:
- DECORATIVE → alt=""
- SIMPLE_INFORMATIVE → Description ≤250 characters, lead with meaning not appearance
- COMPLEX_INFORMATIVE → Brief summary + "Full data table follows"
OUTPUT:
<alt_text>
<text>{{YOUR_ALT_TEXT}}</text>
<character_count>{{NUMBER}}</character_count>
</alt_text>
[If COMPLEX_INFORMATIVE, also output:]
<structured_alternative>
{{TABLE_OR_LIST}}
</structured_alternative>
Prompt 5: Validate and Finalize
ROLE: Quality validator ensuring accessibility standards.
INPUTS:
<context>{{PROMPT_1_OUTPUT}}</context>
<classification>{{PROMPT_3_OUTPUT}}</classification>
<alt_text>{{PROMPT_4_OUTPUT}}</alt_text>
VALIDATE:
1. Does classification match the alt text format?
2. Is character count appropriate?
3. Does it serve the page's purpose?
4. Would a screen reader user understand the same thing?
OUTPUT:
<final_output>
<classification>{{TYPE}}</classification>
<alt_text>{{FINAL_TEXT}}</alt_text>
<validation_status>{{passed|needs_revision}}</validation_status>
[<revision_notes>{{WHAT_TO_FIX}}</revision_notes>]
</final_output>
Which Prompt Should You Use?
Option 1 (Comprehensive): Best for frontier models, such as the higher-end Claude, ChatGPT, and Gemini models, when accuracy matters most. One prompt handles everything.
Option 2 (Prompt Chain): Best for smaller and local models, or when you want human review at each decision point. Five focused prompts that each do one thing well.
Supporting Tools
JavaScript for Context Extraction
This JavaScript extracts the contextual information needed for the prompt:
function extractPageContext() {
// Helper function to trim text and normalize whitespace
const trimText = (text) => {
if (!text) return '';
// Replaces multiple whitespace characters (including newlines) with a single space
return text.trim().replace(/\s+/g, ' ');
};
// Get page title
const pageTitle = document.title;
// --- HEADING EXTRACTION WITH DE-DUPLICATION ---
const allHeadings = Array.from(document.querySelectorAll('h1, h2, h3, h4, h5, h6'));
const uniqueHeadings = [];
const seenHeadings = new Set();
allHeadings.forEach(heading => {
// 1. Check if the element is visible on the page
// (offsetParent is null for hidden elements)
if (heading.offsetParent === null) {
return;
}
const text = trimText(heading.textContent);
// 2. Skip if the heading is blank
if (text.length === 0) {
return;
}
const level = parseInt(heading.tagName.charAt(1));
const key = `${level}-${text}`; // Create a unique key from level and text content
// 3. Add the heading only if it hasn't been seen before
if (!seenHeadings.has(key)) {
uniqueHeadings.push({ level, text });
seenHeadings.add(key);
}
});
// --- END HEADING EXTRACTION ---
// Use the clean, unique list of headings
const headings = uniqueHeadings;
// Get meta description
const metaDescription = trimText(document.querySelector('meta[name="description"]')?.content);
// Function to get content from semantic elements or ARIA role equivalents
const getSemanticContent = (selector, role) => {
let element = document.querySelector(selector) || document.querySelector(`[role="${role}"]`);
if (!element || element.offsetParent === null) {
element = document.querySelector(`[role="${role}"]`);
if (!element || element.offsetParent === null) return '';
}
const links = Array.from(element.querySelectorAll('a'))
.map(a => trimText(a.textContent))
.filter(text => text.length > 0 && text.length < 50)
.filter((text, i, arr) => arr.indexOf(text) === i)
.slice(0, 5);
if (links.length > 0) {
return links.join(' • ');
}
const fullText = trimText(element.textContent);
return fullText.substring(0, 200) + (fullText.length > 200 ? '...' : '');
};
// Get current URL
const currentUrl = window.location.href;
// Get keywords
const keywords = document.querySelector('meta[name="keywords"]')?.content || '';
// Get Open Graph data
const ogTitle = document.querySelector('meta[property="og:title"]')?.content || '';
const ogDescription = trimText(document.querySelector('meta[property="og:description"]')?.content);
const ogType = document.querySelector('meta[property="og:type"]')?.content || '';
// Try to detect page type
const detectPageType = () => {
if (document.querySelector('article, [role="article"]')) return 'Article';
if (ogType.includes('article')) return 'Article';
if (ogType.includes('video')) return 'Video';
return 'General';
};
// Get landmark regions
const landmarks = {
header: getSemanticContent('header', 'banner'),
nav: getSemanticContent('nav', 'navigation'),
main: getSemanticContent('main', 'main'),
aside: getSemanticContent('aside', 'complementary'),
footer: getSemanticContent('footer', 'contentinfo')
};
// --- MARKDOWN OUTPUT GENERATION ---
let markdownOutput = `# ${pageTitle}\n\n`;
markdownOutput += `**URL:** ${currentUrl}\n`;
markdownOutput += `**Page Type:** ${detectPageType()}\n\n`;
if (metaDescription) {
markdownOutput += `**Description:** ${metaDescription}\n\n`;
}
if (keywords) {
markdownOutput += `**Keywords:** ${keywords}\n\n`;
}
if (ogTitle || ogDescription || ogType) {
markdownOutput += `## Open Graph Data\n\n`;
if (ogTitle && ogTitle !== pageTitle) markdownOutput += `**OG Title:** ${ogTitle}\n`;
if (ogDescription && ogDescription !== metaDescription) markdownOutput += `**OG Description:** ${ogDescription}\n`;
if (ogType) markdownOutput += `**OG Type:** ${ogType}\n`;
markdownOutput += `\n`;
}
if (headings.length > 0) {
markdownOutput += `## Page Structure\n\n`;
headings.forEach((heading) => {
const indent = ' '.repeat(heading.level - 1);
markdownOutput += `${indent}- ${heading.text}\n`;
});
}
const hasLandmarks = Object.values(landmarks).some(content => content && content.length > 0);
if (hasLandmarks) {
markdownOutput += `\n## Page Landmarks\n\n`;
for (const [name, content] of Object.entries(landmarks)) {
if (content) {
markdownOutput += `**${name.charAt(0).toUpperCase() + name.slice(1)}:** ${content}\n\n`;
}
}
}
console.log(markdownOutput);
return markdownOutput;
}
extractPageContext();
How to Use This Pattern
- Extract page context using the JavaScript in Supporting Tools (a console one-liner follows this list)
- Capture screenshots of both the raw image and image-in-context
- Choose your prompt based on your model’s capabilities (see guide above)
- Run the prompt with the extracted context and images
- Receive alt text optimized for the author’s intent and screen reader UX
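If you're working in a desktop browser, one lightweight way to run the first step is to paste the Supporting Tools script into the DevTools console on the page you're describing, then run:
// copy() is a DevTools console utility; this puts the Markdown summary
// on your clipboard, ready to paste into the {{PAGE_CONTEXT}} placeholder.
copy(extractPageContext());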
Why This Is a Ladder
This pattern makes expert-level alt text accessible to anyone with access to an LLM.
The methodology doesn’t require learning WCAG guidelines or understanding screen reader behavior. Each step handles one decision: extract context, analyze visuals, classify function, generate description, validate output. The expertise is encoded in the process, not assumed in the user.
Ready to build this capability?
Let's discuss how to implement this ladder in your organization.