Micro-SaaS Architecture: AI Carousel with Claude API, Remotion & Sharp

1. The Problem

Creating a quality social media carousel today takes 3-4 hours: content research, copywriting, slide design in Canva or Figma, caption optimization. For a professional publishing 3-5 times a week, this represents a significant bottleneck.

Existing solutions fall into two categories: template builders (Canva, Carousel.so) that still require manual content work, and AI generators that produce generic, visually flat output. None combine editorial content generation, professional design, and rendering into a single automated pipeline.

This project positions itself as an end-to-end solution: from a topic to a publishable carousel in ~30 seconds, with editorial quality comparable to manually created content.

2. System Architecture

2.1 Pipeline Overview

Topic (user input)
    ↓
Claude API → Structured JSON (slides + caption)
    ↓
Gemini Flash → Cover image (photorealistic)
    ↓
SVG Builder → Sharp → PNG per slide
    ↓
Supabase Storage → Public URLs
    ↓
[Optional] Remotion → MP4 video with animations

End-to-end generation pipeline

The entire pipeline is orchestrated by a single API route (POST /api/generate) with a 55-second time budget, leaving 5 seconds of margin on Vercel's 60s limit.
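
A minimal sketch of how such a time budget could be tracked, assuming a hypothetical createTimeBudget helper that is not part of the project described here. Each step asks for its own cap and receives the smaller of that cap and whatever is left of the 55 seconds. The clock is injectable only so the behavior can be checked without waiting.

```typescript
// Hypothetical helper: tracks how much of the route's time budget is left.
function createTimeBudget(totalMs: number, now: () => number = Date.now) {
  const startedAt = now()

  // Milliseconds left before the budget is exhausted (never negative).
  const remainingMs = (): number =>
    Math.max(0, totalMs - (now() - startedAt))

  // Timeout for the next step: its own cap, or the remaining budget if smaller.
  const stepTimeoutMs = (capMs: number): number =>
    Math.min(capMs, remainingMs())

  return { remainingMs, stepTimeoutMs }
}

// Example with a fake clock: 40 s into a 55 s budget,
// a step capped at 20 s only gets the 15 s that remain.
let demoClock = 0
const demoBudget = createTimeBudget(55000, () => demoClock)
demoClock = 40000
const demoTimeout = demoBudget.stepTimeoutMs(20000) // 15000
```

With this shape, a late step such as the cover image call shrinks automatically when an earlier step ran long, instead of relying on fixed timeouts that could add up to more than the route allows.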

2.2 Technology Stack

Layer	Technology	Rationale
Framework	Next.js 16, React 19	App Router, Server Components, ISR
AI Content	Claude Sonnet (via OpenRouter)	Best quality for structured output in Italian
AI Images	Gemini 2.5 Flash (via OpenRouter)	Fast, cost-effective, good photographic quality
Static Rendering	Sharp + SVG	Serverless-friendly, no headless browser required
Video Rendering	Remotion 4.x	Programmatic video composition in React
Database	Supabase (PostgreSQL + Auth + Storage)	Integrated auth, RLS, storage with CDN
Payments	Stripe	Checkout, webhooks, subscription management

3. Content Generation with Claude

3.1 System Prompt Engineering

The core of output quality lies in the system prompt. After dozens of iterations, the prompt defines:

Rigid structure: 5-7 slides, with mandatory isCover and isLast
Editorial style: conversational tone in Italian, informal "tu" form, 150-250 characters per slide
Formatting: emphasis with **bold** on key concepts
Anti-slop: explicit list of patterns to avoid ("In an increasingly digital world...", emoji chains, generic CTAs)
Optimized caption: 800-1500 characters with hook → context → thread → takeaway → CTA → hashtag structure

The prompt also includes template hints that modify behavior based on the selected template. For infographic templates, Claude generates data structures (chart, stats, progressBars) alongside text:

// Example Claude output for infographic-data template
{
  headline: ["Growth of the", "**AI Market**"],
  chart: {
    type: "bar",
    items: [
      { label: "2023", value: 420 },
      { label: "2024", value: 580 },
      { label: "2025", value: 780 },
      { label: "2026", value: 1050 }
    ]
  },
  takeaway: "**Artificial intelligence** investments exceed one billion in 2026",
  source: "Source: AI Observatory, Polytechnic University of Milan"
}
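
The structural rules from the prompt can also be checked in code after the model responds. The sketch below is an assumption about how that might look: the Slide shape is hypothetical (only isCover, isLast, and the 5-7 slide rule come from the text), and it assumes isCover belongs on the first slide and isLast on the final one.

```typescript
// Hypothetical slide shape; `body` is an assumed field name.
interface Slide {
  headline: string[]
  body?: string
  isCover?: boolean
  isLast?: boolean
}

// Returns a list of structural problems; an empty list means the deck passes.
function checkStructure(slides: Slide[]): string[] {
  const problems: string[] = []
  if (slides.length < 5 || slides.length > 7) {
    problems.push(`expected 5-7 slides, got ${slides.length}`)
  }
  const first = slides[0]
  const last = slides[slides.length - 1]
  if (!first || first.isCover !== true) {
    problems.push('first slide must set isCover')
  }
  if (!last || last.isLast !== true) {
    problems.push('last slide must set isLast')
  }
  return problems
}
```

Returning a list of problems rather than throwing makes it easy to feed the findings back into a retry prompt or to log them.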

3.2 URL Handling

When the user includes a URL in the topic, the system:

Extracts URLs with regex
Fetches the HTML content (8s timeout)
Extracts the main text with Cheerio (4000-character limit)
Injects the content as additional context in the Claude prompt

This allows creating carousels from articles, papers, product pages — the system synthesizes source content into editorial slides.
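
The extraction and truncation steps can be sketched as follows. This is an illustration under assumptions, not the project's code: the regex is a simplification, htmlToText is a crude stand-in for the Cheerio step so the example has no dependencies, and buildUserPrompt is a hypothetical name. The fetch with its 8s timeout is left out.

```typescript
// Matches http(s) URLs up to the next whitespace, quote or closing bracket.
// Simplification: trailing punctuation such as a final "." is not stripped.
const URL_PATTERN = /https?:\/\/[^\s<>"')\]]+/g

function extractUrls(topic: string): string[] {
  return topic.match(URL_PATTERN) ?? []
}

// Stand-in for Cheerio: drop script/style blocks and tags, collapse whitespace,
// then cut to the character limit mentioned in the text.
function htmlToText(html: string, maxChars = 4000): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
    .slice(0, maxChars)
}

// Hypothetical prompt assembly: the source text is appended as extra context.
function buildUserPrompt(topic: string, sourceText: string | null): string {
  return sourceText ? `${topic}\n\nSource content:\n${sourceText}` : topic
}
```

If the fetch fails or times out, passing null as sourceText lets generation continue from the topic alone.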

3.3 Resilience

Timeout: 25s per Claude call, with 1 automatic retry on transient errors
Soft validation: invalid infographic fields (stats without values, empty charts) are silently removed instead of failing the generation
Robust JSON parsing: trailing comma removal, control character unescaping, malformed output handling
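
A simplified sketch of the last two points, with all names hypothetical. parseLooseJson removes raw control characters rather than unescaping them, and its trailing-comma regex is naive (it would also touch a ", ]" sequence inside a string value), so treat it as an illustration of the idea rather than the project's parser.

```typescript
// Best-effort parse: isolate the outermost {...}, drop raw control characters
// (tab, newline and carriage return are kept), then remove trailing commas.
function parseLooseJson(raw: string): unknown {
  const start = raw.indexOf('{')
  const end = raw.lastIndexOf('}')
  if (start === -1 || end <= start) throw new Error('no JSON object found')
  const cleaned = raw
    .slice(start, end + 1)
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, '')
    .replace(/,\s*([}\]])/g, '$1')
  return JSON.parse(cleaned)
}

// Assumed shapes for the infographic fields named in the text.
interface ChartItem { label: string; value: number }
interface InfographicSlide {
  chart?: { type: string; items?: ChartItem[] }
  stats?: { label: string; value?: string }[]
}

// Soft validation: drop unusable fields instead of rejecting the slide.
function pruneInfographic(slide: InfographicSlide): InfographicSlide {
  const out: InfographicSlide = { ...slide }
  if (out.chart) {
    const items = (out.chart.items ?? []).filter(
      i => typeof i.value === 'number' && isFinite(i.value),
    )
    if (items.length === 0) delete out.chart
    else out.chart = { ...out.chart, items }
  }
  if (out.stats) {
    const stats = out.stats.filter(s => s.value !== undefined && s.value !== '')
    if (stats.length === 0) delete out.stats
    else out.stats = stats
  }
  return out
}
```

The slide still renders with its text when a chart or stat block is dropped, which matches the goal of degrading the output instead of failing the whole generation.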

4. Slide Rendering

4.1 SVG + Sharp Approach

The most important architectural decision was not using a headless browser for static slide rendering. Playwright/Puppeteer require significant resources and are not ideal in serverless environments with memory and timeout constraints.

Instead, rendering follows an SVG-first approach:

Template Builder generates a complete SVG string (text, shapes, layout)
Sharp converts the SVG to PNG at 1080×1350px
For covers, the AI image is composited beneath the SVG overlay

This approach is ~10x faster than a Playwright render and requires no heavy binary dependencies.
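
To make the SVG-first idea concrete, here is a minimal sketch of a builder that emits a complete SVG string at the target size. The layout values, colors, and function names are invented for illustration; the real template builders are certainly richer.

```typescript
const WIDTH = 1080
const HEIGHT = 1350

// Escape text so generated content cannot break the SVG markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Turn **bold** markers into bold <tspan> runs (escaping happens first).
function renderEmphasis(line: string): string {
  return escapeXml(line).replace(
    /\*\*(.+?)\*\*/g,
    '<tspan font-weight="700">$1</tspan>',
  )
}

// Hypothetical minimal slide: background, accent bar, one <text> per line.
function buildSlideSvg(lines: string[], accent = '#0d9488'): string {
  const rows = lines
    .map((line, i) =>
      `<text x="80" y="${240 + i * 96}" font-size="72" fill="#111">` +
      `${renderEmphasis(line)}</text>`)
    .join('')
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${WIDTH}" ` +
    `height="${HEIGHT}" viewBox="0 0 ${WIDTH} ${HEIGHT}">` +
    `<rect width="100%" height="100%" fill="#fafaf7"/>` +
    `<rect x="80" y="120" width="120" height="8" fill="${accent}"/>` +
    `${rows}</svg>`
  )
}
```

With Sharp installed, the resulting string can be rasterized with sharp(Buffer.from(svg)).png().toBuffer(); for covers, the AI image would be the base and the rendered overlay would be passed to composite().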

4.2 Template System

The system supports 9 templates, each with dedicated SVG builders:

Editorial (default): teal accent, serif font, light background, footer with dot pattern
Tech: automatic category→accent color mapping (AI→green, mobile→blue)
Bold: dark mode, high-contrast typography
Minimal: white, elegant, generous spacing
Medical: clinical tone, specialized terminology
Infographic (3 variants): data, compare, visual — with chart, stats, comparison layout

Each template exposes a uniform interface:

interface TemplateBuilders {
  buildCoverOverlay(ctx: SlideContext): string    // SVG overlay
  buildCoverFallback(ctx: SlideContext): string   // SVG without AI
  buildContentSlide(ctx: SlideContext): string    // SVG content
  buildLastSlide?(ctx: SlideContext): string      // SVG CTA
}
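
A sketch of how a renderer might dispatch to this interface. SlideContext is not defined in the text, so the version below is a hypothetical minimum, and falling back to buildContentSlide when a template has no buildLastSlide is an assumption about how the optional method is handled.

```typescript
// Hypothetical minimal context; the real one likely carries much more.
interface SlideContext {
  index: number
  total: number
  headline: string
  hasAiCover: boolean
}

interface TemplateBuilders {
  buildCoverOverlay(ctx: SlideContext): string
  buildCoverFallback(ctx: SlideContext): string
  buildContentSlide(ctx: SlideContext): string
  buildLastSlide?(ctx: SlideContext): string
}

// Pick the builder for a slide based on its position and cover availability.
function renderSlideSvg(t: TemplateBuilders, ctx: SlideContext): string {
  if (ctx.index === 0) {
    return ctx.hasAiCover ? t.buildCoverOverlay(ctx) : t.buildCoverFallback(ctx)
  }
  if (ctx.index === ctx.total - 1 && t.buildLastSlide) {
    return t.buildLastSlide(ctx)
  }
  return t.buildContentSlide(ctx)
}

// Toy template used only to exercise the dispatch logic.
const demoTemplate: TemplateBuilders = {
  buildCoverOverlay: ctx => `<svg data-kind="cover-overlay">${ctx.headline}</svg>`,
  buildCoverFallback: ctx => `<svg data-kind="cover-fallback">${ctx.headline}</svg>`,
  buildContentSlide: ctx => `<svg data-kind="content">${ctx.headline}</svg>`,
}
```

Because every template satisfies the same interface, adding a template means adding one object, while the pipeline code stays unchanged.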

4.3 Full Slide Pipeline

SlideConfig → Template Builder → SVG string
                                      ↓
                          sharp(svgBuffer).png()
                                      ↓
                              [IF cover] composite with AI bg
                              [IF free] composite watermark
                                      ↓
                              PNG Buffer (1080×1350)
                                      ↓
                          Upload → Supabase Storage → URL

Rendering pipeline for a single slide

4.4 LinkedIn PDF Export

For LinkedIn, PNG slides are upscaled to 1200×1500 (same 4:5 aspect ratio) and packaged into a PDF via pdf-lib. This allows publishing carousels on LinkedIn as "document posts" reusing the same Instagram slides.

5. Video System with Remotion

5.1 Why Video Beyond PNG

Static PNG slides are the primary product. But video formats (Reels, Stories) have significantly higher organic reach on social media. The problem is that a simple slideshow with fades adds no value — it is just "the animated version of static slides."

The challenge was: how to make animations informative, not just decorative?

5.2 Remotion Architecture

Remotion allows composing video using React:

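import React from 'react'
import { TransitionSeries, linearTiming } from '@remotion/transitions'

// zoomTransition() and SlideFrame are the project's own presentation and component
<TransitionSeries>
  {slides.map((slide, i) => (
    // Keyed fragment: the <> shorthand cannot carry the key React needs in map()
    <React.Fragment key={i}>
      <TransitionSeries.Sequence durationInFrames={210}>
        <SlideFrame slide={slide} />
      </TransitionSeries.Sequence>
      <TransitionSeries.Transition
        presentation={zoomTransition()}
        timing={linearTiming({ durationInFrames: 24 })}
      />
    </React.Fragment>
  ))}
</TransitionSeries>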

Each slide lasts 7 seconds (210 frames at 30fps), with 0.8-second zoom transitions.
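
Because adjacent sequences overlap while a transition plays, the total video length is shorter than the sum of the slide durations. A small worked calculation, assuming one 24-frame transition between each adjacent pair of slides (the function name is illustrative):

```typescript
const FPS = 30
const SLIDE_FRAMES = 210      // 7 s per slide
const TRANSITION_FRAMES = 24  // 0.8 s per transition

// Total frames for n slides: each transition overlaps two neighbouring slides,
// so its length is subtracted once per adjacent pair.
function totalDurationInFrames(slideCount: number): number {
  if (slideCount <= 0) return 0
  return slideCount * SLIDE_FRAMES - (slideCount - 1) * TRANSITION_FRAMES
}

// 5 slides: 5 * 210 - 4 * 24 = 954 frames, i.e. 31.8 s at 30 fps
const fiveSlideSeconds = totalDurationInFrames(5) / FPS
```

This value is what the composition's durationInFrames would need to be set to when the slide count varies per carousel.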

5.3 Spotlight System: Animations That Inform

The breakthrough was the spotlight system: instead of animating everything uniformly, the system automatically identifies the most relevant element in each infographic and highlights it.

How it works:

computeSpotlightIndex(slide) analyzes the slide data and finds the item with the highest value (the tallest bar, the dominant donut segment, the stat with the most significant number)
Spotlight Phase (frame 87-132): after the chart has fully entered, non-spotlight items dim to 30% opacity, while the spotlight item receives a luminous glow and a slight scale-up (1.05x)
Hold Phase (frame 132+): the spotlight item's numbers "breathe" with a micro sinusoidal oscillation (±2%, 2s period), a subtle effect that maintains attention without distracting

// Spotlight: identify the maximum value
const spotlightIndex = items.reduce(
  (maxI, item, i) => item.value > items[maxI].value ? i : maxI, 0
)

// Dim non-spotlight items
const itemOpacity = inSpotlightPhase
  ? interpolate(frame, [start, start + 12],
      [1, isSpotlight ? 1 : 0.3], { extrapolateRight: 'clamp' })
  : 1

// Breathing during hold: ±2% scale with a 2 s period
// (t is the elapsed time in seconds, e.g. frame / fps)
const breathScale = inHoldPhase
  ? 1 + 0.02 * Math.sin((2 * Math.PI * t) / 2)
  : 1

This transforms a static bar chart into a visual narrative: "here is the most important data point — look at it."

5.4 Narrative Transitions (Zoom)

Transitions between slides use a zoom effect that simulates "moving closer" to the content:

The outgoing slide scales from 1.0x to 1.4x (zoom in), fading out over the last 30% of the transition
The incoming slide fades in underneath

This creates a sense of narrative progression — you "enter" the content — instead of a generic fade.
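
The two curves can be written as pure functions of the transition progress (0 at the start, 1 at the end). This is a sketch under assumptions: the function names are invented, and the linear fade-in for the incoming slide is a guess, since the text only says it fades in underneath.

```typescript
interface ZoomStyle { scale: number; opacity: number }

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x))

// Outgoing slide: scale 1.0 -> 1.4 across the whole transition,
// fully opaque for the first 70%, then fading to 0 over the last 30%.
function outgoingStyle(progress: number): ZoomStyle {
  const p = clamp01(progress)
  return {
    scale: 1 + 0.4 * p,
    opacity: p <= 0.7 ? 1 : clamp01(1 - (p - 0.7) / 0.3),
  }
}

// Incoming slide: stays at its natural size and fades in linearly (assumed).
function incomingStyle(progress: number): ZoomStyle {
  return { scale: 1, opacity: clamp01(progress) }
}
```

Keeping the outgoing slide opaque for most of the transition is what makes the zoom readable as motion; if it faded from the first frame, the effect would collapse into an ordinary crossfade.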

5.5 Micro-Motion

Three micro-animation elements enrich the experience without distracting:

Progress line: a 2px vertical line on the left edge that grows from 0% to 100% height during the slide, giving a sense of temporal progress
Reactive separator: the divider line between headline and chart raises its opacity (hex alpha 33 to CC, roughly 20% to 80%) and gains a glow during the spotlight phase, synchronizing with the data emphasis
Numeric breathing: the spotlight item numbers oscillate with a barely perceptible sinusoid, keeping the eye anchored to the key data point

5.6 Donut Chart: SVG Filter for Glow

An interesting technical problem: CSS box-shadow does not work on SVG <circle> elements. For the glow on the donut chart spotlight, a native SVG filter was necessary:

<filter id="donut-glow">
  <feGaussianBlur stdDeviation={glowStdDev} result="blur" />
  <feFlood floodColor={accentColor} floodOpacity="0.6" result="color" />
  <feComposite in="color" in2="blur" operator="in" result="glow" />
  <feMerge>
    <feMergeNode in="glow" />
    <feMergeNode in="SourceGraphic" />
  </feMerge>
</filter>

5.7 Slide Timeline (7 seconds)

Frame   0 ──── 24: Headline entrance (spring scale-up)
Frame  18 ──── 36: Context text fade-in
Frame  33 ──── 87: Chart/stats entrance (staggered)
Frame  87 ─── 132: SPOTLIGHT PHASE (dim + glow + scale)
Frame  96 ─── 135: Takeaway text (word-by-word)
Frame 132 ─── 210: HOLD PHASE (breathing, read time)
Frame 135 ─── 145: Footer fade-in
Frame 150 ─── 210: Static hold (reading time)

Animation timeline for a single slide (210 frames @ 30fps)
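
The timeline can also be held as data, so that flags such as the spotlight and hold phases are derived from one table rather than scattered frame constants. The sketch below copies the frame ranges from the timeline; the phase names and helper functions are illustrative, and each range is treated as half-open.

```typescript
interface Phase { name: string; from: number; to: number }

// Frame ranges copied from the timeline; names are illustrative.
const TIMELINE: Phase[] = [
  { name: 'headline', from: 0, to: 24 },
  { name: 'context', from: 18, to: 36 },
  { name: 'chart', from: 33, to: 87 },
  { name: 'spotlight', from: 87, to: 132 },
  { name: 'takeaway', from: 96, to: 135 },
  { name: 'hold', from: 132, to: 210 },
  { name: 'footer', from: 135, to: 145 },
  { name: 'static-hold', from: 150, to: 210 },
]

// Names of all phases active at a frame, treating each range as [from, to).
function activePhases(frame: number): string[] {
  return TIMELINE
    .filter(p => frame >= p.from && frame < p.to)
    .map(p => p.name)
}

// Progress (0-1) through a named phase, clamped outside its range.
function phaseProgress(name: string, frame: number): number {
  const phase = TIMELINE.filter(p => p.name === name)[0]
  if (!phase) throw new Error(`unknown phase: ${name}`)
  const raw = (frame - phase.from) / (phase.to - phase.from)
  return Math.min(1, Math.max(0, raw))
}
```

For example, frame 100 falls inside both the spotlight and takeaway ranges, which reflects the deliberate overlap between the data emphasis and the takeaway text.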

6. Business Model

6.1 Three-Tier Pricing

Feature	Free	Base (€19/mo)	Pro (€49/mo)
Carousels/month	3	30	Unlimited
Watermark	Yes	No	No
AI Cover	Yes	Yes	Yes
Infographic templates	No	Yes	Yes
Customization	No	No	Yes
Video export	No	No	Yes

6.2 Feature Gating

Features are gated at the API route level, not the frontend. Every request is validated against getPlanLimits(planId) before proceeding with generation. This prevents client-side bypasses and keeps enforcement in a single server-side checkpoint.

// Example of plan validation
const limits = getPlanLimits(profile.plan)
if (profile.carousels_used_this_month >= limits.maxCarousels) {
  return Response.json({ error: 'Monthly limit reached' }, { status: 429 })
}
if (templateId === 'custom' && !limits.customTemplate) {
  return Response.json({ error: 'Custom template requires Pro plan' }, { status: 403 })
}
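
One way getPlanLimits could be backed is a static table derived from the pricing tiers. In this sketch only maxCarousels and customTemplate appear in the original snippet; the other field names, the plan ids, and the fallback to the free tier for unknown ids are assumptions.

```typescript
type PlanId = 'free' | 'base' | 'pro'

interface PlanLimits {
  maxCarousels: number          // per month; Infinity means unlimited
  watermark: boolean
  infographicTemplates: boolean
  customTemplate: boolean
  videoExport: boolean
}

// Values taken from the pricing table.
const PLAN_LIMITS: Record<PlanId, PlanLimits> = {
  free: { maxCarousels: 3, watermark: true, infographicTemplates: false,
          customTemplate: false, videoExport: false },
  base: { maxCarousels: 30, watermark: false, infographicTemplates: true,
          customTemplate: false, videoExport: false },
  pro:  { maxCarousels: Infinity, watermark: false, infographicTemplates: true,
          customTemplate: true, videoExport: true },
}

// Unknown or missing plan ids fall back to the most restrictive tier.
function getPlanLimits(planId: string | null | undefined): PlanLimits {
  return planId === 'base' || planId === 'pro'
    ? PLAN_LIMITS[planId]
    : PLAN_LIMITS.free
}
```

Using Infinity for the Pro tier keeps the usage comparison uniform, since a counter is never greater than or equal to Infinity.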

7. Database and Security

7.1 Supabase Schema

Two main tables with Row-Level Security (RLS):

profiles: plan, usage counter, brand name — created automatically on signup via handle_new_user() trigger
carousels: topic, status, slide config (JSONB), URLs, caption — RLS ensures each user can only see their own carousels

7.2 Storage

PNG slides and LinkedIn PDFs are uploaded to Supabase Storage with a structured path:

carousel-slides/{userId}/{carouselId}/ig-slide-0.png
carousel-slides/{userId}/{carouselId}/ig-slide-1.png
carousel-slides/{userId}/{carouselId}/linkedin.pdf

The bucket is configured as public-read, authenticated-write — slide URLs are directly servable without authentication.

8. Technical Challenges and Solutions

8.1 The 60-Second Constraint

Vercel imposes a 60-second timeout on serverless API routes. The full pipeline (Claude + Gemini + rendering + upload) must fit within this budget.

Solution: aggressive per-step timeouts (Claude 25s, Gemini 20s, URL fetch 8s), automatic fallbacks (cover without AI if Gemini doesn't respond), and parallelization where possible (LinkedIn PDF built after IG upload).

8.2 Fonts in Serverless Environment

Sharp (via librsvg) requires system fonts to render SVG text. On Vercel, no fonts are installed.

Solution: fonts bundled in the project's /fonts/ directory, with FONTCONFIG_PATH configured in next.config.ts to point to the local directory.

8.3 Stripe Client and Module-Level Init

The Stripe client initializes at module level, which causes errors during the Next.js build when STRIPE_SECRET_KEY is not available.

Solution: lazy proxy pattern — the Stripe client is wrapped in a Proxy that defers initialization until first access:

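import Stripe from 'stripe'

// Created on first property access, so importing this module at build time is safe
let _instance: Stripe | null = null

export const stripe = new Proxy({} as Stripe, {
  get(_target, prop) {
    if (!_instance) {
      _instance = new Stripe(process.env.STRIPE_SECRET_KEY!, { ... })
    }
    return Reflect.get(_instance, prop)
  }
})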

8.4 Donut Chart: Segment Overlap

The initial donut chart animation caused visual overlap between adjacent segments. The issue: strokeDashoffset shifted the visible portion toward the end of the segment, creating collisions.

Solution: replace strokeDashoffset with a dynamic strokeDasharray, where the visible portion grows from the starting point:

// Before (broken): dashoffset shifts the pattern
strokeDashoffset={segLength * (1 - progress)}

// After (fixed): dasharray grows from the start
strokeDasharray={`${segLength * progress} ${circumference - segLength * progress}`}
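
A self-contained sketch of the dasharray approach across a whole donut. It assumes each segment is a separate circle positioned by rotating it to its start angle, which is one common way to lay out donut segments and not necessarily what the project does; all names are illustrative.

```typescript
interface DonutSegment { value: number }

interface SegmentStroke {
  dasharray: string  // "<visible length> <gap length>"
  rotation: number   // degrees at which the segment starts on the ring
}

// Stroke props for every segment at a given entrance progress (0-1).
// The visible dash always starts at the segment's own origin and grows
// forward, so it cannot slide into the neighbouring segment.
function donutStrokes(
  segments: DonutSegment[],
  radius: number,
  progress: number,
): SegmentStroke[] {
  const circumference = 2 * Math.PI * radius
  const total = segments.reduce((sum, s) => sum + s.value, 0)
  let startFraction = 0
  return segments.map(s => {
    const fraction = total > 0 ? s.value / total : 0
    const visible = circumference * fraction * progress
    const stroke: SegmentStroke = {
      dasharray: `${visible} ${circumference - visible}`,
      rotation: startFraction * 360,
    }
    startFraction += fraction
    return stroke
  })
}
```

At progress 0 every segment has a zero-length dash, and at progress 1 each dash covers exactly its share of the circumference, ending where the next segment's rotation begins.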

9. Metrics and Performance

Metric	Value	Notes
Average generation time	~25-35s	Complete carousel (5-7 slides)
Claude + Gemini	8-15s + 5-12s	Typical AI breakdown
Slide size	80-200KB	PNG 1080×1350px
Video size	~2-5MB	30 seconds (7s × 4-5 slides)

10. Conclusions and Future Development

This project demonstrates that it is possible to build an AI-native product with editorial quality as a single-developer micro-SaaS, leveraging the modern serverless ecosystem and state-of-the-art AI APIs.

The key architectural choices — SVG+Sharp for rendering, Claude for structured content, Remotion for video — balance quality, performance, and operational costs. The video spotlight system exemplifies how animations can be informative rather than purely decorative: not everything is animated, only what matters is emphasized.

Future development:

Element Bridge Transition (C1): transition where the spotlight element "detaches" from the outgoing slide and repositions in the incoming slide
Data Storytelling: animations that build a progressive narrative across slides (e.g., temporal evolution of a chart)
Automated A/B Testing: generation of caption variants and engagement testing
Multi-language: native support for content in English, Spanish, French
Public API: endpoints for third-party integrations (Zapier, Make, n8n)

Appendix: Project Structure

carousel-ai/
├── src/
│   ├── app/api/generate/route.ts    ← Pipeline orchestration
│   ├── lib/
│   │   ├── ai/                      ← Claude API + system prompts
│   │   ├── rendering/               ← SVG builders + Sharp + templates
│   │   ├── stripe/                  ← Payments + plan gating
│   │   └── supabase/                ← Auth + database + storage
│   └── remotion/                    ← Video composition + animations
├── supabase/schema.sql              ← Database + RLS policies
└── fonts/                           ← Bundled fonts for serverless

Project directory structure
