Abstract
A micro-SaaS that automates the creation of editorial-quality social media carousels for Instagram and LinkedIn. It combines a multi-AI pipeline that fits within a 60-second serverless budget, SVG+Sharp rendering without headless browsers, and a spotlight system that makes video animations informative.
This paper is also available in Italian
1. The Problem
Creating a quality social media carousel today takes 3-4 hours: content research, copywriting, slide design in Canva or Figma, caption optimization. For a professional publishing 3-5 times a week, this represents a significant bottleneck.
Existing solutions fall into two categories: template builders (Canva, Carousel.so) that still require manual content work, and AI generators that produce generic, visually flat output. None combine editorial content generation, professional design, and rendering into a single automated pipeline.
This project positions itself as an end-to-end solution: from a topic to a publishable carousel in ~30 seconds, with editorial quality comparable to manually created content.
2. System Architecture
2.1 Pipeline Overview
Topic (user input)
↓
Claude API → Structured JSON (slides + caption)
↓
Gemini Flash → Cover image (photorealistic)
↓
SVG Builder → Sharp → PNG per slide
↓
Supabase Storage → Public URLs
↓
[Optional] Remotion → MP4 video with animations
The entire pipeline is orchestrated by a single API route (POST /api/generate) with a 55-second time budget, leaving 5 seconds of margin on Vercel's 60s limit.
2.2 Technology Stack
| Layer | Technology | Rationale |
|---|---|---|
| Framework | Next.js 16, React 19 | App Router, Server Components, ISR |
| AI Content | Claude Sonnet (via OpenRouter) | Best quality for structured output in Italian |
| AI Images | Gemini 2.5 Flash (via OpenRouter) | Fast, cost-effective, good photographic quality |
| Static Rendering | Sharp + SVG | Serverless-friendly, no headless browser required |
| Video Rendering | Remotion 4.x | Programmatic video composition in React |
| Database | Supabase (PostgreSQL + Auth + Storage) | Integrated auth, RLS, storage with CDN |
| Payments | Stripe | Checkout, webhooks, subscription management |
3. Content Generation with Claude
3.1 System Prompt Engineering
The core of output quality lies in the system prompt. After dozens of iterations, the prompt defines:
- Rigid structure: 5-7 slides, with mandatory isCover and isLast
- Editorial style: conversational tone in Italian, informal "tu" form, 150-250 characters per slide
- Formatting: emphasis with **bold** on key concepts
- Anti-slop: explicit list of patterns to avoid ("In an increasingly digital world...", emoji chains, generic CTAs)
- Optimized caption: 800-1500 characters with hook → context → thread → takeaway → CTA → hashtag structure
The prompt also includes template hints that modify behavior based on the selected template. For infographic templates, Claude generates data structures (chart, stats, progressBars) alongside text:
// Example Claude output for infographic-data template
{
  headline: ["Growth of the", "**AI Market**"],
  chart: {
    type: "bar",
    items: [
      { label: "2023", value: 420 },
      { label: "2024", value: 580 },
      { label: "2025", value: 780 },
      { label: "2026", value: 1050 }
    ]
  },
  takeaway: "**Artificial intelligence** investments exceed one billion in 2026",
  source: "Source: AI Observatory, Polytechnic University of Milan"
}
3.2 URL Handling
When the user includes a URL in the topic, the system:
- Extracts URLs with regex
- Fetches the HTML content (8s timeout)
- Extracts the main text with Cheerio (4000-character limit)
- Injects the content as additional context in the Claude prompt
This allows creating carousels from articles, papers, product pages — the system synthesizes source content into editorial slides.
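The first and last steps of this flow can be sketched without any dependencies. This is a minimal sketch under assumptions: the helper names, the regex, and the "Source content" label are hypothetical, and the HTML fetch and Cheerio extraction steps are left out. Only the 4000-character limit comes from the text above.

```typescript
// Hypothetical helpers: names and the exact regex are assumptions,
// not the project's actual code.
const URL_PATTERN = /https?:\/\/[^\s<>"']+/g;
const MAX_CONTEXT_CHARS = 4000; // limit stated in the text

/** Returns the unique URLs found in a topic, without trailing punctuation. */
function extractUrls(topic: string): string[] {
  const matches = topic.match(URL_PATTERN) ?? [];
  const cleaned = matches.map((url) => url.replace(/[.,;:!?)\]]+$/, ""));
  return Array.from(new Set(cleaned));
}

/** Collapses whitespace and caps the extracted text at the context limit. */
function toPromptContext(text: string, maxChars: number = MAX_CONTEXT_CHARS): string {
  return text.replace(/\s+/g, " ").trim().slice(0, maxChars);
}

/** Appends the source text to the topic as additional prompt context. */
function buildUserPrompt(topic: string, sourceText?: string): string {
  if (!sourceText) return topic;
  return `${topic}\n\nSource content:\n${toPromptContext(sourceText)}`;
}
```

Capping the context before it reaches the prompt keeps the Claude call's input size predictable regardless of how long the source page is.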
3.3 Resilience
- Timeout: 25s per Claude call, with 1 automatic retry on transient errors
- Soft validation: invalid infographic fields (stats without values, empty charts) are silently removed instead of failing the generation
- Robust JSON parsing: trailing comma removal, control character unescaping, malformed output handling
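The last two points can be illustrated with a small sketch. The type and function names here are hypothetical, and the sketch covers only trailing-comma removal and soft validation, not control-character handling. Note that the trailing-comma regex is deliberately naive and would also touch a ", }" sequence inside a string value.

```typescript
// Hypothetical types and helpers, shown only to illustrate the idea.
interface ChartItem { label: string; value: number }
interface Stat { label: string; value?: string }
interface InfographicSlide {
  headline: string[];
  chart?: { type: string; items: ChartItem[] };
  stats?: Stat[];
}

/** Extracts the outermost JSON object and removes trailing commas before parsing. */
function parseLenientJson(raw: string): unknown {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  const withoutTrailingCommas = raw
    .slice(start, end + 1)
    .replace(/,\s*([}\]])/g, "$1");
  return JSON.parse(withoutTrailingCommas);
}

/** Drops unusable infographic fields instead of rejecting the whole slide. */
function softValidate(slide: InfographicSlide): InfographicSlide {
  const cleaned: InfographicSlide = { ...slide };
  if (cleaned.chart) {
    const rawItems = Array.isArray(cleaned.chart.items) ? cleaned.chart.items : [];
    const items = rawItems.filter((item) => Number.isFinite(item.value));
    if (items.length > 0) cleaned.chart = { ...cleaned.chart, items };
    else delete cleaned.chart; // empty chart: remove it silently
  }
  if (cleaned.stats) {
    const stats = cleaned.stats.filter((stat) => Boolean(stat.value));
    if (stats.length > 0) cleaned.stats = stats;
    else delete cleaned.stats; // stats without values: remove them silently
  }
  return cleaned;
}
```

With this shape, a slide whose chart is empty still renders as a text slide rather than failing the whole generation.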
4. Slide Rendering
4.1 SVG + Sharp Approach
The most important architectural decision was not using a headless browser for static slide rendering. Playwright/Puppeteer require significant resources and are not ideal in serverless environments with memory and timeout constraints.
Instead, rendering follows an SVG-first approach:
- Template Builder generates a complete SVG string (text, shapes, layout)
- Sharp converts the SVG to PNG at 1080×1350px
- For covers, the AI image is composited beneath the SVG overlay
This approach is ~10x faster than a Playwright render and requires no heavy binary dependencies.
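The SVG-first step can be sketched as a pure string builder. This is an illustration under assumptions: the function names, coordinates, font size, and colors are hypothetical, and only the 1080×1350 canvas and the **bold** emphasis convention come from the text. The resulting string would then be handed to Sharp (for example `sharp(Buffer.from(svg)).png()`), which is omitted here to keep the block dependency-free.

```typescript
const WIDTH = 1080;
const HEIGHT = 1350;

/** Escapes characters that would otherwise break the SVG markup. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** Turns **bold** markers into bold <tspan> runs; other text is escaped as-is. */
function emphasisToTspans(line: string): string {
  return line
    .split(/\*\*(.+?)\*\*/g) // odd indexes hold the captured bold text
    .map((part, i) =>
      i % 2 === 1
        ? `<tspan font-weight="700">${escapeXml(part)}</tspan>`
        : escapeXml(part)
    )
    .join("");
}

/** Builds a complete slide as an SVG string (layout values are illustrative). */
function buildSlideSvg(lines: string[], accent: string = "#0f766e"): string {
  const textNodes = lines
    .map(
      (line, i) =>
        `<text x="80" y="${240 + i * 84}" font-size="64" fill="#111827">${emphasisToTspans(line)}</text>`
    )
    .join("\n  ");
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${WIDTH}" height="${HEIGHT}" viewBox="0 0 ${WIDTH} ${HEIGHT}">`,
    `  <rect width="100%" height="100%" fill="#fafaf9"/>`,
    `  <rect x="80" y="140" width="120" height="8" fill="${accent}"/>`,
    `  ${textNodes}`,
    `</svg>`,
  ].join("\n");
}
```

Because the builder is a pure function from data to a string, it can be unit-tested without rendering anything.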
4.2 Template System
The system supports 9 templates, each with dedicated SVG builders:
- Editorial (default): teal accent, serif font, light background, footer with dot pattern
- Tech: automatic category→accent color mapping (AI→green, mobile→blue)
- Bold: dark mode, high-contrast typography
- Minimal: white, elegant, generous spacing
- Medical: clinical tone, specialized terminology
- Infographic (3 variants): data, compare, visual — with chart, stats, comparison layout
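The Tech template's category-to-accent mapping could look like the sketch below. The hex values, the fallback color, and the function name are assumptions; the text only states that AI maps to green and mobile to blue.

```typescript
// Hypothetical color values: the source only names the colors, not the hex codes.
const TECH_ACCENTS = new Map<string, string>([
  ["ai", "#22c55e"],     // green
  ["mobile", "#3b82f6"], // blue
]);
const DEFAULT_ACCENT = "#14b8a6"; // assumed fallback for unknown categories

/** Resolves an accent color from a free-form category label. */
function accentForCategory(category: string | undefined): string {
  const key = (category ?? "").trim().toLowerCase();
  return TECH_ACCENTS.get(key) ?? DEFAULT_ACCENT;
}
```

A Map is used instead of a plain object so that keys such as "constructor" cannot accidentally resolve to inherited properties.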
Each template exposes a uniform interface:
interface TemplateBuilders {
  buildCoverOverlay(ctx: SlideContext): string   // SVG overlay
  buildCoverFallback(ctx: SlideContext): string  // SVG without AI
  buildContentSlide(ctx: SlideContext): string   // SVG content
  buildLastSlide?(ctx: SlideContext): string     // SVG CTA
}
4.3 Full Slide Pipeline
SlideConfig → Template Builder → SVG string
↓
sharp(svgBuffer).png()
↓
[IF cover] composite with AI bg
[IF free] composite watermark
↓
PNG Buffer (1080×1350)
↓
Upload → Supabase Storage → URL
4.4 LinkedIn PDF Export
For LinkedIn, PNG slides are upscaled to 1200×1500 (same 4:5 aspect ratio) and packaged into a PDF via pdf-lib. This allows publishing carousels on LinkedIn as "document posts" reusing the same Instagram slides.
5. Video System with Remotion
5.1 Why Video Beyond PNG
Static PNG slides are the primary product. But video formats (Reels, Stories) have significantly higher organic reach on social media. The problem is that a simple slideshow with fades adds no value — it is just "the animated version of static slides."
The challenge was: how to make animations informative, not just decorative?
5.2 Remotion Architecture
Remotion allows composing video using React:
<TransitionSeries>
  {slides.map((slide, i) => (
    // Keyed fragment: React needs a stable key for each mapped group
    <React.Fragment key={i}>
      <TransitionSeries.Sequence durationInFrames={210}>
        <SlideFrame slide={slide} />
      </TransitionSeries.Sequence>
      <TransitionSeries.Transition
        presentation={zoomTransition()}
        timing={linearTiming({ durationInFrames: 24 })}
      />
    </React.Fragment>
  ))}
</TransitionSeries>
Each slide lasts 7 seconds (210 frames at 30fps), with 0.8-second zoom transitions.
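The frame arithmetic can be checked with a short calculation. This sketch assumes that each transition overlaps the two slides it joins and that only the transitions between slides shorten the timeline; the constant and function names are hypothetical.

```typescript
const FPS = 30;
const SLIDE_FRAMES = 210;     // 7 seconds per slide
const TRANSITION_FRAMES = 24; // 0.8 seconds per zoom transition

/** Total composition length, assuming each transition overlaps two adjacent slides. */
function totalFrames(slideCount: number): number {
  if (slideCount <= 0) return 0;
  const overlaps = slideCount - 1; // one transition between each pair of slides
  return slideCount * SLIDE_FRAMES - overlaps * TRANSITION_FRAMES;
}

/** Converts a frame count to seconds at the composition frame rate. */
function framesToSeconds(frames: number): number {
  return frames / FPS;
}
```

Under these assumptions a 5-slide video is 5 × 210 − 4 × 24 = 954 frames, or about 31.8 seconds.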
5.3 Spotlight System: Animations That Inform
The breakthrough was the spotlight system: instead of animating everything uniformly, the system automatically identifies the most relevant element in each infographic and highlights it.
How it works:
- computeSpotlightIndex(slide) analyzes the slide data and finds the item with the highest value (the tallest bar, the dominant donut segment, the stat with the most significant number)
- Spotlight Phase (frames 87-132): after the chart has fully entered, non-spotlight items dim to 30% opacity, while the spotlight item receives a luminous glow and a slight scale-up (1.05x)
- Hold Phase (frame 132+): the spotlight item's numbers "breathe" with a micro sinusoidal oscillation (±2%, 2s period), a subtle effect that maintains attention without distracting
// Spotlight: identify the maximum value
const spotlightIndex = items.reduce(
  (maxI, item, i) => item.value > items[maxI].value ? i : maxI, 0
)
// Dim non-spotlight items
const itemOpacity = inSpotlightPhase
  ? interpolate(frame, [start, start + 12],
      [1, isSpotlight ? 1 : 0.3], { extrapolateRight: 'clamp' })
  : 1
// Breathing during hold (t is elapsed time in seconds, giving a 2s period)
const breathScale = inHoldPhase
  ? 1 + 0.02 * Math.sin((2 * Math.PI * t) / 2)
  : 1
This transforms a static bar chart into a visual narrative: "here is the most important data point: look at it."
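The same logic can be written as pure functions of the frame number, which makes it testable outside Remotion. This is a sketch under assumptions: `clampedLerp` stands in for Remotion's `interpolate`, the constants are taken from the phase boundaries described above, and the dimmed state is assumed to persist through the hold phase.

```typescript
interface ChartItem { label: string; value: number }

const FPS = 30;
const SPOTLIGHT_START = 87; // frame where the spotlight phase begins
const DIM_FRAMES = 12;      // frames taken to reach the dimmed state
const HOLD_START = 132;     // frame where the hold phase begins

/** Index of the item with the highest value, or -1 for an empty list. */
function computeSpotlightIndex(items: ChartItem[]): number {
  if (items.length === 0) return -1;
  return items.reduce(
    (maxI, item, i) => (item.value > items[maxI].value ? i : maxI),
    0
  );
}

/** Linear interpolation clamped to the [from, to] frame range. */
function clampedLerp(frame: number, from: number, to: number, a: number, b: number): number {
  const t = Math.min(1, Math.max(0, (frame - from) / (to - from)));
  return a + (b - a) * t;
}

/** Non-spotlight items fade from 100% to 30% opacity; the spotlight item stays at 100%. */
function itemOpacity(frame: number, isSpotlight: boolean): number {
  if (isSpotlight || frame < SPOTLIGHT_START) return 1;
  return clampedLerp(frame, SPOTLIGHT_START, SPOTLIGHT_START + DIM_FRAMES, 1, 0.3);
}

/** ±2% sinusoidal scale with a 2-second period, active only during the hold phase. */
function breathScale(frame: number): number {
  if (frame < HOLD_START) return 1;
  const t = (frame - HOLD_START) / FPS; // seconds since the hold began
  return 1 + 0.02 * Math.sin((2 * Math.PI * t) / 2);
}
```

For the sample data in section 3.1 (420, 580, 780, 1050), the spotlight lands on the last bar.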
5.4 Narrative Transitions (Zoom)
Transitions between slides use a zoom effect that simulates "moving closer" to the content:
- The outgoing slide scales from 1.0x to 1.4x (zoom in) and fades out over the last 30% of the transition
- The incoming slide fades in underneath
This creates a sense of narrative progression — you "enter" the content — instead of a generic fade.
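The outgoing slide's behavior can be expressed as a function of transition progress. This is a minimal sketch with hypothetical names, assuming a linear scale and a linear fade; the actual easing used in the project is not stated.

```typescript
interface OutgoingStyle { scale: number; opacity: number }

const MAX_SCALE = 1.4;  // end of the zoom
const FADE_START = 0.7; // fade occupies the last 30% of the transition

/** Style of the outgoing slide for a transition progress between 0 and 1. */
function outgoingSlideStyle(progress: number): OutgoingStyle {
  const p = Math.min(1, Math.max(0, progress));
  const scale = 1 + (MAX_SCALE - 1) * p;
  const opacity =
    p <= FADE_START ? 1 : Math.max(0, 1 - (p - FADE_START) / (1 - FADE_START));
  return { scale, opacity };
}
```

In a Remotion custom presentation, these two values would typically feed a `transform: scale(...)` and an `opacity` style on the exiting scene.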
5.5 Micro-Motion
Three micro-animation elements enrich the experience without distracting:
- Progress line: a 2px vertical line on the left edge that grows from 0% to 100% height during the slide, giving a sense of temporal progress
- Reactive separator: the divider line between headline and chart changes opacity (hex alpha from 33 to CC, roughly 20% to 80%) and gains a glow during the spotlight phase, synchronizing with the data emphasis
- Numeric breathing: the spotlight item numbers oscillate with a barely perceptible sinusoid, keeping the eye anchored to the key data point
5.6 Donut Chart: SVG Filter for Glow
An interesting technical problem: CSS box-shadow does not work on SVG <circle> elements. For the glow on the donut chart spotlight, a native SVG filter was necessary:
<filter id="donut-glow">
  <feGaussianBlur stdDeviation={glowStdDev} result="blur" />
  <feFlood floodColor={accentColor} floodOpacity="0.6" result="color" />
  <feComposite in="color" in2="blur" operator="in" result="glow" />
  <feMerge>
    <feMergeNode in="glow" />
    <feMergeNode in="SourceGraphic" />
  </feMerge>
</filter>
5.7 Slide Timeline (7 seconds)
Frame   0 ────  24: Headline entrance (spring scale-up)
Frame  18 ────  36: Context text fade-in
Frame  33 ────  87: Chart/stats entrance (staggered)
Frame  87 ──── 132: SPOTLIGHT PHASE (dim + glow + scale)
Frame  96 ──── 135: Takeaway text (word-by-word)
Frame 132 ──── 210: HOLD PHASE (breathing, read time)
Frame 135 ──── 145: Footer fade-in
Frame 150 ──── 210: Static hold (reading time)
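Because several phases overlap, the timeline is easier to reason about as data. The sketch below encodes the frame ranges listed above and returns every phase active at a given frame; the phase names and the choice of an exclusive end frame are assumptions.

```typescript
interface Phase { name: string; from: number; to: number }

// Frame ranges copied from the timeline; end frames are treated as exclusive.
const TIMELINE: Phase[] = [
  { name: "headline", from: 0, to: 24 },
  { name: "context", from: 18, to: 36 },
  { name: "chart", from: 33, to: 87 },
  { name: "spotlight", from: 87, to: 132 },
  { name: "takeaway", from: 96, to: 135 },
  { name: "hold", from: 132, to: 210 },
  { name: "footer", from: 135, to: 145 },
  { name: "static-hold", from: 150, to: 210 },
];

/** Names of all phases active at the given frame, in timeline order. */
function activePhases(frame: number): string[] {
  return TIMELINE
    .filter((phase) => frame >= phase.from && frame < phase.to)
    .map((phase) => phase.name);
}
```

For example, frame 100 falls inside both the spotlight phase and the takeaway text entrance, which is why the two animations run together.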
6. Business Model
6.1 Three-Tier Pricing
| Feature | Free | Base (€19/mo) | Pro (€49/mo) |
|---|---|---|---|
| Carousels/month | 3 | 30 | Unlimited |
| Watermark | Yes | No | No |
| AI Cover | Yes | Yes | Yes |
| Infographic templates | No | Yes | Yes |
| Customization | No | No | Yes |
| Video export | No | No | Yes |
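The pricing table maps naturally onto a typed lookup. This is a sketch under assumptions: the plan identifiers, most field names, and the fallback to the Free plan for unknown ids are hypothetical; only `getPlanLimits`, `maxCarousels`, and `customTemplate` appear in the project's own snippets.

```typescript
type PlanId = "free" | "base" | "pro";

interface PlanLimits {
  maxCarousels: number;        // per month; Infinity means unlimited
  watermark: boolean;
  aiCover: boolean;
  infographicTemplates: boolean;
  customTemplate: boolean;
  videoExport: boolean;
}

// Values transcribed from the pricing table above.
const PLAN_LIMITS: Record<PlanId, PlanLimits> = {
  free: { maxCarousels: 3, watermark: true, aiCover: true, infographicTemplates: false, customTemplate: false, videoExport: false },
  base: { maxCarousels: 30, watermark: false, aiCover: true, infographicTemplates: true, customTemplate: false, videoExport: false },
  pro: { maxCarousels: Infinity, watermark: false, aiCover: true, infographicTemplates: true, customTemplate: true, videoExport: true },
};

/** Resolves limits for a stored plan id, falling back to Free for unknown values. */
function getPlanLimits(planId: string): PlanLimits {
  return planId === "base" || planId === "pro"
    ? PLAN_LIMITS[planId]
    : PLAN_LIMITS.free;
}
```

Using Infinity for the unlimited tier lets a usage check such as `used >= limits.maxCarousels` work unchanged across all plans.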
6.2 Feature Gating
Features are gated at the API route level, not the frontend. Every request is validated against getPlanLimits(planId) before generation starts. This prevents client-side bypasses and keeps enforcement in a single server-side place.
// Example of plan validation
const limits = getPlanLimits(profile.plan)
if (profile.carousels_used_this_month >= limits.maxCarousels) {
  return Response.json({ error: 'Monthly limit reached' }, { status: 429 })
}
if (templateId === 'custom' && !limits.customTemplate) {
  return Response.json({ error: 'Custom template requires Pro plan' }, { status: 403 })
}
7. Database and Security
7.1 Supabase Schema
Two main tables with Row-Level Security (RLS):
- profiles: plan, usage counter, brand name — created automatically on signup via handle_new_user() trigger
- carousels: topic, status, slide config (JSONB), URLs, caption — RLS ensures each user can only see their own carousels
7.2 Storage
PNG slides and LinkedIn PDFs are uploaded to Supabase Storage with a structured path:
carousel-slides/{userId}/{carouselId}/ig-slide-0.png
carousel-slides/{userId}/{carouselId}/ig-slide-1.png
carousel-slides/{userId}/{carouselId}/linkedin.pdf
The bucket is configured as public-read, authenticated-write, so slide URLs can be served directly without authentication.
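The path convention is simple enough to centralize in two small helpers. The function names are hypothetical, and the sketch reproduces the paths exactly as shown, without assuming whether "carousel-slides" is the bucket name or a folder prefix.

```typescript
const STORAGE_PREFIX = "carousel-slides";

/** Storage path for the Instagram slide at a zero-based index. */
function slidePath(userId: string, carouselId: string, index: number): string {
  if (!Number.isInteger(index) || index < 0) {
    throw new Error(`Invalid slide index: ${index}`);
  }
  return `${STORAGE_PREFIX}/${userId}/${carouselId}/ig-slide-${index}.png`;
}

/** Storage path for the LinkedIn PDF of a carousel. */
function linkedinPdfPath(userId: string, carouselId: string): string {
  return `${STORAGE_PREFIX}/${userId}/${carouselId}/linkedin.pdf`;
}
```

Keeping the user id as the first path segment also makes it straightforward to scope write policies per user.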
8. Technical Challenges and Solutions
8.1 The 60-Second Constraint
Vercel imposes a 60-second timeout on serverless API routes. The full pipeline (Claude + Gemini + rendering + upload) must fit within this budget.
Solution: aggressive per-step timeouts (Claude 25s, Gemini 20s, URL fetch 8s), automatic fallbacks (a cover without AI if Gemini doesn't respond), and parallelization where possible, with non-critical work ordered last (the LinkedIn PDF is built after the IG upload).
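One way to combine per-step caps with a global deadline is a small budget tracker. This is a sketch, not the project's implementation: the names are hypothetical, the clock is injectable for testing, and the 55-second total comes from section 2.1.

```typescript
interface Budget {
  remainingMs(): number;
  timeoutFor(capMs: number): number; // step cap, reduced if the budget is nearly spent
  canStart(minMs: number): boolean;  // whether an optional step is still worth starting
}

// Per-step caps stated in the text.
const STEP_CAPS_MS = { claude: 25000, gemini: 20000, urlFetch: 8000 };

/** Tracks how much of the overall request budget is left. */
function createBudget(totalMs: number, now: () => number = Date.now): Budget {
  const startedAt = now();
  const remainingMs = () => Math.max(0, totalMs - (now() - startedAt));
  return {
    remainingMs,
    timeoutFor: (capMs) => Math.min(capMs, remainingMs()),
    canStart: (minMs) => remainingMs() >= minMs,
  };
}
```

A route handler could call `budget.timeoutFor(STEP_CAPS_MS.gemini)` before the image request and skip straight to the non-AI cover when `canStart` returns false.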
8.2 Fonts in Serverless Environment
Sharp (via librsvg) requires system fonts to render SVG text. On Vercel, no fonts are installed.
Solution: fonts bundled in the project's /fonts/ directory, with FONTCONFIG_PATH configured in next.config.ts to point to the local directory.
8.3 Stripe Client and Module-Level Init
The Stripe client initializes at module level, which causes errors during the Next.js build when STRIPE_SECRET_KEY is not available.
Solution: lazy proxy pattern — the Stripe client is wrapped in a Proxy that defers initialization until first access:
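import Stripe from 'stripe'

// Created on first property access, so importing this module during the build
// does not require STRIPE_SECRET_KEY.
let _instance: Stripe | null = null

export const stripe = new Proxy({} as Stripe, {
  get(_, prop) {
    if (!_instance) {
      _instance = new Stripe(process.env.STRIPE_SECRET_KEY!, { ... })
    }
    return Reflect.get(_instance, prop)
  }
})
8.4 Donut Chart: Segment Overlap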
The initial donut chart animation caused visual overlap between adjacent segments. The issue: strokeDashoffset shifted the visible portion toward the end of the segment, creating collisions.
Solution: replace strokeDashoffset with a dynamic strokeDasharray, where the visible portion grows from the starting point:
// Before (broken): dashoffset shifts the pattern
strokeDashoffset={segLength * (1 - progress)}
// After (fixed): dasharray grows from the start
strokeDasharray={`${segLength * progress} ${circumference - segLength * progress}`}
9. Metrics and Performance
| Value | Metric | Notes |
|---|---|---|
| ~25-35s | Average generation time | Complete carousel (5-7 slides) |
| 8-15s + 5-12s | Claude + Gemini | Typical AI breakdown |
| 80-200KB | Slide size | PNG 1080×1350px |
| ~2-5MB | Video size | 30 seconds (7s × 4-5 slides) |
10. Conclusions and Future Development
This project demonstrates that it is possible to build an AI-native product with editorial quality as a single-developer micro-SaaS, leveraging the modern serverless ecosystem and state-of-the-art AI APIs.
The key architectural choices — SVG+Sharp for rendering, Claude for structured content, Remotion for video — balance quality, performance, and operational costs. The video spotlight system exemplifies how animations can be informative rather than purely decorative: not everything is animated, only what matters is emphasized.
Future development:
- Element Bridge Transition (C1): transition where the spotlight element "detaches" from the outgoing slide and repositions in the incoming slide
- Data Storytelling: animations that build a progressive narrative across slides (e.g., temporal evolution of a chart)
- Automated A/B Testing: generation of caption variants and engagement testing
- Multi-language: native support for content in English, Spanish, French
- Public API: endpoints for third-party integrations (Zapier, Make, n8n)
Appendix: Project Structure
carousel-ai/
├── src/
│   ├── app/api/generate/route.ts   ← Pipeline orchestration
│   ├── lib/
│   │   ├── ai/                     ← Claude API + system prompts
│   │   ├── rendering/              ← SVG builders + Sharp + templates
│   │   ├── stripe/                 ← Payments + plan gating
│   │   └── supabase/               ← Auth + database + storage
│   └── remotion/                   ← Video composition + animations
├── supabase/schema.sql             ← Database + RLS policies
└── fonts/                          ← Bundled fonts for serverless
