Vive Health: Optimizing Hundreds of Amazon Products With One AI Analyst

Last updated on June 11, 2026

Why did this product’s profit drop last month? On Amazon, that simple question has an expensive answer. Sales and traffic live in one report. Ad spend sits in another, across three campaign types. Fees, refunds and reimbursements arrive in settlement data weeks later. Inventory splits between Amazon’s warehouses and the company’s own. Price history, keyword ranks and competitor moves belong to third-party tools.

Vive Health sells medical equipment across the US – wheelchairs, braces, therapy supplies – with hundreds of products on Amazon. Every week, managers answered that question product by product: open tool after tool, export the data, line up the dates, reason toward a cause. Hours per product, quality depending on who did it, and no record of how the conclusion was reached.

They asked us to build a system that does the analysis itself – from raw data to root cause to recommended action.


The Problem

No single dashboard explains a profit change. The daily sales report says units fell. Only the ads data shows a bidding war doubled the cost per click. Only the price tracker shows a competitor undercut the price two weeks earlier. Only the storage report shows a surcharge on aging inventory eating the margin. A correct diagnosis needs all of it at once – and a manager assembling it by hand spends most of the time on collection, not thinking.

The numbers also refuse to sit still. Amazon attributes ad sales for up to two weeks after a click and accepts returns for a month after a sale, so a report pulled Tuesday contradicts the same report pulled Friday. Both are correct for their moment.

A skilled manager works through one product in an afternoon. With hundreds of products, most of the portfolio never gets that afternoon. Problems surface only in the totals – months after the cause.


The Data Foundation: One Lake, Every Metric, Per Product, Per Day

Before any AI, we built the foundation it reasons over. Every night, ingestion jobs pull from every system the business runs on – Amazon’s Selling Partner and Ads APIs, the ERP, profit analytics, price history, keyword intelligence, listing data, the team’s own product tracker – into a single PostgreSQL data lake.

At its core: the complete financial picture of every product at daily grain – sales, advertising, fees, refunds, costs, margin – reconciled once and shared downstream. Around it, daily snapshots of inventory on both sides, storage fees, keyword rank positions, and badge observations: the day a product gained or lost Amazon’s Choice, the day it was flagged as frequently returned.

Amazon: Selling Partner API (sales, fees, inventory, listings) · Ads API (spend across three campaign types)
Market intelligence: Price & rank history (day-by-day timelines) · Keyword intelligence (search volume, ranks, competitors) · Listing data (badges, reviews, content)
Company systems: ERP (warehouse stock & costs) · Profit analytics (daily P&L per product) · Product tracker (team notes & screenshots)
↓ every night · re-pulled as numbers settle
Product data lake: Financial performance (full daily P&L per product) · Inventory & storage (company warehouses and FBA) · Market position (ranks, prices, competitors) · Listing health (content, badges, availability)
↓ stored, reconciled rows, read in seconds
Product analysis: Pre-Launch workflow · Post-Launch CoPilot

The analysis layer reads these stored rows, not live APIs: analyses run in seconds instead of waiting on Amazon’s report queue, every run is reproducible because the data it saw is on disk, and any window – week, month, year – is queryable at daily grain.
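Here is a minimal sketch of what "any window at daily grain" means in practice. Every view is a roll-up of the same stored per-product, per-day rows. The `DailyPnL` fields, the `window_totals` helper and the sample ASIN are hypothetical. The in-memory filter stands in for what would be a date-range `SUM` in PostgreSQL.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DailyPnL:
    """One product, one day. Illustrative fields, not the project's actual schema."""
    asin: str
    day: date
    sales: float
    ad_spend: float
    fees: float
    refunds: float
    cogs: float

    @property
    def margin(self) -> float:
        return self.sales - self.ad_spend - self.fees - self.refunds - self.cogs


def window_totals(rows: list[DailyPnL], asin: str, start: date, end: date) -> dict[str, float]:
    """Roll stored daily rows up to any inclusive [start, end] window.

    In the data lake this would be a SUM(...) over the same date range; week,
    month and year views all derive from one daily grain.
    """
    selected = [r for r in rows if r.asin == asin and start <= r.day <= end]
    return {
        "days": len(selected),
        "sales": sum(r.sales for r in selected),
        "ad_spend": sum(r.ad_spend for r in selected),
        "margin": sum(r.margin for r in selected),
    }


if __name__ == "__main__":
    rows = [
        DailyPnL("B0EXAMPLE", date(2026, 5, 1) + timedelta(days=i), 100.0, 20.0, 15.0, 5.0, 30.0)
        for i in range(30)
    ]
    print(window_totals(rows, "B0EXAMPLE", date(2026, 5, 1), date(2026, 5, 7)))
```

The analysis reads rows that are already on disk, so the same call with the same dates returns the same numbers. That is the reproducibility property described above.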

Most of the engineering went into making the data trustworthy, not the model: an analyst reasoning over broken inputs produces confident wrong answers, and those kill trust. Four problems, solved source by source:

  • Amazon doesn’t hand its data over. Reports must be requested, then wait in Amazon’s queue for anywhere from half a minute to half an hour before they can be collected and parsed. The pipeline tracks every report through that journey, so a nightly run that fails halfway resumes where it stopped.
  • The data rewrites itself. A sale attributed to an ad today may be re-attributed tomorrow; a return lands weeks after the order. So ingestion jobs re-pull the same dates repeatedly over the following month, treating yesterday’s numbers as a draft, not a fact.
  • Some numbers Amazon never provides. Brand campaigns report what they spent but not which product the spend belongs to, so the pipeline estimates the split – and carries the “estimated” label into the analysis, so no conclusion leans on it harder than it deserves.
  • The systems disagree. The profit feed and Amazon’s settlement reports differ on the same transactions; the ERP and Amazon’s live stock count differ by hours. Every number has one source designated as truth; the rest are demoted to cross-checks.
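Two of the points above can be sketched in a few lines: re-pulling a trailing window of dates every night, and splitting brand-campaign spend while carrying an "estimated" flag. This is an illustration under stated assumptions, not the project's code. The 30-day window is a guess chosen to cover late ad attribution and late returns. The proportional-to-weight split (for example, weighting by each product's sales) is a hypothetical allocation rule, because the text does not say how the split is estimated.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: ~30 days covers ad re-attribution (up to two weeks) and late returns (about a month).
REPULL_DAYS = 30


def dates_to_repull(today: date, window: int = REPULL_DAYS) -> list[date]:
    """Dates re-fetched tonight and overwritten in place (upsert), newest first.
    A day is treated as a draft until it ages out of this window."""
    return [today - timedelta(days=d) for d in range(1, window + 1)]


@dataclass(frozen=True)
class Allocation:
    asin: str
    spend: float
    estimated: bool  # travels with the number so the analysis can weigh it lightly


def split_brand_spend(total_spend: float, weights: dict[str, float]) -> list[Allocation]:
    """Hypothetical rule: split unattributed brand-campaign spend in proportion to
    a per-product weight. Every resulting share is flagged as estimated."""
    if not weights:
        return []
    total_weight = sum(weights.values())
    if total_weight == 0:
        even = total_spend / len(weights)
        return [Allocation(asin, even, True) for asin in weights]
    return [Allocation(asin, total_spend * w / total_weight, True) for asin, w in weights.items()]


if __name__ == "__main__":
    print(dates_to_repull(date(2026, 6, 11))[:3])
    print(split_brand_spend(100.0, {"B0A": 3.0, "B0B": 1.0}))
```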

The Analysis: Minutes Instead of an Afternoon

When a manager asks about a product now, the system does what they used to do by hand. One job fans out dozens of parallel queries – external APIs and the data lake at once – and merges everything a diagnosis could need into one structured context: the period and year-long comparison windows, unit economics per variation, keyword ranks and history, the closest competitors with prices and review counts, a listing-content audit, live inventory, the badge timeline.
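The fan-out can be sketched with `asyncio.gather`. Every source query runs concurrently, and the results merge under labeled keys. A source that fails is recorded as missing instead of aborting the run. The two fetchers are hypothetical stand-ins for real API and data-lake calls, and the "missing" list is an assumed convention.

```python
import asyncio
from typing import Any, Awaitable, Callable

Fetcher = Callable[[str], Awaitable[Any]]


async def build_context(asin: str, fetchers: dict[str, Fetcher]) -> dict[str, Any]:
    """Run all source queries concurrently and merge them into one labeled context."""
    names = list(fetchers)
    results = await asyncio.gather(
        *(fetchers[name](asin) for name in names), return_exceptions=True
    )
    context: dict[str, Any] = {"asin": asin, "missing": []}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            context["missing"].append(name)  # surfaced to the model, not silently dropped
        else:
            context[name] = result
    return context


# Hypothetical stand-ins for real sources (price tracker, keyword API, ...).
async def fake_price_history(asin: str) -> list[dict[str, Any]]:
    await asyncio.sleep(0.01)
    return [{"day": "2026-05-01", "price": 39.99}]


async def fake_keyword_ranks(asin: str) -> dict[str, int]:
    raise TimeoutError("keyword API timed out")


if __name__ == "__main__":
    ctx = asyncio.run(build_context("B0EXAMPLE", {
        "price_history": fake_price_history,
        "keyword_ranks": fake_keyword_ranks,
    }))
    print(ctx)
```

With `return_exceptions=True`, one slow or broken source cannot cancel the others. The model then sees which inputs are absent and can reason around them.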

Live from external APIs: Catalog & A+ content · Price & rank history · Keywords & search volume · Closest competitors · ERP stock & velocity · Badges & listing flags
From the data lake: Current-period financials · Comparison periods through the year · Unit economics per variation · Refund rates with denominators · Storage fees & inventory aging · Campaign performance & rank trends
From the team: Activity notes (what changed, when) · Screenshots, read as images
↓ dozens of parallel queries merge into one structured context
Claude with extended thinking, asking in the order an experienced operator would: 1. Listing health → 2. Unit volume → 3. Returns → 4. Ad efficiency → 5. Margin
↓ a report built to be acted on
Direct reason: what moved, in plain operational terms
Root cause: a dated chain of events, tied to the team’s own changes
Recommended action: one, definitive, not a list of maybes

It also reads the team’s notes. Managers log what they change – a price, a main image, an ad campaign – in their product tracker, often with screenshots, and the system passes those screenshots to Claude as images. It doesn’t just see that units fell this week – it sees the team changed the hero image two days before the drop, and connects the two.
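One way to hand dated notes and their screenshots to Claude is as content blocks in a single user message. The text and base64 `image` block shapes below follow the Anthropic Messages API. The `TrackerNote` type and the "[date] Team change:" wording are hypothetical, and the sketch only builds the payload without calling the API.

```python
import base64
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TrackerNote:
    """Hypothetical shape of one product-tracker entry."""
    day: date
    text: str
    screenshots: list[tuple[str, bytes]] = field(default_factory=list)  # (media_type, raw bytes)


def notes_to_content(notes: list[TrackerNote]) -> list[dict]:
    """Date-ordered content blocks: each note as text, followed by its screenshots as images."""
    blocks: list[dict] = []
    for note in sorted(notes, key=lambda n: n.day):
        blocks.append({"type": "text", "text": f"[{note.day.isoformat()}] Team change: {note.text}"})
        for media_type, raw in note.screenshots:
            blocks.append({
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": base64.standard_b64encode(raw).decode("ascii"),
                },
            })
    return blocks
```

Keeping each screenshot directly after its dated note lets the model tie an image (say, a new hero image) to the day it went live. It can then line that date up against the daily metrics.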

Claude runs with extended thinking and works through the questions in an experienced operator’s order: is the listing healthy, then volume, then returns, then advertising efficiency, then margin. The report comes back built to act on: the direct reason in plain terms, the root cause as a dated chain of events, and one definitive recommended action. The cross-signals earn their keep – the unit drop that traces to a competitor’s price cut and a lost badge in the same week, the margin slide that is really aging inventory crossing a surcharge tier, the “advertising problem” that is actually a suppressed variation.
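The report structure described above (direct reason, dated root-cause chain, exactly one action) can be enforced as an output contract when the model's answer is parsed. The sketch below assumes the answer arrives as a dict with hypothetical keys `direct_reason`, `root_cause` and `recommended_actions`. It shows the diagnostic order only as a prompt string, not as the project's actual prompt.

```python
from dataclasses import dataclass
from datetime import date

# The fixed order from the text, rendered as a prompt instruction (wording is illustrative).
DIAGNOSTIC_ORDER = ("listing health", "unit volume", "returns", "ad efficiency", "margin")
PROMPT_ORDER = "Diagnose in this order: " + ", then ".join(DIAGNOSTIC_ORDER) + "."


@dataclass(frozen=True)
class Event:
    day: date
    what: str


@dataclass(frozen=True)
class Diagnosis:
    direct_reason: str
    root_cause: tuple[Event, ...]
    recommended_action: str


def parse_diagnosis(raw: dict) -> Diagnosis:
    """Reject outputs that break the contract instead of passing them to managers."""
    actions = raw.get("recommended_actions", [])
    if len(actions) != 1:
        raise ValueError(f"expected exactly one recommended action, got {len(actions)}")
    events = tuple(Event(date.fromisoformat(e["day"]), e["what"]) for e in raw.get("root_cause", []))
    if not events:
        raise ValueError("root cause needs at least one dated event")
    days = [e.day for e in events]
    if days != sorted(days):
        raise ValueError("root-cause events must be in date order")
    reason = raw.get("direct_reason", "").strip()
    if not reason:
        raise ValueError("direct reason is required")
    return Diagnosis(reason, events, actions[0])
```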

Context engineering: decomposition first, formatting second

The naive approach is one giant prompt: hand the model every number and hope. It fails predictably – attention spread across everything lands on nothing, an early misreading contaminates everything built on it, and a wrong output can’t be localized to the step that broke.

So the work is decomposed before any prompt is written. Each workflow is a chain of focused LLM steps, each carrying its own prompt, its own narrow context, and its own output contract. In the pre-launch workflow, competitor analysis informs keyword strategy, keyword strategy feeds title generation, and titles, review analysis and the Fact Pack feed the image plan and A+ copy – each step seeing only what its decision requires. Even within a single analysis, the model follows a fixed diagnostic sequence into a fixed output structure. The boundary between the manager-facing summary and the full analysis is defined by a single constant that both the prompt and the parser reference, so the two can’t drift apart. Decomposition also keeps the system maintainable: every prompt is a versioned, labeled artifact in Langfuse, tuned and rolled back independently, so a fix to the title generator can’t regress the CoPilot.
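A chain of narrow steps can be expressed as data. Each step declares the only inputs it may see, and the runner passes it nothing else. In this sketch, `Step`, `run_chain` and the lambda stand-ins are hypothetical. Each `run` callable represents one prompt plus one LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    name: str
    needs: tuple[str, ...]        # the only keys this step is allowed to see
    run: Callable[[dict], str]    # stand-in for one focused prompt + LLM call


def run_chain(steps: list[Step], inputs: dict) -> dict:
    """Execute steps in order; each output becomes an input later steps may request."""
    state = dict(inputs)
    for step in steps:
        missing = [k for k in step.needs if k not in state]
        if missing:
            raise KeyError(f"{step.name} is missing inputs: {missing}")
        narrow = {k: state[k] for k in step.needs}  # nothing else leaks into the step
        state[step.name] = step.run(narrow)
    return state


if __name__ == "__main__":
    steps = [
        Step("competitors", ("seed_keyword",), lambda c: f"page-1 analysis for {c['seed_keyword']}"),
        Step("keywords", ("competitors",), lambda c: "longtail targets"),
        Step("titles", ("keywords", "fact_pack"), lambda c: "Title A | Title B"),
    ]
    print(run_chain(steps, {"seed_keyword": "knee brace", "fact_pack": "verified specs"}))
```

When an output is wrong, the declared `needs` narrow the search to one step and the handful of inputs it actually saw.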

The other half is what the data looks like when it reaches the model. Rates arrive with their sample size attached – “2 returns / 50 units (last 7 days)” next to the trailing 12-month figure – because a bare 4% from two returns triggers false alarms, and early versions proved it. Sales velocity comes in two labeled windows, one for stock risk and one for inventory runway. ERP and Amazon stock figures stay separately labeled, so the model reasons about the gap instead of averaging it away. Cost drift is flagged only across comparable periods with enough volume to mean something.
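A minimal sketch of two of these formatting rules follows. Rates always carry their counts and window, and cost drift is reported only for equal-length periods with enough units. The helper names, the 30-unit minimum and the 5% threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def rate_line(returns: int, units: int, window: str) -> str:
    """A rate is never shown without the counts that produced it."""
    if units == 0:
        return f"{returns} returns / 0 units ({window}) = n/a"
    return f"{returns} returns / {units} units ({window}) = {returns / units:.1%}"


def refund_context(r_recent: int, u_recent: int, r_ttm: int, u_ttm: int) -> str:
    return ("Refund rate: " + rate_line(r_recent, u_recent, "last 7 days")
            + "; " + rate_line(r_ttm, u_ttm, "trailing 12 months"))


@dataclass(frozen=True)
class Period:
    days: int
    units: int
    unit_cost: float


def cost_drift(prev: Period, cur: Period, min_units: int = 30, threshold: float = 0.05) -> Optional[float]:
    """Relative unit-cost change, or None when the comparison isn't meaningful."""
    if prev.days != cur.days:                       # only like-for-like windows
        return None
    if min(prev.units, cur.units) < min_units or prev.unit_cost <= 0:
        return None                                 # too little volume to trust
    change = (cur.unit_cost - prev.unit_cost) / prev.unit_cost
    return change if abs(change) >= threshold else None
```

For example, `refund_context(2, 50, 90, 3000)` renders the recent 4.0% next to a 3.0% trailing figure. With both visible, a reader can judge that two returns are weak evidence of a trend.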

Every run is traced end to end – inputs, reasoning, output. When a manager disputes a conclusion, we replay exactly what the model saw.
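The replay idea can be sketched with an in-memory store that freezes each run's inputs at record time. Later corrections to the data lake then cannot change what a replay shows. This does not use the Langfuse SDK. `TraceStore` and the "copilot-v12" prompt-version label are hypothetical.

```python
import hashlib
import json
import uuid
from typing import Any


class TraceStore:
    """In-memory stand-in for a tracing backend."""

    def __init__(self) -> None:
        self._runs: dict[str, dict[str, Any]] = {}

    def record(self, prompt_version: str, inputs: dict, reasoning: str, output: str) -> str:
        frozen = json.dumps(inputs, sort_keys=True, default=str)  # snapshot, not a live reference
        run_id = str(uuid.uuid4())
        self._runs[run_id] = {
            "prompt_version": prompt_version,
            "inputs": frozen,
            "inputs_sha256": hashlib.sha256(frozen.encode()).hexdigest(),
            "reasoning": reasoning,
            "output": output,
        }
        return run_id

    def get(self, run_id: str) -> dict[str, Any]:
        return dict(self._runs[run_id])

    def replay_inputs(self, run_id: str) -> dict:
        """Return exactly what the model saw, verifying the snapshot is untouched."""
        run = self._runs[run_id]
        if hashlib.sha256(run["inputs"].encode()).hexdigest() != run["inputs_sha256"]:
            raise RuntimeError("stored inputs were altered")
        return json.loads(run["inputs"])
```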


Workflows for Every Stage of a Product’s Life

“Why did profit drop” is a mature product’s question. A product that hasn’t launched needs different questions answered; a product three weeks post-launch different ones again. So the platform runs a dedicated workflow per stage: launch planning before go-live, a weekly CoPilot through the launch phase, and the deep performance analysis above once a product is established.

Product lifecycle
Before go-live: Pre-Launch Analysis (one run)
  • Page-1 competitor analysis & realistic monthly units
  • Launch price, mature price & launch budget
  • Keyword strategy: longtail now, high-volume later
  • Listing content grounded in a verified Fact Pack
  • Fulfillment call & first inbound quantity
Launch phase: Post-Launch CoPilot (weekly)
  • Stage-aware diagnosis: reviews → rank → profitability
  • Controlled ad ramp on longtail keywords
  • Exactly one time-boxed test at a time
  • Tasks, alerts & an audit log of every decision
  • Stock protection: true daily rate locked during throttling
Established: Performance Analysis (on demand + batches)
  • Full-context root-cause diagnosis
  • Cross-signals: ads, price, badges, fees, stock
  • Dated causal chain & one definitive action
  • Whole portfolio covered on schedule

Pre-Launch. Turns a single seed keyword into a launch plan. It studies page one of Amazon search, filters out unrealistic comparisons – wrong product type, or entrenched players with a thousand-plus reviews – and estimates what a newcomer can sell per month. From margin targets it derives a launch price and a mature price, adjusted for psychological thresholds, plus a launch budget to gain rank. It reverse-engineers the closest competitors’ keyword lists, keeps the keywords several of them rank for, and splits them into longtail targets for launch and high-volume keywords to defer until the product has traction. Then it drafts the listing – two title variants, a nine-image plan, A+ copy, a video script mapped against the top complaints in competitors’ reviews – every claim grounded in a “Fact Pack” of verified specs from the product’s R&D docs and manual; where a spec is missing, the output says so. It closes with a fulfillment recommendation and inbound quantity, and a listing manager reviews before go-live.
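Two of these pre-launch steps can be sketched as follows. The first filters page one down to realistic comparables. The second keeps keywords several competitors share and splits them into launch targets and deferred ones. The thousand-review cutoff comes from the text. The median-based unit estimate, the "at least 3 competitors" rule and the 2,000-searches longtail ceiling are hypothetical parameters.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median

MAX_REVIEWS = 1000  # from the text: entrenched players with a thousand-plus reviews are excluded


@dataclass(frozen=True)
class Competitor:
    asin: str
    same_product_type: bool
    reviews: int
    monthly_units: int
    keywords: frozenset[str]


def realistic(competitors: list[Competitor]) -> list[Competitor]:
    return [c for c in competitors if c.same_product_type and c.reviews < MAX_REVIEWS]


def newcomer_monthly_units(comps: list[Competitor]) -> float:
    # Hypothetical estimator: median monthly units of the realistic comparables.
    return float(median(c.monthly_units for c in comps)) if comps else 0.0


def split_keywords(comps: list[Competitor], search_volume: dict[str, int],
                   min_shared: int = 3, longtail_max_volume: int = 2000) -> tuple[list[str], list[str]]:
    """Keep keywords shared by several competitors; target longtail now, defer the rest."""
    counts = Counter(kw for c in comps for kw in c.keywords)
    shared = [kw for kw, n in counts.items() if n >= min_shared]
    longtail = sorted(kw for kw in shared if search_volume.get(kw, 0) <= longtail_max_volume)
    deferred = sorted(kw for kw in shared if search_volume.get(kw, 0) > longtail_max_volume)
    return longtail, deferred
```

Counting keywords only across the realistic set matters. Otherwise an entrenched listing's head terms would dominate the plan for a newcomer that can't yet rank for them.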

Weekly CoPilot. From go-live to maturity, it reviews every launch-phase product weekly, aware of each one’s stage: a listing with no reviews is pushed toward review acquisition, a ramping product toward rank building, a maturing one toward profitability. Each week it assembles the full picture – traffic, reviews, returns with buyer comments, inventory, fees, per-keyword ad performance, rank movement, budget state, the team’s notes, and its own previous output, so it remembers what it said last week – and applies the team’s playbook in strict priority order. A suppressed or unbuyable listing overrides everything. Negative review themes and return reasons are early warnings. Ad spend ramps in small deliberate steps on a handful of longtail keywords; a keyword that burns budget for two weeks without rank progress triggers a pivot. When progress stalls, it proposes exactly one time-boxed test – a price change, a coupon, a creative experiment – never five at once. Output comes in two layers: a readable weekly diagnosis with at most five actions, and structured tasks, alerts and an audit log – validated against typed schemas – recording which rules fired, which inputs were missing, and how confident the run was.
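The playbook's priority order can be sketched as rules evaluated top-down. An unbuyable listing short-circuits everything else. Actions are capped at five, and an audit record lists which rules fired and which inputs were missing. `WeekInputs`, the review-count proxy for stage (with a hypothetical 20-review threshold) and the "reduced"/"normal" confidence labels are all illustrative assumptions.

```python
from dataclasses import dataclass, field

MAX_ACTIONS = 5
RAMP_REVIEWS = 20  # hypothetical: below this a product is still building rank


@dataclass
class WeekInputs:
    buyable: bool
    reviews: int
    negative_review_themes: list[str] = field(default_factory=list)
    stale_keywords: list[str] = field(default_factory=list)  # two weeks of spend, no rank progress
    missing: list[str] = field(default_factory=list)


def stage_goal(reviews: int) -> str:
    if reviews == 0:
        return "review acquisition"
    if reviews < RAMP_REVIEWS:
        return "rank building"
    return "profitability"


def weekly_review(w: WeekInputs) -> dict:
    fired: list[str] = []
    actions: list[str] = []
    if not w.buyable:  # overrides everything else
        fired.append("listing_unbuyable")
        actions.append("Restore the listing: it is suppressed or unbuyable")
    else:
        if w.negative_review_themes:
            fired.append("negative_review_themes")
            actions += [f"Address recurring complaint: {t}" for t in w.negative_review_themes]
        if w.stale_keywords:
            fired.append("keyword_pivot")
            actions += [f"Pivot away from keyword: {k}" for k in w.stale_keywords]
        goal = stage_goal(w.reviews)
        fired.append(f"stage:{goal}")
        actions.append(f"Weekly focus: {goal}")
    return {
        "actions": actions[:MAX_ACTIONS],  # higher-priority rules come first, so they survive the cap
        "audit": {
            "rules_fired": fired,
            "missing_inputs": list(w.missing),
            "confidence": "reduced" if w.missing else "normal",
        },
    }
```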

One rule shows how much operational knowledge lives in the playbook. When stock runs low, Amazon throttles a product’s visibility and observed sales collapse – so a reorder based on the throttled rate under-buys and deepens the shortage. The CoPilot locks in the true daily rate from the last normal days before throttling and reports both numbers, with a standing note to purchasing: reorder on the true rate, not the observed one.
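The throttling rule reduces to a small calculation. Find the current throttled streak, average the last normal days before it, and report both rates. This sketch assumes a per-day `throttled` flag is already known. The 14-day lookback and the reorder formula (rate × (lead time + cover) − stock on hand) are hypothetical, not Vive's purchasing model.

```python
import math
from typing import Optional


def throttling_report(daily_units: list[int], throttled: list[bool], lookback: int = 14) -> dict:
    """True rate = mean of the last `lookback` unthrottled days before the current throttled streak."""
    if len(daily_units) != len(throttled):
        raise ValueError("series must align day by day")
    start = len(throttled)
    while start > 0 and throttled[start - 1]:  # walk back to where the current streak began
        start -= 1
    normal = [u for u, t in zip(daily_units[:start], throttled[:start]) if not t][-lookback:]
    current = daily_units[start:] or daily_units[-lookback:]
    true_rate: Optional[float] = sum(normal) / len(normal) if normal else None
    observed: Optional[float] = sum(current) / len(current) if current else None
    return {
        "true_daily_rate": true_rate,
        "observed_daily_rate": observed,
        "note": "Reorder on the true rate, not the observed one.",
    }


def reorder_quantity(rate: float, lead_time_days: int, cover_days: int, on_hand: int) -> int:
    # Hypothetical formula: cover lead time plus a target number of days, minus stock on hand.
    return max(0, math.ceil(rate * (lead_time_days + cover_days) - on_hand))
```

With ten normal days at 20 units, five throttled days at 4, and 200 units on hand, a 30 + 30 day cover orders 1,000 units on the true rate versus 40 on the observed one. That gap is the under-buy the rule prevents.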


How We Worked Together

That throttling rule didn’t come from us. It came from Vive’s managers, and so did the rest: the thousand-review cutoff for realistic competitors, mandatory Subscribe & Save for consumables, variation naming conventions that cut sizing-related returns, the discipline of one test at a time. We mapped how the team’s best people actually decide before writing a single prompt – the workflows encode their judgment, not a generic playbook. The collaboration runs continuously: the team’s activity notes feed every analysis, and every published analysis lands back in the product tracker the team already works in.

That same loop improves the system. When a conclusion looks off, managers log it in a shared feedback sheet. Each entry follows the same path: reproduce the claim against live data, trace it to the layer that caused it – a database query, a context-formatting decision, a prompt rule – fix that layer, and pin the fix with a permanent regression test so it can’t quietly return.

Most “AI mistakes” were context mistakes. Refund-rate alarms came from missing denominators. A velocity figure that “looked very off” came from a 7-day window where the team expected 30 days – the fix was presenting both, labeled. Each fix made the context more truthful, and the analyses better for every product, not just the disputed one. Prompt adjustments ship through Langfuse versions, checked against the traces that prompted them, then promoted – which is why trust grew instead of eroding after the first wrong answer.
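The velocity fix and its pinned regression test might look like this. Both windows are always rendered, each labeled with its length and purpose, and a test locks that format in. Mapping the 7-day window to stock risk and the 30-day window to inventory runway is an assumption, as are the function and test names.

```python
def velocity_lines(daily_units: list[int]) -> list[str]:
    """Render both velocity windows, each labeled, instead of one unlabeled figure."""
    lines = []
    for days, purpose in ((7, "stock risk"), (30, "inventory runway")):
        window = daily_units[-days:]
        rate = sum(window) / len(window) if window else 0.0
        # Label with the days actually available, so a short history isn't mislabeled.
        lines.append(f"Sales velocity, last {len(window)} days ({purpose}): {rate:.1f} units/day")
    return lines


def test_velocity_always_shows_both_labeled_windows() -> None:
    """Regression test pinned after the '7 vs 30 days' dispute (hypothetical name)."""
    lines = velocity_lines([10] * 23 + [3] * 7)
    assert lines[0] == "Sales velocity, last 7 days (stock risk): 3.0 units/day"
    assert lines[1] == "Sales velocity, last 30 days (inventory runway): 8.4 units/day"


if __name__ == "__main__":
    test_velocity_always_shows_both_labeled_windows()
    print("ok")
```

Because the test asserts the exact rendered lines, a later change that drops a label or a window fails in CI instead of resurfacing as a disputed analysis.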


The Result

The platform runs in production. Nightly jobs keep the data lake current. Analyses run on demand when a manager asks about a product and in scheduled batches across the portfolio, each tracing from raw daily metrics to a dated root cause and a definitive recommendation. New products move through the same platform from pre-launch planning through weekly CoPilot reviews until they graduate to the standard cadence.

Metric | Before | After
Analysis time per product | An afternoon | Minutes
Portfolio coverage | Only products in visible trouble | Every product, on schedule
Evidence assembly | Manual export and date alignment | Auto-assembled; the manager judges the conclusion
Reasoning record | None | Every run traced and replayable

The review team’s job changed shape. The hours once spent hopping between services and assembling spreadsheets now go into decisions, and coverage stopped being selective – every product gets looked at, on schedule, with the same rigor.

The lesson we keep relearning: the model is the smallest part. The data lake underneath it, the decomposition that keeps every model task focused, the context engineering that makes data hard to misread, and the feedback loop that turns every disagreement into a permanent improvement – that is what turns an impressive demo into an analyst the team trusts.

Something brought you here. Let's figure out if we can help.

Download our AI Launch Plan to see the proven framework from 20+ AI launches, or schedule an intro call to understand what you're building and how we might help.

"What truly stood out was Softcery's deep AI expertise. They were able to take our vision and turn it into a reality, and the final product has exceeded our expectations. Working with Softcery has been a game-changer for our business."

Jeanette Kreft

Managing Director, The Compliance Company & Upskill AI

"Softcery is not your typical software development agency – they're a full-scale product consultancy. The benefit of working with them is the collaboration."

Ryan Tabb

Founder, Bullseye