Sankey diagram

A sankey diagram shows how a quantity splits, merges, and flows from one set of categories to another. Each flow is drawn as a ribbon, and the width of the ribbon is proportional to the amount flowing. When you see a sankey, your eye immediately follows the thickest ribbons — those are the biggest flows, and they're usually the story.

When to use it

Use a sankey when all three are true:

You're tracking a quantity as it moves or transforms. Energy through a system, budget across departments, users between app screens, water through a watershed. If nothing is "moving," a sankey is the wrong chart.
The total is conserved. What goes into each node equals what comes out. If quantities appear or disappear along the way without accounting, the ribbons will look broken and the chart will mislead. This is the #1 rule — more on it below.
You have two or more stages. A single source splitting into categories is really a bar chart or treemap. Sankeys earn their keep when there are multiple stages of splitting and merging.

Common uses: website click paths, budget allocation, energy flow diagrams, customer lifecycle stages, survey response paths, supply chain flows.

Example data

Sankey data is a list of flows. Each flow has three things: a source, a target, and a value. That's it.

Here's a month of website traffic showing how visitors move from the homepage, through the site, to a signup or an exit:

Source	Target	Visitors
Homepage	Product page	4,200
Homepage	Pricing page	1,800
Homepage	Blog	2,000
Product page	Signup	1,500
Product page	Exit	2,700
Pricing page	Signup	900
Pricing page	Exit	900
Blog	Product page	600
Blog	Exit	1,400
Signup	Paid account	800
Signup	Free account	1,600

Read that table and you can already see the story: 8,000 visitors enter, they split into three pages, some reach the product page by way of the blog, 2,400 end up signing up, and one-third of those become paid. A sankey draws that same story as ribbons.
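
If you want to check those headline numbers rather than trust the mental arithmetic, a few lines of code will do it. This is a minimal sketch: the helper name sumFlows and the array-of-rows layout are illustrative choices, not something a particular tool requires.

```javascript
// Flows from the table above, as [source, target, visitors] rows.
const flows = [
  ["Homepage", "Product page", 4200],
  ["Homepage", "Pricing page", 1800],
  ["Homepage", "Blog", 2000],
  ["Product page", "Signup", 1500],
  ["Product page", "Exit", 2700],
  ["Pricing page", "Signup", 900],
  ["Pricing page", "Exit", 900],
  ["Blog", "Product page", 600],
  ["Blog", "Exit", 1400],
  ["Signup", "Paid account", 800],
  ["Signup", "Free account", 1600],
];

// Hypothetical helper: add up the value of every row that matches a predicate.
function sumFlows(rows, predicate) {
  return rows
    .filter(predicate)
    .reduce((total, [, , value]) => total + value, 0);
}

const entered = sumFlows(flows, ([source]) => source === "Homepage");
const signedUp = sumFlows(flows, ([, target]) => target === "Signup");
const paid = sumFlows(flows, ([, target]) => target === "Paid account");

console.log(entered);         // 8000
console.log(signedUp);        // 2400
console.log(paid / signedUp); // roughly 0.33, i.e. one-third
```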

The "flows must balance" rule

This is where sankeys go wrong. Everything flowing into a node must equal everything flowing out of it, or the chart lies about where quantities went.

Look at the Product page rows in the data above:

In: 4,200 (from Homepage) + 600 (from Blog) = 4,800
Out: 1,500 (to Signup) + 2,700 (to Exit) = 4,200

Those don't match: 600 visitors arrived at the Product page but are unaccounted for on the way out. In a real dataset this usually means:

Some visitors are still on the page at the end of the month (need a "Still browsing" node).
Some went to a destination you didn't track (need an "Other" node).
Your data has duplicate rows or missing rows (needs cleaning).

Once the data itself is clean, the fix is to add a node that absorbs the remaining difference, never to fudge the numbers. Many sankey libraries will draw an unbalanced chart without any warning: the ribbons look fine, but the widths entering and leaving a node won't add up.

Endpoints (the first and last nodes) are exempt: the Homepage has no "in" flow, and Paid account, Free account, and Exit have no "out" flow. That's expected.
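
The balance rule is easy to check automatically before you chart anything. Here is a minimal sketch, assuming links are objects with source, target, and value fields; findImbalances is a made-up helper name, not part of any sankey library.

```javascript
// Hypothetical helper: list every interior node whose inflow and outflow differ.
function findImbalances(links) {
  const inflow = new Map();
  const outflow = new Map();
  for (const { source, target, value } of links) {
    outflow.set(source, (outflow.get(source) ?? 0) + value);
    inflow.set(target, (inflow.get(target) ?? 0) + value);
  }

  const problems = [];
  for (const [node, totalIn] of inflow) {
    // Endpoints are exempt. A node with no outflow is a final destination,
    // and a node with no inflow never shows up as a key of `inflow`.
    if (!outflow.has(node)) continue;
    const totalOut = outflow.get(node);
    if (totalIn !== totalOut) {
      problems.push({ node, totalIn, totalOut, difference: totalIn - totalOut });
    }
  }
  return problems;
}

// The example data from the table above.
const links = [
  { source: "Homepage", target: "Product page", value: 4200 },
  { source: "Homepage", target: "Pricing page", value: 1800 },
  { source: "Homepage", target: "Blog", value: 2000 },
  { source: "Product page", target: "Signup", value: 1500 },
  { source: "Product page", target: "Exit", value: 2700 },
  { source: "Pricing page", target: "Signup", value: 900 },
  { source: "Pricing page", target: "Exit", value: 900 },
  { source: "Blog", target: "Product page", value: 600 },
  { source: "Blog", target: "Exit", value: 1400 },
  { source: "Signup", target: "Paid account", value: 800 },
  { source: "Signup", target: "Free account", value: 1600 },
];

const imbalances = findImbalances(links);
console.log(imbalances);
// One entry: Product page, totalIn 4800, totalOut 4200, difference 600
```

A positive difference means more came in than went out, which is the amount an "Other" or "Still browsing" node would need to absorb.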

Figure: a month of site traffic flowing from entry pages through to account creation. Ribbon widths are proportional to visitor counts.

How to read it

Sankey diagrams are read in three passes:

Trace the thickest ribbon. Follow it from left to right. That's the dominant path — usually the one that matters most.
Find the biggest split. Which node sends its flow to the most branches? That's a decision point in the process.
Find the biggest drop. Which flow shrinks the most between stages? That's your leak — the equivalent of a funnel chart's worst-converting step.

Colors in a sankey usually encode the source of each ribbon — so you can trace where a particular flow ends up, even after it merges with others. Some diagrams color by target instead. Neither is wrong, but the reader needs to know which. Always include a legend or label the nodes clearly.

Sankey vs. funnel — which to use?

These two charts often describe similar things. The rule:

Funnel is for a single linear path with no branching. Visited → viewed → added to cart → bought. Everything moves in one direction through the same stages.
Sankey is for flows that branch or merge. Visitors can go from Homepage to three different pages, and the Blog can feed visitors into the Product page alongside the Homepage. A funnel can't show that.

If someone asks for a funnel but the data has any branching, they actually want a sankey. If someone asks for a sankey but the data is one linear path, save them the complexity and give them a funnel.
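
That rule of thumb can be turned into a quick check on the data. A rough sketch, assuming links are objects with source and target fields; hasBranching and suggestChart are illustrative names. It only looks for splits and merges, so it says nothing about loops.

```javascript
// Hypothetical helper: true when any node has more than one outgoing link
// (a split) or more than one incoming link (a merge).
function hasBranching(links) {
  const outCount = new Map();
  const inCount = new Map();
  for (const { source, target } of links) {
    outCount.set(source, (outCount.get(source) ?? 0) + 1);
    inCount.set(target, (inCount.get(target) ?? 0) + 1);
  }
  const counts = [...outCount.values(), ...inCount.values()];
  return counts.some((count) => count > 1);
}

// Any split or merge means a funnel can't represent the data.
function suggestChart(links) {
  return hasBranching(links) ? "sankey" : "funnel";
}

// One linear path: every stage has a single way in and a single way out.
const checkout = [
  { source: "Visited", target: "Viewed" },
  { source: "Viewed", target: "Added to cart" },
  { source: "Added to cart", target: "Bought" },
];

// Part of the site-traffic example: Homepage splits, Product page merges.
const siteTraffic = [
  { source: "Homepage", target: "Product page" },
  { source: "Homepage", target: "Pricing page" },
  { source: "Blog", target: "Product page" },
];

console.log(suggestChart(checkout));    // "funnel"
console.log(suggestChart(siteTraffic)); // "sankey"
```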

How to build one

In a spreadsheet (Excel or Google Sheets)

Neither Excel nor Google Sheets has a built-in sankey chart. Your options:

SankeyMATIC (sankeymatic.com) is a free web tool. Paste your data as Source [value] Target lines and it generates an SVG you can download. This is what most non-technical users should use.
Flourish (flourish.studio) has a sankey template with a spreadsheet-style editor. Free tier, good output.
Excel add-ins exist (like "think-cell") but require licenses.

The spreadsheet stays useful as the data source — keep your flows in a three-column table (Source, Target, Value) and paste into whichever tool you pick.
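
If the flows already live in code rather than in a spreadsheet, producing the Source [value] Target lines is a one-liner. A small sketch: toSankeyMaticLines is a made-up name, and it writes values without thousands separators on the assumption that the tool wants plain numbers.

```javascript
// A few rows of the three-column table: [source, target, value].
const rows = [
  ["Homepage", "Product page", 4200],
  ["Homepage", "Pricing page", 1800],
  ["Homepage", "Blog", 2000],
];

// Hypothetical helper: one "Source [value] Target" line per flow.
function toSankeyMaticLines(flowRows) {
  return flowRows
    .map(([source, target, value]) => `${source} [${value}] ${target}`)
    .join("\n");
}

console.log(toSankeyMaticLines(rows));
// Homepage [4200] Product page
// Homepage [1800] Pricing page
// Homepage [2000] Blog
```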

With code (for a dashboard or report)

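Almost every major charting library has a sankey. The underlying data pattern, a list of nodes plus a list of links, is shared across them, though the exact field names vary (Plotly, for example, takes parallel arrays of labels, sources, targets, and values):

// Nodes and links: the general shape that Plotly, D3-sankey, ECharts, and most libraries build on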
const data = {
  nodes: [
    { id: "Homepage" },
    { id: "Product page" },
    { id: "Pricing page" },
    { id: "Blog" },
    { id: "Signup" },
    { id: "Exit" },
    { id: "Paid account" },
    { id: "Free account" },
  ],
  links: [
    { source: "Homepage",     target: "Product page", value: 4200 },
    { source: "Homepage",     target: "Pricing page", value: 1800 },
    { source: "Homepage",     target: "Blog",         value: 2000 },
    { source: "Product page", target: "Signup",       value: 1500 },
    { source: "Product page", target: "Exit",         value: 2700 },
    // ... etc.
  ],
};

Some libraries want source and target as numeric indices into the nodes array instead of string IDs. Check the docs — this is the second most common source of bugs after unbalanced flows.
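
Converting between the two forms is mechanical. Here is a minimal sketch that maps string IDs to positions in the nodes array; toIndexedLinks is a hypothetical helper, and the output field names are an assumption you should match to whatever your library documents.

```javascript
// Hypothetical helper: replace string IDs with indices into the nodes array.
function toIndexedLinks(nodes, links) {
  const indexById = new Map(nodes.map((node, index) => [node.id, index]));
  return links.map(({ source, target, value }) => {
    // Fail loudly on a typo instead of handing the library an undefined index.
    if (!indexById.has(source) || !indexById.has(target)) {
      throw new Error(`Link references an unknown node: ${source} -> ${target}`);
    }
    return {
      source: indexById.get(source),
      target: indexById.get(target),
      value,
    };
  });
}

const nodes = [{ id: "Homepage" }, { id: "Product page" }, { id: "Signup" }];
const links = [
  { source: "Homepage", target: "Product page", value: 4200 },
  { source: "Product page", target: "Signup", value: 1500 },
];

const indexed = toIndexedLinks(nodes, links);
console.log(indexed);
// First link:  source 0, target 1, value 4200
// Second link: source 1, target 2, value 1500
```

Throwing on an unknown ID is deliberate: a misspelled node name is easier to fix from an error message than from a chart with a missing ribbon.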

Tips

Limit to 3–5 stages. More than that and the ribbons cross so much that the chart becomes unreadable. If you have 10 stages, split into two sankeys or use a different chart.
Keep node counts manageable. A sankey with 50 nodes is a plate of spaghetti. Aggregate small categories into "Other" before charting.
Order nodes to minimize crossings. Most libraries auto-arrange nodes vertically to reduce ribbon crossings. If yours doesn't, do it manually — it's worth the effort.
Don't use a sankey for time-series data. "Revenue by month flowing into revenue by quarter" is not a sankey — nothing is actually flowing. That's a grouped bar chart.
Label the big flows directly. Putting a number like "1,500 visitors" on the widest ribbon saves the reader from estimating widths. Small ribbons can go unlabeled.
Watch for loops. Sankey libraries handle forward flows well (left-to-right). Flows that go backward (user returns to a previous page) break most libraries. If your data has loops, either use a network graph instead, or duplicate the node (e.g., "Product page (first visit)" and "Product page (return)").
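
For the last tip, you can find out whether your data contains a loop before a library chokes on it. This is a sketch using Kahn's topological-sort algorithm, assuming links with source and target fields; hasCycle is an illustrative name, not a library function.

```javascript
// Hypothetical helper: true when the links contain a loop of any length.
function hasCycle(links) {
  const outgoing = new Map(); // node -> list of nodes it links to
  const inDegree = new Map(); // node -> number of incoming links
  for (const { source, target } of links) {
    if (!outgoing.has(source)) outgoing.set(source, []);
    outgoing.get(source).push(target);
    inDegree.set(target, (inDegree.get(target) ?? 0) + 1);
    if (!inDegree.has(source)) inDegree.set(source, 0);
  }

  // Kahn's algorithm: keep removing nodes that have no remaining incoming links.
  const ready = [...inDegree]
    .filter(([, degree]) => degree === 0)
    .map(([node]) => node);
  let removed = 0;
  while (ready.length > 0) {
    const node = ready.pop();
    removed += 1;
    for (const next of outgoing.get(node) ?? []) {
      const degree = inDegree.get(next) - 1;
      inDegree.set(next, degree);
      if (degree === 0) ready.push(next);
    }
  }

  // Nodes that could never be removed are stuck inside a loop.
  return removed < inDegree.size;
}

const forwardOnly = [
  { source: "Homepage", target: "Blog" },
  { source: "Homepage", target: "Product page" },
  { source: "Blog", target: "Product page" },
  { source: "Product page", target: "Signup" },
];

// Same data plus a visitor flow that returns to an earlier page.
const withReturn = [
  ...forwardOnly,
  { source: "Product page", target: "Homepage" },
];

console.log(hasCycle(forwardOnly)); // false
console.log(hasCycle(withReturn));  // true
```

If it returns true, that's the cue to duplicate the node or switch to a network graph, as described above.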