Understanding Your Microbiome Data — A Visual Guide to 16S Downstream Analysis

16S microbiome analysis workflow running in RStudio — The full workflow running in RStudio · 13 sections · from QIIME2 artifacts to publication-ready figures

You have run your samples through QIIME2. The denoising is done, the feature table exists, taxonomy has been assigned. You are now looking at hundreds of ASVs across dozens of samples — and the question that always follows is the same: what do I do with this now, and what are these numbers actually telling me about biology?

This is a guide through the core analyses of 16S downstream work — not a step-by-step tutorial, but the conceptual map that makes the steps make sense. Every figure here was generated from the Moving Pictures dataset using the workflow described at the end of this page.

Dataset

All figures come from the Moving Pictures dataset (Caporaso et al. 2011, Genome Biology 12:R50) — 34 samples, two human subjects, four body sites (gut, left palm, right palm, tongue), sampled longitudinally over five months. It is the standard reference for microbiome methods development and validation.

First: look at who is there

Every investigation starts with a survey of the scene. Before any statistical test, the first question is compositional: which microbial phyla dominate, and does the pattern vary across groups? If it does not, everything downstream will be harder to interpret. If it does — as it almost always does across distinct body sites or conditions — you already have your first biological result.

The standard approach aggregates ASVs to phylum level, normalizes each sample to relative abundance so every bar sums to 100%, and displays the result as stacked bars — one column per sample, colored by phylum, arranged by group. It is the most immediate visual evidence you have about community structure.

Phylum Composition Across Groups — stacked bar chart for gut, left palm, right palm, tongue — Phylum composition across body sites · Moving Pictures dataset · Generated with phyloseq + ggplot2 · Firmicutes and Bacteroidetes dominate the gut; Actinobacteria and Proteobacteria are enriched at palm sites; the tongue shows a distinct profile with Fusobacteria absent from other sites.

The signal is immediate: the gut is defined by Firmicutes and Bacteroidetes, a ratio shaped by diet, age, and antibiotic history. The palms carry more Actinobacteria — a phylum that includes skin-adapted genera like Corynebacterium. The tongue occupies its own distinct space, with Fusobacteria visible only there.

"The microbial composition of a body site is not random. It is the product of decades of co-evolution, shaped by pH, oxygen, nutrient gradients, and immune pressure. When you plot a stacked bar chart, you are reading that history — compressed into a column of color."

This composition plot sets the expectation for everything that follows. Groups that look different here should show statistically significant differences in diversity analyses. Groups that look similar here should make you question what you are really comparing.

Alpha diversity: how rich is each community?

Alpha diversity answers a deceptively simple question: how diverse is the microbial community within a single sample? The answer changes depending on what aspect of diversity you are measuring.

🔢

Observed features

Raw count of unique ASVs in the sample. Intuitive, but sensitive to sequencing depth — deeper sequencing finds more rare taxa even if the community is identical.

⚖️

Shannon index

Combines richness (how many taxa) with evenness (how equally distributed). A sample dominated by two species has low Shannon even with 200 taxa present.

🎯

Simpson index

Measures the probability that two randomly chosen individuals belong to different species. More sensitive to dominant taxa, less sensitive to rare ones.

📐

Evenness (Pielou's J)

Pure evenness measure, independent of species count. A community with 100 taxa all at 1% abundance has perfect evenness; one with 99% dominated by a single taxon does not.

Running multiple indices in parallel gives a more complete picture. Shannon and Simpson capture different aspects of the same community: if they tell the same story, your result is robust. If they diverge, it suggests the difference between groups is driven by rare taxa (Shannon is more sensitive to rare taxa than Simpson).

Alpha diversity — composite plot with evenness, Shannon index, and Simpson index by body site — Alpha diversity — composite panel · Evenness (Pielou's J), Shannon index, and Simpson index · Kruskal-Wallis global test with Dunn post-hoc and FDR correction · Each panel shows the same 34 samples under a different diversity lens.

Rarefaction: the prerequisite you cannot skip

Before calculating any diversity metric, sequencing depth must be standardized. If one sample was sequenced to 14,000 reads and another to 1,100, comparing their Observed features is meaningless — you will always find more species when you look harder. Rarefaction solves this by randomly subsampling every sample down to the same depth.

The workflow rarefies to 90% of the minimum sample depth, with set.seed(42) for reproducibility. This is conservative — you retain nearly all samples — and easy to justify in a Methods section. Samples below the threshold are excluded, which is why inspecting read depth before this step is not optional.

The right statistical test for alpha diversity

Alpha diversity distributions are rarely normal. They are bounded, often skewed, and heteroskedastic across groups. The appropriate test is non-parametric: Kruskal-Wallis for the global comparison, Dunn's post-hoc test for pairwise comparisons, and Benjamini-Hochberg FDR correction throughout. The significance stars in the plot above reflect p-values after that correction — not raw p-values, which would dramatically overstate the evidence.

Beta diversity: how different are the communities from each other?

Where alpha diversity asks "how diverse is this sample?", beta diversity asks "how different is this sample from that one?" The answer is a distance matrix — one number for every pair of samples — and the choice of distance metric changes what biological signal you capture.

🔵

Bray-Curtis dissimilarity

Accounts for both presence and abundance. Two communities with the same taxa in very different proportions will be dissimilar. Sensitive to dominant taxa.

⚪

Jaccard distance

Presence/absence only. Two communities are similar if they share the same species, regardless of abundances. Captures community turnover and membership.

🟢

Weighted UniFrac

Incorporates phylogenetic relatedness and abundances. Communities with evolutionarily similar taxa are less dissimilar, even if exact species differ.

🟡

Unweighted UniFrac

Phylogenetic presence/absence — captures whether evolutionary lineages are shared, independent of how abundant they are in each sample.

Using Bray-Curtis and Weighted UniFrac together gives you a fuller picture. Bray-Curtis captures abundance-weighted compositional differences; Weighted UniFrac embeds those differences in an evolutionary context. When both metrics show the same pattern, the result is robust.

NMDS: mapping communities in two dimensions

The full distance matrix — 34 × 34 pairwise distances in this dataset — cannot be shown directly. NMDS (Non-metric Multidimensional Scaling) compresses it into two dimensions while preserving the rank order of distances. Points that cluster together have similar communities; points that are far apart do not. The stress value tells you how faithfully the 2D representation captures the true distances: below 0.10 is good.

Beta Diversity NMDS Bray-Curtis — four body site clusters with 95% confidence ellipses, stress 0.075 — Beta diversity · NMDS ordination · Bray-Curtis dissimilarity · Stress = 0.075 (good representation) · Gut samples (red circles) form a tight, well-separated cluster · Tongue (blue crosses) equally distinct · Left and right palm ellipses overlap, reflecting the similar skin microbiome between hand sites.

The NMDS reveals something biologically meaningful beyond the simple group separation: the gut cluster is tight — gut communities are consistent across samples and time points. The palm sites overlap with each other but not with gut or tongue — left palm and right palm share a similar skin environment. Tongue samples cluster separately from all three, reflecting the unique oral microbiome ecology.

PCoA: a complementary view of the same distances

PCoA (Principal Coordinates Analysis) preserves actual distances, not just their rank order. The axes are interpretable — they represent the directions of maximum variance in the distance matrix — and the percentage of variance explained by each axis is shown directly on the plot. When applied to Bray-Curtis, PCoA and NMDS should tell the same story. If they do not, investigate why.

Beta diversity PCoA — Bray-Curtis dissimilarity with variance explained per axis — Beta diversity · PCoA ordination · Bray-Curtis dissimilarity · Variance explained shown on each axis · The same four-group separation confirmed with a metric that preserves actual distance values.

PERMANOVA: is the separation statistically significant?

An ordination plot shows that groups look separated. PERMANOVA tests whether that separation is real. It asks: if you randomly shuffled group labels 999 times, how often would you see a separation at least as strong as what you actually observed?

The result is an R² (what fraction of total community variance is explained by your grouping variable) and a permutation-based p-value. For the Moving Pictures data, body site explains 57% of the variance in community composition (R² = 0.57, p = 0.001). This is a large and robust effect.

Never skip this validation

PERMANOVA is sensitive to differences in group dispersion, not only differences in group centroids. If one group has much more variable communities than another, PERMANOVA can return a significant result even when centroids are identical. Always run betadisper() alongside PERMANOVA. If the dispersion test is significant (p < 0.05), say so explicitly in your Results and Methods.

The R workflow

All of this, scripted and ready to run on your data.

NMDS, PCoA, PERMANOVA, betadisper, all alpha diversity metrics with Dunn post-hoc — 13 labeled sections, one structured R script. Point it at your QIIME2 output and run section by section. Each section produces its own output before you move on.

Get the workflow script ~~€99~~ €49.50 · instant download

Differential abundance: which taxa drive the difference?

Beta diversity tells you the communities are different. Differential abundance testing names the suspects. It is the transition from the general — "the groups differ" — to the specific: "here is the genus that distinguishes them, and here is the evidence."

For each genus, test whether its relative abundance differs significantly across groups. Two groups: Wilcoxon rank-sum test. Three or more groups: Kruskal-Wallis, followed by Dunn's post-hoc for pairwise comparisons. Throughout: Benjamini-Hochberg FDR correction. When you test 80 genera simultaneously, the probability of finding at least one false positive by chance is essentially 1. Correction is not optional — it is the difference between a finding and a noise artifact.

Differential abundance — top genera across body sites, Kruskal-Wallis FDR corrected — Top differentially abundant genera · Kruskal-Wallis with FDR correction (Benjamini-Hochberg) · Post-hoc Dunn's test for pairwise comparisons · Significance stars indicate adjusted p-values · Each panel shows the relative abundance distribution of one genus across all body sites.

The compositionality problem

Microbiome data is compositional. You are measuring relative proportions, not absolute abundances. If Bacteroides increases in relative abundance, every other genus mathematically decreases — even if absolute cell counts are unchanged. Standard non-parametric tests were not designed for this constraint and can generate spurious associations when compositional effects are strong.

Wilcoxon/Kruskal-Wallis on relative abundances is standard practice in published microbiome research and appropriate for exploratory analysis. For studies where compositionality is the central concern, ALDEx2 or ANCOM-BC provide more rigorous alternatives. Use these results as hypothesis-generating — be explicit about this in your Methods section.

Integrative correlations: how does the microbiome connect to your metadata?

The final layer of analysis asks whether specific taxa correlate with continuous metadata variables — clinical measurements, days since treatment, age, BMI, or any numeric variable in your sample data. This is where microbiome composition meets the biology you actually care about.

The approach: CLR (Centered Log-Ratio) transformation of genus-level counts, then Spearman rank correlation between each taxon and each metadata variable. Spearman is the right choice — it makes no distributional assumptions, handles the zero-inflated nature of microbiome data, and is robust to outliers. Global FDR correction across all taxon × variable pairs controls the false discovery rate for the entire correlation matrix simultaneously.

Taxa-metadata correlation heatmap — Spearman with CLR transformation and global FDR correction — Taxa–metadata correlation heatmap · Spearman ρ · CLR-transformed genus-level data · Top 30 most abundant genera · Global FDR correction applied across all comparisons · Only cells with FDR-corrected p < 0.05 are shown · *, **, *** indicate significance thresholds.

Blue cells: as the metadata variable increases, genus abundance decreases. Red cells: positive correlation. Intensity reflects the strength of association. Blank cells (grey background) did not survive FDR correction — they are excluded rather than shown as weak or ambiguous signals. This is a deliberate design choice: a correlation heatmap that shows everything is a map of noise.

Identifying consensus key taxa

After differential abundance testing and correlation analysis, the workflow synthesizes results into a consensus view: which genera appear in both analyses, and which shows the strongest combined signal? This is not a formal statistical step — it is a guided look at which taxa are most robustly associated with your study variables across multiple lines of evidence.

Consensus strongest correlations of key taxa — bar chart showing Spearman rho for top genera — Consensus view · Strongest Spearman ρ for key taxa identified across differential abundance and correlation analyses · Colour encodes direction and strength of association · This plot synthesizes multiple analyses into a single ranked view of the most robustly associated genera.

Interpretation

Correlation is not causation — in microbiome data even less than elsewhere. A significant Spearman ρ between a genus and a clinical variable reflects a statistical association that may be driven by a third factor (diet, medication, host genetics) rather than a direct biological relationship. Use these findings as hypotheses to test, not conclusions to report.

From concept to code

Every analysis in this guide — composition, alpha diversity, beta diversity with PERMANOVA, differential abundance with FDR, correlation heatmap, and the consensus synthesis — is implemented in a single structured R script organized into 13 labeled sections.

You open the script in RStudio. You set three variables in the configuration block — your folder path, your sample ID column, your grouping variable. You run one section at a time, check the output, and move on. The script handles everything else: package installation, data import from QIIME2 or CSV, filtering, rarefaction, all analyses, all figures, and two text files with a complete Methods section and software citations ready for journal submission.

# The only section you edit before running MY_WORKING_DIR <- "path/to/your/project" MY_SAMPLE_ID <- "sample-id" MY_GROUP_VAR <- "Treatment" # or Disease, Body_site, Timepoint… # QIIME2 .qza artifacts — or plain CSV tables if you have exported your data

The workflow script

Stop rebuilding the pipeline.
Start with one that works.

Every analysis in this guide, scripted and commented. Download, run on the validation dataset, verify the output, then point it at your own data.

Get the 16S Workflow Script ~~€99~~ €49.50 · instant download · individual license

Methods reference

McMurdie & Holmes (2013) phyloseq, PLoS ONE 8(4):e61217 · Oksanen et al. (2024) vegan · Kassambara (2023) rstatix · Barnett et al. (2021) microViz, JOSS 6(63):3201 · Revelle (2024) psych · Wickham et al. (2016) ggplot2 · Gu et al. (2016) ComplexHeatmap, Bioinformatics 32:2847 · Caporaso et al. (2011) Moving Pictures, Genome Biology 12:R50.