Big picture performance analysis using Lighthouse Parade
April 19, 2024
There are great tools for doing performance analysis on a single web page. We use Lighthouse and WebPageTest for this all the time. But what if you want to evaluate the performance characteristics of an entire site? It is tedious to run a report for each page manually, and the output is a jumble of individual reports that have to be analyzed one by one. We recently created Lighthouse Parade to solve this problem.
Lighthouse Parade
Lighthouse Parade is a Node.js command line tool that crawls a domain and gathers Lighthouse performance data for every page. With a single command, the tool will crawl an entire site, run a Lighthouse report for each page, and then output a spreadsheet with the aggregated data.
Under the hood, the tool is performing a combination of three tedious tasks:
- discovering all the URLs of a site,
- running Lighthouse reports for each, and
- aggregating the individual results from each page into a single spreadsheet.
Each row in the generated spreadsheet is a page on the site, and each individual performance metric is a column. This helps with high-level analysis because you can sort the rows by whichever metric you are analyzing.
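To make those three tasks concrete, here is a rough manual equivalent using the standalone Lighthouse CLI. This is only a sketch: the URLs are illustrative, and the aggregation step is left out entirely.
mkdir -p reports
for url in http://dfwfreeways.com/ http://dfwfreeways.com/some-page; do
  # Turn the URL into a safe file name and save one JSON report per page
  name=$(echo "$url" | tr -c 'a-zA-Z0-9\n' '_')
  npx lighthouse "$url" --output json --output-path "reports/${name}.json" --quiet
done
# ...and the per-page JSON reports would still need to be merged into one spreadsheet.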
Performance analysis workflow
Let’s go for a test drive! We’ll use npx, but you could also install the package globally.
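If you’d rather not type npx each time, installing the package globally should also work (assuming the published binary name matches the package name):
npm install --global lighthouse-parade
lighthouse-parade http://dfwfreeways.com/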
Scan a site
Lately, I’ve been mesmerized by the Dallas-Fort Worth Freeways website (http://dfwfreeways.com/) because it reminds me of what the internet looked like at the very beginning of my career, so that’s what I’ll use for this demonstration. But you can pick any site you want. If you test with a smaller site, the scan will complete more quickly.
npx lighthouse-parade http://dfwfreeways.com/
The Lighthouse Parade has begun! You should see output in the terminal as the crawler discovers the pages of the site and runs Lighthouse reports on them.
Starting the crawl...
✔ http://dfwfreeways.com/
...
Aggregating reports...
DONE!
A new lighthouse-parade-data/ directory will be created in the current directory. When the scan completes, its contents should look like this:
2020-12-10T10_30_15/
urls.csv
aggregatedMobileReport.csv
reports/
- urls.csv is a spreadsheet with a row for each URL found by the crawler.
- reports/ is a directory containing each individual Lighthouse report.
- aggregatedMobileReport.csv is a spreadsheet containing all the Lighthouse data inside reports/, with each individual page report transposed into a row.
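To sanity-check the output before opening it in a spreadsheet application, you can peek at the first few rows from the terminal (your timestamped directory name will differ from this example):
head -n 5 lighthouse-parade-data/2020-12-10T10_30_15/aggregatedMobileReport.csv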
Now what? Let’s open the aggregated spreadsheet and look at some questions we can easily answer with this data. If you are not following along locally, you can also view the data in this public Google Sheet.
How well does the site perform, overall?
Each page’s overall Lighthouse performance score is in column C in our spreadsheet. Note that the spreadsheet data contains normalized scores, represented as a decimal between 0 and 1. So 1 is perfect and 0 is the worst. This may not feel intuitive at first, especially if you have experience looking at raw values (example: 2.3 seconds). But the benefit of this abstraction is that 1 is always the best and 0 is always the worst, no matter which metric you are looking at in the report.
Let’s create a graph that will allow us to visualize all the data at once. A histogram is a chart that shows the frequency of values within a data set. Each bar represents a specific range, and the height of each bar shows how much of the data falls in that range. So all we have to do is create a histogram from the data in column C. This is on the Histograms tab of the example Google Sheet. I’m not walking you through the individual steps to create a histogram because Google Sheets has great documentation elsewhere, and I’ve already created a template that will do this for you automagically. (We’ll come back to that.)
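For a quick, rough version of the same distribution without leaving the terminal, something like the following awk sketch can bucket the scores. It assumes you run it from inside the timestamped output directory, that the performance score is the third column, and that the CSV has no quoted, comma-containing fields:
tail -n +2 aggregatedMobileReport.csv | awk -F, '
  { bucket = int($3 * 10); if (bucket > 9) bucket = 9; counts[bucket]++ }
  END { for (i = 0; i <= 9; i++) printf "%.1f-%.1f: %d\n", i/10, (i+1)/10, counts[i] }
'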
The following histogram shows the “shape” of the performance category scores for the entire DFWF site.
We can see from the cluster of tall bars on the right side that most of the pages perform well. But we also see a few short bars with lower scores to the left. Now that we have a general sense of how well the site performs, let’s dig a little deeper.
Which specific pages performed the worst?
There are many individual attributes that determine how performant a website is. The same process we are about to step through could be applied to study any of the data points provided by Lighthouse.
The histogram showed us that we have some pages that perform poorly, but it didn’t show us which pages were the culprits. We can answer this question with a few clicks in the spreadsheet by sorting the rows by column C. The way to do this will differ slightly between spreadsheet applications, but these are the instructions for sorting rows in a Google Sheet.
Now we can see precisely which pages were in the bars to the left side of the histogram.
Depending on your goals, you might focus on these lowest-scoring pages for further analysis. I often add a new column where I paste links to WebPageTest results for individual pages.
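If you prefer the command line, a rough equivalent of that spreadsheet sort might look like this. It assumes the URL is in the first column and the performance score in the third, again with no quoted fields, and lists the ten lowest-scoring pages:
tail -n +2 aggregatedMobileReport.csv | sort -t, -k3,3 -n | head -n 10 | cut -d, -f1,3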
For a more comprehensive understanding of how a site is performing, I recommend repeating this procedure for each of the three Core Web Vitals.
Spreadsheet template
Lighthouse Parade’s superpower is its ability to compile performance data from an entire site into a single spreadsheet that is convenient for analysis. If you are not a spreadsheet ninja wizard, don’t worry. We made a Google spreadsheet template that you can import your data into. It includes several handy features:
- Basic formatting
- The three Core Web Vitals columns are highlighted
- Histograms are automatically generated for each Core Web Vitals metric
- Averages and medians for the Core Web Vitals are calculated next to each histogram
We hope you find this tool useful. If you try it, please let us know what you think. And if you have questions or thoughts about how to make it better, we welcome your contributions.