Open-sourcing the grid emissions data needed for 24/7 clean energy

Startup Singularity is sharing data on hourly U.S. power-grid emissions. It wants energy markets, governments and companies like Google and Microsoft to use it — and to disclose their own data too.

high-voltage transmission lines in front of the cooling tower at a coal-fired power plant
(Bernd Wüstneck/Picture Alliance via Getty Images)

Wenbo Shi, CEO and founder of Singularity Energy, is on a quest to quantify the hour-by-hour carbon emissions from the U.S. electric grid. One way to do that, he says, is to open-source the data and methodologies that can help accurately pin down those emissions — a step that could ultimately contribute to reaching the lofty goal of 24/7 carbon-free electricity.

Today, the “ground-truth” data on how much carbon is being emitted from power plants on an hourly basis is incredibly hard to come by, Shi said. What’s more, the methods involved in using that data to make accurate claims about emissions reductions are largely opaque and proprietary, held behind private paywalls.


An open set of sources and methodologies for this kind of data could be vital for companies such as Google and Microsoft that have pledged to supply themselves entirely with 24/7 clean energy by 2030. It’s valuable for energy-trading mechanisms that are shifting to hourly, rather than annual, systems for clean-energy accounting. And it’s important for governments seeking publicly available and transparent data on which to base mandates aimed at pushing their electricity sectors’ carbon emissions toward zero as rapidly and cost-effectively as possible. 

That’s why Shi and his team of data scientists at Singularity have decided to disclose their data and methods to the public, through what they’re calling the Open Grid Emissions initiative. Its inaugural offering, unveiled on Tuesday, is a data set that Shi described as “the most precise one currently available measuring hourly U.S. electric grid generation and emissions, all derived using open-source, well-documented, and validated methodologies.”

The open-source aspect is critical. While many researchers and companies are providing hour-by-hour breakdowns of grid carbon emissions, few have opened up how they do it to public scrutiny and collaboration, Shi said in an interview. 

“If you want to use market-based accounting and want to pursue 24/7 [clean energy], then you need to ensure there’s a reliable, transparent… data infrastructure to support that,” he said. “Otherwise, it will become another cherry-picking game for corporations [with] everyone claiming they’re greener than the others. Who is going to claim they’re worse?”

Researchers have struggled to get a good handle on grid carbon emissions using the data that’s been available to date, said Shi, a former Harvard postdoctoral researcher. Singularity, founded in early 2019, has spent the past three years working with researchers, grid operators, utilities and companies on what he calls “the most traceable and scientific approaches to help them understand” the data.

“This is a broader data initiative that we’re hoping to invite others to work with us on to solve the problem,” he said.

The long, hard path to unveiling hour-by-hour U.S. grid emissions data

The Open Grid Emissions initiative’s first data set is the result of years of work by researchers at the Catalyst Cooperative, an energy data collection and analysis nonprofit, and Greg Miller, a University of California doctoral candidate who will soon become Singularity’s research and policy lead.

The project, which earned an award last year from the U.S. Environmental Protection Agency, was designed to solve some of the problems with today’s hourly measurements of U.S. grid emissions, Miller explained in an interview. He and Catalyst researchers looked to two sets of data to do that: continuous emission monitoring system (CEMS) data collected by EPA from power plants across the country, as well as monthly power plant data reported to the U.S. Energy Information Administration. 

CEMS data feeds into the EPA’s Emissions & Generation Resources Integrated Database, which is at the heart of many of the U.S. carbon emissions calculations available today. Lots of researchers use that data, as do companies that are publishing hour-by-hour estimates of the emissions-intensity of different parts of the U.S. power grid, Miller said. 

But there’s a big problem with that EPA data set, Miller said: It only provides annual averages of power plant emissions. Annual averages are a flawed source for calculating hourly emissions since hourly emissions can vary greatly depending on how power plants are being operated, how hot or cold it is when they’re running, and other factors that affect the relative efficiency of how they’re burning fuel to generate power. 

And because most of the existing sources of hourly emissions estimates based on that data aren’t open to public review, “we had no way of validating how good those hourly estimates were,” he said.

Miller and Catalyst turned to EPA’s CEMS data to get hourly breakdowns. But that data has its own shortcomings, said Zane Selvans, co-founder and “chief data wrangler” at Catalyst Cooperative. For one, while it includes the biggest fossil-fueled generators in the country, it doesn’t cover all power plants, he said.

What’s more, the CEMS data doesn’t break out the individual generation units that are often sited together at a single power plant complex, he said. That mix can include coal-fueled steam boilers, gas-fired combustion turbines and gas-fired combined-cycle turbines, all of which have very different emissions profiles depending on how they’re being run at any moment.

“Where we’ve done way more work, much more messy work, is linking the CEMS data to other EIA data that’s reported at the level of individual boilers and generators,” Selvans said. The EIA is the “only source of truth” for this granular data, known as Form EIA-860 data, but it comes in cumbersome spreadsheet formats, and “parsing those is annoying. But if you want to know the fuel content, the source, what the heat content of that fuel was, you have to go back to that original spreadsheet data.”

To add even more complexity, this EIA data comes in monthly formats, Miller said. Using it to generate hourly emissions estimates requires making some assumptions about how individual units inside a generation facility can be expected to operate from hour to hour. 

These are the kinds of assumptions that data analysts have to make quite often, Shi said, but before now, those assumptions haven’t always been made visible to the public. In the case of the Open Grid Emissions initiative, those assumptions and the methods used to turn that monthly data into hourly profiles are available for inspection and critique by anyone with the technical chops to delve into the source material. 
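To make the idea of turning monthly totals into hourly profiles concrete, here is a minimal sketch in Python. It is not the initiative’s published method. It assumes one simple rule for illustration: a unit’s monthly reported emissions are spread across the hours of that month in proportion to an hourly activity profile, such as metered output, and are spread evenly when no profile exists. The function name and the numbers are hypothetical.

```python
def shape_monthly_to_hourly(monthly_total, hourly_profile):
    """Spread a monthly total across hours in proportion to an hourly profile.

    monthly_total: emissions reported for the whole month (e.g. tons of CO2).
    hourly_profile: one non-negative weight per hour (e.g. metered output).
    Assumption for illustration: emissions scale linearly with the profile.
    """
    if not hourly_profile:
        raise ValueError("hourly_profile must contain at least one hour")
    profile_sum = sum(hourly_profile)
    if profile_sum <= 0:
        # No usable profile: fall back to a flat allocation across all hours.
        return [monthly_total / len(hourly_profile)] * len(hourly_profile)
    return [monthly_total * weight / profile_sum for weight in hourly_profile]


# Hypothetical four-hour "month" with 1,200 tons reported in total.
hourly_tons = shape_monthly_to_hourly(1200.0, [0.0, 10.0, 30.0, 20.0])
print(hourly_tons)
```

Whatever rule is chosen, the hourly values should still add up to the reported monthly total, which is one of the simplest checks an outside reviewer can run on an open methodology.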

Last week, the Midcontinent Independent System Operator (MISO), the Midwestern grid operator whose territory covers parts of 15 states from the Gulf of Mexico to Canada’s Manitoba province, unveiled the first publicly available application of this data set. 

It comes in the form of a map that allows users to click through to check the historical hourly generation and carbon emissions of large swaths of the U.S. power grid. Users can also click on individual power plants to discover their nameplate generating capacity and corresponding greenhouse gas emissions.

Screenshot of an interactive map of hourly grid and power plant carbon emissions enabled by Singularity’s Open Grid Emissions initiative (MISO)

The map represents just part of the functionality available from the data set that Singularity has released through the Open Grid Emissions initiative, Shi noted. “In the future, MISO wants to provide that kind of data in real time, similar to how they provide the pricing information today,” he said. “This is the first step.”

Why accurate hourly emissions data is important

What’s so important about being able to track power grid emissions accurately on an hourly basis? A study that Miller co-authored and published earlier this year offers one answer. It found that relying on annual averages can lead to underestimating or overestimating the carbon footprint of a consumer’s electricity usage by wide margins, depending on how much carbon-free energy is available on the grids they’re connected to. 

“For regions that have higher concentrations of renewable energy already on the grid, the emissions tend to vary a lot more throughout the day,” as more solar and wind power floods onto the grid during sunny or windy times, he said. One notable example is the California grid, where large and growing solar resources are driving big variations between hourly and annual emission factors, as this chart from Singularity indicates.

Chart of hourly versus annual carbon emissions factors for California grid operator CAISO
The carbon emission factors for power supplied by California grid operator CAISO vary greatly from hour to hour (purple area) compared to the annual average (red line). (Singularity)

In these circumstances, using an annual emissions factor can underestimate or overestimate true carbon emissions by as much as 35%, Miller said. 
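The gap between the two accounting approaches is simple arithmetic, sketched below in Python. The emission factors and consumption figures are invented for illustration and are not taken from Singularity’s data, and the average factor is computed as a plain mean of the hourly factors, which is a simplifying assumption (published annual factors are typically weighted by generation).

```python
def footprint_from_hourly_factors(consumption_mwh, factors_kg_per_mwh):
    """Sum each hour's consumption multiplied by that hour's emission factor."""
    if len(consumption_mwh) != len(factors_kg_per_mwh):
        raise ValueError("consumption and factors must cover the same hours")
    return sum(c * f for c, f in zip(consumption_mwh, factors_kg_per_mwh))


def footprint_from_average_factor(consumption_mwh, factors_kg_per_mwh):
    """Apply a single average factor to total consumption.

    Assumption: the average is a plain mean of the hourly factors.
    """
    average_factor = sum(factors_kg_per_mwh) / len(factors_kg_per_mwh)
    return sum(consumption_mwh) * average_factor


# Hypothetical grid: dirtier at night (400 kg/MWh), cleaner at midday (100 kg/MWh).
factors = [400.0, 400.0, 100.0, 100.0]
night_heavy_load = [3.0, 3.0, 1.0, 1.0]  # MWh consumed in each hour
day_heavy_load = [1.0, 1.0, 3.0, 3.0]

for label, load in (("night-heavy", night_heavy_load), ("day-heavy", day_heavy_load)):
    hourly = footprint_from_hourly_factors(load, factors)
    averaged = footprint_from_average_factor(load, factors)
    error_pct = 100.0 * (averaged - hourly) / hourly
    print(f"{label}: hourly={hourly:.0f} kg, averaged={averaged:.0f} kg, error={error_pct:+.1f}%")
```

In this toy case the same average factor understates the footprint of the night-heavy consumer and overstates that of the day-heavy one, which is the direction of error the study describes.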

Countries and power consumers with 100 percent clean energy goals need something better than annual averages to guide how they invest in new carbon-free energy generation that can fill the dirtier gaps in their power supply mix, he said. They also need insight into the hour-by-hour emissions profile of their grid supply to guide when they should use more power, avoid using power or store it for later use. 

Ben Gerber, CEO of the Midwest Renewable Energy Tracking System (M-RETS), a nonprofit that tracks and trades renewable energy certificates, said Singularity is providing “a valuable data source” for the kind of work it’s doing with companies including Google, which last year partnered with M-RETS to launch North America’s first effort to track hourly clean energy certificates.

M-RETS is working with Singularity to expand its own use of open-source data, which should be able to “guide those decarbonization strategies” of companies such as Google, and “ensure we’re doing things efficiently and cost-effectively,” Gerber said.

That’s one of many ways the Open Grid Emissions initiative’s data could be put to use, Miller said. Another is grid operators using it “to visualize data about their operations.”

That’s in keeping with MISO’s plan, laid out in its 2021 MISO Forward report, to begin offering “time and location emissions data in a similar way to how MISO publishes actual energy price data.” That kind of visibility could become more valuable as MISO and other U.S. grid operators begin to consider how they might integrate carbon pricing into the operation of their markets.

Miller also cited the value of data for environmental justice and policy research, to “pinpoint where emissions are happening.” Christina Gosnell, president and co-founder of Catalyst Cooperative, highlighted the potential for more granular data to inform the work of environmental advocates who are challenging utility plans to build more fossil-gas plants or keep existing plants running, for example.

“You used to be able to know about a whole collection of generators,” Gosnell said. “Now you know what individual generators are doing, and those generators have wild variations” in terms of the financial costs to utility ratepayers, air pollution that affects nearby communities and carbon emissions.

From measuring carbon emissions to paying for them 

There’s an important difference between the raw data that the Open Grid Emissions initiative is making available and the ways in which it can be put to use, Shi emphasized. “Once you have the emissions data from every site (measured data, ground-truth data), then there’s another layer, which is the modeling layer.”

The variety of modeling methodologies used to convert raw data into sought-after information introduces new variables, he said. One example is market-based carbon accounting of the kind that groups like the EnergyTag initiative and the United Nations 24/7 Carbon-Free Energy Compact are working on. 

Today, corporations sign power-purchase agreements (PPAs) with clean energy project developers and track how much power those projects generate via renewable energy certificates (RECs) in North America or guarantees of origin in the European Union. Shifting those certificates of corporate clean-energy purchases from annual to hourly measurements is the next step in clearing what Killian Daly, EnergyTag general manager, described as the “massive data fog” that’s obscuring the optimal ways for companies and governments to make their investments.

“It’s very clear that the tens of billions of dollars going into PPAs and RECs today are not being invested as wisely as they should be because they’re not looking at the temporal value of the energy they’re supporting,” he said. “Unless they’re changed, energy markets will not supply clean energy every hour, everywhere.”
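The difference between annual and hourly matching can be shown with a short Python sketch. The scoring rule here is a simplified assumption for illustration, not EnergyTag’s or any registry’s specification: clean energy counts toward a given hour’s load only up to that hour’s consumption, and any surplus cannot be carried to other hours. All figures are hypothetical.

```python
def annual_matching_share(load_mwh, clean_mwh):
    """Share of total load covered when clean energy is netted over the whole period."""
    total_load = sum(load_mwh)
    if total_load == 0:
        raise ValueError("total load must be greater than zero")
    return min(1.0, sum(clean_mwh) / total_load)


def hourly_matching_share(load_mwh, clean_mwh):
    """Share of total load covered when clean energy is matched hour by hour.

    Assumption: surplus clean energy in one hour cannot offset another hour.
    """
    if len(load_mwh) != len(clean_mwh):
        raise ValueError("load and clean supply must cover the same hours")
    total_load = sum(load_mwh)
    if total_load == 0:
        raise ValueError("total load must be greater than zero")
    matched = sum(min(load, clean) for load, clean in zip(load_mwh, clean_mwh))
    return matched / total_load


# Hypothetical buyer with flat demand and a solar-heavy purchase.
load = [10.0, 10.0, 10.0, 10.0]
clean = [0.0, 20.0, 20.0, 0.0]
print(f"annual-style matching: {annual_matching_share(load, clean):.0%}")
print(f"hourly matching:       {hourly_matching_share(load, clean):.0%}")
```

Under these made-up numbers the buyer looks fully covered on a netted basis but only half covered hour by hour, which is the kind of gap hourly certificates are meant to expose.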

“You cannot do the math unless you have hourly emissions data,” Daly said. “The Open Grid Emissions initiative is, for what I believe is the first time in the U.S., publishing an open data set to say, ‘This is where you are, this is the average mix, this is the average emissions, of that time and place, of my consumed electricity.’”

“The specific data on location is an important differentiation,” he said. “They’re taking into account grid topology and estimated power flows.” While companies may purchase energy from power producers, they consume the electricity on the grids they’re connected to, and those grids are constantly in the process of importing and exporting power across operational and regional borders.

“If you want to use consumption-based accounting, then you need to understand the power flows in the region,” Shi said. Singularity’s Carbonara platform, its commercial subscription product, combines its emissions data with information from the EIA’s Hourly Electric Grid Monitor, which tracks the flow of power between the country’s myriad grid operators and balancing authorities. That allows it to yield maps that measure the emissions mix of in-region and imported energy for different grids across the hours of the day, as with this snapshot from August 2020.

Map of generated versus consumed electricity carbon emissions factors for major U.S. grid regions
Electricity imports and exports can yield significant differences between generated power and consumed power across U.S. grid regions. (Singularity)
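A rough sense of how imports shift a region’s consumed emissions can be had from the Python sketch below. It is a deliberately simplified assumption, not a description of how Carbonara works: for a single hour, local generation and each import are pooled, and the consumed emission factor is the energy-weighted average of the pool. It ignores exports and the fact that a neighbor’s own mix depends on its neighbors in turn. All names and numbers are hypothetical.

```python
def consumed_emission_factor(local_generation_mwh, local_factor, imports):
    """Energy-weighted emission factor of the power consumed in one hour.

    local_generation_mwh: energy generated inside the region.
    local_factor: emission factor of that generation (kg CO2 per MWh).
    imports: list of (mwh, factor) pairs, one per neighboring region.
    Assumption: exports and multi-hop flows between regions are ignored.
    """
    total_energy = local_generation_mwh + sum(mwh for mwh, _ in imports)
    if total_energy <= 0:
        raise ValueError("total energy must be greater than zero")
    total_emissions = local_generation_mwh * local_factor + sum(
        mwh * factor for mwh, factor in imports
    )
    return total_emissions / total_energy


# Hypothetical hour: a fairly clean region importing from a dirtier neighbor.
generated_factor = 200.0  # kg CO2 per MWh for in-region generation
consumed_factor = consumed_emission_factor(800.0, generated_factor, [(200.0, 700.0)])
print(f"generated: {generated_factor:.0f} kg/MWh, consumed: {consumed_factor:.0f} kg/MWh")
```

Even in this toy case, a region whose own plants emit 200 kg per MWh ends up consuming power at 300 kg per MWh once a 20% import share from a dirtier neighbor is counted.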

The analysis of how power flows influence emissions ratings across the grid could get even more granular, Shi said. One example is Singularity’s work with New England utility Eversource, which is looking at ways to “overlay this carbon layer on top of the power flow, so you can start generating consumption-based emissions for not only subregions but even down to the nodal level, or if you want, even down to the substation,” he said.

Eventually, this kind of consumption-based emissions tracking will need to be incorporated into the Greenhouse Gas Protocol’s Scope 2 guidance, the global standard for how companies track and report the emissions from their purchased energy. It could also play a role in corporate climate disclosures being explored by the U.S. Securities and Exchange Commission, or in the building emissions reporting requirements being set by cities, like New York City’s Local Law 97, Miller said. 

More finely grained emissions tracking data will also be vital for establishing baselines for “marginal emissions” accounting, Shi said. This term describes a variety of methods by which energy investors and consumers measure the emission-reduction impact of decisions about where to site new clean energy, or what hours of the day they choose to use more or less electricity, compared to a world in which they didn’t make those decisions.

These kinds of marginal impact calculations are something of “a rabbit hole” for data scientists since they are built on hypothetical presumptions about what would have happened in alternate scenarios, Shi said. But as Miller noted, that just makes “transparency and trust around data sources” even more important.

“We’re not trying to claim the data we’re presenting is perfect,” Miller said. “This is a really complex issue. Part of this initiative isn’t just the data set but making this open-source repository of all the code and methods that went into it, and a discussion of all the open questions that we’ve already identified that aren’t perfect that need to be fixed but are identified for future research.”

Shi agreed that Singularity’s data scientists don’t have all the answers. “But this data set should enable us to have those conversations. We want to empower people who have the same problems.”

Jeff St. John is director of news and special projects at Canary Media.