EPA’s ENERGY STAR building benchmarking scores have little validity

I have been spending this week at the American Council for an Energy Efficient Economy’s (ACEEE) Summer Study on Energy Efficiency in Buildings.  Yesterday I presented a paper that summarizes my findings from an 18-month study of the science behind the EPA’s ENERGY STAR building rating systems.

The title of my paper, “ENERGY STAR building benchmarking scores: good idea, bad science,” speaks for itself.  I have replicated the EPA’s models for 10 of their 11 conventional building types: Residence Hall/Dormitory, Medical Office, Office, Retail Store, Supermarket/Grocery, Hotel, K-12 School, House of Worship, Warehouse, and Senior Care.  I have not yet analyzed the Hospital model — but I have no reason to believe the results will be different. (Data for this model were not available at the time I was investigating other models.  I have since obtained these data through a Freedom of Information Act request but have not yet performed the analysis.)

There are many problems with these models that cause the ENERGY STAR scores they produce to be both imprecise (i.e., subject to large random uncertainty in either direction) and inaccurate (i.e., wrong due to errors in the analysis).  The bottom line is that, for each of these models, the ENERGY STAR scores they produce are uncertain by about 35 points!  That means there is no statistically significant difference between a score of 50 (the presumed mean for the US commercial building stock) and 75 (an ENERGY STAR certifiable building).  It also means that any claims made for energy savings based on these scores are simply unwarranted.  The results are summarized by the abstract of my paper, reproduced below.


The EPA introduced its ENERGY STAR building rating system 15 years ago. In the intervening years it has not defended its methodology in the peer-reviewed literature nor has it granted access to ENERGY STAR data that would allow outsiders to scrutinize its results or claims. Until recently ENERGY STAR benchmarking remained a confidential and voluntary exercise practiced by relatively few.

In the last few years the US Green Building Council has adopted the building ENERGY STAR score for judging energy efficiency in connection with its popular green-building certification programs.  Moreover, ten US cities have mandated ENERGY STAR benchmarking for commercial buildings and, in many cases, publicly disclose the resulting ENERGY STAR scores.  As a result of this newfound attention, the validity of ENERGY STAR scores and the methodology behind them has taken on elevated relevance.

This paper summarizes the author’s 18-month investigation into the science that underpins ENERGY STAR scores for 10 of the 11 conventional building types. Results are based on information from EPA documents, communications with EPA staff and DOE building scientists, and the author’s extensive regression analysis.

For all models investigated ENERGY STAR scores are found to be uncertain by ±35 points. The oldest models are shown to be built on unreliable data and newer models (revised or introduced since 2007) are shown to contain serious flaws that lead to erroneous results. For one building type the author demonstrates that random numbers produce a building model with statistical significance exceeding those achieved by five of the EPA building models.

In subsequent posts I will elaborate on these various findings.

USGBC Continues to “cherry pick” LEED energy data

At the 2007 GreenBuild Conference the USGBC released the results of their first major study of energy consumption by LEED-certified buildings.  There they presented conclusions from the now-infamous study conducted by the New Buildings Institute (paid for by the USGBC and EPA) which, based on data “volunteered by willing building owners” for only 22% of the eligible buildings certified under LEED NC v.2, concluded that LEED-certified buildings, on average, were demonstrating the anticipated 25-30% savings in (site) energy.

NBI’s analysis and conclusions were subsequently discredited in the popular media by Henry Gifford and in the peer-reviewed literature by me [see IEPEC 2008 and Energy & Buildings 2009].  NBI’s analytical errors included:

  1. comparing the median of one energy distribution to the mean of another;
  2. comparing energy used by a medium energy subset of LEED buildings with that used by all US commercial buildings (which included types of buildings removed from the LEED set);
  3. improper calculation of the mean (site) energy intensity for LEED buildings and comparing this with the gross mean energy intensity from CBECS;
  4. looking only at building energy used on site (i.e., site EUI) rather than on- and off-site energy use (i.e., source EUI).

To NBI’s credit they made their summary data available to others for independent analysis with no “strings attached.”  In the end, even the data gathered by NBI, skewed towards “better performing” LEED buildings by the collection method, demonstrated no source energy savings by LEED buildings when properly analyzed.  LEED office buildings demonstrated site energy savings of 15-17% — about half that claimed by NBI, the difference being associated with NBI’s improper averaging method.  This site energy savings did not translate into a source energy savings because LEED buildings, on average, used relatively more electric energy, and the off-site losses associated with this increased electric use wiped out the on-site energy savings.

The lack of representative building energy data was addressed in LEED v.3 (2009) by instituting a requirement that all LEED certified buildings supply the USGBC with annual energy consumption data for five years following certification.  Never again would the USGBC have to publish conclusions based on data volunteered by 1 in 5 buildings.  Expectations were high.

But what has this produced?  The USGBC has learned from their experience with NBI — not to hand over such an important task to an outside organization because you can’t control the outcome.  NBI’s analysis was scientifically flawed — but it was transparent, and such transparency gave critics ammunition to reach different conclusions.  Nowadays the USGBC simply issues carefully packaged sound bites without supplying any details to support their conclusions.  There isn’t even a pretense of conducting scientifically valid analysis.

Consider the most recent claims made by the USGBC at the 2013 Greenbuild conference, summarized by Tristan Roberts in “LEED buildings above average in latest energy data release.”  Roberts asserts the following:

  1. The USGBC has received energy data from 1,861 certified buildings for the 12-month period July 2012 – June 2013;
  2. About 70% of these were certified through LEED-EBOM (existing buildings);
  3. 450 of these buildings reported their data through the EPA’s Portfolio Manager;
  4. The “building-weighted” (or un-weighted) average source EUI for these 450 buildings is 158 kBtu/sf;
  5. This average is 31% lower than the national median source EUI;
  6. 404 (of the 450) buildings above were eligible for (and received) ENERGY STAR scores;
  7. The average ENERGY STAR score for these 404 buildings was 85.

In addressing the above claims it is hard to know where to begin.  Let’s start with the fact that the USGBC only provides energy information for 450 (or 24%) of the 1,861 buildings for which it has gathered data.  Is this simply because it is easier to summarize data gathered by Portfolio Manager than data collected manually?  If so I willingly volunteer my services to go through the data from all 1,861 buildings so that we can get a full picture of LEED building energy performance — not just a snapshot of the 24% of buildings which “self-select” to benchmark through Portfolio Manager.  (The EPA has previously asserted that buildings that benchmark through Portfolio Manager tend to be skewed towards “better performing” buildings and are not a random snapshot of commercial buildings.)

Next, consider the “un-weighted” source EUI figure for the 450 buildings.  This is a useless metric.  All EUIs reported by CBECS for sets of buildings are “gross energy intensities,” equivalent to the gsf-weighted mean EUI (not the un-weighted or building-weighted mean EUI).  This was a major source of error in the 2008 NBI report — leading NBI to incorrectly calculate a 25-30% site energy savings rather than the actual 15-17% site energy savings achieved by that set of LEED buildings.
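The distinction matters numerically.  “Gross energy intensity” is total energy divided by total floor area, which equals the gsf-weighted mean EUI; the building-weighted mean instead counts a 10,000 sf building and a 1,000,000 sf building equally.  A minimal sketch, with two made-up buildings:

```python
# Two hypothetical buildings: (floor area in sf, annual source energy in kBtu).
buildings = [
    (10_000, 2_000_000),     # small building, EUI = 200 kBtu/sf
    (1_000_000, 50_000_000), # large building, EUI = 50 kBtu/sf
]

euis = [energy / sf for sf, energy in buildings]

# Building-weighted ("un-weighted") mean EUI: every building counts equally.
building_weighted = sum(euis) / len(euis)  # (200 + 50) / 2 = 125 kBtu/sf

# Gross energy intensity: total energy over total floor area -- the
# gsf-weighted mean EUI, dominated here by the large building.
gross = sum(e for _, e in buildings) / sum(sf for sf, _ in buildings)

print(building_weighted)  # → 125.0
print(round(gross, 1))    # → 51.5
```

The two figures differ by more than a factor of two for this pair, which is exactly why substituting one for the other inflates apparent savings.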

Consider the assertion that the 158 kBtu/sf source EUI figure is 31% lower than the median source EUI (presumably for all US commercial buildings).  To be correct this would require the median source EUI for all US commercial buildings to be 229 kBtu/sf.  This is rubbish.  The best way to obtain such a median EUI figure is from the 2003 CBECS data.  The Energy Information Administration (EIA) does not report source energy figures in any of its CBECS reports.  But the EIA does report site and primary electric energy used by buildings, and these may be combined to calculate source EUI for all 2003 CBECS sampled buildings.  This yields a median source EUI of 118 kBtu/sf for the estimated 4.9 million commercial buildings.  If you instead restrict this calculation to buildings with non-zero energy consumption, you find these estimated 4.6 million buildings have a median source EUI of 127 kBtu/sf — way below the 229 kBtu/sf figure asserted by the USGBC.  This USGBC claim is patently false.  Of course the USGBC may be referring to the median source EUI of some unspecified subset of U.S. buildings.  By choosing an arbitrary subset you can justify any claim.  And if you don’t specify the subset — well, the claim is nothing more than noise.
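The calculation described above can be sketched as follows.  The records and weights below are made-up stand-ins for CBECS microdata, which attaches a sample weight to each surveyed building indicating how many US buildings it represents:

```python
# Hypothetical CBECS-style records: each carries a sample weight (the number
# of US buildings it represents), non-electric site energy and primary
# electric energy (both in kBtu), and floor area in sf. All values made up.
records = [
    # (weight, site_fuel_kbtu, primary_elec_kbtu, sf)
    (150_000, 300_000, 501_000, 10_000),   # source EUI = 80.1
    (200_000, 100_000, 1_336_000, 8_000),  # source EUI = 179.5
    (100_000, 0, 0, 5_000),                # reports zero energy use
]

def weighted_median(pairs):
    """Median of the represented population; pairs are (weight, value)."""
    pairs = sorted(pairs, key=lambda p: p[1])
    half = sum(w for w, _ in pairs) / 2.0
    running = 0.0
    for w, v in pairs:
        running += w
        if running >= half:
            return v

# Source EUI = (site fuel + primary electric) / floor area.
pairs = [(w, (fuel + elec) / sf) for w, fuel, elec, sf in records]
print(weighted_median(pairs))  # → 80.1

# Restricting to buildings with non-zero energy can shift the median upward,
# as in the 118 vs. 127 kBtu/sf comparison above.
nonzero = [(w, v) for w, v in pairs if v > 0]
print(weighted_median(nonzero))  # → 179.5
```

The point of the weights is that the median must describe the full population of ~5 million buildings, not just the ~6,000 sampled ones.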

What about the average ENERGY STAR score of 85?  Is this impressive?  The answer is no.  Even if you believed that ENERGY STAR scores were, themselves, meaningful, such an average would still mean nothing.  ENERGY STAR scores are supposed to represent percentile rankings in the U.S. building population.  Since there are 4.8 million buildings, by definition we would expect 10% of these (or 480,000) to rank in the top 10% and another 480,000 to rank in the bottom 10%.  That means that if 1,861 buildings are chosen at random from the building population, we expect 10% of these to have ENERGY STAR scores from 91-100.  Similarly, we expect 30% of these (or 558) to have ENERGY STAR scores ranging from 71-100.  Guess what — the average ENERGY STAR score of these 558 buildings is expected to be about 85.  Only those who are mathematically challenged should be impressed that the USGBC has found 404 buildings in its set of 1,861 that have an average ENERGY STAR score of 85.  If you cherry pick your data you can demonstrate any conclusion you like.
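The expectation is easy to check by simulation.  If scores really were percentile ranks, scores for randomly chosen buildings would be uniform on 1-100:

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Draw 1,861 scores uniform on 1-100 and examine the subset scoring 71-100,
# mirroring the argument above.
N = 1_861
scores = [random.randint(1, 100) for _ in range(N)]

top = [s for s in scores if s >= 71]
print(len(top))                       # expect roughly 30% of 1,861 ≈ 558
print(round(sum(top) / len(top), 1))  # expect roughly (71 + 100) / 2 = 85.5
```

In other words, an average of 85 among the top-scoring 30% of a random sample is exactly what cherry-picking guarantees.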

And, of course, these 1,861 buildings are not chosen at random — they represent buildings whose owners have a demonstrated interest in energy efficiency apart from LEED.  I would guess that the vast majority of the 404 buildings were certified under the EBOM program and have used Portfolio Manager to benchmark their buildings long before they ever registered for LEED.  LEED certification is just another trophy to be added to their portfolio.  No doubt their ENERGY STAR scores in previous years were much higher than 50 already.  What was the value added by LEED?

I openly offer my services to analyze the USGBC energy data in an unbiased way to accurately assess the collective site and source energy savings by these LEED buildings.  How about it, Brendan Owens (VP of technical development for USGBC) — do you have enough confidence in your data to take the challenge?  Which is more important to you, protecting the LEED brand or scientific truth?

ENERGY STAR energy benchmarking is not ready for prime time

I recently had occasion to read an old paper by Janda and Brodsky describing the “first class” of ENERGY STAR certified office buildings.  This is one of only a handful of papers in the peer-reviewed literature regarding ENERGY STAR building scores.  Janda and Brodsky describe the brand name ENERGY STAR as

a set of voluntary partnerships between the U.S. government and product manufacturers, local utilities, home builders, retailers, and businesses.  These partnerships are designed to encourage energy efficiency in products, appliances, homes, offices, and other buildings.

This was the basis for the EPA’s building ENERGY STAR scoring system.  It was a “game” that building managers voluntarily agreed to play with rules (methodology for scoring buildings) set by the EPA in consultation with those playing the game.  There was no scientific vetting of the “rules of the game” — nor did there need to be — it was just a game designed to “encourage energy efficiency.”  No one was forced to play the game.  Data submitted to Portfolio Manager (the EPA’s web-based tool for calculating scores) and ENERGY STAR scores issued by the EPA were confidential — unless a building sought and received ENERGY STAR certification.  Participation was entirely voluntary.  Building managers disappointed with their ENERGY STAR scores could just walk away from the game — no harm, no foul.

But this has all changed.  In recent years 1) the EPA has published specific claims regarding energy savings associated with its ENERGY STAR benchmarking program (real savings, not just fantasy football), 2) external organizations like the USGBC have adopted the ENERGY STAR score as their metric for energy efficiency in green building certification programs and are using these scores to make energy savings claims of their own, and 3) major U.S. cities have passed laws requiring commercial building owners to use Portfolio Manager to benchmark their buildings and, in many cases, the resulting ENERGY STAR scores are being made public.  With federal, state, and local governments requiring LEED certification for public buildings, this is no longer a voluntary game — it is mandatory, and real (testable) energy claims are being made based upon ENERGY STAR scores.  Now the science behind such claims actually matters — and this science has never been vetted.

It’s kind of like a small “mom and pop” operation that has been selling chicken soup using “grandma’s recipe” without obtaining a proper license or FDA approval.  Now imagine Walmart decides to market and sell the soup — the scrutiny changes.

As a voluntary game with no connection to reality it is OK that the EPA negotiates rules for its ENERGY STAR ratings with different constituents — like allowing Washington DC office buildings to ignore their “first floors” in seeking ENERGY STAR certification.  After all, who am I to interfere in the activities between consenting adults when these activities do not affect me?  But for ENERGY STAR — these days are gone.

In the next year we will learn much about the science that underpins the EPA’s ENERGY STAR benchmarking system — and the results are likely to be very disappointing.  This benchmarking system is not ready for prime time.

The EPA doesn’t know the basis for its own ENERGY STAR building model

The US Environmental Protection Agency (EPA) issues ENERGY STAR building ratings for 11 different kinds of commercial buildings.  The so-called Technical Methodology for each of these building ratings is described in documents posted on the EPA web site.  Presumably anyone can work through the details of these technical documents to duplicate the EPA’s methodology.

But this is not the case for one of the models — that for Medical Office buildings.  If you follow the instructions set forth in the EPA’s document for extracting the building records from the 1999 CBECS on which this model is based you do not obtain the list of 82 buildings the EPA claims are the basis for this model.  Instead you obtain a list of 71 buildings.  Furthermore, if you calculate the mean properties of this set of 71 buildings you do not obtain those published by the EPA for this building set.  And finally, if you perform the regression the EPA says it has applied to these buildings you obtain different results than those published by the EPA.  In short, it is clear that the EPA’s Technical Methodology document for Medical Offices does not correctly describe their model.

I have petitioned the EPA through the Freedom of Information Act to supply the list of CBECS 1999 building ID’s used in this model (EPA-HQ-2013-009270).  The EPA has responded that it does not have this list.  This means that the EPA has not only incorrectly described its own Medical Office model — it does not even know what the basis for this model is!  Its document describing the Technical Methodology for this model is fiction — just like the ENERGY STAR scores the EPA hands out for Medical Office buildings.

EPA Announces Carbon Emission Regulations for new Power Plants

Apparently the EPA has decided to move forward under its own authority to regulate carbon emissions for new power plants — according to an article in yesterday’s New York Times.  The emission standards are said to be 1000 lbs CO2 for new natural gas plants and 1100 lbs CO2 for new coal plants, both per MWh (1000 kWh) of electricity generated.

If you work through the numbers for natural gas, you find that a plant must be about 40% efficient to meet the standard.  This is more efficient than many existing peaking generators in the midwest, but certainly attainable with current technology.  Therefore I think this is a reasonable standard for new natural gas plants.
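The 40% figure follows from simple arithmetic.  The sketch below assumes a typical combustion emission factor for natural gas of roughly 117 lb CO2 per MMBtu of fuel (an approximate published value, not taken from the article):

```python
# Back-of-envelope check of the 40% figure. Assumed combustion emission
# factor for natural gas: ~117 lb CO2 per MMBtu of fuel (approximate value).
LB_CO2_PER_MMBTU_GAS = 117.0
MMBTU_PER_MWH = 3.412          # 1 MWh of electricity = 3.412 MMBtu
STANDARD_LB_PER_MWH = 1000.0   # proposed limit for new gas plants

# A plant exactly at the limit may burn 1000 / 117 ≈ 8.55 MMBtu of gas per
# MWh generated, so its efficiency (electricity out over fuel energy in) is:
fuel_mmbtu_per_mwh = STANDARD_LB_PER_MWH / LB_CO2_PER_MMBTU_GAS
efficiency = MMBTU_PER_MWH / fuel_mmbtu_per_mwh
print(f"{efficiency:.1%}")  # → 39.9%
```

The same arithmetic with coal’s roughly double emission factor shows why no conventional coal plant can come close to 1100 lbs/MWh on efficiency alone.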

But new coal plants will not be able to attain this standard through efficiency alone — they will instead have to capture and store CO2 — and frankly such technology has not been demonstrated at this scale.  Therefore essentially no new coal plant will be able to meet the standard for many years to come.

More worrisome is what will be next for existing natural gas and coal plants.  How many years before the EPA extends these regulations to existing plants?

Do Buildings that use Energy Star’s Portfolio Manager save energy?

The EPA regularly puts out press releases claiming the amount of energy that has been saved nationally by its Energy Star program.  In its October 2012 Data Trends publication entitled “Benchmarking and Energy Savings” the EPA writes the following:

Do buildings that consistently benchmark energy performance save energy? The answer is yes, based on the large number of buildings using the U.S. Environmental Protection Agency’s (EPA’s) ENERGY STAR Portfolio Manager to track and manage energy use.

After making this claim the EPA offers the following supporting evidence.

Over 35,000 buildings entered complete energy data in Portfolio Manager and received ENERGY STAR scores for 2008 through 2011, which represents three years of change from a 2008 baseline. These buildings realized savings every year, as measured by average weather-normalized energy use intensity and the ENERGY STAR score, which accounts for business activity. Their average annual savings is 2.4%, with a total savings of 7.0% and score increase of 6 points over the period of analysis.

What does this mean?  Does this mean that every one of the 35,000 buildings in question saw energy savings?  Impossible – over time some buildings saw their energy use go up and others saw it go down.  The statement clearly refers to an average result.  But what is being averaged?  The EPA is referring to the average (weather normalized) source energy intensity (EUI) for these 35,000 buildings — saying that it has decreased by 7% over three years.  In addition it points out that the average Energy Star score for these buildings has increased by 6 points over three years.  The graphs below summarize these trends.

Data Trends

So here is the problem.  The average EUI for a set of N buildings has nothing to do with the total energy used by these buildings.  The average EUI could go down while the total energy use goes up, and vice versa.  Some buildings see their EUI go up – and these buildings use more energy – and some see their EUI go down – and these buildings use less energy.  But you cannot determine whether more or less energy is used in total without calculating the actual energy saved or lost by each building – and this requires that you know more than the energy intensity (EUI) — you must also factor in each building’s size or gsf.  This set of 35,000 buildings includes buildings that are 5,000 sf in size and others that are 2,000,000 sf in size – a factor of 400 larger.  The EPA calculates mean EUI by treating every building equally.  But each building does not contribute equally to the total energy – bigger buildings use more energy.  (The EPA has employed the methodology used by the New Buildings Institute in its now-discredited 2008 study of LEED buildings.)

It may be that these 35,000 buildings, in total, save energy.  But we don’t know and the EPA has offered no evidence to show that they do.  Moreover, I have asked the EPA to supply this evidence and they refuse to do so.  It is an easy calculation – but they choose not to share the result.  You can bet they have performed this calculation – why do you suppose they don’t share the result?

Now turn to the increased average Energy Star score.  There is actually no connection whatsoever between the average Energy Star score for a set of buildings and their total energy use.  For a single building, its Energy Star score, combined with its measured EUI and gsf allows you to calculate the energy it saved as compared with its predicted energy use.  Readers might be surprised to learn that a building’s Energy Star score can go up while its energy use rises as well.

But for a collection of buildings no such relationship exists.  If they are all one type of building (for instance, all dormitories) you can combine their individual scores with their individual gsf and their individual EUI to learn something about their total energy – but absent this additional information it is hopeless.  And if the buildings are from more than one building type there is absolutely no meaning to their average Energy Star Score.  Such statistics are intended only to impress the ignorant.

The EPA, therefore, has presented no evidence to support the claim that buildings that are regularly scored in Portfolio Manager collectively save energy.  Instead they have offered meaningless sound bites — claims that sound good but have no scientific relevance.

It is easy to see the problem by considering a simple case — two buildings – one a 100,000 sf office building and the other a 10,000 sf medical office.  Suppose in year 1 the office building has an EUI of 100 kBtu/sf and an Energy Star score of 60, while in year 2 it has an EUI of 120 kBtu/sf and an Energy Star score of 58.  Suppose that the medical office building in year 1 has an EUI and Energy Star score of 140 kBtu/sf and 50, respectively, and in year 2 an EUI of 100 kBtu/sf and an Energy Star score of 60.

In this simple example the “average EUI” for year 1 is 120 kBtu/sf and for year two is 110 kBtu/sf – by the EPA’s standards, an 8% energy savings.  But when you work out the numbers you find their combined energy use in year two actually rose by 14%.  Surely EPA officials understand the difference.
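The arithmetic is easy to verify.  The sketch below restates the inputs explicitly: an office going from 100 to 120 kBtu/sf over 100,000 sf, and a medical office going from 140 to 100 kBtu/sf over 10,000 sf (a year-2 medical EUI of 100 is the value consistent with the quoted averages):

```python
# Restating the example: (floor area in sf, year-1 EUI, year-2 EUI), kBtu/sf.
office = (100_000, 100, 120)
medical = (10_000, 140, 100)  # year-2 EUI of 100 gives the quoted averages
buildings = [office, medical]

avg_eui_y1 = sum(e1 for _, e1, _ in buildings) / len(buildings)  # → 120.0
avg_eui_y2 = sum(e2 for _, _, e2 in buildings) / len(buildings)  # → 110.0

total_y1 = sum(sf * e1 for sf, e1, _ in buildings)  # → 11,400,000 kBtu
total_y2 = sum(sf * e2 for sf, _, e2 in buildings)  # → 13,000,000 kBtu

# Average EUI falls ~8% while total energy use rises ~14%.
print(round((avg_eui_y1 - avg_eui_y2) / avg_eui_y1, 3))  # → 0.083
print(round((total_y2 - total_y1) / total_y1, 3))        # → 0.14
```

The large building dominates the total, so its rising EUI swamps the small building’s improvement even as the unweighted average improves.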

To summarize, the EPA has claimed that the energy consumption of buildings that regularly use Portfolio Manager has gone down by 2.4% per year, but it has offered no evidence to support this – only evidence that the average EUI for these 35,000 buildings – a meaningless figure – has gone down.

The EPA should either withdraw their claim or provide the evidence to back it up.

Energy Star scores for Medical Office Buildings exhibit “grade inflation”

This month I am beginning a series of articles to discuss the science (or lack thereof) behind the US Environmental Protection Agency’s building Energy Star benchmarking score.  Energy benchmarking has become very popular these days with eight or more major US cities having passed ordinances requiring commercial buildings to benchmark their energy data.  The EPA’s Energy Star Portfolio Manager is being used by all these cities for this effort.  In addition, both the US Green Building Council and Green Globes have adopted the building Energy Star score as the metric for energy efficiency success in their green building certification programs.

What is Benchmarking?

Benchmarking is a process by which you compare the energy used by your building with that used by other buildings in order to learn how you stand relative to “the pack.”  The energy used by your building is easily quantified by simply recording monthly energy purchases, combining data for twelve consecutive months to determine your annual energy consumption.  Anyone interested in lowering operating costs or improving the operation of a specific building might decide to track their own annual energy consumption, comparing usage for successive years.  Simply comparing annual energy use for successive years of the same building can guide a building manager in making equipment and operational changes intended to improve energy efficiency.

But it is also useful to know how your energy use compares with energy used by other, similar buildings.  This is really what benchmarking is all about.  If you learn that your building uses much more energy than most other similar buildings – that would suggest there are some changes you can make to significantly lower your own energy consumption (and cost).  If, on the other hand, your building uses much less energy than most other buildings – then it probably does not make sense to invest a lot of time and energy in making further energy efficient improvements to your building.

The Commercial Building Energy Consumption Survey

So how do you find out how much energy other buildings use?  The basic tool for this is the Commercial Building Energy Consumption Survey (CBECS) usually conducted every 3-4 years by the Energy Information Administration (EIA).  The US commercial building stock consists of about 5 million buildings with 70 billion sf of floor space.  CBECS is designed to gather data from a small fraction of these buildings (about 6,000) specifically chosen to accurately represent the entire building stock.  In addition to recording size and annual energy purchases for these buildings the survey gathers numerous other pieces of information to characterize these buildings and how they are used.  Strict confidentiality is maintained for the 6,000 or so sampled buildings.  Nevertheless, sufficient data are gathered to perform queries on the data to learn average properties for various kinds of buildings broken down by climate region, function, size, age, and use.  The last CBECS to be performed was in 2003 and data for the next survey (2012) are to be released in 2014.

The Energy Star Building Score

In 1999 the EPA first introduced its Energy Star building score for office buildings, the most common building type.  The score is a number ranging from 1-100 that is intended to represent a particular building’s percentile ranking with respect to energy consumption as compared with similar buildings nationally.  So, if your building receives a score of 75, that is supposed to mean that, if you were to look at all similar buildings across the country, your building uses less energy than 75% of them, adjusting for indicated operating conditions.  Presumably, if it were possible to determine the Energy Star score for every office building in the country, you would find that half of them have scores ranging from 1-50 and the rest from 51-100.  Similarly you would expect 10% of office buildings to have scores ranging from 91-100, another 10% to have scores from 1-10, and so on.  In general, you would expect a histogram of Energy Star scores for all office buildings to look like this.

Uniform ES score distribution
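As a sketch of what the percentile interpretation means (the EPA’s actual procedure fits a regression model and a gamma distribution, which is not reproduced here), a score computed as a straight empirical percentile would look like this; the peer EUIs are hypothetical:

```python
# Sketch only: a score computed as the empirical percentile rank -- the
# share of peer buildings using MORE energy than yours -- scaled to 1-100.
# This is the claimed interpretation, not the EPA's actual fitting procedure.
def percentile_score(my_eui, peer_euis):
    worse = sum(1 for e in peer_euis if e > my_eui)
    return max(1, round(100 * worse / len(peer_euis)))

# Hypothetical peer EUIs (kBtu/sf) for similar buildings nationally:
peers = [60, 75, 80, 90, 100, 110, 120, 140, 160, 200]

print(percentile_score(85, peers))   # 7 of 10 peers use more energy → 70
print(percentile_score(250, peers))  # worst of the set → floor score of 1
```

A score defined this way is uniform by construction over the reference population, which is exactly the property tested in what follows.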

The Problem with Energy Star Scores

In the last 8 years or so more and more building studies have published the Energy Star scores for fairly large sets of buildings.  For some reason the mean Energy Star scores for these building sets always seem to be greater than 50.  It is, of course, possible that, in each case, the buildings studied represented “better than average” buildings.  But it also raises the question – how do we know that the Energy Star scores for all US buildings are distributed as expected?  What evidence has the EPA ever offered to demonstrate the validity of these scores?  So far as I can tell the answer is none.  There are no peer-reviewed articles and no masters or Ph.D. theses describing these models and the numerous tests undertaken to demonstrate their validity.  All we have are rather short technical descriptions of the algorithms used to define the models.  In fact, the EPA has known for years that the mean Energy Star score for all buildings whose data were entered into Portfolio Manager was 60 (now 62).  You would think they might want to investigate why.

One obvious way to test this is to conduct a random sample of a large number of US commercial buildings, use EPA algorithms to calculate their Energy Star scores, and see how these scores are distributed.  But the only such sample is CBECS!  When the 2012 CBECS data become available this will afford an excellent opportunity to conduct such a test – that should be sometime in 2014.  (Meanwhile, thousands of commercial buildings in major US cities are benchmarking their buildings using these Energy Star models.)  For many building types the CBECS 2003 data were the basis for the associated Energy Star model – this is the case for the current model for office buildings.  In these cases the 2003 CBECS data cannot provide independent confirmation of the Energy Star models.

But there are a few building types for which the Energy Star models are based on 1999 CBECS data.  One such building type is “Medical Office Buildings.”  In this case we can extract data for medical office buildings from CBECS 2003, calculate their Energy Star scores using the EPA’s model, then generate a histogram to show how these scores are distributed for all medical office buildings contained in the 2003 US commercial building stock.  The distribution is expected to be uniform, as shown in the Figure above, with some random uncertainty, of course.
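Departure from uniformity can also be quantified with a simple chi-square statistic on decile counts.  The sketch below uses made-up scores, not the actual CBECS-derived scores:

```python
# Chi-square statistic for uniformity of scores binned into deciles.
# In the analysis described above, `scores` would be the ENERGY STAR scores
# computed for the CBECS 2003 medical offices; here they are made up.
def chi_square_uniform(scores, bins=10):
    n = len(scores)
    expected = n / bins
    counts = [0] * bins
    for s in scores:  # scores run 1-100; map 1-10 to bin 0, ..., 91-100 to 9
        counts[min((s - 1) * bins // 100, bins - 1)] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

# A perfectly uniform set gives 0; compare against the chi-square
# distribution with 9 degrees of freedom (95th percentile ≈ 16.9).
uniform_scores = list(range(1, 101)) * 5
print(chi_square_uniform(uniform_scores))  # → 0.0

# A top-heavy set, like the histogram described below, fails badly:
skewed_scores = [85] * 225 + [25] * 40 + [55] * 235
print(chi_square_uniform(skewed_scores) > 16.9)  # → True
```

With samples of hundreds of buildings, even modest departures from uniformity are detectable, so a strongly top-heavy histogram is unambiguous.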

I have done just that and the results are graphed below.  The graph clearly demonstrates that the scores are not uniformly distributed, and therefore the score cannot have the stated mathematical interpretation.  The mean Energy Star score is 65, well above the expected value of 50.  Nearly 45% of US medical office buildings have Energy Star scores from 81-100 – significantly higher than the expected 20% – and only 8% have scores ranging from 11-40, well below the expected 30%!  It is highly unlikely that US medical office buildings saw massive improvements in energy efficiency from 1999 to 2003.  The explanation is simpler — the model is based on faulty assumptions.

Medical Office 2003 ES histogram

This graph clearly calls into question the validity of the Energy Star Medical Office building model.  This model was developed in 2004 and has been in use for nearly a decade.  Is it possible that the EPA never conducted this simple test to check the validity of this model?   It would appear that for a decade now the EPA has employed a flawed building model to generate Energy Star scores for medical offices and to draw conclusions about the amount of energy the Energy Star program has saved.

If this one model is wrong — and the error went undetected so long — what confidence can we have in Energy Star models for other building types?

In my next issue I will look at the distribution of Energy Star scores for Dormitories/Residence Halls.