This article was originally published in the September/October 1997 issue of Home Energy Magazine.





Home Energy Magazine Online September/October 1997

Home Energy Rating Systems: Actual Usage May Vary

by Jeff Ross Stein

Jeff Ross Stein, a former research assistant at Lawrence Berkeley National Laboratory, is currently a design engineer at ACCO, an HVAC contractor.

Home energy ratings attempt to predict typical energy costs for a given residence and estimate the savings potentials of various energy retrofits. But one question has gone unanswered: How accurate are these ratings at predicting actual energy consumption? A new analysis suggests the ratings could do better.

Home energy rating systems (HERS) and related energy efficiency financing products have been in use since the late '70s. Today, 21 states have HERS. These systems score homes and estimate how much typical occupants would spend on energy. Consumers use the scores and annual cost estimates to compare the current and potential energy consumption of different homes.

The estimated energy costs of a high-rated home can help buyers to qualify for larger mortgages. However, if they get a larger mortgage based on the rating and still have high energy bills, their wallets will feel the squeeze. And if the estimate of resident energy use is wrong, the list of suggested cost-effective improvements that comes with the rating may include money-losing investments. To avoid these problems, HERS estimates need to approximate actual energy cost in homes.

As a research project for Lawrence Berkeley National Laboratory, Alan Meier and I compared home energy ratings with actual utility billing data for about 500 houses. The ratings were supplied to us by HERS providers in four states--the California Home Energy Efficiency Rating System (CHEERS), Energy Rated Homes of Colorado, Home Energy Ratings of Ohio, and Midwest Energy, a utility company and HERS provider in Kansas.

The CHEERS ratings were conducted in 1994; the others in 1996. These HERS all used different rating software and had slightly different rating procedures. For example, the CHEERS ratings did not include blower door testing, while the other ratings did. All of the HERS providers assured us that the samples were representative of the house types they rate and were within the expected accuracy of their ratings. (However, CHEERS has changed its software significantly since 1994, so our analysis of their predictions may no longer be relevant.)

We examined weather data from local federal weather stations for all of the locations, to ensure that utility bills during our study period were not thrown off by unusual weather. Since the heating degree-days during our study period were all close enough to the long-term averages used in the HERS software, we deemed weather normalization unnecessary.

What's a HERS?

A HERS is a computer simulation-based method for assessing a home's existing energy efficiency and its potential for improvement. The rating usually requires a detailed home inspection by a trained rater. It will typically generate three types of output:
  • Rating score. Rating scores are usually on a 0- to 100-point or one- to five-star scale. The score is based on a comparison between the rated house and a reference house that meets a desired energy code or standard but is tailored to the same dimensions as the rated house. A rating score tells only how closely a house compares to that standard: it reflects how close the house is to its potential given its size, shape, fuel mix, and other factors. Because houses, particularly ones with different sizes and fuels, can have very different energy loads and still fully comply with a standard, rating scores should not be used to compare houses.
  • Energy use/cost predictions. HERS make typical energy use and energy cost predictions for specific end uses, such as heating and hot water, and for the whole house. The predicted energy cost is simply predicted energy use multiplied by the local utility rate. Unlike scores, which are relative to a reference house, predictions are absolute measures. Absolute measures can be used to compare houses in the same way that miles-per-gallon ratings are used to compare cars.
  • Recommendations. HERS produce a list of recommended improvements that are calculated to be life cycle cost-effective. Typical recommendations include adding attic insulation, replacing old heating or cooling equipment, and installing a programmable setback thermostat. HERS-recommended improvements can be financed in an Energy Improvement Mortgage (see Making Energy Mortgages Work, HE May/June '95, p. 27).
Is HERS on Target?

We checked HERS performance in three areas: scores, energy predictions, and recommendations (see What's a HERS?). In general, we found that HERS can be remarkably accurate at predicting average annual energy costs for groups of homes. Predictions for individual homes were less impressive. Some individual ratings significantly overpredicted or underpredicted energy costs, especially for older homes. Furthermore, there was no clear relationship between the rating score of an individual home and actual energy cost.

Figure 1. CHEERS predicted that the higher-rated homes would spend less on energy than the lower-rated homes. The dashed line is the regression of CHEERS' predictions. However, in reality, all levels of homes averaged the same energy use, around $1,000 per year.

One of our most surprising discoveries was that none of the HERS we examined showed any clear relationship between rating score and total energy use or energy cost. Technically, rating scores only measure a house's individual potential for energy improvement; they are not designed to be used to compare different houses in the same way miles-per-gallon ratings are designed to compare cars. However, many consumers and HERS-related financing programs assume that houses with higher scores will have lower energy costs. Unfortunately, houses with higher scores, even when compared to houses of similar size, did not tend to use any less energy than houses with lower scores. The dashed line in Figure 1 shows the regression line of the CHEERS predictions. The declining energy use with higher ratings would seem to make sense. However, the solid line shows the regression line of actual average energy cost. It was constant at about $1,000 per year, regardless of the score.

The discrepancy between scores and energy use may be due to the take-back effect. Take-back occurs when people with more efficient homes use more energy than expected because they are less cautious about maintaining thermostat setbacks and other basic efficiency measures. In other words, higher-scoring houses may indeed be more efficient than lower scoring houses, but only if they are operated in the same manner.

Energy Predictions

Because of the way the results are presented, people are led to believe that energy use and cost predictions are more precise than they really are. HERS predictions sometimes calculate energy costs or life cycle savings to four significant digits, a much higher precision than is necessary or realistic. A sample rating from CHEERS stated, "Upgrading the cooling system to SEER 12.0 will save $2,166 on a life cycle basis." However, even rating systems that are quite accurate on average have large margins of overprediction and underprediction for individual homes.

Three of the four HERS--Kansas, Ohio, and Colorado--were remarkably accurate at predicting actual energy cost or energy use for all homes in our sample (see Table 1). For example, on average, the Colorado system underpredicted the actual energy use by only 3%. The fourth system, CHEERS, tended to overpredict the actual energy cost by about 50%, but it was much more accurate for newer houses, underpredicting them by 8% on average.

Again, while the average estimates were close to the real average in most cases, individual errors were often high. For example, the standard deviation of CHEERS predictions from actual energy use was 80%, with about one-third of the houses overpredicted by more than 130% or underpredicted by more than 30%. While much of this individual error can be attributed to occupant behavior, the magnitude (and CHEERS's consistent tendency to overpredict energy use) implies the existence of a systematic error in the rating procedure.
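The distinction between group accuracy and individual accuracy can be illustrated with a small sketch. The predicted and actual costs below are hypothetical, not the study's data; the point is that a sample whose errors nearly cancel on average can still contain badly over- and underpredicted houses.

```python
# Per-house prediction error and its spread, using hypothetical
# predicted vs. actual annual energy costs (not the study's data).
def error_stats(predicted, actual):
    """Return (average error, standard deviation) of fractional errors."""
    errors = [(p - a) / a for p, a in zip(predicted, actual)]
    n = len(errors)
    mean = sum(errors) / n
    variance = sum((e - mean) ** 2 for e in errors) / n
    return mean, variance ** 0.5

# One house overpredicted by 80% and one underpredicted by 30%
# still yield a small average error for the group:
predicted = [1800, 700, 1500, 1000]
actual = [1000, 1000, 1600, 1100]
mean_err, sd = error_stats(predicted, actual)
print(f"average error {mean_err:+.0%}, standard deviation {sd:.0%}")
# -> average error +9%, standard deviation 42%
```

A small average error with a large standard deviation is exactly the pattern the study found: good aggregate accuracy masking large individual misses.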

Table 1. Breakdown of the Rating Systems

                                California    California
                                (all homes)   (new only)   Kansas    Ohio      Colorado
Sample size                     185           30           16        14        276
Avg actual energy cost          $1,154        $1,327       $1,462    $1,697    135,000 Btu*
Avg predicted energy cost       $1,585        $1,026       $1,531    $1,402    120,000 Btu*
Avg yr built                    1959          1992         1995      N/A       1969
Blower door test?               no            no           yes       yes       yes
Avg HDD/yr, '84-'95             2,791         2,791        4,954     5,371     6,254
Avg energy cost error           51%           -8%          -7%       -14%      -3%*
Standard deviation§ in errors   80%           44%          15%       20%       35%*

* Energy cost data were not available for Colorado, so error and standard deviation figures refer to site energy use. Averages are for the houses that were rated.
§ Standard deviation measures dispersion from the average.

Recommended Energy Improvements

A HERS rating comes with recommended measures to improve a given home's energy efficiency. The recommended measures are expected to be cost-effective. For example, a HERS might calculate that a hot-water tank wrap will reduce water heating stand-by losses and pay for itself in a particular house in one year.

We wanted to know what the impact of these recommended measures really was. We compared the actual energy use of CHEERS homes to the total energy savings that CHEERS predicted the occupants would receive if they implemented all recommendations.

We found obvious errors--some ratings predicted that homeowners would save more energy than they actually used, and many ratings predicted savings greater than 50% of the actual consumption. When total savings estimates are impossibly high, it is likely that some recommended measures are not actually cost-effective. This is especially likely because HERS only require that life cycle cost be less than predicted life cycle savings. Recommendations do not always have a built-in margin of safety to account for likely variation between occupants.

On the other hand, the value of many typical HERS recommendations is not dependent on the accuracy of the rating. In the water-tank wrap example above, the rating calculated that the wrap would pay back in one year. Even if the rating overpredicted hot water use by 300%, the tank wrap would still pay for itself in about three years. The detailed economic information that usually comes with HERS recommendations, such as simple payback period, allows consumers to compare the financial aspects of different options and possibly reduce the risk of a bad investment.
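The robustness argument above can be put in numbers. The sketch below uses a hypothetical wrap cost and savings figure (the $20 amounts are assumptions, not from the article) to show how a one-year payback degrades gracefully even under a threefold overprediction of hot water use:

```python
# Hypothetical tank-wrap economics (illustrative numbers, not the study's).
wrap_cost = 20.0          # assumed installed cost of the wrap, $
predicted_savings = 20.0  # assumed predicted first-year savings, $

def payback_years(cost, annual_savings):
    """Simple payback period: years until cumulative savings cover cost."""
    return cost / annual_savings

print(payback_years(wrap_cost, predicted_savings))  # 1.0 year as rated

# If the rating overpredicted hot water use (and thus savings) threefold,
# actual savings are a third of the prediction and payback stretches to 3 years:
actual_savings = predicted_savings / 3
print(payback_years(wrap_cost, actual_savings))  # 3.0 years -- still worthwhile
```

Because payback scales inversely with savings, a measure with a very short rated payback keeps a tolerable payback even when the underlying end-use prediction is far off.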

Moreover, many recommended improvements also provide intangible benefits, such as increased comfort, reduced noise, greater security, and better aesthetics.

Why Isn't HERS Perfect?

The algorithms most HERS use to rate a house include many variables, among them the dimensions of every window, wall, and floor in the house. To satisfy the rating formulas, raters must also collect data on a wide range of variables, from duct leakage rates to insulation thickness to window overhang dimensions.

Accurate measurements for each of these are necessary for accurate predictions. Although raters are required to be trained and certified, they can still introduce errors by collecting or recording inaccurate data. For example, the six raters who rated the 185 CHEERS homes in our study produced ratings with significantly different average errors and variances, suggesting that some data may have been collected or entered incorrectly.

In addition, the simulation algorithms can be based on incorrect assumptions. For example, algorithms make assumptions about local weather (based on typical years); about some physical features, such as the number of appliances; and about the occupants.

Occupant behavior is probably the single most significant determinant of actual energy use (see Can We Transform the Market Without Transforming the Customer? HE Jan/Feb '94, p. 17). HERS have the difficult task of making assumptions based on typical occupant behavior. Reality can easily diverge from these assumptions; predicted energy use or energy cost can be off by 50% or more due to occupant behavior. Other variables also rely on assumptions rather than on measurement. For example, the weather variable is based on long-term averages, while the actual weather can differ considerably from the average in a given year. Any assumption can introduce error (see Differences Between HERS and HERS).

Improving HERS

Our study results suggest several areas in which HERS could be improved, including better software, training, evaluation, and disclaimers. As national HERS accreditation moves forward, minimum standards in each of these areas may help to resolve many of HERS's problems.

Accurate Disaggregation of End Uses

The accuracy of specific end-use predictions, such as space heating and cooling and hot water heating, must be improved if recommendations are to be accurate. Suppose that a HERS provider calibrated a software package by assuming less hot water use and higher winter thermostat settings. The rating system might recommend replacing a lot of furnaces and not replacing hot-water heaters when, in reality, the opposite might be more appropriate.

Philip Fairey of the Florida Solar Energy Center studied HERS in Florida. By submetering certain end uses, he showed that the total energy use prediction was generally quite accurate, but that HERS tended to overpredict some end uses and underpredict others. While submetering particular equipment can be very expensive, it is the best way to verify and improve accuracy. Disaggregation is one area where the continual evolution of software can have a beneficial effect. 

Error Correlations and Corrections

Analysis of billing data can be taken a step further by looking for correlations between rating accuracy and house characteristics. For example, we found that CHEERS overpredicted gas use more in Eureka, California, which has a relatively cold climate, than in Fresno, California, which has a relatively hot climate, and that it overpredicted electricity use more in Fresno. In general, we found that climates calling for more heating or cooling were the climates with more overpredicted energy use. CHEERS may be using incorrect heating and cooling setpoints, infiltration rates, or conduction rates.
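One way to hunt for such correlations is to compute a simple correlation coefficient between a house characteristic and the prediction error. The sketch below uses hypothetical degree-day and error figures (not the study's data):

```python
# Does prediction error grow with heating demand? A minimal sketch using
# hypothetical (heating degree-days, fractional overprediction) pairs.
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

hdd = [2500, 3000, 4500, 5000, 6000]       # annual heating degree-days
errors = [0.10, 0.15, 0.30, 0.35, 0.50]    # fractional overprediction
print(f"correlation: {pearson_r(hdd, errors):.2f}")  # strongly positive here
```

A strong positive correlation between climate severity and overprediction, like the one CHEERS showed between heating climates and gas-use error, points at a systematic bias in the simulation (such as setpoint or infiltration assumptions) rather than at random occupant variation.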

Analyzing utility billing data can be a valuable and inexpensive way to improve HERS accuracy; however, it doesn't give the whole picture. Other types of research are also needed to document and improve accuracy. For example, the HERS BESTEST, which benchmarks HERS against DOE-2 and other state-of-the-art simulation software, is a valuable tool for testing the simulation properties of HERS.

To evaluate ratings on an ongoing basis, some HERS get utility bills for many rated homes. As nationwide accreditation leads to nationwide monitoring and evaluation, HERS guidelines may be modified. Accreditation will give HERS administrators a chance to note uniform irregularities nationwide. The currently proposed process for accreditation would require each HERS provider to collect utility bills for at least 10% of homes rated annually or 500 homes annually, whichever is less.

Training the Raters

Another important trend we found in the CHEERS data was that some raters tended to produce more accurate ratings than others. This emphasizes the need for rater training, supervision, and retraining, and the need to minimize rater judgment calls in the rating procedures. Rater training varies in length and detail from state to state. For example, Indiana uses a weeklong course; across the border in Illinois, training takes just two days. Also, raters bring different backgrounds to the job. Some have no experience; some have done weatherization; and some are contractors who are familiar with blower doors, analysis software, and the whole-house approach. Again, accreditation may provide an opportunity to require minimum training levels.

Disclaimers: The Scores Are Not What You Think!

HERS providers need to give consumers more information about the accuracy and meaning of the ratings. HERS agencies generally do not explain how scores are calculated or how they should be interpreted. Rating scores are not designed to compare houses in the same way that miles-per-gallon ratings are used to compare cars.

Today, many people in the HERS industry want to overhaul or eliminate the scoring system and focus consumers' attention exclusively on energy use and cost predictions. However, these predictions might be more accurately presented as a range of savings, which would better convey the uncertainty in the calculation.

This approach has its critics. Mark Janssen of Indiana's HERS believes that rating software is accurate enough to be trusted. More importantly, he points out that customers want a number, not a range. They want to be told whether an improvement will be cost-effective or not.

Regardless of how accurate the ratings are, an increasing number of HERS are including a lengthy disclaimer. These disclaimers attempt to communicate to customers that savings estimates do not guarantee savings.


Many homes across the country have been rated by HERS in the last several years. However, the agencies using rating systems have not rigorously evaluated whether the ratings provide accurate and useful information. To improve their ratings, agencies need to evaluate their programs and make program data available to researchers. Researchers studying HERS can start with easy-to-use, low-cost data--for example, actual utility bill data for rated houses. Utility data can be used to validate accuracy, to calibrate rating systems, and to help identify and correct specific system errors.

At this point, those in the field do not generally consider accuracy to be a significant barrier to widespread HERS use. But everyone agrees that accuracy is important for credibility and long-term success of the programs.

Furthermore, a lack of accuracy may eventually catch up with some HERS and create a stigma that could spread to other programs. When other energy efficiency technologies, such as solar water heaters and compact fluorescent light bulbs, have failed to live up to initial expectations, they have suffered serious and long-lasting credibility problems. For these reasons, HERS organizations and HERS providers must continue to document and improve accuracy.






