A Rating Tale
HERS ratings have been structured to be as independent of occupant behavior as possible. Maybe that premise should be reexamined.
Drivers who treat the daily commute like the Daytona 500 will never achieve their cars’ miles-per-gallon rating—and shouldn’t expect to. The mantra of the Home Energy Rating System (HERS) industry—“Rate the home, not the occupant”—communicates a similar approach to home ratings. HERS ratings strive to reflect the energy use that a home would have if a typical family were living in it, and so they are based on a home’s structural components and computer simulations of energy use under typical conditions. Ratings do not take into account how much energy these homes actually use. Based on my analysis of HERS ratings in Wisconsin, however, I believe that more actual usage should be brought to bear on the HERS industry, not less.
In 1998 and 1999, I led an Energy Center of Wisconsin study to characterize single-family homes in Wisconsin. Our main goal was to bring some facts to the table in the design and implementation of residential programs in Wisconsin. Early on, we recognized that by linking up with the state’s HERS program—Wisconsin Home Performance (WHP)—we could tap into a network of trained raters to gather the data we needed for our research project. Since 1996, a pool of about 70 WHP raters have conducted HERS ratings on more than 2,300 homes in Wisconsin.
Our plan was simple: We would recruit homeowners at random by telephone, offering them an incentive of $50–$100 for allowing us to audit their homes. Then we would send a WHP rater to each home to conduct a standard HERS rating and gather additional data (such as measuring the temperature of the domestic hot water (DHW) supply and doing short-term monitoring of refrigerator electricity use). We also asked the homeowners to fill out a voluminous questionnaire that dealt with life-style issues, and we received permission to obtain utility billing histories.
Working with Wisconsin Energy Conservation Corporation, Opinion Dynamics Corporation, and about a dozen WHP raters, we were able to recruit 299 homeowners around the state. Their homes ranged from small mobile homes to an older mansion in a wealthy Milwaukee neighborhood.
Along the way, it dawned on us that we had an excellent set of data for examining the accuracy of the HERS ratings in Wisconsin, especially given some of the concerns that were raised in this magazine back in 1997 by Jeff Ross Stein (“Home Energy Rating Systems: Actual Usage May Vary,” HE Sept/Oct ’97, p. 21). Accordingly, I undertook a comparison between energy use predictions from the HERS ratings and what we observed from actual customer bills.
The Wisconsin HERS program uses Architectural Energy Corporation’s popular REM/Rate software. For our study, the raters used version 8.46. Like most HERS programs, REM/Rate serves three purposes. It provides an overall rating score; predicts home energy use and cost; and recommends energy improvement measures.
Though the rating score and the usage estimates are meant to be occupant neutral, the predictions of energy use that go into calculating the cost-effectiveness of potential improvements take into account such factors as the homeowner’s thermostat setpoint.
Isolating Heating Effects
I restricted my analysis to comparing REM/Rate’s prediction of space-heating use (with thermostat setpoint taken into account) with the utility billing data. Why? First, since space heating use accounts for more than 40% of the typical Wisconsin homeowner’s annual energy bill, it is by far the largest single end use in our cold climate.
Second, space-heating use can fairly easily be split out from utility meter readings. For homes heated with natural gas, I did this with the Princeton Scorekeeping Method (PRISM), which estimates weather-normalized heating use from monthly utility bills (see “Getting Heating Use from Utility Bills,” p. 34).
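To illustrate the general idea of separating heating use from monthly bills, here is a minimal sketch of a fixed-base degree-day regression. It is not PRISM itself: PRISM also searches for the best reference temperature for each home, while this sketch assumes the HDD values are already computed to a fixed base. The function names, the bills, and the 7,500 HDD "normal year" are all hypothetical; the bills are synthetic, generated from a known baseload and slope so the fit can be checked.

```python
def fit_degree_day_model(periods):
    """Least-squares fit of: therms/day = base + slope * (HDD/day).

    periods: list of (days, therms, hdd) tuples, one per billing period.
    Returns (base_therms_per_day, therms_per_hdd).
    """
    xs = [hdd / days for days, _, hdd in periods]
    ys = [therms / days for days, therms, _ in periods]
    n = len(periods)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    base = mean_y - slope * mean_x
    return base, slope


def normalized_heating_use(therms_per_hdd, normal_year_hdd):
    """Heating use in a normal-weather year (excludes the baseload)."""
    return therms_per_hdd * normal_year_hdd


# Hypothetical data: days and HDD per billing month, January to December.
days_in_period = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
hdd_in_period = [1500, 1250, 1050, 600, 300, 60, 10, 30, 200, 550, 900, 1300]

# Synthetic bills built from an assumed baseload of 0.8 therms/day
# (water heating, cooking) and 0.12 therms per HDD of heating.
example_periods = [
    (d, 0.8 * d + 0.12 * h, h) for d, h in zip(days_in_period, hdd_in_period)
]

base, slope = fit_degree_day_model(example_periods)
print(f"baseload: {base:.2f} therms/day, heating: {slope:.2f} therms/HDD")
print(f"normalized heating use: {normalized_heating_use(slope, 7500):.0f} therms")
```

Real bills will scatter around the fitted line, and how well they follow it is the kind of goodness-of-fit check that determines whether a home's billing data is usable for this comparison.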
Third, space heating is probably the most tractable end use when it comes to adjusting for the effect of occupant behavior. All Wisconsin residents use a lot of energy for space heating to get through a typical winter. How much they use is determined mainly by the size of the home, the condition of the building shell, and the efficiency of the heating system. Though occupants can certainly affect how much heat they use by their choice of thermostat settings, this was a variable that we addressed on our questionnaire, as well as one that REM/Rate captures in its model of heating use. With this information, we were able to account for the effects of occupant behavior on space-heating use.
In contrast, air conditioning use in Wisconsin is much more idiosyncratic. Though more than half of Wisconsin homes have central A/C, many homeowners use it only for the very hottest days, and some never turn it on. Water heating too is highly dependent on people’s habits. And as for refrigerators, other appliances, and lights, though REM/Rate allows users to capture some of this information, raters in Wisconsin rarely bother to adjust the default values, because these adjustments would have little impact on the rating score.
For the 147 homes in our study that are heated with natural gas only and whose billing data showed a reasonably good fit to heating degree-days (HDD), I compared the REM/Rate predictions of annual heating energy use to the PRISM estimates. We did not collect usage data for bulk fuels such as propane. I also eliminated from the analysis homes with auxiliary heating sources such as wood stoves or baseboard electric heating that would not be reflected in natural-gas data. I left in homes with wood fireplaces and portable electric heaters, because I found that these did not affect the results. I used heating energy intensity (Btu/ft2/HDD) to reduce the influence of large homes and control for differences in climate across the state.
Heating Use Overestimated
The results show that, on average, the ratings overestimate heating energy use by what I would term a moderate amount. The median home’s heating use is overestimated by about 22%, and the REM/Rate prediction is higher than the PRISM estimate in about 80% of the cases. These statistics probably understate the actual difference somewhat for the reasons cited in “Getting Heating Use from Utility Bills.”
There’s more to the story, however. While there is a reasonably good correlation between the HERS predictions and the billing data estimates (see Figure 1), the trend line suggests that the difference between the two is a function of a home’s predicted heating energy intensity. Heating energy use for homes that are predicted to be very efficient—ones that have a low heating energy intensity—is slightly underestimated, while use for inefficient homes is badly overestimated.
In fact, though 15% of the homes in our sample were predicted to use more than 15 Btu/ft2/HDD (about 1,800 therms for a typical Wisconsin home), none of the homes in the analysis had actual heating use at this level. For this group of homes, the HERS prediction exceeds the PRISM estimate by a median of 62%. These are mostly older homes that were modeled as having multiple deficiencies, such as uninsulated walls, underinsulated ceilings, high air leakage, or low heating system efficiency.
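The conversion between heating energy intensity and annual therms can be sketched as below. The floor area (1,600 ft2) and annual heating degree-days (7,500) are assumptions chosen for illustration, not figures from the study; they happen to reproduce the rough equivalence of 15 Btu/ft2/HDD and 1,800 therms quoted above. The function names are hypothetical.

```python
BTU_PER_THERM = 100_000  # one therm is defined as 100,000 Btu


def heating_energy_intensity(annual_heating_therms, floor_area_ft2, annual_hdd):
    """Heating energy intensity in Btu per square foot per heating degree-day."""
    return annual_heating_therms * BTU_PER_THERM / (floor_area_ft2 * annual_hdd)


def annual_therms_from_intensity(intensity, floor_area_ft2, annual_hdd):
    """Inverse conversion: annual heating therms implied by an intensity."""
    return intensity * floor_area_ft2 * annual_hdd / BTU_PER_THERM


# Assumed "typical" home: 1,600 ft2 in a 7,500 HDD climate.
print(heating_energy_intensity(1800, 1600, 7500))
print(annual_therms_from_intensity(15, 1600, 7500))
```

Dividing by both floor area and HDD is what lets homes of different sizes in different parts of the state be compared on one scale.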
What would account for this observed trend? I, and others, have entertained a number of theories.
Theory 1: People in inefficient homes keep their thermostats set lower. This theory holds that people who live in less efficient homes are more likely to keep their thermostat set lower to save on their utility bills, especially since these are more likely to be low-income households. It’s a good theory, but the information that people supplied on our questionnaire suggests just the opposite: Underinsulated and leaky homes are associated with somewhat higher reported thermostat settings. A plausible explanation is that these homeowners need to keep the thermostat set higher to be comfortable in homes that have colder wall surfaces and more air movement. In general, we found little connection between people’s proclivity for saving energy and the state of their building shells.
The question inevitably arises, can you trust what people report about how they set their thermostats? I believe so, for two reasons. First, the self-reported thermostat settings are well correlated with differences in actual heating use, even after accounting for differences in insulation levels and air leakage. The data largely confirm the old rule of thumb that each degree of difference in the thermostat setting translates into about a 3% difference in heating use.
Second, I found that the difference between what the homeowner reported to us about thermostat settings and what the rater entered into REM/Rate is a statistically significant predictor of the error in predicted heating use. Although homeowners in our sample reported thermostat settings that ranged from 59°F to 74°F, in about three-quarters of the cases the rater left the thermostat at the default value (68°F) in the rating. I found that each degree of difference between what the homeowner reported to us and what was used in the rating resulted in a disparity of roughly 2.5% between the HERS prediction and the PRISM estimate. Although errors in modeling the thermostat setpoint account for some of the error in predictions of heating use for individual houses, the average modeled and self-reported setpoints agree closely, and accounting for the errors does not affect the overall pattern in Figure 1.
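As a rough illustration of the per-degree effect described above, the sketch below rescales a predicted heating use when the modeled setpoint differs from the reported one. It assumes the effect is linear in the setpoint difference and uses the roughly 2.5% per degree figure as a default; both the linear form and the function name are assumptions for illustration, not part of REM/Rate.

```python
def adjust_for_setpoint(predicted_therms, modeled_setpoint_f, reported_setpoint_f,
                        fraction_per_degree=0.025):
    """Rescale a heating prediction for a different thermostat setpoint.

    Assumes a linear effect: each degree F of difference changes heating
    use by `fraction_per_degree` (about 2.5% to 3% per the text above).
    """
    degrees = reported_setpoint_f - modeled_setpoint_f
    return predicted_therms * (1 + fraction_per_degree * degrees)


# Hypothetical home: rated at the 68 F default, owner reports 64 F.
adjusted = adjust_for_setpoint(1000, 68, 64)
print(f"{adjusted:.0f} therms")
```

A four-degree gap between the default and the reported setting is enough on its own to move the prediction by about 10% under this assumption.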
Theory 2: Garbage in, garbage out. Wisconsin raters may not be correctly modeling the homes. Certainly there is room for error in modeling older homes, even though our raters were experienced and each rating underwent three levels of review. In a test conducted as part of the 1999 Affordable Comfort Conference, Jim Cavallo of Argonne National Laboratory asked four raters to independently rate an older and a new home, each using REM/Rate (see “HERS Experiment Cause for Confidence,” HE Sept/Oct ’99, p. 17). The resulting rating scores had a 7.5% difference for the older home, compared to 1.8% for the new home. But while random errors in modeling homes could be expected to contribute to the scatter in Figure 1, they would not produce a consistent pattern, unless Wisconsin raters are systematically doing something wrong.
Some have suggested that the way basements are treated in the Wisconsin program is the root of the problem. Raters in Wisconsin generally model basements as conditioned spaces, though in reality most Wisconsin basements are unintentionally conditioned—the furnace and water heater warm the space but do not maintain the thermostat setpoint. Architectural Energy Corporation (AEC) recommends that these basements be modeled as unconditioned space. However, when I reanalyzed a sample of a dozen homes the way AEC recommends, I found that it didn’t make much difference. Higher estimated heat loss through the foundation walls in the one case (conditioned) is mostly offset by higher predicted duct leakage in the other (unconditioned).
Another possibility is that raters systematically underestimate the insulation levels or the heating system efficiency of older homes. It is certainly possible to misjudge wall insulation levels, since the insulation is hidden in the wall cavities. However, the raters indicated to us that they visually verified wall insulation levels in at least one location in about three-quarters of our sample. Furthermore, homes whose predicted heating use is extreme in either direction may have modeling errors that make them stand out from the rest of the sample. But even if we ignore the extreme homes and focus only on the middle 50% of homes (in terms of predicted heating use), a similar trend line emerges. This suggests that whatever is going on is affecting most or all homes, not just the extreme cases.
Theory 3: Undersized furnaces. This theory holds that inefficient homes should use a lot more energy, but their heating systems are so undersized that they can’t maintain the thermostat setpoint, so they end up using less heating energy. We did have a couple of households that reported to us that they kept their thermostat set at 80°F–90°F, and on investigation, these turned out to have undersized furnaces. But the billing data analysis suggests that the average heating system is oversized by about 60%, and fewer than 5% are undersized.
Theory 4: Rogue raters, mad wood burners, and other possibilities. This is really a grab bag of possibilities. Was there perhaps one rogue rater who consistently underestimated insulation levels? Are the problem homes clustered in one county? Do homeowners with wood fireplaces or electric space heaters offset a large chunk of their heating load? Without going into detail, I looked into these possibilities, and in every case the answer seems to be no.
Theory 5: Heat loss models versus reality. The failure to find any other explanation for what we observed leaves me thinking that perhaps there is something inherent in typical energy modeling algorithms that tends to significantly overestimate heating use in poorly insulated homes. Standard heat loss theory tells us that the lower the R-value of a structural component, the more sensitive the predictions are to small changes in the assumptions. For example, the difference in predicted heat loss between an R-4 wall and an R-3 wall is 25%, but the difference between an R-19 wall and an R-18 wall is only about 5%.
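The arithmetic behind that example can be sketched with the simplest steady-state conduction model, in which heat loss per square foot equals the temperature difference divided by the R-value. This is an illustration only: it ignores framing, air films, and air leakage, and the 50°F temperature difference is an arbitrary assumption that cancels out of the ratio. The function names are hypothetical.

```python
def heat_loss_per_ft2(r_value, delta_t_f):
    """Steady-state conductive heat loss in Btu/h per ft2 (delta T / R)."""
    return delta_t_f / r_value


def heat_loss_reduction(r_before, r_after, delta_t_f=50.0):
    """Fractional drop in heat loss when the R-value rises from r_before to r_after."""
    before = heat_loss_per_ft2(r_before, delta_t_f)
    after = heat_loss_per_ft2(r_after, delta_t_f)
    return (before - after) / before


# One extra unit of R-value matters far more at the low end.
print(f"R-3 to R-4:   {heat_loss_reduction(3, 4):.1%}")
print(f"R-18 to R-19: {heat_loss_reduction(18, 19):.1%}")
```

Because heat loss varies with 1/R, the same one-unit error in an assumed R-value shifts the prediction about five times as much for the poorly insulated wall.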
Dave Roberts at AEC speculates that perhaps the phenomenon results from a simplifying assumption common to energy models: namely, that all areas of the house are at the thermostat setpoint right up to the thermal boundary. Perhaps rooms that are farther away from the thermostat are colder, particularly at the wall surface. Colder rooms mean less heat loss through the walls and ceiling. And the less insulation there is, the colder these outlying rooms are likely to be.
Whatever is going on, I doubt that the phenomenon is limited to REM/Rate. After all, REM/Rate passes the National Renewable Energy Laboratory’s HERS BESTEST, which pits HERS software against three state-of-the-art building simulation models (DOE-2, BLAST, and SERI/RES) in a series of highly defined hypothetical homes.
But the BESTEST test simulations are meant to benchmark models against other models, not against actual homes. And what’s not so widely known is that even among the three BESTEST reference simulations there is substantial variation in the estimates. For a series of hypothetical homes in Colorado Springs, the highest estimates exceed the lowest estimates by an average of 60%, and the difference is largest for the simulated energy-inefficient home (see Figure 2, p. 34). It seems clear that the simulation tools out there do not necessarily agree on how to model energy use, especially in inefficient homes.
This issue goes beyond errors in predicting annual energy use. If there is a consistent error in estimating heating energy use, then it stands to reason that the rating scores would be off base as well. I found that there is a strong relationship between the predicted heating energy use and the rating score, even though the rating scores are specifically designed not to take occupancy behavior into account (see Figure 3). Because space heating is such a large part of overall energy bills (at least in Wisconsin), it is apparently the main factor in determining the HERS score.
By combining the regression lines in Figures 1 and 3, I could estimate how much rating scores would change on average if the HERS predictions took into account the actual billing data results. The result is a big increase in scores for low-scoring homes, and a slight decrease in scores for high-scoring homes (see Figure 4, p. 36). For example, the results indicate that homes that get a rating of 50 are being underscored by 20 points on average.
After making this adjustment, our sample indicates that 90% of Wisconsin homes would have a rating score somewhere between 74 and 84. Ironically, the rating score (an arguably vague creation anyway) becomes less useful as a tool for distinguishing good from bad energy performance. Partly in recognition of this, the State of Wisconsin has since moved to modify how rating stars are awarded under its WHP program based on the HERS score. Previously, a home that scored 70 would receive a three-star rating; under the new scheme, the same home would get one star, and homes scoring below 70 would get no stars.
The good news is that once the average error in heating use has been accounted for, individual ratings are reasonably accurate a fair amount of the time. The scatter around the regression line in Figure 1 indicates that after removing the systematic error in the predictions, the predicted heating use will be within 10% of the billing data about one-third of the time and within 20% about half of the time. That’s not bad, considering that the average 10% uncertainty in the PRISM estimates themselves is embedded in these figures.
HERS Industry Lessons
So what does all of this mean for the HERS industry? First, I’d like to see more validation of HERS software against actual customer bills. According to energy consultant Bion Howard, actual studies comparing the energy use of homes to HERS predictions are few and far between, despite a need for the HERS industry to make good on performance claims. Perhaps what I observed is a fluke; only additional studies will tell. Yes, variation in life-style will mean that some homes will be overestimated and some will be underestimated, but we need to be confident that HERS software gets it right on average.
Second, why not build into the software the ability to enter and analyze actual customer bills? When it comes to heating-dominated energy use in places like Wisconsin, it’s not that difficult to derive a fairly accurate picture of heating use from utility bills. Even if the rating score does not directly make use of this information, it makes for a good crosscheck against the modeling assumptions. Moreover, I think that the auditing side of HERS—the recommendations for energy improvements and estimates of their savings and payback—should be linked more directly to actual customer use. The HERS industry suffers a loss of credibility every time a rater hands a homeowner a HERS report that recommends improvements based on an estimate of energy use that is far beyond what the customer actually uses.
In the meantime, there are several steps HERS raters can take to minimize rating errors and build consumer confidence in HERS:
Do reality checks against actual usage whenever possible. Yes, the goal of a rating score is to rate the home, not the occupant. But if the HERS rating estimates an annual heating bill of $1,000 for a home with an actual gas bill of $500, something is clearly wrong. This cross-check does not need to be a fancy statistical analysis like PRISM; a couple of minutes of figuring using a summer bill and a winter bill can yield a reasonable estimate for space heating in cold climates like Wisconsin.
Know what represents unusual energy use. Raters should be wary when a rating program predicts energy use that is much higher (or lower) than what most homes use. This might be cause to go back and review the modeling assumptions that went into the HERS analysis. How is a rater supposed to know what represents unusual usage? Because energy use for heating and cooling varies widely around the country, I think it is incumbent on the organizations that sponsor HERS programs in various states to provide guidelines based on samples of actual homes. Raters in climates like Wisconsin’s can use the heating energy intensity figures that we found from our study. But beware! Heating energy intensity tends to increase as the climate gets milder—because, I suspect, overall energy costs drive building construction practices. Homes in mild climates can use a lot more energy per HDD and still not have excessive heating bills.
Take the time to find out the owner’s thermostat-setting habits. This means working through the setbacks that the homeowner may practice, and deriving an average thermostat setpoint. Using the default setpoints can lead to errors of 20% or more in estimating space-heating use.
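Two of the steps above lend themselves to quick arithmetic, sketched here under stated assumptions. The two-bill estimate treats the summer bill as pure baseload and scales the winter month's heating use by degree-days; the article does not spell out the exact figuring, so this is one plausible version. The average setpoint is a simple time-weighted mean, which ignores how slowly a house drifts toward a setback temperature. All names and numbers are hypothetical.

```python
def quick_heating_estimate(summer_therms, summer_days,
                           winter_therms, winter_days,
                           winter_hdd, annual_hdd):
    """Rough annual heating therms from one summer and one winter gas bill.

    Assumes summer use is all baseload (water heating, cooking) and that
    heating use scales in proportion to heating degree-days.
    """
    base_per_day = summer_therms / summer_days
    winter_heating = winter_therms - base_per_day * winter_days
    therms_per_hdd = winter_heating / winter_hdd
    return therms_per_hdd * annual_hdd


def average_setpoint(schedule):
    """Time-weighted average setpoint from (hours, setpoint_f) pairs."""
    total_hours = sum(hours for hours, _ in schedule)
    return sum(hours * temp for hours, temp in schedule) / total_hours


# Hypothetical bills: 24 therms in a 30-day summer month, 204 therms in a
# 30-day winter month with 1,500 HDD, in a climate with 7,500 HDD per year.
estimate = quick_heating_estimate(24, 30, 204, 30, 1500, 7500)
print(f"estimated annual heating use: {estimate:.0f} therms")

# Hypothetical habits: 70 F for 16 hours, set back to 64 F for 8 hours.
print(f"average setpoint: {average_setpoint([(16, 70), (8, 64)]):.1f} F")
```

If the two-bill estimate lands far from the rating software's predicted heating use, that is a cue to revisit the modeling inputs before handing the report to the homeowner.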