I compared AC prices. The cheapest one costs more.

11 min read
python, cost-modeling, data-pipeline, engineering

I wanted to compare air conditioners across two Vietnamese retail sites. Scrape both catalogs, match the same models, sort by price. Few hours of work, tops.

It took a bit more than that.

TLDR: The AC that's cheapest at the register costs about 9.5 million dong more to own over five years than the one that's cheapest to own, because it eats electricity. To find the real answer I had to figure out when two listings are the same product, build a cost model that counts years of power bills, and then verify every number the retailers gave me, because the efficiency labels were wrong more often than I expected.

Before you compare prices, you need to know what you're comparing

You can't say "this model is cheaper at retailer A" until you've confirmed both retailers are selling the same machine.

The two sites list the same 1 HP inverter ACs, but they use different product codes for the same physical unit. Casper lists JC-09IU36 on one site and JC-09IU36X on the other. Funiki: HIC09TMU vs HIC09TMU.ST3. Same machine, different strings.

My pipeline normalizes each model code (uppercase, strip separators) and joins on BRAND:NORMALIZEDCODE. This auto-join matched 20 of the 24 models sold at both retailers. The other 4 I matched by hand against the spec sheets.

The obvious next move is fuzzy matching: if two codes share a long prefix, just merge them. But LG's IEC09M1 and IFC09M1 are one letter apart. They look identical. They're different generations with different specs. Merging them would fabricate a product that doesn't exist.

So the rule: a matching code is a hint to go check the spec sheets. Equal BTU, equal CSPF, equal coil dimensions = same machine. Similar code with different specs = different machine. I kept a small aliases file for the 4 confirmed matches that the auto-join missed. That file can't be regenerated by re-running the scraper. It's just human knowledge written down.
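Here's roughly what that looks like in code. This is a minimal sketch, not the actual pipeline: the function names, the alias dictionary layout, and the spec field names are all made up for illustration. I'm also assuming "strip separators" means dropping everything that isn't a letter or digit, and that the Casper and Funiki pairs above are the kind of match that needs an alias entry.

```python
import re


def normalize_code(code: str) -> str:
    """Uppercase the model code and drop anything that is not A-Z or 0-9."""
    return re.sub(r"[^A-Z0-9]", "", code.upper())


def join_key(brand: str, code: str) -> str:
    """Build the BRAND:NORMALIZEDCODE key used for the auto-join."""
    return f"{brand.strip().upper()}:{normalize_code(code)}"


# Hypothetical aliases file: hand-confirmed matches the auto-join misses.
# Maps one retailer's key onto the canonical key.
ALIASES = {
    "CASPER:JC09IU36X": "CASPER:JC09IU36",
    "FUNIKI:HIC09TMUST3": "FUNIKI:HIC09TMU",
}


def canonical_key(brand: str, code: str) -> str:
    """Apply a confirmed alias if one exists, otherwise use the plain key."""
    key = join_key(brand, code)
    return ALIASES.get(key, key)


def same_machine(spec_a: dict, spec_b: dict) -> bool:
    """The spec-sheet check: equal BTU, CSPF and coil dimensions."""
    fields = ("btu", "cspf", "coil_dimensions")
    return all(spec_a[f] == spec_b[f] for f in fields)


# Exact keys only, no fuzzy matching: the two LG generations stay separate.
print(canonical_key("LG", "IEC09M1") == canonical_key("LG", "IFC09M1"))
# An aliased pair collapses to one key.
print(canonical_key("Casper", "JC-09IU36X") == canonical_key("Casper", "JC-09IU36"))
```

The point of the design is that nothing merges on similarity alone. A pair either has identical normalized codes or has an alias someone confirmed against the spec sheets.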

89 distinct models, 24 sold at both retailers. Now I could talk about price.

The sticker price is the wrong number

Once I had clean models, the obvious move was to sort by price.

Cheapest to buy: Hisense AS-10CR4RYDDJ02 at 4,490,000d, CSPF 3.62. Cheapest to own over 5 years: Hisense AS-10TR4RGUUA00 at 7,990,000d, CSPF 6.28.

The second one costs 3.5 million more at the register and comes out roughly 9.5 million cheaper over five years. I already knew going in that AC inflates the electric bill. I'd heard people talk about 1 million, 2 million dong monthly bills after installing one. But I didn't know the actual figure until I sat down and calculated it. Turns out the purchase price is the small number. The electricity is where you actually pay.

[Figure: Top 10 air conditioners ranked by five-year total cost of ownership]

How the cost model works

Pretty simple. Every machine cools the same room to the same temperature. The only thing that changes is how efficiently each one turns electricity into cooling, measured by CSPF (Cooling Seasonal Performance Factor, the seasonal efficiency rating on the energy label).

seasonal_cooling_load_kwh = reference_capacity * load_factor * hours_per_year  # the room's CSTL
electricity_kwh           = seasonal_cooling_load_kwh / cspf                    # CSPF = efficiency only
tco = purchase_price + electricity_kwh * tariff * years

You set a temperature on the AC and it cools the room to that point. A stronger BTU just gets the room there faster. Once it hits the target, the inverter kicks into an energy saving cycle and basically idles. So BTU is about speed, not about how much cooling you get in total. Every machine in the comparison delivers the same cooling to the same room. The only difference is how much electricity each one burns doing it.

The inputs are my actual schedule, because the whole point was to answer my own buying question:

  • Weekday: 14 hours (evening work around 6pm through 8am)
  • Weekend: 16 hours (sleep in to noon, run later)
  • Weighted average: about 14.6 hours/day, 330 days/year
  • 50% load factor (the room's average cooling demand vs the unit's rated capacity, a sizing number, not an efficiency one)
  • EVN top tier tariff: 3,500d/kWh including VAT

That works out to a seasonal cooling load (CSTL) of about 6,342 kWh of heat per year. Under this load, going from CSPF 4 to CSPF 5 is worth roughly 5.5 million dong over five years. The saving per point shrinks as CSPF climbs, because electricity scales with 1/CSPF. Most price gaps between budget ACs are smaller than that.
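To make the arithmetic checkable, here is the model plugged in with the inputs above. It's a sketch under one assumption that isn't stated in the inputs: I take the reference capacity of a 1 HP unit to be about 9,000 BTU/h (roughly 2.64 kW), which is what reproduces the 6,342 kWh figure. Variable names are illustrative.

```python
# Inputs from the schedule and tariff listed above.
WEEKDAY_HOURS = 14
WEEKEND_HOURS = 16
DAYS_PER_YEAR = 330
LOAD_FACTOR = 0.5
TARIFF_VND_PER_KWH = 3_500
YEARS = 5

# Assumption: a 1 HP unit is rated at roughly 9,000 BTU/h.
BTU_PER_HOUR_IN_KW = 0.00029307107
REFERENCE_CAPACITY_KW = 9_000 * BTU_PER_HOUR_IN_KW

# Weighted average over a 5 weekday + 2 weekend day week.
hours_per_day = (5 * WEEKDAY_HOURS + 2 * WEEKEND_HOURS) / 7
hours_per_year = hours_per_day * DAYS_PER_YEAR

# The room's seasonal cooling load (CSTL), identical for every unit.
seasonal_cooling_load_kwh = REFERENCE_CAPACITY_KW * LOAD_FACTOR * hours_per_year


def tco(purchase_price: float, cspf: float, years: int = YEARS) -> float:
    """Purchase price plus electricity over the given number of years."""
    electricity_kwh = seasonal_cooling_load_kwh / cspf
    return purchase_price + electricity_kwh * TARIFF_VND_PER_KWH * years


cheapest_to_buy = tco(4_490_000, 3.62)  # Hisense AS-10CR4RYDDJ02
cheapest_to_own = tco(7_990_000, 6.28)  # Hisense AS-10TR4RGUUA00
gap = cheapest_to_buy - cheapest_to_own

# Value of one CSPF point, measured between CSPF 4 and CSPF 5.
one_point = tco(0, 4.0) - tco(0, 5.0)

print(f"hours/day:          {hours_per_day:.1f}")
print(f"cooling load:       {seasonal_cooling_load_kwh:,.0f} kWh/year")
print(f"5-year gap:         {gap:,.0f} d")
print(f"CSPF 4 -> 5 saving: {one_point:,.0f} d")
```

The gap comes out at roughly 9.5 million dong, even after paying 3.5 million more at the register.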

What the 50% actually is. Picture the room as a leaky bucket. Heat seeps in through the walls, the windows, the sun, your own body. The AC is the scoop you bail with. Walk into a hot room and you scoop flat out to empty it. That's the pull-down at full power. Once it's cold you only trickle, just fast enough to match the leak, and that gentle trickle is most of the night. Average it out and the scoop is about half full each stroke. That's the 50%.

Two things follow. The total you bail equals the total that leaked in, so the cooling demand is set by the room and the weather, not by how big your AC is. A bigger BTU just empties the bucket faster, then idles. That's why every unit here delivers the same cooling to the same room. And the load factor only describes how much cooling the room wants. It says nothing about how much electricity each scoop costs. That second question is the only thing CSPF answers. Because I use the same load factor for every unit, it scales the whole electric bill up or down but never reorders the table. The ranking depends on CSPF and price alone.

Why 5 years, not 10

Air conditioners are "supposed to" last 10 years. Compressor warranties run 10 to 12.

I report the 10 year number. I don't rank on it.

At 14+ hours a day in a hot humid climate, these things don't last 10 years. When I've gone looking at rooms, I've seen old ACs where the outdoor unit is completely oxidized. They sound terrible, take forever to cool down the room, and clearly aren't performing anywhere near their rated efficiency. The real useful life is more like 5, 6, maybe 7 years. A 10 year horizon credits cheap high CSPF units with electricity savings they probably won't live to deliver.

5 year TCO is the sort key. 10 year is a reference column.
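In table terms that looks something like the sketch below. It uses the two Hisense units and the 6,342 kWh load from earlier. The row layout and column names are illustrative, not the real sheet.

```python
SEASONAL_COOLING_LOAD_KWH = 6_342  # the CSTL figure from the cost model
TARIFF_VND_PER_KWH = 3_500


def tco(price: float, cspf: float, years: int) -> float:
    """Purchase price plus electricity over the given horizon."""
    return price + SEASONAL_COOLING_LOAD_KWH / cspf * TARIFF_VND_PER_KWH * years


units = [
    ("Hisense AS-10CR4RYDDJ02", 4_490_000, 3.62),
    ("Hisense AS-10TR4RGUUA00", 7_990_000, 6.28),
]

rows = [
    {
        "model": model,
        "tco_5y": tco(price, cspf, 5),    # sort key
        "tco_10y": tco(price, cspf, 10),  # reference column only
    }
    for model, price, cspf in units
]

# Rank on the 5 year figure; the 10 year figure just rides along.
rows.sort(key=lambda row: row["tco_5y"])

for row in rows:
    print(f'{row["model"]:<26} {row["tco_5y"]:>14,.0f} {row["tco_10y"]:>14,.0f}')
```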

Don't trust the efficiency label either

CSPF drives the whole ranking, and it's the shakiest number in the dataset. Scraped from one site, sometimes missing, sometimes wrong.

I run it through a trust ladder, always biased toward understating efficiency. If the manufacturer's own site contradicts the retailer, the manufacturer wins (checked by hand). If both retailers agree, I use their number. If they disagree, I take the lower value. If nobody published a label at all, I fall back to the dataset median, about 5.3.

That last fallback is where I got burned. The Nagakawa NIS-C09R2 had no energy label on the retailer site, so the pipeline imputed the median: 5.28. The manufacturer's own site lists the matching SKU at 4.51. The median had given a bottom tier machine an above average efficiency it doesn't have. It would've climbed the ranking on a number I made up.

So I added a manufacturer verification pass. A hand-built override file with a source URL for every correction, keyed so it survives a full re-scrape.
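The ladder plus the override file can be sketched like this. Everything here is illustrative: the function name, the override layout, and the source tags are made up, and the `source_url` value is a placeholder rather than a real link. The sketch also assumes that when only one retailer publishes a label, that single value is used, which is a case the ladder above doesn't spell out.

```python
from typing import Optional, Tuple

# Hypothetical override file contents, keyed by BRAND:NORMALIZEDCODE so the
# entries still apply after a full re-scrape.
MANUFACTURER_OVERRIDES = {
    "NAGAKAWA:NISC09R2": {
        "cspf": 4.51,
        "source_url": "<manufacturer product page>",  # placeholder, not a real URL
    },
}


def resolve_cspf(
    key: str,
    retailer_a: Optional[float],
    retailer_b: Optional[float],
    dataset_median: float,
) -> Tuple[float, str]:
    """Return (cspf, source), walking the ladder from most to least trusted."""
    override = MANUFACTURER_OVERRIDES.get(key)
    if override is not None:
        return override["cspf"], "manufacturer"

    published = [v for v in (retailer_a, retailer_b) if v is not None]
    if published:
        # Agreeing retailers give the same number either way;
        # disagreeing ones resolve to the lower, more pessimistic value.
        return min(published), "retailer"

    # Nobody published a label: fall back to the median and say so.
    return dataset_median, "imputed_median"


print(resolve_cspf("NAGAKAWA:NISC09R2", None, None, 5.28))
print(resolve_cspf("EXAMPLE:MODEL1", 5.0, 4.8, 5.28))
print(resolve_cspf("EXAMPLE:MODEL2", None, None, 5.28))
```

Returning the source next to the value means every imputed number is tagged as imputed, which makes the ones that still need a manual check easy to list.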

That verification pass is where it got weird. Once the pipeline spat out the top 10 TCO picks, I went to each manufacturer's site myself to check if the numbers were real. The #1 pick was a Hisense. I opened their product page and the specs looked fine. But the page was a single URL with a JavaScript variant selector. My scraping agent had been using curl, which fetches only the static HTML and never runs the JavaScript. The static HTML contained the 1.5 HP variant's specs, not the 1 HP I was actually ranking. The CSPF behind my top recommendation was, for a while, literally from the wrong machine. I only caught it because I went and looked with my own eyes, then switched the agent to a browser tool that could wait for the JS to load the right variant. The correct CSPF was 6.28, which happened to confirm the ranking. But it could easily have gone the other way. A static scrape can hand you a confident, precise, wrong number.

The price is a snapshot

The scraped price is the displayed price at one moment. Some pages carry a "10% online discount" banner the pipeline doesn't capture. Sale prices drift; spot checks ran about 1 million higher than the snapshot.

I don't model either the banner discounts or the drift. They're conditional and volatile. When the ranking looks off near a price boundary, I just re-scrape.

Build quality and warranty are not part of the cost score

TCO tells you which machine is cheapest to run. It doesn't tell you which one is best built. I wanted both answers, so I built a separate build quality grade and a separate warranty grade. Three scores, kept apart.

I got the idea for the build quality score from a Facebook post about outdoor unit weight. Heavier outdoor units tend to have better noise insulation, bigger radiators, and the fan doesn't have to work as hard. That's a weak signal on its own, but the stronger signal is the fin coating. Remember those oxidized outdoor units I mentioned? Coil corrosion is the #1 failure mode in this climate. So the build grade is mostly about whether the coils have anti-corrosion coating or gold plating vs bare copper and aluminum, with unit weight as a tiebreaker.

Warranty is just the manufacturer's compressor warranty. Not the retailer's warranty, not parts. The manufacturer is betting its own money on the most expensive component. That's the cleanest signal of how long they expect it to last.

I kept all three scores separate on purpose. If I folded build quality into TCO, a flimsy but efficient unit could outrank a tank, and you'd never see the trade-off. Same problem if I merged build and warranty into one "durability" number. A sturdy machine with a short warranty and a flimsy one with a long warranty are different choices. Collapsing them hides that.

The #1 TCO pick, the Hisense, has good but not top tier build (anti-corrosion coating, not gold plated). The #2 TCO pick, a Casper, costs more at the register but lands second on 5 year cost. It has the best build grade and a 12 year compressor warranty. Two different machines, two different strengths. If I'd blended everything into one number, you'd never see that.

For models where warranty data is missing (about 23 of the 89), I just label it unknown. I don't guess. The CSPF median imputation already burned me once. I'm not doing that again with durability.

So what did I buy

I'm not going to name the model, but here's the logic.

The pipeline gave me the top 20 sorted by real cost. That narrowed the field. Then I checked the build and warranty grades and factored in the deal I got. I tracked deals in a file that the ranking actually subtracts before sorting. A real negotiated discount moves a unit up the table. The rule is it has to be a real, currently available deal. No inventing discounts to flatter a unit.

The spreadsheet didn't pick for me. It just killed the bad options. If a friend asks me which AC to get, I'm sending them the sheet, not a recommendation. Sort it yourself, check the warranty, check the build quality, factor in whatever deal you find. You probably already have a feeling about which AC is right for you. The pipeline just visualizes that feeling so you can actually see the numbers and decide.