I compared AC prices. The cheapest one costs more.
I wanted to compare air conditioners across two Vietnamese retail sites. Scrape both catalogs, match the same models, sort by price. Few hours of work, tops.
It took a bit more than that.
TLDR: The AC that's cheapest at the register costs about 9.5 million dong more to own over five years, because it eats electricity. To find the real answer I had to figure out when two listings are actually the same product, then build a cost model that counts years of power bills instead of just the sticker.
Before you compare prices, you need to know what you're comparing
You can't say "this model is cheaper at retailer A" until you've confirmed both retailers are selling the same machine.
The two sites list the same 1 HP inverter ACs, but they use different product codes for the same physical unit. One Casper unit appears as JC-09IU36 on one site and JC-09IU36X on the other. One Funiki unit appears as HIC09TMU on one site and HIC09TMU.ST3 on the other. Same machine, different strings.
My pipeline normalizes each model code (uppercase, strip separators) and joins on BRAND:NORMALIZEDCODE. This auto-join matched 20 of the 24 models sold at both retailers. The other 4 I matched by hand against the spec sheets.
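Here is a minimal sketch of that normalize-and-join step, not the actual pipeline code. The function names, the row layout, and the choice to strip every non-alphanumeric character are my assumptions for illustration, and the lowercase Hisense code on the second site is made up to show a formatting-only difference.

```python
import re

def normalize_code(code: str) -> str:
    """Uppercase the code and drop every character that is not A-Z or 0-9."""
    return re.sub(r"[^A-Z0-9]", "", code.upper())

def join_key(brand: str, code: str) -> str:
    """Build the BRAND:NORMALIZEDCODE key used for the join."""
    return f"{brand.strip().upper()}:{normalize_code(code)}"

def auto_join(site_a, site_b):
    """Pair listings whose keys are exactly equal; return leftovers for hand review."""
    index_b = {join_key(r["brand"], r["code"]): r for r in site_b}
    matched, only_a = [], []
    for row in site_a:
        hit = index_b.pop(join_key(row["brand"], row["code"]), None)
        if hit is not None:
            matched.append((row, hit))
        else:
            only_a.append(row)
    return matched, only_a, list(index_b.values())

# Illustrative listings. The Casper codes are the ones quoted above; the
# differently formatted Hisense code on site B is invented for this example.
site_a = [
    {"brand": "Hisense", "code": "AS-10TR4RGUUA00"},
    {"brand": "Casper", "code": "JC-09IU36"},
]
site_b = [
    {"brand": "HISENSE", "code": "as10tr4rguua00"},
    {"brand": "Casper", "code": "JC-09IU36X"},
]

matched, only_a, only_b = auto_join(site_a, site_b)
print(len(matched), [r["code"] for r in only_a], [r["code"] for r in only_b])
```

The join is exact on purpose. A suffix like the trailing X survives normalization, so that pair lands in the leftover lists and waits for a human instead of being merged automatically.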
The obvious next move is fuzzy matching: if two codes share a long prefix, just merge them. But LG's IEC09M1 and IFC09M1 are one letter apart. They look identical. They're different generations with different specs. Merging them would fabricate a product that doesn't exist.
So the rule: a matching code is a hint to go check the spec sheets. Equal BTU, equal CSPF, equal coil dimensions = same machine. Similar code with different specs = different machine. I kept a small aliases file for the 4 confirmed matches that the auto-join missed. That file can't be regenerated by re-running the scraper. It's just human knowledge written down.
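The rule can be sketched roughly like this. The `Specs` fields, the tolerance on CSPF, and the shape of the aliases mapping are assumptions for illustration; the spec values are placeholders, not numbers from any real spec sheet, and the two alias entries simply reuse the Casper and Funiki pairs mentioned earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Specs:
    btu: int          # rated cooling capacity in BTU/h
    cspf: float       # value printed on the energy label
    coil_mm: tuple    # coil dimensions in millimetres

def same_machine(a: Specs, b: Specs, cspf_tol: float = 0.005) -> bool:
    """Two listings are the same machine only if all checked specs agree."""
    return (
        a.btu == b.btu
        and abs(a.cspf - b.cspf) < cspf_tol
        and a.coil_mm == b.coil_mm
    )

# Hand-confirmed aliases: variant key -> canonical key. This mapping is the
# part that a re-scrape cannot regenerate. Entries here are illustrative.
ALIASES = {
    "CASPER:JC09IU36X": "CASPER:JC09IU36",
    "FUNIKI:HIC09TMUST3": "FUNIKI:HIC09TMU",
}

def canonical_key(key: str) -> str:
    """Resolve a join key through the aliases file, if it has an entry."""
    return ALIASES.get(key, key)

# Placeholder specs for two look-alike codes from different generations.
older_gen = Specs(btu=9000, cspf=4.50, coil_mm=(600, 250, 20))
newer_gen = Specs(btu=9000, cspf=5.10, coil_mm=(600, 250, 20))

print(same_machine(older_gen, older_gen))  # identical specs
print(same_machine(older_gen, newer_gen))  # similar code, different CSPF
print(canonical_key("CASPER:JC09IU36X"))
```

Keeping the aliases as data rather than as matching logic means every merge has a person behind it, and a near-miss code never gets merged just because it looks close.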
89 distinct models, 24 sold at both retailers. Now I could talk about price.
The sticker price is the wrong number
Once I had clean models, the obvious move was to sort by price.
Cheapest to buy: Hisense AS-10CR4RYDDJ02 at 4,490,000d, CSPF 3.62.
Cheapest to own over 5 years: Hisense AS-10TR4RGUUA00 at 7,990,000d, CSPF 6.28.
The second one costs 3.5 million dong more at the register and comes out roughly 9.5 million dong cheaper over five years. Sort by sticker price alone and you'd buy the first one, then quietly overpay for electricity. For a machine you run 14 hours a day, the purchase price is the small number.
How the cost model works
Pretty simple. Every machine cools the same room to the same temperature. The only thing that changes is how efficiently they turn electricity into cooling, measured by CSPF (the seasonal efficiency rating on the energy label).
room_demand_kwh = reference_capacity * load_factor * hours_per_year
electricity_kwh = room_demand_kwh / cspf
tco = purchase_price + electricity_kwh * tariff * years
A bigger BTU unit doesn't cool more. It reaches the target and cycles off sooner. So I compute the room's annual thermal demand once, then divide by each machine's CSPF. Higher CSPF just means buying the same comfort with less power.
The inputs are my actual schedule, because the whole point was to answer my own buying question:
- Weekday: 14 hours (evening work around 6pm through 8am)
- Weekend: 16 hours (sleep in to noon, run later)
- Weighted average: about 14.6 hours/day, 330 days/year
- 50% load factor (inverter at part load)
- EVN top tier tariff: 3,500d/kWh including VAT
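That works out to about 6,342 kWh of cooling demand per year. Under this load, one point of CSPF (going from 4 to 5, say) is worth roughly 5.5 million dong in electricity over five years; the gain per point shrinks as CSPF rises, because CSPF sits in the denominator. Most price gaps between budget ACs are smaller than that.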
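To make the arithmetic checkable, here is a small sketch of the model with the inputs listed above. One input is my assumption: the text never states the reference capacity, so I use 9,000 BTU/h (a typical 1 HP rating), converted to kW, which happens to land on the 6,342 kWh figure. The function names are illustrative.

```python
BTU_PER_HOUR_TO_KW = 0.00029307107  # 1 BTU/h is about 0.293 W

def hours_per_year(weekday_h: float, weekend_h: float, days: int) -> float:
    """Weighted daily hours (5 weekdays, 2 weekend days) times days of use."""
    return (5 * weekday_h + 2 * weekend_h) / 7 * days

def room_demand_kwh(capacity_kw: float, load_factor: float, hours: float) -> float:
    """Annual cooling the room needs, independent of which machine delivers it."""
    return capacity_kw * load_factor * hours

def tco(price: float, cspf: float, demand_kwh: float, tariff: float, years: int) -> float:
    """Purchase price plus electricity over the horizon."""
    electricity_kwh = demand_kwh / cspf  # per year
    return price + electricity_kwh * tariff * years

# Inputs from the list above, except the capacity, which is ASSUMED.
capacity_kw = 9000 * BTU_PER_HOUR_TO_KW
hours = hours_per_year(weekday_h=14, weekend_h=16, days=330)
demand = room_demand_kwh(capacity_kw, load_factor=0.5, hours=hours)
TARIFF = 3500  # dong per kWh, VAT included
YEARS = 5

cheapest_to_buy = tco(4_490_000, 3.62, demand, TARIFF, YEARS)
cheapest_to_own = tco(7_990_000, 6.28, demand, TARIFF, YEARS)
gap = cheapest_to_buy - cheapest_to_own

# Value of moving from CSPF 4 to CSPF 5 at the same purchase price.
cspf_point_value = tco(0, 4.0, demand, TARIFF, YEARS) - tco(0, 5.0, demand, TARIFF, YEARS)

print(f"annual demand: {demand:,.0f} kWh")
print(f"5-year gap between the two Hisense units: {gap:,.0f} dong")
print(f"CSPF 4 -> 5 over 5 years: {cspf_point_value:,.0f} dong")
```

Under these assumptions the gap between the two Hisense units comes out near 9.5 million dong and the CSPF 4 to 5 step near 5.5 million dong. Change the schedule or the tariff and both numbers move with it.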
Why 5 years, not 10
Air conditioners are "supposed to" last 10 years. Compressor warranties run 10 to 12 years.
I report the 10 year number. I don't rank on it.
At 14+ hours a day in a hot, humid climate, budget units realistically last 6 to 8 years. Plenty die at 6 or 7. People also move, upgrade, or just get tired of the thing before it dies. A 10 year horizon credits cheap, high-CSPF units with electricity savings they probably won't live to deliver.
5 year TCO is the sort key. 10 year is a reference column. Don't let a round number quietly pick a winner that depends on a lifespan the machine won't reach.
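A sketch of what "sort on 5, report 10" looks like in code, using the two Hisense units and the 6,342 kWh annual demand from earlier. The column names and the break-even helper are my additions for illustration, not part of the original sheet.

```python
ANNUAL_DEMAND_KWH = 6342  # annual cooling demand from the cost model
TARIFF = 3500             # dong per kWh, VAT included

models = [
    {"model": "Hisense AS-10CR4RYDDJ02", "price": 4_490_000, "cspf": 3.62},
    {"model": "Hisense AS-10TR4RGUUA00", "price": 7_990_000, "cspf": 6.28},
]

def annual_power_cost(cspf: float) -> float:
    """Yearly electricity bill for a machine with the given CSPF."""
    return ANNUAL_DEMAND_KWH / cspf * TARIFF

def tco(price: float, cspf: float, years: int) -> float:
    """Purchase price plus electricity over the given horizon."""
    return price + annual_power_cost(cspf) * years

rows = [
    {**m, "tco_5": tco(m["price"], m["cspf"], 5), "tco_10": tco(m["price"], m["cspf"], 10)}
    for m in models
]

# The 5-year figure is the sort key; the 10-year figure is only carried along.
ranked = sorted(rows, key=lambda r: r["tco_5"])

def break_even_years(cheap: dict, efficient: dict) -> float:
    """Years of use after which the efficient unit has paid back its price premium."""
    saving_per_year = annual_power_cost(cheap["cspf"]) - annual_power_cost(efficient["cspf"])
    return (efficient["price"] - cheap["price"]) / saving_per_year

for r in ranked:
    print(f'{r["model"]}: 5y {r["tco_5"]:,.0f}  10y {r["tco_10"]:,.0f}')
print(f"break-even: {break_even_years(models[0], models[1]):.2f} years")
```

For this pair the efficient unit pays back its premium well inside the 5 year window, so the ranking does not lean on the machine surviving to year 10. Pairs with a smaller CSPF gap are where the horizon choice starts to decide the winner.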
Don't trust the efficiency label either
CSPF drives the whole ranking, and it's the shakiest number in the dataset. Scraped from one site, sometimes missing, sometimes wrong.
I run it through a trust ladder, always biased toward understating efficiency. If the manufacturer's own site contradicts the retailer, the manufacturer wins (checked by hand). If both retailers agree, I use their number. If they disagree, I take the lower value. If nobody published a label at all, I fall back to the dataset median, about 5.27.
That last fallback is where I got burned. The Nagakawa NIS-C09R2 had no energy label on the retailer site, so the pipeline imputed the median: 5.28. The manufacturer's own site lists the matching SKU at 4.51. The median had given a bottom tier machine an above average efficiency it doesn't have. It would've climbed the ranking on a number I made up.
So I added a manufacturer verification pass. A hand-built override file with a source URL for every correction, keyed so it survives a full re-scrape.
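The ladder plus the override pass might look something like this. It is a sketch under assumptions: the function name, the provenance tags, and the override file layout are mine, and the case where only one retailer publishes a label is not covered in the text, so using that single value is my guess. The source URL is left as a placeholder.

```python
def resolve_cspf(key, retailer_a, retailer_b, overrides, dataset_median):
    """Return (cspf, provenance), preferring sources that understate efficiency."""
    # 1. A hand-verified manufacturer value beats everything else.
    if key in overrides:
        return overrides[key]["cspf"], "manufacturer_override"
    published = [v for v in (retailer_a, retailer_b) if v is not None]
    # 2. Both retailers publish a label: use it if they agree, else the lower one.
    if len(published) == 2:
        if published[0] == published[1]:
            return published[0], "retailers_agree"
        return min(published), "retailers_disagree_lower"
    # 3. ASSUMPTION: with a single published label, take it as is.
    if published:
        return published[0], "single_retailer"
    # 4. Nobody published a label: impute the median and say so.
    return dataset_median, "imputed_median"

# Override file keyed by BRAND:NORMALIZEDCODE so it survives a full re-scrape.
OVERRIDES = {
    "NAGAKAWA:NISC09R2": {
        "cspf": 4.51,
        "source_url": "<manufacturer page URL goes here>",
    },
}

MEDIAN = 5.27  # approximate dataset median quoted above

before = resolve_cspf("NAGAKAWA:NISC09R2", None, None, {}, MEDIAN)
after = resolve_cspf("NAGAKAWA:NISC09R2", None, None, OVERRIDES, MEDIAN)
print(before, after)
```

Returning the provenance alongside the number is the useful part. Any row tagged `imputed_median` is a candidate for a manual check before it is allowed to influence the ranking.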
The price is a snapshot
The scraped price is the displayed price at one moment. Some pages carry a "10% online discount" banner the pipeline doesn't capture. Sale prices drift too; my spot checks ran about 1 million dong higher than the snapshot.
I don't model either. They're conditional and volatile. When the ranking looks off near a price boundary, I just re-scrape.
So what did I buy
I'm not going to name the model, but here's the logic.
The pipeline gave me the top 20 sorted by real cost, which narrowed the field. Then I looked at things the pipeline doesn't capture. Warranty: five years from one brand versus two or three from others, and that matters. Country of manufacture: China-made units have a reputation for cutting corners on build quality, so I mentally ranked those lower. And the deal I got on the unit, which tilted the math more than any CSPF difference would.
The spreadsheet didn't pick for me. It just killed the bad options. If a friend asks me which AC to get, I'm sending them the sheet, not a recommendation. Sort it yourself, check the warranty, check where it's made, factor in whatever deal you find. The pipeline stops the sticker price from deciding for you. The rest is yours.