In today’s episode of the Centrelink #notmydebt tale of fail we will dig into how the data matching process Centrelink uses is broken by design.
The specific part we’re going to look at is how the business name matching process works. My sources are the Centrelink whistleblower document [PDF], which details the fuzzy name matching algorithm; people who have worked at Centrelink in the past; and the Department of Human Services (DHS) itself.
I asked DHS whether the algorithm in the whistleblower document is accurate, and it did not deny that it was. Its response, to be attributed to the ubiquitous spokes-human, Department of Human Services General Manager Hank Jongen, was this:
- The online system has an ABN lookup feature where users can select the employer’s ABN if the name is unfamiliar.
- Where employer ABN matching is unsuccessful, the system tries name matching.
- In the absence of a response from the payment recipient, the system defers to employer names provided by the ATO. This was the same process that applied under the manual process.
DHS has been very evasive in answering my highly specific questions, so I had to do some digging to figure out what this vagueness actually means.
Data Input Verification
When you report your fortnightly income to Centrelink, as you are required to do, you fill in who it was received from. But, and this is a big but and a major source of fail, the way you do this has a fatal data hygiene flaw.
You can add a new employer using its ABN, but you can also just use its name. If you use the name, there’s no further check to ensure it’s correct, and that leads to major data quality problems.
Anyone who designs systems that deal with data should be aware of the issue of data entry errors. If you’re smart, you try to ensure the highest possible data quality at the point of entry, because every time you manipulate the data, you risk adding errors. If your data starts off messy, then it just gets messier later on.
Which is exactly what is happening with Centrelink’s data matching.
If you enter the ABN wrong, such as with a simple typo, then the name that gets shown on the Centrelink input form is likely to be very different to the actual business name you’re expecting. The error becomes obvious, and you’re more likely to correct it, particularly when you need to confirm that all the information is correct on a later screen.
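ABNs also have a safety net built in. The Australian Business Register publishes a check-digit scheme for them. Subtract 1 from the first digit, multiply each digit by a fixed weight, and add up the results. A valid ABN gives a sum divisible by 89. Below is a minimal sketch of that check in Python. The function name `is_valid_abn` is mine, not from any Centrelink or ATO system.

```python
# Weights from the published ABN check-digit scheme, one per digit.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)

def is_valid_abn(abn: str) -> bool:
    """Return True if abn (spaces allowed) passes the ABN check-digit test.

    Hypothetical helper for illustration; it only checks the number is
    well-formed, not that it belongs to a real, current business.
    """
    compact = abn.replace(" ", "")
    # An ABN is exactly 11 ASCII digits once spaces are removed.
    if len(compact) != 11 or not compact.isascii() or not compact.isdigit():
        return False
    digits = [int(c) for c in compact]
    # Subtract 1 from the leftmost digit before weighting.
    digits[0] -= 1
    total = sum(d * w for d, w in zip(digits, ABN_WEIGHTS))
    # A valid ABN's weighted sum is divisible by 89.
    return total % 89 == 0

print(is_valid_abn("51 824 753 556"))  # True
print(is_valid_abn("51 824 753 557"))  # False: one digit mistyped
```

Because every weight is smaller than 89, changing any single digit changes the sum by an amount that can't be a multiple of 89. So a lone mistyped digit is always caught, before any lookup even happens. Names have nothing comparable.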
But if you enter the name incorrectly, again, maybe just a typo, then you won’t get an obviously wrong name showing up. You’ll get what you typed in.
It gets worse.
I’m trying to confirm this for the system as it is today, but my understanding is that the reporting system takes whatever you typed in for the business name as correct. [UPDATE: Yes, this is what it does. It really is this broken.] It performs no further check that the name you entered is a valid business name. It doesn’t display a list of possibly similar names, or show you the ABN of the name you’ve chosen so you can check it against the ABN on any of the company’s paperwork/website/whatever. The system doesn’t enforce having an ABN for this data. A name is all it needs, and if the name is wrong, it provides you with no help in figuring that out.
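Offering a list of possibly similar names isn’t hard. Here’s a minimal sketch using Python’s standard `difflib` module. It assumes the form has access to a list of registered business names. The three names below, and the helper `suggest_employers`, are made up for illustration.

```python
import difflib

# Hypothetical stand-in for a business register; names are illustrative only.
REGISTERED_NAMES = [
    "PivotNine Pty Ltd",
    "Acme Widgets Pty Ltd",
    "Sharyn's Hair Studio",
]

def suggest_employers(typed: str, registered: list[str], n: int = 3) -> list[str]:
    """Return up to n registered names that look similar to what was typed."""
    # Compare case-insensitively, but hand back the registered spelling.
    by_key = {name.casefold(): name for name in registered}
    matches = difflib.get_close_matches(typed.casefold(), list(by_key), n=n, cutoff=0.6)
    return [by_key[m] for m in matches]

print(suggest_employers("pivotnien pty ltd", REGISTERED_NAMES))
```

Showing those suggestions, each with its ABN alongside, would turn a silent typo into an obvious choice the user has to confirm.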
I have to look up my business’s ABN when I need to put it on a form, and I own the thing. People tend to do whatever is easiest, so a lot of people will just put in the name of their employer and be done with it. Or at least, whatever they think the name of it is.
Can you see where this is heading?
People, humans, are typing these names into a webpage. Humans are really bad at data entry. They make mistakes all the time. I have worked on many projects where the data came from humans typing things in somewhere, and it never ceases to amaze me how many different ways people find to enter wrong data.
Some of the names entered will have typos, as anyone who has used any computing device ever will attest. You may not notice these typos until you hit “submit” and the data is saved. Can you go back and correct data from previous fortnights? I don’t know, and how many people will bother? That means even if you fix the name this time, the record for your last fortnight will still have the incorrect name on it.
There are all sorts of other ways the data can be flawed at entry. You might spell it differently. Is it Sharon, Charon, Sharyn, Sharrynn, or Sha-rinn? Is the name hyphenated, or not? Is it the trading name, or the official registered name?
Why does this all matter so much? Because if you don’t supply an ABN with this income data, the fuzzy matching algorithm kicks in when Centrelink matches your income against the ATO’s data. As we’ve established, that’s going to happen a lot, because it’s easier to enter the name, and on all the confirmation screens, the name is all you see.
The name matching algorithm itself is mind-bendingly stupid. Go look at it again. I’ll just quickly point out some of the ways it will be wrong:
- If there’s a typo in a one word named business, it won’t match.
- If you use the trading name instead of the official ATO name, it won’t match.
- If you write PivotNine Proprietary Limited, it won’t match PivotNine Pty Ltd because 2 of the 3 words aren’t an exact match.
- If you hyphenate a word, it won’t match. No idea if this counts as one word or two.
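To make those failure modes concrete, here’s a hypothetical Python matcher. It compares names word by word and requires more than half of the words to match exactly. It is not the algorithm from the whistleblower document. It’s a toy I’ve assumed for illustration, and it fails in the ways the list above describes. It is even generous about capitalisation.

```python
def word_match(a: str, b: str) -> bool:
    """Hypothetical exact-word matcher: more than half the words must be identical."""
    words_a = a.casefold().split()
    words_b = b.casefold().split()
    if not words_a or not words_b:
        return False
    # Count words in a that appear, spelled exactly the same, in b.
    exact = sum(1 for w in words_a if w in words_b)
    return exact / max(len(words_a), len(words_b)) > 0.5

print(word_match("PivotNine", "PivotNien"))                          # False: typo in a one-word name
print(word_match("PivotNine Proprietary Limited", "PivotNine Pty Ltd"))  # False: 1 of 3 words
print(word_match("Sha-rinn Cafe", "Sharinn Cafe"))                   # False: hyphen breaks the word
print(word_match("PivotNine Pty Ltd", "PivotNine Pty Ltd"))          # True: only exact copies pass
```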
And we’ve seen reports from people that when the names don’t match, Centrelink treats them as separate income amounts. That means it double-counts the same set of income because it’s under one name in the Centrelink data, and a different name in the ATO data.
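A simplified model shows how that double-counting happens. The names and dollar figures are hypothetical. I’m also assuming income is grouped purely by exact employer name, which is a simplification of whatever the real system does.

```python
def assessed_income(centrelink: dict[str, float], ato: dict[str, float]) -> float:
    """Total income when employers are grouped by exact name (hypothetical model).

    Where both sources report the same name, the income is counted once;
    a name that appears in only one source becomes a separate employer.
    """
    per_employer: dict[str, float] = {}
    for source in (centrelink, ato):
        for name, amount in source.items():
            # Matching names collapse into one entry (keeping the larger figure);
            # mismatched names each get their own entry.
            per_employer[name] = max(per_employer.get(name, 0.0), amount)
    return sum(per_employer.values())

ato = {"PivotNine Pty Ltd": 5000.0}
print(assessed_income({"PivotNine Pty Ltd": 5000.0}, ato))  # 5000.0: names match
print(assessed_income({"PivotNien Pty Ltd": 5000.0}, ato))  # 10000.0: one typo doubles it
```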
If you made a typo in the name for part of the year and corrected it later, you’ll end up with partial mismatches within a single income year.
As the leaked email makes clear, this is how the system is designed to work. It’s astounding that this is the case, particularly given how easy it would be to make it, if not perfect, vastly less bad.
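As an illustration of “vastly less bad”, here’s a minimal Python sketch. It normalises both names before comparing them: lowercase everything, split on punctuation, and map common long forms to their abbreviations. It then scores the similarity of the results. The abbreviation table, the 0.85 threshold and the function names are all my own assumptions, not anything DHS uses.

```python
import difflib
import re

# Hypothetical abbreviation table; a real one would need to be much longer.
ABBREVIATIONS = {"proprietary": "pty", "limited": "ltd", "company": "co"}

def normalise(name: str) -> str:
    """Lowercase, split on anything that isn't an ASCII letter or digit,
    and map common long forms to their abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.casefold())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # Join without spaces so "Sha-rinn" and "Sharinn" normalise the same way.
    return "".join(tokens)

def names_probably_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy comparison of normalised names; the threshold is an assumption."""
    ratio = difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold

print(names_probably_match("PivotNine Proprietary Limited", "PivotNine Pty Ltd"))  # True
print(names_probably_match("Sha-rinn Cafe", "Sharinn Cafe"))                       # True
print(names_probably_match("PivotNien Pty Ltd", "PivotNine Pty Ltd"))              # True
print(names_probably_match("Bob's Burgers", "Robert Smith Holdings Pty Ltd"))      # False
```

This is still far from perfect. It can’t link a trading name to a registered name that shares nothing with it. Squashing out the spaces also makes very short names easier to confuse. That’s why making the ABN the easy path at data entry matters more than any clever string comparison afterwards.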
Automating Bad Makes It Worse
DHS seems to think that “oh, but it’s the same system as it was when it was manual” is somehow a defence. That they even try this is extremely worrying. It implies they either know that automating the system creates a significantly different one and hope you don’t notice (i.e. you’re an idiot), or don’t realise that automating a flawed system might be bad (i.e. they’re idiots).
Why? Because the system has gone from 20,000 manual checks a year to over 200,000 automated ones in six months. Annualised, that’s more than 400,000 a year: over 20 times the old rate, or an increase of more than 1,900%. If the manual system makes mistakes, and as we’ve seen, it very much does, then you’re now making a lot more mistakes. Automatically.
Automating a bad system makes it worse, not better.
We’re still finding out how long it will take DHS to discover this is true and act accordingly. “Keep digging” is not a good strategy here.