The Office of the Victorian Information Commissioner (OVIC) issued a media release about how the release of myki data by Public Transport Victoria (PTV), which is now part of the Department of Transport, breached people’s privacy.
I read the report on the investigation and tweeted about it, which I’ve archived below with some light editing. The quotes are from the text of the report, which I encourage you to read yourself.
Let’s dive in, shall we?
Department Doesn’t Agree With Facts and Expert Findings
“The Department of Transport does not accept the Commissioner’s finding that the release of the myki dataset breached myki users’ privacy.”
The Department of Transport is wrong, and clearly cannot be trusted to keep our data safe.
You were shown proof that you were wrong. OVIC investigated and found you were wrong. To deny this reality in the face of all this evidence means you cannot be trusted. When will people understand that?
The correct response is to accept that you got it wrong, apologise, and clearly explain what you’ll do differently from now on. Why is that so hard?
“If the Organisation does not comply with this compliance notice, the penalty is: 3000 penalty units, in the case of a body corporate.”
At the current penalty unit rate of $165.22, that’s $495,660.
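For anyone who wants to check the sums, here is a tiny sketch of the arithmetic. The 3000 penalty units and the $165.22 rate are the figures quoted above; the function name is just something I made up for illustration.

```python
from decimal import Decimal

# Figures quoted above: 3000 penalty units at $165.22 per unit.
PENALTY_UNITS = 3000
UNIT_RATE = Decimal("165.22")  # dollars per penalty unit


def max_penalty(units: int, rate: Decimal) -> Decimal:
    """Return the maximum penalty in dollars (hypothetical helper)."""
    return units * rate


total = max_penalty(PENALTY_UNITS, UNIT_RATE)
print(f"${total:,.2f}")
```

Decimal is used rather than float so the dollar amount stays exact.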
“the dataset identified myki card types. The participant raised a query about the potential identification of state and federal police and politicians in the data.”
“They downloaded the data from an open Amazon Web Services ‘S3’ “bucket” linked to the Datathon’s public facing website.”
Funny how open S3 buckets are a vector for so many data breaches, innit?
The co-traveller identification is awesome. Follow someone on a tram once, and then find out all their other travel! Ideal for finding out where a cop lives. Or for abusive stalkers looking for their ex.
Or kids, btw, because of the children’s card type, and noting when they get on and off going to school.
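To make the co-traveller idea concrete, here is a minimal sketch on toy data. Everything in it is an assumption for illustration: the records, the field layout, the stop names, and the helper functions are invented and are not the real myki schema or anyone's actual method. The idea is simply that each time and place you saw someone touch on narrows down which pseudonymous card is theirs, and once you have the card you have every trip attached to it.

```python
from collections import defaultdict

# Toy touch-on records: (pseudonymous card id, card type, stop, time).
# All values are invented for illustration; this is not the myki schema.
TOUCHES = [
    ("c1", "adult", "Stop A", "2018-07-02 08:01"),
    ("c2", "adult", "Stop A", "2018-07-02 08:01"),
    ("c3", "child", "Stop A", "2018-07-02 08:01"),
    ("c1", "adult", "Stop B", "2018-07-02 17:30"),
    ("c2", "adult", "Stop C", "2018-07-02 17:45"),
    ("c2", "adult", "Stop C", "2018-07-03 07:50"),
    ("c3", "child", "Stop D", "2018-07-02 15:10"),
]


def candidates(touches, observations):
    """Card ids consistent with every (stop, time) observed in person."""
    seen_at = defaultdict(set)
    for card, _card_type, stop, time in touches:
        seen_at[(stop, time)].add(card)
    result = None
    for obs in observations:
        cards = seen_at.get(obs, set())
        result = cards if result is None else result & cards
    return result or set()


def trips_for(touches, card):
    """Every (stop, time) recorded against one pseudonymous card."""
    return [(stop, time) for c, _t, stop, time in touches if c == card]


# One sighting leaves three candidate cards in this toy data...
one = candidates(TOUCHES, [("Stop A", "2018-07-02 08:01")])
# ...and a second sighting narrows it to a single card.
two = candidates(
    TOUCHES,
    [("Stop A", "2018-07-02 08:01"), ("Stop C", "2018-07-02 17:45")],
)
for card in two:
    print(card, trips_for(TOUCHES, card))
```

Filtering on the card type column would narrow the candidates in the same way, which is why the children's card type matters.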
Get more red teamers involved. Design for misuse cases, not just use cases.
“PTV and DPC’s response team found [that] ‘it was not possible using the myki dataset alone to positively identify specific travellers and their prior travel movements. Supplementary information is required before positive identification can be made.’”
Happily the world is completely devoid of any information other than what is in a dataset you release.
Your opponent doesn’t have to play by your rules. Evil wins because good is dumb.
“Victoria Police conducted a risk assessment and reported there was a ‘low/minimal’ risk to the safety of police and parliamentarians as a result of the data release.”
Uhuh. Is that because none of them use public transport?
Let Us Celebrate Those Who Defend Privacy
Kudos to the nameless participant in the Datathon for raising the alarm, btw! I will gladly buy you a beverage and/or send you some stickers if you would like to make yourself known to me.
[I think I already gave @VTeagueAus some stickers. @bipr and @chrisculnane are also welcome to have some when the new batch arrives.]
Chris Culnane, Benjamin Rubinstein, and Vanessa Teague wrote a report on their de-identification work as well. It is called, hilariously, Stop the Open Data Bus, We Want to Get Off [PDF].
“[PTV and DPC]’s response to the investigation indicated an openness and willingness to respond constructively to the Deputy Commissioner’s concerns.”
Good! How you respond to mistakes is a vital part of getting better and demonstrating you are trustworthy!
TIL: The definition of personal information in Victoria comes from the 1983 report of the Law Reform Commission (Report No. 22, AGPS Canberra, 1983, Vol 2). That definition was adopted by the Privacy Act 1988, which is where Victoria got its definition for the Vic PDP Act.
“To determine whether a piece of information is ‘personal information’, it must be considered in context and on a case-by-case basis.”
PTV is Stubborn, And Also Wrong
“It is clear that before releasing the dataset PTV had considered this issue and decided the dataset did not contain personal information. PTV maintains the dataset does not contain personal information.”
PTV is wrong.
And this is why OVIC has issued a compliance notice. PTV was unrepentant about its actions.
PTV made submissions to OVIC, which are summarised at - in the report:
PTV does not consider the data extract is personal information as defined in the [PDP Act]. PTV’s view is that there has been no breach or contravention of the Information Privacy Principles (IPPs) as result disclosing the data extract to the Datathon. This is based on our interpretation of the definition of personal information which [is] key to the establishment of the sensitivity of the data and therefore its impact if a breach did occur. The data extract disclosed for the Datathon contained no personal identifiable information. The ability to identify an individual [rests] with the relationship between the card number and the myki account for that individual. The data extract disclosed to the Datathon substituted each card number with randomly generated card numbers which anonymised the individuals. This was undertaken by PTV prior to the disclosure. As it is not possible to identify individuals through the card numbers disclosed to the [Datathon] this significantly lowers the sensitivity of the data disclosed.
OVIC is pretty clearly unimpressed with this stance.
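Here is a rough sketch of why swapping card numbers for random ones does less than PTV's submission suggests. It assumes, as my reading of the submission rather than anything the report spells out in these words, that each card got one random replacement used consistently across all its records. The data and the function are invented for illustration.

```python
import secrets

# Invented raw records: (real card number, stop, time). Not real myki data.
RAW = [
    ("3081234", "Stop A", "08:01"),
    ("3081234", "Stop B", "17:30"),
    ("3089999", "Stop A", "08:05"),
]


def pseudonymise(records):
    """Swap each card number for a random token, consistently per card."""
    mapping = {}
    out = []
    for card, stop, time in records:
        if card not in mapping:
            mapping[card] = secrets.token_hex(8)
        out.append((mapping[card], stop, time))
    return out


released = pseudonymise(RAW)
for row in released:
    print(row)
```

The real card numbers are gone, but the first two rows still share a token, so every trip by the same card stays linked together. The travel pattern itself is what identifies the person.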
PTV provided several additional written submissions about whether the information was personal information. The Deputy Commissioner carefully considered each submission made by PTV. The above extract provides a good overview of PTV’s position, and it is not necessary to set out PTV’s submissions in full.
“Data61 was engaged to analyse and describe the dataset, and provide an expert opinion about re-identification risks to the dataset. Data61’s opinion is that the overall risk of re-identification for the dataset is ‘extremely high’.”
If PTV had been more willing to admit its error and correct its ways, things might not have gotten this pointy.
The lesson for other orgs is clear: take this more seriously from the outset, but if you do screw up, admit fault quickly and embrace the assistance of those trying to help you be better at this.
Don’t get all huffy and pretend nothing is wrong.
“The Deputy Commissioner considered it would be too great a risk to raise awareness of that possible re-identification method while the current six-month window overlapped with the date range of the published dataset.”
A challenging call here by OVIC, but I think the correct one in the circumstances.
Begone Thou Literal and Technical Approach!
“In the Deputy Commissioner’s view, the approach suggested by PTV was a literal and technical approach that has been warned against by authorities in discussing the definition of personal information.”
This “literal and technical approach” is far too common in organisations of all kinds, but especially with government agencies. We have seen this time and time again. It must stop.
“The evidence […] suggests the identities of individuals can be extracted from the dataset with relative ease. PTV has provided no persuasive evidence to the contrary, and has instead relied on technical arguments about the definition of personal information.”
The tide is shifting. If orgs don’t move away from self-serving semantic arguments about whether data is private or not, they will be forced to. The public are sick of their privacy being abused like this and are starting to fight back.
The response from regulators (the ACCC a couple of weeks back, OVIC here) reflects that shift in public opinion.
At this point I had to wander off to do other things, but we’d covered the heart of things.
Stop releasing data without consulting people who know what they’re doing about the risks, and avoid unit-level data completely until you’ve built up capability in doing this kind of thing safely a few times first.
This is, in many ways, going straight to human trials without doing lab tests first. Stop experimenting on people without their explicit, informed, ongoing consent!