The Electronic Police State

May 23, 2009

A US-based company called Cryptohippie recently published a report titled The Electronic Police State. According to Cryptohippie,

An electronic police state is characterized by this: State use of electronic technologies to record, organize, search and distribute forensic evidence against its citizens.

The report ranks 52 countries based on 17 factors, such as financial tracking, ISP data retention, data storage ability, cell phone records and so on. Based on this ranking, nations were classified as red (most advanced electronic police states), orange (strongly developing), yellow (lagging but developing) or green (moving towards an electronic police state, but not as quickly). Israel ranked 8th and belongs to the “red” club.


While the report offers a link to the underlying raw data, it’s just an Excel file with the rankings for the 17 factors. No further explanation is provided about where these numbers came from, which is quite disappointing.

PR stories aside, what does get me a bit worried is the fast-paced work on the new national biometric database. Given the privacy protection failures mentioned in the 2009 Comptroller’s Report, I do wonder where we are headed with this…


Start panic? not really…

May 10, 2009


A coworker forwarded me a link to a site called Start Panic. If your browser has JavaScript enabled, it will show you recent sites you visited, or as one talkback phrased it:

So, what, this just prints a list of all the greatest porn sites on the web?

The site goes on to invite visitors to sign a petition to patch privacy vulnerabilities (about 3,000 visitors had already signed it the last time I looked), but that’s not really going to make any difference. You can always turn off scripts in the browser if you want to prevent this kind of trouble (though you might lose functionality on some sites). If the site owners were really interested in privacy issues (rather than the Google ads on their front page), they could have provided some useful information instead of that meaningless petition.

I didn’t look into the JavaScript code the site uses to grab the history, but others did. Basically, it keeps a small database of URLs and checks the color coding the browser uses to distinguish between new links and visited links, to see where your browser has landed before. This neat trick works in all common browsers, of course. I guess usability and privacy/security will almost always clash…
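For the curious, here is a minimal TypeScript sketch of the trick as described above: style visited links with a known color, inject candidate links from a URL list, and read back the computed color. The candidate URLs are placeholders for the site’s real database, and note that modern browsers have since restricted what :visited styles reveal.

```typescript
// Minimal sketch of 2009-era :visited history sniffing.
// The candidate list stands in for the site's real URL database.
const candidates: string[] = [
  "http://www.google.com/",
  "http://www.wikipedia.org/",
];

// Give visited links a color we can recognize later.
const marker = document.createElement("style");
marker.textContent = "a:visited { color: rgb(255, 0, 0); }";
document.head.appendChild(marker);

function wasVisited(url: string): boolean {
  const link = document.createElement("a");
  link.href = url;
  document.body.appendChild(link);
  // If the browser paints the link with the :visited color,
  // the URL is already in the user's history.
  const visited = getComputedStyle(link).color === "rgb(255, 0, 0)";
  document.body.removeChild(link);
  return visited;
}

console.log("Previously visited:", candidates.filter(wasVisited));
```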

Medical data held for ransom

May 5, 2009

 Wow:

An online thief compromised the network of the Commonwealth of Virginia’s Department of Health Professions, allegedly stealing healthcare data on nearly 8.3 million patients, according to reports.

The online attacker demanded $10 million for the data, according to both sources.

Reports: Thief holds Virginia medical data ransom

Implications of Privacy on Data Cleansing

April 27, 2009

Earlier this month, the LA Times reported on online maps made available by the LAPD that showed crime locations incorrectly:

The distortion — which the LAPD was not aware of until alerted by The Times — illustrates pitfalls in the growing number of products that depend on a computer process known as geocoding to convert written addresses into points on electronic maps.
In this instance, www.lapdcrimemaps.org is offered to the public as a way to track crimes near specific addresses in the city of Los Angeles. Most of the time that process worked fine. But when it failed, crimes were often shown miles from where they actually occurred.
Unable to parse the intersection of Paloma Street and Adams Boulevard, for instance, the computer used a default point for Los Angeles, roughly 1st and Spring streets.

Highest crime rate in L.A.? No, just an LAPD map glitch – Los Angeles Times

Data collection is often a process prone to errors and omissions. Some values may be missing because they weren’t fed into the system in the first place. Some kinds of data, such as addresses, may appear in different forms because of spelling errors. Reports written by hand, like the ones at the source of crime reports, may be hard to decipher. On top of that, further processing, like the geocoding mentioned above, may introduce additional errors.

Data analysts usually take these kinds of errors into account. Quite often, the first step in analysis is data cleansing, which aims to find these errors and correct them (or at least remove them). When the data is available in raw form, data cleansing is easier: data points can be evaluated individually for errors. LA Times reporters could identify errors because they looked at specific addresses and could see that they made no sense.
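To make the idea concrete, here is a hypothetical sketch of the kind of cleansing that is only possible when the raw records are available. The field names, coordinates and tolerance are illustrative, not taken from the LAPD feed: records that land on the city’s default fallback point are set aside for review instead of being counted as crimes at that intersection.

```typescript
// Hypothetical cleansing step for geocoded crime records.
// The default point and tolerance below are illustrative values.
interface CrimeRecord {
  address: string;
  lat: number;
  lon: number;
}

const DEFAULT_POINT = { lat: 34.0505, lon: -118.2468 }; // rough stand-in for 1st & Spring
const TOLERANCE = 0.0005; // degrees

function looksLikeDefaultPoint(r: CrimeRecord): boolean {
  return (
    Math.abs(r.lat - DEFAULT_POINT.lat) < TOLERANCE &&
    Math.abs(r.lon - DEFAULT_POINT.lon) < TOLERANCE
  );
}

// Split the raw data into records we trust and records that need review,
// instead of letting failed geocodes pile up at one downtown intersection.
function cleanse(records: CrimeRecord[]) {
  return {
    clean: records.filter((r) => !looksLikeDefaultPoint(r)),
    suspect: records.filter(looksLikeDefaultPoint),
  };
}
```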

It is not always practical to rely on the data provider to take care of data cleansing. For example, in this case: sure, the LAPD will fix the errors found by the LA Times, but

A spokesman for the LAPD said the department will add a disclaimer to its site once it’s cleared by the city attorney.

It seems like the responsibility to check the data falls on the data consumer. In the case described above, anyone who consumes the data can judge its quality and apply cleansing, since the data is available in its raw form.

Now let’s imagine a similar case, but with a little twist: what would happen if the data were shared in a restricted way due to privacy concerns? Many research papers on privacy study how data can be used to learn patterns or aggregates while keeping individual records masked. Various approaches have been proposed, including randomizing the input before processing, adding noise to the outcome of the processing, “blurring” data by generalizing values so they are not too specific, and so on. All these approaches have at least one thing in common: there is no longer access to the raw data. In turn, this may impair the ability of the data consumer to cleanse the data.
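As a small illustration of the “add noise to the outcome” flavor of these approaches (the Laplace choice and the scale are my own assumptions, not taken from any specific paper), notice that the data consumer only ever receives the final number:

```typescript
// Sketch of publishing a noisy aggregate instead of raw records.
// Laplace noise is one common choice; the scale here is illustrative.
function laplaceNoise(scale: number): number {
  const u = Math.random() - 0.5;
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

function noisyCount(records: unknown[], scale = 1.0): number {
  // Only this single number leaves the data holder's hands.
  return records.length + laplaceNoise(scale);
}

// From the consumer's side there is no way to re-run the cleansing step:
// a few bogus records shift the published count in exactly the same way
// a few legitimate outliers would.
```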

Privacy research is still in the process of figuring out how various computations can take place on top of sensitive data. However, the output of each process is only as good as the input. Without the ability to identify bogus records, any analysis could lead to false conclusions. The problem here is a little different from that of privacy-preserving aggregate calculation: it is unclear whether it will be possible to tell the difference between a unique but correct record (a legitimate outlier) and a bogus record without scrutinizing the fine details.

Privacy and Publicness

March 26, 2009

Ironically, the first post on this blog about privacy is actually about quite the opposite: publicness.

I have been thinking for quite a while now about writing a blog, but for some reason I never really got around to it. Reading the post by Jeff Jarvis blogging the privacy panel at SXSW finally made the difference:

Publicness is about more than having a web site. It’s about taking actions in public so people can see what you do and react to it, make suggestions, and tell their friends. Living in public today is a matter of enlightened self-interest. You have to be public to be found. Every time you decide not to make something public, you create the risk of a customer not finding you or not trusting you because you’re keeping secrets. Publicness is also an ethic. The more public you are, the easier you can be found, the more opportunities you have.

The ethics and expectations of privacy have changed radically in Generation G. People my age and older fret at all the information young people make public about themselves. I try to explain that this sharing of personal information is a social act. It forms the basis of the connections Google makes possible. When we reveal something of ourselves publicly, we have tagged ourselves in such a way that we can be searched and found under that description. As I said in the chapter on health, I now can be found in a search for my heart condition, afib. That is how others came to me and how we shared information. Publicness brings me personal benefits that outweigh the risks.

SXSW: Privacy (and publicness) « BuzzMachine