Digital identity’s Pogo problem

T.Rob
Published in CodeX
Aug 4, 2021 · 12 min read


Pogo famously said “We have met the enemy and he is us.” I was reminded of that today reading Tony Fish’s post #Identity. Are we (the industry) the problem? Maybe.

Plush giraffe silhouetted against a monitor displaying a blank spreadsheet.
Image by the author

Tony asks “How come, as an industry where 3.2bn people have a digital identity, are we so fragmented, uncoordinated and disagreeable?” In my experience, identity itself tends to be fragmented, uncoordinated and disagreeable, so I am unsurprised that in modeling it digitally we arrive at the same condition.

The singular problem with digital identity

If we start with a digital authentication requirement and work forward, the goal is to resolve a credential to a single, unique, specific person. The word identity is itself a singular noun, the plural being “identities.” So, when we talk about digital identity, this concept of singularity is baked in, beginning with our name for the concept. And why not? Does not the foundation of identity as a concept arise from “I think, therefore I am”? If the saying were “we think therefore we are” it would sound more like the Borg Collective.

In the industry the term “principal” refers to an entity requesting access to a resource. When I knock on your door, I am the principal and the access I am requesting is that which lies behind the door — access to enter your home perhaps, or the privilege to speak with you in the doorway. When I log into a web site, I am the principal and the service offered by the web site is the resource to which I want to gain access. If that web site is my bank the service only works if the credential that I present does not resolve to multiple people.

Principal is singular. Principal-to-person is 1:1.

It gets messier if we start with the person and work backward to all the resources that person needs to access, whether that’s a physical thing like a door or a virtual thing like a web site. From the inside looking out, we each have a multitude of identities. Our concept of identity is singular but our practice of it is plural.

Identity is plural. Person-to-identity is 1:many.

Though this may seem counterintuitive, it is in fact the prevailing practice. For example, anyone who’s been using the Internet for longer than a minute has been told not to reuse passwords across web sites. From the perspective of any one web site your identity is your ID and password. From the user’s perspective, identity is an ID/password combination in the context of the one specific web site for which these are valid. Do you follow the best practice of using different passwords for different web sites? Congratulations! You are safer because you have many identities to represent you, with each applying in a narrow context. Intuitively we think of the identifier as the identity and that using the same identifier (e.g. an email address) across systems is the same as having a single identity. From the system perspective, it is not. The combination of identifier and credential is the proxy for an identity.

Identity as practiced by the user is always plural. Identity as implemented for authentication is always singular. Identity as a topic of discussion rarely integrates the two successfully and that’s a singular problem.
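The two cardinalities can be pictured as a small lookup table. What follows is only an illustrative sketch: `IdentityRegistry`, `register`, `resolve`, and `identities_of` are hypothetical names, and the `.example` sites are made up. Note that the same email address used at two sites counts as two identities, because the context is part of the key.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityRegistry:
    """Toy model: each (context, identifier) pair resolves to one person."""

    # (context, identifier) -> person. A dict key maps to exactly one value,
    # which mirrors the 1:1 principal-to-person rule.
    _principals: dict = field(default_factory=dict)

    def register(self, person: str, context: str, identifier: str) -> None:
        key = (context, identifier)
        owner = self._principals.get(key)
        if owner is not None and owner != person:
            raise ValueError(f"{key} already resolves to someone else")
        self._principals[key] = person

    def resolve(self, context: str, identifier: str) -> str:
        """System view: a principal resolves to a single person."""
        return self._principals[(context, identifier)]

    def identities_of(self, person: str) -> list:
        """User view: one person holds many identities."""
        return sorted(k for k, p in self._principals.items() if p == person)


registry = IdentityRegistry()
registry.register("alice", "bank.example", "alice@mail.example")
registry.register("alice", "forum.example", "alice@mail.example")
registry.register("alice", "library.example", "card-1138")
```

Here `resolve` answers the system's question (who is this principal?) while `identities_of` answers the user's question (which faces do I present, and where?).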

The validation threshold problem with digital identity

When I applied for TSA Precheck I was asked to submit ahead of time various government-issued identity documents, then to physically appear with the original documents and sit for an interview. Prior to the interview, background and criminal records checks were performed. The TSA imposes that level of verification based on the risk and the traveler accepts the initial verification burden because of the benefit they receive. But the local public library would not require that level of verification since their risk is so much less, nor would they have any registered members if they imposed such a regimen. The rigor of the initial identity verification and that of the ongoing login authentication must be proportional to the risk and benefit of the protected resource.

This tiered authentication has been in practice since well before the digital era. The goal of system design was never to devise a totally secure credential, but rather to ensure that the cost to counterfeit a particular credential exceeds any benefit that might be gained. This is why a college ID, a government-issued driver license, and a passport employ increasingly complex anti-counterfeiting measures.

This tiered model moved online with digital identity, insofar as each web site decided how much identity verification was required for their specific risk and threat model. Some sites require only registration. Typically, these are sites in which real world identity is less important than reputation built up over time, such as community forums. Other web sites require verification of a unique physical identity, based on submission of various high-value credentials such as driver license or passport.

The Internet hasn’t altered this model so much as expanded it. Many web sites let you sign in with Google, Facebook, PayPal, Amazon or some other top-tier vendor. Users first verify their real-world identity with one of these vendors, typically using some government issued ID, then the vendor acts as a trusted proxy for the real-world identity. Because these vendors provide integrations into popular content management and commerce packages at near-zero cost, the floor for basic authentication is considerably raised. A random forum that doesn’t take payments can accept digital identities verified for commerce because it’s cheaper and less risky than locally managed accounts.

The validation threshold problem is that we cannot escape the cost/benefit/risk calculus of authentication system design. We implement the least cost thing that meets the minimum requirement threshold of our risk model. Always. As long as users (correctly!) perceive their banking or email account to be more valuable/risky than, say, their blog reader account the validation threshold will remain tiered. The ability to “Sign in with [vendor name here]” can roll the lowest tiers up a level or two but can’t eliminate the tiers entirely.

Tiered validation thresholds are a feature according to this analysis. When we treat them as a bug to be eliminated rather than a feature to be enhanced, we do digital identity a disservice.
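The “least cost thing that meets the minimum requirement threshold” rule can be sketched in a few lines. Everything here is assumed for illustration: the method names, the assurance scale, and the cost figures are invented, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    assurance: int  # higher = harder to counterfeit (made-up scale)
    cost: float     # relative cost to the site operator (made-up units)


# Hypothetical catalogue of verification methods.
METHODS = [
    Method("self-registration", assurance=1, cost=1.0),
    Method("sign-in-with-vendor", assurance=2, cost=0.5),
    Method("document-upload", assurance=3, cost=20.0),
    Method("in-person-interview", assurance=4, cost=200.0),
]


def cheapest_sufficient(required_assurance: int) -> Method:
    """Pick the lowest-cost method that meets the risk threshold."""
    candidates = [m for m in METHODS if m.assurance >= required_assurance]
    if not candidates:
        raise ValueError("no method meets the required assurance")
    return min(candidates, key=lambda m: m.cost)
```

With these assumed numbers, a forum that needs only assurance 1 still lands on the vendor sign-in because it is cheaper than running local accounts, which is the floor-raising effect described above. A requirement of 3 or 4 still selects the expensive methods, so the tiers remain.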

The tunnel vision problem with digital identity

Much of the fragmentation, incoordination, and disagreeableness Tony laments in his post is due to the tension between the user-centric and the system-centric identity models. Recall that authentication resolves to a single principal whereas the user has many facets of identity, depending on the context. If we define fragmentation and incoordination as problems of digital identity, then we need to solve these in ways that do not infringe on the aspects users rely on.

Users who pick the “Sign in with” option increase the number of contexts in which a single identity is used. Looked at another way, the number of potential identities they present to the world is diminished and the ongoing incentive is to reduce that number to a value approaching one. If my identities are fly’s-eye facets through which I view the digital world, “Sign in with” induces a progressive sort of tunnel vision.

Because I’m a tin-foil hat type, my experience is a bit different. Like most users, I have many identities that I present to the various vendors and web sites I frequent online. Unlike most people, however, I use email addresses that are unique to each vendor or web site. So when I’m presented the choice to “Sign in with [vendor name here]” I usually pick Option #2 and establish a site-local account. The downside to this is that large swaths of the Internet that I would otherwise use become inaccessible to me.

For example, if I have a smart switch, a sensor, and a smart home hub, and these are from different vendors, then I cannot compose orchestrations involving the hardware devices since each of these vendors has a different email address for me.

The most repugnant aspect of the tunnel vision problem is that it coerces people to adapt their real-world identities to the constraints of the digital identity provider’s field format. The running joke is that young families need to pick a name for their first pet that has 8 or more characters, is a mix of upper and lower case and digits, has special characters, and is not a dictionary word. Clearly that’s a joke but in the real world, the dot in my real name (T.Rob) is usually an illegal character in a name field and innumerable colleagues over the years have told me that they modified their name to eliminate accented characters, shortened their name to fit, left out one or more names because the system does not allow for them, or were given an X for a middle initial because the system insisted they must have something in that field. For professional purposes, the identity as recorded by the system then gets imposed on the person, forcing them to conform in ways ranging from subtle to gross.

The natural experience of identity is that we each have many of them. The system requirement to resolve to a single identity and the industry goal of consolidating identity providers impose an unnatural experience on the user, as well as an ongoing incentive to consolidate to fewer and fewer identities. The coercive nature of digital identity systems further ensures that as we consolidate identities, the system steers us to its approved version. This is not generally acknowledged as a problem.

The relativity problem with digital identity

I’m talking about the E=mc² version of relativity here, not the weird uncle who disgraced the family a generation ago. When we talk about digital identity, we leave out the words “over time” but they are implied. The object of online authentication isn’t to know who you are at the instant of login, but rather to verify that you are the same person who registered the account and to provide assurance that you are the only person who uses it. What this fails to account for is that who we are, the character of each of us individually, changes over time.

Using myself as an example, the person I was as a young adult would be unhireable for the work I do today, three decades later. Sometimes the change is more sudden. After her brain tumor was removed, my mother was a completely different person. Sometimes people act out of character because they are coerced. For example, an employee whose bank accounts are hijacked might be coerced into giving up their login credentials to a company system.

A more pedestrian case for this is a user who changes roles or leaves a company. Being a consultant, I routinely change companies. Many of my clients authorize me to work IBM problem tickets on their behalf. It has been the case that my IBM Support entitlements for a particular company outlasted my employment with them, in one case for more than a decade.

We talk about digital identity as if it is a durable proxy for trust, when in fact a person’s identity is naturally ephemeral. When the system denies a person access because of their past character, an opportunity is created to contest this and possibly make allowances. It’s disagreeable but mostly works. But when a trusted person becomes untrustworthy over time and the system continues to allow access, that failure happens silently and is discovered only when the trust is breached.

I’m not suggesting that all of our identity systems need to re-verify the user frequently (though some high value systems should). But I am suggesting that modeling an ephemeral property as if it were durable doesn’t make for a very sound model. Yet this is the prevailing practice.

The bad custodian problem with digital identity

One of our biggest problems with digital identity is that so many custodians of our data are completely untrustworthy and we willfully ignore this so long as we get the functionality we want.

Authentication is based on secrets: something you know (e.g. a password or passphrase), something you are (biometrics), or something you have (e.g. an RSA key fob). If other people have the secret then the authentication is unreliable.

So why is it standard on the Internet for site owners to give away our credential secrets like passwords, as well as other information like account recovery answers and credit card numbers? More important, why do we choose to ignore that they do this?

To be clear, I’m not talking about web sites that collect your data then sell it. Nor am I referring to those who lose your data to a breach. What I’m talking about are sites that give it away in bulk, for free, in real time as you type it in. These comprise the majority of web sites but you’ve probably never heard about it.

Don’t take my word for it, though. Install NoScript in your browser and go to the account creation page of any web site. At most banks and top tier sites like Google, Amazon and Yahoo you will likely find that the only domains running scripts in the account creation page belong to the vendor you are visiting. For example, you can expect to find GStatic scripts at most Google sites.

Now go to almost any newspaper web site and you are likely to find anywhere from five to twenty-five 3rd party domains running scripts in the account login page. (News and media sites are some of the worst.) For example, there are ten 3rd party domains running scripts in the account creation page at Orlando Sentinel. Any particular reason visualwebsiteoptimizer.com needs to know your password for Orlando Sentinel? Why?

Screenshot of the NoScript dialog for OrlandoSentinel.com showing 12 different domains from 10 different companies running scripts on the account creation page. This allows all of the 3rd parties to sniff the user session, including capturing credentials in real time as they are entered.

Although I said top tier sites generally do this right, it tends to depend on the nature of the site. PornHub is currently ranked 10th worldwide in terms of traffic volume and they let hotjar.com, trafficjunky.com, and a few other 3rd parties snoop on your session as you create your account, and enter your password recovery info. I didn’t create an account so I don’t know if they let 3rd parties snoop on payment pages, but I do know that many sites expose credit card info this way.

A quick tour of favorite web sites with NoScript installed should horrify most people. With few exceptions, web sites and vendors you trust are letting random 3rd parties snoop on your web session while you log in, create an account, recover your account, or enter payment data. If you use a tool like NoScript to prevent these 3rd parties from snooping on your session the web site usually breaks. So not only is this practice of exposing your credentials ubiquitous, but the dependency means you can’t use much of the web unless you allow it to happen.
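For readers who would rather script the check than click through NoScript, here is a rough sketch that lists the external hosts named in a page’s `<script src=…>` tags. It is a static approximation under stated assumptions: it sees only scripts declared in the HTML it is given (NoScript also catches scripts injected at run time), it treats any host other than the page’s exact hostname as third party (so a vendor’s own CDN would be flagged), and the sample page and `.example` domains are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_script_hosts(html: str, page_url: str) -> set:
    """Return hosts serving scripts that differ from the page's own host."""
    page_host = urlparse(page_url).hostname
    collector = ScriptCollector()
    collector.feed(html)
    hosts = set()
    for src in collector.sources:
        host = urlparse(src).hostname
        # Relative URLs have no hostname; they load from the page's own host.
        if host and host != page_host:
            hosts.add(host)
    return hosts


# Invented sample page, standing in for a real account creation page.
SAMPLE_PAGE = """
<html><head>
<script src="/static/app.js"></script>
<script src="https://news.example/login.js"></script>
<script src="https://tracker.example/t.js"></script>
<script src="//ads.example/a.js"></script>
<script>inline()</script>
</head></html>
"""

found = third_party_script_hosts(SAMPLE_PAGE, "https://news.example/signup")
```

Any host in `found` can run code in the same page as the password field, which is the exposure described above.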

Believe it or not, this 3rd party snooping isn’t what I am referring to when I talk about “the bad custodian problem with digital identity”. The problem, as I see it, is that such a large vulnerability exists and we collectively don’t care. Princeton researchers published about this in their No Boundaries series back in 2017 but it barely made a blip in the #ITSecurity trades, let alone the mainstream news. My original research on this was in 2018 and despite approaching dozens of reporters I was unable to raise any interest. When I’ve demonstrated the problem to colleagues and friends, the response has generally been that it’s too baked in to do anything about. Nobody wants to hear that the Internet has such a fundamental and ubiquitous flaw with regard to digital identity. This news should have caught fire when it was first published in 2017. We should have had a problem trying to contain misreporting for something this big, not a problem getting it reported in the first place.

What digital identity problem could possibly overshadow the bad custodian problem? Or from the other side, if the bad custodian problem can’t be taken seriously, why should any digital identity problem be taken seriously?

Improving our models for digital identity

The more valuable we make digital identity, the more incentive 3rd parties with snooping access will have to exploit it. Nothing else we do in the digital identity space will be meaningful so long as the huge ecosystem of 3rd parties snooping on our credentials persists. Obviously the first priority should be to eliminate such snooping on credential pages so the secrets on which authentication is built actually stay secret. An authenticated identity is only valuable if it is scarce.

Assuming we can do that, it would help to model digital identity more closely after natural identity. That is to say, the multi-faceted, multi-tiered, ephemeral nature of identity would be a feature and not a bug.

A useful digital identity system will

  • Be as granular as the user prefers, including being a unique identifier for that user within that system or allowing reuse of an identity established by a trusted provider.
  • Provide tiers of validation where cost is appropriate to risk. Consolidate tiers by reducing cost of higher tiers to near zero but recognize that users naturally expect tiered credentialing based on differing risk.
  • Conform to natural identity rather than coercing natural identity to fewer, more conformant digital identifier values.
  • Treat durability as a function of the validation tier. Higher risk tiers require more frequent re-validation so the system can deal with the ephemeral nature of natural identity.
  • Not cut users off from large swaths of the Internet unless they agree to consolidate their identities.
  • Not cut users off from large swaths of the Internet unless they agree to indiscriminately share their login credentials with dozens of 3rd parties.
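The durability point in the list above, that higher-risk tiers need more frequent re-validation, might look something like this in practice. The tier names and intervals are placeholders chosen for illustration, not recommendations.

```python
from datetime import datetime, timedelta

# Hypothetical tiers and re-validation intervals, for illustration only.
REVALIDATE_AFTER = {
    "low": timedelta(days=365),
    "medium": timedelta(days=90),
    "high": timedelta(days=7),
}


def needs_revalidation(tier: str, last_validated: datetime, now: datetime) -> bool:
    """True once the identity has outlived its tier's trust window."""
    return now - last_validated >= REVALIDATE_AFTER[tier]


# An identity last validated 34 days before the check.
last_check = datetime(2021, 7, 1)
today = datetime(2021, 8, 4)
```

With these placeholder intervals the same 34-day-old validation is still good for the low and medium tiers but stale for the high tier, so only the high-value system would ask the user to prove themselves again.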

Although Tony’s blog post precipitated this, my post is not intended to confirm or refute his points directly. Like his, this post summarizes my latest thinking on the subject. His analysis leans functional whereas mine tends toward structural, but there’s a lot of overlap. There’s also a lot I’ve left out in order to keep within the guardrails Tony set forth. With luck some of this will resonate with others but I’m not taking the tin foil hat off just yet.



WebSphere MQ security guy! My Tweets/views are my own. Also blogging at http://ask-an-aspie.net and http://tdotrob.wordpress.com