Hashing vs Encoding vs Encrypting vs Signing

On the surface, everyone knows the difference between hashing, encoding, encrypting, and signing. We all know the tradeoffs between them and when to use each... or do we?

Whether I’m working with more junior developers, people experimenting with vibe coding, or even people who should know better, I’ve noticed that too many people mix them up or use the terms interchangeably, even if they use the right one in practice. So let’s dive in and make sure we understand both what Encoding, Hashing, Encrypting, and Signing are and when (not) to use each.

What is Encoding?

The purpose of encoding is to transform data from one format or character set to another, usually to move it between systems or process it more effectively. In web development, base64 encoding is the most common example because you can transform just about any type of data – including audio, images, etc. – into plain text which is easy to transmit over the web and decode on the other side.

Unfortunately, encoding does not protect the message – or the systems that use it – from snooping or tampering. In other words, it is not designed to guarantee confidentiality or message integrity. In this sense, “integrity” means that not only has a message NOT been tampered with but we can prove it.

In short, encoding gives you a bunch of data that you can move easily but anyone can read, anyone can modify, and you’ll never know if either happened.
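To make that concrete, here is a minimal sketch using Python's standard `base64` module. The sample values are made up for illustration; the point is that no key or secret is involved at any step.

```python
import base64

# Any bytes (text, image, audio) can be turned into plain ASCII text.
original = b"user=alice;role=viewer"
encoded = base64.b64encode(original).decode("ascii")

# Anyone can reverse it: decoding needs no key or secret.
decoded = base64.b64decode(encoded)

# Anyone can also swap in a modified value, and it decodes without complaint.
tampered = base64.b64encode(b"user=alice;role=admin").decode("ascii")
tampered_decoded = base64.b64decode(tampered)

print(encoded)
print(decoded)
print(tampered_decoded)
```

Nothing in the decoded output tells the receiver which of the two values was the genuine one.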

What is Hashing?

Hashing is an irreversible process: you can’t reconstruct the input from the output. Hashing is primarily used when you want to store a representation of sensitive data, not the data itself. The most common scenario is password storage. You should NEVER store the plain text password but instead put the password through a strong, salted password-hashing algorithm (such as bcrypt, scrypt, or Argon2) and store the result. If your system is breached, the attacker can’t simply read the passwords. Meanwhile, when a user inputs their password, you apply the same algorithm and inputs and compare that result against the stored value. If they’re the same, the user proceeds.

Beyond protecting the original input, hashing has the benefit where tampering with the hash (or the input) makes the result not match. Even better, if you ONLY send a hash, snooping isn’t useful.

To put it another way, hashing ensures you can pass around a unique representation of data without putting the data itself at risk through snooping or tampering.
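The password flow described above can be sketched with Python's standard library. This is a minimal illustration, not a vetted implementation: the function names, the iteration count, and the choice of PBKDF2 (picked only because it ships with Python) are all assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is used when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-run the same algorithm with the same inputs and compare the results."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)  # constant-time compare

# What gets stored is the salt and the digest, never the password itself.
salt, stored = hash_password("correct horse battery staple")

print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

The same input and salt always produce the same digest, which is what makes the comparison work without ever keeping the original password around.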

What is Encrypting?

Encrypting is the process of transforming data so that it cannot be snooped or tampered with BUT can be recovered later. In many cases, this uses a public-private key pair. By design, anyone can use the public key to encrypt the data but then only those who have the private key can decrypt it.
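To show the public/private split in action, here is a toy sketch of textbook RSA using only Python integers. Everything about it is an assumption made for illustration: the primes are small and publicly known, there is no padding, and the helper names are made up. Real systems should rely on a vetted cryptography library instead.

```python
# TOY EXAMPLE ONLY: well-known primes, no padding, not secure in any way.
P = 2**61 - 1                       # a Mersenne prime
Q = 2**89 - 1                       # another Mersenne prime
N = P * Q                           # modulus, shared by both keys
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

PUBLIC_KEY = (N, E)    # safe to hand to anyone
PRIVATE_KEY = (N, D)   # kept secret by the recipient

def encrypt(message: bytes, public_key) -> int:
    """Anyone holding the public key can produce a ciphertext."""
    n, e = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this toy key")
    return pow(m, e, n)

def decrypt(ciphertext: int, private_key) -> bytes:
    """Only the private key holder can recover the original bytes."""
    n, d = private_key
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")

ciphertext = encrypt(b"card=4111", PUBLIC_KEY)
recovered = decrypt(ciphertext, PRIVATE_KEY)

print(ciphertext)
print(recovered)
```

Unlike a hash, the original data comes back out, but only for whoever holds the private half of the pair.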

What is Signing?

And finally, we have Signing, which is effectively a special combo-case of Hashing and Encrypting.

Signing is used similarly to Hashing to confirm the underlying data has not been modified. Signing is unique in that it goes a step further, allowing you to confirm the source of the message. This is possible because it borrows the key pair from Encrypting.

Just like Encrypting, Signing uses a public-private key pair but it works in reverse. In this case, the message creator uses the private key to sign the message, generating a signature (typically computed over a hash of the message). Now anyone who has the public key, the message, and the signature can verify that a) the message is unmodified and b) it originated from the owner of the private key.
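Here is the same idea as a toy sketch: hash the message, then transform the hash with the private key so anyone with the public key can check it. As with any textbook RSA demo, the numbers and helper names are assumptions for illustration only (small known primes, no padding); production code should use a vetted library.

```python
import hashlib

# TOY EXAMPLE ONLY: well-known primes, no padding, not secure in any way.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Hash the message, then reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public values (N, E) can check the signature."""
    return pow(signature, E, N) == digest_as_int(message)

message = b"transfer 10 to bob"
signature = sign(message)

print(verify(message, signature))                  # genuine message
print(verify(b"transfer 1000 to bob", signature))  # modified message
```

Note the direction: the private key creates the signature and the public key only checks it, which is the reverse of encryption.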

Hashing vs Encoding vs Encrypting vs Signing

So which one should you use? The good news is that the use cases for each do not overlap so you’d never have to decide between them.

| | Good for | Bad for |
|---|---|---|
| Encoding | Moving data between systems, storage formats, or media | Dealing with sensitive data |
| Hashing | Creating a unique, repeatable representation of data | Retrieving the original data |
| Encrypting | Protecting sensitive data | Sharing data |
| Signing | Confirming data is unchanged; proving the source of data | (all of the above) |

Hashing, Encoding, Encrypting, and Signing in Practice

One of the great things about these four concepts is that they are much more complementary than competitive. Let’s say you have a bunch of important data you want to pass from one system to another over an insecure channel.

A great example would be using a JSON Web Token (JWT) as an OAuth access token. For a deep dive, check out my LinkedIn Learning course on OAuth and OpenID Connect but I’ll cover the core bits here.

Note: An OAuth access token does not have to be a JWT; it’s just a common format. From here on, I will assume your access token is a JWT.

Before we choose which concepts to use, let’s make sure we know the use case and requirements:

  • A JWT access token is passed from the Authorization Server to the requesting Application for use on the Resource Server. Optionally, it may pass through the user’s browser.
  • The contents of this JWT are critical as they describe the Authorization Server, the permissions granted, and potentially information about the user.
  • It’s uncommon but some contents of the JWT could be sensitive to share.

If you’ve read this far, the answer to “which concepts” should be clear.. all four.

First, we need to ensure the token can be passed from system to system through different media and still be retrieved accurately on the other end. Therefore, we need Encoding. For JWTs, this will be base64 (specifically the URL-safe base64url variant), which moves smoothly over HTTP, can be stored as a cookie in a browser, and is easily stored in a database.

Next, since we’ll make decisions based on the contents of the JWT, we need to ensure its integrity (aka it’s unmodified). That sounds like a job for Hashing, but that’s not good enough: anyone who modifies the token could simply recompute the hash. We need to make sure the JWT came from a trusted source. Therefore, we actually need Signing.
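A small sketch shows why a bare hash falls short here. It uses HMAC from Python's standard library as a stand-in for a keyed signature, which is a swap from the key-pair approach described in this article; the key values, payloads, and function names are all hypothetical.

```python
import hashlib
import hmac

genuine = b'{"scope":"read"}'
forged = b'{"scope":"admin"}'

def hash_check(data: bytes, received_hash: str) -> bool:
    """Receiver recomputes the plain hash and compares."""
    return hashlib.sha256(data).hexdigest() == received_hash

# An attacker who rewrites the data just recomputes the hash to match.
attacker_hash = hashlib.sha256(forged).hexdigest()
plain_hash_fooled = hash_check(forged, attacker_hash)

SERVER_KEY = b"hypothetical-server-secret"  # illustrative value only
ATTACKER_KEY = b"attacker-has-to-guess"     # the attacker lacks SERVER_KEY

def make_tag(key: bytes, data: bytes) -> bytes:
    """Keyed tag: cannot be produced without knowing the key."""
    return hmac.new(key, data, hashlib.sha256).digest()

def receiver_accepts(data: bytes, received_tag: bytes) -> bool:
    return hmac.compare_digest(make_tag(SERVER_KEY, data), received_tag)

genuine_accepted = receiver_accepts(genuine, make_tag(SERVER_KEY, genuine))
forged_accepted = receiver_accepts(forged, make_tag(ATTACKER_KEY, forged))

print(plain_hash_fooled, genuine_accepted, forged_accepted)
```

The plain hash only proves the data matches the hash that travelled with it; the keyed version ties the data to someone who holds the key.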

Finally, if there are sensitive contents in the JWT, we will need Encrypting. This isn’t common or widely supported but it is a real thing.

In practice, when the server creates the JWT, it generates it in three pieces:

  • First, the server creates a header that describes both the structure of the token and how the signature should be calculated. Then this is base64 encoded.
  • Next, the server creates the contents of the token. This is a simple JSON structure which gets base64 encoded.
    • Optionally, there may be encrypted value pairs which would be created using the requesting application’s public key. This ensures the value can only be decrypted by the application’s private key and not random users or apps.
  • Finally, the server generates a signature of the encoded header and the encoded contents using its private key.

Then the server concatenates the three pieces using periods as the separator.

When the requesting application receives the JWT, it decodes the header to understand how the signature was calculated. It then uses the server’s public key to verify the included signature against the encoded header and contents. If the signature checks out, the application has confirmed the token is both unmodified and definitively from the server. Now you can use it.
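The create-and-verify flow can be sketched end to end with Python's standard library. One important swap: this sketch signs with HS256 (HMAC with a shared secret) because the standard library has no asymmetric signing, whereas the flow above assumes a private/public key pair. The secret, issuer URL, claims, and function names are hypothetical, and real code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"hypothetical-shared-secret"  # illustrative only

def b64url(data: bytes) -> str:
    """base64url without padding, the encoding JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def create_token(claims: dict) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    encoded_header = b64url(json.dumps(header, separators=(",", ":")).encode())
    encoded_claims = b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{encoded_header}.{encoded_claims}".encode("ascii")
    signature = hmac.new(SECRET, signing_input, hashlib.sha256).digest()
    # Three pieces joined with periods: header.contents.signature
    return f"{encoded_header}.{encoded_claims}.{b64url(signature)}"

def verify_token(token: str) -> dict:
    encoded_header, encoded_claims, encoded_signature = token.split(".")
    header = json.loads(b64url_decode(encoded_header))
    if header.get("alg") != "HS256":  # only accept the algorithm we expect
        raise ValueError("unexpected algorithm")
    signing_input = f"{encoded_header}.{encoded_claims}".encode("ascii")
    expected = hmac.new(SECRET, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(encoded_signature)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(encoded_claims))

token = create_token({"iss": "https://auth.example.com", "scope": "read"})
claims = verify_token(token)

print(token)
print(claims)
```

A real verifier would typically also check claims such as expiry, issuer, and audience before trusting the token; those checks are left out here to keep the focus on encoding and signing.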

Hashing vs Encoding vs Encrypting vs Signing

At the end of this, we need to realize three key aspects:

  • All four are useful;
  • their use cases are unique; and
  • they’re not interchangeable.

If you ever find yourself evaluating “which one should I use?” and the answer isn’t immediate and clear, stop. You’ve missed a piece of the use case and need to re-evaluate your needs.
