
Decoding AI Jargons

AI Vocabulary Made Simple: A Guide to Tokenization, Embeddings, and Attention


Ever feel like tech talk is a foreign language? You're not alone! With every shiny new technology comes a dictionary of jargon that can make your head spin. Fear not: this blog is here to translate tech-speak into plain English, with a dash of humor to keep things fun. Let's dive into the world of tech without the headache!

Transformers

In the realm of generative AI, "transformers" are a type of model architecture that has revolutionized natural language processing. Much like the Autobots and Decepticons from the Transformers universe, these models are powerful and versatile. They can transform input data into meaningful output, whether it's translating languages, generating text, or even creating images.

Transformers work by using a mechanism called "attention," which allows them to focus on different parts of the input data when producing an output. This is akin to how Autobots and Decepticons can change their forms to adapt to different situations. The attention mechanism helps the model understand context and relationships within the data, making it highly effective for tasks that require understanding complex patterns and sequences.

GPT

GPT stands for Generative Pre-trained Transformer. Here's a simpler breakdown:

  • "Generative" means it can create new text (or other forms) based on the input it receives.

  • "Pre-trained" means it has already learned a lot about language from a big collection of text before being used for specific tasks.Since it is pre-trained it will have a cut-off date

  • "Transformer" is the type of model it uses, which helps it understand and generate text efficiently.

Together, these parts make GPT good at understanding and creating human-like text.

Encoder and Decoder

Transformers don't naturally understand English; they interpret data as numbers. To do this, we need to convert our text into numbers, a process called encoding. Encoding changes the input text into a numerical form that the transformer can work with.

After encoding, we also need a way to change it back to its original form. This reverse process, called decoding, lets the transformer reconstruct the initial input from the numbers, ensuring the information is accurately understood and used.

Think of it like a spy sending a message. They can't send it in plain English because they might get caught, so they encode the message. Even if someone else reads it, they won't understand it. At the receiving end, the message is converted back to normal English through decoding. Similarly, GPT understands numbers, so we encode our input text into numbers, and the output numbers are decoded back into text.

You can visualise this using the Tiktokenizer link.

I have created my own encoding and decoding program and you can access it on this link.

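If you want to try a round trip yourself in a few lines of Python, here is a minimal sketch (an illustration, not the program linked above) using OpenAI's tiktoken library, which implements the tokenizer behind GPT models:

```python
import tiktoken  # pip install tiktoken

# Load the tokenizer used by recent GPT models
enc = tiktoken.get_encoding("cl100k_base")

text = "Pani puri is love"
token_ids = enc.encode(text)   # encoding: text -> numbers
print(token_ids)               # a list of integer token IDs (exact values depend on the tokenizer)
print(enc.decode(token_ids))   # decoding: numbers -> text, back to "Pani puri is love"
```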

Vector Embedding

Imagine every word in a sentence is like an item in a big super mart. Now, you want to group similar items together — all the fruits in one rack, all spices in another, and snacks in one corner. That’s exactly what vector embeddings do. They turn each word into a number-based location — kind of like giving it a spot on Google Maps. Words that are similar (like king and queen, or pani puri and chaat) are placed closer on this map. So when an AI model “looks” at words, it doesn’t just see random text — it sees them like neatly arranged grocery aisles. This helps it figure out meaning, tone, and relationships — just like you know that aloo and bhindi go in the same sabzi section.

Thanks to vector embeddings, AI can:

  • Understand what you’re talking about

  • Reply more naturally

  • Group similar things together (like in recommendation systems)

Want to see how this works visually? Check this out — it’s super cool!
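To make the grocery-aisle idea concrete, here is a toy sketch with hand-made three-number vectors (real embeddings are learned by the model and have hundreds or thousands of dimensions). Cosine similarity measures how close two words sit on the map:

```python
import numpy as np

# Toy, hand-made embeddings; a real model learns these values during training
embeddings = {
    "king":      np.array([0.90, 0.80, 0.10]),
    "queen":     np.array([0.85, 0.75, 0.20]),
    "pani puri": np.array([0.10, 0.20, 0.95]),
    "chaat":     np.array([0.15, 0.25, 0.90]),
}

def cosine_similarity(a, b):
    # 1.0 means pointing the same way (same aisle); near 0 means unrelated
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))      # high: same aisle
print(cosine_similarity(embeddings["king"], embeddings["pani puri"]))  # low: different aisles
```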

Positional Encoding

Imagine you’re at a wedding buffet. You see roti, then dal, then paneer, then gulab jamun. Now imagine if someone randomly shuffled it to gulab jamun, dal, paneer, roti. Ufff. Total confusion, right?

That’s where positional encoding comes in for AI.

LLMs (like ChatGPT) don’t naturally understand word order. They just see a bunch of words. So we have to tell the model where each word is in the sentence — like giving it a seat number at a shaadi.

For example:

  • “Ram ate mango” → makes sense.

  • “Mango ate Ram” → now the mango is a cannibal?

So we assign positional values to each word to maintain meaning. Just like you know starters come before the main course and dessert comes last, the AI understands sentence flow better when we tag word positions. It’s like adding a “row number” to your Excel sheet — the data’s the same, but now it’s organized. Without positional encoding, an AI model would treat “I love biryani” and “Biryani loves I” as the same thing.
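One classic recipe for this, from the original Transformer paper (“Attention Is All You Need”), is sinusoidal positional encoding: every position gets a unique pattern of sine and cosine values that is added to the word’s embedding. A minimal sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Each position gets a unique mix of sine/cosine waves of different frequencies
    positions = np.arange(seq_len)[:, np.newaxis]   # shape (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # shape (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return pe

# Each row is the "seat number" added to the word embedding at that position
print(positional_encoding(seq_len=4, d_model=8).round(2))
```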

Semantic Meaning

Let’s say you’re watching an India vs Pakistan match. Kohli walks out to bat, the stadium is roaring, expectations are sky-high. But then — he gets out on the very first ball. The next day, someone says, “Kohli broke millions of hearts yesterday.” Now if an AI model takes that sentence literally, it might think Kohli was involved in some emotional drama. But as an Indian cricket fan, you understand the real meaning — he got out early, and fans were hugely disappointed. That’s what semantic meaning is all about: understanding the real intent behind words based on context — not just what the words say, but what they actually mean in that situation.

This is a big deal in AI. To truly “understand” language, models need to figure out these hidden meanings. For example, the sentence “the crowd went silent” could mean people are bored, shocked, or deeply emotional — and the model needs to infer the right reason based on the overall context.

AI models learn this by studying how words appear together in huge amounts of text. They start seeing patterns — like “Kohli,” “match,” “out,” and “fans” often appearing together in emotional contexts. This helps the model connect the dots and guess the intended meaning.

Self Attention

Imagine you’re reading an Indian newspaper headline:

“Modi met Shah at Delhi airport after the G20 summit.”

Now, as a human, you naturally understand who did what, where, and when. You know that “Modi” and “Shah” are people, “met” is the action, “Delhi airport” is the place, and “G20 summit” gives you context. But for an AI model, it needs to pay attention to the right words at the right time. This is where Self-Attention comes in — it helps the model focus on important words in a sentence, depending on what it’s trying to understand or generate.

In AI, when a model reads a word like “met,” it looks at all the other words in the sentence and decides which ones are relevant. For “met,” the model might pay more attention to “Modi” and “Shah” to figure out who met whom.

Self-attention assigns weights to each word — higher weights to more relevant ones. That’s how the model keeps track of relationships in a sentence, no matter how long it is.
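Here is a bare-bones NumPy sketch of that scoring step, known as scaled dot-product self-attention. The weight matrices below are random stand-ins (a trained model learns them); the point is to see each word score every other word and take a weighted mix:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # relevance of every word to every other word
    weights = softmax(scores)                   # each row sums to 1: the attention weights
    return weights @ V                          # each word becomes a weighted mix of all words

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                    # 5 words, each a 16-dimensional embedding
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (5, 16): same shape, context mixed in
```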

Multi-Head Attention

Multi-Head Attention is like watching an IPL match with a group of friends—everyone’s watching the same game, but each person is noticing something different. One is focused on Kohli’s stance, another is watching the bowler’s grip, someone’s tracking the field setup, and someone else is reacting to the crowd. That’s exactly what multi-head attention does: it lets a transformer model look at the same sentence from different angles, all at once.

Each “head” in this mechanism acts like a specialist. One might focus on the relationship between subjects and verbs, another on sentiment, and yet another on word positions. These heads process the sentence in parallel, and their individual insights are later combined, giving the model a richer, more detailed understanding of the input. It’s like assembling different pieces of gossip from your cousins to figure out what really happened at the wedding.

For instance, in the sentence “Virat hit the ball and the crowd went wild,” one head might link “Virat” to “hit,” another might tie “ball” to “crowd,” and a third might interpret the emotional shift. This multi-perspective view makes language models incredibly good at picking up nuances—whether it’s translating Tamil to English or understanding sarcastic memes on Indian Twitter.
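Building on the self-attention sketch above, multi-head attention simply runs several attention heads in parallel, each with its own weights (random here, learned in a real model), and then combines their outputs:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head_attention(X, num_heads, rng):
    d_model = X.shape[-1]
    d_head = d_model // num_heads                   # each head works in a smaller subspace
    heads = []
    for _ in range(num_heads):                      # each head is a "specialist" with its own weights
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        heads.append(attention(X @ Wq, X @ Wk, X @ Wv))
    Wo = rng.normal(size=(d_model, d_model))        # output projection
    return np.concatenate(heads, axis=-1) @ Wo      # stitch the heads' insights together

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 32))                        # 7 words, 32-dimensional embeddings
print(multi_head_attention(X, num_heads=4, rng=rng).shape)  # (7, 32)
```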

Softmax

Imagine you and your friends are choosing where to eat. You list your top 3 picks: pizza, biryani, and dosa. Everyone gives a score out of 10 for each option. Instead of just picking the one with the highest score, you want to see how likely each place is to be chosen based on everyone's preferences. This is where Softmax helps.

In machine learning, Softmax takes a bunch of raw scores (called logits) — which might be any number, positive or negative — and converts them into probabilities that are easy to compare. It does this by making the biggest numbers stand out even more and squashing the rest proportionally. So if biryani gets a score of 10, pizza gets 7, and dosa gets 2, Softmax will say something like: “Biryani has about a 95% chance, pizza around 5%, and dosa almost no chance at all.”

It’s like your brain thinking, “I’m mostly in the mood for biryani, but I could settle for pizza. Dosa? Not today.” This helps AI models make decisions confidently by understanding how strong each option is relative to the rest. Softmax doesn’t just look at the best option — it compares all of them to make a well-balanced decision.
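Here is that biryani example computed for real. The raw scores are made up; the math is plain softmax:

```python
import numpy as np

logits = np.array([10.0, 7.0, 2.0])            # raw scores: biryani, pizza, dosa
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: exponentiate, then normalise
for food, p in zip(["biryani", "pizza", "dosa"], probs):
    print(f"{food}: {p:.1%}")                  # biryani: 95.2%, pizza: 4.7%, dosa: 0.0%
```

Notice how aggressively softmax favours the top score: a gap of 3 points in the logits becomes roughly a 20x gap in probability.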

Temperature

Imagine you’re using an AI chatbot to generate pickup lines (no judgment). Now, if you set the temperature to 0, the model plays it super safe — it’ll always choose the most predictable, bland line like, “Are you a library book? Because I want to check you out.” Meh. Reliable? Yes. Spicy? Not at all.

Now crank the temperature up to 1.0 or higher, and suddenly the AI gets a little wild. It might say something like, “Are you made of copper and tellurium? Because you’re Cu-Te, but also highly reactive in my neural circuits.” Riskier, funnier, and more creative — but it could also backfire and say something totally off-beat. That’s the trade-off.

In short: Temperature controls randomness. A low temperature makes the model stick to safe, high-confidence responses. A high temperature lets it explore less likely, but more diverse or creative outputs. Think of it like masala levels in your food — too low and it’s bland, too high and it might just burn your brain. The sweet spot depends on the vibe you’re going for.
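Under the hood, temperature simply divides the raw scores before softmax is applied. A quick sketch (scores made up) shows how a low temperature sharpens the distribution and a high one flattens it:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.array(logits) / temperature    # T < 1 sharpens, T > 1 flattens
    e = np.exp(scaled - scaled.max())          # subtract max for numerical stability
    return e / e.sum()

logits = [10.0, 7.0, 2.0]                      # biryani, pizza, dosa again
print(softmax_with_temperature(logits, 0.5).round(3))  # ~ [0.998 0.002 0.   ]  plays it safe
print(softmax_with_temperature(logits, 2.0).round(3))  # ~ [0.806 0.18  0.015]  more adventurous
# As temperature approaches 0, this tends toward "always pick the top score" (greedy decoding)
```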

Softmax vs Temperature

Imagine you’re at a chaat stall in Delhi. The vendor asks, “What would you like—Pani Puri, Dahi Puri, or Aloo Tikki?” Your cravings are like logits (raw scores)—you’re leaning more toward Pani Puri, but the other options are still tempting. Now, Softmax is like your brain calculating the odds: “Hmm, I love Pani Puri (60%), but I’m also in the mood for Aloo Tikki (30%), and maybe just a little Dahi Puri (10%).” Softmax takes your raw cravings and turns them into neat, sharable probabilities.

But what if your mood isn’t stable? That’s where Temperature comes in. Let’s say your hunger level is low (temperature = 0.3). You’ll go straight for Pani Puri, no second thoughts—zero randomness. But if you’re in an adventurous mood (temperature = 1.5), you might say, “Why not try Dahi Puri today, even if it wasn’t the top pick?” With higher temperature, you’re more open to exploring new tastes. Lower temperature? You’re sticking to tried-and-tested.

So in AI, Softmax is the logic that turns model guesses into probabilities. Temperature is the spice meter—it decides whether the model sticks to its top guess or experiments with something offbeat. Together, they help models balance between sensible and surprising, just like we do at a food stall!
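Putting the two together in code: softmax turns the cravings into probabilities, temperature sets how adventurous the pick is, and then we sample. The scores below are made up, chosen so that plain softmax gives roughly the 60/30/10 split above:

```python
import numpy as np

snacks = ["Pani Puri", "Aloo Tikki", "Dahi Puri"]
cravings = np.array([2.0, 1.3, 0.2])            # made-up raw scores (logits)

def sample_snack(temperature, rng):
    scaled = cravings / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                        # softmax -> probabilities
    return str(rng.choice(snacks, p=probs))     # pick one, weighted by probability

rng = np.random.default_rng(42)
print([sample_snack(0.3, rng) for _ in range(5)])  # low temperature: mostly Pani Puri
print([sample_snack(1.5, rng) for _ in range(5)])  # high temperature: more variety
```

This is essentially what a language model does at every step, except its "menu" is its entire vocabulary.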

Knowledge Cutoff

A knowledge cutoff is like a student who stopped studying the news after a certain date — let’s say they last read the newspaper in December 2023. So, if you ask them about the 2024 Lok Sabha elections or who won IPL 2025, they’ll blank out. Similarly, AI models like ChatGPT are trained on a snapshot of the internet up to a certain point (for some current models, it’s June 2024). Anything that happened after that — new tech releases, policy changes, cricket wins, viral memes — the model won’t know unless it’s updated. So, when talking to an AI, always remember: it’s not ignoring you, it’s just stuck in the past like that friend who still thinks TikTok is banned in India.

Conclusion: AI Doesn’t Have to Be Rocket Science

If you’ve made it this far—congrats! You’ve just unpacked some of the trickiest AI jargon out there, and hopefully those terms now feel more like familiar roadside dhabas than alien tech towers.

From tokenization slicing up text like your mom chops veggies for sabzi, to vector embeddings helping AI understand context like how we know “Kohli” means cricket and not chemistry, you’ve started peeling back the layers of how machines process language. Positional encoding showed us that order matters—just like your roll number did in school. Softmax and temperature taught us how models decide what to say and how creatively to say it, while self-attention and multi-head attention proved that models aren’t just looking at words—they’re analyzing the full picture, much like how you scan your friends’ faces before cracking a joke.

And finally, knowledge cutoff? That’s just the AI saying, “Bro, I stopped reading the news after June 2024.”

This is just the beginning. The more we break down these buzzwords, the more power we gain to understand—and maybe even shape—the AI shaping our world. Stay curious, ask dumb questions (they’re usually the best ones), and remember: you don’t need a PhD to decode AI. Sometimes, all it takes is a little masala and a lot of curiosity. 🚀

Gen AI

Part 1 of 1

In this series, I’ll cover Gen AI topics.