BowTied_Raptor|Data Science & Machine Learning 101 (@bowtied_raptor) 's Twitter Profile

@bowtied_raptor

Learn machine learning and data science here: btraptor.com

ID: 1420237142068973569

Joined: 28-07-2021 04:18:26

7.7K Tweets

5.5K Followers

359 Following

BowTied_Raptor|Data Science & Machine Learning 101 (@bowtied_raptor) 's Twitter Profile Photo

Another Popular SQL Interview Question:
What is a relational database?
Give me an example of a relational database.
Why does SQL matter for relational databases?
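The questions above can be answered in a few lines of code. A minimal sketch using Python's built-in sqlite3 module (SQLite is itself a relational database); the tables and rows are made up for illustration:

```python
import sqlite3

# A relational database stores data in tables (relations) that can be
# joined on shared keys. SQLite is a tiny example that ships with Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
conn.execute("INSERT INTO orders VALUES (10, 1, 'gpu')")

# SQL matters because it is the standard way to express queries across
# those relations, like this join:
row = conn.execute(
    "SELECT users.name, orders.item FROM users "
    "JOIN orders ON orders.user_id = users.id"
).fetchone()
print(row)  # ('ada', 'gpu')
```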

Before you spend the next 2 hours watching more Data Science videos on YouTube for... *another* personal project, ask yourself this: what gives you more value?
- More videos
- Or *actually* applying for roles
Let's not BS each other, yea? We both know the right answer.

In Traditional ML, you start off by collecting labels → train model → deploy. With Foundation models, you start off with a strong pretrained model, then steer it with prompts, retrieval, and sometimes fine-tuning.

LLMs don’t really think in words. They think in tokens. A token can be a word, part of a word, or even a character. Tokenization is just how we chop text up. You can use the link below to play around with how GPT does tokenization. platform.openai.com/tokenizer

With Machine Learning, labeling our data was the old bottleneck. If it costs $0.05 to label one image, a million images costs $50,000. Scale that to more categories and the labeling bill goes boom boom. Self-supervision helps you scale without paying for every label.

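The arithmetic above, as a quick sanity check (the per-label price and the category count are just the tweet's example numbers):

```python
# Back-of-the-envelope labeling cost.
cost_per_label = 0.05  # dollars per image
n_images = 1_000_000

total = cost_per_label * n_images
print(f"${total:,.0f}")  # $50,000

# Add more categories that each need their own labeled set
# and the bill scales linearly.
n_categories = 10
print(f"${total * n_categories:,.0f}")  # $500,000
```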
AI Engineering as a field exists because the industry is moving away from training one model per task. With foundation models, you usually start with a strong general model and adapt it to your product. The “work” shifts from model building to integration.

When it comes to language model styles, as of now, we have 2 types... Autoregressive models predict the next token using only previous tokens. Masked models predict missing tokens using context on both sides. People usually mean autoregressive when they say “LLM.”

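The difference can be shown with a toy example of which positions each model type is allowed to look at when predicting one token (no real model involved, just list slicing):

```python
# Which context each model type can attend to when predicting
# the token at position i in a 5-token sequence.
tokens = ["the", "cat", "sat", "on", "mat"]
i = 2  # predicting "sat"

# Autoregressive (GPT-style): only positions strictly before i are visible.
autoregressive_context = tokens[:i]

# Masked (BERT-style): every position except i itself is visible.
masked_context = tokens[:i] + tokens[i + 1:]

print(autoregressive_context)  # ['the', 'cat']
print(masked_context)          # ['the', 'cat', 'on', 'mat']
```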
Foundation models are not just "really really big LLMs". The idea is much broader: they are general-purpose models you can build on. They can be language-only or multimodal. The *main* point is that they transfer across tasks.

In the traditional world of MLE (that we all are very familiar with), it was task-specific models. You'd build one model for sentiment, another for translation, and another for classification. Foundation models mark the shift to general models that can do many tasks outright.

When you are dealing with AI Engineering, here are the 3 most common adaptation tools that show up basically everywhere: prompt engineering, RAG, and fine-tuning.
Prompt = instructions
RAG = add external knowledge
Fine-tune = change behavior via training

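Here's a minimal sketch of how the first two tools fit together. The documents and the keyword-overlap retriever are made up (a real system would use a vector store), and fine-tuning isn't shown since it changes model weights via training rather than the prompt:

```python
documents = [
    "Our refund window is 30 days from purchase.",
    "Support is available on weekdays from 9 to 5.",
]

def retrieve(question: str, docs: list[str]) -> str:
    """RAG step: pick the doc sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str, context: str) -> str:
    """Prompt-engineering step: instructions plus retrieved knowledge."""
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )

question = "What is the refund window?"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)  # this string is what you'd hand to the model
```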
When messing around with LLMs, those weird tokens like <BOS> and <EOS> serve a purpose. They tell the model where a sequence starts and ends. The end token is especially useful because it helps the model learn when to stop generating, instead of rambling forever.

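A toy generation loop makes the point; the "model" here is just a scripted list of predictions, not a real LLM:

```python
BOS, EOS = "<BOS>", "<EOS>"
scripted_predictions = iter(["Hello", "world", EOS, "this", "never", "appears"])

generated = [BOS]
for _ in range(100):      # hard cap so we never loop forever
    token = next(scripted_predictions)
    if token == EOS:      # the model learned when to stop
        break
    generated.append(token)

print(generated)  # ['<BOS>', 'Hello', 'world']
```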
When it comes to LLMs, tokens beat characters for a simple reason: they let the model work with meaningful chunks. For example, “Cooking” can split into “cook” + “ing”. You keep meaning while using fewer units than full words, which helps efficiency and generalization.

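The "cook" + "ing" split can be reproduced with a toy greedy longest-match tokenizer. The vocabulary below is hypothetical; real tokenizers like BPE learn their merges from data:

```python
vocab = {"cook", "ing", "run", "s", "the"}

def tokenize(word: str) -> list[str]:
    tokens, i = [], 0
    while i < len(word):
        # Take the longest vocabulary entry matching at position i,
        # falling back to a single character if nothing matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("cooking"))  # ['cook', 'ing']
print(tokenize("runs"))     # ['run', 's']
```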
Here's a nice rule of thumb for some modern tokenizers: an “average token” is roughly 3/4 of a word.... So 100 tokens is around 75 words. This matters quite a bit when you’re thinking about context limits and cost.

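The rule of thumb turns into a two-line estimator (the 0.75 ratio is the approximation from the tweet, not an exact property of any tokenizer):

```python
WORDS_PER_TOKEN = 0.75  # rough average for some modern tokenizers

def tokens_to_words(n_tokens: int) -> float:
    return n_tokens * WORDS_PER_TOKEN

def words_to_tokens(n_words: int) -> float:
    return n_words / WORDS_PER_TOKEN

print(tokens_to_words(100))   # 75.0 words
# Roughly how many tokens does a 3,000-word document cost?
print(words_to_tokens(3000))  # 4000.0 tokens
```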
CLIP is a pretty good example of something that is not generative, but still quite powerful. It's basically an embedding model, which means that it learns a joint representation for images and text. Those embeddings then go on to become useful building blocks for later systems.

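A toy picture of what a joint embedding space buys you: the vectors below are made up, whereas a real model like CLIP learns them so that matching image/text pairs land close together. With that property, retrieval is just cosine similarity:

```python
import math

embeddings = {
    "image:dog_photo": [0.9, 0.1, 0.0],
    "text:a photo of a dog": [0.8, 0.2, 0.1],
    "text:a stock market chart": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Find the caption closest to the image in the shared space.
img = embeddings["image:dog_photo"]
best = max((k for k in embeddings if k.startswith("text:")),
           key=lambda k: cosine(img, embeddings[k]))
print(best)  # text:a photo of a dog
```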
Building an AI app is often a reaction to risk, not curiosity. If competitors can automate your core workflow, that is an “existential” use case. You should treat it like business continuity, not a side project.

When the higher-ups say the phrase “we need AI”, it can mean three different things.
1) The firm will become obsolete without it.
2) The firm will miss profit and productivity gains.
3) The firm is unsure at the moment, but can afford to explore so it’s not late.

AI Agents are just apps with tool access. If your model can search, call, write to a calendar, or file paperwork, you’ve moved from “chat” to “do.” The hard part is usually permissions, auditing, and handling failure modes.
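A minimal sketch of that “chat” to “do” jump, with a permission check and an audit log. The tool names and allow-list here are hypothetical:

```python
ALLOWED_TOOLS = {"search"}  # calendar writes not approved yet

def search(query: str) -> str:
    return f"results for {query!r}"

def write_calendar(event: str) -> str:
    return f"added {event!r}"

TOOLS = {"search": search, "write_calendar": write_calendar}

def dispatch(tool_name: str, arg: str) -> str:
    """Run a tool on the model's behalf, gated by permissions."""
    if tool_name not in ALLOWED_TOOLS:        # permissions
        return f"DENIED: {tool_name}"
    print(f"audit: {tool_name}({arg!r})")     # auditing
    return TOOLS[tool_name](arg)

# Allowed tool runs; disallowed tool is refused instead of failing silently.
print(dispatch("search", "flight prices"))
print(dispatch("write_calendar", "dentist"))  # DENIED: write_calendar
```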