Sign up for OpenAI

Go to https://platform.openai.com?utm_source=chatgpt.com

Follow the instructions: enter your name and a project name, choose a plan, name the API key, and generate the key.

Once the API key is generated, copy it and store it somewhere safe immediately. This is the only time the secret key is displayed.

Setup environment

Install the OpenAI SDK and an environment variable management tool

</> Bash
# Install the OpenAI SDK
pip install openai
# Install the environment variable management tool
pip install python-dotenv


<python>
%pip install openai
%pip install python-dotenv

Then we can call the LLM:

from openai import OpenAI

client = OpenAI(api_key="your_api_key")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Hello"}
    ]
)

print(response.choices[0].message.content)

Create a .env file, a plain text file named “.env”, in the project folder. It holds the key as a single line, for example OPENAI_API_KEY=your_api_key.

Then we can load the key this way:

</> Python

from dotenv import load_dotenv
from openai import OpenAI
import os


# Load the OpenAI API key from the .env file
load_dotenv()
my_api_key = os.getenv("OPENAI_API_KEY")


# Create an OpenAI client instance
client = OpenAI(
    api_key=my_api_key
)
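Since os.getenv returns None when a variable is missing, a small guard can surface a missing key before the first API call. This is a minimal sketch; the helper name require_env is hypothetical and not part of the OpenAI SDK or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or the environment")
    return value


# Usage after load_dotenv():
# my_api_key = require_env("OPENAI_API_KEY")
```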

Using OpenRouter

OpenRouter is an AI gateway/platform that lets you access many different LLMs (Large Language Models) through one API.

Traditional Way | OpenRouter Way
OpenAI API → GPT models | OpenRouter API → GPT + Claude + Gemini + DeepSeek + Llama + many others
Need separate accounts/API keys | One API key
Different API endpoints | One endpoint
Different billing systems | One billing system

Why People Use It

  • Try Many Models
    For example:
model="openai/gpt-5"

# Later change to:
model="anthropic/claude-opus"

# or
model="deepseek/deepseek-chat"

without changing much code.

  • Lower Cost
  • One API Key
    Instead of managing an OpenAI key, an Anthropic key, a Google key, and a DeepSeek key, you only manage one OpenRouter key.

Go to https://openrouter.ai/ to create an OpenRouter account and get an API key.

Using OpenRouter:

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-xxxxxxxx"
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain RAG simply"}
    ]
)

print(response.choices[0].message.content)

What’s the difference?
OpenAI:

client = OpenAI(
    api_key="my OpenAI API key"
)

OpenRouter:

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="my openRouter API_key"
)
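Only the base_url and the key differ between the two clients, so the choice can be reduced to a small lookup. The sketch below assumes the keys live in environment variables; OPENROUTER_API_KEY is an assumed name, and client_kwargs is a hypothetical helper.

```python
import os

# Assumed environment variable names; adjust to match your .env file.
PROVIDERS = {
    "openai": {"env_key": "OPENAI_API_KEY", "base_url": None},
    "openrouter": {
        "env_key": "OPENROUTER_API_KEY",
        "base_url": "https://openrouter.ai/api/v1",
    },
}


def client_kwargs(provider: str) -> dict:
    """Build the keyword arguments to pass to OpenAI(...) for a provider."""
    cfg = PROVIDERS[provider]
    kwargs = {"api_key": os.getenv(cfg["env_key"], "")}
    if cfg["base_url"]:
        kwargs["base_url"] = cfg["base_url"]
    return kwargs


# Usage (needs the openai package and a real key):
# from openai import OpenAI
# client = OpenAI(**client_kwargs("openrouter"))
```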

Phase II – Azure AI Foundry service

Step 1: Setup Azure AI Foundry

I assume you already know how to use the Azure Portal and add an Azure service, so I will skip adding the Azure OpenAI service.

Once you have added the Azure OpenAI service, click “Explore Foundry portal” to open the Foundry dashboard. Alternatively, go to https://ai.azure.com/

I recommend switching to the new Foundry. You will be asked to either select an existing project or create a new one.

Creating a new project is easy: simply follow the on-screen steps.

On the project dashboard you can create agents, explore playgrounds, find models, and see your recent work.

1) Create Agents

Create Agents = Build your own AI assistant (agent)

You use this when you want to:

  • define a role (e.g. “Data Analyst Agent”)
  • add instructions (system prompt), i.e. write the rules
  • connect tools (SQL, API, files)
  • add knowledge (RAG)
  • make it do tasks automatically

👉 Result: a custom AI agent, in effect an AI employee that can do real work

2) Explore Playgrounds

Playgrounds = Testing area for models (a sandbox, a model lab)

You use it to:

  • chat with models (GPT, DeepSeek, etc.)
  • test prompts and see the effect
  • try settings (temperature, tokens)
  • compare responses

👉 It is NOT production
👉 It is for experimenting
👉 It is a “Sandbox / practice room”

3) Find Models

Find Models = Choose AI model

You use it to:

  • browse available models (GPT-4.x, GPT-5.x, DeepSeek, etc.)
  • check capabilities
  • compare cost/performance
  • decide which model to deploy

👉 It is the “model catalog”, where you pick the AI brain

Simply think of it as:

  • Find Models → choose the brain (shortlist the job candidates)
  • Playgrounds → test the brain (interview them)
  • Create Agents → build a worker using the brain (hire one and assign the work)
Step 2: Deploy model

From the Project dashboard, click “Find models”. You will find many models to choose from, e.g. gpt-chat5.4, DeepSeek-V4-Flash, etc.

In plain words, deploying means installing a model.

Choose the one you like, then click “Deploy”. The deployment is done.



Step 3: Create Agent

From the Project dashboard, click Create agents.

Follow the steps to create an agent. It is straightforward: fill in the agent name.

The agent is now created.


Tool = giving the AI external capabilities.
Without tools, the AI can only chat; with tools, it can actually do things.

From Agent UI, you can see:

What is “Create toolbox”

Create toolbox = create a container/group for tools

Inside toolbox you can later add:

  • APIs
  • Functions
  • Search
  • Database tools
  • Custom tools
What is “Connect a tool”

Connect a tool = connect an actual usable tool/service/API

Examples:

  • Bing Search
  • Azure AI Search
  • Function API
  • REST API
  • SQL
  • OpenAPI service

First – Create toolbox

Click “Create toolbox”.

Second – Add tools to the toolbox

Then click Add to add tools to the toolbox.

Let’s add “Bing Search” as an example.
Click “Web search” –> Add tools

Add another tool – Function / REST API, which lets the agent call external services. The OpenAPI definition below describes one GET endpoint:

{
  "openapi": "3.0.0",
  "info": {
    "title": "Users API",
    "version": "1.0.0"
  },
  "servers": [
    {
      "url": "https://jsonplaceholder.typicode.com"
    }
  ],
  "paths": {
    "/users": {
      "get": {
        "operationId": "getUsers",
        "summary": "Get list of users",
        "responses": {
          "200": {
            "description": "Successful response"
          }
        }
      }
    }
  }
}
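To see what this definition exposes to the agent, the sketch below walks a trimmed copy of the spec and lists each operation with its HTTP method and full URL. The helper list_operations is hypothetical and only illustrates how a tool definition maps to a callable endpoint; it is not how Foundry parses the spec internally.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def list_operations(spec: dict) -> list[tuple[str, str, str]]:
    """Return (operationId, METHOD, url) for every operation in the spec."""
    base = spec["servers"][0]["url"].rstrip("/")
    operations = []
    for path, item in spec["paths"].items():
        for method, op in item.items():
            if method in HTTP_METHODS:
                operations.append((op["operationId"], method.upper(), base + path))
    return operations


# Trimmed copy of the OpenAPI definition above.
spec = {
    "servers": [{"url": "https://jsonplaceholder.typicode.com"}],
    "paths": {"/users": {"get": {"operationId": "getUsers"}}},
}

print(list_operations(spec))
# [('getUsers', 'GET', 'https://jsonplaceholder.typicode.com/users')]
```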

Now we have added 2 tools.

Test the tools

Go to Agent playground: Agent → Chat / Playground / Test panel

Test 1: REST API

Type “Use the REST API tool to get all users and show them.”

Test 2: Bing Search

Type “Use Bing Search tool to find latest information about Azure AI Foundry.”

FORCE the Agent to call the tool

Suppose we have 100 REST API endpoints, each returning different data, such as a user’s name, a company’s name, or a sales amount.
When we add each API endpoint to the toolbox, we have to give it a clear, specific description. The agent scans the descriptions and chooses the most specific matching tool to call.

In real AI projects, tools are usually also labeled with tags.

e.g.
Tool Registry:
– name
– description
– schema
– tags

Tool: get_company_financials
Tags: finance, company, revenue, kpi

Tool: get_user_profile
Tags: user, identity, profile
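
A tag-based registry like the one above can be sketched in a few lines. The Tool class, the descriptions, and the candidate_tools function are hypothetical, and ranking by tag overlap is just one simple way to narrow the candidates before the model reads the descriptions.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str
    tags: set[str] = field(default_factory=set)


REGISTRY = [
    Tool("get_company_financials", "Return revenue and KPI figures for a company.",
         {"finance", "company", "revenue", "kpi"}),
    Tool("get_user_profile", "Return identity and profile data for a user.",
         {"user", "identity", "profile"}),
]


def candidate_tools(request_tags: set[str], registry=REGISTRY) -> list[str]:
    """Rank tools by the number of shared tags and drop tools with no overlap."""
    scored = [(len(tool.tags & request_tags), tool.name) for tool in registry]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


print(candidate_tools({"revenue", "company"}))  # ['get_company_financials']
```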


Add the instructions below under “Agent UI” –> “Instructions”:

“You have access to multiple tools.
Each tool has a description that defines its purpose.
Always:
– Read tool descriptions carefully
– Select the most relevant tool based on semantic meaning of the user request
– Do NOT rely on hardcoded routing rules
– If multiple tools are relevant, choose the most specific one”



RAG = AI answers using retrieved documents instead of relying only on memory.

The AI retrieves real documents first, then generates the answer.


Documents (PDF / Word / Wiki)
        ↓
   Chunking
        ↓
   Embeddings (vectorization)
        ↓
   Vector Search (similarity retrieval)
        ↓
   Retrieved Context
        ↓
   LLM Answer
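
The pipeline above can be sketched end to end in plain Python. This is a toy: the embed function is a bag-of-words counter standing in for a real embedding model (so it only matches shared words, not meaning), the sample text about returns is made up, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 6) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system calls an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Vector search: return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved context in front of the question for the LLM."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


doc = ("Employees may work remotely twice weekly. "
       "Returns are accepted within 30 days with a receipt.")
question = "How often may employees work remotely?"
context = retrieve(question, chunk(doc))
print(build_prompt(question, context))
```

The final prompt is what would be sent to the LLM; the model call itself is left out so the sketch runs offline.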

1: From Agent UI

Foundry IQ is Microsoft’s managed knowledge system for RAG (Retrieval-Augmented Generation) inside Azure AI Foundry.

Click “Connect to Foundry IQ”.

1. Create an AI Search resource:

If you have not yet created an Azure AI Search service in the Azure Portal, that is the first step.

“AI Search resource” = the search engine server
“Azure AI Search index” = a searchable dataset inside it

AI Search itself acts as the vector database.

In the Azure Portal, go to AI Search and create the resource.

After the AI Search resource is created, its dashboard shows 3 parts:

  • Build your knowledge base
  • Connect your data
  • Monitor and scale

Build your knowledge base: Build a RAG-ready knowledge system
Including:

  • document ingestion
  • indexing
  • embeddings
  • retrieval
  • grounded chat playground

Connect your data: This is where you IMPORT your enterprise data.
e.g.

  • Cosmos DB
  • PDF
  • Blob storage
  • SharePoint
  • SQL

This step creates search indexes.

Monitor and scale: Infrastructure management: scaling, replicas, partitions, performance

2. Build your knowledge base

In this step we create a knowledge source, turning your data into an agentic knowledge base.

To build your knowledge base, click “Build” on the AI Search service dashboard.

Click “Create new” to create a knowledge source.

Indexed = Azure stores and searches a processed copy of your data in its own index
Remote = Azure queries external systems live at runtime

Let’s use Azure Blob (indexed) as an example.

3. Enable text vectorization

This creates:

  • embeddings
  • vector fields
  • semantic retrieval capability

Save it.

Now, we have successfully built:

  • Blob Storage ingestion
  • Azure AI Search indexing
  • Knowledge Base connection
  • Vectorization enabled (semantic search ready)

👉 In short: your RAG data layer is READY.

Attach Knowledge Source to Agent

Return to the Agent UI –> click Add (Knowledge) –> Connect to Foundry IQ.
Then click “Create a new base in mainri-ai-search” (mainri-ai-search is the AI Search resource used in this walkthrough). This opens the Knowledge Base index creation wizard.

Test the RAG

I had uploaded the company’s “return” and “policy” documents to Blob storage, so let’s test with the question:
“What WFH – please answering in both EN and CN”
It works: the agent read the company’s policy document and used it to answer.


Phase 2 – Azure OpenAI

Azure OpenAI Service is a collaboration between Microsoft and OpenAI that provides access to OpenAI’s models on Microsoft’s enterprise-grade Azure cloud platform.

It combines the same models as OpenAI (like GPT-4o, o3, GPT-5, etc.) with the security, compliance, and private networking capabilities of Azure. In short, OpenAI’s model capabilities are hosted inside the Azure cloud platform, and you call the LLM through Azure.

With it you can:
  • build chatbots (ChatGPT-style apps)
  • summarize documents
  • extract information from text
  • build RAG systems (LLM + enterprise data)
  • generate code or SQL
  • embed AI into enterprise apps securely

Azure OpenAI Service itself is specifically designed for OpenAI models (GPT-4o, o3, etc.). However, the broader Azure AI Foundry (which includes Azure OpenAI as a component) allows you to work with many other models, including DeepSeek, Llama, Mistral, and even Claude, but not Gemini from Google.

Where Each Model Stands on Azure

Model | Available on Azure? | How to Access
Azure OpenAI (GPT-4o, etc.) | ✅ Yes (native) | Direct Azure OpenAI API or Azure AI Foundry
DeepSeek | ✅ Yes | Deploy via Azure AI Foundry model catalog (Microsoft + DeepSeek collaboration)
Claude (Anthropic) | ✅ Yes | Accessible via Azure AI Foundry using Anthropic’s native endpoint
Llama (Meta) | ✅ Yes | Deploy via Azure AI Foundry model catalog
Mistral | ✅ Yes | Deploy via Azure AI Foundry model catalog
Gemini (Google) | ❌ Not available | Currently not offered in Azure AI Foundry

Appendix

Microsoft Foundry documentation

Azure OpenAI Quickstart

Azure OpenAI in Microsoft Foundry Models REST API reference

Microsoft Learn

LLM Fundamentals

Azure AI Foundry is Microsoft’s unified Azure platform-as-a-service offering for enterprise AI operations, model builders, and application development. It is Microsoft’s new enterprise-grade AI platform, mainly used to develop:

  • AI apps
  • Copilots
  • AI agents
  • RAG systems
  • enterprise AI workflows

It is becoming Microsoft’s main AI engineering platform. Think of it as:

Azure AI Foundry = 
  Azure OpenAI
    + Prompt management
    + AI Orchestration
    + Agent Framework
    + RAG
    + Evaluation
    + Deployment
    + Monitoring

What does it do? It helps companies:

  • build GenAI apps
  • connect enterprise data
  • orchestrate AI workflows
  • build RAG
  • manage prompts
  • manage agents
  • evaluate and monitor AI quality
  • deploy AI safely
Key Components

A. Model Access / Model Management

via:

  • Azure OpenAI
  • model catalog

Use models like:

  • GPT-4
  • GPT-4o
  • open-source models

B. Prompt Flow

Visual orchestration for:

  • prompts
  • workflows
  • chaining
  • debugging
  • testing

C. RAG

Connect AI to:

  • SharePoint
  • PDFs
  • databases
  • enterprise documents

D. AI Agents

Build agents (systems that execute tasks automatically) that can:

  • use tools (tool calling)
  • call APIs
  • automate workflows
  • reason across tasks and analyze automatically

E. Evaluation & Monitoring

Measure:

  • hallucination
  • safety
  • quality
  • groundedness

Enterprise companies care about this heavily.


An Agent = LLM + Tools + Memory + Planning

An agent can:

  • decide steps (break a task down automatically)
  • call tools (search, DB, API, code)
  • store memory
  • execute workflows

👉 Think:

“You give a goal → the agent figures out how to achieve it”


What is an LLM?
Large language models, also known as LLMs, are very large deep learning models that are pre-trained on vast amounts of data. They are built on the transformer, a neural network architecture based on self-attention. The original transformer consists of an encoder and a decoder, which extract meaning from a sequence of text and capture the relationships between the words and phrases in it; many current LLMs, such as the GPT family, use only the decoder part.

LLMs are designed for natural language processing tasks and are especially suited to language generation.

The concepts nest roughly like this:
Artificial Intelligence (AI) > Model > Generative AI > Large Language Model (LLM)


Tools = “external capabilities for the LLM”

Examples:

  • Search (Bing / web)
  • Database query (SQL)
  • Data processing (Python)
  • APIs (CRM, ERP)
  • File reading

Why do tools matter?

Because an LLM alone:

  • cannot access real-time data
  • cannot query enterprise systems
  • cannot execute actions

👉 Tools = the “hands of the model”


Memory = a system that stores user information and context over time

Types:

🔹 Short-term memory
  • current conversation context
🔹 Long-term memory
  • user preferences
  • past interactions
  • profile data

Why is it important?
  • Without memory, every chat is “reset” and the user is treated as new each time
  • With memory, the AI gives a personalized experience and becomes a “personal assistant”

Workflow Rules = logic that controls how an agent behaves

Examples:

  • Step ordering
  • Tool selection rules
  • Approval conditions
  • Safety constraints

Example

1. Understand intent
2. Check memory
3. Decide if a tool is needed
4. Call the tool (if needed)
5. Combine results
6. Generate the final answer
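
The six steps can be traced in a small sketch. Everything here is hypothetical: a keyword rule stands in for the LLM's intent detection, and the only tool is a toy calculator that adds integers.

```python
from typing import Callable

# Hypothetical tool set; a real agent would call search, SQL, or REST APIs.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}


def run_agent(user_input: str, memory: list[str]) -> str:
    # 1. Understand intent (a keyword rule stands in for the LLM here).
    intent = "math" if "+" in user_input else "chat"
    # 2. Check memory for an identical earlier request.
    seen_before = user_input in memory
    # 3. Decide if a tool is needed.
    tool = TOOLS["calculator"] if intent == "math" else None
    # 4. Call the tool (if needed).
    tool_result = tool(user_input) if tool else None
    # 5. Combine results.
    parts = []
    if seen_before:
        parts.append("(asked before)")
    parts.append(f"Result: {tool_result}" if tool_result is not None else "No tool needed.")
    # 6. Generate the final answer and remember this turn.
    memory.append(user_input)
    return " ".join(parts)


memory: list[str] = []
print(run_agent("2 + 3", memory))  # Result: 5
print(run_agent("2 + 3", memory))  # (asked before) Result: 5
```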


Deployment = making the model callable and usable.

Without deployment:

  • the model exists in the catalog
  • but your app cannot use it

After deployment, Azure gives you an endpoint and API access.

VERY important

Deployment ≠ Agent

A deployment is an exposed model service. Put simply, you bring a model such as GPT or DeepSeek into your own environment and activate it so that it becomes usable there.

What is a Prompt? A prompt is the instruction, question, context, or input you give to an AI model (LLM) to tell it what to do, how to do it, and what to output.

e.g.

Summarize this document in 5 bullet points.

That sentence is a prompt.

Another example:

You are a senior Azure architect.
Explain Medallion Architecture for a banking platform.

The prompt tells the AI:

  • its role
  • the task
  • the expected output
  • sometimes the tone/style

Basic Prompt Structure

A prompt often contains:

Part | Purpose
Instruction | What to do
Context | Background information
Constraints | Rules/limits
Examples | Demonstrations
Output format | Expected response structure

Example

You are a data architect.

Context:
The company uses Azure Databricks and Synapse.

Task:
Design a metadata-driven ingestion framework.

Output:
Provide architecture, components, and best practices.

This is a more structured prompt.

Core components of a good prompt

A good prompt usually includes:

  • Goal
  • Context
  • Constraints
  • Input (the input data)
  • Output Format
  • Examples
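
These components can be assembled programmatically. The sketch below is one possible layout, not a required format; build_prompt is a hypothetical helper, and the sample constraint is made up for illustration.

```python
def build_prompt(goal, context="", constraints=(), input_data="",
                 output_format="", examples=()):
    """Assemble a prompt from the six components; empty parts are skipped."""
    sections = [
        ("Goal", goal),
        ("Context", context),
        ("Constraints", "\n".join(f"- {c}" for c in constraints)),
        ("Input", input_data),
        ("Output format", output_format),
        ("Examples", "\n".join(examples)),
    ]
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections if body)


prompt = build_prompt(
    goal="Design a metadata-driven ingestion framework.",
    context="The company uses Azure Databricks and Synapse.",
    constraints=["Keep the answer under one page."],  # made-up constraint
    output_format="Architecture, components, and best practices.",
)
print(prompt)
```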

What is Prompt Engineering?

Prompt Engineering = the practice of designing prompts to get better AI outputs.
Put simply, it is knowing how to ask the AI questions.

It is:

  • writing prompts strategically
  • structuring context correctly
  • controlling AI behavior
  • improving reliability and quality

Think of it as:

Programming with natural language instead of code.

Why Prompt Engineering Matters

LLMs are highly sensitive to:

  • wording
  • context
  • instructions
  • examples
  • formatting

Even small prompt changes can dramatically affect:

  • accuracy
  • reasoning
  • hallucination (unsupported conclusions)
  • consistency
  • output quality

Common Prompt Engineering Techniques

Common prompt techniques include:

Technique | What it does
Role Prompting | Assigns the AI an identity
Few-shot | Gives several examples
Chain of Thought | Guides the AI to think step by step
Output Control | Controls the output format
Constraints | Adds limiting conditions
Context Injection | Injects business context

1) Role Prompting

Tell the AI who it is.

Example:

You are a senior enterprise architect.

This changes response style and depth.


2) Context Injection

Provide the necessary background information to improve the accuracy of the result.

Example:

The environment uses:
- Azure Databricks
- Delta Lake
- Unity Catalog

Without context, the AI guesses, which hurts the accuracy of the result.


3) Output Formatting

Specify the desired structure. Stating the output format you expect also helps the accuracy of the result.

Example:

Return the answer as:
- architecture diagram
- bullet points
- implementation steps

4) Few-Shot Prompting

Give examples of the desired behavior.
Few-shot prompting (also called few-shot learning) provides a small number of examples (usually 2 to 5) in the prompt to help the model understand the task and generate more accurate responses.

Example:

I want to classify sentiment.
Example 1: "I love this food!" -> Positive
Example 2: "This is the worst day ever." -> Negative
Example 3: "The movie was okay." -> Neutral

Now classify this: "The weather is quite nice today." ->

Output: Positive

The AI learns the pattern and style from the examples.
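
With a chat API, the same few-shot examples are often passed as alternating user and assistant messages. This is a sketch under that assumption; few_shot_messages is a hypothetical helper, and the system instruction wording is mine.

```python
def few_shot_messages(examples: list[tuple[str, str]], query: str) -> list[dict]:
    """Turn (text, label) examples into a chat message list ending with the query."""
    messages = [{
        "role": "system",
        "content": "Classify the sentiment as Positive, Negative, or Neutral.",
    }]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("I love this food!", "Positive"),
    ("This is the worst day ever.", "Negative"),
    ("The movie was okay.", "Neutral"),
]
messages = few_shot_messages(examples, "The weather is quite nice today.")
# Pass `messages` to client.chat.completions.create(model=..., messages=messages)
```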


5) Chain-of-Thought Prompting

Encourage step-by-step reasoning: ask the AI to think and answer in steps to improve the accuracy of the result.

Example:

Think step-by-step before answering.

Useful for:

  • logic
  • architecture
  • math
  • troubleshooting

Context – understanding the word in Chinese

The core meaning of “context” is: all the effective information already present in the conversation or task that the model relies on when generating an answer (including conversation history, the current question, implicit conditions, user preferences, and so on). In Chinese, the word maps reasonably well to 语境 (linguistic context, the most recommended), 背景信息 (background information), 前文背景 (preceding-text background), 关联信息 (related information), 依托信息 (supporting information), or 对话记忆 (conversation memory, for dialogue systems).

Personally, I think 语境 is the best fit.

What is a Context Window?

A Context Window is the amount of information an LLM can “see” or “remember” during a conversation or request. Think of it as the AI model’s working memory: the model can only answer based on what is inside the context window, and everything inside it can influence the response.

Important Understanding

The context window includes BOTH:

Included in Context Window | Examples
Input tokens | prompts, chat history, RAG docs
Output tokens | model response
Total tokens = Input + Output
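
Because input and output share the same window, the room left for the answer is simple arithmetic. The numbers below are hypothetical (an 8,192-token window and made-up input sizes), and remaining_output_tokens is an illustrative helper.

```python
def remaining_output_tokens(context_window: int, input_tokens: int) -> int:
    """Tokens left for the model's response once the input is counted."""
    return max(context_window - input_tokens, 0)


# Hypothetical input: 300 prompt tokens + 1,500 chat history + 6,000 RAG docs.
input_tokens = 300 + 1500 + 6000
print(remaining_output_tokens(8192, input_tokens))  # 392
```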

Context Engineering

Meaning:

  • deciding WHAT information goes into the context window
  • optimizing token usage
  • ranking retrieved documents
  • summarizing history
  • removing irrelevant content
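
One of these tasks, keeping chat history within a token budget, can be sketched as follows. The token estimate is a deliberately rough assumption (one token per word); a real system would use the model's tokenizer. Both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit within `budget`."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))


history = [
    "hello there",
    "what is the WFH policy",
    "employees may work remotely twice weekly",
]
print(trim_history(history, budget=11))
# ['what is the WFH policy', 'employees may work remotely twice weekly']
```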

In short, Context Engineering is deciding what information enters the context window. This is a core capability in enterprise AI.


What is Hallucination in AI / LLM?

Hallucination in AI / LLM refers to the phenomenon where the model generates content that is factually incorrect, nonsensical, or unsupported by the real world or the provided source, while presenting it with high confidence as if it were true. This is one of the BIGGEST concerns in enterprise AI systems.

Common examples include:

  • Inventing non-existent references, laws, or historical events.
  • Incorrectly calculating simple arithmetic.
  • Misinterpreting the user’s input and fabricating plausible-sounding but false information.

Why Hallucinations Happen?

1) LLMs Predict Language, Not Truth. The model generates the most likely next tokens, not verified facts.
2) Missing Context. If the model lacks:
  • sufficient information
  • enterprise data
  • current data

it may “fill in the gaps.”

3) Ambiguous Prompts. Poor prompts can cause:
  • assumptions
  • invented details
  • unstable outputs
4) Outdated Training Data. Models have training cutoffs. They may:
  • not know recent events
  • generate outdated answers
  • guess newer information
5) Weak RAG / Retrieval. In enterprise AI:
  • bad retrieval
  • irrelevant documents
  • incomplete grounding

can produce hallucinated answers.

Types of Hallucinations

  • A. Factual Hallucination: wrong facts, such as an invented company policy.
  • B. Citation Hallucination: fake sources or references, such as non-existent papers.
  • C. Logical Hallucination: reasoning errors.
  • D. Tool/API Hallucination: inventing APIs, functions, parameters, or libraries.

How Enterprises Reduce Hallucinations

  • RAG (Retrieval-Augmented Generation): the most important technique. Instead of relying only on model memory, the model answers from retrieved documents.
  • Better Prompt Engineering: clear prompts reduce ambiguity.
  • Context Engineering: control what information enters the context, plus retrieval quality, ranking, chunking, and summarization.
  • Evaluation Systems: AI outputs are tested for factual accuracy, groundedness, consistency, and safety.
  • Human-in-the-Loop: humans validate sensitive outputs, approvals, and critical decisions.


Grounding = making the AI answer based on real external evidence, not memory.
The answer has a basis instead of being a guess from memory. In other words, you show the AI the source material rather than letting it come up with the answer on its own.

What are Tokens in AI / LLM?

Tokens in AI / LLM are the basic units of text that the model reads and generates. Instead of processing raw text character-by-character or word-by-word, the model breaks text into smaller pieces called tokens. How the text is split depends on the tokenizer, so different models may split the same text differently.

Key points:

  • A token is not always a whole word, nor a single character. It can be:
    • A short common word: "cat" → 1 token
    • Part of a longer word: "unhappiness" → "un" + "happiness" (2 tokens)
    • A single character: "a" → 1 token
    • A punctuation mark: "." → 1 token
    • A space or part of a space (depending on the tokenizer)
  • Examples (using OpenAI’s tokenizer; exact splits can vary by tokenizer version):
    • "Hello, world!" → ["Hello", ",", " world", "!"] (4 tokens)
    • "I love you" → ["I", " love", " you"] (3 tokens)
    • A long Chinese sentence → often 1 Chinese character = 1–2 tokens (less efficient than English)

Why Tokens Matter

  • Context length is measured in tokens (e.g., “this model has an 8K token context”).
  • Cost is usually based on tokens (input tokens + output tokens).
  • Speed depends on how many tokens the model processes.
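
Since billing is per token, a request's cost can be estimated from its token counts. The prices below are placeholders, not real rates for any model; check your provider's pricing page. estimate_cost is a hypothetical helper.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost = input tokens at the input rate + output tokens at the output rate."""
    return (input_tokens / 1000 * price_in_per_1k
            + output_tokens / 1000 * price_out_per_1k)


# Placeholder prices per 1,000 tokens (assumptions, not real rates).
cost = estimate_cost(2000, 500, price_in_per_1k=0.5, price_out_per_1k=1.5)
print(cost)  # 1.75
```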

What are Embeddings in AI / LLM?

Embeddings in AI / LLM are numerical representations of text (or other data like images, audio) in a high-dimensional vector space. Simply put, they turn words, sentences, or documents into lists of numbers so that computers can “understand” their meaning mathematically.

Key points:

  • What it looks like:
    A word like "king" might be represented as a vector:
    [0.25, -0.78, 0.43, …, 0.12] (e.g., 300–4096 dimensions).
  • How it works:
    Words or phrases with similar meanings are placed close together in this vector space.
    • "king" and "queen" are close.
    • "apple" (fruit) and "apple" (company) have different vectors depending on context.
  • Why embeddings matter:
    • They capture semantic meaning – relationships like king − man + woman ≈ queen.
    • They enable search (find similar texts), clustering (group topics), and recommendation.
    • LLMs use embeddings internally to process every token you feed into the model.
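
The analogy king − man + woman ≈ queen can be checked with cosine similarity on toy vectors. The 3-dimensional vectors below are hand-made for illustration (roughly: royalty, male, female); real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

# Hand-made toy vectors; the three dimensions loosely mean royalty, male, female.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(target: list[float]) -> str:
    """Return the word whose vector is most similar to `target`."""
    return max(VECTORS, key=lambda word: cosine(VECTORS[word], target))


# king - man + woman, computed dimension by dimension.
target = [k - m + w for k, m, w in zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])]
print(nearest(target))  # queen
```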

Vector Databases

A Vector Database is a database designed to store and search embeddings, that is, vectors that encode semantic meaning.

Common Vector Databases

  • Pinecone: fully managed, serverless, low latency
  • Weaviate: built-in hybrid search, modular
  • FAISS: a library (not a database) with highly optimized ANN (approximate nearest neighbor) search
  • Azure AI Search
  • Databricks Vector Search
  • Milvus: cloud native, GPU acceleration, billion-scale
  • Chroma: lightweight, embedded, Python native

These databases optimize:
  • nearest neighbor search
  • semantic retrieval
  • high-dimensional vector operations

Traditional Database vs Vector Database

Traditional Database | Vector Database
Stores rows/columns | Stores vectors
SQL queries | Similarity search
Exact matching | Semantic matching
Keyword search | Meaning search
Structured data | Embeddings

Example: suppose company documents contain: “Employees may work remotely twice weekly.”

A user asks: “What is the work from home policy?” Traditional keyword search may fail because “remote” and “work from home” share no keywords. Embedding vectors, however, capture their semantic similarity.

Semantic Search

Semantic search is built on similarity search, a technique that finds the items in a dataset most similar to a given query vector, based on distance metrics in a high-dimensional embedding space. This enables semantic matching rather than exact keyword matching.


What is Temperature?

Temperature is a hyperparameter that controls the randomness or creativity of an LLM’s output. It scales the logits (raw prediction scores) before the softmax function converts them into probabilities: lower temperatures make the model more deterministic and focused, while higher temperatures make it more diverse and exploratory.

Imagine you’re at a restaurant with a menu of 10 dishes. Temperature controls how likely you are to pick your absolute favorite vs. trying something new.

Temperature | Analogy
Low (0.1 ~ 0.3) | You almost always order your #1 favorite dish. Very predictable.
Medium (0.7 ~ 1.0) | You usually pick your top dish, but sometimes try #2 or #3. Balanced.
High (1.5+) | You randomly pick any dish, even ones you don’t know. Very unpredictable.
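
The scaling can be written as p_i = exp(z_i / T) / sum_j exp(z_j / T), where z are the logits and T is the temperature. The sketch below applies it to three made-up logits so the effect of T on the probabilities is visible; it illustrates the math only, not any provider's sampling code.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability lands on the top-scoring token; at high temperature the distribution flattens.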


What is Completion in AI/LLM?

Completion is the fundamental, raw operation of an LLM where the model takes an input text prompt and generates the most likely continuation of that text, token by token, in an autoregressive manner. It has no concept of roles or conversation history: just text in, text out.

Key Characteristics

Aspect | Description
Input | Single string prompt
Output | Raw text continuation
Roles | None
History | Must be manually managed
Underlying mechanism | Autoregressive token prediction
Modern status | Legacy (GPT-3, Davinci era)

Simple Example

Prompt:       "The capital of France is"
Completion:   " Paris."

Prompt:       "def fibonacci(n):"
Completion:   "\n    if n <= 1:\n        return n\n    else:\n        return fibonacci(n-1) + fibonacci(n-2)"

What is Chat in AI/LLM?

Chat is a structured, turn-based interaction paradigm built on top of completion. It adds role awareness (system, user, assistant) and carries the conversation history as a list of messages. With stateless APIs such as Chat Completions, the caller still resends that list on every request; the model itself does not remember earlier turns. Each chat interaction is internally converted into a completion with special formatting tokens.

Key Characteristics

| Aspect | Chat |
| --- | --- |
| Input | Array of messages with roles |
| Output | Role-labeled assistant response |
| Roles | System, User, Assistant |
| History | Carried in the message array (the caller appends each turn) |
| Underlying mechanism | Still completion (with special tokens) |
| Modern status | Standard (GPT-4, Claude, DeepSeek) |

e.g.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]
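
The conversion from chat messages to a single completion prompt can be sketched as follows. The `<|role|>` and `<|end|>` markers are made-up placeholders, not the actual special tokens of any specific model; each model family defines its own chat template.

```python
def render_chat_as_prompt(messages):
    """Flatten role-tagged messages into one text prompt for a completion model."""
    parts = []
    for message in messages:
        # Hypothetical markers; real models use their own special tokens.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(render_chat_as_prompt(history))

# After the model replies, the caller appends the reply and the next user turn,
# then renders (or sends) the whole list again.
history.append({"role": "assistant", "content": "The capital of France is Paris."})
history.append({"role": "user", "content": "And of Italy?"})
```

The history grows only because the calling code appends to the list; nothing is stored inside the model between requests.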

Completion vs. Chat

Schema enforcement = forcing output to follow a fixed structure (a schema) instead of free-form text.

It forces the AI or API to return data that conforms to a required format rather than arbitrary text. For example, given the input:
John is 30 years old and lives in Toronto

Without a schema, the AI might output: John is 30 years old and lives in Toronto
or it might output:
name: John
age: thirty
location: Toronto Canada maybe

Such output is unstable and cannot be processed reliably by a machine. With schema enforcement, the output is forced into this form:
{
"name": "John",
"age": 30,
"location": "Toronto"
}
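
One simple way to check that a model's reply follows the schema is to parse it as JSON and verify the field names and types. The sketch below uses only the Python standard library; the `PERSON_SCHEMA` dictionary and `validate_person` helper are illustrative names, and production systems more commonly rely on a schema library or the provider's structured-output feature.

```python
import json

# Expected fields and their Python types (illustrative schema).
PERSON_SCHEMA = {"name": str, "age": int, "location": str}

def validate_person(raw_text):
    """Return the parsed dict if raw_text matches PERSON_SCHEMA, else raise ValueError."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(PERSON_SCHEMA):
        raise ValueError(f"unexpected or missing fields: {sorted(data)}")
    for field, expected_type in PERSON_SCHEMA.items():
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return data

good = '{"name": "John", "age": 30, "location": "Toronto"}'
bad = '{"name": "John", "age": "thirty", "location": "Toronto Canada maybe"}'

print(validate_person(good))
try:
    validate_person(bad)
except ValueError as err:
    print("rejected:", err)
```

A caller can retry the model request whenever validation fails, so only well-formed records reach downstream systems.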

Appendix

OpenAI Platform Doc – OpenAI Developers

Azure OpenAI Documentation

LLM Reasoning vs Retrieval (RAG)


🟨 1. What is LLM Reasoning?

LLM Reasoning is the ability of a Large Language Model to understand a user's input, interpret its meaning, and generate logical outputs based on patterns learned during training. It does not directly access external data while reasoning (unless tools are used); it relies mainly on internal parameters learned from training data.

LLM = trained knowledge

An LLM's knowledge is parametric knowledge: it is learned from large amounts of data during the training phase and stored in the model weights.


🟨 2. What is Retrieval (RAG)?

RAG = search for new knowledge and append it to the LLM's prompt

Retrieval-Augmented Generation (RAG) is a method where the system first searches external knowledge sources (such as databases, documents, or enterprise knowledge bases) and then provides the retrieved information to the LLM to generate a grounded answer.

RAG retrieves external knowledge and injects it into the prompt context at runtime; the injected content is temporary and does not change the model weights.


🔗 3. Relationship between LLM Reasoning and RAG


🧩 Core relationship

LLM Reasoning is the thinking engine, while RAG is the information supply system. RAG provides external factual knowledge, and LLM reasoning interprets and synthesizes that information into a final answer.

The LLM reasons from pre-trained knowledge, while RAG supplies external, up-to-date information at inference time; together they enable accurate, traceable, and grounded responses.


🔄 How they work together (flow)

  1. User asks a question
  2. RAG retrieves relevant documents
  3. Retrieved data is passed to the LLM
  4. LLM performs reasoning over both the question + retrieved context
  5. Final answer is generated
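
The five steps above can be sketched end to end in a few lines. This is a toy under stated assumptions: retrieval is plain word overlap rather than vector search, the `KNOWLEDGE_BASE` documents are invented, and `build_prompt` only assembles the text that would be sent to an LLM instead of calling one.

```python
# Invented sample documents standing in for an enterprise knowledge base.
KNOWLEDGE_BASE = [
    "Employees receive 15 vacation days per year.",
    "Customers may return items within 30 days of purchase.",
    "The VPN must be used when working remotely.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of bare words."""
    return {word.strip(".,?!").lower() for word in text.split()}

def retrieve(question, documents, top_k=1):
    """Step 2: rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, context_docs):
    """Steps 3-4: inject the retrieved text into the prompt the LLM will reason over."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

question = "How many days do customers have to return items?"  # step 1
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)  # step 5 would send this prompt to an LLM
```

The model never sees the whole knowledge base, only the few passages that retrieval selected for this question.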

⚖️ Key difference (very important)

  • LLM Reasoning = internal thinking based on learned knowledge
  • RAG = external knowledge retrieval from real data sources
  • Reasoning answers "how to think"
  • Retrieval answers "what facts to use"

Enterprise AI / LLM System

1. User & Input Layer

    • User Request: The user asks the AI to perform a task such as writing, analyzing, or generating content. This is the starting point of the whole AI flow.
    • Prompt: The detailed instruction given to the AI describing what to do and any constraints, used to control the output precisely.
    • Prompt System: Defines the AI's role, tone, and behavior rules, for example "You are a data engineer; answer in a professional tone."

    2. AI Core Reasoning Layer

    • LLM Reasoning: The model interprets the request, understands intent, and performs logical reasoning. This is the AI's core "thinking" capability.
    • AI Brain: The central decision-making unit that determines how to solve the task step by step, including whether to look up information or call a tool.
    • Planning Module: Breaks complex tasks into structured steps for execution, for example "query the data -> analyze -> summarize".

    3. Knowledge & Information Layer

    • Context: Background information such as conversation history and input data used to understand the request.
    • Retrieval (RAG): Retrieves relevant information from enterprise knowledge bases or documents before generating answers, so the model does not make things up.
    • Enterprise Knowledge: Internal company data such as documents, databases, and policies used as trusted sources.
    • Memory & State: Tracks conversation history, task progress, and user context over time.

    4. Tools & Execution Layer

    • Tool Calling: Allows the AI to interact with external systems such as APIs, databases, or enterprise tools.
    • Enterprise Actions: Enables AI to perform real-world actions such as writing data, sending emails, or triggering workflows.
    • Enterprise Workflow: Structured business processes where multiple steps are automated and orchestrated by AI, such as approval flows or data-processing pipelines.
    • Enterprise Operations: System-level operations including monitoring, scheduling, and resource management.

    5. Safety, Quality & Governance Layer

    • Guardrails: Rules that enforce safety and compliance and prevent harmful or invalid outputs, sensitive-data leaks, and erroneous actions.
    • Evaluation: Measures output quality and correctness, and detects hallucinations.
    • Observability: Monitors system performance, latency, errors, and overall AI pipeline health.

    Overall System Summary

    The user request triggers the system, the Prompt System defines behavior, the LLM performs reasoning, Planning breaks tasks into steps, RAG retrieves knowledge, Tools execute actions, and Governance ensures safety and quality.

    Agent Harness and its 12 Core Modules

    To date, AI Engineering has evolved through three primary stages: Prompt Engineering, Context Engineering, and Harness Engineering.

    Prompt Engineering:

    Core Question: Did the model understand what you were saying?

    • Perfecting Instructions: Transforming vague requests into precise, step-by-step commands to eliminate ambiguity.
    • Persona-Setting: Assigning a specific identity to the model, such as "Senior Data Architect," to calibrate its professional depth and tone.
    • Formatting: Mandating outputs in specific structures like JSON or SQL to ensure they are machine-readable and ready for downstream systems.

    Context Engineering:

    Core Question: Does the model have enough information, and is that information correct?

    • Reducing Hallucination: Grounding the model's responses in private enterprise data so that it is less likely to "hallucinate" or invent facts.
    • Retrieval-Augmented Generation (RAG): Using vector databases to provide the model with relevant, up-to-date document snippets during the generation process.
    • Knowledge Management: Organizing enterprise data so the model understands the relationships between different business entities.

    Harness Engineering:

    Core Question: Can the model consistently execute correctly in a real-world environment?

    • Reliability Evaluation: Building automated testing frameworks to verify that the model remains stable and accurate across thousands of requests.
    • Tool-Calling Verification: Ensuring the API calls or database queries generated by the model are syntactically correct and safe to execute.
    • Operational Monitoring (LLMOps): Tracking AI performance, latency, and drift in production, similar to how we monitor traditional data pipelines.

    What is Agent Harness

    An Agent Harness is an orchestration framework or runtime environment that connects, manages, and controls the components needed to run an autonomous AI agent (the LLM, tools and APIs, memory, context windows, parsing logic, and safety guardrails), allowing the agent to execute complex, multi-step tasks reliably.

    Agent = Model + Harness

    The landscape of AI Agent architecture is currently in a "pre-standardization" phase, very similar to the early days of cloud computing between 2008 and 2012. Everyone is inventing terminology as they go, and there is no simple, standard, or unified definition yet. Interpretations vary widely depending on an individual's standpoint, interests, and areas of focus. Don't focus too much on memorizing exact module names; focus on understanding the functional responsibilities instead. For that reason, I emphasize the operational AI platform side: Retrieval, Tool Calling, Workflow, Evaluation, and Observability, because these are the core of enterprise AI-ready platforms today. I focus on:

    • RAG
    • Vector DB
    • Tool Calling
    • Workflow
    • Evaluation
    • Observability

    Agent Harness 12 Core Modules

    | Step | Category | Module | Explanation |
    | --- | --- | --- | --- |
    | 1 | AI Brain | Prompt System | Defines the AI's role, goals, instructions, constraints, and response behavior so the model knows what it should do. |
    | 2 | AI Brain | LLM Reasoning | The LLM understands the user request, performs reasoning, generates ideas, and decides how to respond. |
    | 3 | Enterprise Knowledge | Context Management | Selects, filters, compresses, and organizes the most relevant context within token limits so the AI has the right information. |
    | 4 | Enterprise Knowledge | Memory | Loads conversation history, user preferences, and long-term memory so the AI can maintain continuity and personalization. |
    | 5 | Enterprise Knowledge | Retrieval / RAG Pipeline / Knowledge Base | Searches enterprise documents, databases, and knowledge sources to retrieve external information the LLM does not already know. |
    | 6 | Enterprise Actions | Planning / Agent Loop | Breaks large or complex tasks into smaller executable steps and determines the execution strategy. |
    | 7 | Enterprise Actions | Tool Calling | Allows the AI to call APIs, SQL, Python, enterprise systems, search engines, or external applications to perform real actions. |
    | 8 | Enterprise Actions | State Management | Tracks execution progress, workflow status, retries, temporary variables, and current task state during runtime. |
    | 9 | Enterprise Workflow | Orchestration / Multi-Agent Orchestration | Coordinates multiple tools, workflows, agents, and execution paths so the overall system works together correctly. |
    | 10 | Enterprise Workflow | Evaluation | Evaluates answer quality, correctness, relevance, task completion, and hallucination risk before returning results. |
    | 11 | Enterprise Operations | Guardrails | Enforces security rules, permissions, compliance policies, risk controls, and safe AI behavior. |
    | 12 | Enterprise Operations | Observability | Monitors logs, traces, token usage, latency, failures, and overall AI system health for debugging and operations. |

    The Core Logic of Enterprise AI

    | Category | Purpose |
    | --- | --- |
    | AI Brain | Makes the AI understand and reason |
    | Enterprise Knowledge | Gives AI the right enterprise information |
    | Enterprise Actions | Allows AI to perform actual work |
    | Enterprise Workflow | Coordinates complex execution flows |
    | Enterprise Operations | Keeps the AI system safe, stable, and observable |

    Simple Enterprise Agent Harness Architecture

    User
    ↓
    Prompt System
    ↓
    LLM (GPT/Claude/Gemini)
    ↓
    Planning Engine
    ↓
    Tool Calling / Retrieval
    ↓
    Enterprise Systems
    (SQL / API / SharePoint / Databricks)
    ↓
    Memory + State Tracking
    ↓
    Guardrails + Evaluation
    ↓
    Monitoring / Observability
    ↓
    Final AI Response

    • The user asks the AI to perform a task.
    • The Prompt System defines the AI's role and behavior.
    • The LLM understands the request and reasons about it.
    • The Planning module breaks the task into steps.
    • Tool Calling lets the AI access databases, APIs, or enterprise systems.
    • Retrieval searches enterprise documents and knowledge bases.
    • Memory and State track progress and conversation history.
    • Guardrails enforce security and compliance rules.
    • Evaluation checks answer quality and hallucinations.
    • Observability monitors the entire AI workflow and system health.
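
To show how these pieces might fit together, here is a deliberately tiny harness sketch. Everything in it is an assumption for illustration: `fake_llm_plan` stands in for the LLM and planning step, `TOOLS` is a made-up tool registry, and the guardrail is a simple allowlist check. A real harness would call an actual model and add retrieval, memory, and evaluation.

```python
import time

def lookup_order(order_id):
    """Stand-in for a real enterprise API call."""
    return {"order_id": order_id, "status": "shipped"}

# Tool registry: the only actions the agent is allowed to take (guardrail allowlist).
TOOLS = {"lookup_order": lookup_order}

def fake_llm_plan(user_request):
    """Pretend LLM + planner: turns a request into a list of tool calls."""
    if "order" in user_request.lower():
        return [{"tool": "lookup_order", "args": {"order_id": "A-100"}}]
    return [{"tool": "delete_database", "args": {}}]  # an unsafe plan, for the demo

def run_agent(user_request):
    """Run each planned step, blocking unknown tools and logging what happened."""
    state = {"request": user_request, "results": [], "log": []}  # state tracking
    for step in fake_llm_plan(user_request):
        started = time.perf_counter()
        tool = TOOLS.get(step["tool"])
        if tool is None:  # guardrail: refuse anything outside the allowlist
            state["log"].append({"tool": step["tool"], "status": "blocked"})
            continue
        state["results"].append(tool(**step["args"]))
        elapsed = time.perf_counter() - started
        # observability: record what ran and how long it took
        state["log"].append({"tool": step["tool"], "status": "ok", "seconds": elapsed})
    return state

print(run_agent("Where is my order?")["results"])
print(run_agent("Drop everything")["log"])
```

The model only proposes actions; the harness decides whether they run, which is the sense of "Agent = Model + Harness".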

    Summary of AI, ML, LLM

    AI (Artificial Intelligence) contains ML (Machine Learning), which in turn contains LLMs (Large Language Models), which focus on language, like nested Russian dolls.

    | Term | One-line explanation |
    | --- | --- |
    | AI (Artificial Intelligence) | The broad field of making computers behave intelligently like humans. |
    | ML (Machine Learning) | A subset of AI where computers learn patterns from data instead of hard-coded rules. |
    | LLM (Large Language Model) | A type of ML model trained on huge amounts of text to understand and generate language. |

    Frequently Used GenAI & LLM Concepts (Simple Explanations)

    Agent

    An AI Agent is a system that can autonomously break down tasks, make decisions, and execute actions using tools and reasoning.

    Agentic Workflow

    An agentic workflow is a multi-step autonomous process where an AI system completes tasks without continuous human intervention.
    In other words: the AI completes the entire process on its own.
    e.g.
    application -> validation -> calculation -> output results
    with no human intervention along the way.

    Chunking

    Splitting big documents into small pieces so the AI can handle them better.

    e.g. A PDF file has 200 pages and the AI cannot read it all at once, so the file is split into many small pieces (chunks):
    1st chunk: words 1 to 500;
    2nd chunk: words 501 to 1000;
    3rd chunk: words 1001 to 1500;
    …
    Each chunk is then converted into an embedding.
    Chunking is commonly used in RAG systems to prepare documents for embedding and retrieval.
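
A minimal sketch of the fixed-size word chunking described above. The 500-word size follows the example; the `overlap` parameter is my own addition (a common practice so that text cut at a boundary still appears whole in one chunk), not something the text prescribes.

```python
def chunk_words(text, chunk_size=500, overlap=0):
    """Split text into chunks of at most chunk_size words, optionally overlapping."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the document
    return chunks

document = " ".join(f"word{i}" for i in range(1, 1201))  # a fake 1,200-word document
chunks = chunk_words(document)
print(len(chunks), [len(c.split()) for c in chunks])
```

Real pipelines often split on sentence or paragraph boundaries instead of raw word counts, so that a chunk does not end mid-sentence.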

    Cosine Similarity

    Cosine similarity measures how similar two vectors are by comparing their direction in vector space; for embeddings, a similar direction means a similar meaning.
    In other words: a way to measure how close two pieces of meaning are.
    e.g.
    apple vs banana: very similar (both are fruit).
    apple vs car: not similar.
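
Cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = (a · b) / (|a| |b|). The sketch below computes it for three invented toy vectors; the numbers are assumptions chosen so that "apple" and "banana" point in a similar direction, not real embedding values.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Invented toy embeddings; real embeddings have hundreds or thousands of dimensions.
apple = [0.9, 0.8, 0.1]
banana = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

print(f"apple vs banana: {cosine_similarity(apple, banana):.2f}")
print(f"apple vs car:    {cosine_similarity(apple, car):.2f}")
```

Because only direction matters, scaling a vector by a positive constant does not change its cosine similarity to other vectors.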

    Context Window

    The context window is the maximum amount of text, measured in tokens, that an LLM can process at once (the prompt plus the generated output).
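
When a conversation grows past the context window, older turns have to be dropped or summarized. The sketch below keeps the most recent messages that fit a budget. It approximates token counts by counting whitespace-separated words, which is an assumption for simplicity; real systems count with the model's own tokenizer.

```python
def estimate_tokens(text):
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(messages, max_tokens):
    """Keep the newest messages whose combined estimated size fits max_tokens."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

conversation = [
    {"role": "user", "content": "Tell me about Toronto"},
    {"role": "assistant", "content": "Toronto is the largest city in Canada"},
    {"role": "user", "content": "What about its weather"},
]
print(fit_to_window(conversation, max_tokens=12))
```

With a budget of 12 estimated tokens, the oldest message is dropped and the two most recent ones are kept.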

    Embedding

    Embeddings convert text into numerical vectors that represent meaning.

    "apple" becomes [0.12, -0.98, 0.33, …]
    "orange" becomes a nearby vector such as [0.12, -0.98, 0.456, …],
    so the AI finds that
    apple ≈ fruit,
    apple != car
    In other words, "apple" is similar to a fruit and not similar to a car. Similar meanings result in closer vector distances, allowing machines to compare semantic similarity instead of exact words.

    Fine-tuning

    Fine-tuning is the process of further training a pre-trained model on domain-specific data to improve performance in a specialized area.

    Hallucination

    Hallucination occurs when an LLM generates incorrect or fabricated information while sounding confident.

    LangChain

    LangChain is a framework for building applications powered by LLMs by connecting models with tools, APIs, and data sources.
    In short, it chains together AI models, data, and tools.
    In other words: "A tool to connect LLMs, data, and tools into applications."

    LangGraph

    LangGraph is a framework for building stateful, graph-based AI workflows where agents can loop, branch, and maintain memory across steps.
    In other words: a workflow system that lets AI follow multi-step flows with loops and decisions.
    e.g. a SQL agent:
    Write SQL script -> Execute -> Error Alert -> Fix -> Re-try
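
The write, execute, fix, retry loop can be sketched in plain Python without any framework. This is not LangGraph code: the `propose_fix` function is a hard-coded stand-in for the LLM step that would rewrite the query, and the table and queries are invented for the demo.

```python
import sqlite3

def propose_fix(sql, error_message):
    """Stand-in for an LLM repair step: here it only corrects one known typo."""
    return sql.replace("FORM", "FROM")

def run_with_retries(connection, sql, max_attempts=3):
    """Execute sql; on failure, ask for a fix and try again, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            rows = connection.execute(sql).fetchall()
            return {"rows": rows, "attempts": attempt, "sql": sql}
        except sqlite3.Error as error:  # the "Error Alert" step
            print(f"attempt {attempt} failed: {error}")
            sql = propose_fix(sql, str(error))  # the "Fix" step
    raise RuntimeError(f"gave up after {max_attempts} attempts")

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER, total REAL)")
connection.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

result = run_with_retries(connection, "SELECT COUNT(*) FORM orders")
print(result)
```

A graph framework expresses the same loop as nodes and edges with shared state; the attempt limit here plays the role of a stop condition so the agent cannot retry forever.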

    LLM

    Large Language Model. The AI brain that can understand and generate language.

    e.g. a user asks the AI "please write an email", and it outputs a complete email.
    Companies use it to generate reports, analyze data, reply to clients automatically, and so on.

    MCP

    MCP (Model Context Protocol) defines a standardized way for LLMs to interact with external tools, APIs, and data systems.
    In other words: a standard way for AI to use tools and data systems.

    Model Drift

    Model drift occurs when a deployed model's performance degrades due to changes in real-world data over time.
    In other words: after the AI was put into use, it started to make more mistakes.
    Why does this happen?
    Often the model was trained on old data, and the real-world data has since changed.

    Prompt

    A prompt is the instruction given to an LLM.
    Well-designed prompts significantly improve the quality and accuracy of model outputs.
    e.g.
    bad prompt: "write a letter" (unclear: what kind of letter? A thank-you letter? A complaint letter?)
    good prompt: "Please write a thank-you letter to Mary, who gave me a gift."

    Prompt Engineering

    Prompt engineering is the practice of designing effective prompts to guide LLM behavior and improve output quality.
    In other words: designing better instructions to improve AI responses.

    RAG

    Retrieval-Augmented Generation combines retrieval and generation.
    The system first retrieves relevant documents, then uses an LLM to generate an answer based on that information.
    e.g. look up HR documents -> pass the documents to GPT -> GPT summarizes them and answers the question.

    Retrieval

    Retrieval is the process of searching a knowledge base or vector database to find relevant information before generating an answer.
    e.g.
    For the question "What is the return policy?", the AI system searches the vector DB, finds the policy document, and passes it to GPT, which then answers the question.

    Token

    Tokens are the smallest units of text that an LLM processes.

    e.g. before the model can process a sentence like "I love Toronto", a tokenizer splits it into smaller pieces:

    • I
    • love
    • Toronto

    These are tokens. In practice, tokenizers often split words into sub-word pieces, so a long or rare word may become several tokens.
    Token count also determines cost and context limits in LLM systems.
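
A toy illustration of splitting text into tokens and estimating cost from the count. Splitting on whitespace is a simplification (real tokenizers use sub-word schemes), and the price constant is a made-up number, not any provider's actual rate.

```python
def toy_tokenize(text):
    """Whitespace split; real tokenizers produce sub-word tokens instead."""
    return text.split()

# Hypothetical price, for illustration only.
PRICE_PER_1K_TOKENS = 0.50

def estimate_cost(text, price_per_1k=PRICE_PER_1K_TOKENS):
    """Estimate cost as token_count / 1000 * price_per_1k."""
    return len(toy_tokenize(text)) / 1000 * price_per_1k

tokens = toy_tokenize("I love Toronto")
print(tokens, len(tokens))
print(estimate_cost("I love Toronto " * 1000))
```

Because billing and context limits are both counted in tokens, shortening a prompt reduces cost and leaves more room for the answer.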

    Tool Calling

    Tool calling allows an LLM to request external functions such as APIs, database queries, or code, which the surrounding application executes to perform real-world actions.
    In other words, the AI can "take action", e.g.
    > search for an order
    > search a database

    Vector DB

    A database that stores meaning-based vectors for similarity search. It allows AI systems to retrieve semantically relevant documents instead of relying on keyword-based search.
    e.g.
    There are 10,000 files:
    HR policy,
    IT manual,
    Finance report,
    …
    All of those files are converted into embeddings and saved in the vector DB. When a user asks a question, the AI does not search by keywords; it matches by meaning.

    Vector Search

    Vector search retrieves results based on semantic similarity rather than keyword matching.
    In other words: searching by meaning instead of exact words.
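
A minimal in-memory sketch of vector search: store (text, vector) pairs, then return the entries whose vectors are closest to the query vector. The three-dimensional vectors are invented; in a real system they would come from an embedding model and live in a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Invented document vectors (dimensions loosely: HR, IT, finance).
VECTOR_STORE = [
    ("HR policy: vacation and leave rules", [0.9, 0.1, 0.1]),
    ("IT manual: resetting your password", [0.1, 0.9, 0.1]),
    ("Finance report: quarterly revenue", [0.1, 0.1, 0.9]),
]

def vector_search(query_vector, store, top_k=1):
    """Return the top_k (score, text) pairs ranked by cosine similarity."""
    scored = [(cosine(query_vector, vector), text) for text, vector in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Pretend this vector is the embedding of "How many days off do I get?"
query = [0.8, 0.2, 0.1]
for score, text in vector_search(query, VECTOR_STORE, top_k=2):
    print(f"{score:.2f}  {text}")
```

Note that the query shares no keywords with the HR document; it ranks first only because the vectors point in a similar direction. Real vector databases use approximate nearest-neighbor indexes rather than scoring every entry.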

    Comparison of Fabric, Azure Databricks and Synapse Analytics

    Microsoft Fabric vs Databricks vs Synapse

    Microsoft Fabric is an all-in-one SaaS analytics platform with integrated BI.
    Databricks is a Spark-based platform mainly used for large-scale data engineering and machine learning.
    Synapse is an enterprise analytics service combining SQL data warehousing and big data processing.

    | Platform | Description |
    | --- | --- |
    | Microsoft Fabric | An all-in-one SaaS data platform that integrates data engineering, data science, warehousing, real-time analytics, and BI. |
    | Azure Databricks | A Spark-based analytics and AI platform optimized for large-scale data engineering and machine learning. |
    | Azure Synapse Analytics | An analytics service combining data warehousing and big data analytics. |

    Architecture

    1. Microsoft Fabric: Fully integrated SaaS platform built around OneLake.
      single data lake, unified workspace, built-in Power BI
    2. Databricks: Spark-native architecture optimized for big data processing.
      Delta Lake, Spark clusters, ML workloads
    3. Synapse: Hybrid analytics platform integrating SQL data warehouse and big data tools.

    Main Use Cases

    | Platform | Best For |
    | --- | --- |
    | Fabric | End-to-end analytics platform |
    | Databricks | Advanced data engineering & ML |
    | Synapse | Enterprise data warehouse |