
Python AI Stack 2024: A Roundup of Essential Tools and Libraries

2024 saw an explosion of AI tooling in Python. This article rounds up the essential libraries and tools every AI developer should know, from LLM orchestration to vector databases.


LLM Orchestration Frameworks

LangChain

The most popular framework for building LLM applications:

from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# Basic usage
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant specialized in {topic}"),
    ("user", "{question}")
])

chain = prompt | llm
result = chain.invoke({"topic": "Python", "question": "Explain decorators"})

# With memory (attach to a chain that supports it, e.g. ConversationChain)
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()

# Agents with tools
from langchain import hub
from langchain.agents import create_react_agent
from langchain_community.tools import DuckDuckGoSearchRun

tools = [DuckDuckGoSearchRun()]
# create_react_agent expects a ReAct-style prompt (with an agent_scratchpad
# placeholder), not the chat prompt defined above
react_prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, react_prompt)

When to use: when you need flexibility, many integrations, or complex chains

LlamaIndex

Optimized for RAG (Retrieval-Augmented Generation):

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Load documents
documents = SimpleDirectoryReader("data/").load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is the company's refund policy?")
print(response)

# Advanced: Custom LLM
llm = OpenAI(model="gpt-4-turbo", temperature=0)
query_engine = index.as_query_engine(llm=llm)

When to use: document Q&A, knowledge bases, RAG systems
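
Note that the snippet above rebuilds the index, re-embedding every document, on each run. A small sketch of persisting the index to disk and loading it back, using LlamaIndex's default local storage:

# Persist the index so documents aren't re-embedded on every run
index.storage_context.persist(persist_dir="./storage")

# Later: reload it instead of rebuilding
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)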

Haystack

Production-ready NLP framework:

from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

document_store = InMemoryDocumentStore()  # write your Documents into this first

template = "Context: {% for doc in documents %}{{ doc.content }} {% endfor %}\nQuestion: {{ query }}"

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("generator", OpenAIGenerator())
# The retriever outputs documents while the generator expects a prompt
# string, so a PromptBuilder sits between them
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "generator.prompt")

query = "What is RAG?"
result = pipe.run({"retriever": {"query": query}, "prompt_builder": {"query": query}})

Vector Databases

ChromaDB

import chromadb
from chromadb.utils import embedding_functions

# Create client
client = chromadb.Client()

# Create a collection with OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="documents",
    embedding_function=openai_ef
)

# Add documents
collection.add(
    documents=["Document 1 content", "Document 2 content"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["doc1", "doc2"]
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=3
)
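
Note that chromadb.Client() is purely in-memory, so data disappears when the process exits. For anything you want to keep, use the persistent client instead; a short sketch:

# Persistent client: data is stored on disk and survives restarts
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    name="documents",
    embedding_function=openai_ef
)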

Pinecone

from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
index = pc.Index("my-index")

# Upsert
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"text": "..."}}
])

# Query
results = index.query(vector=[0.1, 0.2, ...], top_k=3, include_metadata=True)
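
The snippet above assumes the index already exists; a sketch of creating a serverless index first (the cloud and region values are placeholders, and dimension must match your embedding model's output size, e.g. 1536 for text-embedding-3-small):

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-key")
pc.create_index(
    name="my-index",
    dimension=1536,  # must match the embedding model's output size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)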

Vector DB Comparison

Database   Type                 Best for                     Pricing
ChromaDB   Embedded             Prototyping, small projects  Free
Pinecone   Managed              Production, scale            $$$
Weaviate   Self-hosted/Managed  Flexibility                  Free/$$
Qdrant     Self-hosted          Performance                  Free
pgvector   Postgres extension   Existing Postgres            Free
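
pgvector is worth highlighting if you already run Postgres, since it needs no extra infrastructure. A minimal sketch using psycopg (the table and the 3-dimensional toy vectors are just for illustration):

import psycopg

with psycopg.connect("dbname=mydb") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("CREATE TABLE IF NOT EXISTS items (id serial PRIMARY KEY, embedding vector(3))")
    conn.execute("INSERT INTO items (embedding) VALUES ('[0.1, 0.2, 0.3]')")
    # Nearest-neighbor search: <-> is pgvector's L2 distance operator
    row = conn.execute(
        "SELECT id FROM items ORDER BY embedding <-> '[0.1, 0.2, 0.25]' LIMIT 1"
    ).fetchone()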

Model Inference

Transformers (Hugging Face)

from transformers import pipeline

# Text generation
# Note: Llama 3 is a gated model; request access on Hugging Face first
generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")
result = generator("Hello, I am", max_new_tokens=50)

# Embeddings
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Hello world", "How are you"])

# Classification
classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")

vLLM – Fast Inference Server

# Start server
vllm serve meta-llama/Meta-Llama-3-8B --port 8000

# Client
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="meta-llama/Llama-3-8B",
    messages=[{"role": "user", "content": "Hello!"}]
)
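
Besides the OpenAI-compatible server, vLLM can also be used directly as a library for offline batch inference; a short sketch:

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B")
params = SamplingParams(temperature=0.8, max_tokens=50)
outputs = llm.generate(["Hello, I am", "The capital of France is"], params)
for output in outputs:
    print(output.outputs[0].text)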

Ollama – Local LLMs

# Install & run
ollama run llama3

# Python client
import ollama
response = ollama.chat(model='llama3', messages=[
    {'role': 'user', 'content': 'Hello!'}
])
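
For chat-style UIs you usually want tokens as they arrive rather than the full response at once; the Python client supports this with stream=True:

# Stream the response chunk by chunk
for chunk in ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Hello!'}],
    stream=True,
):
    print(chunk['message']['content'], end='', flush=True)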

Observability & Monitoring

LangSmith

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-key"

# LangChain calls are now traced automatically

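Tracing is not limited to LangChain internals; the langsmith package also exposes a traceable decorator for instrumenting your own functions. A minimal sketch:

from langsmith import traceable
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4-turbo")

@traceable  # each call to this function is recorded as a run in LangSmith
def summarize(text: str) -> str:
    return llm.invoke(f"Summarize in one sentence: {text}").content
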
Weights & Biases

import wandb
wandb.init(project="llm-experiment")

# Log metrics
wandb.log({"accuracy": 0.95, "loss": 0.05})

# Log LLM calls
wandb.log({"prompt": prompt, "response": response, "tokens": token_count})

MLflow

import mlflow

with mlflow.start_run():
    mlflow.log_param("model", "gpt-4")
    mlflow.log_metric("latency", 1.5)
    mlflow.log_artifact("model_config.json")

Data Processing

unstructured – Document parsing

from unstructured.partition.pdf import partition_pdf
# Parallel partitioners exist for other formats, e.g.:
# from unstructured.partition.docx import partition_docx

elements = partition_pdf("document.pdf")
for element in elements:
    print(type(element).__name__, element.text[:100])
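
Raw elements are usually merged into retrieval-sized chunks before embedding; unstructured ships chunking helpers for this, sketched here with chunk_by_title:

from unstructured.chunking.title import chunk_by_title

# Group elements into section-based chunks suitable for embedding
chunks = chunk_by_title(elements, max_characters=1000)
for chunk in chunks:
    print(len(chunk.text))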

tiktoken – Token counting

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
tokens = enc.encode("Hello, world!")
print(f"Token count: {len(tokens)}")  # Token count: 4

Development Tools

Instructor – Structured outputs

import instructor
from pydantic import BaseModel
from openai import OpenAI

client = instructor.patch(OpenAI())

class User(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4-turbo",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 30, email john@example.com"}]
)
# user.name = "John", user.age = 30, user.email = "john@example.com"

Guidance – Constrained generation

from guidance import models, select

# Load a local model (the model name here is just an example)
lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")

# Generation is constrained to one of the listed options
lm += "The answer is " + select(['yes', 'no', 'maybe'], name='answer')
print(lm['answer'])

Recommended Stacks for 2024

Beginner Stack

  • OpenAI API + LangChain
  • ChromaDB (vector store)
  • Streamlit (UI)

Production Stack

  • LangChain + LlamaIndex
  • Pinecone or Qdrant
  • LangSmith (monitoring)
  • FastAPI (backend)

Cost-optimized Stack

  • Ollama + Llama 3 (local)
  • ChromaDB (embedded)
  • Gradio (UI)

Fullstack Station Tips

The Python AI stack matured considerably in 2024. My advice:

  • Start with LangChain: there is a learning curve, but it is worth the investment
  • Prototype with ChromaDB: then migrate to a production DB
  • Run local models with Ollama: test for free before paying for an API
  • Monitor from day one: LangSmith or custom logging
  • Use Instructor for structured output: very useful for data extraction

