dev.to · March 7, 2026


5 AI Portfolio Projects That Actually Get You Hired in 2026

Tags: python, machine-learning, retrieval-augmentation, ai, portfolio, langchain



Your AI portfolio has a chatbot, a sentiment analyzer, and a fine-tuned model on Hugging Face. So does every other candidate's. Hiring managers in 2026 scan for production signals. They want to see how you handle failures, structure data, connect systems, and ship working software. A Jupyter notebook with `model.predict()` tells them nothing about how you build real systems.

These 5 projects are different. Each one teaches a specific production skill that AI engineering teams actually need. They're ordered from simplest to most complex. Build all 5 and you'll have a portfolio that demonstrates retrieval, structured outputs, autonomous agents, evaluation, and deployment — the exact skills showing up in job descriptions right now.

## Project 1: A RAG Pipeline Grounded in Real Documents

**What it proves:** You can connect an LLM to real data, not just training data.

RAG (Retrieval-Augmented Generation) is the most in-demand AI engineering skill in 2026. Every company building with LLMs needs someone who can ground model outputs in actual documents.

```python
# pip install langchain-chroma langchain-openai langchain-community
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load and chunk documents
loader = PyPDFLoader("company_handbook.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(docs)

# Store in a persistent vector database
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# Query with context
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
results = retriever.invoke("What is the PTO policy?")

llm = ChatOpenAI(model="gpt-4o-mini")
context = "\n\n".join([doc.page_content for doc in results])
answer = llm.invoke(
    f"Answer based on this context only:\n{context}\n\nQuestion: What is the PTO policy?"
)
print(answer.content)
```

**What makes this portfolio-worthy:** Don't stop at the basic pipeline. Add these three features that separate you from tutorials:

- **Chunk quality scoring** — Log which chunks the retriever returns. Are they relevant? Add a relevance filter.
- **Source attribution** — Show the user exactly which document page the answer came from.
- **Failure handling** — What happens when no relevant chunks exist? Return "I don't have enough information" instead of hallucinating.

**Hiring manager signal:** "This candidate understands that RAG isn't just `similarity_search()`. They handle edge cases."

## Project 2: A Structured Output Extractor

**What it proves:** You can get reliable, typed outputs from LLMs — not raw text.

Every production AI system needs structured outputs. Parsing free text with regex is fragile. Modern LLMs can return validated Pydantic objects directly.

```python
# pip install langchain-openai pydantic
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


class JobPosting(BaseModel):
    """Structured representation of a job posting."""

    title: str = Field(description="Job title")
    company: str = Field(description="Company name")
    salary_min: int | None = Field(
        default=None, description="Minimum salary in USD"
    )
    salary_max: int | None = Field(
        default=None, description="Maximum salary in USD"
    )
    remote: bool = Field(description="Whether the role is remote")
    required_skills: list[str] = Field(
        description="List of required technical skills"
    )


llm = ChatOpenAI(model="gpt-4o-mini")
structured_llm = llm.with_structured_output(JobPosting)

raw_text = """
Senior ML Engineer at DataCorp. $180k-$220k. Fully remote.
Must know Python, PyTorch, distributed training, and Kubernetes.
"""

result = structured_llm.invoke(
    f"Extract the job posting details:\n{raw_text}"
)
print(result.title)            # "Senior ML Engineer"
print(result.salary_min)       # 180000
print(result.required_skills)  # ["Python", "PyTorch", ...]
```

**What makes this portfolio-worthy:** Build a batch processor that extracts structured data from 100+ job postings.
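The concurrent core of that batch processor can be sketched without any API calls. This is a minimal, offline-runnable sketch: `extract_batch` and `fake_extract` are illustrative names, not library APIs, and in the real version `extract` would wrap something like `structured_llm.ainvoke`:

```python
import asyncio


async def extract_batch(texts, extract, max_concurrency=8):
    """Run `extract` over many postings concurrently, capping parallelism."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(text):
        async with sem:
            try:
                return await extract(text)
            except Exception as exc:
                # One bad posting should not sink the whole batch
                return exc

    return await asyncio.gather(*(guarded(t) for t in texts))


# Stand-in for the real LLM extractor so the sketch runs offline
async def fake_extract(text):
    await asyncio.sleep(0)  # simulate network I/O
    if "broken" in text:
        raise ValueError("unparseable posting")
    return {"title": text.split(" at ")[0]}


postings = [
    "Senior ML Engineer at DataCorp",
    "broken posting",
    "Data Engineer at Acme",
]
results = asyncio.run(extract_batch(postings, fake_extract))
print(results)
```

Returning the exception instead of raising keeps one malformed posting from failing the other 99, and gives you a per-item error report to feed into your accuracy metrics.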
Show:

- **Validation logic** — What happens when the LLM returns `salary_min` as a string? Pydantic catches it.
- **Batch efficiency** — Process multiple postings concurrently with `asyncio.gather()`.
- **Accuracy metrics** — Compare extracted data against manually labeled samples. Report precision and recall.

**Hiring manager signal:** "This candidate knows that LLM outputs are unreliable by default. They validate and measure."

## Project 3: A Tool-Calling Agent

**What it proves:** You can build autonomous systems that take actions, not just generate text.

Agents are the fastest-growing category in AI engineering. A tool-calling agent decides which function to call based on user input — and handles the result.

```python
# pip install langgraph langchain-openai
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent


@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # In production: call a real weather API
    weather_data = {
        "London": "15°C, cloudy",
        "Tokyo": "22°C, sunny",
        "New York": "8°C, rain",
    }
    return weather_data.get(city, f"No data for {city}")


@tool
def convert_temperature(celsius: float) -> str:
    """Convert Celsius to Fahrenheit."""
    fahrenheit = (celsius * 9 / 5) + 32
    return f"{celsius}°C = {fahrenheit}°F"


llm = ChatOpenAI(model="gpt-4o-mini")
tools = [get_weather, convert_temperature]
agent = create_react_agent(llm, tools)

# The agent decides which tools to call
result = agent.invoke(
    {"messages": [{"role": "user", "content": "What's the weather in Tokyo? Convert it to Fahrenheit."}]}
)
for message in result["messages"]:
    print(f"{message.type}: {message.content}")
```

**What makes this portfolio-worthy:** Add complexity that mirrors real production agents:

- **Multi-step reasoning** — Give it a task that requires 3+ tool calls in sequence.
- **Error recovery** — What happens when a tool call fails? Add retry logic with exponential backoff.
- **Conversation memory** — Use `MemorySaver` from LangGraph so the agent remembers previous interactions.
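For the error-recovery point, one common shape is a thin wrapper with exponential backoff around the flaky call. A runnable sketch under assumed names (`with_retry` and `flaky_weather_api` are illustrative, not LangChain APIs):

```python
import time


def with_retry(fn, max_attempts=3, base_delay=0.1):
    """Retry a flaky callable, doubling the delay between attempts."""
    def wrapped(*args, **kwargs):
        for attempt in range(max_attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the error to the agent
                time.sleep(base_delay * (2 ** attempt))
    return wrapped


calls = {"n": 0}


def flaky_weather_api(city: str) -> str:
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream timeout")
    return f"15°C, cloudy in {city}"


safe_get_weather = with_retry(flaky_weather_api)
report = safe_get_weather("London")
print(report)  # succeeds on the third attempt
```

Inside a `@tool` function you would call the wrapped version, so the agent sees either a clean result or one final, honest error instead of a stack trace on the first transient failure.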
For conversation memory, LangGraph's `MemorySaver` checkpointer makes the agent stateful:

```python
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
agent = create_react_agent(llm, tools, checkpointer=memory)

# First message
config = {"configurable": {"thread_id": "user-123"}}
agent.invoke(
    {"messages": [{"role": "user", "content": "What's the weather in London?"}]},
    config=config,
)

# Follow-up — agent remembers the context
agent.invoke(
    {"messages": [{"role": "user", "content": "Convert that to Fahrenheit"}]},
    config=config,
)
```

**Hiring manager signal:** "This candidate builds agents that recover from failures and maintain state. Not a one-shot demo."

## Project 4: An Evaluation Suite for LLM Outputs

**What it proves:** You can measure whether your AI system actually works.

Most AI engineers ship without evaluation. The ones who get hired know how to write tests for non-deterministic systems. DeepEval integrates with pytest to make this practical.

```python
# pip install deepeval
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import (
    AnswerRelevancyMetric,
    HallucinationMetric,
)


def test_rag_answer_relevancy():
    """Test that RAG answers are relevant to the question."""
    test_case = LLMTestCase(
        input="What is the refund policy?",
        actual_output="Our refund policy allows returns within 30 days of purchase with a valid receipt.",
        retrieval_context=[
            "Refund Policy: Customers may return items within 30 days of purchase. A valid receipt is required.",
        ],
    )
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])


def test_no_hallucination():
    """Test that the model doesn't hallucinate beyond the context."""
    test_case = LLMTestCase(
        input="What is the refund policy?",
        actual_output="Our refund policy allows returns within 30 days. We also offer free shipping on all orders.",
        context=[  # HallucinationMetric evaluates against `context`
            "Refund Policy: Customers may return items within 30 days of purchase.",
        ],
    )
    metric = HallucinationMetric(threshold=0.5)
    # This should FAIL — "free shipping" is hallucinated
    assert_test(test_case, [metric])
```

Run it like any other test:

```shell
deepeval test run test_evaluation.py
```

**What makes this portfolio-worthy:**

- **Test your own Project 1** — Write evaluation tests for your RAG pipeline. Measure hallucination rates across 50+ test cases.
- **CI integration** — Add DeepEval to a GitHub Actions workflow. Block merges when the hallucination rate exceeds your threshold.
- **Regression tracking** — Show how your RAG pipeline improved over time. Before: 23% hallucination rate. After tuning chunk size and retriever: 4%.

**Hiring manager signal:** "This candidate doesn't just build AI — they prove it works. They think about reliability."

## Project 5: A Deployed AI API

**What it proves:** You can ship AI as a service, not a notebook.

The gap between "works on my laptop" and "deployed and callable" is where most candidates stop. Bridge it.
```python
# pip install fastapi uvicorn
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="AI Portfolio API")


class QuestionRequest(BaseModel):
    question: str
    document_id: str | None = None


class AnswerResponse(BaseModel):
    answer: str
    sources: list[str]
    confidence: float


@app.post("/ask", response_model=AnswerResponse)
async def ask_question(request: QuestionRequest):
    try:
        # Connect to your RAG pipeline from Project 1
        results = retriever.invoke(request.question)
        if not results:
            raise HTTPException(
                status_code=404,
                detail="No relevant documents found",
            )
        context = "\n\n".join([doc.page_content for doc in results])
        answer = llm.invoke(
            f"Answer based on this context only:\n{context}\n\nQuestion: {request.question}"
        )
        return AnswerResponse(
            answer=answer.content,
            sources=[doc.metadata.get("source", "unknown") for doc in results],
            confidence=0.85,
        )
    except HTTPException:
        raise  # don't let the 404 above get swallowed into a 500
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/health")
async def health_check():
    return {"status": "healthy", "model": "gpt-4o-mini"}
```

**What makes this portfolio-worthy:**

- **Dockerfile** — Containerize the entire stack. One `docker compose up` to run everything.
- **Rate limiting** — Add `slowapi` to prevent abuse. Show you think about production concerns.
- **Monitoring** — Add a `/metrics` endpoint that tracks request count, latency, and error rate.

```dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

**Hiring manager signal:** "This candidate ships. They don't just prototype — they deploy."

## How to Present Your Portfolio

A GitHub repo with code is necessary but not sufficient. Here's what separates hired candidates from ignored ones:

**Write a README that answers three questions.** What does this project do? (One sentence.) How do I run it? (`pip install` + one command.) What did you learn? (The interesting engineering decisions you made.)

**Include a DECISIONS.md file.**
Document why you chose ChromaDB over Pinecone. Why you used gpt-4o-mini instead of gpt-4o. Why your chunk size is 1,000 tokens. Hiring managers want to see your reasoning, not just your code.

**Record a 2-minute Loom video.** Walk through the project running. Show the terminal output. Explain one interesting failure you encountered and how you fixed it. This alone puts you ahead of 90% of applicants.

**Link your projects together.** Project 5 wraps Project 1. Project 4 tests Project 1. Project 3 uses the same patterns as Project 2. A portfolio with connected projects shows systems thinking.

The AI job market in 2026 rewards builders over learners. Hiring managers see hundreds of "completed the LLM course" portfolios every week. They hire the candidates who show production instincts: error handling, evaluation, deployment, and structured thinking.

Build these 5 projects. Connect them. Deploy them. Document your decisions. That's a portfolio that gets callbacks.

Follow @klement_gunndu for more AI engineering content. We're building in public.