dev_to · April 25, 2026


IP-Adapter + LoRA for product catalog rendering — putting shop items on AI characters

Tags: ip-adapter · lora · comfyui · ai-image-generation · stable-diffusion


📦 Runnable workflow: github.com/sm1ck/honeychat/tree/main/tutorial/04-ipadapter — a ComfyUI workflow.json (with placeholders for IP-Adapter weight/end_at) plus a stdlib Python client that posts it to your ComfyUI instance and saves the output.

In the previous post I argued that a LoRA per character is often the strongest fit for visual identity. But what happens when you want to render that character wearing a specific item — a shop product, a user-uploaded outfit, a gift from another user? LoRA helps stabilize the character. To also preserve an arbitrary reference image, IP-Adapter is a common fit.

Those two techniques can compete unless you configure them carefully. LoRA stabilizes the character's face. IP-Adapter pulls features from a reference image. If both are too strong late in sampling, the face can drift toward the reference. The balance: moderate IP-Adapter weight (lower half of 0–1) with early handoff (IP-Adapter releases control before the final denoising steps). The final steps belong to the LoRA. A useful node order: Checkpoint → LoRA → FreeU → IP-Adapter → KSampler. Feeding IP-Adapter into the model conditioning after LoRA lets the LoRA reassert on late steps.

This section walks you from clone to a generated image in under ten minutes.

1. Prereqs

- A running ComfyUI instance (local GPU, rented box, or a friend's)
- ComfyUI_IPAdapter_plus installed in it
- ip-adapter-plus_sdxl_vit-h.safetensors in models/ipadapter/
- CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors in models/clip_vision/
- Your own SDXL base checkpoint
- A character LoRA — if you don't have one, go through the previous article first

2. Clone and install the client

```shell
git clone https://github.com/sm1ck/honeychat
cd honeychat/tutorial/04-ipadapter
pip install -e .
```

3. Put your outfit reference next to the client

Anything flat-lay, clean-background works best. ./my-dress.png for this example.
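Before moving on, it can be worth sanity-checking that the model files from step 1 are where ComfyUI expects them. A minimal sketch (the COMFYUI_DIR variable and helper name are assumptions for illustration, not part of the tutorial):

```python
import os

# Default model locations under a ComfyUI install, taken from the prereq
# list above. COMFYUI_DIR is an assumption; point it at your own install.
COMFYUI_DIR = os.environ.get("COMFYUI_DIR", os.path.expanduser("~/ComfyUI"))

REQUIRED = [
    "models/ipadapter/ip-adapter-plus_sdxl_vit-h.safetensors",
    "models/clip_vision/CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors",
]

def missing_models(base_dir: str) -> list[str]:
    """Return the required model files that are absent under base_dir."""
    return [p for p in REQUIRED if not os.path.isfile(os.path.join(base_dir, p))]

# Example: print(missing_models(COMFYUI_DIR)) before running client.py.
```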
4. Run — start at the middle of both tuning ranges

```shell
export COMFY_URL=http://localhost:8188
export REFERENCE_IMAGE=./my-dress.png
export CHECKPOINT=your-sdxl-base.safetensors
export LORA=your-character-v1.safetensors
export IPADAPTER_WEIGHT=0.4   # lower half of 0–1
export IPADAPTER_END_AT=0.8   # upper half of 0–1
python client.py
```

Output lands in ./out/outfit_preview_.png. The first run should usually show your character wearing something that resembles the reference dress.

5. Tune

Inspect the output. Two failure modes tell you how to adjust:

- Face drifted → lower IPADAPTER_WEIGHT or lower IPADAPTER_END_AT by 0.05 and re-run.
- Item doesn't resemble the reference → raise IPADAPTER_WEIGHT by 0.05, or raise IPADAPTER_END_AT slightly.

Sweep in 0.05 steps, not 0.1. The usable range can be narrower than expected, and a new base model may take several tuning sweeps before the balance feels stable.

6. Validate the workflow JSON with pytest

```shell
pip install -e ".[dev]"
pytest -v
```

Five tests make sure workflow.json stays valid JSON, every node class is still referenced, and placeholders haven't been accidentally committed with real values.

You have a character (Anna) stabilized by a custom LoRA. She appears reasonably consistent across generations. Now the user buys a specific dress in your shop. The dress is a reference image. You want:

- Anna's face — unchanged.
- This specific dress — rendered faithfully on Anna.

Prompt engineering usually can't guarantee this. "Anna wearing a red silk dress with a white collar" generates a red silk dress, not necessarily this red silk dress. SKU-level fidelity needs the reference image in the generation path.

IP-Adapter pulls features from a reference image into the model's cross-attention. If you set it too high, it can preserve the reference image aggressively — including its face, if there is one. Even if the reference is an unworn product shot, IP-Adapter can pull in lighting, backdrop, and styling from the reference photo.
At high weight: Anna's face may start looking more like whoever (or whatever) is in the reference. Lighting and pose can bias toward the reference.

At low weight: The character is fine. The dress is approximately the right color and cut but not recognizable as this dress. Your product catalog becomes decorative rather than accurate.

The two knobs that matter are weight and end_at.

- Weight — the multiplier on IP-Adapter's contribution to cross-attention. Below the lower-middle of the 0–1 range, the reference is a "mood" more than a fact. Above the upper-middle, the reference dominates. Somewhere in the lower half is where you find the range that preserves item identity without killing face identity.
- end_at — the fraction of denoising steps during which IP-Adapter is active. If it runs through all steps, it has a say in the final face details. If it ends earlier (say 70–90% of the way through), the last steps belong to the rest of the pipeline, and LoRA face features reassert. In rough terms: the item gets baked in during the middle of denoising, the face re-sharpens at the end.

[Checkpoint Loader] → [LoRA Loader: character_lora] → [FreeU: quality touch-up] → [IPAdapter Advanced: reference, weight=W, end_at=E] → [KSampler] → [VAE Decode]

Two things about this order:

- LoRA comes before IP-Adapter in the chain. The LoRA modifies the checkpoint weights; IP-Adapter modifies cross-attention during sampling. When IP-Adapter ends at end_at, the remaining steps operate on the LoRA-modified weights without IP-Adapter influence — this is what lets the face reassert.
- FreeU is optional. It's a noise rebalance that improves quality without adding compute.
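To make end_at concrete: it is a fraction of the schedule, so the number of steps IP-Adapter influences is easy to compute. A sketch of the mapping as described above (ComfyUI's internal rounding may differ):

```python
def ipadapter_active_steps(total_steps: int, end_at: float) -> int:
    """Number of denoising steps (from step 0) during which IP-Adapter
    contributes, given end_at as a fraction of the schedule."""
    return round(total_steps * end_at)

# With a 30-step schedule and end_at=0.8, IP-Adapter shapes the first
# 24 steps; the final 6 operate on the LoRA-modified weights alone.
```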
The tutorial client takes the base workflow.json, rewrites the placeholders with env-supplied values, uploads the reference image to ComfyUI, and queues the prompt:

```python
import argparse
import json
import time
from typing import Any


def rewrite_workflow(wf: dict[str, Any], args: argparse.Namespace, ref_filename: str) -> dict[str, Any]:
    """Fill in the `` and `` placeholders with actual values."""
    wf = json.loads(json.dumps(wf))  # deep copy
    if args.checkpoint:
        wf["1"]["inputs"]["ckpt_name"] = args.checkpoint
    if args.lora:
        wf["2"]["inputs"]["lora_name"] = args.lora
        wf["2"]["inputs"]["strength_model"] = args.lora_strength
        wf["2"]["inputs"]["strength_clip"] = args.lora_strength
    wf["5"]["inputs"]["image"] = ref_filename
    wf["6"]["inputs"]["weight"] = args.weight
    wf["6"]["inputs"]["end_at"] = args.end_at
    wf["7"]["inputs"]["text"] = args.prompt
    wf["10"]["inputs"]["seed"] = int(time.time()) & 0xFFFFFFFF
    return wf
```

→ full source

The full workflow.json in the tutorial folder ships with placeholders on every field you should touch. The test suite asserts those placeholders stay in the template — a safety net against accidentally committing your tuned production values.

The practical process:

1. Pick a reference item with a clean product photo.
2. Pick a character with a strong LoRA.
3. Render around weight=0.3, end_at=0.8. Check face, check item.
4. Face drifts → lower weight or lower end_at. Item doesn't resemble the reference → raise weight carefully, or leave weight and raise end_at.
5. Sweep in 0.05 increments, not 0.1. The usable range is narrower than you'd expect.

Several tuning sweeps on realistic and anime bases usually land you on a working pair.

- Outfit catalog as reference images. Each shop item has a reference image stored in object storage. At generation time, pass the reference URL to the GPU worker, which downloads it once and caches it.
- Catalog pre-rendering for previews. When a user browses the shop, they see a preview of each item rendered on their active character.
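For context, the queueing step itself needs nothing beyond the standard library. ComfyUI exposes an HTTP API: POST the workflow to /prompt and poll /history with the returned prompt_id. A sketch (not the tutorial's actual client.py):

```python
import json
import urllib.request

def build_prompt_request(workflow: dict) -> bytes:
    """Encode a placeholder-filled workflow as the JSON body ComfyUI's
    /prompt endpoint expects: {"prompt": <workflow>}."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_prompt(comfy_url: str, workflow: dict) -> dict:
    """POST the workflow to a running ComfyUI instance. The response
    includes a prompt_id you can poll /history/<prompt_id> with."""
    req = urllib.request.Request(
        f"{comfy_url}/prompt",
        data=build_prompt_request(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage against a running instance:
# queue_prompt("http://localhost:8188", rewritten_workflow)
```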
These previews don't need to happen on every page load — generate them asynchronously (Celery worker), store in S3, serve from cache.

- Consistency across image and video. The same IP-Adapter + LoRA pair used for images can often drive the start frame of video generation (e.g., Kling). Tune the still-image path first, then reuse it carefully.
- Fallback when the item isn't visual. Some "items" in a shop are stat buffs, relationship flags, or dialogue unlocks — things without a visual. Gate the IP-Adapter pathway to items flagged as visual-only.

Things that went wrong in practice:

- Face drifted on a noticeable slice of catalog previews. Cause: running the IP-Adapter weight too high "for stronger outfit adherence." Rolled back to the lower-half range after face-drift complaints spiked. Lesson: tune one variable at a time, even when it feels slow.
- Cached reference URLs expired. Shop items in S3 had time-limited presigned URLs. Generation workers fetched the URL at queue time, but the URL expired before ComfyUI actually downloaded it. Fix: pre-fetch on the worker side and pass the ComfyUI-side filename instead of the external URL.
- IP-Adapter model version mismatched the SDXL base. IP-Adapter Plus ships multiple weights keyed to specific SDXL base models. Mixing them can produce worse output without an obvious runtime error — just lower fidelity. Pin the IP-Adapter version to the base in your deployment config.
- Non-visual shop items crashed the workflow. The API tried to render "stat boost" items through the image pipeline. Fix: a visual: true|false flag on catalog entries, checked at the API boundary before queuing.

Takeaways:

- Start with a clean catalog: reference images with consistent backgrounds, consistent lighting, and no model already wearing the item if possible.
- Version the tuning. When you move base models, your IP-Adapter weight/end_at values probably move too. Treat them as part of the deployment, not as constants.
- Cache the pre-rendered previews aggressively. A character × item grid grows multiplicatively.
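One way to keep that grid manageable is a deterministic cache key that includes the tuning values, so retuning naturally invalidates stale previews. A sketch (all names here are illustrative, not HoneyChat's actual schema):

```python
import hashlib

def preview_cache_key(character_id: str, item_id: str,
                      lora_version: str, weight: float, end_at: float) -> str:
    """Deterministic key for a pre-rendered preview. Baking the tuning
    values into the key means a weight/end_at retune produces new keys,
    so stale previews simply stop being hit."""
    raw = f"{character_id}:{item_id}:{lora_version}:{weight:.2f}:{end_at:.2f}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```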
Pre-render on character creation and on new item add.

HoneyChat's shop renders outfits, accessories, and gifts on active characters using IP-Adapter Plus layered over per-character LoRA. Public architecture doc: github.com/sm1ck/honeychat/blob/main/docs/architecture.md.

Further reading:

- IP-Adapter (tencent-ailab)
- ComfyUI IPAdapter Plus extension
- FreeU paper
- SDXL base model

If you've shipped an IP-Adapter + LoRA combo in production, I'm curious what weight / end_at pairs you landed on and for which base. The sweet spot seems to shift meaningfully between anime and realistic bases.