Blog · Buyer's Guide · 11 min read · April 20, 2026

How to Choose an AI Chatbot: 2026 Buyer's Guide & Checklist

Every vendor will tell you they're the best AI chatbot on the market. This guide is the opposite — a platform-neutral checklist for evaluating chatbots in 2026, built from real deployments, real regrets, and the questions vendors don't want on your RFP.

TL;DR — the 5 non-negotiables

  1. Real LLM (GPT-4/Claude/Gemini), not 2022-era intent classifiers.
  2. Source citation in every AI answer (no hallucination, no mystery).
  3. Shared inbox with full conversation context for human handoff.
  4. Native integration with the channel you actually use (website, Slack, Teams, WhatsApp).
  5. Transparent pricing — no surprise conversation-overage bills.

Step 1: Define the Use Case Before You Look at Vendors

Most bad chatbot purchases trace back to skipping this step. Vendors look interchangeable in a demo — they all show the same impressive answer to "what's your refund policy?" But they're built for different jobs. Write down, on one page:

  • Primary use case — external customer support? sales lead qualification? internal IT helpdesk? internal HR?
  • Primary channel — website widget? Slack? Microsoft Teams? WhatsApp? all of the above?
  • Primary audience — existing customers? prospects? employees? patients?
  • Top 5 questions you expect the bot to answer — write them down verbatim.
  • Top 3 questions you DO NOT want the bot to attempt — medical advice, legal advice, pricing negotiation.

If you can't fill this page in under an hour, you're not ready to shortlist. Read the related AI chatbot definition guide to clarify scope.

Step 2: The 9 Evaluation Criteria That Matter

Ignore feature matrices with 100 checkmarks. Score each vendor, 1–5, on these nine dimensions:

  1. Answer quality — ask 10 real questions from your ticket backlog. Did it answer accurately, with citations? Did it admit uncertainty?
  2. Training ease — how long to index your content? Does it recrawl when content changes? Can you exclude pages?
  3. Hallucination rate — push the bot outside its scope. Does it fabricate or refuse cleanly?
  4. Human handoff — when the bot escalates, does the human agent see the full conversation with source links? Or does the customer have to repeat themselves?
  5. Channel fit — if you need Microsoft Teams, is it native or an afterthought? Test it, don't trust the feature list.
  6. Integrations — CRM (HubSpot, Salesforce), ticketing (Zendesk), e-commerce (Shopify), calendar. Test one live, not just the logo page.
  7. Analytics & reporting — can you see which questions the bot struggled with? Where it escalated? Which answers satisfied customers?
  8. Security & compliance — SOC 2? HIPAA (if healthcare)? GDPR data residency? SSO? A real security posture, not a marketing page.
  9. Total cost — base fee + conversation overages + seats + premium integrations. Calculate it for 6 months at your projected volume.
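The total-cost math in criterion 9 is easy to get wrong under sales pressure. Here's a minimal sketch of a 6-month cost model — every number and fee name below is an illustrative assumption, not a real vendor's pricing:

```python
# Hypothetical 6-month total-cost model for criterion 9.
# All figures below are illustrative assumptions, not real vendor prices.

def six_month_cost(base_monthly, seats, seat_fee,
                   conversations, included, overage_fee,
                   premium_addons=0.0, months=6):
    """Project total spend: base + seats + conversation overages + add-ons."""
    overage = max(0, conversations - included) * overage_fee
    monthly = base_monthly + seats * seat_fee + overage + premium_addons
    return monthly * months

# Example: $99 base, 4 seats at $25, 3,000 conversations/month with
# 2,000 included at $0.05 per extra conversation, $50/mo CRM add-on.
total = six_month_cost(99, 4, 25, 3000, 2000, 0.05, premium_addons=50)
print(f"${total:,.2f}")  # → $1,794.00
```

Run it at your *projected* volume, not the trial volume — overage fees are where the quote and the invoice diverge.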

Step 3: The 22-Point Evaluation Checklist

Copy this into your shortlist scoring doc. Score each item Yes / No / Partial. A vendor needs at least 16 Yes before they're worth a pilot.

AI capability

  • Uses a current frontier LLM (GPT-4+, Claude 3.7+, Gemini 2+)
  • Cites source documents in every AI answer
  • Admits "I don't know" instead of fabricating
  • Supports RAG (retrieval-augmented generation) from your content
  • Multilingual — detects visitor language and replies in kind

Operations

  • Shared inbox for human handoff with full context
  • Routes conversations by topic, language, or business hours
  • Auto-crawls your website/docs on a schedule
  • Provides analytics on resolution rate, escalation reasons, CSAT
  • Allows custom prompts and persona configuration

Channels & integrations

  • Website widget with one-line embed
  • Native Microsoft Teams integration (if relevant)
  • Native Slack integration (if relevant)
  • WhatsApp Business API (if relevant)
  • CRM integration you already use
  • Zapier / webhook support for custom workflows

Security & commercial

  • SOC 2 Type II (or equivalent) audit report on request
  • HIPAA BAA available (if healthcare)
  • GDPR-compliant with data residency options
  • SSO (SAML / OIDC) for team auth
  • Transparent pricing with clear overage rules
  • Month-to-month option (no forced annual)
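If you keep the scoring doc in a spreadsheet or script, the "at least 16 Yes before a pilot" rule is trivial to automate. A minimal sketch — the item names are placeholders for the 22 checklist items above:

```python
# Sketch of the 22-point scoring rule: a vendor advances to a pilot
# only with at least 16 "Yes" answers. Item names are placeholders.

PILOT_THRESHOLD = 16

def advances_to_pilot(scores):
    """scores: dict mapping checklist item -> 'yes' / 'no' / 'partial'."""
    yes_count = sum(1 for v in scores.values() if v == "yes")
    return yes_count >= PILOT_THRESHOLD, yes_count

# Sample vendor: 17 Yes, 3 Partial, 2 No across the 22 items.
scores = {f"item_{i}": "yes" for i in range(1, 18)}
scores.update({f"item_{i}": "partial" for i in range(18, 21)})
scores.update({f"item_{i}": "no" for i in range(21, 23)})

ok, yes = advances_to_pilot(scores)
print(f"{yes}/22 Yes -> pilot: {ok}")  # → 17/22 Yes -> pilot: True
```

Note that "Partial" counts as a No here — a half-working Teams integration still means repeating the evaluation later.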

Step 4: Run a Real Pilot — Not a Sales Demo

Every vendor demo will be flawless — they've rehearsed it. The only way to know the truth is a 7–14 day pilot with real content and real tickets. Structure it like this:

  1. Train the bot on your actual knowledge base — not a sanitized demo dataset.
  2. Feed it 50 real questions from last month's tickets. Grade each: correct / wrong / unsure.
  3. Push it to fail — ask 10 out-of-scope questions. A good bot refuses; a bad bot fabricates.
  4. Test handoff — trigger 5 escalations. Did the human agent get the full context? How long did the first human reply take?
  5. Measure resolution rate — percent of conversations resolved without human handoff. Aim for 60%+ on simple queries.
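The grading in steps 2 and 5 above reduces to two numbers: answer accuracy on the 50 questions and resolution rate across conversations. A hedged sketch, with made-up sample data standing in for your pilot results:

```python
# Pilot scoring sketch for steps 2 and 5: grade the 50 answers, then
# compute resolution rate. Grades below are illustrative sample data.

from collections import Counter

def pilot_report(grades, resolved_without_handoff, total_conversations):
    tally = Counter(grades)                       # correct / wrong / unsure
    accuracy = tally["correct"] / len(grades)
    resolution_rate = resolved_without_handoff / total_conversations
    return accuracy, resolution_rate, tally

grades = ["correct"] * 38 + ["wrong"] * 5 + ["unsure"] * 7   # 50 questions
accuracy, resolution, tally = pilot_report(grades, 130, 200)

print(f"accuracy {accuracy:.0%}, resolution {resolution:.0%}")
# → accuracy 76%, resolution 65%  (65% clears the 60%+ bar for simple queries)
```

Keep the raw grades, not just the percentages — the `wrong` and `unsure` lists tell you whether the fix is better content or a different vendor.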

Pricing Traps to Avoid

  • "Free trial" that requires a credit card — these auto-renew. Prefer truly free plans like the EzyConn free plan.
  • Per-message AI fees layered on top of per-seat fees — do the math at your real volume.
  • "Premium integrations" paywalled behind Enterprise when Salesforce is on every public comparison page.
  • Annual prepay discounts that lock you in before you've validated resolution rate.
  • "Contact sales for pricing" with no public tier — a sign the quote depends on how much they can extract.

For a detailed pricing breakdown across the market, read our AI chatbot pricing guide.

Questions Vendors Don't Want on Your RFP

  • "What percentage of conversations does your bot resolve without human handoff, across your customer base?"
  • "How often do you update the underlying LLM? At what cost to us?"
  • "Show me three recent customer cases where resolution rate dropped — and what you did about it."
  • "Who owns the conversation data? Can you train on it? What's the opt-out?"
  • "If we leave, do we get a complete export of conversations and training data within 30 days?"

Build vs Buy

Every 6 months a founder asks: "should we just build this with the OpenAI API?" Unless you have a dedicated ML team and a use case so unique no vendor fits, the answer is buy. Building covers the 10% of the surface area you see — the chat UI. It doesn't cover: RAG pipeline, vector DB, content recrawl, human handoff tooling, analytics, escalation rules, multi-channel delivery, compliance, model updates, prompt versioning, or the ten edge cases you'll discover the hard way. Total cost typically runs 5–10× that of an equivalent SaaS subscription.

If you're still curious, our engineering team's guide, train an AI chatbot with RAG, covers what goes into the pipeline.

Final Shortlist — 3 vendors, 14 days, same tests

End the buying process by running 3 vendors through the same 22-point checklist, the same 50 real questions, the same pilot window, and the same total-cost model. The winner isn't always the most familiar name. It's the one with the highest resolution rate at the lowest total cost that fits your channels and security posture.

If EzyConn belongs on your shortlist, start with the free plan — no credit card, forever free, full widget, 100 AI conversations/month to test resolution rate with your real content.

Buyer's Guide FAQ

How long should an AI chatbot pilot run?

7–14 days with real content and real tickets is enough to measure resolution rate and handoff quality. Longer pilots tend to delay the decision rather than sharpen it.

What resolution rate is realistic?

60–80% for repetitive support queries. Lower than 60% usually means the bot needs better content, not a different vendor. Above 80% is achievable on narrow, well-documented use cases.

Should I buy on features or price?

Neither. Buy on resolution rate per dollar of total cost, weighted by integration fit with your stack.
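One hedged way to operationalize that buying rule: resolution rate divided by 6-month total cost, scaled by an integration-fit weight between 0 and 1. All numbers below are illustrative, not real vendor data:

```python
# Illustrative value-per-dollar score for the final buying decision.
# resolution_rate: 0-1, six_month_cost: dollars, fit_weight: 0-1 judgment call.

def value_score(resolution_rate, six_month_cost, fit_weight):
    return resolution_rate / six_month_cost * fit_weight

# Vendor A: 72% resolution, $1,794 over 6 months, strong stack fit.
# Vendor B: 78% resolution, $3,600 over 6 months, weak stack fit.
a = value_score(0.72, 1794, 0.9)
b = value_score(0.78, 3600, 0.6)
print("pick vendor A" if a > b else "pick vendor B")  # → pick vendor A
```

The fit weight is deliberately subjective — a bot that can't reach your CRM resolves nothing that matters.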

Related resources