
How We Built an AI Email Client That Never Sees Your Emails

A deep dive into UpInbox's client-side classification architecture: heuristic cascades, dual AI safety checks, self-learning correction loops, and why we open-sourced the classifier.

Greg Bibas

Founder & CEO · March 19, 2026 · 12 min read

The privacy problem nobody talks about

In 2017, the New York Times revealed that Unroll.Me had been selling user email data to Uber; the FTC settled with the company over the practice in 2019. The service that promised to "clean up your inbox" was quietly monetizing the contents of it. Uber used the data to track Lyft receipts and gauge competitor market share. Millions of users had no idea.

You'd think the industry learned something. It didn't.

Every major email productivity tool today processes your emails on their servers. Superhuman syncs your entire mailbox to their infrastructure. SaneBox routes emails through their classification servers. Clean Email does the same. Shortwave, the latest YC darling, ingests your full inbox into their AI pipeline. They all have privacy policies that say they won't misuse your data. Unroll.Me had one too.

When we started building UpInbox, we asked a different question: what if the AI never saw the emails at all?

This isn't a marketing claim. It's an architectural constraint we designed the entire system around. Your emails are classified inside your browser, in the Chrome extension process. The raw email content — subject lines, bodies, sender addresses — never leaves your machine. Not to our servers. Not to any server. The only thing that crosses a network boundary is an optional, user-initiated API call to an AI provider the user chose and configured themselves.

Here's how we pulled it off.

Architecture: how client-side classification works

The pipeline is simple to describe and surprisingly tricky to get right:

Gmail API → Chrome Extension → Local Heuristics → Optional AI → IndexedDB

The Chrome extension authenticates directly with Gmail's REST API using the user's OAuth token. We fetch message metadata and (when needed) partial bodies using Gmail's messages.get with format=metadata. The batch endpoint lets us process 50 emails in a single HTTP request, which matters when you're classifying a backlog of thousands.
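Under the hood, Gmail's batch endpoint (`POST https://gmail.googleapis.com/batch/gmail/v1`) takes a `multipart/mixed` body where each part is an independent HTTP request. A minimal sketch of assembling one — the helper name and boundary string are illustrative, not our shipped code, but the wire format follows Google's batch HTTP documentation:

```typescript
// Build a multipart/mixed body for Gmail's batch endpoint. Each part is an
// independent GET for one message's metadata (format=metadata skips bodies).
// We batch 50 IDs at a time to stay well inside Gmail's batch limits.
function buildBatchRequest(
  messageIds: string[],
  boundary = "batch_upinbox" // illustrative boundary string
): { contentType: string; body: string } {
  const parts = messageIds.map((id, i) =>
    [
      `--${boundary}`,
      "Content-Type: application/http",
      `Content-ID: <item${i + 1}>`,
      "",
      `GET /gmail/v1/users/me/messages/${encodeURIComponent(id)}?format=metadata`,
      "",
    ].join("\r\n")
  );
  return {
    contentType: `multipart/mixed; boundary=${boundary}`,
    body: parts.join("\r\n") + `\r\n--${boundary}--`,
  };
}
```

The extension would POST this with the user's OAuth bearer token and parse the multipart response into individual message payloads.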

Once the metadata lands in the extension's service worker, it hits our 7-rule heuristic cascade. Each rule evaluates against the email independently and produces a confidence score between 0 and 1. Rules are evaluated in priority order. The first high-confidence match (>0.85) wins.

| Priority | Rule | Signal |
| --- | --- | --- |
| 1 | Newsletter | List-Unsubscribe header, list-unsubscribe mailto/URL, bulk precedence |
| 2 | Promotion | Marketing sender patterns, promotional keywords in subject, known ad domains |
| 3 | Receipt / Transaction | Transaction keywords (receipt, order confirmed, payment) + currency/amount patterns |
| 4 | Social | Known platform notification senders (LinkedIn, Twitter/X, Facebook, GitHub, etc.) |
| 5 | Automated | noreply@, no-reply@, donotreply@ sender patterns, absence of personal signals |
| 6 | Expired | Unread + older than user-configured threshold (default: 14 days) |
| 7 | Action Required | Direct address (to: matches user), question marks in subject, reply-expected patterns |
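To make the table concrete, here is what the priority-1 newsletter rule could look like. This is an illustrative sketch with simplified local types, not the shipped rule; the 0.85 confidence bar matches the cascade described above:

```typescript
// Illustrative priority-1 rule: a List-Unsubscribe header is a near-certain
// bulk-mail signal; "Precedence: bulk" alone is weaker and stays below the
// 0.85 high-confidence bar so later rules (or the AI) can weigh in.
interface EmailMetadata {
  headers: Record<string, string>; // header names lowercased
  from: string;
  subject: string;
}

interface RuleResult {
  category: string;
  confidence: number; // 0..1
}

const newsletterRule = {
  id: "newsletter-v1",
  evaluate(email: EmailMetadata): RuleResult {
    if (email.headers["list-unsubscribe"]) {
      // Present in both mailto: and URL forms — either is a strong signal.
      return { category: "newsletter", confidence: 0.95 };
    }
    if (email.headers["precedence"] === "bulk") {
      return { category: "newsletter", confidence: 0.7 };
    }
    return { category: "newsletter", confidence: 0 };
  },
};
```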

Here's a simplified version of the pipeline:

function classify(email: EmailMetadata): Classification {
  const rules = [
    newsletterRule,
    promotionRule,
    receiptRule,
    socialRule,
    automatedRule,
    expiredRule,
    actionRequiredRule,
  ];

  for (const rule of rules) {
    const result = rule.evaluate(email);
    if (result.confidence > 0.85) {
      return {
        category: result.category,
        confidence: result.confidence,
        source: "heuristic",
        ruleId: rule.id,
      };
    }
  }

  // No high-confidence match — falls to AI or "uncategorized"
  return { category: "uncategorized", confidence: 0, source: "none" };
}

This alone gets ~78% accuracy on a test set of 12,000 emails across 40 real inboxes. Most of the remaining 22% are ambiguous cases: is that Notion update a "social notification" or an "automated message"? Is that Calendly email a "receipt" or "action required"?

78% is fine for a demo. We wanted 95%+.

The AI Safety Check — dual classification

When the user has an AI provider configured (more on that in a moment), we don't just hand off the uncertain cases to the model. We run both the heuristic cascade and the AI classification on every email. Every single one.

This sounds wasteful. It's not. Here's why.

If the heuristic engine says "Newsletter, 0.92 confidence" and the AI says "Newsletter" — great, high confidence, move on. The AI call cost us a fraction of a cent and confirmed what we already knew.

If the heuristic engine says "Newsletter, 0.92" and the AI says "Action Required" — that's interesting. We flag it as an AI correction. We record the disagreement: the email metadata (sender domain, header patterns, subject structure), what the heuristic said, what the AI said, and what the AI's reasoning was.

Over time, these disagreements form a dataset. And that dataset trains the self-learning system.

We call this the Safety Check. It means the AI can never silently misclassify an email that the heuristic rules would have caught correctly. If the AI hallucinates — and models do hallucinate on classification tasks, more often than you'd think — the heuristic result is right there as a backstop. The user sees the heuristic classification by default, and the AI correction is surfaced as a suggestion they can accept or dismiss.

Nobody else does this. Every other AI email tool treats the model as the single source of truth. That's fine until the model confidently tells you that a wire transfer confirmation is a promotional email. We've seen it happen.

Dual classification with correction tracking is, as far as we can tell, novel in this space. If someone's published prior work on it, we'd genuinely love to know — email us.

Self-learning: how corrections become rules

Users correct classifications. They always do. Someone moves an email from "Newsletter" to "Action Required" because their accountant sends a monthly summary via Mailchimp (yes, really). Someone marks a GitHub notification as "Automated" instead of "Social" because they don't consider GitHub a social platform.

Every correction is a signal. Here's what we record:

  • Sender domain — e.g., notifications@github.com
  • Subject patterns — tokenized, not the raw subject line
  • Original category — what the system predicted
  • Corrected category — what the user chose
  • Timestamp and context — how many emails from this sender were corrected

All of this stays in IndexedDB, local to the user's browser. This is the per-user learning layer.

After N corrections for the same pattern (default: 3 for the same sender domain), the system suggests a new heuristic rule: "You've reclassified 5 emails from notifications@github.com as Automated. Create a rule?" The user can accept or dismiss.

Accepted rules get injected into the heuristic cascade at the appropriate priority level. Now every future email from that sender is classified correctly without an AI call. The system literally gets smarter by being wrong.
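The correction-to-rule loop can be sketched as a counter keyed by (sender domain, corrected category) that emits a suggestion once it crosses the threshold. The default of 3 comes from the description above; the class and its API are illustrative:

```typescript
// Tracks user corrections per (sender domain → corrected category) and
// suggests a new heuristic rule after N corrections (default 3, as above).
// In the extension, the returned string becomes a "Create a rule?" prompt
// and the counts live in IndexedDB rather than in memory.
class CorrectionTracker {
  private counts = new Map<string, number>();

  constructor(private threshold = 3) {}

  record(senderDomain: string, correctedCategory: string): string | null {
    const key = `${senderDomain}→${correctedCategory}`;
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    if (n === this.threshold) {
      return `You've reclassified ${n} emails from ${senderDomain} as ${correctedCategory}. Create a rule?`;
    }
    return null; // below threshold, or already suggested
  }
}
```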

Community learning is the second layer, available to paid users. Here's how it works:

  • When a user creates a correction rule, we offer to share an anonymized mapping: domain → category
  • Example: noreply@notion.so → Automated (confidence: 0.94, based on 847 users)
  • When a domain mapping reaches high confidence across many users, it gets promoted to a global rule shipped with the extension

What's shared: sender domain, category, confidence score. That's it.

What's never shared: email content, subject lines, email addresses, anything personally identifiable.

The classifier gets smarter every day without us writing a single line of code. The global rule set grows organically from real user behavior, not from us manually curating sender lists.
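Server-side, the aggregation only ever sees (domain, category) pairs. A sketch of the promotion logic — the confidence formula and thresholds here are made up for illustration; the real cutoffs aren't published:

```typescript
// Aggregates anonymized domain→category votes and promotes a mapping to a
// global rule once enough users agree on a dominant category. The minimum
// user count and share threshold below are illustrative assumptions.
type Vote = { domain: string; category: string };

function promoteGlobalRules(
  votes: Vote[],
  minUsers = 100,
  minShare = 0.9
): Array<{ domain: string; category: string; confidence: number }> {
  const byDomain = new Map<string, Map<string, number>>();
  for (const v of votes) {
    const cats = byDomain.get(v.domain) ?? new Map<string, number>();
    cats.set(v.category, (cats.get(v.category) ?? 0) + 1);
    byDomain.set(v.domain, cats);
  }
  const promoted: Array<{ domain: string; category: string; confidence: number }> = [];
  for (const [domain, cats] of byDomain) {
    const total = [...cats.values()].reduce((a, b) => a + b, 0);
    const [topCat, topCount] = [...cats.entries()].sort((a, b) => b[1] - a[1])[0];
    const share = topCount / total;
    if (total >= minUsers && share >= minShare) {
      // Confidence is just the agreement share in this sketch.
      promoted.push({ domain, category: topCat, confidence: share });
    }
  }
  return promoted;
}
```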

BYOK: why we don't want your API key either

BYOK = Bring Your Own Key. Users paste their OpenAI, Anthropic, or Google AI API key into the extension settings. That key is stored in chrome.storage.local — encrypted, never synced, never transmitted to us.

When the extension needs an AI classification, it makes a direct API call from the browser to the provider. The request goes from your Chrome extension to api.openai.com or generativelanguage.googleapis.com. The traffic never routes through UpGPT's infrastructure. We couldn't read it if we wanted to.
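For illustration, here is roughly what that direct call looks like against OpenAI's Chat Completions endpoint. The helper name, prompt wording, and category list are ours; only metadata ever goes into the prompt, and the function just builds the fetch arguments:

```typescript
// Builds the arguments for a direct browser → provider request. The API key
// is read from chrome.storage.local and never touches UpInbox servers.
// Helper name and prompt text are illustrative, not the shipped code.
function buildClassifyRequest(
  apiKey: string,
  meta: { from: string; subject: string; headers: string[] }
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [
          {
            role: "user",
            content:
              "Classify this email into one category: newsletter, promotion, " +
              "receipt, social, automated, expired, action_required.\n" +
              `From: ${meta.from}\nSubject: ${meta.subject}\n` +
              `Headers: ${meta.headers.join(", ")}`,
          },
        ],
      }),
    },
  };
}
```

The extension would pass these straight to `fetch`, so the only party that sees the metadata besides the browser is the provider the user chose.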

We support four providers:

| Provider | Model | Cost per 500 emails/mo | Notes |
| --- | --- | --- | --- |
| Google | Gemini Flash | ~$0.01 | Cheapest per token. Surprisingly good at classification. |
| OpenAI | GPT-4o Mini | ~$0.01 | Solid all-rounder. |
| Anthropic | Claude Haiku | ~$0.01 | Best at nuanced "is this actually important?" calls. |
| UpGPT | Proxy (managed) | Included in paid plan | For users who don't want to manage API keys. |

Let's do the math. A typical user gets ~500 emails per month. Each classification prompt is roughly 200-300 tokens of metadata (we send headers and a truncated subject, not the full body). That's ~125K tokens/month total. At Gemini Flash pricing ($0.075 per 1M input tokens), that's about a cent per month.

Most email tools charge $10-33/month for AI features. With BYOK, the actual AI cost is about a cent. The markup on server-side email AI is, to put it diplomatically, significant.

We charge for UpInbox premium features — advanced rules, community learning, priority support, the managed proxy. But we refuse to charge a 2,000x markup on AI inference and call it a product. That felt wrong.

Open-sourcing the classifier

We're publishing the heuristic classification pipeline as an open-source npm package: @upgpt/email-classifier.

The license is dual: free with "Powered by UpGPT.ai" attribution, or a paid commercial license if you want to white-label it. Standard open-core model.

Why open source?

  • Trust. If we claim emails never leave your browser, you should be able to verify that. Read the code.
  • Community contributions. Every new heuristic rule, every edge case fix, every sender domain mapping makes the system better for everyone.
  • Developer adoption. If you're building an email tool, you shouldn't have to reinvent the classification wheel. Use ours. Make it better.

What's included:

  • 7-rule heuristic cascade engine with confidence scoring
  • Pluggable AI provider interface (OpenAI, Anthropic, Google, or bring your own)
  • Safety Check dual-classification pattern
  • Self-learning correction tracker with rule suggestion
  • Full TypeScript, zero runtime dependencies
  • Comprehensive test suite (~200 test emails across all categories)

What's NOT included:

  • The Chrome extension UI and UX layer
  • Gmail API integration and OAuth handling
  • UpGPT proxy infrastructure
  • Community learning aggregation server

npm install @upgpt/email-classifier — and you've got a production-grade email classification engine in about 4 lines of code.

What we learned building this

Hard-won lessons from 3 months of building, testing, and iterating with real inboxes:

  • Gmail's REST API is surprisingly good for extension use. The batch endpoint handles 50 emails in one request. The format=metadata option gives you headers without downloading full bodies. Pagination with nextPageToken is rock-solid. If you're building a Gmail extension, the API is your friend — not the DOM.
  • IndexedDB is fast enough. We store 37,000+ email classifications in IndexedDB without noticeable lag. Reads are sub-millisecond for single records. Bulk writes of 50 classifications take ~15ms. The API is callback-hell, but libraries like idb make it tolerable. Don't let anyone tell you client-side storage can't scale.
  • Manifest V3 service workers die after 30 seconds. This is the single biggest pain point of modern Chrome extension development. Your background script is not persistent. It will be killed mid-operation. Design everything to be stateless and resumable. We checkpoint progress in chrome.storage.local every 10 emails so a killed worker can pick up where it left off.
  • Heuristics beat AI for obvious cases. Don't send a $0.001 API call to classify a Stripe receipt when the From header literally contains receipt@stripe.com. Rule-based systems aren't sexy, but they're fast, free, and deterministic. Save the AI budget for the cases that actually need judgment.
  • Users correct classifications more than you'd expect. This is a feature, not a bug. Every correction is training data. We built the entire self-learning system because our early beta users were so enthusiastic about telling us we were wrong. Lean into it.
  • The hardest part wasn't the AI. It was making "Archive" not feel like "Delete" to users who don't know the difference. We spent more time on the UX copy for the archive confirmation dialog than on the entire heuristic engine. "Move to Archive (you can always find it later)" tested 3x better than "Archive" alone.
  • Chrome Web Store review is a black box. Our first submission was rejected for "insufficient justification of host permissions." The second was rejected for the same reason with identical permissions but a longer description. The third was approved. We changed nothing substantive. Budget 2-3 weeks for review cycles.
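The checkpointing pattern from the Manifest V3 lesson above can be sketched with the storage area abstracted away — in the extension it's chrome.storage.local, but here an in-memory interface keeps the sketch self-contained. Function and key names are illustrative:

```typescript
// Resumable batch processing for MV3: checkpoint the cursor every 10 items
// so a killed service worker can pick up where it left off. In the real
// extension, KV would be backed by chrome.storage.local.
interface KV {
  get(key: string): Promise<number | undefined>;
  set(key: string, value: number): Promise<void>;
}

async function processResumable(
  ids: string[],
  handle: (id: string) => Promise<void>,
  store: KV,
  checkpointEvery = 10
): Promise<number> {
  const start = (await store.get("cursor")) ?? 0; // resume point, if any
  let i = start;
  for (; i < ids.length; i++) {
    await handle(ids[i]);
    if ((i + 1) % checkpointEvery === 0) {
      await store.set("cursor", i + 1); // survives a worker kill here
    }
  }
  await store.set("cursor", i); // final checkpoint
  return i - start; // items processed in this run
}
```

Because every unit of work is idempotent classification, re-processing at most `checkpointEvery - 1` emails after a kill is a cheap price for never losing progress.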

Try it yourself

UpInbox is free to install from the Chrome Web Store. The free tier includes the full heuristic classification engine, bulk archive, and unsubscribe management. No account required. No API key required. It just works.

The classification pipeline is open source on GitHub at github.com/upgpt/email-classifier. Star it, fork it, file issues, submit PRs. We review everything.

If you're building an email tool, the @upgpt/email-classifier npm package might save you months of work. The heuristic engine alone handles ~78% of classifications without any AI. Add the Safety Check pattern and you're at 95%+ with minimal API spend.

We built this because we wanted an email tool we could actually trust — one where "we don't read your emails" is enforced by architecture, not by a privacy policy that can change next quarter. If you feel the same way, give it a try.

Questions? Feedback? Found a bug? greg@upgpt.ai or open an issue on GitHub.

Tags: privacy · email · chrome extension · ai · open source · client-side ai · byok