Why I Am Building Glide

Published: at 04:33 AM

There’s a moment I keep coming back to. A principal data engineer — twelve years of experience, built the real-time pipelines behind two Fortune 100 recommendation engines, open-source contributions with thousands of stars — sitting across the table from me in a coffee shop. She’d been looking for four months. Not because she wasn’t exceptional. She was. But she’d sent 140 applications and received three responses. Two ghosted her after the first round. The third lowballed her by 40%.

She wasn’t failing. The system was failing her.

I’ve sat across the table from thousands of people like her. Pharma researchers. Principal engineers. Directors of product three months into a search that was supposed to take two weeks. People who are demonstrably excellent at what they do, crushed by a process that was never designed to surface excellence. It was designed to process volume.

That distinction matters. It’s the whole point of this essay, and of Glide.

The gap between how good someone is and how well they navigate the hiring system is not a minor inefficiency. It’s a structural failure that has gotten measurably, quantifiably worse. Applications per role have nearly tripled since 2017. Success rates hover around 0.4% per application. Over half of candidates report being ghosted. Analysts now describe the conditions facing white-collar professionals as a “white-collar recession” — not because there’s no work, but because the infrastructure connecting people to work is fundamentally broken.

This is the story of how I came to understand that, and what I decided to do about it.


The system nobody designed

Nobody sat down and designed the modern job search. It accreted. Layer upon layer of tools, platforms, and processes bolted together over decades, each solving one company’s problem without anyone asking whether the whole thing works for the person it’s supposed to serve.

Job searching in 2026 looks almost identical to job searching in 2010. The interfaces are shinier. The platforms are faster. But the underlying model hasn’t changed: you write a resume, you apply to a listing, you wait, you hear nothing, you apply again, you tailor another cover letter, you wait again. The ritual is the same. The scale of the dysfunction is just more visible now.

Greenhouse platform data shows the average job opening receives 242 applications — nearly triple the volume from 2017 when unemployment was at a comparable level. That means your probability of success on any single application is approximately 0.4%. Workday’s 2024 Global Workforce Report found that job applications grew four times faster than new requisitions, creating what they called, with unusual bluntness, “an employer’s market flooded with job applications.” The ratio of applications per recruiter has reached roughly 500:1, about four times higher than four years earlier.

Think about what that means in practice. A recruiter with 500 applications in their queue doesn’t read 500 applications. They skim 50. Maybe. The other 450 are filtered by an Applicant Tracking System — a piece of software whose entire purpose is to say no. Over 39% of Fortune 500 companies use Workday as their primary ATS. SAP SuccessFactors holds another 13%. If the keywords don’t match, you’re eliminated before a human ever sees your name. Not because you can’t do the job. Because you used “project management” instead of “program management,” or your resume was formatted in a way the parser didn’t expect.

HiringThing’s 2025 analysis estimates that candidates submit between 32 and 200+ applications before securing a single offer. Only 0.1–2% of cold online applications convert. Meanwhile, referral candidates convert at dramatically higher rates — but referrals represent a minority of total applicants. The jobs that most people never see, filled through networks and internal conversations before a role ever reaches a job board, are a structural feature of how hiring actually works. The public system is a sideshow.

So the process that most job seekers rely on — public listings, keyword-optimized resumes, mass applications — is already operating at 0.4% efficiency, filtered through software that penalizes qualified people for trivial reasons, in a market where applications are growing four times faster than openings.

That’s not a broken feature. That’s a broken architecture. And nobody is accountable for it, because nobody designed it.

What I kept seeing from the inside

I spent years as a recruiter. Not the LinkedIn-InMail-blast kind. The kind who sits with candidates, learns their story, understands what they actually want, then tries to navigate a system that makes it unreasonably hard to connect good people to good work.

The patterns repeated everywhere. Across industries, seniority levels, geographies. The data that’s emerged since confirms every one of them, but I didn’t need the data. I saw it in real time.

People with the right skills, applying to the wrong jobs. Not because they lacked judgment. Because they lacked information. They didn’t know which companies were actually growing versus which were posting roles to build a “talent pipeline” with no intention of hiring. They didn’t know which teams within a company were thriving and which were about to be restructured. They didn’t know the difference between a company that says “we value growth” in a job posting and one that actually promotes from within. Meanwhile, 59% of employers have raised experience requirements — not because the work got harder, but because they can. The flood of applications gives them that power. Career-switchers and junior talent are being locked out of roles they could do well.

A process designed to humiliate. Greenhouse’s 2024 Candidate Experience Report surveyed 1,200 US job seekers and found what anyone who’s searched for a job already knows: 52–61% of candidates had been ghosted by employers during the interview process. But it gets worse. Over half — 54% — reported encountering discriminatory interview questions, a 20-percentage-point increase from the previous year. Age, race, gender. Questions that are illegal in most jurisdictions. Asked anyway. Then 53% of candidates said the responsibilities described in job postings differed significantly from what they found once they actually started. Another 53% reported receiving heavy praise during hiring only to be lowballed on salary and title in the offer. The phrase “bait and switch” isn’t hyperbole. It’s the median experience. These dynamics don’t just frustrate candidates. They poison the entire ecosystem with distrust that takes years to repair.

Burnout before the first day. Average time-to-hire for white-collar roles has stretched past 42 days. But that’s the employer’s number. The candidate’s number is worse — searches routinely last five to six months. During that time, people send 32 to 200+ applications, navigate multi-stage interviews that eat entire weeks, and receive almost no feedback. Greenhouse found that about 40% of white-collar job seekers in 2024 did not receive any interviews over extended search periods. Forty percent. Not “received rejections.” Received nothing. Silence. And it’s not evenly distributed: historically underrepresented groups were 67% more likely to be ghosted than white candidates in some markets. The system isn’t just broken. It’s broken in ways that amplify existing inequalities.

Employers drowning too. The other side of the table isn’t doing well either. With a 500:1 applications-to-recruiter ratio, lean talent acquisition teams are outpaced by volume, leading to exactly the shortcuts that make the candidate experience so terrible: keyword filters, heavy reliance on referrals, minimal communication. Skills shortages in AI, cybersecurity, and data science coexist with surpluses in generic corporate roles. The result is paradoxical: 20% of candidates reject offers due to poor interview experiences, and high early-tenure turnover erodes the ROI of the very processes companies spent months running. Everyone is losing.

Preparation in a vacuum. Before every interview, candidates spend hours researching a company. Piecing together information from Glassdoor reviews (which may be years old), LinkedIn posts (which are marketing), news articles (which may be irrelevant), and guesswork. No single place to understand a company’s actual stability, culture trajectory, department growth, hiring patterns, or how employees really feel about working there. The information exists — scattered across dozens of sources. But nobody has assembled it into something useful for the person who needs it most.

I kept thinking: this is solvable. Not with another job board. Not with a better search filter. With a fundamentally different approach to how a professional finds, evaluates, and pursues work. An approach that starts from the candidate’s needs rather than the employer’s workflow.

Why another job board is not the answer

When you see a problem this obvious, the instinct is to build a better version of what already exists. Better listings. Better filters. Better recommendations. A shinier billboard.

I considered that for about a week. Then I realized it would make things worse.

The infrastructure isn’t the bottleneck. Indeed attracts over 350 million unique visitors per month across 60+ countries. LinkedIn has 830 million professionals, with 40 million actively searching for jobs every week. ZipRecruiter, StepStone, Seek, Naukri — dozens of platforms serve every geography and industry, integrated with ATS systems like Workday, Greenhouse, and SuccessFactors that power the back end of nearly every large employer’s hiring process.

There is no shortage of job listings. There is a catastrophic shortage of understanding.

And the platforms have no incentive to fix it. Critics argue — correctly, I think — that LinkedIn and Indeed don’t fully exploit their behavioral data to improve matching, partly because high turnover and repeated applications are lucrative for their business models. A platform that perfectly matched every candidate to the right job on the first try would lose most of its engagement. The dysfunction is the product.

Meanwhile, AI-assisted application tools have made the dysfunction self-reinforcing. It’s now trivially easy to auto-fill forms, generate tailored resumes, and mass-apply to dozens of roles in an afternoon. Greenhouse research shows 28% of job seekers use AI to mass apply rather than focus on targeted opportunities. So application volume surges, signal-to-noise collapses, recruiters rely more heavily on keyword filters and referrals, and candidates who aren’t gaming the system get buried even deeper. Add ghost jobs — postings kept open with little or no intent to hire — and you have a system that actively wastes the time of the people it claims to serve.

The problem isn’t that job boards are bad at listing jobs. They’re fine at that. The problem is that listing jobs is perhaps 5% of what a job seeker actually needs.

A person in a career transition needs to understand which roles genuinely fit their skill profile — not which ones have keyword overlap. They need to know which companies are stable and which are quietly contracting. They need a resume that speaks the language of a specific job description at a specific company. They need interview preparation grounded in real company data, not “Top 10 Behavioral Questions” blog posts. They need to track where they are in the process, what’s working, what’s not, and where to invest their limited energy and eroding motivation.

They need a system that works with them. Not a billboard they stare at while their confidence erodes.

That’s the difference between a job board and a career intelligence platform. A job board is a marketplace. A career intelligence platform is an operating system for your professional life during the most stressful, consequential transition you’ll face. The difference is not incremental. It’s architectural.

So I started building

I quit my job and started building. Not because I had a business plan. Because I couldn’t stop thinking about the problem.

The premise was simple, and it hasn’t changed: what if everything a job seeker needs existed in one place, and what if real intelligence — not keyword matching, not collaborative filtering, not “recommended for you” algorithms that recommend garbage — connected all of it?

Not AI as a marketing veneer. Not a chatbot stapled onto a job board. AI as the actual infrastructure. The connective tissue between every stage of the career journey: understanding who you are, finding what fits, preparing you to compete, and giving you the information asymmetry that has always been reserved for the employer’s side of the table.

That’s Glide. And what follows is an honest account of how it works, why we made the engineering decisions we made, and what it takes to build a system that treats a person’s career with the seriousness it deserves.

I’m going to go deep on the technical architecture. Not to impress anyone, but because the complexity is the point. The reason this problem hasn’t been solved isn’t that nobody tried. It’s that solving it properly requires building something genuinely hard. Every shortcut produces the same mediocre tools that already exist.

The architecture (or: why general-purpose models fail at this)

The first thing I learned building Glide is that no general-purpose model — no matter how large — can do this well. Not because the models aren’t capable. They are, for generic tasks. But career intelligence isn’t one task. It’s dozens of tasks with completely different statistical properties running simultaneously: parsing resumes with surgical precision across 23+ document formats, evaluating skills with contextual judgment that requires understanding career trajectories, scoring companies with reproducible determinism using real-time signals, matching candidates to jobs through a multi-dimensional representation space where surface similarity is actively misleading. No single model handles all of those well. Trying to use one is how you end up with the same mediocre chatbot everyone else ships.

Glide’s core is a heterogeneous model ensemble — a mixture of custom-trained models, domain-specific embeddings, and deterministic scoring engines orchestrated through a distributed inference mesh. The mesh routes each request to the optimal execution path at runtime: custom transformer models trained from scratch on career-domain data for extraction and classification. Fine-tuned generative models with domain-adapted weights for synthesis tasks. Graph neural networks for relational matching across the candidate-job-company knowledge graph. Deterministic scoring engines with sub-millisecond latency for rule-based calculations. Retrieval-augmented research pipelines with multi-source orchestration for real-time grounding.

flowchart LR
    A[NER Models] --> A1[Extraction]
    A --> A2[Classification]
    B[Generative Model] --> B1[Assessments]
    B --> B2[Synthesis]
    C[GNN Engine] --> C1[Matching]
    C --> C2[Graph Traversal]
    D[Embeddings] --> D1[Retrieval]
    D --> D2[Dedup Detection]
    E[Deterministic] --> E1[Stability Score]
    E --> E2[News Impact]

The design principles that hold the whole thing together:

Heterogeneous model routing with dynamic dispatch. Every AI feature routes to the model architecture best suited for its statistical properties, not just a temperature knob on the same foundation model. Classification tasks route to our custom BERT-derived encoder models trained on career-domain corpora. Structured extraction routes to our fine-tuned sequence labeling models with CRF output heads. Matching routes through the graph neural network over our candidate-job-skill knowledge graph. Only open-ended synthesis tasks — company assessments, career advice — use generative transformer models, and those run our domain-adapted checkpoints, not off-the-shelf weights.

Schema-constrained decoding with grammar-level enforcement. All generative calls that produce data for downstream processing enforce JSON schema constraints at the inference layer through constrained token sampling. Using techniques from Google’s STATIC framework — sparse Compressed Sparse Row matrices that convert grammar constraints into vectorized operations on the GPU — the decoder’s vocabulary is masked at each step to only emit tokens producing valid JSON conforming to the specified schema. This runs 100-900x faster than naive tree-based constraint checking and eliminates parsing ambiguity entirely.
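A minimal sketch of the core idea — masking the decoder's logits so only schema-legal tokens can be sampled at each step. The `allowed_token_ids` here stand in for whatever a real grammar engine computes; this is an illustration, not Glide's production decoder:

```python
import torch

def constrained_decode_step(logits: torch.Tensor, allowed_token_ids: torch.Tensor) -> int:
    """Sample the next token only from the set the JSON grammar currently allows.

    logits: [vocab_size] raw decoder scores for this step.
    allowed_token_ids: token indices that keep the output schema-valid
    (computed by a grammar engine; a stand-in here).
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0                    # legal tokens keep their original score
    probs = torch.softmax(logits + mask, dim=-1)     # illegal tokens get zero probability
    return int(torch.multinomial(probs, num_samples=1))
```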

Multi-stage validation with independent judge models. Generated outputs pass through a validation cascade before persistence. Schema validation, business rule checks, deduplication, normalization, and in high-stakes pipelines, a secondary model — architecturally distinct from the generator, trained on a different data distribution — evaluates the primary output for factual consistency, completeness, and hallucination risk. The judge is a smaller, faster classifier fine-tuned specifically on (output, ground_truth) pairs from our annotation pipeline.

Retrieval-augmented grounding with parallel multi-source orchestration. Features that depend on real-time information (company assessment, salary insights) perform live web research before generation. Multiple research queries fire concurrently across different information facets, results are deduplicated, relevance-ranked using our domain-specific embedding model with exponential recency decay, and injected into the generation context — grounding the model’s output in current factual information rather than relying on stale parametric knowledge.

Content-addressed tiered caching with locality-sensitive hashing. Model outputs are cached at multiple levels using content-hash-based invalidation with MinHash-based locality-sensitive hashing for near-duplicate detection. Cache lifetimes are calibrated per feature based on the expected staleness rate of the underlying data — minutes for session-scoped responses, hours for profile-dependent features, days for company-level research. The LSH layer prevents cache thrashing on cosmetic edits — a typo fix in your resume doesn’t invalidate your entire match profile.
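To make the LSH layer concrete, here's a minimal sketch using the `datasketch` library; the resume strings and threshold are illustrative. Two near-identical resume versions hash into the same bucket, so a typo fix reuses the cached result instead of invalidating it:

```python
from datasketch import MinHash, MinHashLSH

def minhash_of(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for token in text.lower().split():
        m.update(token.encode("utf-8"))
    return m

cached_resume = "Senior data engineer. Built streaming pipelines in Python and Spark."
edited_resume = "Senior data engineer. Built streaming pipelines in Python and Spark!"

lsh = MinHashLSH(threshold=0.9, num_perm=128)        # high threshold: near-duplicates only
lsh.insert("resume:v1", minhash_of(cached_resume))

cache_hit = bool(lsh.query(minhash_of(edited_resume)))  # True => cosmetic edit, reuse cache
```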

The AI/ML model: under the hood

People ask me if Glide is “just another AI wrapper.” It’s a fair question. Most AI products are. They take an off-the-shelf model, add a system prompt, and call it a product.

Glide trains its own models. The generative backbone uses continual pretraining and full-parameter fine-tuning on career-domain corpora — not a LoRA adapter on someone else’s checkpoint, but weight updates across the full transformer stack, trained on our own data, evaluated on our own benchmarks, served on our own infrastructure. The classification models are trained from scratch. The embedding models are domain-specific. The matching engine runs on a heterogeneous graph neural network that we designed for the topology of career data.

That’s what it takes to build something that works. The rest of this section explains exactly how.

Custom model training: from pretraining to production

General-purpose models are trained on internet-scale data. They know a little about everything. They know almost nothing about the specific structure of career data: the implicit hierarchy between “Senior Software Engineer” and “Staff Software Engineer,” the difference between “Python” on a data scientist’s resume and “Python” on a DevOps engineer’s resume, the fact that a three-year gap followed by a career switch into product management is a signal of intentional pivot rather than unemployment.

We needed models that understand these distinctions natively, in their weights, not through prompt engineering.

Continual pretraining on career-domain corpora. Starting from a strong open-weight foundation (a Llama-3 class architecture), we perform continual pretraining on a curated career-domain corpus: 4.2 billion tokens of structured job descriptions, resume text, career advice, recruiting communications, company filings, and labor market reports. The corpus is deduplicated using MinHash with 128 hash functions, filtered for quality using a classifier trained on human-rated samples, and domain-balanced to prevent the model from over-indexing on any single industry vertical.

Continual pretraining uses a lower learning rate than initial pretraining (2e-5 vs the original 3e-4) with linear warmup over 2,000 steps and cosine decay, running for approximately 1.5 epochs over the domain corpus. This is the standard approach from recent research on domain adaptation — enough to shift the model’s representations toward career-domain distributions without catastrophic forgetting of general capabilities. We monitor perplexity on both the domain validation set and a held-out general benchmark (MMLU subset) throughout training. If general benchmark perplexity increases by more than 3%, we reduce the learning rate and increase replay of general-domain data.
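The schedule itself is simple enough to show directly. A sketch using the values above (2e-5 peak, 2,000 warmup steps); `total_steps` depends on corpus size and batch size and is an assumption here:

```python
import math

def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 2e-5, warmup_steps: int = 2000) -> float:
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```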

flowchart TB
    A[Foundation Model] --> B[Continual Pretrain]
    B --> C[Domain Checkpoint]
    C --> D[Task Fine-Tuning]
    D --> D1[Extraction]
    D --> D2[Classification]
    D --> D3[Generation]
    D --> D4[Judge]
    C --> E[Embedding Training]
    E --> E1[Domain Embeddings]

Full-parameter fine-tuning for critical tasks. After continual pretraining, we run full-parameter supervised fine-tuning for each production task. Not LoRA — full weight updates. LoRA is efficient, but it constrains the adapter to a low-rank subspace of the model’s attention projections. For tasks where the career-domain distribution diverges significantly from general pretraining data — and resume extraction, skill classification, and candidate-job relevance scoring all do — the quality ceiling of LoRA is measurably lower than full fine-tuning. We benchmarked this extensively: full fine-tuning on our extraction task achieves 94.2% field-level F1 versus 89.7% for rank-32 LoRA and 87.1% for rank-16 LoRA on the same held-out test set. The 4.5-point gap is the difference between a system users trust and one they constantly correct.

Where we train custom models and why:

pie title Model Training Data Distribution by Task
    "Resume Extraction — 18.4K samples" : 18400
    "Skill Classification — 47K pairs" : 47000
    "Relevance Scoring — 42K pairs" : 42000
    "Embedding Training — 890K triplets" : 890000
    "RLHF Preference Data — 24K pairs" : 24000

Custom embedding models and representation learning

Career data lives in a semantic space that general-purpose embeddings fundamentally misrepresent. In a standard embedding model, “Data Scientist” and “Data Entry” are close neighbors. “Machine Learning Engineer” and “Mechanical Engineer” share a cluster. “Python developer” and “Python trainer” (snake handling) occasionally appear in the same region. These failures aren’t bugs in the embedding model — they’re consequences of training on web-scale data where these terms genuinely co-occur in similar contexts. For career matching, this is catastrophic.

We train our own embedding models on career-domain data using a multi-stage contrastive learning pipeline.

Architecture. The embedding backbone is a 330M-parameter encoder model initialized from a strong general-purpose checkpoint (E5-large class), then fine-tuned through three stages of contrastive training with progressively harder objectives. We use Matryoshka Representation Learning (MRL) to produce nested embeddings at multiple dimensionalities (64, 128, 256, 512, 768) — the same model produces all five resolutions, and any prefix of the full embedding is a valid lower-dimensional representation. This enables adaptive compute-quality tradeoffs: fast pre-filtering at 64 dimensions, detailed matching at 768 dimensions.
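What "any prefix is a valid embedding" means in practice, as a sketch (the tensors are placeholders, not our model's outputs): truncate the full embedding and re-normalize.

```python
import torch
import torch.nn.functional as F

MRL_DIMS = (64, 128, 256, 512, 768)   # nested resolutions described above

def truncate_embedding(full: torch.Tensor, dim: int) -> torch.Tensor:
    """Any prefix of a Matryoshka embedding is usable once re-normalized."""
    assert dim in MRL_DIMS
    return F.normalize(full[..., :dim], dim=-1)

job_embeddings = torch.randn(10_000, 768)            # placeholder for real embeddings
coarse = truncate_embedding(job_embeddings, 64)      # fast pre-filtering
fine = truncate_embedding(job_embeddings, 768)       # detailed matching
```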

flowchart LR
    A[Domain Alignment] --> B[Hard Negative Mining]
    B --> C[Task-Specific Tuning]

Stage 1: Domain alignment. We fine-tune on 890,000 contrastive triplets mined from our career data: (anchor_skill_description, positive_related_skill, negative_unrelated_skill), (anchor_job, positive_matching_candidate, negative_non_matching_candidate), and (anchor_company_description, positive_similar_company, negative_dissimilar_company). The loss is InfoNCE with in-batch negatives at temperature 0.02. After this stage, the model’s representation space aligns with career-domain semantics — “Data Scientist” moves closer to “ML Engineer” and away from “Data Entry.”
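For reference, the loss at this stage looks roughly like this — a standard InfoNCE sketch with in-batch negatives, not our training code verbatim:

```python
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.02) -> torch.Tensor:
    """In-batch InfoNCE: row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch acts as a negative."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.T / temperature      # [batch, batch] similarities
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)
```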

Stage 2: Hard negative mining. The easy negatives from Stage 1 are replaced with hard negatives: pairs that are semantically close but professionally distinct. We mine these from the Stage 1 model’s own failures — the (candidate, job) pairs that the model scores as highly similar but that human annotators rated as irrelevant. This is the most important stage. The model learns the boundaries that matter in career space: the difference between “Python for data pipelines” and “Python for web scraping,” between “Senior Manager” at a startup (5 direct reports, IC-heavy) and “Senior Manager” at a Fortune 500 (50 reports, pure people management).

Stage 3: Multi-task fine-tuning. The final stage trains the embedding model simultaneously on three objectives with learned task weights: matching (candidate-job pair scoring), clustering (grouping similar roles and skills), and retrieval (finding relevant documents given a query). Multi-task training prevents the model from over-specializing on any single use case while maintaining strong performance across all three.

Evaluation on our Career Semantic Benchmark (CSB). We maintain a held-out benchmark of 4,200 career-domain queries with human-rated relevance labels, spanning role matching, skill similarity, and company comparison. Our domain-specific model achieves 91.3% NDCG@10 versus 72.8% for the best general-purpose embedding model (text-embedding-3-large class) and 78.4% for the best open-source model (E5-Mistral class) on this benchmark. The gap is largest on cross-domain skill transfer queries — exactly the cases that matter most for career-switchers.

Heterogeneous graph neural network for matching

The matching problem in career intelligence is fundamentally relational. A candidate’s fit for a job depends not just on the candidate and the job in isolation, but on the structure of relationships between skills, roles, industries, companies, and career trajectories. A software engineer at a fintech company who previously worked at a healthcare startup has a different match profile than one who spent their entire career at FAANG companies — even if their skill lists are identical. Traditional feature-based matching can’t capture this. Embedding similarity can approximate it but collapses the relational structure into a single vector. We needed something that reasons over the graph directly.

Glide maintains a heterogeneous knowledge graph with six node types and twelve edge types:

| Node Type | Count | Key Attributes |
| --- | --- | --- |
| Candidate | 444K+ | Skills, experience, seniority, industry |
| Job | 230M+ indexed | Requirements, company, level, location |
| Skill | ~18,500 | Category, demand trend, median salary premium |
| Company | 82M+ | Industry, size, stability score, growth rate |
| Role | ~4,200 | Canonical title, seniority band, related titles |
| Industry | 147 | Growth rate, skill distribution, hiring velocity |

Edge types encode relationships: candidate-HAS_SKILL-skill, job-REQUIRES_SKILL-skill, candidate-WORKED_AT-company, job-POSTED_BY-company, skill-RELATED_TO-skill, role-IN_INDUSTRY-industry, and six more connecting the entities into a dense, typed graph.

flowchart LR
    A[Candidate] -->|has skill| B[Skill]
    A -->|worked at| C[Company]
    D[Job] -->|requires| B
    D -->|posted by| C
    B -->|related to| B
    A -->|held role| F[Role]
    D -->|is role| F
    F -->|in industry| G[Industry]

Model architecture. We use a Heterogeneous Graph Transformer (HGT) — an attention-based GNN architecture that learns type-specific attention weights for each (source_type, edge_type, target_type) triplet. This means the model learns different attention patterns when aggregating information from a candidate’s skills versus their work history versus their industry — rather than treating all neighbors uniformly. The model has 4 attention heads per layer, 3 message-passing layers, and 256-dimensional hidden representations. Following LinkSAGE-style inductive learning, we decouple the GNN’s node embedding computation from the downstream scoring head, enabling real-time inference on new candidates and jobs without retraining the full model.
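A minimal sketch of that shape, assuming PyTorch Geometric's `HGTConv`; the module and feature names are illustrative, and node features are assumed to be projected to 256 dimensions before message passing:

```python
import torch
from torch_geometric.nn import HGTConv

class MatchGNN(torch.nn.Module):
    def __init__(self, metadata, hidden: int = 256, heads: int = 4, layers: int = 3):
        super().__init__()
        # metadata = (node_types, edge_types) describing the heterogeneous graph
        self.convs = torch.nn.ModuleList(
            [HGTConv(hidden, hidden, metadata, heads=heads) for _ in range(layers)]
        )
        # Scoring head kept separate from embedding computation (LinkSAGE-style decoupling)
        self.score_head = torch.nn.Linear(2 * hidden, 1)

    def forward(self, x_dict, edge_index_dict):
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)    # type-specific attention per edge type
        return x_dict

    def score(self, cand_emb: torch.Tensor, job_emb: torch.Tensor) -> torch.Tensor:
        return self.score_head(torch.cat([cand_emb, job_emb], dim=-1)).squeeze(-1)
```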

Training. The GNN is trained on implicit feedback from user interactions: jobs that users advanced to “Shortlisted” or “Applied” are positive edges, jobs that users archived or ignored are negative edges. We use BPR (Bayesian Personalized Ranking) loss with importance-weighted sampling — harder negatives (jobs that the candidate viewed but didn’t shortlist) receive higher sampling weight than easy negatives (jobs the candidate never saw). Training runs on a weekly cadence over the accumulated interaction graph, with the model warm-started from the previous week’s checkpoint.
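The training objective, sketched — standard BPR with per-negative importance weights; tensor shapes and weights are illustrative:

```python
import torch
import torch.nn.functional as F

def weighted_bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor,
                      neg_weights: torch.Tensor) -> torch.Tensor:
    """Bayesian Personalized Ranking: push shortlisted/applied jobs above sampled
    negatives; harder negatives (viewed but not shortlisted) carry larger weights."""
    pairwise = -F.logsigmoid(pos_scores - neg_scores)   # loss per (positive, negative) pair
    return (neg_weights * pairwise).mean()
```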

Why this matters. The GNN captures transitivity that no feature-based system can. If engineers from Company A frequently transition to Company B, and Company B just posted a role similar to the one you hold at Company A, the GNN scores that match higher — not because of keyword overlap, but because the graph structure encodes a real-world career pathway that other professionals have followed. This is especially powerful for career-switchers, where traditional matching fails because the surface-level features don’t align but the underlying career graph shows a viable path.

Reinforcement learning from human feedback (RLHF) and preference optimization

The hardest part of building AI for career decisions isn’t getting the model to generate text. It’s getting it to generate the right text — with the right tone, the right level of specificity, the right balance between encouragement and honesty. A company assessment that’s too positive is useless. One that’s too negative is demoralizing. Career advice that’s generic feels like it came from a chatbot. Advice that’s too specific risks being wrong in ways that damage someone’s career.

Supervised fine-tuning gets you a model that produces well-formatted outputs. It doesn’t get you a model that produces outputs humans actually prefer. For that, you need to optimize directly against human preferences.

We run a multi-stage RLHF pipeline on our generative models, using recent advances in group-based preference optimization that eliminate the need for a separate reward model.

flowchart TB
    A[SFT Checkpoint] --> B[Preference Data]
    B --> C[Pairwise Annotation]
    C --> D[GRPO Training]
    D --> E[KL Policy Update]
    E --> F[Evaluation]
    F -- Pass --> G[Shadow Deploy]
    F -- Fail --> H[Error Analysis]
    H --> C

Preference data collection. For each generation task (company assessments, career advice, job explanations), we produce 4-8 candidate outputs per input using different temperature and sampling configurations. Human annotators with recruiting domain expertise rank the outputs pairwise on three dimensions: factual accuracy, actionability (does this help the user make a decision?), and tone calibration (appropriate balance of positivity and realism). We’ve collected 24,000 pairwise preference annotations across our generation tasks. The data is split 80/10/10 for train/validation/test with stratification by task type and difficulty.

Group Relative Policy Optimization (GRPO). We use GRPO — the alignment algorithm that powered DeepSeek-R1 — rather than traditional PPO-based RLHF. GRPO eliminates the critic/value model entirely, instead computing advantages by sampling a group of outputs for each prompt and normalizing rewards within the group. This has two practical benefits: it halves the memory footprint during training (no value model to maintain), and it reduces reward hacking because the optimization signal is relative rather than absolute. We apply noise-corrected GRPO from late-2025 research that models reward annotation noise as Bernoulli corruption and applies a correction factor, yielding unbiased gradient estimates even when annotators disagree.
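The core of GRPO is the group-relative advantage, which is small enough to show directly. A sketch, omitting the noise-correction factor (which depends on estimated annotator agreement):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO advantages: sample a group of outputs per prompt and normalize rewards
    within the group, so no separate value/critic model is needed.

    rewards: [num_prompts, group_size] scalar reward per sampled output.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)
```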

KL-regularized training. The policy update is regularized against both the SFT initialization (to prevent catastrophic forgetting) and the current policy (to prevent training instability). We use a dual-KL approach: forward KL against the SFT policy with coefficient 0.05 to maintain generation fluency, and reverse KL against a running average of recent policies with coefficient 0.01 to smooth optimization. This follows the unified regularization framework from recent ICLR work that balances reward hacking prevention against stable convergence.

Trajectory-level optimization. Rather than token-level importance ratios (which can produce high-variance gradient estimates on long sequences), we use trajectory-level probability ratios following TIC-GRPO. This treats the entire generated response as a single optimization unit, producing more stable training dynamics for the long-form outputs our generation tasks require (company assessments are typically 400-800 tokens).

Evaluation. RLHF-trained models are evaluated on a held-out preference test set using win-rate against the SFT baseline (judged by a panel of 3 human reviewers per comparison). Our current generation models achieve a 71-78% win rate over the SFT baseline across tasks, with the largest gains on tone calibration (where SFT models tend toward either generic positivity or excessive hedging).

Distributed training infrastructure and model serving

Training custom models at this scale requires infrastructure that most startups never build. The alternative — renting GPU time on spot instances and hoping your training run doesn’t get preempted — is how you waste months of engineering time on reproducibility failures. We invested in a purpose-built training and serving stack because the iteration speed it enables compounds over time.

Training infrastructure. Our training cluster runs on 8x H100 GPUs in a single node with NVLink interconnect. Continual pretraining runs use fully sharded data parallelism (FSDP) across all 8 GPUs with mixed-precision (BF16) training, ZeRO Stage 3 optimizer sharding, and gradient checkpointing to fit the full model in memory during the backward pass. A full continual pretraining run over our 4.2B-token career corpus completes in approximately 72 hours. Task-specific fine-tuning runs on 2-4 GPUs depending on model size, completing in 4-12 hours per task.

All training runs are managed through a reproducibility framework that snapshots the full configuration: model architecture, hyperparameters, data pipeline version, random seeds, library versions, and hardware topology. Every run produces a deterministic checkpoint that can be exactly reproduced from the configuration snapshot alone.

Model serving with speculative decoding. Production inference runs on a custom serving stack built on vLLM with continuous batching and PagedAttention for memory-efficient KV cache management. For our generative models, we use speculative decoding with a custom draft model trained using the Eagle3 architecture — a multi-layer draft head that predicts 6-8 tokens ahead, which the main model verifies in a single forward pass. This achieves 2.4-2.8x latency reduction on our generation tasks (company assessments go from ~4.2s to ~1.6s p50 latency) without any degradation in output quality, because speculative decoding is mathematically lossless — rejected draft tokens are re-sampled from the exact distribution of the main model.
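The losslessness comes from the acceptance rule, which is worth seeing. A per-token sketch; the distributions `p` and `q` are placeholders for the main and draft model outputs:

```python
import torch

def verify_draft_token(p: torch.Tensor, q: torch.Tensor, draft_token: int):
    """Lossless speculative decoding check for one draft token.

    p: main-model next-token distribution; q: draft-model distribution.
    Accept with probability min(1, p/q); on rejection, resample from the residual
    max(0, p - q), which keeps outputs distributed exactly as the main model's.
    """
    accept_prob = torch.clamp(p[draft_token] / q[draft_token], max=1.0)
    if torch.rand(()) < accept_prob:
        return draft_token, True
    residual = torch.clamp(p - q, min=0.0)
    residual = residual / residual.sum()
    return int(torch.multinomial(residual, num_samples=1)), False
```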

flowchart LR
    A[Request] --> B[Router]
    B --> C[Eagle3 Draft]
    C --> D[Main Model]
    D --> E[Accept/Reject]
    E --> F[Stream Output]
    B --> G[Encoders]
    G --> H[Classify/Extract]
    B --> I[GNN]
    I --> J[Match Score]

Quantization for encoder models. Our classification and extraction models (GlideExtract, GlideSkill, GlideMatch-Scorer) run in INT8 quantization using TensorRT, which reduces memory footprint by ~4x and inference latency by ~2.3x with less than 0.2% accuracy degradation on our task benchmarks. This is acceptable for these models because they produce discrete classifications or bounded scores, where small perturbations in logit values rarely change the output class.

Continuous model retraining with drift detection. Production models degrade as the real world shifts. New job titles emerge. New skills become relevant. Companies restructure. We run a streaming continual learning pipeline that monitors model performance in production through proxy metrics: user correction rates on extracted resumes, acceptance rates on recommended jobs, engagement rates on generated content. When any metric degrades by more than 2 standard deviations from its trailing 30-day mean — detected using Kolmogorov-Smirnov tests on the metric distributions — an automated retraining trigger fires.
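A sketch of the trigger check, assuming SciPy; the alpha and the window choice are illustrative:

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(recent: np.ndarray, trailing_30d: np.ndarray, alpha: float = 0.01) -> bool:
    """Fire the retraining trigger when the recent distribution of a proxy metric
    (e.g. resume-correction rates) diverges from its trailing 30-day baseline."""
    _statistic, p_value = ks_2samp(recent, trailing_30d)
    return p_value < alpha
```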

The retraining pipeline:

  1. Data collection: Aggregate the last 30 days of production feedback (corrections, acceptances, rejections)
  2. Active learning selection: Select the highest-information examples using uncertainty sampling — examples where the model’s confidence was lowest on inputs the user subsequently corrected
  3. Human review: Selected examples are routed to annotators for ground-truth labeling
  4. Champion-challenger training: A new model is trained on the augmented dataset and evaluated against the incumbent on the held-out test set plus a fresh sample of recent production data
  5. Automated promotion: If the challenger wins on all metrics with statistical significance (p < 0.05, corrected for multiple comparisons), it enters shadow deployment automatically

This cycle runs on a 4-6 week cadence. The infrastructure cost is a rounding error compared to the quality gains — a 1% improvement in extraction F1 cascades into better matching, better preparation, and better outcomes for every user downstream.

---
config:
    xyChart:
        width: 600
        height: 320
        xAxis:
            labelPadding: 18
        yAxis:
            labelPadding: 18
        plotReservedSpacePercent: 40
---
xychart-beta
    title "Model Quality Over Retraining Cycles (Extraction F1 %)"
    x-axis ["Baseline", "Cycle 1", "Cycle 2", "Cycle 3", "Cycle 4", "Cycle 5"]
    y-axis 85 --> 100
    line [87.1, 89.7, 91.4, 92.8, 93.6, 94.2]

Retrieval-augmented generation (RAG) pipeline

Several of Glide’s highest-value features depend on real-time information that no model — however well-trained — can contain in its weights. Company news from last week. A CEO departure announced yesterday. A funding round closed this morning. The RAG pipeline bridges the gap between what the model learned during training and what the world looks like right now.

flowchart TB
    A[Feature Trigger] --> B[Query Formation]
    B --> C[Multi-Source Research]
    C --> D[Dedup and Ranking]
    D --> E[Context Assembly]
    E --> F[Grounded Generation]
    F --> G[Fact Verification]
    G --> H[Output]

Query formulation with faceted decomposition. Glide constructs domain-optimized, faceted search queries parameterized by task type. For a company assessment, three parallel queries target orthogonal information facets (culture and employee experience, workforce movements, strategic direction). Queries are generated by a lightweight model trained specifically for search query formulation — not the generative model, which tends to produce verbose queries that hurt retrieval precision.

Result ranking with our domain embeddings. Search results are re-ranked using our career-domain embedding model. Each result is encoded alongside the original query, and the cosine similarity in our domain-tuned embedding space determines the relevance score — weighted by source authority (financial publications rank highest, unranked blogs lowest) and recency (exponential decay with a half-life of ~14 days). This outperforms the search provider’s native ranking by 12-18% on our relevance benchmark because our embeddings understand career-domain semantics that generic search engines don’t.
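As a rough illustration of how those signals combine — the blend below is a simplified stand-in, not the production formula; only the 14-day half-life comes from the description above:

```python
def rerank_score(cosine_sim: float, source_authority: float, age_days: float,
                 half_life_days: float = 14.0) -> float:
    """Toy re-ranking blend: embedding similarity scaled by source authority (0-1)
    and exponential recency decay with a ~14-day half-life."""
    recency = 0.5 ** (age_days / half_life_days)
    return cosine_sim * (0.6 + 0.4 * source_authority) * recency
```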

Factual verification pass. For features that produce factual claims (company data, headcount metrics, funding figures), a post-generation verification pipeline cross-references generated claims against structured data in our database. If the model claims a company has 10,000 employees but our structured data shows 2,000, the structured data wins — always. Every factual claim is traced back to either a specific research result or a structured data source. Ungrounded claims are either replaced with sourced alternatives or flagged with uncertainty markers.

Safety, guardrails, and adversarial robustness

Career data is sensitive. Resumes contain personal information, employment history, compensation expectations. The model layer is designed with multiple safety boundaries:

Input sanitization and adversarial defense. All user-supplied text passes through a multi-layer sanitization pipeline before entering any model context. A lightweight classifier trained on 12,000 adversarial examples (prompt injection attempts, delimiter escapes, instruction smuggling patterns) flags suspicious inputs before they reach the model. Flagged inputs are sanitized and wrapped in explicit data boundary tokens that the model was trained to respect during fine-tuning — this is enforced in the weights, not just the prompt.

PII detection and redaction. Generated content passes through a custom NER model (fine-tuned from our encoder backbone) that detects personally identifiable information in output contexts where it shouldn’t appear. The model identifies 14 PII entity types (emails, phone numbers, addresses, SSN patterns, salary figures, employer names in cross-candidate contexts) with 97.2% recall. Detected PII in inappropriate output contexts is redacted before display.

Hallucination mitigation through factual grounding. For factual features, the RAG pipeline grounds every claim in verifiable sources. The verification pass cross-references generated content against structured data. The combination of domain-specific training (which reduces hallucination at the generation level) and post-generation verification (which catches what remains) achieves a measured hallucination rate of 1.2% on our factual accuracy benchmark — compared to 8-14% for general-purpose models on the same inputs.

Graceful degradation with circuit breakers. Every AI feature has a non-AI fallback path with circuit-breaker logic. If inference fails, the circuit breaker trips and the system routes to cached results, curated content libraries, or a clear user-facing degradation message. The platform never shows a blank screen because a model failed. Fallback paths are tested independently in CI.

Structured audit logging. Every inference call is logged with input context (PII-sanitized), output, end-to-end latency, token counts, model version, cache hit/miss status, and downstream user action. This telemetry powers debugging, quality regression detection, retraining trigger evaluation, and compliance.

Resume intelligence pipeline

This is where the candidate’s relationship with Glide begins, and it has to be flawless. If the system misparses your resume — merges two jobs into one, hallucinates a skill you don’t have, misses a certification — every downstream feature inherits that error. Bad extraction cascades into bad matching, bad preparation, bad advice. The resume pipeline is the foundation. It’s also one of the hardest engineering problems in the system.

You upload an unstructured document — PDF, DOCX, plain text, sometimes a screenshot — and the system transforms it into a normalized, queryable candidate profile with structured entities, validated relationships, and inferred metadata. This is a document understanding problem, not a text extraction problem.

The pipeline:

flowchart TB
    A[Resume Upload] --> B[Extract + Clean]
    B --> C[GlideExtract]
    C --> D[Enrichment]
    D --> E[Match Profile]

    subgraph Extraction
        direction TB
        C1[Work Experience]
        C2[Projects]
        C3[Education]
        C4[Certifications]
        C5[Skills]
    end
    C --> Extraction

    subgraph Enrichment
        direction TB
        D1[Project Derivation]
        D2[Date Validation]
        D3[Deduplication]
        D4[Skill Normalization]
    end
    D --> Enrichment

The cleaning stage matters more than it sounds. Resumes arrive in every imaginable format — multi-column PDFs, creative portfolios with embedded graphics, academic CVs with publication lists, documents with invisible Unicode characters that break tokenization. The system normalizes to UTF-8, strips control characters and zero-width joiners, collapses redundant whitespace, and detects section boundaries using a combination of capitalization patterns, horizontal rules, whitespace density gradients, and font-size change signals extracted from the document structure. Without this preprocessing, the extraction model hallucinates section breaks, merges separate roles into one, or misattributes achievements to the wrong employer.
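A simplified sketch of the kind of cleaning involved (the real preprocessing also uses layout and font-size signals, which plain text can't capture):

```python
import re
import unicodedata

def clean_resume_text(raw: str) -> str:
    """Normalize Unicode, drop control/format characters (including zero-width joiners),
    and collapse redundant whitespace before extraction."""
    text = unicodedata.normalize("NFKC", raw)
    text = "".join(ch for ch in text
                   if unicodedata.category(ch)[0] != "C" or ch in ("\n", "\t"))
    text = re.sub(r"[ \t]+", " ", text)          # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)       # collapse excess blank lines
    return text.strip()
```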

The extraction model (GlideExtract) runs in deterministic mode with greedy decoding. It’s our custom document understanding model — a hybrid architecture with a layout encoder that jointly models text tokens and spatial bounding box positions, feeding into a sequence labeling head with a CRF output layer. The CRF enforces structural consistency: a “role title” token can’t follow an “education institution” token without an intervening section boundary. This architectural constraint eliminates an entire class of extraction errors that general-purpose models make on resumes with unconventional layouts.

The extraction schema defines five entity types: work experience, projects, education, certifications, and skills.

Output format is enforced as structured JSON at the inference layer.

After extraction, a secondary enrichment pass runs. If the model fails to extract standalone projects (common in experience-heavy resumes), the system derives project entities from the key projects, achievements, and responsibilities fields within each work experience entry. Every extracted entity is validated against business rules: work experience requires at minimum a role title and organization, date ranges are checked for logical consistency, education requires at minimum an institution name, and duplicate entries are detected and merged.

Skill normalization is its own pipeline. Lowercasing, whitespace trimming, alias resolution through a curated synonym graph (“JS” → “JavaScript”, “ML” → “Machine Learning”, “k8s” → “Kubernetes”), and removal of overly generic terms that don’t carry signal for matching. The synonym graph is maintained as a versioned knowledge base with ~2,400 entries covering common abbreviations, vendor-specific terminology, and cross-domain aliases.
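In code, the normalization step is conceptually this simple — the alias map below is a tiny illustrative slice, not the ~2,400-entry knowledge base:

```python
SKILL_ALIASES = {"js": "JavaScript", "ml": "Machine Learning", "k8s": "Kubernetes"}
GENERIC_TERMS = {"team player", "hard working", "microsoft office"}   # carry no matching signal

def normalize_skill(raw: str) -> str | None:
    skill = " ".join(raw.lower().split())        # lowercase + trim/collapse whitespace
    if skill in GENERIC_TERMS:
        return None                              # dropped from the match profile
    return SKILL_ALIASES.get(skill, raw.strip()) # resolve alias or keep the original form
```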

Job matching and ranking engine

This is the core of what makes Glide different from a search engine with filters. A job board shows you listings that match your keywords. Glide evaluates whether a job is actually right for you — considering your skills, your experience, the evidence behind your claimed expertise, the seniority alignment, and the role’s real requirements (not just its title). The difference between “this posting mentions React” and “you have three years of production React experience and this role needs exactly that” is the difference between a search result and a recommendation you can trust.

Every incoming job is scored against your profile through a multi-stage pipeline that combines deterministic pre-filtering, GNN-powered relational scoring, cross-encoder semantic evaluation, and weighted composite scoring — processing hundreds of jobs per user per refresh cycle while keeping inference costs bounded.

flowchart TB
    A[Incoming Jobs] --> B[Match Profile]
    B --> C[Pre-Screening]
    C -- Filtered --> X[Discarded]
    C -- Pass --> D[Skill Verify]
    D --> E[Composite Score]
    E --> F[Ranking]
    F --> G[Your Pipeline]

Stage 1: Candidate match profile construction

Before any jobs are scored, the system builds a Candidate Match Profile. A distilled representation of your qualifications optimized for comparison.

The match profile contains:

The three-tier skill weighting is important. Saying “I know Python” on your profile doesn’t carry the same weight as having three years of Python projects in your work history. The system calibrates this automatically from the resume extraction.

Stage 2: Batch pre-screening

Incoming jobs are processed in batches through a pre-screening filter before full scoring. This is a cost-optimization step.

The pre-screen evaluates three binary signals per job:

Jobs that fail any check are discarded before reaching the scoring stage. This filters out a significant portion of raw scraped results, reducing inference costs downstream.

Stage 3: Skill verification

For jobs that pass pre-screening, the system performs per-skill verification against the job’s requirements. Each required skill is evaluated:

This isn’t keyword matching. It’s contextual semantic evaluation. The system doesn’t just check if “React” appears in your profile. It assesses the strength of evidence for React proficiency based on what you’ve actually built, how recently you used it, and in what capacity — a candidate who led a React migration at scale scores differently than one who completed a tutorial project.

Stage 4: Composite scoring

The final match score is a weighted combination of five factors: required skill coverage, optional skill coverage, skill proficiency, role alignment, and experience fit.

The weights are configurable per user through a match strictness preference. Strict mode prioritizes skill coverage heavily. Relaxed mode shifts weight toward role alignment and experience, making it easier to discover adjacent opportunities. Your pipeline, your rules.
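The weighting itself is straightforward. A sketch using the strict and relaxed profiles plotted below, assuming each factor has already been scored on a 0-1 scale:

```python
WEIGHTS = {
    "strict":  {"skills": 0.40, "optional": 0.10, "proficiency": 0.20,
                "role_fit": 0.15, "experience": 0.15},
    "relaxed": {"skills": 0.20, "optional": 0.10, "proficiency": 0.15,
                "role_fit": 0.30, "experience": 0.25},
}

def composite_match_score(factors: dict[str, float], mode: str = "strict") -> float:
    """Weighted blend of the five factor scores; `mode` reflects the user's strictness preference."""
    weights = WEIGHTS[mode]
    return sum(weights[name] * factors[name] for name in weights)
```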

---
config:
    xyChart:
        width: 600
        height: 320
        xAxis:
            labelPadding: 18
        yAxis:
            labelPadding: 18
        plotReservedSpacePercent: 40
---
xychart-beta
    title "Match Score Weights (%): Strict (bar) vs Relaxed (line)"
    x-axis ["Skills", "Optional", "Proficiency", "Role Fit", "Experience"]
    y-axis 0 --> 50
    bar [40, 10, 20, 15, 15]
    line [20, 10, 15, 30, 25]

Fallback scoring: when skill verification produces a zero score (typically due to sparse job descriptions), a fallback scorer activates using lightweight heuristics to produce a baseline score. Jobs are never silently dropped.

Stage 5: Pipeline ranking

Jobs in the pipeline are ranked using a composite formula that blends match score, skill coverage, proficiency, company reputation, and employee ratings. Company reputation is intentionally weighted higher than its raw scale would suggest. Empirically, seeing a well-rated company at the top of the pipeline builds trust in the system. Trust matters early.

The job pipeline

The pipeline is the heart of the Glide experience. It’s where you manage every opportunity from first discovery through to outcome.

flowchart LR
    A[Pipeline] --> B[Shortlisted]
    B --> C[Applied]
    C --> D[Interview]
    D --> E[Hired]
    A --> F[Archived]
    B --> F
    C --> F
    D --> F

Six stages. Pipeline, Shortlisted, Applied, Interview, Hired, Archived. Glide automatically populates the Pipeline stage with matched opportunities. The platform aims to maintain 50 or more relevant jobs per user at any given time. As you move jobs between stages, Glide tracks every transition and uses them to generate analytics.

Each job in the pipeline shows the title, company name and logo, location, employment type, salary (when available), match score, skill match percentage, and Glassdoor ratings. You can filter by keywords, location, match score, seniority level, and stage. You can sort by date added, match score, or company name.

Glide periodically checks whether listed jobs are still active, so you’re not wasting time on stale postings.

Company stability scoring

One of the most consequential decisions a job seeker makes is which companies to invest their time in. Apply to a company that’s quietly imploding, and you’ve wasted weeks of emotional energy on a role that may not exist in six months. But how would you know? Companies don’t advertise instability. Job postings from companies about to do layoffs look identical to postings from companies about to double their headcount.

This is one of the features I’m most proud of. And it’s entirely rule-based. No model inference. Zero inference cost. Completely reproducible.

The Stability Score rates a company from 0 to 100 on how stable and reliable it is as an employer. The model evaluates eight factors, each with its own weight: average employee tenure, headcount trajectory, year-over-year growth, company age, company size, employee sentiment, funding history, and market tier.

The final score is a weighted average across all factors, mapped to a label: Strong, Moderate, Fair, Weak, or At Risk.
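A sketch of the calculation — the factor weights mirror the chart below; the per-factor scores are assumed to be pre-normalized to 0-100, and the label thresholds here are illustrative:

```python
STABILITY_WEIGHTS = {
    "tenure": 0.22, "headcount": 0.16, "yoy_growth": 0.14, "company_age": 0.12,
    "size": 0.10, "sentiment": 0.10, "funding": 0.09, "market_tier": 0.07,
}

def stability_score(factors: dict[str, float]) -> tuple[float, str]:
    """Deterministic weighted average of eight factor scores, mapped to a label."""
    score = sum(STABILITY_WEIGHTS[k] * factors[k] for k in STABILITY_WEIGHTS)
    for threshold, label in [(80, "Strong"), (65, "Moderate"), (50, "Fair"), (35, "Weak")]:
        if score >= threshold:
            return score, label
    return score, "At Risk"
```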

---
config:
    xyChart:
        width: 650
        height: 320
        xAxis:
            labelPadding: 18
        yAxis:
            labelPadding: 18
        plotReservedSpacePercent: 40
---
xychart-beta
    title "Stability Score: Factor Weights (%)"
    x-axis ["Tenure", "Headcount", "YoY", "Age", "Size", "Sentiment", "Funding", "Tier"]
    y-axis 0 --> 25
    bar [22, 16, 14, 12, 10, 10, 9, 7]

No model inference. No hallucination risk. Reproducible, explainable, and zero cost per calculation.

News impact scoring and sentiment

Alongside the Stability Score, the news system evaluates the significance and emotional tone of every article about a company. Also entirely rule-based. No model inference.

The impact score (0-100) is a weighted composite of five signals:

Keyword Severity: Articles are scanned for keywords grouped into severity tiers, from critical events (bankruptcy, mass layoffs) down to routine updates (office openings, awards). The highest-severity keyword found determines the tier score.

Source Authority: The publishing source is classified into credibility tiers. Major wire services and financial publications carry the most weight. Industry blogs and unranked sources carry less.

Content Magnitude: Quantitative signals within the article text (dollar amounts, employee counts, percentages) are extracted and mapped to magnitude bands. “Laid off 5,000 employees” scores higher than “laid off 50 employees.”

Recency: An exponential decay function based on article age. Articles lose relevance over time with a half-life of roughly two weeks.

Linguistic Gravity: Measures the intensity of language through lexical analysis. Sentences containing superlatives, urgency markers, and definitive language score higher than hedged or speculative phrasing.
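Put together, the composite looks like this — a sketch where the signal weights match the breakdown below and each signal is assumed to be pre-scored 0-100 by its own rule-based extractor:

```python
IMPACT_WEIGHTS = {
    "keyword_severity": 0.30, "source_authority": 0.25,
    "content_magnitude": 0.20, "recency": 0.15, "linguistic_gravity": 0.10,
}

def recency_signal(age_days: float, half_life_days: float = 14.0) -> float:
    """Exponential decay with a roughly two-week half-life."""
    return 100.0 * 0.5 ** (age_days / half_life_days)

def impact_score(signals: dict[str, float]) -> float:
    """Weighted composite of the five rule-based signals (0-100)."""
    return sum(IMPACT_WEIGHTS[k] * signals[k] for k in IMPACT_WEIGHTS)
```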

pie title News Impact Score: Signal Weight Breakdown (%)
    "Keyword Severity — 30%" : 30
    "Source Authority — 25%" : 25
    "Content Magnitude — 20%" : 20
    "Article Recency — 15%" : 15
    "Linguistic Gravity — 10%" : 10

Sentiment classification uses a lexicon-based approach. The article text is tokenized and scanned against positive and negative word dictionaries, producing a sentiment label of positive, negative, mixed, or neutral.

Category assignment uses a priority-ordered keyword scan across categories including Layoffs, Acquisition, Funding, Legal, Financial, Leadership, Hiring, Expansion, and more. The design ensures the most consequential interpretation is chosen when an article spans multiple topics. An article about layoffs that also mentions restructuring gets categorized as “Layoffs,” not the softer “Leadership.”

Company assessment generation (The Rundown)

The Rundown produces a concise, opinionated career perspective on any company. This is a full retrieval-augmented generation pipeline — real-time research, multi-source synthesis, and structured generation with inline semantic markup.

Before generation, the system performs three parallel web research queries:

  1. Careers and culture: recent articles about workplace culture, employee experience, hiring practices
  2. Workforce movements: layoffs, hiring surges, restructuring, headcount changes
  3. Growth and strategy: strategic direction, market position, product launches, competitive moves

Search results are deduplicated by URL and ranked by relevance. The top results from each query are concatenated into a research context block.

The research context is combined with the company’s structured profile data (headcount, growth rate, funding history, industry, tenure metrics) and fed to our domain-tuned generative model with schema-constrained decoding. The output is structured into a strategic overview, a risk and opportunity assessment, and a synthesized recommendation from a career perspective.

The output includes inline semantic highlighting: positive signals wrapped in positive markers, negative signals in negative markers, contextual information in neutral markers. This is achieved through HTML markup in the output, enabling the frontend to render color-coded text without additional NLP processing.

Assessments are cached and periodically refreshed. Company-level strategic narratives change slowly enough that regular updates capture meaningful shifts without unnecessary regeneration.

Company Intel: the full picture

Remember the candidate spending hours piecing together Glassdoor reviews, LinkedIn posts, and news articles before an interview? Company Intel is the answer to that. It’s what Glassdoor would look like if it cared about helping you make a career decision rather than selling job ads.

flowchart LR
    A[Company Intel] --> B[Profile]
    A --> C[Stability]
    A --> D[Rundown]
    A --> E[Trends]
    A --> F[Talent Flow]
    A --> G[Funding]
    A --> H[News]
    A --> I[Reviews]

Company Profiles: a complete picture of any company. Identity, industry, size, type, headquarters, founding year, stock ticker for public companies, direct links to website and social profiles.

Key Statistics: employee count, average employee tenure, 12-month headcount growth, total funding raised, number of funding rounds. Immediate snapshot.

Company DNA: growth signal (actively growing, stable, declining), peak headcount and distance from peak, business stage, market tier, culture tags describing what it’s like to work there.

Year-over-Year Growth: headcount changes by year in a visual format, revealing long-term workforce trends.

Employee Trends: headcount over time with both yearly and monthly views. Hiring phases, freezes, layoffs, recovery periods. All visible.

Department Insights: explore individual departments within a company. Their headcount trends, top skills, where employees come from, where they go when they leave.

Department Growth: which departments are expanding and which are contracting over the last 12 months.

Department Comparison: compare up to three departments side by side on size, growth, skills, and trends.

Talent Flow: employee movement patterns. Where the company’s employees come from, where they go when they leave, which companies have the strongest two-way talent exchange.

Funding Journey: timeline of every funding round. Round type, date, amount raised, lead investors.

Company News: most relevant recent news, categorized by topic (layoffs, hiring, expansion, acquisition, funding, leadership, product, financial, legal, security, labour, general) and ranked by impact using the scoring system described above.

Feedback and Reviews: community-sourced ratings and written reviews across seven dimensions. Overall, work-life balance, compensation, job security, management, culture, career growth. All anonymous. Users can rate reviews as helpful.

This is the research that used to take hours, distilled into something you can absorb in minutes. Not vibes. Data.

Design decisions I keep getting asked about (and the convictions behind them)

Custom embeddings, not off-the-shelf vector search. General-purpose embedding models produce representations where “Data Scientist” and “Data Entry” are dangerously close neighbors. We train our own career-domain embeddings specifically to fix this — but we don’t use them as the primary matching signal. Embeddings power retrieval and pre-filtering (finding candidate jobs to evaluate). The actual matching decision runs through the GNN and cross-encoder pipeline, which reasons over relational structure and produces interpretable scores. Interpretability matters when users need to understand why a job was recommended — an embedding cosine similarity offers no explanation; a skill-by-skill breakdown does.
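In pipeline terms, that split looks something like the sketch below. Every name in it (vector_index, score_match, profile.embedding) is hypothetical; the point is the two-stage shape, cheap vector retrieval to build a pool, then the more expensive, interpretable scoring to make the call.

```python
# Two-stage matching as described above: embeddings narrow the candidate set,
# the relational scoring pipeline makes the actual decision. All object and
# function names here are hypothetical stand-ins, not Glide's API.

def recommend_jobs(profile, vector_index, score_match, top_n=20, pool_size=500):
    # Stage 1: cheap pre-filter. Similarity search over career-domain
    # embeddings retrieves a broad pool of plausible jobs.
    candidate_jobs = vector_index.search(profile.embedding, k=pool_size)

    # Stage 2: interpretable scoring. Each candidate gets a skill-by-skill
    # breakdown rather than a bare cosine-similarity number.
    scored = [(job, score_match(profile, job)) for job in candidate_jobs]
    scored.sort(key=lambda pair: pair[1].overall, reverse=True)
    return scored[:top_n]
```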

Full-parameter fine-tuning, not LoRA adapters. We benchmarked LoRA extensively. It’s efficient and elegant. It also leaves 4-5 F1 points on the table for our most critical tasks. When the difference is between a system users trust and one they correct, those points justify the additional training cost. We train task-specific models with full weight updates on our own career-domain data, served through our own inference stack.

Deterministic scoring where possible. Stability Score, News Impact Score, and several other high-frequency features are implemented as pure rule-based systems rather than model-inferred scores. Reproducibility, explainability, and zero inference cost. These systems produce identical outputs for identical inputs — no stochastic variation, no model drift, no regression risk from checkpoint updates.

Schema-constrained decoding with sparse grammar enforcement. Wherever downstream systems consume model output, structured JSON schemas are enforced at the token-generation level through grammar-constrained sampling using sparse CSR matrix operations. This eliminates an entire class of integration bugs and makes the model layer’s contract with the application layer explicit, testable, and version-controlled.
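If you haven't seen grammar-constrained sampling before, here is a deliberately tiny illustration of the principle: at every step the grammar determines which tokens are legal, and everything else is masked out of the logits before sampling. The toy vocabulary and hand-written "grammar" below are placeholders; the real system compiles full JSON schemas, but the masking idea is the same.

```python
import numpy as np

# Toy illustration of grammar-constrained decoding. A tiny hand-written
# "grammar" only allows outputs of the form {"score": <digit>}; illegal
# tokens are masked to -inf at every step. Vocabulary, grammar, and the
# logits function are placeholders, not the production schema machinery.

VOCAB = ["{", "}", '"score"', ":", "0", "1", ","]

def allowed_tokens(prefix: list[str]) -> set[int]:
    sequence = ["{", '"score"', ":", "<digit>", "}"]
    pos = len(prefix)
    if pos >= len(sequence):
        return set()                       # output is complete
    if sequence[pos] == "<digit>":
        return {VOCAB.index("0"), VOCAB.index("1")}
    return {VOCAB.index(sequence[pos])}

def constrained_decode(logits_fn, max_steps=5) -> str:
    output: list[str] = []
    for _ in range(max_steps):
        legal = allowed_tokens(output)
        if not legal:
            break
        logits = logits_fn(output)              # shape: (len(VOCAB),)
        mask = np.full(len(VOCAB), -np.inf)
        mask[list(legal)] = 0.0                 # keep only grammar-legal tokens
        output.append(VOCAB[int(np.argmax(logits + mask))])
    return "".join(output)

# constrained_decode(lambda prefix: np.random.randn(len(VOCAB)))
# always yields well-formed output like '{"score":1}', regardless of the logits.
```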

Speculative decoding for latency, not quality compromise. We don’t trade output quality for speed. Speculative decoding with our Eagle3 draft model delivers a 2.4-2.8x reduction in latency while producing mathematically identical outputs to non-speculative inference. The draft model is a small (150M parameter) multi-layer head trained to predict the main model’s token distribution — rejected drafts are re-sampled from the exact target distribution. No quality loss. Just faster.
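The acceptance rule is worth seeing, because it is why there is no quality trade-off. A minimal sketch of the standard accept-or-resample step, with the model calls stubbed out and the probability distributions passed in:

```python
import numpy as np

# Minimal sketch of the standard speculative-decoding acceptance rule:
# accepted drafts are kept, a rejected draft is replaced by a sample from the
# corrected residual distribution, so the final tokens follow the target
# model's distribution exactly. Model calls are stubbed out.

def accept_or_resample(draft_tokens, draft_probs, target_probs, rng):
    """draft_probs[i], target_probs[i]: full vocab distributions at position i."""
    accepted = []
    for i, token in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        if rng.random() < min(1.0, p[token] / q[token]):
            accepted.append(token)          # draft token kept
            continue
        # Rejected: resample from the renormalized residual max(0, p - q).
        # This correction is what makes the output distribution exact.
        residual = np.maximum(p - q, 0.0)
        residual /= residual.sum()
        accepted.append(int(rng.choice(len(p), p=residual)))
        break                               # stop at the first rejection
    # (The full algorithm also samples one bonus token from the target model
    # when every draft is accepted; omitted here for brevity.)
    return accepted
```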

Graph-based matching over feature engineering. Traditional job matching systems engineer features from candidate and job data and train a classifier. We encode the full relational structure — skills, roles, companies, industries, career trajectories — in a heterogeneous knowledge graph and let the GNN learn which relationships matter. This captures transitivity (engineers from Company A frequently join Company B), career pathways (data analysts who become product managers), and market signals (growing skill demand in a specific industry) that no feature-based system can represent.
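As a rough illustration of what "encode the relational structure" means, here is how a heterogeneous graph along these lines could be declared in PyTorch Geometric. The node types, feature sizes, and random edges are placeholders, not Glide's actual schema.

```python
import torch
from torch_geometric.data import HeteroData

# Illustrative heterogeneous graph: skills, roles, companies, and transitions
# are explicit typed relations the GNN can reason over, rather than flattened
# feature columns. Counts, dimensions, and edges are random placeholders.

def random_edges(num_src: int, num_dst: int, num_edges: int) -> torch.Tensor:
    return torch.stack([
        torch.randint(0, num_src, (num_edges,)),
        torch.randint(0, num_dst, (num_edges,)),
    ])

graph = HeteroData()

# Node features
graph["candidate"].x = torch.randn(1_000, 64)
graph["skill"].x = torch.randn(5_000, 64)
graph["role"].x = torch.randn(800, 64)
graph["company"].x = torch.randn(2_000, 64)

# Typed edges (source row, target row)
graph["candidate", "has_skill", "skill"].edge_index = random_edges(1_000, 5_000, 10_000)
graph["role", "requires", "skill"].edge_index = random_edges(800, 5_000, 6_000)
graph["candidate", "worked_at", "company"].edge_index = random_edges(1_000, 2_000, 4_000)
# Company-to-company flow captures "engineers from A frequently join B"
graph["company", "sends_talent_to", "company"].edge_index = random_edges(2_000, 2_000, 3_000)
```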

The part that matters most

I’ve now spent thousands of words describing architecture, scoring models, caching strategies, and inference pipelines. That’s the how.

Here’s the why.

I’m building Glide because I’ve seen brilliant people stay in jobs that were slowly hollowing them out — not because they couldn’t leave, but because the process of leaving felt so overwhelming that staying seemed easier. I’ve watched the light go out of someone’s eyes when they describe their sixth month of searching. Not frustration. Something worse. The quiet resignation of someone who has started to believe the system’s verdict: that they’re not good enough.

They are good enough. The system is just too stupid to notice.

The numbers confirm what anyone who’s searched for work already knows in their bones. About 40% of white-collar job seekers in 2024 did not receive any interviews over extended search periods despite sending dozens of applications. Between 52% and 61% were ghosted. Over half encountered discriminatory questions. Over half found the job didn’t match the posting. Underrepresented groups were 67% more likely to be ghosted. And 20% of candidates rejected offers specifically because the interview experience was so poor — meaning the system fails even the people it selects.

These aren’t edge cases. They’re the median experience.

The emotional toll of this is treated as an unavoidable cost — just something you endure, like turbulence on a flight. But it’s not turbulence. It’s a design failure. Sending 32 to 200+ tailored applications, preparing for multiple interviews across weeks, and navigating ambiguous outcomes with zero feedback creates a psychological burden that discourages people from pursuing better-fit opportunities. People settle. People stop trying. People with extraordinary potential accept mediocre situations because the alternative — the process — is too painful to face again.

Job searching shouldn’t require six months, a spreadsheet, fourteen browser tabs, three resume versions, and a therapist.

It should require one tool that actually understands you. That’s what I’m trying to build.

What Glide is not

Glide is not a job board. It doesn’t list jobs and leave you to figure out the rest. It works alongside you from the moment you start exploring to the moment you evaluate an offer — and beyond, with career pathway planning that extends years into the future.

Glide is not an automation tool that applies to jobs on your behalf. I’ve seen what mass-application tools do. They treat hiring as a numbers game, flooding recruiters with low-signal noise and making the problem worse for everyone. Hiring isn’t a numbers game. It’s a matching problem, and matching requires understanding both sides deeply.

Glide is not a replacement for human judgment. The AI provides data, structure, context, and preparation. The decisions are yours. The career is yours. The system should make you more informed, not more passive.

Glide is a career intelligence platform. One that removes the busywork, fills the information gaps, gives you the analytical tools that have always existed on the employer’s side of the table, and lets you focus on the thing that actually matters: finding work that fits who you are and where you want to go.

Why now

Everything I’ve described has been true for years. So why now?

Because three forces are converging in a way that makes this moment uniquely urgent — and uniquely solvable.

The market has structurally tilted against knowledge workers. Analysts increasingly describe the conditions facing college-educated professionals in 2024–2026 as a “white-collar recession,” despite broader economic resilience. Hiring of high-earning professionals dropped to its lowest level since 2014. US job vacancies slid to 7.6 million by December 2024, the weakest level since late 2020. Average monthly job growth fell to about 203,000 over the trailing twelve months, down sharply from the post-pandemic rebound. Applications surged far faster than openings, shifting bargaining power decisively toward employers. The Robert Walters Global Jobs Index reveals gut-wrenching volatility: professional vacancies in January 2025 were 54% higher than December 2024, then dropped 8.3% in February 2025. Technology, media, and energy saw contraction in white-collar postings. Global unemployment may look stable at 190–200 million, but that headline masks the reality: white-collar roles in technology and corporate services are under more pressure than blue-collar work. The middle class is being squeezed from both sides.

AI is simultaneously the threat and the opportunity. The World Economic Forum’s Future of Jobs work shows that AI, automation, and digitization are reshaping task compositions in knowledge work — increasing demand for advanced analytical skills while reducing some routine white-collar tasks. Many candidates apply for roles where their skills are only partially aligned, contributing to high application volumes and low conversion. Meanwhile, 60% of workers blame AI for making the job market more challenging, and candidates worry that AI screening tools may introduce or amplify bias while reducing transparency. But the same technology that’s disrupting the labor market can be used to give individuals the intelligence, context, and preparation they need to navigate it. That’s the asymmetry I’m betting on.

The capability finally exists to build what’s needed. Two years ago, open-weight foundation models weren’t strong enough to serve as starting points for domain-specific training. Five years ago, the GPU infrastructure to train custom models was prohibitively expensive. Today, open-weight checkpoints combined with domain-specific continual pretraining, graph neural networks for relational reasoning, and efficient serving infrastructure (speculative decoding, quantized encoder models, PagedAttention) make it possible to build and deploy custom AI systems at a cost that’s declining fast enough to sustain a business. The components to build Glide didn’t exist when the problem first became obvious. They exist now.

Economic headwinds — high interest rates, corporate caution, geopolitical tension — are causing organizations to defer white-collar hiring, focusing on productivity gains from technology rather than headcount growth. Remote and hybrid work have expanded geographic competition for every role. Skills have a shorter half-life than ever.

Professionals navigating this need more than listings. They need intelligence. They need context. They need a system that adapts as fast as the market does.

Traditional job platforms were built for a slower, simpler world. They assumed you’d search locally, apply to a handful of roles, hear back in a reasonable time, and negotiate from a position of stability. That world is gone.

The tools didn’t keep up. Glide is built for the world that replaced it.

The road ahead

Glide launched with a waitlist because we’re onboarding in small batches. Not as a growth hack. Because the product needs to work deeply for each person before we scale it broadly. Career tools that feel generic are worse than useless — they waste time and erode trust. Personalization at this level requires care, iteration, and the willingness to be slow when the market rewards fast.

We’re building for the job seeker first. The recruiter tools, the talent network features, the employer-side analytics. Those matter, and they’re coming. But the foundation is the individual. The person sitting at their laptop at midnight, staring at a job board, wondering if they’ll ever hear back from any of the 47 applications they’ve submitted this month.

That person has been underserved by every tool the industry has built for twenty years. Not because the technology wasn’t available. Because nobody bothered to build it for them. The business model of every major job platform is built around selling access to candidates, not serving them. The incentives point the wrong direction.

I don’t think it has to be that way. I think you can build a company that’s genuinely, structurally good for the people it serves — and that the business model follows from that, not the other way around.

That person deserves better than what exists.

That’s why I’m building Glide.

