Post-Sale Management
Health Score Models: Designing Effective Customer Health Scoring
A SaaS company tracked customer health using a simple model: green if they logged in this month, yellow if they didn't, red if they hadn't logged in for two months.
The problem: Their churn rate was 15%, but they only predicted 40% of churned customers. Even worse, 30% of their "green" customers churned anyway.
The VP of Customer Success asked: "Why is our health score so bad at predicting anything?"
They dug into the data and found:
- Login frequency alone was basically useless for predicting retention
 - They weren't measuring engagement quality, relationship depth, or whether customers actually saw value
 - Every signal got equal weight, even though some mattered way more than others
 - They missed declining patterns because they only looked at the current month
 - A one-size-fits-all approach meant enterprise and SMB customers got scored identically
 
So they rebuilt their health score from scratch:
- Multiple dimensions: usage, engagement, sentiment, relationship, value
 - Weighted scoring based on what actually predicted retention (usage 35%, engagement 20%, etc.)
 - Trending and momentum tracking—because direction matters as much as the score itself
 - Different models for different segments (enterprise vs SMB have different "healthy" baselines)
 - Quarterly validation against actual renewal outcomes
 
Six months later:
- They predicted 82% of churned customers (up from 40%)
 - False positives dropped 60% (way fewer healthy accounts getting flagged as at-risk)
 - Intervention success rate jumped 45% (because they were acting on real signals, not noise)
 - They identified 25 expansion opportunities they would have missed before
 
The lesson: Not all health scores are created equal. Building one that actually works takes thoughtful design, continuous validation, and a willingness to keep refining it.
Health Score Fundamentals
Purpose and Use Cases
What Health Scores Actually Do: A customer health score quantifies the likelihood that a customer will achieve their goals, stick around long-term, and grow their relationship with you. That's the theory, anyway. In practice, it's your answer to "Should I be worried about this account?"
Here's where you'll actually use them:
CSM Prioritization:
- Which accounts need me to drop everything and call them right now?
 - Where should I spend my limited time today?
 - Which accounts are fine with quarterly check-ins?
 
Risk Management:
- Which customers might churn if I don't do something?
 - How bad is it—yellow alert or red alert?
 - Do I need to intervene this week or can it wait?
 
Opportunity Identification:
- Which accounts are ready for an expansion conversation?
 - Where can I push for deeper adoption without being annoying?
 - Who's happy enough to become a reference customer?
 
Forecasting:
- What's our retention rate looking like next quarter?
 - How much revenue might walk out the door?
 - What's realistically in our expansion pipeline?
 
Executive Reporting:
- Overall portfolio health (the dashboard executives actually look at)
 - How things are trending month to month
 - Whether our initiatives are working or we're just busy
 
Types of Health Scores
You've got three basic flavors of health scores, and they build on each other in complexity.
Descriptive Health Scores: These tell you where things stand right now. "This customer is healthy" or "this one's at risk." They look at recent behavior and current metrics. This is what most companies start with, and honestly, where many stay.
Example: Account XYZ has 75% active users, attended their last QBR, and gave you an NPS of 8. Health score: 78 (Healthy). Simple snapshot of where they are today.
Predictive Health Scores: These try to tell you where things are headed. "This customer will probably churn in 90 days based on their current trajectory." They look at patterns and trends over time. You need decent historical data to pull this off.
Example: Account XYZ's usage is declining 30% per month. Right now they're at a "moderate" 65, but if you run the numbers, they'll hit 42 (At Risk) in 90 days. The insight? Intervene now while you still have a relationship, not when they're already one foot out the door.
Prescriptive Health Scores: These tell you what to do about it. "This customer needs re-onboarding, here's the playbook." They compare patterns from similar accounts to recommend specific actions. This is the most sophisticated approach and usually needs machine learning or a really good data science team.
Example: Account XYZ has a health score of 58. Your system spots that accounts with similar patterns improved by 12-15 points after a targeted feature adoption campaign. Recommended action: Launch the same playbook for this account.
Which one should you build? Start with descriptive—it's your foundation. Add predictive once you have enough historical data to spot patterns. Only build prescriptive if you have the data science resources and enough accounts to make the patterns meaningful.
Score Components and Dimensions
Here are the dimensions most companies track, roughly in order of how much they matter:
1. Product Usage and Adoption (30-40% weight)
- Active users (both the raw number and percentage of licenses they're paying for)
 - Login frequency
 - Feature breadth (how many features they actually use)
 - Feature depth (are they power users or just scratching the surface?)
 - Usage trends (growing, flat, or declining)
 
Why it matters: Usage predicts retention better than anything else. Customers who use your product stick around. Customers who don't are already halfway out the door.
2. Engagement and Activity (15-25% weight)
- How often your CSM talks to them
 - Whether they show up to QBRs
 - Training and webinar attendance
 - Community involvement
 - Email engagement (opens, clicks, responses)
 - How quickly they respond when you reach out
 
Why it matters: Engaged customers have invested time and energy into the relationship. Disengaged customers are one competitive email away from switching.
3. Relationship and Sentiment (15-25% weight)
- Do they have an executive sponsor?
 - Is there an identified champion, and are they still engaged?
 - NPS and CSAT scores
 - Feedback sentiment (are they happy or frustrated?)
 - Relationship strength (your CSM's gut feeling, quantified)
 - Stakeholder coverage (how many people do you know there?)
 
Why it matters: Strong relationships survive product bugs and pricing increases. Weak relationships don't survive much of anything.
4. Support and Issue Resolution (10-15% weight)
- Support ticket volume
 - Issue severity (P1 emergencies vs minor questions)
 - How long issues take to resolve
 - Support satisfaction ratings
 - Escalations
 
Why it matters: Lots of serious tickets means either the product doesn't fit or you've got quality problems. A clean support history usually means smooth sailing.
5. Business Outcomes and Value (10-20% weight)
- Goals achieved (the ones they told you about during the sales process)
 - ROI demonstrated (can they point to actual impact?)
 - Use cases expanded (started with sales, now marketing's using it too)
 - Value milestones hit
 - Business impact metrics they actually care about
 
Why it matters: Customers who see clear value renew. Customers who can't articulate ROI are vulnerable at renewal time.
6. Financial and Commercial (5-10% weight)
- Payment history (on-time vs consistently late)
 - Contract status
 - Expansion history
 - Budget signals (did they just announce layoffs?)
 
Why it matters: Late payments often predict churn. Past expansion usually signals satisfaction.
Weighting and Calculation Methods
How to Figure Out the Right Weights:
Don't just guess. Here's how to do it properly:
Step 1: Dig Into Your Historical Data. Run a correlation analysis between each dimension and actual retention. This shows you what really predicts whether customers stick around.
Example Analysis:
- Usage dimension correlation with retention: 0.72 (strong predictor)
 - Engagement dimension correlation: 0.48 (moderate predictor)
 - Sentiment dimension correlation: 0.35 (weak to moderate)
 - Financial dimension correlation: 0.18 (weak predictor)
 
Step 2: Weight Based on Predictive Power. Give the most weight to dimensions that actually predict retention. Don't treat everything equally just because it feels fair.
Example Weighting:
- Usage: 35% (strongest predictor gets the most weight)
 - Engagement: 25%
 - Value: 20%
 - Relationship: 15%
 - Financial: 5% (weak predictor gets minimal weight)
 
Step 3: Test It and Adjust. Run your weighted model against historical outcomes. If it's not accurate, adjust and try again. This isn't a one-and-done exercise.
Calculation Example:
| Dimension | Weight | Raw Score (0-100) | Weighted Score | 
|---|---|---|---|
| Usage | 35% | 80 | 28.0 | 
| Engagement | 25% | 70 | 17.5 | 
| Value | 20% | 75 | 15.0 | 
| Relationship | 15% | 60 | 9.0 | 
| Financial | 5% | 90 | 4.5 | 
| Total | 100% | — | 74.0 | 
Final Health Score: 74 (Moderate)
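If you want to automate this, here's a minimal Python sketch of the same weighted-average calculation. The dimension names, weights, and scores mirror the example above; swap in whatever your own analysis supports.

```python
# Minimal sketch of a weighted-average health score.
# Weights and dimension names are illustrative; use the ones your analysis supports.
WEIGHTS = {
    "usage": 0.35,
    "engagement": 0.25,
    "value": 0.20,
    "relationship": 0.15,
    "financial": 0.05,
}

def health_score(dimension_scores: dict[str, float]) -> float:
    """Combine 0-100 dimension scores into one weighted 0-100 score."""
    return round(sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS), 1)

# Matches the table above: 74.0 (Moderate)
print(health_score({"usage": 80, "engagement": 70, "value": 75,
                    "relationship": 60, "financial": 90}))
```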
Setting Score Ranges and Thresholds
Standard Health Score Ranges:
Healthy (75-100):
- Strong usage and engagement
 - Positive sentiment
 - Retention looks solid
 - Probably ready for expansion conversations
 - What to do: Keep the relationship warm, look for expansion opportunities, ask for referrals
 
Moderate (50-74):
- Acceptable but could be better
 - Some gaps in usage or engagement that need attention
 - They'll probably renew, but it's not a sure thing
 - What to do: Run proactive improvement initiatives, fix the specific gaps you're seeing
 
At Risk (25-49):
- Low or declining usage
 - Weak engagement or relationship
 - Retention is genuinely at risk here
 - What to do: Drop everything, intervene now, get a save plan together, escalate if needed
 
Critical (0-24):
- Barely using the product or completely dormant
 - Zero engagement
 - They're probably going to churn unless you pull off a miracle
 - What to do: Executive escalation, all-hands-on-deck save effort
 
Different Segments Need Different Thresholds:
Not all customers are created equal. What's "healthy" for an enterprise customer might be concerning for an SMB customer.
Enterprise Customers:
- Healthy: 70+ (complex products take forever to roll out)
 - At Risk: <50
 - Why: Enterprise customers have long adoption curves. Lower usage early on doesn't mean they're unhappy—it means they're still getting 5 departments to agree on a workflow.
 
SMB Customers:
- Healthy: 80+ (simpler products, faster adoption)
 - At Risk: <60
 - Why: SMB customers should be up and running fast. If they're not, something's wrong.
 
Your thresholds should reflect your actual data and how different segments behave.
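One lightweight way to encode this is a per-segment threshold lookup. A sketch (the numbers mirror the enterprise/SMB examples above; calibrate against your own renewal data):

```python
# Illustrative segment-specific thresholds; calibrate against your own renewal data.
THRESHOLDS = {
    "enterprise": {"healthy": 70, "at_risk": 50},
    "smb": {"healthy": 80, "at_risk": 60},
}

def status(segment: str, score: float) -> str:
    t = THRESHOLDS[segment]
    if score >= t["healthy"]:
        return "healthy"
    if score < t["at_risk"]:
        return "at_risk"
    return "moderate"

print(status("enterprise", 72))  # healthy
print(status("smb", 72))         # moderate: same score, different segment, different read
```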
Designing Your Health Score Model
Identifying Outcomes to Predict
Start With the Main Thing: Retention
- Will this customer actually renew?
 - At what contract value?
 - What's the renewal rate going to be?
 
Then Add Secondary Outcomes:
Churn Risk:
- Will they churn in the next 90 days?
 - What kind of churn? (Did they choose to leave, or did they just forget to pay?)
 
Expansion:
- Are they going to expand?
 - By how much?
 - When's the right time to have that conversation?
 
Advocacy:
- Will they be a reference customer?
 - Might they refer other customers?
 - Will they give you a testimonial for your website?
 
Keep It Simple at First: Focus on predicting retention vs churn. That's the thing that really matters. You can add expansion and advocacy prediction later once your retention model actually works.
Selecting Health Score Dimensions
How to Pick the Right Dimensions:
Step 1: Brain Dump Every Signal You Can Think Of
- Product usage metrics
 - How they engage with you
 - Relationship indicators
 - Financial signals
 - Support ticket patterns
 - Sentiment data
 - External signals (are they growing? Did they just get funded? Are they laying people off?)
 
Step 2: Figure Out What You Can Actually Measure. Be honest about your data reality:
- Is this data available right now?
 - Can you integrate it without a six-month engineering project?
 - Is the data quality good enough to trust?
 
Step 3: Test What Actually Predicts Retention. Run correlation analysis with your actual outcomes:
- High correlation (>0.5): Include this
 - Moderate correlation (0.3-0.5): Consider including it
 - Low correlation (<0.3): Probably skip it unless you have a strategic reason
 
Step 4: Don't Go Overboard
- Too few dimensions: You'll miss important signals
 - Too many dimensions: You'll drown in complexity and maintenance
 - Sweet spot: 4-6 dimensions
 
Start With These Four:
- Usage (always include this—it's the strongest predictor by far)
 - Engagement (how invested they are in the relationship)
 - Sentiment (NPS, CSAT, how they feel about you)
 - Relationship (do they have an exec sponsor? An active champion?)
 
Add others as your data and systems mature: value realization, support quality, financial health.
Determining Data Inputs and Metrics
For Each Dimension, Define Specific Metrics:
Usage Dimension Inputs:
- % of licenses with active users (last 30 days)
 - Average logins per user per week
- # of core features used (breadth)
- Depth of usage within key features
 - Usage trend (month-over-month % change)
 
Engagement Dimension Inputs:
- CSM touchpoints per quarter
 - QBR attendance (Y/N)
 - Training sessions attended
 - Email open and click rates
 - Community posts or participation
 
Sentiment Dimension Inputs:
- Most recent NPS score
 - Support CSAT average (last 3 months)
 - Qualitative feedback sentiment
 - CSM relationship rating (1-5 scale)
 
Relationship Dimension Inputs:
- Executive sponsor identified (Y/N)
 - Champion active (Y/N)
- # of contacts in CRM
- # of departments using product
- Relationship depth score (CSM assessment)
 
Financial Dimension Inputs:
- Payment status (current, late, past due)
 - Expansion in last 12 months (Y/N)
 - Contract value (ARR)
 
Data Source Mapping: Document where each metric comes from:
- Product analytics platform
 - CRM system
 - Support ticketing system
 - Survey tools
 - Billing system
 
Establishing Weighting Methodology
Data-Driven Weight Assignment:
Method 1: Correlation Analysis
- Calculate correlation between each dimension and retention
 - Assign weights proportional to correlation strength
 
Example:
- Usage correlation: 0.70 → Weight: 35%
 - Engagement correlation: 0.50 → Weight: 25%
 - Sentiment correlation: 0.40 → Weight: 20%
 - Relationship correlation: 0.30 → Weight: 15%
 - Financial correlation: 0.10 → Weight: 5%
 
Method 2: Regression Analysis
- Run logistic regression with churn as outcome
 - Use coefficient values to inform weights
 - More sophisticated than simple correlation
 
Method 3: Expert Judgment (When Data Limited)
- Survey CSM team on predictive power of each dimension
 - Weight based on consensus
 - Validate against outcomes as data accumulates
 
Method 4: Equal Weighting (Starting Point)
- All dimensions weighted equally
 - Adjust based on performance
 - Quick to implement but less accurate
 
Best Practice: Start with correlation analysis (if data exists) or expert judgment. Refine weights quarterly based on predictive accuracy.
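If you go with Method 1, one common shortcut is to normalize each dimension's correlation by the total so the weights sum to 100%. A quick sketch (the correlation values are the illustrative ones above):

```python
# Sketch: convert per-dimension retention correlations into weights that sum to 1.0.
correlations = {
    "usage": 0.70,
    "engagement": 0.50,
    "sentiment": 0.40,
    "relationship": 0.30,
    "financial": 0.10,
}

total = sum(correlations.values())
weights = {dim: round(c / total, 2) for dim, c in correlations.items()}
print(weights)
# {'usage': 0.35, 'engagement': 0.25, 'sentiment': 0.2, 'relationship': 0.15, 'financial': 0.05}
```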
Data-Driven Model Development
Analyzing Historical Data Patterns
Historical Analysis Steps:
Step 1: Gather Retention Data
- Last 12-24 months of customer data
 - Renewal outcomes (renewed vs churned)
 - Final health scores before renewal
 - Dimension scores
 
Step 2: Segment Analysis
- Retention rate by health score range
 - Retention rate by dimension score
 - Segment-specific patterns (enterprise vs SMB)
 
Example Analysis:
| Health Score Range | Retention Rate | Sample Size | 
|---|---|---|
| 90-100 | 98% | 45 | 
| 80-89 | 95% | 112 | 
| 70-79 | 88% | 134 | 
| 60-69 | 75% | 87 | 
| 50-59 | 58% | 56 | 
| <50 | 35% | 41 | 
Insight: Clear threshold at 60 where retention drops significantly.
Step 3: Identify Patterns
- Which churned customers had high scores? (false negatives)
 - Which renewed customers had low scores? (false positives)
 - What signals did we miss?
 
Step 4: Refine Model
- Adjust weights
 - Add missing dimensions
 - Recalibrate thresholds
 
Correlation Analysis with Outcomes
Running Correlation Analysis:
For Each Dimension: Calculate the correlation coefficient with retention (it ranges from -1 to 1; the closer to 1, the stronger the positive relationship)
Example Results:
- Usage score correlation with retention: 0.72
 - Engagement score correlation: 0.48
 - Sentiment score correlation: 0.35
 - Relationship score correlation: 0.52
 - Financial score correlation: 0.21
 
Interpretation:
- Strong predictors (>0.6): Usage
 - Moderate predictors (0.4-0.6): Engagement, Relationship
 - Weak predictors (<0.4): Sentiment, Financial
 
Actions:
- Increase weight for strong predictors (usage)
 - Maintain moderate weights for moderate predictors
 - Reduce weight or remove weak predictors (unless strategic value)
 
Multi-Variate Analysis: Some dimensions may be predictive in combination but not individually. Test combinations:
- Low usage + low engagement = very high churn risk
 - Low usage + high engagement = re-onboarding opportunity
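Here's a small sketch of the basic per-dimension correlation step, assuming your account history lives in a pandas DataFrame (column names and data are illustrative):

```python
import pandas as pd

# Sketch: correlate each dimension score with the renewal outcome (1 = renewed, 0 = churned).
# Replace the toy data with your own account-level history.
df = pd.DataFrame({
    "usage_score":      [80, 35, 90, 40, 70, 20],
    "engagement_score": [75, 50, 85, 30, 65, 45],
    "renewed":          [1, 0, 1, 0, 1, 0],
})

for dim in ["usage_score", "engagement_score"]:
    r = df[dim].corr(df["renewed"])  # Pearson; with a binary outcome this is point-biserial
    print(f"{dim}: {r:.2f}")
```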
 
Identifying Predictive vs Vanity Metrics
Predictive Metrics: These actually predict what's going to happen. When these numbers move, retention moves.
Examples:
- Active user percentage (real predictor of retention)
 - Login frequency (people who log in regularly stick around)
 - QBR attendance (engaged customers show up)
 - Feature adoption depth (power users don't churn)
 
Vanity Metrics: These look good in a dashboard but don't tell you much about retention. They might correlate with health, but they don't cause it.
Examples:
- Total registered users (meaningless if they're not active)
 - Total data stored (unless storage actually drives value for your product)
 - Product page views (browsing isn't the same as using)
 - Emails sent (sending emails means nothing if nobody opens them)
 
How to Tell the Difference:
Test 1: Does It Correlate With Retention? Run the numbers. If the metric moves and retention doesn't, it's vanity.
- Correlates → Potentially predictive
 - Doesn't correlate → Probably vanity
 
Test 2: Does Improving It Actually Improve Retention? This is the causation test.
- Yes → Predictive
 - No → Vanity
 
Test 3: Does It Change Before Churn or After? Timing matters.
- Changes before churn → Leading indicator (useful!)
 - Changes after churn → Lagging indicator (too late to help)
 
Build your health score on predictive, leading indicators. Leave the vanity metrics for your marketing slides.
Testing and Validating Models
How to Validate Your Model:
Step 1: Test It Against Historical Data
- Run your health score model on past customer data
 - Compare what the model predicted to what actually happened
 - Calculate your accuracy metrics
 
Step 2: Measure How Accurate You Are
True Positive Rate (Did You Catch the Churners?): Of the customers who actually churned, how many did you flag as at-risk?
- Formula: True Positives / (True Positives + False Negatives)
 - Target: >75%
 
True Negative Rate (Did You Get the Healthy Ones Right?): Of the customers who renewed, how many did you correctly flag as healthy?
- Formula: True Negatives / (True Negatives + False Positives)
 - Target: >85%
 
Overall Accuracy: Of all your predictions, how many were right?
- Formula: (True Positives + True Negatives) / Total Customers
 - Target: >80%
 
Step 3: Figure Out Why You Were Wrong
False Positives (you said at-risk, but they renewed):
- Why did your model think they were at-risk?
 - What signal did you miss that showed they were actually fine?
 - How can you adjust the model to reduce these?
 
False Negatives (you said healthy, but they churned):
- What signals did you completely miss?
 - What dimension needs to be added or weighted more heavily?
 - These are more dangerous than false positives—you missed a real risk
 
Step 4: Fix Your Model
- Adjust weights based on what you learned
 - Add dimensions you were missing
 - Recalibrate your thresholds
 - Test it again on historical data
 
Step 5: Keep Watching It
- Track accuracy as the model runs live
 - Compare predictions to actual renewal outcomes every month
 - Keep refining it quarterly
 
Iterating Based on Results
Continuous Improvement Cycle:
Monthly Review:
- Which at-risk accounts actually churned?
 - Were there healthy accounts that churned (miss)?
 - False positive rate (at-risk accounts that renewed)
 - CSM feedback on score accuracy
 
Quarterly Refinement:
- Full model validation
 - Weight adjustments
 - Threshold recalibration
 - Add/remove dimensions
 
Annual Overhaul:
- Major model redesign if needed
 - Incorporate new data sources
 - Adopt new methodologies (ML, etc.)
 
Example Iteration:
Quarter 1:
- Model accuracy: 73%
 - False negative rate: 32% (too many churned customers had been scored as healthy)
 - Analysis: Usage dimension not weighted heavily enough
 - Action: Increase usage weight from 30% to 40%
 
Quarter 2:
- Model accuracy: 79%
 - False negative rate: 24%
 - Improvement: Catching more at-risk customers
 - New issue: False positives increased
 - Action: Adjust at-risk threshold from <60 to <55
 
Quarter 3:
- Model accuracy: 84%
 - Balanced false positives and negatives
 - CSM feedback: Scores feel accurate
 - Action: Maintain current model, continue monitoring
 
Score Calculation Methods
Simple Weighted Average
This Is What Most Companies Use: Calculate scores for each dimension, apply your weights, add them up. Done.
Here's How It Works:
Step 1: Score Each Dimension (0-100)
- Usage: 75 (based on active users, login frequency, which features they use)
 - Engagement: 80 (touchpoints, QBR attendance, training participation)
 - Sentiment: 70 (NPS, CSAT scores)
 - Relationship: 60 (they have a champion but no exec sponsor yet)
 
Step 2: Apply Your Weights
- Usage: 75 × 0.40 = 30.0
 - Engagement: 80 × 0.25 = 20.0
 - Sentiment: 70 × 0.20 = 14.0
 - Relationship: 60 × 0.15 = 9.0
 
Step 3: Add It Up. Total Health Score = 30.0 + 20.0 + 14.0 + 9.0 = 73
Why This Works:
- Simple enough for anyone to understand
 - Easy to explain to stakeholders
 - You can see exactly how each dimension contributes
 - Flexible—easy to adjust weights when you need to
 
The Downsides:
- It's linear, so it doesn't capture complex interactions between dimensions
 - You need data for all dimensions, or the math breaks
 
Red/Yellow/Green Categorical
The Traffic Light Approach: Instead of a numeric score, just assign a color. Simple as that.
How It Works:
- Define what qualifies for each color
 - Check where the account fits
 - Assign the color
 
Example Criteria:
Green (Healthy):
- ≥70% licenses active AND
 - Attended last QBR AND
 - NPS ≥7 AND
 - Executive sponsor is engaged
 
Yellow (Moderate):
- 50-69% licenses active OR
 - Missed last QBR OR
 - NPS 5-6 OR
 - No executive sponsor
 
Red (At Risk):
- <50% licenses active OR
 - No touchpoints in 60 days OR
 - NPS <5 OR
 - Multiple P1 support tickets open
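A condensed sketch of rules like these in Python (field names are illustrative, and the criteria are simplified from the lists above):

```python
# Condensed sketch of traffic-light rules; field names and cutoffs are illustrative.
def color(acct: dict) -> str:
    # Red: any serious warning sign
    if acct["pct_active"] < 0.50 or acct["days_since_touch"] > 60 or acct["nps"] < 5:
        return "red"
    # Green: all healthy criteria met
    if (acct["pct_active"] >= 0.70 and acct["attended_last_qbr"]
            and acct["nps"] >= 7 and acct["exec_sponsor_engaged"]):
        return "green"
    # Everything else lands in yellow
    return "yellow"

print(color({"pct_active": 0.75, "days_since_touch": 20, "nps": 8,
             "attended_last_qbr": True, "exec_sponsor_engaged": True}))  # green
```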
 
Why This Works:
- Super simple
 - Clear action categories (green = maintain, yellow = improve, red = save)
 - Non-technical stakeholders get it immediately
 
The Downsides:
- Not very nuanced—you only get 3 states
 - Hard to prioritize when you have 50 yellow accounts
 - You can't see trending (improving or declining)
 - The thresholds are arbitrary (70% usage gets green, 69% gets yellow—really?)
 
Use this if: You have a small team, simple product, or you're just starting with health monitoring.
Points-Based Scoring
Method: Assign points for specific behaviors or attributes. Sum points to total score.
Example:
| Criteria | Points | 
|---|---|
| ≥80% license utilization | 20 | 
| 60-79% license utilization | 15 | 
| <60% license utilization | 5 | 
| Attended last QBR | 15 | 
| Executive sponsor identified | 15 | 
| Champion active | 10 | 
| NPS 9-10 | 15 | 
| NPS 7-8 | 10 | 
| NPS 0-6 | 0 | 
| No support tickets | 10 | 
| Feature adoption ≥70% | 10 | 
| Total Possible | 95 | 
Customer A:
- 75% utilization: 15 points
 - Attended QBR: 15 points
 - Has exec sponsor: 15 points
 - No champion: 0 points
 - NPS 8: 10 points
 - 2 support tickets: 0 points
 - 80% feature adoption: 10 points
 - Total: 65 points (Moderate)
 
Pros:
- Easy to build and adjust
 - Clear point allocation
 - Flexible (add/remove criteria easily)
 
Cons:
- Can become complex (too many criteria)
 - Point values somewhat arbitrary
 - May not reflect true predictive weights
 
Percentile Ranking
Method: Rank accounts relative to each other, assign health score based on percentile.
Example:
- Top 20% of accounts: 90-100 (Healthy)
 - 20-50%: 70-89 (Good)
 - 50-80%: 50-69 (Moderate)
 - Bottom 20%: 0-49 (At Risk)
 
Pros:
- Relative comparison (shows where account stands vs peers)
 - Automatically adjusts as portfolio improves
 - Useful for benchmarking
 
Cons:
- Score depends on cohort (same behavior = different score in different cohorts)
 - Bottom 20% always "at risk" even if all accounts healthy
 - Not absolute measure
 
Best for: Mature portfolios with large customer bases, benchmarking, prioritization.
Machine Learning Models
The Advanced (and Complicated) Approach: Use ML algorithms to predict churn probability based on historical patterns. This is the fancy option.
Common Algorithms:
- Logistic regression (predicts churn probability from 0 to 1)
 - Random forest (ensemble of decision trees)
 - Gradient boosting (XGBoost, LightGBM)
 - Neural networks (if you have massive datasets)
 
How It Works:
- Input: All your customer data (usage, engagement, everything)
 - The model trains itself on historical churn data
 - Output: Churn probability (0-100%)
 - Your health score = 100 - churn probability
 
Why This Can Be Great:
- Most accurate method (when you have enough data)
 - Captures complex interactions between dimensions
 - Finds patterns humans would never spot
 - Gets better over time as you feed it more data
 
Why This Can Be a Nightmare:
- You need serious data science expertise
 - Requires tons of historical data (think 1000+ customers, 2+ years minimum)
 - "Black box" problem—hard to explain why a score is what it is
 - Infrastructure and maintenance costs add up fast
 
Use this if: You're a large SaaS company with a data team and mature datasets. If you're still figuring out your basic health scoring, skip this for now.
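For a sense of the shape of this, here's a minimal sketch using scikit-learn's logistic regression. The features and toy data are illustrative; a real model needs far more history than this.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch: train on historical accounts, predict churn probability, convert to a health score.
# Columns: % licenses active, logins per user per week, latest NPS (illustrative features).
X_train = np.array([[0.85, 12, 8], [0.20, 1, 3], [0.70, 8, 7], [0.30, 2, 4]])
y_train = np.array([0, 1, 0, 1])  # 1 = churned

model = LogisticRegression().fit(X_train, y_train)

churn_prob = model.predict_proba(np.array([[0.55, 4, 6]]))[0, 1]
print(round(100 * (1 - churn_prob)))  # health score = 100 - churn probability
```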
Model Segmentation
Segment-Specific Models
Why Segment: Different customer segments have different behaviors, adoption patterns, and health profiles.
Common Segmentation Approaches:
By Company Size:
- Enterprise (1000+ employees)
 - Mid-Market (100-999)
 - SMB (<100)
 
Differences:
- Enterprise: Slower adoption, complex implementations, longer sales cycles
 - SMB: Fast adoption, simpler usage, higher churn rates
 
By Product or Plan:
- Starter/Basic tier
 - Professional tier
 - Enterprise tier
 
Differences:
- Enterprise plans: More features, higher engagement expected
 - Starter plans: Limited features; lower engagement can still be healthy
 
By Industry:
- Healthcare
 - Financial services
 - Technology
 - Manufacturing
 
Differences:
- Industry-specific usage patterns
 - Regulatory requirements affect engagement
 - Different value drivers
 
By Use Case:
- Sales teams
 - Marketing teams
 - Engineering teams
 
Differences:
- Different feature usage
 - Different adoption curves
 - Different success metrics
 
Journey Stage Considerations
Health Score by Customer Lifecycle Stage:
Onboarding (0-90 days):
- Lower baseline usage expected (still ramping)
 - Focus on activation milestones
 - Engagement more important than usage
 - Threshold: Moderate = 40+, Healthy = 60+
 
Adoption (90 days - 12 months):
- Usage ramping up
 - Feature breadth expanding
 - Standard health thresholds apply
 - Threshold: Moderate = 50+, Healthy = 70+
 
Maturity (12+ months):
- Expect full usage and engagement
 - Higher thresholds for healthy
 - Look for expansion signals
 - Threshold: Moderate = 60+, Healthy = 75+
 
Renewal Period (60 days before renewal):
- Critical period
 - Lower tolerance for at-risk
 - Extra attention to relationship and sentiment
 - Threshold: At-risk if <65, even if normally moderate
 
Adjust health scoring and thresholds based on customer journey stage.
When to Use Universal vs Segment Models
Universal Model (One Model for All):
Pros:
- Simpler to build and maintain
 - Consistent across portfolio
 - Easier to compare accounts
 
Cons:
- Less accurate (doesn't account for segment differences)
 - May miss segment-specific patterns
 - One-size-fits-all limitations
 
Use When:
- Small customer base (<200 customers)
 - Homogeneous customer segments
 - Early in health scoring maturity
 - Limited data or resources
 
Segment-Specific Models:
Pros:
- More accurate predictions
 - Accounts for segment behaviors
 - Better threshold calibration
 - Enables segment benchmarking
 
Cons:
- More complex to build and maintain
 - Requires sufficient data per segment
 - Harder to compare across segments
 
Use When:
- Large customer base (>500 customers)
 - Diverse customer segments
 - Mature health scoring program
 - Sufficient data per segment (>100 customers)
 
Hybrid Approach:
- Start with universal model
 - Add segment adjustments (segment-specific thresholds)
 - Gradually move to fully separate models as data permits
 
Implementation and Operationalization
Technology and Infrastructure
The Build vs Buy Decision:
Buy: Customer Success Platform
- Tools like Gainsight, Totango, ChurnZero, Catalyst
 - Pros: You're up and running fast, proven functionality, they handle updates
 - Cons: Costs $50k-200k per year, less flexible, you're locked into their system
 - Use this if: You're a mid-to-large CS team with budget and you want speed
 
Build: Custom System
- Stack: Your own data warehouse + BI tool + custom scoring engine
 - Pros: Total control, built exactly for your needs, cheaper long-term
 - Cons: Eats up engineering time, you own all the maintenance, slower to launch
 - Use this if: You have a technical team, unique requirements, and engineering resources to spare
 
Hybrid: Mix and Match
- Core: Use a CS platform for scoring and alerts
 - Custom: Build your own data warehouse for complex analytics
 - Integrations: Connect everything (product analytics, CRM, support)
 - Use this if: You're like most companies—you want a balance of speed and flexibility
 
What You Actually Need:
- Data integration layer (pulls data from all your systems)
 - Scoring engine (does the math to calculate health scores)
 - Visualization layer (dashboards people will actually look at)
 - Alerting system (notifications and automated workflows)
 - Historical database (so you can track trends over time)
 
Data Pipeline and Automation
Automated Data Flow:
Product DB → ETL → Data Warehouse → Scoring Engine → Dashboard
CRM → API → Data Warehouse → Scoring Engine → Dashboard
Support → API → Data Warehouse → Scoring Engine → Dashboard
Survey → Webhook → Data Warehouse → Scoring Engine → Dashboard
Pipeline Steps:
1. Extract:
- Pull data from source systems (product analytics, CRM, support)
 - Schedule: Daily for most metrics, real-time for critical alerts
 - Handle API rate limits and errors
 
2. Transform:
- Normalize data formats
 - Calculate derived metrics (% active users, usage trends)
 - Aggregate to account level
 - Join data from multiple sources
 
3. Load:
- Store in data warehouse
 - Calculate health scores
 - Update dashboards
 - Trigger alerts if thresholds crossed
 
4. Archive:
- Store historical scores for trending
 - Enable year-over-year comparisons
 
Automation Best Practices:
- Monitor pipeline health (alert on failures)
 - Validate data quality (check for anomalies)
 - Document data sources and transformations
 - Version control scoring logic
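To make the flow concrete, here's a stripped-down sketch of the nightly scoring pass. The extract and notify functions are stand-ins for your own warehouse queries and alerting integration, and the weights are illustrative.

```python
# Stripped-down nightly scoring pass; extract_metrics and notify are stand-ins
# for real integrations (product analytics, CRM, support, Slack/email).
WEIGHTS = {"usage": 0.40, "engagement": 0.25, "sentiment": 0.20, "relationship": 0.15}

def extract_metrics(account_id: str) -> dict:
    # Stand-in: in practice, query your warehouse for normalized 0-100 dimension scores.
    return {"usage": 45, "engagement": 50, "sentiment": 60, "relationship": 40}

def notify(account_id: str, score: float) -> None:
    print(f"ALERT: {account_id} health is {score}")

def score_account(account_id: str) -> float:
    dims = extract_metrics(account_id)
    score = round(sum(WEIGHTS[d] * dims[d] for d in WEIGHTS), 1)
    # Archiving the score here (e.g. a daily snapshot table) is what enables trending.
    if score < 50:
        notify(account_id, score)
    return score

score_account("acct_123")  # 48.5, so the alert fires
```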
 
Score Refresh Frequency
How Often to Recalculate:
Real-Time (Continuous):
- Use for: Critical alerts (P1 tickets, payment failures)
 - Requires: Streaming data pipeline, higher infrastructure cost
 - Example: Payment past due → instant alert
 
Daily:
- Use for: Standard health scores, most accounts
 - Requires: Nightly batch job, moderate infrastructure
 - Example: Usage data updated each morning
 
Weekly:
- Use for: Low-touch accounts, less critical metrics
 - Requires: Weekly batch job, simple infrastructure
 - Example: SMB accounts with stable patterns
 
Considerations:
- More frequent = more current but higher cost
 - Less frequent = sufficient for most needs, simpler
 - Hybrid: Real-time for critical, daily for standard
 
Recommended: Daily refresh for health scores, real-time for critical alerts.
Historical Trending and Changes
Why Trending Matters as Much as the Score Itself:
The direction an account is moving matters just as much as where they are right now. A score of 70 that's climbing looks completely different from a 70 that's dropping fast.
Here's what trending tells you:
- Catch problems early, before they become critical
 - Know if your interventions are actually working
 - Spot seasonal patterns you need to account for
 
Time Windows That Matter:
30-Day Change (Short-Term):
- Shows you quick wins or new problems
 - Alert if it drops more than 10 points
 - Good for catching immediate issues
 
90-Day Change (Medium-Term):
- Shows sustained improvement or decline
 - Most actionable timeframe for interventions
 - This is where you should focus
 
12-Month Change (Long-Term):
- Reveals customer lifecycle patterns
 - Good for cohort analysis
 - Helps you understand what "normal" looks like
 
Use Momentum Indicators:
- Improving: ↑ (score going up)
 - Stable: → (score flat, within ±5 points)
 - Declining: ↓ (score going down)
 
Here's Why This Matters:
Account A:
- Current score: 70
 - 30-day change: +8
 - 90-day change: +15
 - Status: Moderate but improving ↑
 - What to do: Whatever you're doing is working—keep it up
 
Account B:
- Current score: 72
 - 30-day change: -12
 - 90-day change: -18
 - Status: Moderate but declining ↓
 - What to do: Something's wrong—investigate now and intervene
 
Same score, completely different situations, totally different actions needed.
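Computing this from your score archive is straightforward. A sketch, using Account B's numbers from above (the history dict maps days-ago to the score on that day):

```python
# Sketch: momentum from archived scores. `history` maps days-ago to the score on that day.
def change(history: dict[int, float], days: int) -> float:
    return history[0] - history[days]

def momentum(delta: float) -> str:
    if delta > 5:
        return "improving ↑"
    if delta < -5:
        return "declining ↓"
    return "stable →"

account_b = {0: 72, 30: 84, 90: 90}  # today, 30 days ago, 90 days ago
d30 = change(account_b, 30)
print(d30, momentum(d30))  # -12 declining ↓ (matches Account B above)
```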
Integration with Workflows
Operationalize Health Scores:
CSM Daily Workflow:
- Check dashboard for alerts
 - Review accounts with declining health
 - Focus on at-risk accounts (score <50)
 - Update success plans based on scores
 
Automated Playbooks:
- Health drops to at-risk → Trigger save playbook
 - Health improves to healthy → Trigger expansion playbook
 - 30 days to renewal + moderate health → Trigger renewal prep playbook
 
CRM Integration:
- Sync health scores to CRM (Salesforce, HubSpot)
 - Display on account page
 - Use in reporting and forecasting
 - Trigger sales team alerts (exec escalation)
 
Communication Integration:
- Email alerts to CSMs (daily digest of at-risk accounts)
 - Slack notifications (critical alerts)
 - Automated customer outreach (based on health changes)
 
Meeting Preparation:
- Pull health score before QBR
 - Prepare talking points (wins and concerns)
 - Set agenda based on health insights
 
Model Validation and Refinement
Accuracy Measurement and Tracking
Key Accuracy Metrics:
Predictive Accuracy: Of all predictions, how many were correct?
- Formula: (True Positives + True Negatives) / Total
 - Benchmark: >80% is good, >85% is excellent
 
Precision (Positive Predictive Value): Of customers flagged at-risk, how many actually churned?
- Formula: True Positives / (True Positives + False Positives)
 - Benchmark: >60% (some false positives acceptable to catch all risk)
 
Recall (Sensitivity): Of customers who churned, how many did we flag as at-risk?
- Formula: True Positives / (True Positives + False Negatives)
 - Benchmark: >75% (critical to catch most churn)
 
F1 Score: Balance of precision and recall
- Formula: 2 × (Precision × Recall) / (Precision + Recall)
 - Benchmark: >0.70
 
Track Monthly: Calculate these metrics each month as renewals occur and compare predictions to actuals.
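Here's a small sketch of that monthly calculation from plain prediction/outcome lists (the data is made up; pull yours from last quarter's renewals):

```python
# Sketch: accuracy metrics from predictions vs actual renewal outcomes (toy data).
predicted_at_risk = [True, True, False, False, True, False, False, True]
churned           = [True, False, False, False, True, False, True, True]

tp = sum(p and c for p, c in zip(predicted_at_risk, churned))         # flagged and churned
fp = sum(p and not c for p, c in zip(predicted_at_risk, churned))     # flagged but renewed
tn = sum(not p and not c for p, c in zip(predicted_at_risk, churned))
fn = sum(not p and c for p, c in zip(predicted_at_risk, churned))     # missed churn

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(churned)
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f} f1={f1:.2f}")
```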
False Positive/Negative Analysis
False Positives (Type I Error): Flagged as at-risk but renewed.
Impact:
- Wasted CSM time
 - Unnecessary interventions
 - Alert fatigue
 - Lower confidence in scores
 
Example: Account flagged as at-risk (score 45) but renewed at 100%.
Analysis:
- Why did model think at-risk? (Low usage)
 - Why did they actually renew? (Still saw value, exec champion)
 - Learning: Add executive sponsor dimension, increase relationship weight
 
False Negatives (Type II Error): Flagged as healthy but churned.
Impact:
- Missed intervention opportunity
 - Lost revenue
 - More dangerous than false positives
 - Erodes trust in model
 
Example: Account flagged as healthy (score 78) but churned.
Analysis:
- What signals did we miss? (New competitor, budget cut)
 - What dimension should catch this? (Competitive intelligence, financial)
 - Learning: Add competitive tracking, increase weight on stakeholder changes
 
Monthly Review Process:
- Identify all false positives and false negatives
 - Analyze root causes
 - Identify model improvements
 - Implement changes
 - Validate on historical data
 
Model Drift Detection
What Is Model Drift: Your model's accuracy degrades over time because your customers, product, or market are changing. What predicted retention six months ago might not work today.
Signs Your Model Is Drifting:
- Accuracy dropping month after month
 - More false positives or false negatives than before
 - CSMs saying "these scores don't feel right anymore"
 - New patterns your model doesn't capture
 
What Causes Drift:
- Product changes (you launched new features or redesigned the UI)
 - Customer behavior evolves (usage patterns shift over time)
 - Market dynamics change (new competitor enters the scene)
 - Your data quality gets worse
 
How to Catch It:
- Track accuracy trends (if it's declining for 3+ months straight, you've got drift)
 - Compare current accuracy to historical accuracy
 - Watch for shifts in your prediction distribution
 
How to Fix It:
- Retrain your model on recent data
 - Add new dimensions that capture new patterns
 - Adjust weights to reflect what matters now
 - Update thresholds based on current behavior
 
How to Prevent It:
- Validate your model every quarter
 - Track accuracy continuously
 - Get regular feedback from your CSM team
 - Document when you make product or go-to-market changes
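A simple way to catch the "declining for 3+ months" signal automatically is to scan your monthly accuracy history. A sketch:

```python
# Sketch: flag drift when model accuracy declines for three consecutive months.
def drifting(monthly_accuracy: list[float], run: int = 3) -> bool:
    drops = 0
    for prev, curr in zip(monthly_accuracy, monthly_accuracy[1:]):
        drops = drops + 1 if curr < prev else 0
        if drops >= run:
            return True
    return False

print(drifting([0.84, 0.83, 0.81, 0.78, 0.76]))  # True: accuracy has fallen every month
```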
 
Regular Review and Updates
Model Maintenance Schedule:
Weekly:
- Monitor alert volume and response
 - Track CSM feedback on scores
 - Identify data quality issues
 
Monthly:
- Calculate accuracy metrics
 - Review false positives/negatives
 - Identify quick wins (threshold adjustments)
 
Quarterly:
- Full model validation
 - Weight adjustments
 - Dimension additions/removals
 - Backtest on recent data
 - Implement refinements
 
Annual:
- Comprehensive model review
 - Consider major redesign if needed
 - Adopt new methodologies (ML, etc.)
 - Benchmark against industry standards
 - Align with strategic priorities
 
Documentation:
- Track all model changes
 - Document rationale
 - Measure impact
 - Share learnings with team
 
A/B Testing Model Variations
Test Model Changes Before Full Rollout:
Example A/B Test:
Control (Current Model):
- Usage: 35%
 - Engagement: 25%
 - Value: 20%
 - Relationship: 15%
 - Financial: 5%
 
Variant (Proposed Model):
- Usage: 40% (increased)
 - Engagement: 25%
 - Value: 15% (decreased)
 - Relationship: 20% (increased)
 - Financial: 0% (removed)
 
Test Setup:
- Apply both models to last 6 months of historical data
 - Compare accuracy metrics
 - Identify which model predicts better
 
Results:
| Metric | Current Model | New Model | 
|---|---|---|
| Accuracy | 78% | 84% | 
| Precision | 65% | 72% | 
| Recall | 73% | 81% | 
| F1 Score | 0.69 | 0.76 | 
Decision: New model performs better across all metrics. Implement.
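The backtest itself doesn't have to be fancy. A minimal sketch of applying two weighting schemes to historical outcomes (toy data and a simple at-risk cutoff of 55, both illustrative):

```python
# Sketch: backtest two weighting schemes against historical renewal outcomes (toy data).
CURRENT = {"usage": 0.35, "engagement": 0.25, "value": 0.20, "relationship": 0.15, "financial": 0.05}
VARIANT = {"usage": 0.40, "engagement": 0.25, "value": 0.15, "relationship": 0.20, "financial": 0.00}

history = [  # (dimension scores, renewed?)
    ({"usage": 30, "engagement": 40, "value": 35, "relationship": 30, "financial": 80}, False),
    ({"usage": 85, "engagement": 75, "value": 80, "relationship": 70, "financial": 90}, True),
    ({"usage": 55, "engagement": 45, "value": 50, "relationship": 35, "financial": 85}, False),
]

def backtest_accuracy(weights: dict, at_risk_below: float = 55) -> float:
    hits = 0
    for dims, renewed in history:
        score = sum(weights[d] * dims[d] for d in weights)
        hits += (score >= at_risk_below) == renewed  # predicted renew vs actual outcome
    return hits / len(history)

print(backtest_accuracy(CURRENT), backtest_accuracy(VARIANT))
```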
Shadow Mode Testing:
- Run new model in parallel with current model
 - Don't act on new model scores yet
 - Compare predictions to actual outcomes over 1-2 months
 - If new model more accurate, switch
 
Benefits:
- Validate improvements before rollout
 - Reduce risk of making model worse
 - Data-driven decision making
 - Build confidence in changes
 
Using Health Scores Effectively
CSM Prioritization and Focus
Prioritize Accounts by Health:
Tier 1: Critical (Score <40)
- Immediate action required
 - Daily monitoring
 - Save plans, escalation
 - Time allocation: 40% of CSM time
 
Tier 2: At Risk (Score 40-60)
- Proactive intervention
 - Weekly touchpoints
 - Improvement initiatives
 - Time allocation: 30% of CSM time
 
Tier 3: Moderate (Score 60-75)
- Maintain and improve
 - Bi-weekly touchpoints
 - Standard cadence
 - Time allocation: 20% of CSM time
 
Tier 4: Healthy (Score 75+)
- Maintain and grow
 - Monthly touchpoints
 - Expansion conversations
 - Time allocation: 10% of CSM time
 
Dynamic Prioritization: Re-prioritize daily as health scores change. Account that drops from healthy to at-risk moves up the priority list immediately.
Triggering Interventions and Playbooks
Health Score Thresholds Trigger Actions:
Score Drops Below 50:
- Playbook: At-Risk Intervention
 - Actions: Root cause analysis, save plan, weekly check-ins, escalation path
 
Score Drops 15+ Points in 30 Days:
- Playbook: Rapid Decline Investigation
 - Actions: Emergency CSM call, identify cause, immediate intervention
 
Score Improves to 80+:
- Playbook: Expansion Opportunity
 - Actions: Identify expansion signals, schedule expansion call, generate proposal
 
60 Days to Renewal + Score <70:
- Playbook: Renewal Risk
 - Actions: Renewal prep, value reporting, stakeholder mapping, negotiation strategy
 
Automated Playbook Triggers: Integrate health scores with your CS platform to automatically launch playbooks when thresholds are crossed.
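If your platform supports custom rules (or you're wiring this up yourself), the trigger logic can be as simple as this sketch (thresholds mirror the examples above; playbook names are illustrative):

```python
# Sketch: rule-based playbook triggers; thresholds mirror the examples above.
def triggered_playbooks(score: float, change_30d: float, days_to_renewal: int) -> list[str]:
    playbooks = []
    if score < 50:
        playbooks.append("at_risk_intervention")
    if change_30d <= -15:
        playbooks.append("rapid_decline_investigation")
    if score >= 80:
        playbooks.append("expansion_opportunity")
    if days_to_renewal <= 60 and score < 70:
        playbooks.append("renewal_risk")
    return playbooks

print(triggered_playbooks(score=62, change_30d=-18, days_to_renewal=45))
# ['rapid_decline_investigation', 'renewal_risk']
```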
Executive Reporting
Monthly Executive Dashboard:
Portfolio Health Summary:
- Total customers: 487
 - Healthy (75+): 312 (64%)
 - Moderate (50-74): 130 (27%)
 - At Risk (<50): 45 (9%)
 - At-Risk ARR: $2.3M
 
Trends:
- Health improving: 78 accounts (16%)
 - Health declining: 52 accounts (11%)
 - Net trend: Positive
 
Focus Areas:
- Top 10 at-risk accounts (by ARR)
 - Accounts approaching renewal
 - Intervention success stories
 
Actions:
- Saved customers this month: 8 ($450k ARR)
 - Expansion opportunities: 15 ($780k potential)
 
Customer-Facing Health Reports
Sharing Health Insights with Customers:
What to Include:
- Usage metrics (active users, feature adoption)
 - Progress over time (celebrating growth)
 - Benchmarks (vs similar companies)
 - Recommendations (areas for improvement)
 
What to Exclude:
- Actual health "score" or "grade" (feels judgmental)
 - "At risk" or "churn" language (negative framing)
 - Internal scoring methodology
 
Format:
- Part of QBR presentation
 - Monthly email digest
 - Self-service dashboard
 
Example Customer-Facing Language:
"Your adoption grew 18% this quarter! You now have 78 active users and are using 6 of 8 core features. Companies at your adoption level report 2.3x productivity gains.
To unlock even more value:
- Expand reporting adoption to managers (40% time savings)
- Enable integrations (60% usage increase)
- Pilot with marketing team (similar to [Customer X])"
Tone: Positive, helpful, collaborative (not judgmental or punitive)
Avoiding Over-Optimization
Beware of Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." In other words, the moment you start optimizing for the health score itself, it stops being useful.
Here's What Can Go Wrong:
Gaming the Metrics:
- CSMs start focusing on improving scores rather than actual customer success
 - You optimize for metrics instead of outcomes
 - Example: You push customers to log in more (improves the metric) without actually helping them get value (the outcome that matters)
 
False Comfort:
- High scores make you complacent
 - You miss important context that the score doesn't capture
 - Example: Account has a score of 85, but the executive champion just left the company—your model doesn't track that
 
Tunnel Vision:
- You only pay attention to what's measured
 - Important qualitative signals get ignored
 - Example: Customer is visibly frustrated but still using the product out of necessity (usage high, actual sentiment terrible)
 
How to Avoid These Traps:
Balance Scores with Human Judgment:
- Let CSMs override scores when they have good reason
 - Keep doing regular qualitative check-ins
 - Trust your CSM's gut when it conflicts with the score
 
Track Outcomes, Not Just Scores:
- What matters is retention rate, not health scores
 - Measure customer satisfaction, not just usage numbers
 - Focus on value realization, not just engagement activities
 
Use Multiple Metrics:
- Don't rely on a single health score for everything
 - Track expansion, advocacy, and satisfaction separately
 - Get a holistic view of what's really happening
 
Review Your Model Regularly:
- Make sure scores still predict actual outcomes
 - Adjust when customer behavior patterns change
 - Add new signals when you spot gaps
 
The Bottom Line
Not all health scores are created equal. The difference between a good health score and a useless one comes down to thoughtful design, continuous validation, and a willingness to keep refining it.
When you build a health score model that actually works, here's what you get:
- Churn prediction with >80% accuracy (yes, this is achievable)
 - 4-6 weeks of lead time to intervene before customers churn
 - CSM time spent on accounts that actually need help
 - Data-driven decisions instead of gut feel
 - Proactive customer success instead of constantly reacting to fires
 
A health score model that works has these components:
- Multi-dimensional scoring (usage, engagement, relationship, sentiment—not just one thing)
 - Data-driven weighting (based on what actually predicts retention in your business)
 - Segment-specific models (because enterprise and SMB customers behave completely differently)
 - Historical trending (momentum matters as much as the current score)
 - Continuous validation (check accuracy monthly against actual outcomes)
 - Regular refinement (update the model quarterly as you learn what works)
 
Start simple, test it against real outcomes, and keep improving it. Your health score model is never "done"—it needs to evolve as your product, customers, and market evolve.
Build a model that actually predicts outcomes, not one that just looks impressive in a dashboard.
Ready to build your health score model? Start with customer health monitoring, implement early warning systems, and track retention metrics.