Shift-share decomposition of W16 (Apr 18–24) vs W15 (Apr 11–17). Global moved 86.04%→86.00%, yet ID and LATAM each contributed ~800% of Δ to the drag, offset by Adult Sexualized Behaviors and Tobacco recovering. The headline calm masks high single-policy and single-market volatility.
W16 (Apr 18–24) vs W15 (Apr 11–17). Each segment's contribution to OMA is decomposed into rate, weight, and interaction effects. Because the global Δ is tiny (−0.04pp), individual segment % of Δ figures can balloon — focus on absolute pp contributions to gauge true scale.
Global Δ = −0.04pp. When a segment contributes −0.30pp (a normal magnitude), it's ~750% of the global change. This is mathematically correct but visually scary.
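A minimal sketch of that arithmetic. The same −0.30pp contribution produces very different % of Δ figures because the denominator is itself close to zero. `pct_of_delta` is a hypothetical helper, and the global deltas are illustrative. The value −0.035pp is roughly the unrounded W16 figure implied by the ID row (−0.295pp at 842.3%).

```python
def pct_of_delta(contribution_pp: float, global_delta_pp: float) -> float:
    """Contribution as a percentage of the global change.

    Positive means the segment moved in the same direction as the global change.
    """
    if global_delta_pp == 0:
        raise ValueError("% of delta is undefined when the global change is zero")
    return 100.0 * contribution_pp / global_delta_pp


if __name__ == "__main__":
    contribution = -0.30  # a "normal" single-segment contribution, in pp
    for global_delta in (-0.04, -0.035, -0.50):
        share = pct_of_delta(contribution, global_delta)
        print(f"global delta {global_delta:+.3f}pp -> {share:.0f}% of delta")
```

The absolute contribution never changes in this loop. Only the denominator does, which is why the percentage swings so widely.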
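The right interpretation: treat absolute pp contributions as the signal. Anything above 0.10pp is materially large in absolute terms. The W16 market and policy tables together list at least 10 such items on each side (16 negative, 10 positive), indicating high underlying volatility.
If next week one of the offsets fails to repeat (e.g., Tobacco continues recovering but Adult Sexualized Behaviors regresses), the headline could move by roughly 0.4pp from that policy alone (it contributed +0.433pp in W16). If several offsets fail together (the top five policy offsets total +1.666pp), a 1–2pp swing is plausible. The current calm is fragile.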
Four policies show 0% accuracy in both W15 and W16 yet still contribute meaningfully to the global delta via weight changes:
A persistent 0% on a non-trivial sample is implausible as a true accuracy figure. Likely causes: data filter excluding all "approve" cases for these policies, sampling artifact, or definitional change. Verify before treating these as real signal.
| Market | Acc W15 | Acc W16 | Δ Acc | Wt W15 | Wt W16 | Rate | Weight | Inter | Total | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| ID | 90.22% | 86.81% | −3.40 | 8.70% | 8.85% | −0.296 | +0.006 | −0.005 | −0.295 | 842.3% |
| LATAM | 85.54% | 82.51% | −3.03 | 8.79% | 9.24% | −0.266 | −0.002 | −0.014 | −0.282 | 805.8% |
| SSA | 79.43% | 75.32% | −4.11 | 3.37% | 3.52% | −0.139 | −0.010 | −0.006 | −0.154 | 439.8% |
| PH | 87.45% | 83.99% | −3.45 | 4.15% | 3.95% | −0.143 | −0.003 | +0.007 | −0.139 | 397.7% |
| MENA1 | 86.79% | 84.98% | −1.81 | 5.60% | 7.50% | −0.101 | +0.014 | −0.034 | −0.122 | 347.2% |
| BD | 86.85% | 84.98% | −1.87 | 5.28% | 5.68% | −0.099 | +0.003 | −0.007 | −0.103 | 294.1% |
| JP | 92.01% | 90.48% | −1.53 | 2.28% | 1.07% | −0.035 | −0.072 | +0.018 | −0.088 | 252.1% |
| TR | 88.70% | 84.36% | −4.34 | 1.97% | 1.92% | −0.085 | −0.001 | +0.002 | −0.084 | 241.0% |
| ES | 89.25% | 82.39% | −6.87 | 1.08% | 1.00% | −0.074 | −0.003 | +0.006 | −0.071 | 203.6% |
| MX | 82.78% | 81.87% | −0.91 | 4.65% | 5.26% | −0.042 | −0.020 | −0.006 | −0.068 | 193.8% |
| Top-10 negative subtotal | | | | | | | | | −1.408 | 4017.4% |
| MENA2 | 77.79% | 84.63% | +6.84 | 4.39% | 4.00% | +0.300 | +0.032 | −0.027 | +0.305 | −870.2% |
| VN | 90.16% | 91.69% | +1.53 | 6.84% | 7.73% | +0.105 | +0.037 | +0.014 | +0.155 | −443.5% |
| BR | 83.73% | 87.16% | +3.43 | 4.46% | 4.75% | +0.153 | −0.007 | +0.005 | +0.150 | −429.1% |
| IT | 86.41% | 91.78% | +5.37 | 2.47% | 2.39% | +0.133 | +0.000 | −0.005 | +0.128 | −364.9% |
| MY | 83.36% | 90.47% | +7.11 | 1.67% | 1.52% | +0.119 | −0.004 | −0.007 | +0.108 | −309.6% |
| Top-5 positive subtotal | | | | | | | | | +0.846 | −2417.3% |
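The Rate, Weight, and Inter columns can be reproduced with a three-term shift-share split. The sketch below is reconstructed from the table rather than taken from the dashboard, so treat the formulas as an assumption:

- rate = ΔAcc × base weight
- weight = ΔWt × (base accuracy − base global accuracy)
- interaction = ΔAcc × ΔWt

The reference mean is W15's global 86.04%. Under these assumptions the formulas match the ID and MENA2 rows to within rounding. `Segment` and `shift_share` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One market or policy row; all values are percentages (e.g. 8.70 for 8.70%)."""
    name: str
    acc_base: float  # accuracy in the base week (W15)
    acc_comp: float  # accuracy in the comparison week (W16)
    wt_base: float   # share of global weight in the base week
    wt_comp: float   # share of global weight in the comparison week


def shift_share(seg: Segment, global_acc_base: float) -> dict:
    """Split a segment's contribution to the global accuracy change into
    rate, weight, and interaction terms, all in percentage points.

    Assumed formulas (reconstructed from the report's tables):
      rate        = dAcc * wt_base
      weight      = dWt  * (acc_base - global_acc_base)
      interaction = dAcc * dWt
    """
    d_acc = seg.acc_comp - seg.acc_base           # accuracy change, pp
    wt_base = seg.wt_base / 100.0                 # base weight as a fraction
    d_wt = (seg.wt_comp - seg.wt_base) / 100.0    # weight change as a fraction
    rate = d_acc * wt_base
    weight = d_wt * (seg.acc_base - global_acc_base)
    interaction = d_acc * d_wt
    return {
        "rate": rate,
        "weight": weight,
        "interaction": interaction,
        "total": rate + weight + interaction,
    }


if __name__ == "__main__":
    GLOBAL_W15 = 86.04
    for seg in (
        Segment("ID", 90.22, 86.81, 8.70, 8.85),
        Segment("MENA2", 77.79, 84.63, 4.39, 4.00),
    ):
        parts = shift_share(seg, GLOBAL_W15)
        print(seg.name, {k: round(v, 3) for k, v in parts.items()})
```

The three terms sum exactly to the total, so any row's Total can be checked this way. Small third-decimal differences come from the rounded accuracies shown in the table.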
ID OMA accuracy fell from 90.22% to 86.81% in W16 (−3.40pp). This is the second significant weekly decline in the four-week window: W13 92.79% → W14 89.63% → W15 90.22% → W16 86.81%. Cumulative drop: −5.97pp from the W13 baseline.
The market is also gaining global share (8.70% → 8.85%) while accuracy worsens, so the interaction effect is small but negative. This suggests either that Indonesia-specific moderation quality is degrading or that the additional volume is concentrated in harder-to-judge content.
Action: Request a structured Indonesia retrospective. The trend is now clear enough to warrant a dedicated investigation.
LATAM accuracy fell from 85.54% to 82.51%, a 3.03pp drop. Weight grew slightly (8.79% → 9.24%), which marginally amplified the damage via interaction (−0.014pp).
The rate component (−0.266pp) is by far the largest driver. Investigate whether a regional policy change, language model update, or sampling shift hit the LATAM portfolio specifically in W16.
MENA2 accuracy jumped 77.79% → 84.63% (+6.84pp). Weight contracted slightly (4.39% → 4.00%), so this is overwhelmingly a rate story.
Looking back, MENA2 has been a problem region. This single-week recovery is the largest positive market contribution in the W16 table (+0.305pp); only MY posted a larger accuracy gain (+7.11pp), on far smaller weight. Whether it's durable depends on whether the W14–W15 issue was a one-off (sample anomaly, transient labeling problem) or whether deeper calibration work lifted the floor.
Confirm with the regional team whether structural changes were made.
Tobacco & Nicotine accuracy rose 79.04% → 81.20% (+2.16pp) and its share contracted 12.23% → 10.81% (−1.42pp). Because Tobacco's W15 accuracy was well below the global mean (−7.0pp from 86.04%), shrinking its weight is a net positive (+0.099pp weight effect). Combined contribution: +0.334pp (offsetting 952% of the global Δ).
At 10.81% of W16 sample weight, Tobacco & Nicotine is the 2nd-largest single policy (after Youth Regulated Goods and Services at 12.29%). Its accuracy moves the global needle directly.
W14 saw a −8.59pp single-week collapse (84.66% → 76.07%). Since that trough, Tobacco quality has rebounded ~5.13pp but is still 3.46pp below its W13 baseline. The trajectory has been positive for two consecutive weeks.
Sample volume: 1,878 → 1,537 cases (−18%). Some of the weight contraction may reflect a sampling change. Verify the methodology hasn't changed.
Below-mean accuracy persistence: at 81.20%, Tobacco is still 4.80pp below the global mean. If volume rebounds before quality recovers further, the helpful weight-effect direction will reverse — Tobacco could flip back to a major drag.
Action: Lock in the recovery. Confirm whether the W14 trough was an isolated event and whether the 2-week rebound has structural support rather than just regression to the mean.
| Policy | Acc W15 | Acc W16 | Δ Acc | Wt W15 | Wt W16 | Rate | Weight | Inter | Total | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| Violent Behaviors ? | 76.78% | 57.46% | −19.32 | 1.47% | 1.77% | −0.284 | −0.029 | −0.058 | −0.371 | 1059.4% |
| Gambling - Depiction and Promotion | 69.68% | 59.89% | −9.79 | 1.51% | 2.07% | −0.148 | −0.092 | −0.054 | −0.293 | 836.9% |
| Dangerous Trends - Serious Harm | 68.09% | 63.14% | −4.95 | 4.83% | 4.95% | −0.239 | −0.022 | −0.006 | −0.266 | 758.7% |
| Personal Information - High Risk ? | 84.12% | 53.12% | −31.01 | 0.67% | 0.71% | −0.208 | −0.001 | −0.011 | −0.220 | 627.4% |
| Youth Non-Sexualized Nudity | 76.77% | 74.61% | −2.16 | 4.86% | 5.60% | −0.105 | −0.069 | −0.016 | −0.189 | 540.2% |
| Youth Body Exposure - Light (4-17) | 40.38% | 37.08% | −3.30 | 0.67% | 0.98% | −0.022 | −0.146 | −0.010 | −0.178 | 507.6% |
| Youth Regulated Goods and Services | 73.69% | 72.65% | −1.04 | 12.10% | 12.29% | −0.126 | −0.023 | −0.002 | −0.151 | 430.1% |
| Light Body Exposure ? | 70.00% | 33.17% | −36.83 | 0.08% | 0.30% | −0.029 | −0.036 | −0.082 | −0.147 | 419.9% |
| High Risk Driving | 64.91% | 60.73% | −4.18 | 2.33% | 2.53% | −0.097 | −0.041 | −0.008 | −0.147 | 419.7% |
| Regulated Goods - Marketing/Trade | 47.96% | 48.53% | +0.57 | 1.44% | 1.80% | +0.008 | −0.135 | +0.002 | −0.129 | 367.7% |
| Top-10 negative subtotal | | | | | | | | | −2.090 | 5967.7% |
| Adult Sexualized Behaviors | 54.88% | 58.75% | +3.87 | 5.77% | 5.00% | +0.224 | +0.239 | −0.030 | +0.433 | −1236.1% |
| Tobacco and Nicotine ★ 2nd heaviest policy | 79.04% | 81.20% | +2.16 | 12.23% | 10.81% | +0.264 | +0.099 | −0.031 | +0.334 | −952.5% |
| Invasive Cosmetic Procedures ? | 65.14% | 87.86% | +22.73 | 1.30% | 2.26% | +0.295 | −0.201 | +0.219 | +0.313 | −894.6% |
| Combat sports, Extreme Sports & Stunts | 75.02% | 82.54% | +7.51 | 4.04% | 4.28% | +0.304 | −0.026 | +0.018 | +0.296 | −844.0% |
| Moderate Bullying | 48.14% | 50.83% | +2.69 | 2.26% | 1.62% | +0.061 | +0.247 | −0.017 | +0.290 | −826.3% |
| Top-5 positive subtotal | | | | | | | | | +1.666 | −4753.5% |
Violent Behaviors accuracy collapsed 76.78% → 57.46% (−19.32pp). Weight grew (1.47% → 1.77%), so the additional volume entered a now-failing segment, and the interaction effect (−0.058pp) compounds the damage.
This is one of the largest reputational-risk policy categories. A 19pp accuracy drop combined with growing volume is a serious signal — escalate immediately.
Personal Information - High Risk accuracy fell 84.12% → 53.12% (−31.01pp) on stable weight (~0.69%). The rate effect (−0.208pp) accounts for almost all of this row's −0.220pp contribution.
A 31pp drop on a privacy-related, high-stakes policy is alarming. Possible drivers: policy interpretation change, new content vector (e.g., new types of doxxing patterns), or model/labeler retraining gone wrong. Investigate before W17.
Adult Sexualized Behaviors recovered 54.88% → 58.75% (+3.87pp). Weight contracted 5.77% → 5.00% (−0.77pp). Both effects are favorable: rate (+0.224pp) and weight (+0.239pp) — shrinking a below-mean segment helps.
This single policy contributed +0.433pp, by itself more than 12× the global Δ in the offsetting direction. It is worth understanding what drove the accuracy jump (calibration, content shift, sampling), since Adult Sexualized Behaviors is a chronic problem area.
Disparaging Religion collapsed by 75.96pp on a tiny sample weight (~0.08–0.13%). Global impact is "only" −0.099pp (246%), but the rate magnitude is unprecedented.
Almost certainly a sample/policy/labeling artifact — a 76pp single-week swing is implausible as a true accuracy change. Verify the W16 sample is representative; if it is, escalate as a critical operational failure.
Tobacco and Nicotine contributed +0.334pp, the 2nd-largest single-policy offset in W16. Accuracy improved (79.04% → 81.20%, +2.16pp) for the second straight week. Both share and absolute volume contracted. Because Tobacco accuracy sits well below the global mean (−4.80pp vs 86.00% in W16), shrinking its weight further amplifies the global benefit.
| Week | Acc | WoW Acc | Wt % | WoW Wt | Volume (raw) | WoW Vol | Annot. sample |
|---|---|---|---|---|---|---|---|
| W13 (Mar 28–Apr 3) | 84.66% | +6.27 | 13.23% | — | 262,884 | — | 796 |
| W14 (Apr 4–10) | 76.07% | −8.59 | 11.06% | −2.17 | 223,139 | −15.1% | 509 |
| W15 (Apr 11–17) | 79.04% | +2.97 | 12.23% | +1.17 | 288,581 | +29.3% | 1,878 |
| W16 (Apr 18–24) | 81.20% | +2.16 | 10.81% | −1.42 | 197,907 | −31.4% | 1,537 |
OMA total volume W15 → W16: 5,645,117 → 4,544,936 (−19.5%).
Tobacco volume W15 → W16: 288,581 → 197,907 (−31.4%).
If Tobacco had shrunk at the same rate as the pipeline, it would have been ~232K (still 12.23% share). The fact that it dropped to 198K means Tobacco genuinely lost share — about 11.6% relative share contraction, or 1.42 percentage points absolute.
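The counterfactual above is a one-line proportional calculation. Below is a sketch using the reported volumes; the helper names are hypothetical. The relative-share figure uses the reported weights (12.23% → 10.81%) rather than raw volumes.

```python
def proportional_counterfactual(seg_prev: float, total_prev: float, total_curr: float) -> float:
    """Volume the segment would have had if it shrank at the pipeline's rate."""
    return seg_prev * (total_curr / total_prev)


def relative_change(prev: float, curr: float) -> float:
    """Relative change in percent."""
    return 100.0 * (curr / prev - 1.0)


if __name__ == "__main__":
    oma_w15, oma_w16 = 5_645_117, 4_544_936
    tob_w15, tob_w16 = 288_581, 197_907
    cf = proportional_counterfactual(tob_w15, oma_w15, oma_w16)
    print(f"Counterfactual Tobacco volume: {cf:,.0f}")  # roughly 232K
    print(f"Pipeline change: {relative_change(oma_w15, oma_w16):.1f}%")
    print(f"Tobacco change:  {relative_change(tob_w15, tob_w16):.1f}%")
    print(f"Relative share change: {relative_change(12.23, 10.81):.1f}%")
```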
12.23% → 10.81% looks like a meaningful drop. In context, though, Tobacco's share has oscillated between roughly 11% and 13% for 4 weeks (W13 13.23%, W14 11.06%, W15 12.23%, W16 10.81%). The W16 figure is the low point of that range, only 0.25pp below W14, so it is not an outlier.
Tobacco accuracy (81.20%) is 4.80pp below the global mean (86.00%). For below-mean segments, losing share raises the global average (a positive weight effect), while gaining share lowers it.
This is why the W16 contribution (+0.334pp) is so much bigger than what the +2.16pp accuracy gain alone would predict.
If Tobacco had only improved accuracy (+2.16pp) without the share drop, its contribution would be just +0.264pp. The share contraction adds another +0.099pp, roughly 30% of the total contribution (about 38% on top of the rate effect alone). The interaction (−0.031pp) is negative because accuracy rose while share fell, and small because both changes were modest.
Scenario A — Volume rebounds before accuracy fully recovers: If W17 sees Tobacco volume pop back to ~12% share (a ~+1.2pp share gain) while accuracy plateaus at ~81%, the weight effect flips to negative. A +1.2pp share increase × (81.0% − 86.0%) ≈ −0.06pp drag. Combined with no rate gain, Tobacco could contribute close to zero or slightly negative.
Scenario B — Accuracy recovery stalls or reverses: Tobacco hit 84.66% in W13 — that's the recent ceiling. Without structural fixes, regression to ~79% (the W15 level) is plausible. A −2pp accuracy fall × 11% weight ≈ −0.22pp drag in a single week.
Scenario C — Both reverse: Volume back to 13% AND accuracy back to 79% would turn all three shift-share components negative at once. The W16 +0.334pp tailwind would flip to a ~−0.40pp drag, a 0.7pp swing on a single policy. In W14, by contrast, the −8.59pp accuracy collapse came with a share drop, which partly cushioned it.
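The three scenarios can be checked with the same rate + weight + interaction split used in the tables. This is a sketch under stated assumptions:

- W16 Tobacco (81.20% accuracy, 10.81% weight) is the base.
- The global mean is held at 86.00%.
- The W17 inputs are hypothetical values chosen to match each scenario's description.

Scenario B uses the actual 10.81% weight rather than the rounded 11%.

```python
def contribution(acc0, acc1, wt0, wt1, global0):
    """Shift-share contribution in pp (rate + weight + interaction).

    acc and global values are accuracy percentages; wt values are percent of global weight.
    """
    d_acc = acc1 - acc0
    d_wt = (wt1 - wt0) / 100.0
    rate = d_acc * wt0 / 100.0
    weight = d_wt * (acc0 - global0)
    interaction = d_acc * d_wt
    return rate + weight + interaction


# W16 Tobacco baseline taken from the report; the global mean is assumed unchanged in W17.
ACC_W16, WT_W16, GLOBAL_W16 = 81.20, 10.81, 86.00

SCENARIOS = {
    # name: (hypothetical W17 accuracy, hypothetical W17 weight)
    "A: share back to ~12%, accuracy flat": (81.20, 12.00),
    "B: accuracy back to ~79%, share flat": (79.20, 10.81),
    "C: share 13% and accuracy 79%": (79.00, 13.00),
}

if __name__ == "__main__":
    for name, (acc17, wt17) in SCENARIOS.items():
        c = contribution(ACC_W16, acc17, WT_W16, wt17, GLOBAL_W16)
        print(f"{name}: {c:+.3f}pp")
```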
Action: Monitor whether the W14 trough was caused by an isolated event (sample anomaly, transient labeling problem) or a structural issue. The 2-week recovery looks real but rests on a thin base.
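The OMA dashboard has two volume-like metrics for each policy: raw volume (production moderation traffic) and the annotation sample (cases drawn for accuracy evaluation).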
Tobacco annotation sample: 1,878 → 1,537 (−18.2%). Tobacco's share of total annotation samples: 6.73% → 6.76% — essentially flat. So at the evaluation/sampling level, Tobacco's representation didn't change.
The 31% volume drop is in moderation traffic, not in evaluation effort. In other words, Tobacco is being moderated less in production, not just sampled less. Likely drivers: content trends, policy enforcement changes, or seasonality.
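Within Tobacco, MENA2 drove the largest share of the gain (+1.626pp, 75% of Tobacco's +2.16pp) on an accuracy jump of 54.25% → 86.21% (+31.96pp). VN (+1.373pp), DE (+0.711pp), and EN (+0.589pp, via share contraction) were the next-largest gains. These were largely offset by ID (−1.090pp) and MENA1 (−1.040pp). The recovery is concentrated, not broad.
| Market | Region | Acc W15 | Acc W16 | Δ Acc | Wt W15 | Wt W16 | Δ Wt | Δ Vol% | Total | % of Tob Δ |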
|---|---|---|---|---|---|---|---|---|---|---|
| MENA2 ? | EMEA | 54.25% | 86.21% | +31.96 | 5.40% | 3.99% | −1.41 | −49.3% | +1.626 | −75.3% |
| VN | APAC | 87.50% | 98.04% | +10.54 | 2.00% | 8.12% | +6.12 | +178.1% | +1.373 | −63.6% |
| DE | EMEA | 58.00% | 64.26% | +6.26 | 7.82% | 6.32% | −1.50 | −44.6% | +0.711 | −32.9% |
| LATAM ? | AMS | 48.89% | 73.34% | +24.45 | 2.92% | 3.52% | +0.60 | −17.2% | +0.679 | −31.4% |
| EN | EMEA | 65.42% | 58.59% | −6.82 | 9.74% | 3.61% | −6.13 | −74.6% | +0.589 | −27.3% |
| MY | APAC | 86.92% | 94.16% | +7.23 | 1.11% | 4.13% | +3.02 | +154.8% | +0.537 | −24.9% |
| TH | APAC | 84.55% | 92.51% | +7.96 | 1.48% | 3.72% | +2.24 | +72.3% | +0.420 | −19.4% |
| BR ? | AMS | 55.25% | 63.94% | +8.69 | 2.97% | 2.57% | −0.40 | −40.7% | +0.319 | −14.8% |
| Top-8 positive subtotal | | | | | | | | | +6.255 | −289.6% |
| ID ? | APAC | 88.14% | 83.66% | −4.47 | 17.17% | 10.21% | −6.96 | −59.2% | −1.090 | +50.4% |
| MENA1 | EMEA | 88.12% | 74.00% | −14.13 | 5.03% | 11.57% | +6.54 | +57.9% | −1.040 | +48.1% |
| SSA ? | EMEA | 83.14% | 65.42% | −17.72 | 1.31% | 4.99% | +3.68 | +160.8% | −0.734 | +34.0% |
| UA | EMEA | 91.61% | 89.69% | −1.92 | 7.72% | 3.60% | −4.12 | −68.1% | −0.588 | +27.2% |
| KR ? | APAC | 96.04% | 81.92% | −14.12 | 2.87% | 1.14% | −1.73 | −72.8% | −0.455 | +21.1% |
| BD ? | APAC | 93.47% | 84.21% | −9.26 | 3.38% | 1.01% | −2.37 | −79.5% | −0.435 | +20.2% |
| Top-6 negative subtotal | | | | | | | | | −4.342 | +201.0% |
Tobacco accuracy in MENA2 jumped 54.25% → 86.21% (+31.96pp). This is the largest single-market accuracy swing in the entire dataset.
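Mechanism breakdown: rate +1.73pp, weight +0.35pp, interaction −0.45pp. These sum to the +1.626pp total, with Tobacco's W15 accuracy of 79.04% as the reference mean.
Why MENA2 mattered so much: a) starting accuracy was extremely low (54%), so improvement headroom was huge; b) it carried a meaningful share (5.4%) of Tobacco moderation, so each point of accuracy improvement carried real weight; c) its share contracted (5.40% → 3.99%, volume −49%). Shedding share of a below-mean segment added a positive weight effect, but the higher accuracy then applied to a smaller base (negative interaction), so the share change roughly netted out.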
Verify: a 32pp single-week swing in one market is implausible as organic improvement. Likely candidates: labeling guideline change, sample composition shift, or regional moderator team retrained. Investigate before W17.
Tobacco volume in VN: 5,776 → 16,065 (+178%). Within-Tobacco share rose 2.00% → 8.12% (+6.12pp). Accuracy rose 87.50% → 98.04% (+10.54pp).
This is the best-case pattern: more volume, higher accuracy. All three shift-share components are positive: rate (+0.21pp), weight (+0.52pp — VN was ABOVE Tobacco mean, so growing helps), interaction (+0.64pp).
Why this matters: if VN sustains 98% Tobacco accuracy at 8% share, it becomes a structural Tobacco anchor. Track whether the volume surge is a one-off (data backlog catch-up?) or new baseline.
ID accounted for 17.17% of Tobacco moderation in W15 — the largest single market. In W16: share crashed to 10.21% (−6.96pp absolute, −41% relative) AND accuracy fell from 88.14% → 83.66% (−4.47pp).
Mechanism: Volume dropped 59% (49,542 → 20,210). Accuracy dropped on top of the volume drop. ID was above Tobacco's mean, so losing share also hurt via weight effect.
Tobacco's +2.16pp gain would have been roughly +3.25pp without the ID drag. Indonesia is now the single largest variable in Tobacco's trajectory.
MENA1 Tobacco share more than doubled (5.03% → 11.57%, +6.54pp absolute). Accuracy collapsed (88.12% → 74.00%, −14.13pp). Volume +57.9% (14,502 → 22,896).
This is the worst possible direction for a Tobacco market: more volume into a now-failing segment.
Pattern: Why did MENA1 Tobacco volume surge while quality crashed? One possibility: a sudden enforcement campaign or content trend in the region pushed more Tobacco cases into review faster than moderator capacity could absorb. Compared with the broader MENA1 market data (which fell 1.81pp overall), Tobacco is likely a dominant contributor to MENA1's regional decline.
A market's contribution to Tobacco's accuracy delta depends on three factors:
Why MENA2 dominated: low starting accuracy (huge headroom) and a meaningful share (5%+) produced a very large rate effect (+1.73pp). The share contraction roughly netted out (weight +0.35pp, interaction −0.45pp).
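Why ID dragged so much: it had the largest share by far (17%), and its accuracy was above the Tobacco mean at W15, so losing share hurt (weight −0.63pp). Accuracy also fell (rate −0.77pp). The interaction (+0.31pp) actually cushioned the drag, because the lower accuracy applied to a smaller share.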
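A sketch that recomputes the four largest Tobacco market movers. It assumes the reference mean is Tobacco's overall W15 accuracy (79.04%); with that assumption the totals match the Total column to within rounding. The output shows how the three terms combine for each market:

- MENA2's share contraction produced a positive weight term but a larger negative interaction.
- ID's positive interaction term partly cushioned its drag.

```python
# Reference mean: Tobacco's overall W15 accuracy (an assumption that reproduces
# the table's Total column).
TOBACCO_MEAN_W15 = 79.04

# market: (acc W15, acc W16, within-Tobacco share W15, share W16), all in percent
MARKETS = {
    "MENA2": (54.25, 86.21, 5.40, 3.99),
    "VN": (87.50, 98.04, 2.00, 8.12),
    "ID": (88.14, 83.66, 17.17, 10.21),
    "MENA1": (88.12, 74.00, 5.03, 11.57),
}


def decompose(acc0, acc1, wt0, wt1, mean0):
    """Return (rate, weight, interaction, total) in pp of Tobacco accuracy."""
    d_acc = acc1 - acc0              # accuracy change, pp
    d_wt = (wt1 - wt0) / 100.0       # share change, as a fraction
    rate = d_acc * wt0 / 100.0       # accuracy change at the old share
    weight = d_wt * (acc0 - mean0)   # share change, valued at distance from the mean
    interaction = d_acc * d_wt       # joint effect of both changes
    return rate, weight, interaction, rate + weight + interaction


if __name__ == "__main__":
    for market, row in MARKETS.items():
        r, w, i, t = decompose(*row, TOBACCO_MEAN_W15)
        print(f"{market:6s} rate {r:+.3f}  weight {w:+.3f}  inter {i:+.3f}  total {t:+.3f}")
```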
EMEA: 47.55% of Tobacco's W16 mix (94K cases). Accuracy 74.39%, well below Tobacco's 81.20% mean and far below the region-aggregate Tobacco baseline. EMEA Tobacco improved +2.16pp this week, driven almost entirely by MENA2 + DE recovery.
APAC: 44.06% of Tobacco's mix (87K cases). Accuracy 89.79%, well above Tobacco's mean. APAC is the high-quality anchor, but its accuracy slipped 0.12pp this week as the ID drag offset VN/MY/TH gains.
AMS: 8.38% of Tobacco's mix (17K cases). It has the smallest share but the biggest WoW accuracy improvement at +15.86pp (LATAM and BR jumped roughly 24pp and 9pp respectively). Despite the small footprint, AMS contributed +0.84pp to Tobacco's gain (39%).
Implications:
Youth Sexualized Behaviors contributes −4.66pp (137% of ID's −3.41pp), the single biggest drag. Youth Regulated Goods and Services adds another −4.55pp (133%), Adult Sexualized Behaviors −2.35pp (69%), and Youth Non-Sexualized Nudity −1.74pp (51%, the most sample-backed). Tobacco contributes −1.01pp (30%): significant, but smaller than the youth-content cluster. The youth-content failures with growing weight are the dominant story.
For ID-internal analysis, "Global Acc" in the shift-share formulas is replaced by ID's overall accuracy (90.22% in W15). Each ID policy is decomposed into rate, weight, and interaction effects relative to ID's mean.
| Policy | Acc W15 | Acc W16 | Δ Acc | Wt W15 | Wt W16 | Δ Wt | Rate | Weight | Inter | Total | % of ID Δ | Sample |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Youth Sexualized Behaviors ? | 70.00% | 47.66% | −22.34 | 9.08% | 15.27% | +6.19 | −2.029 | −1.251 | −1.382 | −4.662 | 137% | 51 → 32 |
| Youth Regulated Goods and Services ? | 78.32% | 26.46% | −51.86 | 8.87% | 8.79% | −0.08 | −4.602 | +0.010 | +0.044 | −4.548 | 133% | 37 → 12 |
| Adult Sexualized Behaviors ? | 58.38% | 41.52% | −16.86 | 4.16% | 7.55% | +3.39 | −0.701 | −1.079 | −0.571 | −2.352 | 69% | 25 → 18 |
| Youth Body Exposure - Light (4-17) ? | 83.02% | 43.38% | −39.63 | 0.44% | 4.51% | +4.07 | −0.174 | −0.293 | −1.614 | −2.081 | 61% | 12 → 11 |
| Light Body Exposure ? | 66.67% | 19.91% | −46.76 | 0.93% | 3.12% | +2.19 | −0.436 | −0.516 | −1.024 | −1.976 | 58% | 3 → 4 |
| Youth Non-Sexualized Nudity ? | 81.80% | 66.91% | −14.89 | 8.36% | 10.48% | +2.12 | −1.245 | −0.178 | −0.316 | −1.739 | 51% | 169 → 98 |
| Tobacco and Nicotine ? | 90.47% | 83.66% | −6.81 | 26.33% | 14.34% | −11.99 | −1.793 | −0.030 | +0.817 | −1.007 | 30% | 91 → 28 |
| Adult Sexual Activity ? | 67.12% | 6.48% | −60.64 | 1.59% | 1.31% | −0.28 | −0.966 | +0.066 | +0.173 | −0.728 | 21% | 9 → 5 |
| Top-8 negative subtotal | | | | | | | −11.946 | −3.270 | −3.873 | −19.10 | 560% | |
| Policy | Acc W15 | Acc W16 | Δ Acc | Wt W15 | Wt W16 | Rate | Weight | Inter | Total | % of ID Δ | Sample |
|---|---|---|---|---|---|---|---|---|---|---|---|
| High Risk Weight Loss & Muscle Gain | 0.00% | 100.00% | +100.00 | 1.37% | 0.62% | +1.368 | +0.673 | −0.746 | +1.295 | −38% | 2 → 1 |
| Frauds & Scams | 72.22% | 100.00% | +27.78 | 5.60% | 2.49% | +1.554 | +0.560 | −0.864 | +1.250 | −37% | 18 → 4 |
| Alcohol | 62.31% | 100.00% | +37.69 | 2.38% | 1.88% | +0.898 | +0.140 | −0.190 | +0.849 | −25% | 5 → 1 |
| Physical Assault | 38.35% | 100.00% | +61.65 | 1.63% | 0.02% | +1.003 | +0.832 | −0.989 | +0.846 | −25% | 5 → 1 |
| High Risk Driving | 70.09% | 100.00% | +29.91 | 3.12% | 1.91% | +0.933 | +0.243 | −0.362 | +0.814 | −24% | 11 → 6 |
| Youth Body Exposure - Sig & Moderate ? | 77.45% | 91.46% | +14.01 | 5.19% | 5.65% | +0.727 | −0.060 | +0.065 | +0.733 | −21% | 60 → 39 |
| Graphic Content - Realistic Fiction | 2.06% | 100.00% | +97.94 | 0.63% | 1.24% | +0.619 | −0.539 | +0.599 | +0.678 | −20% | 3 → 2 |
| Top-7 positive subtotal | | | | | | +7.102 | +1.849 | −2.487 | +6.464 | −190% | |
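For the ID-internal tables, the same split applies with two substitutions: ID's W15 accuracy (90.22%) is the reference mean, and ID's −3.41pp change is the % denominator. A minimal sketch reproducing two rows follows; the helper name `decompose` is illustrative. The Tobacco row shows the interaction term (+0.817pp) cancelling almost half of the rate drag (−1.793pp).

```python
ID_MEAN_W15 = 90.22  # ID's overall W15 accuracy, used in place of the global mean
ID_DELTA = -3.41     # ID's W15 -> W16 accuracy change, the % of ID delta denominator

# policy: (acc W15, acc W16, within-ID weight W15, weight W16), all in percent
POLICIES = {
    "Youth Sexualized Behaviors": (70.00, 47.66, 9.08, 15.27),
    "Tobacco and Nicotine": (90.47, 83.66, 26.33, 14.34),
}


def decompose(acc0, acc1, wt0, wt1, mean0):
    """Return (rate, weight, interaction, total) in pp of ID accuracy."""
    d_acc = acc1 - acc0
    d_wt = (wt1 - wt0) / 100.0
    rate = d_acc * wt0 / 100.0
    weight = d_wt * (acc0 - mean0)
    interaction = d_acc * d_wt
    return rate, weight, interaction, rate + weight + interaction


if __name__ == "__main__":
    for policy, row in POLICIES.items():
        r, w, i, t = decompose(*row, ID_MEAN_W15)
        share = 100 * t / ID_DELTA
        print(f"{policy}: rate {r:+.3f} weight {w:+.3f} inter {i:+.3f} "
              f"total {t:+.3f} ({share:.0f}% of ID delta)")
```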
Youth Sexualized Behaviors in ID, multi-week trajectory: W13 70.00% (sample 16) → W14 41.14% (sample 11) → W15 70.00% (sample 51) → W16 47.66% (sample 32).
The W14–W15 oscillation suggests this category sits near the noise threshold. However, the W15 figure (70% on 51 samples) is the best-supported baseline in the window, and W16's 47.66% on 32 samples is probably large enough that the −22pp gap reflects a genuine quality issue.
The compounding factor: share grew 9.08% → 15.27% (+6.19pp). Volume jumped 16,656 → 25,077 (+50%). So more cases entered a now-failing category — interaction effect within ID.
Action: The growing volume + falling accuracy is the worst direction for any policy. Investigate whether the volume growth is content-driven (e.g., a viral trend) or operational (changed routing rules), and whether moderator capacity for this category scaled accordingly.
Youth Non-Sexualized Nudity is the most statistically reliable drag in ID this week. Even after W16's sample reduction, 98 samples give reasonable confidence in the −14.89pp signal.
Multi-week: W13 73.25% (sample 43) → W14 94.67% (sample 27) → W15 81.80% (sample 169) → W16 66.91% (sample 98). The category is volatile but the W15→W16 movement is supported by a meaningful sample on both sides.
Share also grew (8.36% → 10.48%, +2.12pp), so the weight and interaction effects compounded the rate effect.
Action: This is one of the few ID policy signals that is sample-backed and trend-coherent. Add to the ID quality investigation immediately.
Globally, Tobacco is a +2.16pp gainer. But in ID specifically:
This is the rare case where the interaction term offsets a large part of the rate effect. The −6.81pp accuracy drop applied to a share that nearly halved (26.33% → 14.34%), so the interaction (+0.817pp) cancels almost half of the rate drag (−1.793pp). Tobacco was barely above ID's mean in W15 (90.47% vs 90.22%), so cutting its share is roughly neutral in shift-share terms (weight effect −0.030pp).
4-week trend (Tobacco accuracy in ID): 96.47% → 96.24% → 88.14% → 83.66%. This is a real degradation pattern, and ID is the largest negative contributor in the Tobacco × Markets breakdown (−1.09pp on the Tobacco delta).
The ID Tobacco volume halving is itself worth investigating: is it a real moderation traffic decrease (content trend) or a routing/sampling change?
Youth Body Exposure - Light (4-17) volume in ID went 804 → 7,410 in a single week (a 9× jump). Accuracy went 83.02% → 43.38% (−39.6pp).
This pattern is unusual: a category that was near-marginal in W15 suddenly carries 9× volume at much worse quality. Hypothesis: a content trigger (viral trend, regulatory directive, or routing rule change) flooded this category with cases the moderation pipeline wasn't calibrated for.
Sample is small (12 → 11) so the magnitude is uncertain, but the volume pattern alone is worth investigating — that kind of step-change usually has a discrete cause.
The recovery in W15 was partial, and W16 erased it and more. ID is now at its lowest level in the analyzed window.
The decline is increasingly concentrated in Youth-related policies (per W16 data). This points to one of three causes: a) an Indonesia-specific content trend (more youth-content moderation cases hitting the system), b) a localization issue with policy guidelines for youth content in Bahasa Indonesia, or c) a moderator team rotation that affected youth-category calibration.
1. The "Title Accuracy" column in this dataset isn't directly comparable to OMA accuracy. Sum of policy contributions reaches −14.4pp while ID's actual OMA delta is −3.41pp. The metric definitions diverge — use this table for relative comparison among ID policies, not for precise contribution accounting.
2. Sample sizes are very small. Most policies have W16 samples between 1 and 30. Only 4 policies cross the 30-sample threshold. This is fundamentally a small-N analysis at the policy × market level.
3. Several policies show 0% or 100% accuracy persistently. These are likely pipeline/data-integrity artifacts, not real moderator behavior. Excluded from the dragger table.
4. Within-ID share doesn't sum to 100%. The dataset's "Weight Proportion" column appears to be normalized within sub-buckets, not within ID overall. Volume-derived share has been used in this analysis instead.
P0 (immediate, integrity risk): Personal Information - High Risk (−31pp), Violent Behaviors (−19.32pp). Both are reputational categories with material accuracy regression on growing or stable weight.
P0 (data integrity): Verify Disparaging Religion (−76pp), 0%-accuracy policies, and other extreme single-policy swings are not sample/labeling artifacts. Swings of this size are more likely measurement issues than real changes.
P1 (regional): ID + LATAM joint investigation. If the cause is shared (e.g., a regional model rollout), one fix solves both. Otherwise treat as independent.
P1 (trend): ID 4-week decline pattern. Even if the W16 drop proves to be an isolated event, the trend itself warrants attention.
P2 (lock in gains): Tobacco & Nicotine recovery (2 consecutive positive weeks) and the Adult Sexualized Behaviors offset. Confirm structural drivers, not just regression to the mean.
P3 (signal hygiene): Stop using single-week % of Δ as the primary metric in WoW reports; when |global Δ| < 0.1pp, lead with absolute pp contributions instead.
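The P3 rule can be expressed as a small reporting helper. This is a sketch with hypothetical names; the thresholds come from the text (0.1pp for the global change, 0.10pp for materiality). When the global change is tiny, rows are ranked by absolute pp and filtered to material ones. Otherwise, % of Δ is reported.

```python
def headline_view(contributions, global_delta_pp, min_delta_pp=0.10, material_pp=0.10):
    """Return (metric_name, rows) for a WoW report.

    contributions: dict of segment -> contribution in pp.
    If |global delta| is below min_delta_pp, rank by absolute pp and keep only
    rows whose |contribution| exceeds material_pp; otherwise report % of delta.
    """
    if abs(global_delta_pp) < min_delta_pp:
        rows = sorted(
            ((seg, c) for seg, c in contributions.items() if abs(c) > material_pp),
            key=lambda item: abs(item[1]),
            reverse=True,
        )
        return "abs_pp", rows
    rows = sorted(
        ((seg, 100.0 * c / global_delta_pp) for seg, c in contributions.items()),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    return "pct_of_delta", rows


if __name__ == "__main__":
    w16 = {"ID": -0.295, "LATAM": -0.282, "MENA2": 0.305, "JP": -0.088}
    print(headline_view(w16, -0.035))
```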
W14 (Apr 4–10) vs W13 (Mar 28–Apr 3). Each segment's total contribution is decomposed into three additive components. Positive % of Δ = contributed to the decline; negative = offset.
The rate effect (−2.64pp) alone would have produced a 2.64pp decline had the mix not shifted. The actual −1.63pp was cushioned by favorable weight rebalancing and a small positive interaction.
Rate effect (161%) tells us accuracy degradation within segments, holding mix constant, more than fully explains the decline. This is the "quality got worse" signal.
Weight effect (offset 47%) means the mix actually shifted favorably: segments with above-average accuracy gained share. Without this, the decline would have been ~2.4pp instead of 1.63pp.
Interaction (offset 15%) captures the joint effect — segments that lost accuracy also tended to shrink in weight, providing a small additional buffer.
The sum: −2.64 + 0.76 + 0.24 = −1.64pp, matching the observed −1.63pp global decline to within rounding.
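A quick check of the additive identity and the counterfactual, using the rounded components quoted above. Because the inputs are rounded to two decimals, the sum is −1.64pp rather than the observed −1.63pp, and the rate share comes out at 162% rather than 161%.

```python
# Rounded W14-vs-W13 components quoted in the text, in pp.
RATE, WEIGHT, INTERACTION = -2.64, 0.76, 0.24
OBSERVED = -1.63  # reported global change, pp

total = RATE + WEIGHT + INTERACTION
print(f"Sum of components: {total:+.2f}pp (observed {OBSERVED:+.2f}pp)")

# Share of the observed change attributable to each component.
for name, value in (("rate", RATE), ("weight", WEIGHT), ("interaction", INTERACTION)):
    print(f"{name:12s} {100 * value / OBSERVED:+.0f}% of delta")

# Counterfactual: the same decomposition with the favorable weight effect removed.
without_weight = RATE + INTERACTION
print(f"Without the weight effect: {without_weight:+.2f}pp")
```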
| Hub | FR W14 | FR W13 | Δ FR | Acc Δ total | Fuzzy explains | Non-fuzzy Δ | Verdict |
|---|---|---|---|---|---|---|---|
| AMS | 1.76% | 1.57% | +0.19pp | −0.12pp | −0.19pp | +0.06pp | Entire decline is fuzzy-driven. Non-fuzzy accuracy actually improved. |
| APAC | 2.08% | 1.59% | +0.49pp | +0.92pp | −0.49pp | +1.41pp | Fuzzy headwind absorbed — non-fuzzy quality improved strongly (+1.41pp). |
| EMEA | 3.12% | 2.86% | +0.26pp | −6.48pp | −0.26pp | −6.21pp | 96% of EMEA's decline is non-fuzzy. Fuzzy is a minor factor here. |
| Global | 2.35% | 2.00% | +0.36pp | −1.63pp | −0.36pp | −1.28pp | Fuzzy = 22%, non-fuzzy = 78% |
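The table's split treats every point of fuzzy-rate increase as one point of lost accuracy: the "Fuzzy explains" column is simply −Δ FR. Below is a sketch of that logic with the hub figures. Recomputing from the rounded inputs can differ from the table in the last digit. For example, AMS non-fuzzy comes out at +0.07pp versus +0.06pp, and the Global fuzzy share at 21% versus 22%.

```python
# hub: (fuzzy rate W13, fuzzy rate W14, total accuracy change W13 -> W14), percent / pp
HUBS = {
    "AMS": (1.57, 1.76, -0.12),
    "APAC": (1.59, 2.08, 0.92),
    "EMEA": (2.86, 3.12, -6.48),
    "Global": (2.00, 2.35, -1.63),
}


def split_fuzzy(fr_base, fr_comp, acc_delta):
    """Split an accuracy change into fuzzy and non-fuzzy parts (pp).

    Assumes each point of fuzzy-rate increase costs exactly one point of accuracy.
    """
    fuzzy = -(fr_comp - fr_base)
    non_fuzzy = acc_delta - fuzzy
    return fuzzy, non_fuzzy


if __name__ == "__main__":
    for hub, row in HUBS.items():
        fuzzy, non_fuzzy = split_fuzzy(*row)
        print(f"{hub:6s} fuzzy {fuzzy:+.2f}pp  non-fuzzy {non_fuzzy:+.2f}pp")
        if row[2] < 0:  # a share of the decline is only meaningful for declines
            print(f"       fuzzy share of decline: {100 * fuzzy / row[2]:.0f}%")
```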
AMS accuracy fell just 0.12pp, and the entire decline is explained by the fuzzy rate increase (+0.19pp). Once fuzzy is stripped out, AMS non-fuzzy accuracy actually improved by +0.06pp.
This suggests AMS's labeling quality is holding steady or improving. The headline number is instead being dragged by borderline cases being reclassified or by new ambiguous content types entering the pipeline.
Action: Consider fuzzy calibration or policy clarification for the specific content types driving the 0.19pp fuzzy increase. This is a recoverable loss.
APAC's reported accuracy improved +0.92pp, but the underlying non-fuzzy improvement is actually +1.41pp — being partially masked by a +0.49pp fuzzy rate increase (the largest of any hub).
APAC absorbed the biggest fuzzy headwind and still delivered the best headline improvement. However, the fuzzy trend (+0.49pp WoW) needs monitoring — if it continues, it will eventually overwhelm the quality gains.
Action: Investigate whether policy updates or new content types in APAC are driving the fuzzy surge. The quality fundamentals are strong, but the fuzzy trajectory is concerning.
EMEA's fuzzy rate increased only 0.26pp, explaining just 4% of its massive −6.48pp accuracy decline. The remaining −6.21pp is pure non-fuzzy accuracy degradation.
This effectively rules out borderline cases as the main explanation for EMEA's performance. The problem is about labeler accuracy, policy interpretation, or operational execution, not content ambiguity.
EMEA also has the highest absolute fuzzy rate (3.12% vs 2.08% APAC, 1.76% AMS), suggesting a structural baseline of ambiguity in its content mix, but the week-over-week change is small.
EMEA Appeal is the largest single drag (50.9% of the decline): accuracy fell 82.7% → 76.6% (−6.06pp) while still carrying 15.6% of global weight. APAC General Recall is the largest single offset (−30.6%), improving to 90.1% while gaining share.
| Hub | Type | Acc W14 | Acc W13 | Δ Acc | GWt W14 | GWt W13 | Rate | Weight | Inter | Total | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EMEA | Appeal | 76.6% | 82.7% | −6.06 | 15.6% | 19.1% | −1.156 | +0.113 | +0.212 | −0.831 | 50.9% |
| EMEA | General Recall | 84.2% | 91.6% | −7.37 | 12.1% | 10.0% | −0.734 | +0.119 | −0.156 | −0.771 | 47.1% |
| EMEA | Analytics Appeal | 77.7% | 89.7% | −12.01 | 4.7% | 3.2% | −0.387 | +0.055 | −0.174 | −0.505 | 30.9% |
| AMS | General Recall | 84.4% | 85.8% | −1.37 | 14.3% | 9.5% | −0.130 | −0.005 | −0.066 | −0.200 | 12.2% |
| APAC | Appeal | 85.4% | 85.7% | −0.27 | 15.9% | 18.8% | −0.052 | +0.006 | +0.008 | −0.037 | 2.3% |
| Negative subtotal | | | | | | | | | | −2.345 | 143.4% |
| AMS | Appeal | 72.9% | 78.7% | −5.86 | 5.0% | 10.5% | −0.614 | +0.394 | +0.320 | +0.100 | −6.1% |
| AMS | Analytics Appeal | 79.0% | 62.7% | +16.35 | 0.5% | 0.6% | +0.102 | +0.038 | −0.027 | +0.113 | −6.9% |
| APAC | General Recall | 90.1% | 88.6% | +1.49 | 27.5% | 24.1% | +0.359 | +0.091 | +0.051 | +0.501 | −30.6% |
| Positive subtotal | | | | | | | | | | +0.714 | −43.7% |
The −1.156pp rate effect is the single largest driver in this decomposition. EMEA Appeal dropped from 82.7% to 76.6%, a −6.06pp swing, while still carrying 15.6% global weight.
The weight did shrink (19.1% → 15.6%), which partially offset the damage (+0.113pp weight effect, +0.212pp interaction), but the sheer magnitude of the accuracy collapse overwhelms both offsets.
Key question: Is this driven by specific BPO sites, policy updates, or labeler calibration drift? See the "Top Projects" tab for project-level decomposition.
APAC General Recall improved from 88.6% to 90.1% (+1.49pp) while also gaining weight (24.1% → 27.5%). This is the ideal scenario: an above-average segment both improves and grows.
All three effects are positive: rate (+0.359pp), weight (+0.091pp), interaction (+0.051pp), summing to +0.501pp — the single largest offset at −30.6% of the decline.
| Market | Acc W14 | Acc W13 | Δ Acc | GWt W14 | GWt W13 | Rate | Weight | Inter | Total | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| MENA1 | 80.5% | 90.4% | −9.89 | 6.26% | 6.46% | −0.639 | −0.009 | +0.020 | −0.628 | 38.4% |
| EN (GB) | 78.8% | 88.5% | −9.67 | 3.56% | 4.18% | −0.404 | −0.016 | +0.061 | −0.360 | 22.0% |
| SSA | 75.9% | 84.2% | −8.22 | 3.65% | 2.46% | −0.202 | −0.021 | −0.099 | −0.321 | 19.7% |
| DE | 77.4% | 86.8% | −9.42 | 2.93% | 2.90% | −0.273 | +0.000 | −0.003 | −0.276 | 16.9% |
| MENA2 | 75.8% | 80.9% | −5.08 | 4.28% | 4.37% | −0.222 | +0.005 | +0.005 | −0.213 | 13.0% |
| IT | 84.2% | 93.2% | −9.07 | 2.43% | 2.27% | −0.205 | +0.012 | −0.015 | −0.208 | 12.7% |
| IL | 67.2% | 83.8% | −16.55 | 0.36% | 0.43% | −0.071 | +0.001 | +0.011 | −0.059 | 3.6% |
| UA | 74.3% | 77.3% | −3.04 | 1.08% | 1.04% | −0.032 | −0.003 | −0.001 | −0.036 | 2.2% |
MENA1 dropped from 90.4% to 80.5% (−9.89pp) while maintaining roughly stable weight (6.46% → 6.26%). The rate effect (−0.639pp) almost entirely explains its contribution.
This is a nearly pure accuracy regression — no confounding mix shifts. The investigation should focus on what changed in MENA1 labeling quality, policy interpretation, or task distribution during W14.
SSA is the only material market drag in which rate, weight, and interaction are all negative (UA shares the sign pattern, but its total is only −0.036pp).
Rate (−0.202pp): accuracy fell from 84.2% to 75.9%, a −8.22pp drop.
Weight (−0.021pp): SSA's weight grew from 2.46% to 3.65%, but since SSA accuracy (84.2%) was below the W13 global mean (85.9%), this expansion hurts.
Interaction (−0.099pp): the weight grew AND accuracy fell simultaneously — the worst combination.
Key question: Was the SSA weight increase intentional (ramp-up)? If so, quality support did not scale with volume.
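The "all three negative" pattern can be screened mechanically. The hypothetical helper below applies a sign test to the rounded values from the W13→W14 market table, with an optional materiality floor on the combined total; without the floor, UA also matches, but at negligible scale.

```python
# (market, rate, weight, inter) in pp, from the W13 -> W14 market table.
ROWS = [
    ("MENA1", -0.639, -0.009, +0.020),
    ("EN (GB)", -0.404, -0.016, +0.061),
    ("SSA", -0.202, -0.021, -0.099),
    ("DE", -0.273, +0.000, -0.003),
    ("MENA2", -0.222, +0.005, +0.005),
    ("IT", -0.205, +0.012, -0.015),
    ("IL", -0.071, +0.001, +0.011),
    ("UA", -0.032, -0.003, -0.001),
]


def triple_headwinds(rows, min_total=0.0):
    """Return (market, total) for rows whose three effects are all strictly negative.

    min_total drops rows whose combined contribution is too small to matter.
    """
    hits = []
    for market, rate, weight, inter in rows:
        total = rate + weight + inter
        if rate < 0 and weight < 0 and inter < 0 and abs(total) >= min_total:
            hits.append((market, round(total, 3)))
    return hits


print(triple_headwinds(ROWS))                  # every all-negative row
print(triple_headwinds(ROWS, min_total=0.10))  # only the material ones
```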
IL has the most dramatic accuracy decline of any market (83.8% → 67.2%, −16.55pp), but its small weight (0.36%) limits global impact to just −0.059pp (3.6% of decline).
Still worth flagging: a 16.5pp drop likely indicates a systemic issue — new policy, labeler turnover, or task type change — that could worsen if IL weight increases.
| Project | Type | Acc W14 | Acc W13 | GWt W14 | GWt W13 | Rate | Weight | Inter | Total | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| GCP-TT-Video appeal-GB-en-ALR-MNL | Appeal | 69.9% | 100.0% | 2.25% | 0.36% | −0.108 | +0.266 | −0.568 | −0.410 | 25.1% |
| TT-Video-Analytics Appeal-MENA2-ar-T&S-CAS | Analytics Appeal | 67.1% | 73.5% | 2.25% | 0.51% | −0.033 | −0.215 | −0.112 | −0.360 | 22.0% |
| TT-Video-General Recall General-MENA1-ku-CNX-ANK | General Recall | 84.4% | 96.8% | 2.59% | 2.61% | −0.322 | −0.002 | +0.002 | −0.321 | 19.7% |
| TT-Video appeal-KE/TZ/UG-sw-TP-NBO | Appeal | 69.4% | 81.4% | 1.12% | 0.77% | −0.092 | −0.016 | −0.043 | −0.151 | 9.2% |
| GCP-TT-Video-General Recall General-GB-en-TP-ALB | General Recall | 58.9% | 92.7% | 0.06% | 1.40% | −0.473 | −0.091 | +0.451 | −0.113 | 6.9% |
| GCP-TT-Video appeal-IT-it-TP-BRV | Appeal | 84.7% | 92.3% | 0.87% | 1.60% | −0.122 | −0.046 | +0.055 | −0.113 | 6.9% |
| TT-Video appeal-MENA1-other-TP-MAK | Appeal | 63.4% | 96.1% | 0.22% | 0.53% | −0.174 | −0.032 | +0.101 | −0.104 | 6.4% |
| GCP-TT-Video-General Recall General-DE-de-TLS-LEJ | General Recall | 75.4% | 85.4% | 1.00% | 0.46% | −0.046 | −0.003 | −0.054 | −0.102 | 6.3% |
| TT-Video appeal-MENA1-ar-CNX-IBD | Appeal | 74.2% | 78.9% | 1.48% | 1.17% | −0.056 | −0.021 | −0.014 | −0.091 | 5.6% |
| TT-Video-General Recall General-MENA1-ar-TP-MAK | General Recall | N/A | 100% | 0.00% | 0.53% | −0.534 | −0.075 | +0.534 | −0.075 | 4.6% |
This project's weight surged 6.25x (0.36% → 2.25%) while accuracy crashed from 100% → 69.9%. The interaction effect (−0.568pp) is the largest single component — weight grew dramatically while accuracy fell dramatically.
The weight effect is actually positive (+0.266pp) because the project was above the global mean in W13 (100% vs 85.9%). But the interaction overwhelms it: expanding into what became a low-accuracy segment is a compounding failure.
Key question: Was this a deliberate ramp-up of a previously small project? If so, quality controls didn't scale with volume.
Weight grew from 0.51% → 2.25% (4.4x) while accuracy, already below the W13 global mean (73.5% vs 85.9%), fell further to 67.1%. The weight effect alone (−0.215pp) is the largest component: this is a mix-shift problem, not primarily a rate problem.
All three effects are negative: rate (−0.033), weight (−0.215), interaction (−0.112). A triple headwind totaling −0.360pp (22.0% of decline).
Action: Validate whether this weight increase was intentional. Expanding a chronically below-mean segment without quality uplift compounds the global decline.
This is the cleanest rate-driven case in the top 10: weight barely moved (2.61% → 2.59%), so the rate effect (−0.322pp) almost entirely explains the −0.321pp total contribution.
Accuracy dropped from 96.8% → 84.4% (−12.4pp) — a steep fall from a high base. No mix-shift or weight excuses here; something changed in execution quality.
Action: Investigate what changed for Kurdish-language GR in MENA1 during W14 — policy update, new labeler cohort, or calibration drift.
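"Rate-driven" can be made quantitative by asking what fraction of a row's gross movement comes from the rate term. The `rate_share` metric below is a hypothetical heuristic, not something the original analysis defines. It uses absolute values because the components often offset: the GB-en-TP-ALB row nets only −0.113pp from a −0.473pp rate effect and a +0.451pp interaction.

```python
# (project, rate, weight, inter) in pp, from the W13 -> W14 project table.
PROJECTS = [
    ("GCP-TT-Video appeal-GB-en-ALR-MNL", -0.108, +0.266, -0.568),
    ("TT-Video-Analytics Appeal-MENA2-ar-T&S-CAS", -0.033, -0.215, -0.112),
    ("TT-Video-General Recall General-MENA1-ku-CNX-ANK", -0.322, -0.002, +0.002),
    ("TT-Video appeal-KE/TZ/UG-sw-TP-NBO", -0.092, -0.016, -0.043),
    ("GCP-TT-Video-General Recall General-GB-en-TP-ALB", -0.473, -0.091, +0.451),
    ("GCP-TT-Video appeal-IT-it-TP-BRV", -0.122, -0.046, +0.055),
    ("TT-Video appeal-MENA1-other-TP-MAK", -0.174, -0.032, +0.101),
    ("GCP-TT-Video-General Recall General-DE-de-TLS-LEJ", -0.046, -0.003, -0.054),
    ("TT-Video appeal-MENA1-ar-CNX-IBD", -0.056, -0.021, -0.014),
    ("TT-Video-General Recall General-MENA1-ar-TP-MAK", -0.534, -0.075, +0.534),
]


def rate_share(rate: float, weight: float, inter: float) -> float:
    """Share of a row's gross (absolute) movement that comes from the rate effect."""
    gross = abs(rate) + abs(weight) + abs(inter)
    return abs(rate) / gross if gross else 0.0


ranked = sorted(PROJECTS, key=lambda row: rate_share(*row[1:]), reverse=True)
for project, rate, weight, inter in ranked:
    print(f"{rate_share(rate, weight, inter):.2f}  {project}")
```

A share near 1 means weight barely moved. Rows such as GB-en-ALR-MNL (about 0.11) and the MENA2 analytics project (about 0.09) sit at the other end, which matches the commentary that they are interaction- and mix-driven.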
P0 (immediate): DE-LEJ site — likely site-level outage/failure, accounts for 63.7% of decline from just two projects. Quick root cause identification could recover the most impact.
P1 (this week): Four N/A projects — verify if delivery gaps are fixable. If unplanned, restoring these could offset 167% of the decline (they overlap with rate-driven decline).
P1 (this week): MENA1 accuracy regression — 38.4% of decline, pure rate effect. Check if a policy update or labeler calibration issue occurred during W14.
P2 (track): SSA triple headwind and EMEA GCP weight surge — these are structural issues that need monitoring over W15–W16 to determine if they're transient or persistent.
P2 (track): APAC fuzzy rate surge (+0.49pp) — quality fundamentals are strong but the fuzzy trajectory needs monitoring. AMS fuzzy calibration is a quick-win candidate.
W15 (Apr 11–17) vs W14 (Apr 4–10). OMA jumped +1.92pp — the largest weekly gain in the observed window, fully reversing the W14 decline and finishing 0.10pp above the W13 baseline.
Top 3 markets (EN, LATAM, MENA1) contributed +1.07pp — 56% of the global gain.
Top 5 markets contributed +1.56pp — 82%.
Top 3 policies contributed +1.56pp — 81%.
Top 5 policies contributed +2.42pp — 126% (i.e., the rest of the policies net negative).
Recovery is genuinely concentrated — a handful of policies and markets did most of the work. This is fragile: if next week one or two of these reverse, the headline swings substantially.
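The concentration figures above follow from sorting contributions and dividing the subtotal by the global change. A minimal sketch using the Total column of the W14→W15 market and policy tables; because the denominator is the positive +1.92pp Δ, gains read as positive shares and drags as negative ones.

```python
GLOBAL_DELTA_PP = 1.92  # OMA change W14 -> W15

MARKET_TOTALS = {
    "EN": 0.375, "LATAM": 0.360, "MENA1": 0.337, "BD": 0.278, "CA": 0.215,
    "SSA": 0.141, "BR": 0.137, "PH": 0.130, "MX": -0.249, "PK": -0.172, "KR": -0.053,
}
POLICY_TOTALS = {
    "Combat sports, Extreme Sports & Stunts": 0.592,
    "Designated Dangerous Entities": 0.529,
    "Violent Behaviors": 0.439,
    "Alcohol": 0.437,
    "Highly Imitable Acts": 0.421,
    "Personal Information - High Risk": 0.378,
    "Regulated Goods - Marketing/Trade": 0.298,
    "Tobacco and Nicotine": 0.269,
    "Dangerous Trends - Serious Harm": -0.467,
    "Adult Sexualized Behaviors": -0.391,
    "Reference to Cannabis, Drugs": -0.247,
    "Firearms & Explosive Weapons": -0.221,
}


def top_k_share(totals: dict[str, float], k: int, delta: float = GLOBAL_DELTA_PP) -> tuple[float, float]:
    """Return (sum of the k largest contributions, that sum as a signed share of delta)."""
    top = sorted(totals.values(), reverse=True)[:k]
    subtotal = sum(top)
    return subtotal, subtotal / delta


for label, totals in (("markets", MARKET_TOTALS), ("policies", POLICY_TOTALS)):
    for k in (3, 5):
        subtotal, share = top_k_share(totals, k)
        print(f"top-{k} {label}: {subtotal:+.2f}pp = {share:.0%} of the global change")
```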
The W14–W15 swing pattern (−1.82pp → +1.92pp) is unusually large. Such a near-perfect reversal often points to operational or sampling causes rather than a genuine two-step change in quality: for example, a labeling-guideline correction issued mid-W14 that only took full effect in W15.
Volume context: OMA pipeline grew +21% from W14 (4.66M) to W15 (5.65M). The bigger sample base may have stabilized noisy categories.
79.59% → 83.29% (+3.71pp). The contributors are spread: EN +9.90pp, MENA1 +5.90pp, SSA +3.49pp, ES +8.40pp. AMS added +24% (LATAM +4.39pp, CA +7.58pp, BR +2.19pp). APAC contributed +15%, with BD +5.44pp and PH +3.10pp leading.
| Market | Acc W14 | Acc W15 | Δ Acc | Wt W14 | Wt W15 | Rate | Weight | Inter | Total | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| EN | 78.79% | 88.69% | +9.90 | 3.56% | 4.06% | +0.352 | −0.027 | +0.049 | +0.375 | +20% |
| LATAM | 81.15% | 85.54% | +4.39 | 7.92% | 8.79% | +0.348 | −0.026 | +0.038 | +0.360 | +19% |
| MENA1 | 80.89% | 86.79% | +5.90 | 5.82% | 5.60% | +0.343 | +0.007 | −0.013 | +0.337 | +18% |
| BD | 81.41% | 86.85% | +5.44 | 4.92% | 5.28% | +0.268 | −0.010 | +0.020 | +0.278 | +14% |
| CA | 73.60% | 81.18% | +7.58 | 2.69% | 2.31% | +0.204 | +0.039 | −0.028 | +0.215 | +11% |
| SSA | 75.94% | 79.43% | +3.49 | 3.66% | 3.37% | +0.128 | +0.024 | −0.010 | +0.141 | +7% |
| BR | 79.45% | 81.64% | +2.19 | 5.30% | 4.46% | +0.116 | +0.039 | −0.018 | +0.137 | +7% |
| PH | 84.34% | 87.45% | +3.10 | 3.82% | 4.15% | +0.119 | +0.007 | +0.004 | +0.130 | +7% |
| Top-8 positive subtotal |  |  |  |  |  |  |  |  | +1.972 | +103% |
| MX | 88.85% | 82.78% | −6.07 | 3.94% | 4.65% | −0.239 | +0.033 | −0.043 | −0.249 | −13% |
| PK | 92.40% | 90.69% | −1.71 | 7.18% | 6.43% | −0.123 | −0.062 | +0.013 | −0.172 | −9% |
| KR | 92.22% | 89.42% | −2.80 | 1.96% | 1.99% | −0.055 | +0.003 | −0.001 | −0.053 | −3% |
| Top-3 negative subtotal |  |  |  |  |  |  |  |  | −0.474 | −25% |
UK English market accuracy jumped 78.79% → 88.69%, a +9.90pp rebound on slightly growing share (3.56% → 4.06%). Rate effect (+0.352pp) dominates — virtually all of EN's contribution comes from genuine accuracy improvement, not mix shift.
Combined with EN's W14 collapse, this looks like a clean V-shape: something specific to English-language moderation broke in W14 and got fixed by W15.
MX accuracy fell from 88.85% → 82.78%, while almost every other market rose. Weight also grew (3.94% → 4.65%, +0.71pp), so the additional volume entered a now-failing market — interaction effect (−0.043pp) compounds the damage.
Possible causes: a Mexico-specific moderation issue (for example, in Spanish-language LATAM policy interpretation) that the W14→W15 fix behind most other markets' recovery did not reach. Worth checking against MX-specific policy or labeler rotation.
| Policy | Acc W14 | Acc W15 | Δ Acc | Wt W14 | Wt W15 | Rate | Weight | Inter | Total | % of Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| Combat sports, Extreme Sports & Stunts | 66.24% | 75.02% | +8.79 | 5.37% | 4.04% | +0.472 | +0.237 | −0.116 | +0.592 | +31% |
| Designated Dangerous Entities | 50.75% | 68.18% | +17.43 | 2.34% | 1.59% | +0.408 | +0.252 | −0.132 | +0.529 | +28% |
| Violent Behaviors | 51.08% | 76.78% | +25.70 | 1.65% | 1.47% | +0.425 | +0.062 | −0.048 | +0.439 | +23% |
| Alcohol | 67.99% | 69.67% | +1.68 | 5.84% | 3.50% | +0.098 | +0.379 | −0.039 | +0.437 | +23% |
| Highly Imitable Acts | 46.45% | 40.09% | −6.36 | 3.00% | 1.61% | −0.191 | +0.524 | +0.088 | +0.421 | +22% |
| Personal Information - High Risk | 42.65% | 84.12% | +41.47 | 0.91% | 0.67% | +0.378 | +0.101 | −0.101 | +0.378 | +20% |
| Regulated Goods - Marketing/Trade | 33.32% | 47.96% | +14.64 | 1.61% | 1.44% | +0.236 | +0.088 | −0.025 | +0.298 | +16% |
| Tobacco and Nicotine ★ heaviest policy | 76.07% | 79.04% | +2.97 | 11.06% | 12.23% | +0.328 | −0.094 | +0.035 | +0.269 | +14% |
| Top-8 positive subtotal |  |  |  |  |  |  |  |  | +3.363 | +175% |
| Dangerous Trends - Serious Harm | 77.45% | 68.09% | −9.36 | 4.61% | 4.83% | −0.432 | −0.015 | −0.021 | −0.467 | −24% |
| Adult Sexualized Behaviors | 58.55% | 54.88% | −3.67 | 5.06% | 5.77% | −0.186 | −0.180 | −0.026 | −0.391 | −20% |
| Reference to Cannabis, Drugs | 71.95% | 53.36% | −18.59 | 0.94% | 1.17% | −0.174 | −0.029 | −0.044 | −0.247 | −13% |
| Firearms & Explosive Weapons | 75.86% | 71.18% | −4.69 | 2.80% | 3.50% | −0.131 | −0.057 | −0.032 | −0.221 | −12% |
| Top-4 negative subtotal |  |  |  |  |  |  |  |  | −1.326 | −69% |
Accuracy rose 66.24% → 75.02% (+8.79pp) AND share contracted 5.37% → 4.04% (−1.33pp). Both effects favorable: rate (+0.47pp) from accuracy improvement, weight (+0.24pp) from below-mean segment shrinking.
4-week trajectory: W13 ~80% → W14 66% → W15 75% → W16 ~83%. The category has a clear W14 trough that is recovering through W16. Suggests a labeling-guideline change for sports content was rolled back or refined.
Accuracy actually got worse: 46.45% → 40.09% (−6.36pp). But weight halved: 3.00% → 1.61%. Since this segment was deeply below the global mean (84%), removing it from the mix is a strong positive even though its quality regressed.
Net: weight effect +0.52pp dwarfs the rate damage −0.19pp, total contribution +0.42pp.
This is a textbook "good-because-it-shrunk-not-because-it-improved" case. The weight change might reflect a sampling/routing decision — verify it's not just a statistical artifact.
Accuracy fell from 77.45% → 68.09% (−9.36pp) on growing share (4.61% → 4.83%). All three effects negative: rate −0.43pp, weight −0.02pp, interaction −0.02pp.
4-week trajectory: W13 ~69% → W14 77% → W15 68% → W16 65%. The W14 figure looks like the outlier — W13/W15/W16 cluster around 65–70%. So Dangerous Trends didn't really regress in W15; rather, W14 was an anomaly that the W15 reading reverted from.
Implication: this category sits structurally low (~65–70%) and any single-week reading is volatile. The W15 "drag" is partly a methodology artifact of comparing against an unusually high W14 baseline.
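One way to check the "W14 is the outlier" reading is a leave-one-out comparison: measure each week against the median of the other three. The sketch below uses the approximate trajectories quoted in the text for Dangerous Trends and Combat sports (taking the "~" values at face value), so it is illustrative rather than exact; `most_anomalous_week` is a hypothetical helper.

```python
from statistics import median

# Approximate weekly accuracy (%) as quoted in the text.
DANGEROUS_TRENDS = {"W13": 69.0, "W14": 77.0, "W15": 68.0, "W16": 65.0}
COMBAT_SPORTS = {"W13": 80.0, "W14": 66.0, "W15": 75.0, "W16": 83.0}


def most_anomalous_week(series: dict[str, float]) -> tuple[str, float]:
    """Return the week farthest from the median of the remaining weeks, with that gap in pp."""
    gaps = {}
    for week, value in series.items():
        others = [v for w, v in series.items() if w != week]
        gaps[week] = abs(value - median(others))
    worst = max(gaps, key=gaps.get)
    return worst, gaps[worst]


for name, series in (("Dangerous Trends", DANGEROUS_TRENDS), ("Combat sports", COMBAT_SPORTS)):
    week, gap = most_anomalous_week(series)
    print(f"{name}: {week} sits {gap:.1f}pp from the median of the other weeks")
```

Using the median of the other weeks, rather than the immediately preceding week, keeps one unusual reading from distorting the baseline that the next week is judged against.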
| Market | Region | Acc W14 | Acc W15 | Δ Acc | Sh W14 | Sh W15 | Δ Sh | Total | % of Tob Δ | Sample |
|---|---|---|---|---|---|---|---|---|---|---|
| BR | AMS | 13.11% | 55.25% | +42.14 | 3.65% | 2.97% | −0.68 | +1.682 | +57% | 7 → 32 |
| SSA | EMEA | 26.77% | 83.14% | +56.38 | 2.90% | 1.31% | −1.59 | +1.524 | +51% | 8 → 15 |
| EN | EMEA | 54.04% | 65.42% | +11.37 | 11.38% | 9.74% | −1.64 | +1.469 | +50% | 20 → 39 |
| UA | EMEA | 69.92% | 91.61% | +21.70 | 3.42% | 7.72% | +4.30 | +1.411 | +48% | 18 → 83 |
| ES | EMEA | 0.00% | 73.80% | +73.80 | 1.18% | 0.82% | −0.36 | +0.875 | +29% | 6 → 16 |
| MENA1 | EMEA | 76.83% | 88.12% | +11.29 | 6.85% | 5.03% | −1.82 | +0.554 | +19% | 23 → 56 |
| PK | APAC | 82.69% | 92.17% | +9.48 | 9.32% | 6.10% | −3.22 | +0.365 | +12% | 45 → 103 |
| CA | AMS | 23.87% | 29.93% | +6.06 | 1.17% | 0.54% | −0.63 | +0.364 | +12% | 3 → 9 |
| Top-8 positive subtotal |  |  |  |  |  |  |  | +8.245 | +278% |  |
| MENA2 | EMEA | 77.27% | 54.25% | −23.02 | 13.21% | 5.40% | −7.81 | −1.338 | −45% | 23 → 27 |
| LATAM | AMS | 100.00% | 48.89% | −51.11 | 0.81% | 2.92% | +2.11 | −0.987 | −33% | 1 → 17 |
| ID | APAC | 96.24% | 88.14% | −8.10 | 14.22% | 17.17% | +2.95 | −0.798 | −27% | 23 → 92 |
| Top-3 negative subtotal |  |  |  |  |  |  |  | −3.123 | −105% |  |
ID Tobacco accuracy declined from 96.24% → 88.14% (−8.10pp) while ID's overall OMA was actually rising. Tobacco share also grew (14.22% → 17.17%, +2.95pp), so more cases entered a now-failing segment.
This was the start of a sustained ID Tobacco decline:
The W15 drop was the inflection point. Whatever broke ID-Tobacco moderation appears to have started here.
Same framework as W16: each ID policy is decomposed against ID's overall accuracy mean (89.63% in W14).
| Policy | Acc W14 | Acc W15 | Δ Acc | Wt W14 | Wt W15 | Δ Wt | Rate | Weight | Inter | Total | % of ID Δ | Sample |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| High Risk Weight Loss & Muscle Gain | 100.00% | 0.00% | −100.00 | 1.22% | 1.37% | +0.15 | −1.221 | +0.015 | −0.147 | −1.352 | −232% | 1 → 2 |
| Youth Regulated Goods and Services | 91.96% | 78.32% | −13.65 | 8.61% | 8.87% | +0.26 | −1.175 | +0.006 | −0.036 | −1.205 | −207% | 10 → 37 |
| Tobacco and Nicotine | 96.24% | 90.47% | −5.77 | 21.02% | 26.33% | +5.31 | −1.213 | +0.351 | −0.306 | −1.168 | −200% | 23 → 91 |
| Frauds & Scams | 100.00% | 72.22% | −27.78 | 1.22% | 5.60% | +4.38 | −0.339 | +0.453 | −1.215 | −1.101 | −189% | 2 → 18 |
| Sexualized Animation & Illustration - Suggestive | 100.00% | 0.00% | −100.00 | 1.22% | 0.99% | −0.23 | −1.221 | −0.023 | +0.226 | −1.018 | −175% | 1 → 2 |
| Youth Non-Sexualized Nudity | 94.67% | 81.80% | −12.87 | 4.52% | 8.36% | +3.84 | −0.581 | +0.194 | −0.495 | −0.882 | −151% | 27 → 169 |
| Adult Sexualized Behaviors | 66.08% | 58.38% | −7.70 | 2.47% | 4.16% | +1.69 | −0.191 | −0.397 | −0.130 | −0.717 | −123% | 3 → 25 |
| Policy | Acc W14 | Acc W15 | Δ Acc | Wt W14 | Wt W15 | Rate | Weight | Inter | Total | % of ID Δ | Sample | W16 fate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Youth Body Exposure - Sig & Moderate | 46.27% | 77.45% | +31.18 | 8.89% | 5.19% | +2.771 | +1.606 | −1.154 | +3.223 | +553% | 23 → 60 | held strong (+14pp more) |
| Youth Sexualized Behaviors | 41.14% | 70.00% | +28.86 | 9.69% | 9.08% | +2.796 | +0.293 | −0.174 | +2.915 | +500% | 11 → 51 | collapsed (W16: −22.34pp) |
| Personal Information - High Risk | 20.82% | 50.00% | +29.18 | 5.86% | 0.62% | +1.710 | +3.606 | −1.529 | +3.788 | +650% | 3 → 2 | still volatile |
| Combat Sports, Extreme Sports, & Stunts | 33.33% | 98.86% | +65.53 | 0.64% | 3.46% | +0.422 | −1.585 | +1.844 | +0.682 | +117% | 6 → 7 | held at 100% |
| Youth Body Exposure - Light (4-17) | 65.51% | 83.02% | +17.51 | 2.47% | 0.44% | +0.433 | +0.491 | −0.357 | +0.568 | +97% | 3 → 12 | collapsed (W16: −39.63pp) |
Real signals (sustained across W15 and W16):
Likely noise (reversed within one week):
Implication: ID-policy crossbreaks need either much larger samples or aggregation across weeks before they yield reliable signal. The market-level OMA (90.22% → 86.81%) is more trustworthy as a signal than any individual policy-level reading.
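To see why single-policy ID readings swing so much, it helps to put an interval around them. The sketch below uses the Wilson score interval for a binomial proportion. It assumes the Sample column counts independently reviewed cases, which the tables do not state, so treat the widths as indicative; `wilson_interval` is an illustrative helper, not an existing reporting function.

```python
from math import sqrt


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for an accuracy p_hat observed on n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# W15 readings from the ID policy tables: (policy, accuracy, sample size).
READINGS = [
    ("High Risk Weight Loss & Muscle Gain", 0.000, 2),
    ("Youth Sexualized Behaviors", 0.700, 51),
    ("Youth Non-Sexualized Nudity", 0.818, 169),
]

for policy, acc, n in READINGS:
    lo, hi = wilson_interval(acc, n)
    print(f"{policy:<38} n={n:>3}: {acc:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The Wilson interval is used instead of the simple normal approximation because it stays within [0, 1] and remains informative at 0% or 100% on tiny samples, which is exactly where these tables are most fragile.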
4-week trajectory:
The W15 drop is the most credible single reading because of the sample expansion (23→91). It's also the moment when ID Tobacco started diverging from global Tobacco, which was recovering in W15.
Whatever started breaking ID Tobacco moderation appears to have happened around the W14/W15 boundary. By W16 the share contraction had absorbed most of the rate damage, but the underlying quality issue persists.
The W15 recovery was small and didn't restore the W13 baseline. By W16 ID had fallen to a 4-week low (cumulative −5.97pp from W13).
ID was barely participating in the W14→W15 global recovery (most other markets recovered 3–10pp; ID only +0.58pp). This was an early signal that ID had a more persistent issue than the policy-interpretation hiccup that affected most other markets in W14.
P0 (root cause): Identify the W14 → W15 swing cause. Was it a labeling-guideline change, a moderator team rotation, a sampling pipeline correction? If unknown, the same issue could recur.
P1 (verification): Confirm 4-week trajectories for the top recovery policies (Combat Sports, Designated Dangerous Entities, Violent Behaviors) — are they sustained in W16? Most are, which is good news.
P1 (early warning): ID Tobacco decline started here. Track whether the W15 inflection persists into W17.
P2 (outliers): MX, PK, Dangerous Trends regressed during the recovery week. Investigate whether they share a common cause or are independent local issues.
P3 (data hygiene): Several extreme single-policy swings (Personal Information +41pp, Sexualized Animation −45pp) rest on samples below 80. Read them for direction, not magnitude, until samples expand.
W13 (Mar 28 – Apr 3) is the baseline reference for W14 analysis. A standalone W13 vs W12 RCA would require Overall Moderation Accuracy data for W12, which is not currently available.
A standalone view of policies that read exactly 0% accuracy in every week they appear in the W13–W16 window. Filter rules:
OMA is a weight-averaged accuracy, $\mathrm{OMA} = \sum_i \mathrm{acc}_i W_i \big/ \sum_i W_i$, where $W_i$ is policy $i$'s moderation weight, so each policy contributes $\mathrm{acc}_i \times W_i$ correct cases. The 15 persistent-0 policies have weight values reported in the policy breakdown (a subset of OMA's total moderation traffic). Because $\mathrm{acc}_i = 0$ for these policies, they contribute 0 correct cases to the numerator. Excluding them leaves the numerator unchanged and shrinks the denominator by their weight. Result: a +0.6 to +1.1pp lift in OMA.
| Week | 0% policies | OMA total weight | 0%-policy weight | 0% share | OMA measured | OMA adjusted | Gain |
|---|---|---|---|---|---|---|---|
| W13 (Mar 28–Apr 3) | 11 | 4,790,672 | 59,727 | 1.25% | 85.94% | 87.03% | +1.09pp |
| W14 (Apr 4–10) | 12 | 4,661,402 | 56,282 | 1.21% | 84.12% | 85.15% | +1.03pp |
| W15 (Apr 11–17) | 13 | 5,645,117 | 41,891 | 0.74% | 86.04% | 86.68% | +0.64pp |
| W16 (Apr 18–24) | 15 | 4,544,936 | 46,319 | 1.02% | 86.00% | 86.89% | +0.89pp |
| Policy | W13 weight | W14 weight | W15 weight | W16 weight | Total weight |
|---|---|---|---|---|---|
| Adult Sexual Abuse | 10,951 | 11,956 | 9,669 | 9,091 | 41,667 |
| Youth Sexual Abuse - Depiction | 12,053 | 1,935 | 8,471 | 7,445 | 29,904 |
| Youth Physical Abuse, Assault & Neglect | 12,466 | 6,307 | 5,073 | 6,894 | 30,740 |
| Suicide & NSSI - Highly Harmful | 7,069 | 8,969 | 3,392 | 3,806 | 23,236 |
| Graphic Content | 2,306 | 7,965 | 4,026 | 5,249 | 19,546 |
| Animal Abuse & Graphic Content | 4,877 | 1,062 | 2,995 | 7,117 | 16,051 |
| Blood | 2,016 | 10,661 | 1,599 | 555 | 14,831 |
| Highly Harmful Adult Sexual Abuse - Visual Depiction | 772 | 352 | 1,855 | 2,446 | 5,425 |
| Human Exploitation - Risk | 4,257 | 3,025 | 2,281 | 1,716 | 11,279 |
| Youth Sexual Objectification & Fetish | — | 508 | 294 | 612 | 1,414 |
| Youth Sexual Abuse - Facilitation and Trade | — | — | 1,202 | 532 | 1,734 |
| Youth Sexual Abuse - Promotion and Admission | — | — | 389 | 590 | 979 |
| Human Exploitation - Facilitation | 2,770 | 1,771 | 645 | 79 | 5,265 |
| ERT Transfer | — | 1,771 | — | 108 | 1,879 |
| Dangerous Misinformation - Policy Tag | 190 | — | — | 79 | 269 |
| Total — all 15 policies | 59,727 | 56,282 | 41,891 | 46,319 | 204,219 |
Most of the 15 policies share a property: they cover content that gets auto-removed by upstream classifiers with very high precision:
Real-world moderator behavior on these categories is overwhelmingly correct, because the obvious cases never reach human review — they're auto-actioned. What ends up in the OHA/OMA sample is the residual: edge cases that escaped automated enforcement, where moderator judgment is genuinely difficult.
On these residuals:
Bottom line: 0% accuracy on these categories is almost certainly NOT a real moderator quality signal.
For each week, we have:
Excluding 0%-policies:
0 × cases = 0 correct. Removing them doesn't change the numerator.
W16 example: OMA = 86.00% on 4,544,936 production weight. 0%-policies = 46,319 weight (acc=0 each). Numerator = 0.86 × 4,544,936 = 3,908,645 correct. Excluded numerator = 0. New = 3,908,645 / (4,544,936 − 46,319) = 3,908,645 / 4,498,617 = 86.89%. Gain = +0.89pp.
This gives a real OMA-level number, not a within-policy-table approximation.
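The same arithmetic can be repeated for every week using the production-weight figures and measured OMA from the weekly table. A minimal sketch; because the measured OMA is itself rounded to two decimals, recomputed values can differ from the table in the last digit (W13 comes out at 87.02% rather than 87.03%). The function name is illustrative.

```python
# (week, OMA measured in %, total moderation weight, weight of persistent-0% policies)
WEEKS = [
    ("W13", 85.94, 4_790_672, 59_727),
    ("W14", 84.12, 4_661_402, 56_282),
    ("W15", 86.04, 5_645_117, 41_891),
    ("W16", 86.00, 4_544_936, 46_319),
]


def oma_excluding_zero(oma_pct: float, total_weight: int, zero_weight: int) -> float:
    """Recompute OMA (%) after dropping policies whose accuracy is exactly 0%.

    Those policies add nothing to the numerator (0 x weight = 0 correct),
    so only the denominator shrinks.
    """
    correct = oma_pct / 100 * total_weight
    return 100 * correct / (total_weight - zero_weight)


for week, oma, total, zero in WEEKS:
    adjusted = oma_excluding_zero(oma, total, zero)
    print(f"{week}: {oma:.2f}% -> {adjusted:.2f}% ({adjusted - oma:+.2f}pp)")
```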
The page above uses the raw moderation weight (production traffic volume) as the case base, as in the W16 example.
An alternative is to use the OMA evaluation sample, the population that gets human-reviewed for OMA scoring, as the case base; this is the most direct reading of "headline OMA if these policies were excluded from sampling." The two diverge slightly because OMA samples are not perfectly proportional to production traffic.
Production-traffic basis (page default): W13 +1.09pp, W14 +1.03pp, W15 +0.64pp, W16 +0.89pp.
Sample-base alternative: W13 +0.74pp, W14 +0.66pp, W15 +0.54pp, W16 +0.67pp.
The production-traffic view answers: "if we removed the production volume from the moderation pipeline that becomes 0%-policy in OMA, what would OMA become?" The sample view answers: "if we removed those same cases from the OMA evaluation sample, what would the headline read?"
For most reporting purposes the sample view is the right one. For estimating production-quality impact (e.g., what would happen if upstream classification took over these categories entirely), the production-traffic view is more relevant.
P0 — investigate the metric pipeline. Confirm the hypothesis that these 15 categories surface only their residual cases (not the auto-actioned majority). If true, the policy table's accuracy column for these categories is mathematically incapable of being non-zero — it's a metric definition issue, not a moderator performance issue.
P1 — separate display. In future reports, segregate persistent-0 policies from the main accuracy breakdown. The headline number should still include them (because they're real moderation cases) but the per-policy ranking shouldn't surface them at the top of "drag" lists, where they create false alarms.
P2 — reframe the headline. Consider reporting both "OMA measured" and "OMA excluding persistent-0 categories" as paired numbers. The 0.5–0.7pp gap is stable enough across weeks that it could be a standing footnote.
P3 — sample the underlying population properly. If sampling logic restricts to residuals for these categories, expand sampling to include automated-action cases for ground-truth verification. This would let these categories report meaningful non-zero accuracies.