🦉

Moderation Quality · Weekly RCA Report

W17 +0.75pp

W16 −0.04pp

W15 +1.92pp

W14 −1.82pp

W13

⊘ Persistent 0%

OMA gained +0.75pp — but 99% of it came from fixing the persistent-0% measurement drag

W17 vs W16 (May 6 refresh, consistent across all weeks): 86.50% → 87.25%. The persistent 0% drag dropped from −0.75pp (W16) to −0.002pp (W17) — that drag-resolution alone explains +0.7458pp of the +0.7530pp gain. Underlying market/policy quality was nearly flat: VN dragged −4.28pp, Tobacco −4.57pp, offset by MENA1 +6.41pp, SSA +7.27pp, BD +6.71pp.

Global OMA W17

87.25%

▲ +0.75pp

from W16 86.50% · May 6 refresh, consistent across W14–W17

EMEA · Wt 32.82%

85.46%

▲ +1.28pp

+61% of global gain

APAC · Wt 47.07%

90.19%

▲ +0.41pp

+32% of global gain

AMS · Wt 20.11%

83.30%

▲ +0.33pp

+7% of global gain

◈ Overview +0.75pp

99% from persistent-0 fix

⊘ Persistent 0% Issue 99% of gain

case-grain attribution

◉ Markets EMEA 61%

MENA1+BD+SSA lead, VN drags

★ Top Policies mixed

Tobacco −0.37pp, top dragger

☺ Tobacco −0.37pp

reversed 4.57pp on growing share

☑ Prior Actions W16 follow-up

most resolved or reframed

⚡ Actions 5

verify, monitor, dig

Methodology & headline summary

W17 (Apr 25–May 1) vs W16 (Apr 18–24). May 6 refresh data is consistent across W14–W17 — the dataset-shift confound from earlier pulls is resolved. All numbers below use the refresh.

Rate effect = GWt_W16 × (Acc_W17 − Acc_W16) — pure accuracy change at prior weight
Weight effect = (GWt_W17 − GWt_W16) × (Acc_W16 − Global Acc_W16) — mix shift relative to global mean
Interaction = (GWt_W17 − GWt_W16) × (Acc_W17 − Acc_W16) — joint change
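As a minimal sketch of the three formulas above, the following Python reproduces the VN row of the market table from its published (rounded) inputs. The function name `shift_share` and its argument names are illustrative and not part of the reporting pipeline; weights are passed as fractions and accuracies in percent.

```python
def shift_share(w_prev, w_curr, acc_prev, acc_curr, global_acc_prev):
    """Split a segment's OMA contribution into rate, weight and interaction effects.

    Weights are fractions of global volume; accuracies are in percent.
    All returned values are in percentage points (pp).
    """
    d_w = w_curr - w_prev            # change in global weight
    d_acc = acc_curr - acc_prev      # change in segment accuracy
    rate = w_prev * d_acc                          # accuracy change at prior weight
    weight = d_w * (acc_prev - global_acc_prev)    # mix shift relative to global mean
    interaction = d_w * d_acc                      # joint change
    return rate, weight, interaction, rate + weight + interaction

# VN, W16 -> W17, using the rounded values shown in the market table
rate, weight, interaction, total = shift_share(0.0708, 0.0788, 91.86, 87.58, 86.50)
print(f"rate={rate:+.3f} weight={weight:+.3f} inter={interaction:+.3f} total={total:+.3f}")
```

With rounded inputs the total lands within about 0.001pp of the table's −0.295; the small gap is presumably because the dashboard computes from unrounded values.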

⚙

99% of the +0.75pp OMA gain came from a measurement-pipeline fix, not from quality improvement

The persistent-0% policy drag dropped from −0.75pp (W16) to −0.002pp (W17) — a +0.7458pp shift driven entirely by the metric pipeline. Of W17's +0.7530pp headline gain, +0.7458pp (99.0%) is mathematically attributable to this drag resolution. Underlying moderator quality contributed only +0.0072pp. See the Persistent 0% Issue tab for full case-grain attribution.

▲

Under the hood: real moderation quality is nearly flat — gainers and draggers offset

Strong gainers: MENA1 +6.41pp (+0.41pp contribution), BD +6.71pp (+0.34pp), SSA +7.27pp (+0.28pp), KR +5.38pp (+0.11pp). Strong draggers: VN −4.28pp on growing share (−0.29pp), Tobacco −4.57pp on growing share (−0.37pp), Firearms −8.79pp (−0.34pp), Alcohol −8.00pp (−0.32pp), Invasive Cosmetic −8.53pp (−0.30pp). Net of these visible moves: ≈ +0.01pp. The remaining +0.74pp is the measurement fix.

⚠

Don't celebrate the headline — the underlying signal is concerning

If the persistent-0 fix had not happened, W17 would have read 86.51% (a +0.01pp WoW change). Tobacco lost 4.57pp on growing share. VN — previously a quiet anchor — slipped to 87.58%. Several mid-volume policies (Firearms, Alcohol, Invasive Cosmetic, Frauds & Scams, Combat Sports) all reversed. Strip out the measurement fix and W17 looks like a flat-to-slightly-negative week.

▶ 4-week trajectory in the refresh dataset

OMA across W14–W17 (May 6 refresh)

W14: 84.12% (drag −1.00pp ⇒ if-fixed 85.12%)
W15: 86.04% (drag −0.65pp ⇒ if-fixed 86.69%)
W16: 86.50% (drag −0.75pp ⇒ if-fixed 87.25%)
W17: 87.25% (drag −0.002pp ⇒ if-fixed 87.25%)

Notice that the W17 measured value (87.25%) almost exactly matches the W16 if-fixed value (87.25%). In other words: the W17 OMA is the W16 OMA-with-drag-removed. Real underlying quality didn't move much.

Net W14→W17: +3.13pp measured, +2.13pp if-fixed. The 1pp gap = roughly the cumulative drag that has been resolved.

⊘

Persistent 0% issue — how its resolution drove the W17 OMA gain

The single most important finding for W17: +0.7458pp out of the +0.7530pp OMA gain (99.0%) is mathematically attributable to the persistent 0%-accuracy policy drag dropping to ~zero. This page walks through the case-grain calculation step by step.

−0.7473pp

W16 drag

13 policies, 49,353 weight (0.86%)

−0.0016pp

W17 drag

2 policies, 108 weight (0.002%)

+0.7458pp

Drag reduction

≈ contribution to OMA gain

99.0%

Of total +0.7530pp gain

that the drag fix accounts for

⊘a

Step-by-step: how the 99% was computed

⚙

Counterfactual: if the persistent-0 issue had NOT been fixed, what would W17's OMA be?

We use case-grain math. For each week, OMA = correct_cases / total_cases = Σ(acc × W) / ΣW, summed over policies. Persistent-0 policies contribute 0 correct cases to the numerator but full weight to the denominator; that is the drag. To compute the counterfactual, we keep W17's actual production volume and quality for non-zero policies, but ask: what if W16's persistent-0 drag had carried forward?

Step	Quantity	Value	How computed
1	W16 measured OMA	86.4994%	From OMA market table (May 6 refresh)
2	W16 total weight	5,761,548	Σ weight across all markets, W16
3	W16 0%-policy weight (drag)	49,353	Σ weight of 13 persistent-0 policies, W16
4	W16 if-fixed OMA	87.2467%	= 86.4994% × 5,761,548 / (5,761,548 − 49,353) — drag removed
5	W16 drag	−0.7473pp	= (W16 measured) − (W16 if-fixed) = 86.4994 − 87.2467
6	W17 measured OMA	87.2524%	From OMA market table (May 6 refresh)
7	W17 total weight	5,953,725	Σ weight across all markets, W17
8	W17 0%-policy weight	108	Σ weight of 2 carryover persistent-0 policies, W17
9	W17 if-fixed OMA	87.2539%	= 87.2524% × 5,953,725 / (5,953,725 − 108) — drag near zero
10	W17 drag	−0.0016pp	= (W17 measured) − (W17 if-fixed) — essentially gone
11	OMA gain W16→W17	+0.7530pp	= 87.2524 − 86.4994
12	Drag reduction	+0.7458pp	= W17 drag − W16 drag = (−0.0016) − (−0.7473)
13	Share of OMA gain attributable to drag fix	99.0%	= 0.7458 / 0.7530
14	Residual gain (real moderation quality)	+0.0072pp	= 0.7530 − 0.7458 — the genuine WoW quality move

All values use the May 6 refresh data, which is consistent across W14–W17 (no cross-pull confound). Case-grain math: numerator is OMA × total_weight (correct cases); denominator is total_weight. Persistent-0 policies contribute 0 to the numerator, so removing them from the denominator equals "fixing the drag."
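The step table above can be re-run as a short Python sketch. It uses only the measured OMA, total weight and 0%-policy weight from steps 1–3 and 6–8; the function and variable names are illustrative, not taken from the pipeline.

```python
def if_fixed_oma(measured_oma, total_weight, zero_weight):
    """OMA with the 0%-accuracy weight removed from the denominator.

    Correct cases = measured_oma * total_weight; persistent-0 policies add
    nothing to that numerator, so only the denominator shrinks.
    """
    return measured_oma * total_weight / (total_weight - zero_weight)

# Steps 1-3 and 6-8 of the table (May 6 refresh)
w16_measured, w16_total, w16_zero = 86.4994, 5_761_548, 49_353
w17_measured, w17_total, w17_zero = 87.2524, 5_953_725, 108

w16_drag = w16_measured - if_fixed_oma(w16_measured, w16_total, w16_zero)  # step 5
w17_drag = w17_measured - if_fixed_oma(w17_measured, w17_total, w17_zero)  # step 10

gain = w17_measured - w16_measured        # step 11
drag_reduction = w17_drag - w16_drag      # step 12
share = drag_reduction / gain             # step 13
residual = gain - drag_reduction          # step 14

print(f"W16 drag {w16_drag:+.4f}pp, W17 drag {w17_drag:+.4f}pp")
print(f"gain {gain:+.4f}pp, drag reduction {drag_reduction:+.4f}pp")
print(f"share {share:.1%}, residual {residual:+.4f}pp")
```

Because the inputs are already rounded to four decimals, individual steps may differ from the table in the last digit, but the 99.0% share and the +0.0072pp residual are reproduced.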

⊘b

Why this framing matters for W17 storytelling

⚠

If you only report "OMA +0.75pp WoW" you'll mislead stakeholders

A naive read says "moderation quality jumped 0.75pp" — but only 0.0072pp of that is genuine quality change. The other 99% is a measurement-pipeline correction that cleared a long-standing structural artifact. Stakeholders need to know: (a) the real quality signal is essentially flat, (b) several markets/policies actually regressed (VN, Tobacco, Firearms, Alcohol, Invasive Cosmetic), (c) the new W17 baseline (87.25%) is structurally higher than W14–W16 because the drag is gone, not because moderation got better.

✓

The good news: the persistent-0 issue itself appears to be resolved

Of the 13 historic persistent-0% policies, only 2 remain at 0% in W17 (Suspected Youth Sexual Abuse - Depiction, Human Exploitation - Risk), both with negligible weight (54 each, sample size 1). The other 11 either reported non-zero accuracy or stopped appearing entirely. This is a structural fix worth confirming with the data team, but on the data we have it looks like the measurement issue is gone.

▶ Why the math gives 99%, not 100%

Decomposing the +0.7530pp into its two components

The W16→W17 OMA change can be exactly decomposed into:

Drag reduction: +0.7458pp (99.0%) — this is the mechanical effect of removing the 0%-policy weight from the denominator.
Real underlying quality change: +0.0072pp (1.0%) — the change in the if-fixed OMA from W16 (87.2467%) to W17 (87.2539%). This represents whatever genuine quality move happened in non-persistent-0 policies.

The 1.0% residual is a tiny positive — the visible market and policy moves (VN, Tobacco, Firearms, Alcohol negatives offset by MENA1, BD, SSA, KR positives) net to roughly +0.01pp at the global level. Most of the visible "shift-share" activity in the Markets and Top Policies tabs cancels out.

Note on precision: 99.0% is computed as 0.7458 / 0.7530 = 0.9904. The exact share depends on how you anchor the counterfactual (W16 drag carried forward at W17 weights vs other constructions), but all reasonable choices give 95–100%. The story is robust: the headline gain is overwhelmingly the drag fix, not real quality.

▶ Alternative attribution: per-policy contribution to drag reduction

Which specific policies contributed most to the +0.7458pp?

Each historic persistent-0 policy stopped dragging in W17. Roughly proportional to their W16 weight, the contribution of each to the +0.7458pp drag-reduction was:

Adult Sexual Abuse: W16 weight 8,633 → contributed ~+0.131pp to drag reduction
Suspected Youth Sexual Abuse - Depiction: W16 weight 8,151 → ~+0.123pp (still 54 in W17, so partial)
Animal Abuse & Graphic Content: W16 weight 8,127 → ~+0.123pp
Youth Physical Abuse, Assault & Neglect: W16 weight 6,069 → ~+0.092pp
Graphic Content: W16 weight 5,177 → ~+0.078pp
Suicide & NSSI - Highly Harmful: W16 weight 3,917 → ~+0.059pp
Highly Harmful Adult Sexual Abuse - Visual Depiction: W16 weight 3,271 → ~+0.050pp
Human Exploitation - Risk: W16 weight 1,983 → ~+0.030pp (still 54 in W17)
Other 5 smaller policies: ~+0.062pp combined

These are approximate — the exact decomposition assumes each policy's weight was removed from the denominator at W16's anchor accuracy. CSAM/CSAE categories (Adult Sexual Abuse, Youth Sexual Abuse — multiple variants, Highly Harmful Adult Sexual Abuse, Human Exploitation) account for the majority of the recovery.
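A hedged sketch of the allocation: the per-policy figures above appear to follow from splitting the drag in proportion to each policy's W16 weight. The code below assumes a strictly proportional split scaled to the W16 drag magnitude (0.7473pp); that scaling is an inference from the rounded figures, not a documented method, and all names are illustrative.

```python
W16_DRAG_PP = 0.7473        # magnitude of the W16 persistent-0 drag (assumed scaling base)
W16_ZERO_WEIGHT = 49_353    # total W16 weight of the 13 persistent-0 policies

# W16 weights of the eight policies named in the list above
named_weights = {
    "Adult Sexual Abuse": 8_633,
    "Suspected Youth Sexual Abuse - Depiction": 8_151,
    "Animal Abuse & Graphic Content": 8_127,
    "Youth Physical Abuse, Assault & Neglect": 6_069,
    "Graphic Content": 5_177,
    "Suicide & NSSI - Highly Harmful": 3_917,
    "Highly Harmful Adult Sexual Abuse - Visual Depiction": 3_271,
    "Human Exploitation - Risk": 1_983,
}

def allocate(weight):
    """Share of the drag attributed to a policy, proportional to its W16 weight."""
    return W16_DRAG_PP * weight / W16_ZERO_WEIGHT

contributions = {name: allocate(w) for name, w in named_weights.items()}
# Whatever weight is left belongs to the five smaller, unnamed policies
other_weight = W16_ZERO_WEIGHT - sum(named_weights.values())
other = allocate(other_weight)

for name, pp in contributions.items():
    print(f"{name}: ~+{pp:.3f}pp")
print(f"Other 5 smaller policies: ~+{other:.3f}pp (weight {other_weight:,})")
```

Under this assumption the eight named policies match the listed values to three decimals, and the remaining five come out at roughly +0.06pp combined.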

▶ What to confirm with the data team

Three questions before reporting W17 externally

Did the metric pipeline change? Ask whether sampling logic, ground-truth labeling, or filter rules changed between the May 4 and May 6 pulls. The categories that left the persistent-0 list all dropped simultaneously — consistent with a single upstream change rather than 11 independent quality improvements.
Will it stick? If it's a real fix, expect W18 to also show drag near zero. If it's a sampling artifact, expect drag to come back. Track this in W18.
Is the new W17 baseline (87.25%) the right anchor for going forward? Comparing W18 against W17 measured = 87.25% would be apples-to-apples (both have low drag). Comparing W18 against W16 measured = 86.50% would mix in the drag fix. Be consistent.

▶ Cross-week drag table (refresh data)

Persistent-0 drag week-by-week

Week	OMA measured	0% weight	% of total	OMA if fixed	Drag
W14	84.12%	54,521	1.17%	85.12%	−1.00pp
W15	86.04%	42,510	0.75%	86.69%	−0.65pp
W16	86.50%	49,353	0.86%	87.25%	−0.75pp
W17	87.25%	108	0.002%	87.25%	−0.00pp

Note: across W14–W16 the drag stayed in the −0.65 to −1.00pp band. W17 collapsed to essentially zero. The sharp single-week change strongly suggests a discrete upstream event (filter change, label correction, or sampling logic update), not 11 independent quality improvements.
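The "OMA if fixed" and "Drag" columns can be recovered from the measured OMA and the 0%-weight share alone. This sketch uses the share form of the if-fixed formula, measured / (1 − share), with the rounded table values as inputs; the names are illustrative.

```python
# (measured OMA %, 0%-policy weight as % of total) per week, from the table above
weeks = {
    "W14": (84.12, 1.17),
    "W15": (86.04, 0.75),
    "W16": (86.50, 0.86),
    "W17": (87.25, 0.002),
}

def if_fixed(measured, zero_share_pct):
    """OMA after dropping the 0%-accuracy share from the denominator."""
    return measured / (1 - zero_share_pct / 100)

results = {}
for week, (measured, share) in weeks.items():
    fixed = if_fixed(measured, share)
    results[week] = (fixed, measured - fixed)
    print(f"{week}: measured {measured:.2f}%  if-fixed {fixed:.2f}%  drag {measured - fixed:+.2f}pp")
```

Running the loop gives the same two-decimal if-fixed and drag values as the table, which is a quick consistency check on the weekly figures.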

By market — top contributors

⚠

Important context: 99% of the +0.75pp headline came from the persistent-0 fix, not from these market moves

The market-level shift-share below sums to roughly the global Δ, but most of the activity is offsetting (gainers cancel draggers). The real story is the measurement fix; the visible market churn is largely noise around a near-flat underlying baseline.

▲

EMEA contributes 61% of the visible movement, concentrated in MENA1 and SSA

MENA1 +6.41pp (54% of gain), SSA +7.27pp (37%), BD +6.71pp (45%, APAC). These are 6–7pp single-week jumps — investigate whether structural fixes drove them or whether last week's lows were sample anomalies.

⚠

VN flipped to dragger — biggest single-market reversal

VN went 91.86% → 87.58% (−4.28pp accuracy) on growing share. Total drag −0.29pp. Previously a quiet anchor (W15 90.16%, W16 91.86%). Worth investigating whether Vietnamese moderation cohort changed, content trends shifted, or this is sample-driven oscillation.

Market	Region	Acc W16	Acc W17	Δ Acc	Wt W16	Wt W17	Rate	Weight	Inter	Total	% of Δ
MENA1	EMEA	85.21%	91.62%	+6.41	7.41%	6.08%	+0.475	−0.017	−0.052	+0.407	+54%
BD	APAC	85.12%	91.83%	+6.71	5.07%	5.08%	+0.341	−0.000	+0.001	+0.341	+45%
SSA	EMEA	76.15%	83.41%	+7.27	3.67%	3.39%	+0.267	+0.029	−0.020	+0.276	+37%
MX	AMS	81.80%	84.17%	+2.37	4.91%	5.20%	+0.117	−0.014	+0.007	+0.110	+15%
KR	APAC	89.26%	94.64%	+5.38	1.92%	1.94%	+0.103	+0.001	+0.001	+0.105	+14%
BR	AMS	84.34%	85.82%	+1.48	4.43%	4.54%	+0.066	−0.002	+0.002	+0.065	+9%
MENA2	EMEA	83.19%	84.71%	+1.52	4.55%	4.88%	+0.069	−0.011	+0.005	+0.063	+8%
Top-7 positive subtotal									+1.367	+181%
VN	APAC	91.86%	87.58%	−4.28	7.08%	7.88%	−0.303	+0.043	−0.034	−0.295	−39%
DE	EMEA	80.81%	77.59%	−3.21	3.47%	3.69%	−0.111	−0.018	−0.001	−0.130	−17%
EN	EMEA	89.14%	86.06%	−3.08	3.85%	3.53%	−0.119	−0.008	+0.010	−0.117	−16%
IT	EMEA	92.22%	87.97%	−4.24	2.64%	2.72%	−0.112	+0.005	−0.003	−0.111	−15%
LATAM	AMS	82.52%	81.99%	−0.53	8.35%	8.86%	−0.044	−0.020	−0.003	−0.067	−9%
Top-5 negative subtotal									−0.720	−96%

Region totals: EMEA +0.46pp (61% of gain), APAC +0.24pp (32%), AMS +0.06pp (7%). EMEA is broad: MENA1 + SSA gain offset by DE + EN + IT regression. APAC: BD/KR gain offsets VN reversal. Many of these 5–7pp single-week swings are likely noise that will partially reverse next week.

▶ EMEA: gain is from MENA1 + SSA, offset by DE/EN/IT regressing

EMEA's +1.28pp accuracy / +0.46pp contribution is internally noisy

Three EMEA markets jumped 5–7pp (MENA1, SSA, FR), but three others dropped 3–4pp (DE, EN, IT). Net is positive but with high cross-market churn. Worth noting:

DE drop (80.81% → 77.59%) on growing share is a fresh concern, not seen in W16
EN drop (89.14% → 86.06%) reverses some of the W15 recovery
IT drop (92.22% → 87.97%) — IT was previously a quiet anchor

The fact that 3 large EMEA markets reverse simultaneously while 3 others jump is unusual — could suggest a moderation cohort rotation or policy interpretation change that affected some markets favorably and others adversely.

▶ VN reversal — new dragger

VN: 91.86% → 87.58% on growing share

VN was previously stable. W17 reverses 4.28pp on top of a 0.80pp share gain. Mechanism: rate effect −0.30pp, weight effect +0.04pp (slight benefit since VN still above mean), interaction −0.03pp. Net −0.29pp drag.

This is the largest single-market drag in W17. It is a fresh emergence, not a continuation of any prior trend. Worth investigating whether the Vietnamese-language moderation cohort or policy changed, or whether content trends introduced more borderline cases.

By policy title — top contributors

▲

Top gainers: Dangerous Trends + Recreational Drugs + Personal Info recovered hard

Dangerous Trends - Serious Harm: 62.57%→73.15% (+10.58pp on 5% share, +0.50pp contribution). Recreational Drugs: 68.43%→83.23% (+14.80pp, +0.21pp). Personal Information - High Risk: 52.84%→72.40% (+19.56pp, +0.19pp). All three were W16 lows that mean-reverted — typical of categories that oscillate around a stable baseline.

⚠

Major drags: Tobacco, Firearms, Alcohol, Invasive Cosmetic, Combat Sports — heavy reversals

Tobacco 80.32%→75.76% (−4.57pp, drag −0.37pp, biggest single-policy drag). Firearms & Explosive Weapons 73.76%→64.97% (−8.79pp, drag −0.34pp). Alcohol 74.79%→66.79% (−8.00pp, drag −0.32pp). Invasive Cosmetic 84.99%→76.45% (−8.53pp, drag −0.30pp; reversed last week's +22.73pp). Combat Sports 82.41%→75.50% (−6.92pp, drag −0.26pp). These are mid-volume categories that swing 5–10pp WoW; treat as oscillation, not signal.

Policy	Acc W16	Acc W17	Δ Acc	Wt W16	Wt W17	Rate	Weight	Inter	Total	Sample
Dangerous Trends - Serious Harm	62.57%	73.15%	+10.58	5.15%	4.64%	+0.545	+0.009	−0.054	+0.500	638 → 650
Recreational Drugs - Depiction and Promotion	68.43%	83.23%	+14.80	0.69%	1.24%	+0.102	+0.024	+0.081	+0.207	57 → 95
Personal Information - High Risk	52.84%	72.40%	+19.56	0.68%	1.35%	+0.133	−0.075	+0.131	+0.188	62 → 100
Violent Behaviors	55.50%	61.59%	+6.09	1.67%	0.93%	+0.102	+0.064	−0.045	+0.121	154 → 111
Hate Speech and Hateful Ideologies	49.77%	64.87%	+15.10	0.77%	0.65%	+0.116	+0.018	−0.018	+0.116	157 → 124
High Risk Driving	60.65%	64.66%	+4.01	2.67%	2.97%	+0.107	−0.010	+0.012	+0.108	410 → 427
Highly Imitable Acts	51.64%	57.15%	+5.51	2.25%	2.62%	+0.124	−0.040	+0.020	+0.098	174 → 178
Top-7 positive subtotal									+1.338	—
Tobacco and Nicotine ★ heaviest policy	80.32%	75.76%	−4.57	10.69%	11.67%	−0.488	+0.220	−0.045	−0.375	1836 → 1764
Firearms & Explosive Weapons	73.76%	64.97%	−8.79	3.80%	3.56%	−0.334	−0.030	+0.021	−0.336	484 → 473
Alcohol	74.79%	66.79%	−8.00	4.12%	4.68%	−0.330	+0.041	−0.045	−0.316	421 → 398
Invasive Cosmetic Procedures	84.99%	76.45%	−8.53	2.24%	1.33%	−0.191	−0.190	+0.078	−0.303	189 → 158
Combat sports, Extreme Sports, and Stunts	82.41%	75.50%	−6.92	4.50%	4.93%	−0.311	+0.078	−0.030	−0.263	478 → 451
Frauds & Scams	61.43%	53.10%	−8.33	1.93%	2.59%	−0.161	−0.045	−0.055	−0.234	215 → 299
Moderate Bullying	47.48%	44.58%	−2.90	1.63%	2.43%	−0.047	−0.121	−0.023	−0.204	172 → 245
Top-7 negative subtotal									−2.031	—

Net visible policy moves: ≈ −0.69pp. The top gainers (+1.34pp) don't fully offset the top draggers (−2.03pp). Most of the headline +0.75pp comes from outside this table — i.e., from the persistent-0 fix.

Tobacco & Nicotine — heaviest policy reversed

▼

Tobacco fell 4.57pp on growing share — biggest single-policy drag in W17

Tobacco accuracy: 80.32% → 75.76% (−4.57pp). Share: 10.69% → 11.67% (+0.98pp). More volume into a now-declining segment. Total contribution: −0.375pp. Without this drag, the underlying real-quality move (excluding the persistent-0 fix) would have been +0.38pp instead of +0.01pp.

−4.57pp

Accuracy

80.32% → 75.76%

+0.98pp

Share of mix

10.69% → 11.67%

+6.6%

Volume (raw)

243K → 259K

−0.375pp

Contribution to OMA

biggest single-policy drag

Week	Acc	WoW Acc	Wt %	WoW Wt	Volume	Sample
W14 (Apr 4–10)	76.02%	—	11.09%	—	223,360	513
W15 (Apr 11–17)	78.86%	+2.84	12.20%	+1.11	288,421	1,883
W16 (Apr 18–24)	80.32%	+1.46	10.69%	−1.51	243,222	1,836
W17 (Apr 25–May 1)	75.76%	−4.57	11.67%	+0.98	259,300	1,764

Sample is large and stable (~1,800 across W15–W17) — the W17 accuracy drop is not noise. Tobacco has now erased both prior weeks' gains, dropping below W14's level. Combined with growing share, this is a real concern.

▶ Tobacco's leverage on OMA — why a 4.57pp drop matters this much

The math of being the heaviest policy

Tobacco is consistently the largest policy by share (10.7–11.7% in W14–W17). Even small accuracy moves translate to meaningful OMA contributions:

1pp accuracy change × 11% weight = ~0.11pp OMA contribution
This week's −4.57pp accuracy alone produced a −0.49pp rate effect
Growing share (+0.98pp) into a below-mean policy slightly cushioned via weight effect (+0.22pp)
Net: −0.375pp drag (largest single-policy drag in the table)
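The leverage rule of thumb in the first two bullets is just the rate-effect formula evaluated at Tobacco's weight. A small sketch, assuming the rounded W16 weight (10.69%) from the policy table; `rate_effect` is an illustrative helper name.

```python
def rate_effect(prior_weight, acc_change_pp):
    """OMA contribution (pp) of an accuracy change held at the prior-week weight."""
    return prior_weight * acc_change_pp

rule_of_thumb = rate_effect(0.11, 1.0)      # 1pp accuracy move at ~11% weight
w17_tobacco = rate_effect(0.1069, -4.57)    # W17 Tobacco: W16 weight x accuracy change

print(f"1pp move at 11% weight: {rule_of_thumb:+.2f}pp")
print(f"W17 Tobacco rate effect: {w17_tobacco:+.2f}pp")
```

The same helper can be used to gauge how much headline movement a given Tobacco accuracy swing would produce before the weight and interaction terms are added.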

Status of W16 action items

Tracking the open items from W16's RCA against W17 data, in line with the week-over-week continuity principle.

W16 action follow-up

1. "Persistent 0% policies: measurement-pipeline drag" (W16 P0): RESOLVED IN W17. The drag dropped from −0.75pp to −0.002pp. 11 of 13 historic 0% policies left the list. This single resolution accounts for 99% of the W17 OMA gain. Confirm with data team whether it's a real fix or sampling artifact.

2. "Investigate Violent Behaviors triple-negative" (W16 P0): RECOVERED. Violent Behaviors: 55.50% → 61.59% (+6.09pp), share contracted. Total contribution flipped from drag to +0.12pp gain.

3. "Personal Information - High Risk dropped 31pp" (W16 P0): RECOVERED. 52.84% → 72.40% (+19.56pp). Confirms the W16 reading was sample-noise-driven.

4. "ID + LATAM combined drag" (W16 P1): REFRAMED. ID is at 89.25% in W17 (essentially flat), no longer a quality concern. LATAM still slightly soft (W17 81.99%, mild drag).

5. "Verify offsets are real, not artifacts" (W16 P0 data integrity): CONFIRMED ARTIFACT. Invasive Cosmetic dropped 8.53pp in W17, undoing W16's +22.73pp jump. Disparaging Religion etc. all show similar pattern. These are confirmed sample-noise categories.

☑

Net status: most W16 action items resolved or weakened by the W17 data

The biggest W16 finding (persistent-0% drag) appears resolved this week — but the resolution itself is the W17 story. The other W16 concerns (ID, Violent Behaviors, etc.) are also less prominent in W17 data. Two new concerns this week: VN reversal (−4.28pp, fresh) and Tobacco regression (−4.57pp, 2nd consecutive direction-flip).

Recommended actions for W18 prep

1. Confirm with the data team whether the persistent-0% fix is real. The drag dropped −0.75pp → −0.002pp in one week. Ask: filter rule change, label correction, sampling logic update, or upstream pipeline change? Knowing the cause tells us whether the W17 baseline (87.25%) holds in W18 or the drag returns.

2. Investigate Tobacco reversal urgently. Tobacco is the heaviest policy and has now dropped 4.57pp on growing share. This is real signal (sample 1,764), not noise. If it persists, W18 will see a meaningful headline drag.

3. Investigate VN reversal. Vietnamese-language moderation went 91.86% → 87.58% on growing share. Fresh emergence, not seen in W16. Check labeling cohort, content trends.

4. Reframe W17 reporting externally. The headline "OMA +0.75pp" is misleading without context. When sharing with stakeholders, lead with: "OMA +0.75pp, but ~99% of this comes from a measurement-pipeline fix, not from quality improvement. Underlying real-quality move is +0.0072pp."

5. Set W17 (87.25%) as the new baseline. Going forward, W18+ comparisons should anchor on W17 measured (low drag) rather than W16 measured (high drag) to avoid mixing the measurement fix into ongoing trends.

Global OMA W16

86.00%

▼ 0.04pp

from 86.04% · ≈ flat

EMEA · Wt 32.85%

83.89%

▲ +0.62pp

−578% offset of global Δ

AMS · Wt 21.57%

83.10%

▼ 0.45pp

373% of global Δ

APAC · Wt 45.57%

88.91%

▼ 0.14pp

304% of global Δ

◈ Overview −0.04pp

Methodology & headline

◉ Markets 12 dragging

ID #1 at 842%

★ Top Policies 10+ severe

Violent Behaviors leads

☺ Tobacco +0.33pp

heaviest policy, recovering

⛈ Tobacco × Markets EMEA 54%

where the gain came from

◢ ID Deep-dive −3.41pp

Indonesia by policy

⚡ Actions 7

Investigate the offsets

Methodology & headline summary

W16 (Apr 18–24) vs W15 (Apr 11–17). Each segment's contribution to OMA is decomposed into rate, weight, and interaction effects. Because the global Δ is tiny (−0.04pp), individual segment % of Δ figures can balloon — focus on absolute pp contributions to gauge true scale.

Rate effect = GWt_W15 × (Acc_W16 − Acc_W15) — pure accuracy change at prior weight
Weight effect = (GWt_W16 − GWt_W15) × (Acc_W15 − Global Acc_W15) — mix shift relative to global mean
Interaction = (GWt_W16 − GWt_W15) × (Acc_W16 − Acc_W15) — joint change

⚙

The headline is misleading — the underlying mix is highly turbulent

86.04% → 86.00% looks like a non-event. But Violent Behaviors fell 19.32pp, Personal Information - High Risk fell 31.01pp, and Disparaging Religion fell 75.96pp. They were offset by comparably large gains: Adult Sexualized Behaviors recovering +3.87pp on heavy weight, Tobacco +2.16pp continuing its rebound, Invasive Cosmetic +22.73pp. Net ≈ 0.

⚠

Geographic drag concentrated in ID + LATAM = 1648% of global Δ

ID (−3.40pp accuracy, 842% of Δ) and LATAM (−3.03pp, 806% of Δ) each contributed more than 8× the global decline. SSA, PH, MENA1, BD, JP, TR all add 200%+ each. ID has now declined sharply in two of the last three weeks: W13→W14 −3.16pp, W14→W15 +0.58pp, W15→W16 −3.40pp.

▲

Adult Sexualized Behaviors + Tobacco delivered +0.77pp combined offset — what saved the headline

Adult Sexualized Behaviors (+3.87pp accuracy on 5.0% weight, contribution +0.43pp) and Tobacco & Nicotine (+2.16pp on 10.81% weight, contribution +0.33pp) together offset 2189% of the global decline. MENA2 recovered +6.84pp accuracy regionally (+0.30pp). Without these three, the headline would read closer to −1.5pp.

▶ Why "% of Δ" looks extreme this week

Small denominator, large numerator

Global Δ = −0.04pp. When a segment contributes −0.30pp (a normal magnitude), it's ~750% of the global change. This is mathematically correct but visually scary.

The right interpretation: treat absolute pp contributions as the signal. Anything > 0.10pp is materially large in absolute terms — and the W16 table has 10+ such items on each side, indicating high underlying volatility.

If next week one of the offsets fails to repeat (e.g., Tobacco continues recovering but Adult Sexualized Behaviors regresses), the headline could swing 1–2pp easily. The current calm is fragile.
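To make the small-denominator effect concrete, the sketch below computes "% of Δ" for the same −0.30pp contribution against two global deltas: this week's −0.04pp and, purely as a hypothetical comparison, a +0.75pp week. The helper name is illustrative.

```python
def pct_of_delta(contribution_pp, global_delta_pp):
    """Segment contribution expressed as a percentage of the global change."""
    return 100 * contribution_pp / global_delta_pp

this_week = pct_of_delta(-0.30, -0.04)      # tiny global delta inflates the ratio
hypothetical = pct_of_delta(-0.30, 0.75)    # hypothetical: same drag in a larger-delta week

print(f"-0.30pp vs global -0.04pp: {this_week:.0f}% of delta")
print(f"-0.30pp vs global +0.75pp: {hypothetical:.0f}% of delta")
```

The absolute contribution is identical in both cases; only the denominator changes, which is why the absolute pp column is the safer signal in near-flat weeks.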

▶ Data integrity flag — several policies report 0% accuracy

0%-accuracy policies need verification

Four policies show 0% accuracy in both W15 and W16 yet still contribute meaningfully to the global delta via weight changes:

Animal Abuse & Graphic Content: 0% → 0%, weight 0.13% → 0.39% (contribution −0.225pp)
Youth Physical Abuse, Assault & Neglect: 0% → 0%, weight 0.22% → 0.38%
Graphic Content: 0% → 0%, weight grew
Adult Sexual Abuse: 0% → 0%, weight 0.41% → 0.50%

A persistent 0% on a non-trivial sample is implausible as a true accuracy figure. Likely causes: data filter excluding all "approve" cases for these policies, sampling artifact, or definitional change. Verify before treating these as real signal.

By market — top contributors

⚠

12 markets dragging at 100%+ each — but 8 markets offset more than the entire decline

Drag side: ID 842%, LATAM 806%, SSA 440%, PH 398%, MENA1 347%, BD 294%, JP 252%, TR 241%. Offset side: MENA2 +6.84pp recovery, VN grew on improvement, BR +3.43pp, IT +5.37pp.

Market	Acc W15	Acc W16	Δ Acc	Wt W15	Wt W16	Rate	Weight	Inter	Total	% of Δ
ID	90.22%	86.81%	−3.40	8.70%	8.85%	−0.296	+0.006	−0.005	−0.295	842.3%
LATAM	85.54%	82.51%	−3.03	8.79%	9.24%	−0.266	−0.002	−0.014	−0.282	805.8%
SSA	79.43%	75.32%	−4.11	3.37%	3.52%	−0.139	−0.010	−0.006	−0.154	439.8%
PH	87.45%	83.99%	−3.45	4.15%	3.95%	−0.143	−0.003	+0.007	−0.139	397.7%
MENA1	86.79%	84.98%	−1.81	5.60%	7.50%	−0.101	+0.014	−0.034	−0.122	347.2%
BD	86.85%	84.98%	−1.87	5.28%	5.68%	−0.099	+0.003	−0.007	−0.103	294.1%
JP	92.01%	90.48%	−1.53	2.28%	1.07%	−0.035	−0.072	+0.018	−0.088	252.1%
TR	88.70%	84.36%	−4.34	1.97%	1.92%	−0.085	−0.001	+0.002	−0.084	241.0%
ES	89.25%	82.39%	−6.87	1.08%	1.00%	−0.074	−0.003	+0.006	−0.071	203.6%
MX	82.78%	81.87%	−0.91	4.65%	5.26%	−0.042	−0.020	−0.006	−0.068	193.8%
Top-10 negative subtotal									−1.408	4017.4%
MENA2	77.79%	84.63%	+6.84	4.39%	4.00%	+0.300	+0.032	−0.027	+0.305	−870.2%
VN	90.16%	91.69%	+1.53	6.84%	7.73%	+0.105	+0.037	+0.014	+0.155	−443.5%
BR	83.73%	87.16%	+3.43	4.46%	4.75%	+0.153	−0.007	+0.005	+0.150	−429.1%
IT	86.41%	91.78%	+5.37	2.47%	2.39%	+0.133	+0.000	−0.005	+0.128	−364.9%
MY	83.36%	90.47%	+7.11	1.67%	1.52%	+0.119	−0.004	−0.007	+0.108	−309.6%
Top-5 positive subtotal									+0.846	−2417.3%

JP weight collapse (2.28% → 1.07%, −1.21pp) is the largest single mix-shift event among draggers. Despite the small accuracy decline (−1.53pp), the weight effect (−0.072pp) is unusually large because JP W15 accuracy (92%) was well above the global mean — shrinking it removes a high-quality contributor from the mix.

▶ ID #1 dragger: two sharp declines in three weeks

ID: a recurring pattern, not a one-off

ID OMA accuracy fell from 90.22% → 86.81% in W16 (−3.40pp). This is the second significant decline in three weeks, with only a small rebound in between: W13 92.79% → W14 89.63% → W15 90.22% → W16 86.81%. Cumulative drop: −5.97pp from W13 baseline.

The market is also gaining global share (8.70% → 8.85%) while accuracy worsens — the interaction effect is small but negative. Suggests either Indonesia-specific moderation quality is degrading, or the additional volume is concentrated in harder-to-judge content.

Action: Request structured Indonesia retrospective. The trend is now clear enough to need a dedicated investigation.

▶ LATAM #2 dragger — what's behind the −3.03pp accuracy drop

LATAM: pure rate-effect dominance

LATAM accuracy fell from 85.54% → 82.51%, a 3.03pp drop. Weight grew slightly (8.79% → 9.24%) which marginally amplified damage via interaction (−0.014pp).

The rate component (−0.266pp) is by far the largest driver. Investigate whether a regional policy change, language model update, or sampling shift hit the LATAM portfolio specifically in W16.

▶ MENA2 #1 offset — +6.84pp recovery, is it durable?

MENA2: bounce-back from a chronic underperformer

MENA2 accuracy jumped 77.79% → 84.63% (+6.84pp). Weight contracted slightly (4.39% → 4.00%), so this is overwhelmingly a rate story.

Looking back, MENA2 has been a problem region — this single-week recovery is the largest market gain in the dataset. Whether it's durable depends on whether the W14–W15 issue was a one-off (sample anomaly, transient labeling problem) or whether deeper calibration work lifted the floor.

Confirm with the regional team whether structural changes were made.

By policy title — top contributors

⚠

Multiple policies dropped more than 10pp in accuracy this week

Severe single-policy drops include Disparaging Religion (−75.96pp, 89.07%→13.11%), Light Body Exposure (−36.83pp), Personal Information - High Risk (−31.01pp), NSA Exceptions - Mature (−22.85pp), Suicide & NSSI (−21.99pp), Violent Behaviors (−19.32pp). Even with small weights, these aggregate fast.

▲

Tobacco & Nicotine — 2nd-heaviest policy (10.81%) acted as a stabilizer for the second straight week

Tobacco accuracy continued recovering: 79.04% → 81.20% (+2.16pp) and its share contracted 12.23% → 10.81% (−1.42pp). Because Tobacco accuracy is well below the global mean (−7.0pp from 86.04%), shrinking its weight is a strong net positive. Combined contribution: +0.334pp (offsetting 952% of the global Δ).

▶ Tobacco & Nicotine — deep dive into a sustained recovery

Tobacco's outsized impact

At 10.81% of W16 sample weight, Tobacco & Nicotine is the 2nd-largest single policy (after Youth Regulated Goods at 12.29%). Its accuracy moves the global needle directly.

Multi-week trajectory: clear recovery from W14 trough

W13: 84.66% acc, 13.23% wt — recent peak
W14: 76.07% acc, 11.06% wt — −8.59pp single-week collapse
W15: 79.04% acc, 12.23% wt — partial recovery (+2.97pp)
W16: 81.20% acc, 10.81% wt — continued recovery (+2.16pp)

Tobacco quality has rebounded ~5.13pp from the W14 trough, but is still 3.46pp below its W13 baseline. The trajectory is clearly positive.
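The week-over-week moves and the rebound figures follow directly from the four accuracy values in the trajectory. A brief sketch with illustrative variable names:

```python
# Tobacco & Nicotine accuracy (%) by week, from the trajectory above
trajectory = [("W13", 84.66), ("W14", 76.07), ("W15", 79.04), ("W16", 81.20)]

# Week-over-week change keyed by the later week
wow = {
    curr_week: round(curr_acc - prev_acc, 2)
    for (_, prev_acc), (curr_week, curr_acc) in zip(trajectory, trajectory[1:])
}

acc = dict(trajectory)
trough_week = min(acc, key=acc.get)                   # week with the lowest accuracy
rebound = round(acc["W16"] - acc[trough_week], 2)     # recovery since the trough
gap_to_w13 = round(acc["W13"] - acc["W16"], 2)        # distance still to recover

print(wow)
print(f"trough {trough_week}, rebound {rebound:+.2f}pp, still {gap_to_w13:.2f}pp below W13")
```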

Shift-share decomposition (W16 vs W15)

Rate effect: +0.264pp — accuracy gain at prior weight
Weight effect: +0.099pp — shrinking a below-mean segment helps
Interaction: −0.031pp — small, accuracy ↑ while weight ↓
Total: +0.334pp (≈ −952% of the global Δ)

What to watch

Sample volume: 1,878 → 1,537 cases (−18%). Some of the weight contraction may reflect a sampling change. Verify the methodology hasn't changed.

Below-mean accuracy persistence: at 81.20%, Tobacco is still 4.80pp below the global mean. If volume rebounds before quality recovers further, the helpful weight-effect direction will reverse — Tobacco could flip back to a major drag.

Action: Lock in the recovery — confirm whether the W14 trough was an isolated event and whether the 3-week rebound has structural support, not just regression-to-mean.

Policy	Acc W15	Acc W16	Δ Acc	Wt W15	Wt W16	Rate	Weight	Inter	Total	% of Δ
Violent Behaviors ?	76.78%	57.46%	−19.32	1.47%	1.77%	−0.284	−0.029	−0.058	−0.371	1059.4%
Gambling - Depiction and Promotion	69.68%	59.89%	−9.79	1.51%	2.07%	−0.148	−0.092	−0.054	−0.293	836.9%
Dangerous Trends - Serious Harm	68.09%	63.14%	−4.95	4.83%	4.95%	−0.239	−0.022	−0.006	−0.266	758.7%
Personal Information - High Risk ?	84.12%	53.12%	−31.01	0.67%	0.71%	−0.208	−0.001	−0.011	−0.220	627.4%
Youth Non-Sexualized Nudity	76.77%	74.61%	−2.16	4.86%	5.60%	−0.105	−0.069	−0.016	−0.189	540.2%
Youth Body Exposure - Light (4-17)	40.38%	37.08%	−3.30	0.67%	0.98%	−0.022	−0.146	−0.010	−0.178	507.6%
Youth Regulated Goods and Services	73.69%	72.65%	−1.04	12.10%	12.29%	−0.126	−0.023	−0.002	−0.151	430.1%
Light Body Exposure ?	70.00%	33.17%	−36.83	0.08%	0.30%	−0.029	−0.036	−0.082	−0.147	419.9%
High Risk Driving	64.91%	60.73%	−4.18	2.33%	2.53%	−0.097	−0.041	−0.008	−0.147	419.7%
Regulated Goods - Marketing/Trade	47.96%	48.53%	+0.57	1.44%	1.80%	+0.008	−0.135	+0.002	−0.129	367.7%
Top-10 negative subtotal									−2.090	5967.7%
Adult Sexualized Behaviors	54.88%	58.75%	+3.87	5.77%	5.00%	+0.224	+0.239	−0.030	+0.433	−1236.1%
Tobacco and Nicotine ★ 2nd heaviest policy	79.04%	81.20%	+2.16	12.23%	10.81%	+0.264	+0.099	−0.031	+0.334	−952.5%
Invasive Cosmetic Procedures ?	65.14%	87.86%	+22.73	1.30%	2.26%	+0.295	−0.201	+0.219	+0.313	−894.6%
Combat sports, Extreme Sports & Stunts	75.02%	82.54%	+7.51	4.04%	4.28%	+0.304	−0.026	+0.018	+0.296	−844.0%
Moderate Bullying	48.14%	50.83%	+2.69	2.26%	1.62%	+0.061	+0.247	−0.017	+0.290	−826.3%
Top-5 positive subtotal									+1.666	−4753.5%

Severe single-policy regressions: Disparaging Religion 89.07%→13.11% (−75.96pp) ?; Suicide & NSSI 57.37%→35.38% (−21.99pp); NSA Exceptions - Mature 53.62%→30.76% (−22.85pp); Adult Sexual Solicitation 57.81%→46.33% (−11.47pp). These weren't in the top-10 by total contribution because their weights are tiny (<1%), but the rate magnitudes warrant individual investigation. Several policies report 0% accuracy in both weeks (Animal Abuse, Youth Physical Abuse, Graphic Content, Adult Sexual Abuse) ? — likely a data integrity issue, not real signal.

▶ Violent Behaviors #1 — −19.32pp drop on growing weight

Violent Behaviors: triple-negative, all three effects against

Accuracy collapsed 76.78% → 57.46% (−19.32pp). Weight grew (1.47% → 1.77%), so the additional volume entered a now-failing segment — interaction effect (−0.058pp) compounds the damage.

This is one of the largest reputational-risk policy categories. A 19pp accuracy drop combined with growing volume is a serious signal — escalate immediately.

▶ Personal Information - High Risk — −31pp single week

Personal Info High Risk: catastrophic single-week drop

Accuracy fell 84.12% → 53.12% (−31.01pp) on stable weight (~0.69%). The pure rate effect (−0.208pp) explains almost all of this row's −0.220pp contribution.

A 31pp drop on a privacy-related, high-stakes policy is alarming. Possible drivers: policy interpretation change, new content vector (e.g., new types of doxxing patterns), or model/labeler retraining gone wrong. Investigate before W17.

▶ Adult Sexualized Behaviors — +0.43pp top offset, what drove it

A.S.B: the largest single offset

Adult Sexualized Behaviors recovered 54.88% → 58.75% (+3.87pp). Weight contracted 5.77% → 5.00% (−0.77pp). Both effects are favorable: rate (+0.224pp) and weight (+0.239pp) — shrinking a below-mean segment helps.

This single policy contributed +0.433pp — by itself, more than 12× the global Δ in the offsetting direction. Worth understanding what drove the accuracy jump (calibration, content shift, sampling) since A.S.B is a chronic problem area.

▶ Disparaging Religion — 89.07% → 13.11% (−75.96pp)

Disparaging Religion: most severe rate drop

This policy collapsed by 75.96pp on a tiny sample weight (~0.08–0.13%). Global impact is "only" −0.099pp (246%), but the rate magnitude is unprecedented.

Almost certainly a sample/policy/labeling artifact — a 76pp single-week swing is implausible as a true accuracy change. Verify the W16 sample is representative; if it is, escalate as a critical operational failure.

Tobacco & Nicotine — dedicated deep-dive

☺

Tobacco contributes +0.334pp — the 2nd largest single-policy offset in W16

Accuracy continued recovering (79.04% → 81.20%, +2.16pp) for the second straight week. Both share and absolute volume contracted, and because Tobacco accuracy still sits well below the global mean (81.20% vs 86.00%, a 4.80pp gap), shrinking its weight further amplifies the global benefit.

+2.16pp

Accuracy

79.04% → 81.20%

−1.42pp

Share of mix

12.23% → 10.81%

−31.4%

Volume (raw)

288K → 198K

+0.334pp

Total contribution

−952% of global Δ

⚠

Volume dropped 31% — but share dropped only 11.6% (relative). Both are real shrinkage.

A common reading error: "fewer Tobacco cases" doesn't automatically mean "smaller share". Whole-pipeline volume also contracted (5.65M → 4.54M, −19.6%), so part of Tobacco's volume drop is just the pipeline shrinking. The genuine share contraction is −1.42pp (12.23% → 10.81%, −11.6% relative): Tobacco shrank faster than the pipeline, so its mix share really did fall.

Week	Acc	WoW Acc	Wt %	WoW Wt	Volume (raw)	WoW Vol	Annot. sample
W13 (Mar 28–Apr 3)	84.66%	+6.27	13.23%	—	262,884	—	796
W14 (Apr 4–10)	76.07%	−8.59	11.06%	−2.17	223,139	−15.1%	509
W15 (Apr 11–17)	79.04%	+2.97	12.23%	+1.17	288,581	+29.3%	1,878
W16 (Apr 18–24)	81.20%	+2.16	10.81%	−1.42	197,907	−31.4%	1,537

Net 4-week trajectory: Accuracy −3.46pp from W13 peak (84.66% → 81.20%); share −2.42pp (13.23% → 10.81%); volume −24.7% (262K → 198K). Quality has rebounded 5.13pp from the W14 trough but hasn't fully recovered to baseline.

▶ Volume vs share — why both dropped, and what that signals

Volume change: Tobacco shrank faster than the pipeline

OMA total volume W15 → W16: 5,645,117 → 4,544,936 (−19.6%).
Tobacco volume W15 → W16: 288,581 → 197,907 (−31.4%).

If Tobacco had shrunk at the same rate as the pipeline, it would have been ~232K (still 12.23% share). The fact that it dropped to 198K means Tobacco genuinely lost share — about 11.6% relative share contraction, or 1.42 percentage points absolute.
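The counterfactual above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the volumes and weights quoted in this section; the variable names are illustrative, and the Wt figures are taken as reported by the dashboard rather than re-derived from raw volume.

```python
# Volumes quoted in this section (cases per week).
pipeline_w15, pipeline_w16 = 5_645_117, 4_544_936
tobacco_w15, tobacco_w16 = 288_581, 197_907

# Week-over-week volume change for the whole pipeline and for Tobacco.
pipeline_change = pipeline_w16 / pipeline_w15 - 1
tobacco_change = tobacco_w16 / tobacco_w15 - 1

# Tobacco volume if it had only shrunk as fast as the pipeline.
counterfactual_volume = tobacco_w15 * (1 + pipeline_change)

# Share change computed from the reported weights (percent of mix).
wt_w15, wt_w16 = 12.23, 10.81
abs_share_change_pp = wt_w16 - wt_w15
rel_share_change = abs_share_change_pp / wt_w15

print(f"pipeline {pipeline_change:+.1%}, tobacco {tobacco_change:+.1%}")
print(f"counterfactual Tobacco volume: {counterfactual_volume:,.0f}")
print(f"share change: {abs_share_change_pp:+.2f}pp ({rel_share_change:+.1%} relative)")
```

Keeping the absolute (pp) and relative (%) share changes as separate variables avoids the reading error described earlier, where a volume drop is mistaken for a share drop.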

Share change: real but not dramatic

12.23% → 10.81% looks like a meaningful drop. But put in context: Tobacco's share has oscillated between roughly 11% and 13% for 4 weeks (W13 13.23%, W14 11.06%, W15 12.23%, W16 10.81%). The W16 figure is the lowest of the four but only 0.25pp below the W14 level, so it is not an outlier.

Why both directions help OMA right now

Tobacco accuracy (81.20%) is 4.80pp below the global mean (86.00%). For below-mean segments:

Accuracy ↑ directly lifts the global average via the rate effect.
Share ↓ removes a drag from the mix via the weight effect.
Both happening together is the ideal direction for an underperforming policy.

This is why the W16 contribution (+0.334pp) is larger than the +0.264pp that the +2.16pp accuracy gain alone would deliver at prior weight.

▶ Shift-share decomposition — exactly where the +0.334pp comes from

Three components, net favorable

Rate effect: +0.264pp = 12.23% × (+2.16pp) — pure accuracy gain at prior weight
Weight effect: +0.099pp = (−1.42pp) × (79.04% − 86.04%) — shrinking a below-mean segment helps
Interaction: −0.031pp = (−1.42pp) × (+2.16pp) — small, since accuracy ↑ while weight ↓ (opposite directions)
Total: +0.334pp

If Tobacco had only improved accuracy (+2.16pp) without the share drop, its contribution would be just +0.264pp. The share contraction adds +0.099pp through the weight effect and gives back 0.031pp through the interaction, a net uplift of about 26% over the rate effect alone. The interaction is negative because the two factors moved in opposite directions, and small because both changes are small.
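The three formulas can be checked directly against the Tobacco row. The sketch below is illustrative only: the function name and argument order are assumptions, weights are passed as fractions, and accuracies are in percent so that results come out in pp.

```python
def shift_share(wt_prev, wt_curr, acc_prev, acc_curr, ref_acc_prev):
    """Split one segment's contribution (in pp) into rate, weight, interaction.

    wt_*  : segment weight as a fraction of the mix (0.1223 for 12.23%)
    acc_* : segment accuracy in percent
    ref_acc_prev : mean accuracy of the whole mix in the prior week, percent
    """
    d_wt = wt_curr - wt_prev
    d_acc = acc_curr - acc_prev
    rate = wt_prev * d_acc                      # accuracy change at prior weight
    weight = d_wt * (acc_prev - ref_acc_prev)   # mix shift relative to the mean
    interaction = d_wt * d_acc                  # joint change
    return rate, weight, interaction


# Tobacco and Nicotine, W15 -> W16, against the W15 global mean of 86.04%.
rate, weight, inter = shift_share(0.1223, 0.1081, 79.04, 81.20, 86.04)
total = rate + weight + inter
print(f"rate={rate:+.3f} weight={weight:+.3f} inter={inter:+.3f} total={total:+.3f}")
```

Because the inputs here are the rounded figures shown on the page, the total can differ from the reported +0.334pp in the third decimal.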

▶ Risk: when this stops being a tailwind

The favorable direction can reverse fast

Scenario A — Volume rebounds before accuracy fully recovers: If W17 sees Tobacco volume pop back to ~12% share (a ~+1.2pp share gain) while accuracy plateaus at ~81%, the weight effect flips to negative. A +1.2pp share increase × (81.0% − 86.0%) ≈ −0.06pp drag. Combined with no rate gain, Tobacco could contribute close to zero or slightly negative.

Scenario B — Accuracy recovery stalls or reverses: Tobacco hit 84.66% in W13 — that's the recent ceiling. Without structural fixes, regression to ~79% (the W15 level) is plausible. A −2pp accuracy fall × 11% weight ≈ −0.22pp drag in a single week.

Scenario C — Both reverse: Share back to 13% AND accuracy back to 79% would make all three effects negative, echoing the W14 accuracy collapse (−8.59pp). The W16 +0.334pp tailwind would flip to a drag of about −0.40pp, a 0.7pp swing on a single policy.
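The three scenarios can be projected with the same decomposition, starting from Tobacco's W16 position. This is a rough sketch: the W17 inputs are the hypothetical values named in the scenarios, the global mean is held at 86.00%, and `project` is an illustrative helper, not part of any dashboard tooling.

```python
GLOBAL_MEAN_W16 = 86.00          # percent
WT_W16, ACC_W16 = 0.1081, 81.20  # Tobacco weight (fraction) and accuracy (percent)


def project(wt_next, acc_next):
    """Projected Tobacco contribution (pp) for a hypothetical next week."""
    d_wt = wt_next - WT_W16
    d_acc = acc_next - ACC_W16
    rate = WT_W16 * d_acc
    weight = d_wt * (ACC_W16 - GLOBAL_MEAN_W16)
    interaction = d_wt * d_acc
    return rate + weight + interaction


scenarios = {
    "A: share back to ~12%, accuracy flat": (0.12, 81.20),
    "B: share flat, accuracy back to ~79%": (0.1081, 79.0),
    "C: share 13% and accuracy 79%": (0.13, 79.0),
}
projected = {name: project(wt, acc) for name, (wt, acc) in scenarios.items()}
for name, value in projected.items():
    print(f"{name}: {value:+.2f}pp")
```

Scenario B comes out slightly larger than the −0.22pp quoted above because the sketch uses the unrounded weight (10.81%) and a 2.2pp fall rather than the round figures 11% and 2pp.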

Action: Monitor whether the W14 trough was caused by an isolated event (sample anomaly, transient labeling problem) or a structural issue. The two-week recovery (W15, W16) looks real but is on a thin base.

▶ Annotation sample vs moderation volume — different signals

Two different "weights" — don't conflate them

The OMA dashboard has two volume-like metrics for each policy:

Weight (raw volume): total moderation traffic for that policy. W16: 197,907 cases. This represents real production decisions made by the moderation system.
Annotation sample: how many cases were sampled for human evaluation. W16: 1,537 cases. This is the OHA/OMA evaluation effort.

Tobacco annotation sample: 1,878 → 1,537 (−18.2%). Tobacco's share of total annotation samples: 6.73% → 6.76% — essentially flat. So at the evaluation/sampling level, Tobacco's representation didn't change.

The 31% volume drop is in moderation traffic, not in evaluation effort. This means: Tobacco is being moderated less in production, not just sampled less. Likely drivers: content trends, policy enforcement changes, or seasonality.

Tobacco × Markets — where the +2.16pp gain came from

EMEA · Tobacco share 47.55%

74.39%

▲ +2.16pp

+1.16pp within Tobacco · 54% of gain

AMS · Tobacco share 8.38%

67.45%

▲ +15.86pp

+0.84pp within Tobacco · 39% of gain

APAC · Tobacco share 44.06%

89.79%

▼ 0.12pp

−0.02pp within Tobacco · roughly flat

▲

EMEA delivered 54% of the Tobacco gain — but it's a single-market story

MENA2 alone drove +1.63pp of the +2.16pp Tobacco improvement (75% of gain). Tobacco accuracy in MENA2 jumped 54.25% → 86.21% (+31.96pp) ?. DE (+0.71pp) and EN (share contraction effect) added the rest. The recovery is concentrated, not broad.

⚠

APAC nearly cancelled itself out — strong gainers offset by ID + BD + KR drops

VN (+10.54pp accuracy, +6.12pp share growth in Tobacco), MY (+7.23pp), TH (+7.96pp) delivered big within-region gains. But ID ? (Tobacco share 17.17% → 10.21%, accuracy −4.47pp) alone offset −1.09pp. BD, KR, KH also dragged. Region-level result: ≈ flat.

04a

Top markets driving Tobacco's +2.16pp gain

Market	Region	Acc W15	Acc W16	Δ Acc	Wt W15	Wt W16	Δ Wt	Δ Vol%	Total	% of Tob Δ
MENA2 ?	EMEA	54.25%	86.21%	+31.96	5.40%	3.99%	−1.41	−49.3%	+1.626	+75.3%
VN	APAC	87.50%	98.04%	+10.54	2.00%	8.12%	+6.12	+178.1%	+1.373	+63.6%
DE	EMEA	58.00%	64.26%	+6.26	7.82%	6.32%	−1.50	−44.6%	+0.711	+32.9%
LATAM ?	AMS	48.89%	73.34%	+24.45	2.92%	3.52%	+0.60	−17.2%	+0.679	+31.4%
EN	EMEA	65.42%	58.59%	−6.82	9.74%	3.61%	−6.13	−74.6%	+0.589	+27.3%
MY	APAC	86.92%	94.16%	+7.23	1.11%	4.13%	+3.02	+154.8%	+0.537	+24.9%
TH	APAC	84.55%	92.51%	+7.96	1.48%	3.72%	+2.24	+72.3%	+0.420	+19.4%
BR ?	AMS	55.25%	63.94%	+8.69	2.97%	2.57%	−0.40	−40.7%	+0.319	+14.8%
Top-8 positive subtotal									+6.255	+289.6%
ID ?	APAC	88.14%	83.66%	−4.47	17.17%	10.21%	−6.96	−59.2%	−1.090	−50.4%
MENA1	EMEA	88.12%	74.00%	−14.13	5.03%	11.57%	+6.54	+57.9%	−1.040	−48.1%
SSA ?	EMEA	83.14%	65.42%	−17.72	1.31%	4.99%	+3.68	+160.8%	−0.734	−34.0%
UA	EMEA	91.61%	89.69%	−1.92	7.72%	3.60%	−4.12	−68.1%	−0.588	−27.2%
KR ?	APAC	96.04%	81.92%	−14.12	2.87%	1.14%	−1.73	−72.8%	−0.455	−21.1%
BD ?	APAC	93.47%	84.21%	−9.26	3.38%	1.01%	−2.37	−79.5%	−0.435	−20.2%
Top-6 negative subtotal									−4.342	−201.0%

"Δ Vol%" shows raw Tobacco volume change in that market W15→W16 (e.g., MENA2 −49.3% means Tobacco moderation traffic in MENA2 nearly halved). "% of Tob Δ" is each market's contribution to the +2.16pp Tobacco accuracy gain. Negative entries dragged the gain down.

04b

Why these markets had outsized impact

▶ MENA2 — single market = 75% of Tobacco's gain

MENA2: pure rate-effect explosion

Tobacco accuracy in MENA2 jumped 54.25% → 86.21% (+31.96pp). This is the largest single-market accuracy swing in the entire dataset.

Mechanism breakdown:

Rate effect: +1.73pp (5.4% within-Tobacco share × +31.96pp accuracy gain)
Weight effect: +0.35pp (share dropped 5.4%→4.0%, beneficial since MENA2 was below Tobacco's 79% mean)
Interaction: −0.45pp (share ↓ × accuracy ↑: opposite directions give a negative product)
Total: +1.63pp = 75% of Tobacco's overall +2.16pp gain
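The same breakdown can be run for several markets at once, using Tobacco's W15 accuracy (79.04%) as the reference mean and the within-Tobacco shares from the table. This is a sketch under those assumptions; `market_contribution` is an illustrative name, and the inputs are the rounded table values, so totals can differ from the table in the last digit.

```python
TOBACCO_MEAN_W15 = 79.04   # Tobacco accuracy in W15, percent
TOBACCO_DELTA_PP = 2.16    # Tobacco accuracy gain W15 -> W16, pp

# (market, acc W15, acc W16, within-Tobacco share W15 %, share W16 %)
markets = [
    ("MENA2", 54.25, 86.21, 5.40, 3.99),
    ("VN", 87.50, 98.04, 2.00, 8.12),
    ("ID", 88.14, 83.66, 17.17, 10.21),
    ("MENA1", 88.12, 74.00, 5.03, 11.57),
]


def market_contribution(acc0, acc1, wt0, wt1, ref=TOBACCO_MEAN_W15):
    """Return (rate, weight, interaction) in pp for one market."""
    d_wt = (wt1 - wt0) / 100.0
    d_acc = acc1 - acc0
    return wt0 / 100.0 * d_acc, d_wt * (acc0 - ref), d_wt * d_acc


totals = {}
for name, acc0, acc1, wt0, wt1 in markets:
    rate, weight, inter = market_contribution(acc0, acc1, wt0, wt1)
    totals[name] = rate + weight + inter
    share = totals[name] / TOBACCO_DELTA_PP  # positive = adds to the gain
    print(f"{name:6s} rate={rate:+.2f} weight={weight:+.2f} "
          f"inter={inter:+.2f} total={totals[name]:+.2f} ({share:+.0%} of gain)")
```

Dividing by the positive Tobacco delta means markets that added to the gain show a positive share and markets that dragged on it show a negative one.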

Why MENA2 mattered so much: a) starting accuracy was extremely low (54%), so improvement headroom was huge; b) it carried a meaningful share (5.4%) of Tobacco moderation, so each accuracy point of improvement carried real weight; c) the 49% volume contraction was close to neutral on net (+0.35pp weight effect against −0.45pp interaction), so nearly all of the rate gain survived.

Verify: a 32pp single-week swing in one market is implausible as organic improvement. Likely candidates: labeling guideline change, sample composition shift, or regional moderator team retrained. Investigate before W17.

▶ VN — Tobacco volume tripled while accuracy rose

VN: ideal-direction expansion

Tobacco volume in VN: 5,776 → 16,065 (+178%). Within-Tobacco share rose 2.00% → 8.12% (+6.12pp). Accuracy rose 87.50% → 98.04% (+10.54pp).

This is the best-case pattern: more volume, higher accuracy. All three shift-share components are positive: rate (+0.21pp), weight (+0.52pp — VN was ABOVE Tobacco mean, so growing helps), interaction (+0.64pp).

Why this matters: if VN sustains 98% Tobacco accuracy at 8% share, it becomes a structural Tobacco anchor. Track whether the volume surge is a one-off (data backlog catch-up?) or new baseline.

▶ ID — biggest drag on Tobacco's gain (−50% of Tobacco Δ)

ID: the largest single drag within Tobacco

ID accounted for 17.17% of Tobacco moderation in W15 — the largest single market. In W16: share crashed to 10.21% (−6.96pp absolute, −41% relative) AND accuracy fell from 88.14% → 83.66% (−4.47pp).

Mechanism: Volume dropped 59% (49,542 → 20,210). Accuracy dropped on top of the volume drop. ID was above Tobacco's mean, so losing share also hurt via weight effect.

Rate effect: −0.77pp
Weight effect: −0.63pp (above-mean segment shrinking is bad)
Interaction: +0.31pp (share ↓ × accuracy ↓: the product of two negatives is positive, so it partially offsets the other two effects)
Total: −1.09pp, equal in size to 50% of Tobacco's +2.16pp gain

Tobacco's +2.16pp gain would have been roughly +3.25pp without the ID drag. Indonesia is now the single largest variable in Tobacco's trajectory.

▶ MENA1 — accuracy collapsed while volume surged (worst pattern)

MENA1: accuracy collapse on surging share

MENA1 Tobacco share more than doubled (5.03% → 11.57%, +6.54pp absolute). Accuracy collapsed (88.12% → 74.00%, −14.13pp). Volume +57.9% (14,502 → 22,896).

This is the worst possible direction for a Tobacco market: more volume into a now-failing segment.

Rate effect: −0.71pp (5% share × −14.13pp accuracy)
Weight effect: +0.59pp (share grew in a market that was ABOVE Tobacco's mean at W15, 88% vs 79%, so this term partially offsets the other two)
Interaction: −0.92pp (share ↑ × accuracy ↓ — the worst combination)
Total: −1.04pp drag

Pattern: Why did MENA1 Tobacco volume surge while quality crashed? One possibility: a sudden enforcement campaign or content trend in the region pushed more Tobacco cases into review faster than moderator capacity could absorb. Compare to the broader MENA1 market data (which dropped 1.81pp overall): Tobacco is the dominant contributor to MENA1's decline.

▶ Why some markets had outsized impact — three-factor framework

What makes a market matter for Tobacco's number

A market's contribution to Tobacco's accuracy delta depends on three factors:

Share of Tobacco's volume. The bigger the slice, the more leverage. ID (17%), EN (10%), DE (8%), UA (8%) lead. Low-share markets like LV, EE, NO can have huge accuracy swings with negligible global impact.
Accuracy gap from Tobacco's overall mean (~80%). Markets sitting well below mean (LATAM 49%, BR 55%, DE 58%, MENA2 54%) are "headroom" markets — small accuracy gains translate disproportionately. Markets near or above mean (PK 92%, KR 96%) have less upside per unit of effort.
Share-direction × accuracy-direction alignment. Best case: above-mean market growing share AND improving (VN). Worst case: above-mean market shrinking AND declining (ID). The interaction term captures this combinatorial effect.

Why MENA2 dominated: low starting accuracy (huge headroom) + meaningful share (5%+). The share contraction itself was close to neutral (+0.35pp weight effect, −0.45pp interaction), so the result is almost entirely rate effect.

Why ID dragged so much: largest share by far (17%), accuracy was above mean at W15 (so losing share hurt), and accuracy also fell (−0.77pp rate effect). The positive interaction (+0.31pp) only partially offset the two.

▶ Region-level Tobacco summary (full breakdown)

Where Tobacco moderation actually happens

EMEA: 47.55% of Tobacco's W16 mix (94K cases). Accuracy 74.39%, well below Tobacco's 81.20% mean and far below APAC's 89.79%. EMEA Tobacco improved +2.16pp this week, driven almost entirely by MENA2 + DE recovery.

APAC: 44.06% of Tobacco's mix (87K cases). Accuracy 89.79% — well above Tobacco's mean. APAC is the high-quality anchor, but its accuracy slipped −0.12pp this week as ID drag offset VN/MY/TH gains.

AMS: 8.38% of Tobacco's mix (17K cases). Smallest share but biggest WoW accuracy improvement at +15.86pp (LATAM and BR both jumped 8–25pp). Despite the small footprint, AMS contributed +0.84pp to Tobacco's gain (39%).

Implications:

EMEA is structurally the weakest Tobacco region — sustained EMEA improvement would yield the biggest global lift.
APAC's flatness this week is mostly an ID story; if ID stabilizes, APAC reverts to a +0.5pp/week tailwind.
AMS is volatile (LATAM Tobacco at 49% W15 is implausibly low) — the W16 jump may be partially correction-to-mean, not real improvement.

Indonesia (ID) — policy-level breakdown

ID OMA · APAC

86.81%

▼ 3.41pp

from W15 90.22% · 4-week: −5.97pp from W13 92.79%

#1 dragger: Youth Sexualized Behaviors

47.66%

▼ 22.34pp

137% of ID Δ · weight 9.08% → 15.27%

#2 dragger: Youth Regulated Goods

26.46%

▼ 51.86pp

133% of ID Δ · sample 37→12, partly noise

Tobacco in ID

83.66%

▼ 6.81pp

30% of ID Δ · weight 26.3% → 14.3% absorbed most damage

⚠

ID's W16 decline is a youth-content moderation problem, not a Tobacco problem

Youth Sexualized Behaviors contributes −4.66pp (137% of ID's −3.41pp) — the single biggest drag. Youth Regulated Goods adds another −4.55pp (133%). Adult Sexualized Behaviors −2.35pp (69%). Youth Non-Sexualized Nudity −1.74pp (51%, most sample-backed). Tobacco contributes −1.01pp (30%) — significant but smaller than the youth-content cluster. The youth-content failures with growing weight are the dominant story.

⚙

Heavy data caveat: ID's policy-level samples are extremely small

Most policies have 1–12 samples per week. Only 3 policies (Youth Sexualized Behaviors, Youth Non-Sexualized Nudity, Youth Body Exposure - Sig & Mod) have W16 samples ≥ 30 in the tables below; Tobacco sits just under at 28. Treat anything below sample 30 as directional, not quantitative. Some of the most extreme single-week swings (Counterfeit +40pp, Adult Sexual Activity −60pp, Graphic Content - Public Interest −100pp) are statistical artifacts on samples of 1–5.

05a

Methodology — shift-share applied to ID

For ID-internal analysis, "Global Acc" in the formulas is replaced by ID's overall accuracy (90.22% in W15). Each ID policy is decomposed into rate, weight, and interaction effects relative to ID's mean.

Rate effect = GWt_W15 × (Acc_W16 − Acc_W15) — pure accuracy change at prior weight (within ID)
Weight effect = (GWt_W16 − GWt_W15) × (Acc_W15 − ID Acc_W15) — mix shift relative to ID mean (90.22%)
Interaction = (GWt_W16 − GWt_W15) × (Acc_W16 − Acc_W15) — joint change

−3.84pp

Sum of rate effects

accuracy degradation across policies

−5.50pp

Sum of weight effects

mix shifts toward below-mean policies

−5.05pp

Sum of interactions

share & accuracy moving against

⚙

Reconciliation note: shift-share sum is −14.4pp, ID's actual OMA Δ is −3.4pp

The mismatch is because this dataset reports "Title Accuracy", which differs from OMA accuracy in how cases are aggregated. Use the per-policy Rate / Weight / Interaction breakdown to compare policies against each other within ID, not as exact accounting against the −3.41pp headline. The directionality and ranking are valid; the absolute magnitudes are inflated by the metric mismatch.

05b

Policies dragging ID's accuracy — full shift-share decomposition

Policy	Acc W15	Acc W16	Δ Acc	Wt W15	Wt W16	Δ Wt	Rate	Weight	Inter	Total	% of ID Δ	Sample
Youth Sexualized Behaviors ?	70.00%	47.66%	−22.34	9.08%	15.27%	+6.19	−2.029	−1.251	−1.382	−4.662	137%	51 → 32
Youth Regulated Goods and Services ?	78.32%	26.46%	−51.86	8.87%	8.79%	−0.08	−4.602	+0.010	+0.044	−4.548	133%	37 → 12
Adult Sexualized Behaviors ?	58.38%	41.52%	−16.86	4.16%	7.55%	+3.39	−0.701	−1.079	−0.571	−2.352	69%	25 → 18
Youth Body Exposure - Light (4-17) ?	83.02%	43.38%	−39.63	0.44%	4.51%	+4.07	−0.174	−0.293	−1.614	−2.081	61%	12 → 11
Light Body Exposure ?	66.67%	19.91%	−46.76	0.93%	3.12%	+2.19	−0.436	−0.516	−1.024	−1.976	58%	3 → 4
Youth Non-Sexualized Nudity ?	81.80%	66.91%	−14.89	8.36%	10.48%	+2.12	−1.245	−0.178	−0.316	−1.739	51%	169 → 98
Tobacco and Nicotine ?	90.47%	83.66%	−6.81	26.33%	14.34%	−11.99	−1.793	−0.030	+0.817	−1.007	30%	91 → 28
Adult Sexual Activity ?	67.12%	6.48%	−60.64	1.59%	1.31%	−0.28	−0.966	+0.066	+0.173	−0.728	21%	9 → 5
Top-8 negative subtotal							−11.946	−3.270	−3.873	−19.10	560%

Reading the columns: Rate = pure accuracy regression at W15 weight. Weight = mix shift relative to ID's W15 mean (90.22%) — negative means the policy gained share while sitting below ID mean (a drag). Inter = compounding effect when share and accuracy move together adversely. Total = sum of the three.
Δ Wt is the change in each policy's weight within ID's moderation mix (W16 minus W15, in pp). Some policies (Counterfeit Goods, Personal Information - High Risk, Graphic Content - Public Interest) are excluded due to 0%/100% data integrity issues.

05c

Policies improving in ID — mostly statistical noise

Policy	Acc W15	Acc W16	Δ Acc	Wt W15	Wt W16	Rate	Weight	Inter	Total	% of ID Δ	Sample
High Risk Weight Loss & Muscle Gain	0.00%	100.00%	+100.00	1.37%	0.62%	+1.368	+0.673	−0.746	+1.295	−38%	2 → 1
Frauds & Scams	72.22%	100.00%	+27.78	5.60%	2.49%	+1.554	+0.560	−0.864	+1.250	−37%	18 → 4
Alcohol	62.31%	100.00%	+37.69	2.38%	1.88%	+0.898	+0.140	−0.190	+0.849	−25%	5 → 1
Physical Assault	38.35%	100.00%	+61.65	1.63%	0.02%	+1.003	+0.832	−0.989	+0.846	−25%	5 → 1
High Risk Driving	70.09%	100.00%	+29.91	3.12%	1.91%	+0.933	+0.243	−0.362	+0.814	−24%	11 → 6
Youth Body Exposure - Sig & Moderate ?	77.45%	91.46%	+14.01	5.19%	5.65%	+0.727	−0.060	+0.065	+0.733	−21%	60 → 39
Graphic Content - Realistic Fiction	2.06%	100.00%	+97.94	0.63%	1.24%	+0.619	−0.539	+0.599	+0.678	−20%	3 → 2
Top-7 positive subtotal						+7.102	+1.849	−2.487	+6.464	−190%

Most "improvements" are statistical artifacts on W16 samples of 1–6. Only Youth Body Exposure - Sig & Moderate (sample 60→39, +14pp on growing weight) is a credible offset. Note that the weight effect is positive whenever a below-mean policy shrinks: High Risk Weight Loss fell from 1.37% → 0.62% of the mix with W15 accuracy of 0%, far below ID's mean, so its weight effect is +0.673pp even though the rate effect (+1.368pp) dominates the row.

▶ Youth Sexualized Behaviors — most credible single drag in ID

Youth Sexualized Behaviors: rate × growing share = compounding

Multi-week trajectory: W13 70.00% (sample 16) → W14 41.14% (sample 11) → W15 70.00% (sample 51) → W16 47.66% (sample 32).

The W13–W15 swings (70% → 41% → 70%) suggest this category sits on the noise threshold, but the W15 figure (70% on 51 samples) is the most reliable baseline in the window, and W16's 47.66% on 32 samples is large enough that the −22pp gap likely reflects a genuine quality issue.

The compounding factor: share grew 9.08% → 15.27% (+6.19pp). Volume jumped 16,656 → 25,077 (+50%). So more cases entered a now-failing category — interaction effect within ID.

Action: The growing volume + falling accuracy is the worst direction for any policy. Investigate whether the volume growth is content-driven (e.g., a viral trend) or operational (changed routing rules), and whether moderator capacity for this category scaled accordingly.

▶ Youth Non-Sexualized Nudity — strongest sample-backed drag

Youth Non-Sexualized Nudity: 81.80% → 66.91%, sample 169 → 98

This is the most statistically reliable drag in ID this week. Even after W16's sample reduction, 98 samples gives reasonable confidence in the −14.89pp signal.

Multi-week: W13 73.25% (sample 43) → W14 94.67% (sample 27) → W15 81.80% (sample 169) → W16 66.91% (sample 98). The category is volatile but the W15→W16 movement is supported by a meaningful sample on both sides.

Share also grew (8.36% → 10.48%, +2.12pp), so a negative interaction (−0.316pp) compounded the rate effect (−1.245pp).

Action: This is one of the few ID policy signals that is sample-backed and trend-coherent. Add to the ID quality investigation immediately.

▶ Tobacco in ID — a different story from the global Tobacco picture

Why Tobacco's role in ID is different from its global role

Globally, Tobacco is a +2.16pp gainer. But in ID specifically:

Accuracy ↓: 90.47% → 83.66% (−6.81pp)
Share ↓: 26.33% → 14.34% (−11.99pp absolute, −45% relative)
Volume ↓: 48,288 → 23,555 (−51%)

Here the rate effect (−1.793pp) is partly offset by the interaction (+0.817pp), because share and accuracy fell together. Tobacco was barely above ID's mean at W15 (90.47% vs 90.22%), so halving its share is roughly neutral in weight-effect terms (−0.030pp). The net contribution is −1.007pp.

4-week trend (Tobacco accuracy in ID): 96.47% → 96.24% → 88.14% → 83.66%. This is a real degradation pattern, and ID is the single biggest contributor to the global Tobacco x Markets drag (−1.09pp on the Tobacco delta).

The ID Tobacco volume halving is itself worth investigating: is it a real moderation traffic decrease (content trend) or a routing/sampling change?

▶ Youth Body Exposure - Light — 9× volume surge with accuracy halving

An enforcement campaign signature?

Volume in this policy went 804 → 7,410 in a single week (a 9× jump). Accuracy went 83.02% → 43.38% (−39.6pp).

This pattern is unusual: a category that was near-marginal in W15 suddenly carries 9× volume at much worse quality. Hypothesis: a content trigger (viral trend, regulatory directive, or routing rule change) flooded this category with cases the moderation pipeline wasn't calibrated for.

Sample is small (12 → 11) so the magnitude is uncertain, but the volume pattern alone is worth investigating — that kind of step-change usually has a discrete cause.

▶ ID 4-week OMA trajectory — what's been declining

Indonesia's accuracy across W13–W16

W13: 92.79% — high baseline
W14: 89.63% (−3.16pp WoW)
W15: 90.22% (+0.58pp recovery)
W16: 86.81% (−3.41pp)
Cumulative: −5.97pp from W13 baseline

The recovery in W15 was partial and W16 erased it plus more. ID is now at its lowest level in the analyzed window.

The decline is increasingly concentrated in Youth-related policies (per W16 data). This points to one of: a) an Indonesia-specific content trend (more youth-content moderation cases hitting the system), b) a localization issue with policy guidelines for youth content in Bahasa Indonesia, or c) a moderator team rotation that affected youth-category calibration.

▶ Caveats: how to read this analysis

Important data limitations

1. The "Title Accuracy" column in this dataset isn't directly comparable to OMA accuracy. Sum of policy contributions reaches −14.4pp while ID's actual OMA delta is −3.41pp. The metric definitions diverge — use this table for relative comparison among ID policies, not for precise contribution accounting.

2. Sample sizes are very small. Most policies have W16 samples between 1 and 30. Only 3 policies in the tables above reach the 30-sample threshold in W16 (Tobacco is just under at 28). This is fundamentally a small-N analysis at the policy × market level.

3. Several policies show 0% or 100% accuracy persistently. These are likely pipeline/data-integrity artifacts, not real moderator behavior. Excluded from the dragger table.

4. Within-ID share doesn't sum to 100%. The dataset's "Weight Proportion" column appears to be normalized within sub-buckets, not within ID overall. Volume-derived share has been used in this analysis instead.

Recommended actions

1. Don't celebrate the near-flat −0.04pp headline. The mix is unstable: 10+ policies dropped 10–35pp this week, balanced by equally large gainers. If one of the gainers fails to repeat next week, the headline could swing 1–2pp.

2. Investigate Violent Behaviors (−19.32pp accuracy, weight growing): a triple-negative on a high-reputational-risk category. Escalate to policy ops.

3. Personal Information - High Risk dropped 31pp. The policy is privacy-sensitive and the magnitude is suspicious. Audit sample composition and labeler agreement before W17.

4. ID + LATAM combined drag of 1648%: both markets saw 3+pp accuracy drops. A joint RCA is needed to determine whether this is a shared cause (model update, content shift) or independent.

5. ID is on a 3-of-4-week declining trend (W13 92.79% → W16 86.81%, cumulative −5.97pp). This is no longer a single-week event; request a structured Indonesia retrospective.

6. Verify the extreme swings are real, not artifacts. Disparaging Religion (−76pp), Invasive Cosmetic (+23pp), MENA2 region (+6.84pp), Adult Fetish & Kinks (+32pp): these magnitudes invite sampling/labeling scrutiny before being trusted as signal.

7. Data integrity: 4 policies report 0% accuracy in both weeks (Animal Abuse & Graphic Content, Youth Physical Abuse, Graphic Content, Adult Sexual Abuse) yet still drag the global via weight changes. Likely a data filter or definitional issue; fix before treating as RCA signal.

▶ Priority matrix — what to triage first

Triage prioritization

P0 (immediate, integrity risk): Personal Information - High Risk (−31pp), Violent Behaviors (−19.32pp). Both are reputational categories with material accuracy regression on growing or stable weight.

P0 (data integrity): Verify Disparaging Religion (−76pp), 0%-accuracy policies, and other extreme single-policy swings are not sample/labeling artifacts. Swings of this size are more likely measurement issues than real changes.

P1 (regional): ID + LATAM joint investigation. If the cause is shared (e.g., a regional model rollout), one fix solves both. Otherwise treat as independent.

P1 (trend): ID 4-week decline pattern. Even if the W16 drop turns out to be an isolated event, the trend itself warrants attention.

P2 (lock in gains): Tobacco & Nicotine recovery (two consecutive weeks of gains) and Adult Sexualized Behaviors offset. Confirm structural drivers, not just regression-to-mean.

P3 (signal hygiene): Stop using single-week % of Δ as the primary metric in near-flat weeks. When |global Δ| < 0.1pp the percentages blow up (Violent Behaviors shows 1059% this week), so report absolute pp contributions instead.
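One way to apply the P3 rule is a small formatting helper that switches to absolute pp whenever the global delta is below the threshold. This is a hypothetical sketch: `contribution_label` and its `min_delta_pp` parameter are illustrative names, not part of any existing report tooling.

```python
def contribution_label(contribution_pp, global_delta_pp, min_delta_pp=0.1):
    """Format a segment's contribution for a week-over-week report.

    Falls back to absolute pp when the global delta is too small for
    "% of delta" to be meaningful.
    """
    if abs(global_delta_pp) < min_delta_pp:
        return f"{contribution_pp:+.3f}pp"
    return f"{contribution_pp / global_delta_pp:+.0%} of Δ"


# W16: Violent Behaviors (-0.371pp) against a -0.04pp global change.
flat_week = contribution_label(-0.371, -0.04)
# W17: Tobacco (-0.37pp) against a +0.75pp global change.
normal_week = contribution_label(-0.37, 0.75)
print(flat_week)
print(normal_week)
```

The 0.1pp default mirrors the threshold named in P3 and can be tuned per report.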

Global W14

84.28%

▼ 1.63pp

from 85.91% · 100% of decline

EMEA · Wt 32.4%

79.65%

▼ 6.48pp

128% of global decline

APAC · Wt 47.8%

88.61%

▲ 0.92pp

−31% offset the decline

AMS · Wt 19.7%

81.37%

▼ 0.12pp

1% of global decline

◈ Overview −1.63pp

Methodology, decomposition & fuzzy

▦ Hub × Type 129%

EMEA Appeal alone = 50.9%

◉ EMEA Markets 110%

MENA1 leads at 38.4%

★ Top Projects TOP 10

GB-MNL #1 at 25.1%

⚡ Actions 7

P0–P2 prioritized items

Methodology & summary

W14 (Apr 4–10) vs W13 (Mar 28–Apr 3). Each segment's total contribution is decomposed into three additive components. Positive % of Δ = contributed to the decline; negative = offset.

Rate effect = GWt_W13 × (Acc_W14 − Acc_W13) — pure accuracy change at prior weight
Weight effect = (GWt_W14 − GWt_W13) × (Acc_W13 − Global Acc_W13) — mix shift relative to global mean
Interaction = (GWt_W14 − GWt_W13) × (Acc_W14 − Acc_W13) — joint change

−2.64pp

Total rate effect

161% of decline

+0.76pp

Total weight effect

Offset 47%

+0.24pp

Total interaction

Offset 15%

⚠

Quality degraded across the board — here's why this matters

The rate effect alone would have produced a 2.64pp decline. Favorable mix shift (+0.76pp weight effect, +0.24pp interaction) pulled the actual figure back to −1.63pp, which is close to the best-case outcome given how much accuracy fell.

◆

APAC's growth was the safety net — here's how

APAC (88.6% accuracy, above global mean) grew from 47.2% → 47.8% of mix. This single shift absorbed nearly half the damage. Without it, the headline would read −3.1pp instead of −1.63pp.

▶ How to read this decomposition

Interpreting the three effects

Rate effect (161%) tells us accuracy degradation within segments — holding mix constant — more than fully explains the decline. This is the "quality got worse" signal.

Weight effect (offset 47%) means the mix actually shifted favorably: segments with above-average accuracy gained share. Without this, the decline would have been ~2.4pp (1.63 + 0.76) instead of 1.63pp.

Interaction (offset 15%) captures the joint effect — segments that lost accuracy also tended to shrink in weight, providing a small additional buffer.

The sum: −2.64 + 0.76 + 0.24 = −1.64pp, matching the observed −1.63pp global decline to within rounding.

Fuzzy rate impact

−0.36pp

Fuzzy rate increase

21.8% of total decline

−1.28pp

Non-fuzzy accuracy decline

78.2% of total decline

★

Fuzzy rate rose +0.36pp, but the three hubs tell completely different stories

AMS: decline is 100% fuzzy — real quality held steady. APAC: powered through the biggest fuzzy headwind (+0.49pp) with +1.41pp genuine improvement. EMEA: 96% of the −6.48pp drop is real accuracy errors, not borderline ambiguity.

Hub	FR W14	FR W13	Δ FR	Acc Δ total	Fuzzy explains	Non-fuzzy Δ	Verdict
AMS	1.76%	1.57%	+0.19pp	−0.12pp	−0.19pp	+0.06pp	Entire decline is fuzzy-driven. Non-fuzzy accuracy actually improved.
APAC	2.08%	1.59%	+0.49pp	+0.92pp	−0.49pp	+1.41pp	Fuzzy headwind absorbed — non-fuzzy quality improved strongly (+1.41pp).
EMEA	3.12%	2.86%	+0.26pp	−6.48pp	−0.26pp	−6.21pp	96% of EMEA's decline is non-fuzzy. Fuzzy is a minor factor here.
Global	2.35%	2.00%	+0.36pp	−1.63pp	−0.36pp	−1.28pp	Fuzzy = 22%, non-fuzzy = 78%

Key insight: The three hubs tell very different stories. AMS's small decline is 100% fuzzy — actual quality held steady. APAC powered through a large fuzzy increase with even larger genuine improvement. EMEA's massive drop is overwhelmingly real accuracy errors — fuzzy rate barely moved. This confirms EMEA's issue is fundamentally about moderation quality, not borderline-case ambiguity.
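The split in the table can be reproduced with a short sketch. It assumes, as the table's columns suggest, that each pp of fuzzy-rate increase costs one pp of accuracy, so the fuzzy part is −Δ FR and the non-fuzzy change is the remainder. The function name and data layout are illustrative. Inputs are the table's rounded values, so results can differ from the published column by about 0.01pp.

```python
def split_fuzzy(acc_delta, fr_delta):
    """Split an accuracy change (pp) into fuzzy and non-fuzzy parts."""
    fuzzy = -fr_delta              # assumption: 1pp more fuzzy cases = 1pp less accuracy
    non_fuzzy = acc_delta - fuzzy  # whatever the fuzzy rate does not explain
    return fuzzy, non_fuzzy


# (accuracy delta, fuzzy-rate delta) in pp, W13 -> W14, from the table
hubs = {
    "AMS": (-0.12, 0.19),
    "APAC": (0.92, 0.49),
    "EMEA": (-6.48, 0.26),
    "Global": (-1.63, 0.36),
}

results = {hub: split_fuzzy(*deltas) for hub, deltas in hubs.items()}
for hub, (fuzzy, non_fuzzy) in results.items():
    print(f"{hub:6s} fuzzy={fuzzy:+.2f}pp non-fuzzy={non_fuzzy:+.2f}pp")

# Share of EMEA's accuracy decline that is non-fuzzy
emea_non_fuzzy_share = results["EMEA"][1] / hubs["EMEA"][0]
```

A positive non-fuzzy value alongside a negative headline (AMS) is the signature of a purely fuzzy-driven decline.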

▶ AMS — decline is 100% fuzzy-driven

AMS: a fuzzy story, not a quality story

AMS accuracy fell just −0.12pp, and the entire decline is explained by the fuzzy rate increase (+0.19pp). Once fuzzy is stripped out, AMS non-fuzzy accuracy actually improved by +0.06pp.

This means AMS's labeling quality is holding steady or improving — the headline number is being dragged by borderline cases being reclassified or new ambiguous content types entering the pipeline.

Action: Consider fuzzy calibration or policy clarification for the specific content types driving the 0.19pp fuzzy increase. This is a recoverable loss.

▶ APAC — strong quality masked by fuzzy headwind

APAC: quality is better than the headline suggests

APAC's reported accuracy improved +0.92pp, but the underlying non-fuzzy improvement is actually +1.41pp — being partially masked by a +0.49pp fuzzy rate increase (the largest of any hub).

APAC absorbed the biggest fuzzy headwind and still delivered the best headline improvement. However, the fuzzy trend (+0.49pp WoW) needs monitoring — if it continues, it will eventually overwhelm the quality gains.

Action: Investigate whether policy updates or new content types in APAC are driving the fuzzy surge. The quality fundamentals are strong, but the fuzzy trajectory is concerning.

▶ EMEA — fuzzy is a rounding error; the problem is real

EMEA: genuine moderation quality crisis

EMEA's fuzzy rate only increased +0.26pp, explaining just 4% of its massive −6.48pp accuracy decline. The remaining −6.21pp is pure non-fuzzy accuracy degradation.

This definitively rules out "borderline cases" as an explanation for EMEA's performance. The problem is fundamentally about labeler accuracy, policy interpretation, or operational execution — not content ambiguity.

EMEA also has the highest absolute fuzzy rate (3.12% vs 2.08% APAC, 1.76% AMS), suggesting a structural baseline of ambiguity in its content mix, but the week-over-week change is small.

Hub × project type

▼

EMEA's three project types account for 129% of the decline

EMEA Appeal alone is 50.9%: accuracy collapsed 82.7% → 76.6% (−6.06pp) while still carrying 15.6% of global weight. APAC General Recall is the largest single offset (−30.6%), improving to 90.1% while gaining share.

Hub	Type	Acc W14	Acc W13	Δ Acc	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
EMEA	Appeal	76.6%	82.7%	−6.06	15.6%	19.1%	−1.156	+0.113	+0.212	−0.831	50.9%
EMEA	General Recall	84.2%	91.6%	−7.37	12.1%	10.0%	−0.734	+0.119	−0.156	−0.771	47.1%
EMEA	Analytics Appeal	77.7%	89.7%	−12.01	4.7%	3.2%	−0.387	+0.055	−0.174	−0.505	30.9%
AMS	General Recall	84.4%	85.8%	−1.37	14.3%	9.5%	−0.130	−0.005	−0.066	−0.200	12.2%
APAC	Appeal	85.4%	85.7%	−0.27	15.9%	18.8%	−0.052	+0.006	+0.008	−0.037	2.3%
Negative subtotal										−2.345	143.4%
AMS	Appeal	72.9%	78.7%	−5.86	5.0%	10.5%	−0.614	+0.394	+0.320	+0.100	−6.1%
AMS	Analytics Appeal	79.0%	62.7%	+16.35	0.5%	0.6%	+0.102	+0.038	−0.027	+0.113	−6.9%
APAC	General Recall	90.1%	88.6%	+1.49	27.5%	24.1%	+0.359	+0.091	+0.051	+0.501	−30.6%
Positive subtotal										+0.714	−43.7%

AMS Appeal — accuracy did fall (rate = −0.61pp), but its accuracy is well below the global mean, so the weight halving from 10.5% → 5.0% was net positive for the global number (+0.39pp weight effect), flipping total contribution to +0.10pp.
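A quick consistency check on the table, as a sketch: the eight segment totals should add up to the global change, and EMEA's three rows should reproduce the 129% figure. The variable names are illustrative; the values are copied from the Total column.

```python
# (hub, project type, total contribution in pp) from the Hub x project type table
rows = [
    ("EMEA", "Appeal", -0.831),
    ("EMEA", "General Recall", -0.771),
    ("EMEA", "Analytics Appeal", -0.505),
    ("AMS", "General Recall", -0.200),
    ("APAC", "Appeal", -0.037),
    ("AMS", "Appeal", 0.100),
    ("AMS", "Analytics Appeal", 0.113),
    ("APAC", "General Recall", 0.501),
]
global_delta = -1.63  # observed W13 -> W14 change in global accuracy, pp

grand_total = sum(total for _hub, _ptype, total in rows)
emea_total = sum(total for hub, _ptype, total in rows if hub == "EMEA")
emea_share = 100 * emea_total / global_delta  # percent of the decline

print(f"sum of segments = {grand_total:+.3f}pp, EMEA share = {emea_share:.0f}%")
```

A share above 100% simply means the other segments, taken together, offset part of EMEA's damage.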

▶ EMEA Appeal deep dive — why −6.06pp accuracy drop?

EMEA Appeal: rate effect dominance

The −1.156pp rate effect is the single largest driver in this decomposition. EMEA Appeal dropped from 82.7% to 76.6%, a −6.06pp swing, while still carrying 15.6% global weight.

The weight did shrink (19.1% → 15.6%), which partially offset the damage (+0.113pp weight effect, +0.212pp interaction), but the sheer magnitude of the accuracy collapse overwhelms both offsets.

Key question: Is this driven by specific BPO sites, policy updates, or labeler calibration drift? See the "Top Projects" tab for project-level decomposition.

▶ APAC General Recall — why it's the biggest offset

APAC GR: the stabilizer

APAC General Recall improved from 88.6% to 90.1% (+1.49pp) while also gaining weight (24.1% → 27.5%). This is the ideal scenario: an above-average segment both improves and grows.

All three effects are positive: rate (+0.359pp), weight (+0.091pp), interaction (+0.051pp), summing to +0.501pp — the single largest offset at −30.6% of the decline.

EMEA market breakdown

⚠

5 markets drive 110% of the global decline — almost entirely rate-driven

MENA1 + EN + SSA + DE + MENA2. The damage is concentrated: MENA1 alone is 38.4%. Only SSA compounds all three effects — weight grew into a below-mean, declining segment.

Market	Acc W14	Acc W13	Δ Acc	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
MENA1	80.5%	90.4%	−9.89	6.26%	6.46%	−0.639	−0.009	+0.020	−0.628	38.4%
EN (GB)	78.8%	88.5%	−9.67	3.56%	4.18%	−0.404	−0.016	+0.061	−0.360	22.0%
SSA	75.9%	84.2%	−8.22	3.65%	2.46%	−0.202	−0.021	−0.099	−0.321	19.7%
DE	77.4%	86.8%	−9.42	2.93%	2.90%	−0.273	+0.000	−0.003	−0.276	16.9%
MENA2	75.8%	80.9%	−5.08	4.28%	4.37%	−0.222	+0.005	+0.005	−0.213	13.0%
IT	84.2%	93.2%	−9.07	2.43%	2.27%	−0.205	+0.012	−0.015	−0.208	12.7%
IL	67.2%	83.8%	−16.55	0.36%	0.43%	−0.071	+0.001	+0.011	−0.059	3.6%
UA	74.3%	77.3%	−3.04	1.08%	1.04%	−0.032	−0.003	−0.001	−0.036	2.2%

SSA is the only top market where all three effects are negative — weight expanded (2.46%→3.65%), accuracy sits below the global mean, and accuracy also fell. A triple headwind worth investigating.
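The triple-headwind check is easy to automate. The sketch below filters the market table for rows where all three effects are negative; the dictionary layout is illustrative and the values are the published, rounded ones. On these eight rows it returns SSA and also UA, whose three effects are negative but total only −0.036pp, so SSA remains the only such case among the five largest contributors.

```python
# market -> (rate, weight, interaction) in pp, from the EMEA market table
markets = {
    "MENA1": (-0.639, -0.009, 0.020),
    "EN (GB)": (-0.404, -0.016, 0.061),
    "SSA": (-0.202, -0.021, -0.099),
    "DE": (-0.273, 0.000, -0.003),
    "MENA2": (-0.222, 0.005, 0.005),
    "IT": (-0.205, 0.012, -0.015),
    "IL": (-0.071, 0.001, 0.011),
    "UA": (-0.032, -0.003, -0.001),
}

# A segment is a "triple headwind" when every effect is strictly negative
triple_negative = [
    name for name, effects in markets.items() if all(e < 0 for e in effects)
]
print(triple_negative)
```

Because the inputs are rounded to three decimals, a weight effect shown as +0.000 (DE) is treated as not negative here.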

▶ MENA1 deep dive — largest market contributor at 38.4%

MENA1: pure rate problem

MENA1 dropped from 90.4% to 80.5% (−9.89pp) while maintaining roughly stable weight (6.46% → 6.26%). The rate effect (−0.639pp) almost entirely explains its contribution.

This is a nearly pure accuracy regression — no confounding mix shifts. The investigation should focus on what changed in MENA1 labeling quality, policy interpretation, or task distribution during W14.

▶ SSA triple headwind — all three effects negative

SSA: compounding failure mode

SSA is the only top-five market where rate, weight, and interaction are all negative. (UA shows the same pattern, but its three effects total just −0.036pp.)

Rate (−0.202pp): accuracy fell from 84.2% to 75.9%, a −8.22pp drop.

Weight (−0.021pp): SSA's weight grew from 2.46% to 3.65%, but since SSA accuracy (84.2%) was below the W13 global mean (85.9%), this expansion hurts.

Interaction (−0.099pp): the weight grew AND accuracy fell simultaneously — the worst combination.

Key question: Was the SSA weight increase intentional (ramp-up)? If so, quality support did not scale with volume.

▶ IL — steepest single-market accuracy drop (−16.55pp)

IL: low weight limits global impact

IL has the most dramatic accuracy decline of any market (83.8% → 67.2%, −16.55pp), but its small weight (0.36%) limits global impact to just −0.059pp (3.6% of decline).

Still worth flagging: a 16.5pp drop likely indicates a systemic issue — new policy, labeler turnover, or task type change — that could worsen if IL weight increases.

EMEA — top 10 individual projects (shift-share)

▼

Top 3 projects drive 67% of the global decline

GB-ALR-MNL (25.1%): weight surged 6x into crashing accuracy. MENA2-CAS (22.0%): weight quadrupled into a chronically below-mean segment. MENA1-ANK (19.7%): pure accuracy regression. The common thread: weight expansion without quality support.

Project	Type	Acc W14	Acc W13	GWt W14	GWt W13	Rate	Weight	Inter	Total	% of Δ
GCP-TT-Video appeal-GB-en-ALR-MNL	Appeal	69.9%	100.0%	2.25%	0.36%	−0.108	+0.266	−0.568	−0.410	25.1%
TT-Video-Analytics Appeal-MENA2-ar-T&S-CAS	Analytics Appeal	67.1%	73.5%	2.25%	0.51%	−0.033	−0.215	−0.112	−0.360	22.0%
TT-Video-General Recall General-MENA1-ku-CNX-ANK	General Recall	84.4%	96.8%	2.59%	2.61%	−0.322	−0.002	+0.002	−0.321	19.7%
TT-Video appeal-KE/TZ/UG-sw-TP-NBO	Appeal	69.4%	81.4%	1.12%	0.77%	−0.092	−0.016	−0.043	−0.151	9.2%
GCP-TT-Video-General Recall General-GB-en-TP-ALB	General Recall	58.9%	92.7%	0.06%	1.40%	−0.473	−0.091	+0.451	−0.113	6.9%
GCP-TT-Video appeal-IT-it-TP-BRV	Appeal	84.7%	92.3%	0.87%	1.60%	−0.122	−0.046	+0.055	−0.113	6.9%
TT-Video appeal-MENA1-other-TP-MAK	Appeal	63.4%	96.1%	0.22%	0.53%	−0.174	−0.032	+0.101	−0.104	6.4%
GCP-TT-Video-General Recall General-DE-de-TLS-LEJ	General Recall	75.4%	85.4%	1.00%	0.46%	−0.046	−0.003	−0.054	−0.102	6.3%
TT-Video appeal-MENA1-ar-CNX-IBD	Appeal	74.2%	78.9%	1.48%	1.17%	−0.056	−0.021	−0.014	−0.091	5.6%
TT-Video-General Recall General-MENA1-ar-TP-MAK	General Recall	N/A	100%	0.00%	0.53%	−0.534	−0.075	+0.534	−0.075	4.6%

Weight expansion is the recurring theme: 5 of 10 projects saw weight increase. When that expansion targets below-mean or declining-accuracy segments, the interaction effect compounds the damage. Only GR-MENA1-ku-CNX-ANK is a pure rate story (stable weight, −12.4pp accuracy drop).
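The last row of the table (General Recall, MENA1-ar-TP-MAK) shows N/A accuracy in W14 yet carries non-zero rate and interaction values. Those values are consistent with N/A being scored as 0% accuracy. That is an inference from the numbers, not a documented convention. The sketch below shows why the choice does not matter for the total: when the current weight is zero, rate and interaction cancel and only the weight effect remains. The helper name is illustrative.

```python
def contribution(w0, w1, a0, a1, g0):
    """Return (rate, weight, interaction) in pp; weights as fractions, accuracy in percent."""
    rate = w0 * (a1 - a0)
    weight = (w1 - w0) * (a0 - g0)
    interaction = (w1 - w0) * (a1 - a0)
    return rate, weight, interaction


# Weight 0.53% -> 0.00%, accuracy 100% -> N/A, W13 global mean 85.91%
w0, w1, a0, g0 = 0.0053, 0.0, 100.0, 85.91

as_zero = contribution(w0, w1, a0, 0.0, g0)   # N/A scored as 0% (inferred convention)
as_half = contribution(w0, w1, a0, 50.0, g0)  # any other placeholder, for comparison

print(f"N/A as 0%:  total = {sum(as_zero):+.3f}pp")
print(f"N/A as 50%: total = {sum(as_half):+.3f}pp")
```

For projects that drop to zero weight, the Total column is therefore the reliable figure; the Rate and Inter columns depend on the placeholder.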

▶ GCP-TT-Video appeal-GB-en-ALR-MNL — #1 contributor at 25.1%, here's the mechanism

GB MNL: the weight surge trap

This project's weight surged 6.25x (0.36% → 2.25%) while accuracy crashed from 100% → 69.9%. The interaction effect (−0.568pp) is the largest single component — weight grew dramatically while accuracy fell dramatically.

The weight effect is actually positive (+0.266pp) because the project was above the global mean in W13 (100% vs 85.9%). But the interaction overwhelms it: expanding into what became a low-accuracy segment is a compounding failure.

Key question: Was this a deliberate ramp-up of a previously small project? If so, quality controls didn't scale with volume.

▶ AA-MENA2-ar-T&S-CAS — weight quadrupled into a below-mean segment

MENA2 CAS: weight-driven damage

Weight grew from 0.51% → 2.25% (4.4x) while accuracy was already below the global mean (73.5% vs 85.9%) and fell further to 67.1%. The weight effect alone (−0.215pp) is the largest component: this is a mix-shift problem, not primarily a rate problem.

All three effects are negative: rate (−0.033), weight (−0.215), interaction (−0.112). A triple headwind totaling −0.360pp (22.0% of decline).

Action: Validate whether this weight increase was intentional. Expanding a chronically below-mean segment without quality uplift compounds the global decline.

▶ TT-Video-General Recall General-MENA1-ku-CNX-ANK — pure rate collapse, no mix excuse

MENA1 ANK: classic accuracy regression

This is the cleanest rate-driven case in the top 10: weight barely moved (2.61% → 2.59%), so the rate effect (−0.322pp) almost entirely explains the −0.321pp total contribution.

Accuracy dropped from 96.8% → 84.4% (−12.4pp) — a steep fall from a high base. No mix-shift or weight excuses here; something changed in execution quality.

Action: Investigate what changed for Kurdish-language GR in MENA1 during W14 — policy update, new labeler cohort, or calibration drift.

Recommended actions

1. EMEA Appeal quality RCA: focus on EN, MENA1, DE BPO sites where rate effect is the dominant driver.

2. DE-LEJ site investigation: two projects with catastrophic accuracy (0.0% and N/A), possible vendor execution failure.

3. Four projects went to N/A (zero weight) in W14. Confirm whether this is sampling shortfall or project suspension.

4. SSA weight expansion: the only top-five market with a triple-negative (rate, weight, and interaction all negative). Validate whether the volume increase is intentional.

5. EMEA GCP weight surge (0.85% → 2.98%) into a 74%-accuracy segment: check if this is a ramp-up or reallocation, and whether quality support is in place.

6. APAC fuzzy rate jumped +0.49pp (largest increase). Investigate whether policy updates or new content types are driving borderline cases. Non-fuzzy quality is strong, but the fuzzy trend needs monitoring.

7. AMS decline is entirely fuzzy-driven; non-fuzzy accuracy actually improved. Consider whether fuzzy calibration or policy clarification could recover the 0.19pp loss.

▶ Priority matrix — impact vs effort

Triage prioritization

P0 (immediate): DE-LEJ site — likely site-level outage/failure, accounts for 63.7% of decline from just two projects. Quick root cause identification could recover the most impact.

P1 (this week): Four N/A projects — verify if delivery gaps are fixable. If unplanned, restoring these could offset 167% of the decline (they overlap with rate-driven decline).

P1 (this week): MENA1 accuracy regression — 38.4% of decline, pure rate effect. Check if a policy update or labeler calibration issue occurred during W14.

P2 (track): SSA triple headwind and EMEA GCP weight surge — these are structural issues that need monitoring over W15–W16 to determine if they're transient or persistent.

P2 (track): APAC fuzzy rate surge (+0.49pp) — quality fundamentals are strong but the fuzzy trajectory needs monitoring. AMS fuzzy calibration is a quick-win candidate.

Global OMA W15

86.04%

▲ +1.92pp

from W14 84.12% · biggest weekly gain

EMEA · Wt 32.59%

83.29%

▲ +3.71pp

+62% of global gain

AMS · Wt 20.21%

83.55%

▲ +2.34pp

+24% of global gain

APAC · Wt 46.94%

89.05%

▲ +0.70pp

+15% of global gain

◈ Overview +1.92pp

Methodology & headline

◉ Markets EN +20%

EMEA-led recovery

★ Top Policies 5 lifters

Combat sports leads

⛈ Tobacco × Markets EMEA 90%

where the gain came from

◢ ID Deep-dive +0.58pp

mirror image of W16

⚡ Actions 5

Lock in the gains

Methodology & headline summary

W15 (Apr 11–17) vs W14 (Apr 4–10). OMA jumped +1.92pp — the largest weekly gain in the observed window, fully reversing the W14 decline and finishing 0.10pp above the W13 baseline.

Rate effect = GWt_W14 × (Acc_W15 − Acc_W14) — pure accuracy change at prior weight
Weight effect = (GWt_W15 − GWt_W14) × (Acc_W14 − Global Acc_W14) — mix shift relative to global mean
Interaction = (GWt_W15 − GWt_W14) × (Acc_W15 − Acc_W14) — joint change

▲

The recovery is policy-led, not market-led — a handful of high-stakes categories rebounded across all regions

The top 5 policies alone contributed +2.42pp (126% of the global gain). Combat Sports +0.59pp, Designated Dangerous Entities +0.53pp, Violent Behaviors +0.44pp, Alcohol +0.44pp, Highly Imitable Acts +0.42pp. Several of these (Violent Behaviors, Personal Information - High Risk) had collapsed in W14 — W15 looks like a correction, not a structural improvement.

⚙

Rate effect drove the recovery (+2.0pp); Weight effect added another (+1.1pp)

Among sample-credible policies: Rate sum +2.02pp (accuracy genuinely improved), Weight sum +1.09pp (a few low-accuracy policies like Highly Imitable Acts and Alcohol contracted in share — beneficial since they were below mean), Interaction sum −0.45pp (slightly opposing). Quality recovery is the dominant story.

⚠

Hidden drags during a recovery week — watch these for W16 reversal risk

Dangerous Trends - Serious Harm dropped −9.36pp (drag −0.47pp) — the single biggest policy drag. Adult Sexualized Behaviors −3.67pp (drag −0.39pp). MX −6.07pp accuracy (drag −0.25pp). PK −1.71pp. The strong headline conceals these regressions — if any expand, the W16 picture changes quickly. (Spoiler: in W16, several of these flipped further while Adult Sexualized Behaviors recovered.)

▶ How concentrated was the recovery?

Concentration analysis

Top 3 markets (EN, LATAM, MENA1) contributed +1.07pp — 56% of the global gain.
Top 5 markets contributed +1.56pp — 82%.
Top 3 policies contributed +1.56pp — 81%.
Top 5 policies contributed +2.42pp — 126% (i.e., the rest of the policies net negative).

Recovery is genuinely concentrated — a handful of policies and markets did most of the work. This is fragile: if next week one or two of these reverse, the headline swings substantially.
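The concentration figures can be recomputed from the contribution totals. The sketch below does so for markets, using the Total column of the W15 market table; the function name is illustrative.

```python
# market -> total contribution to the W15 gain, in pp (positive contributors only)
market_totals = {
    "EN": 0.375, "LATAM": 0.360, "MENA1": 0.337, "BD": 0.278,
    "CA": 0.215, "SSA": 0.141, "BR": 0.137, "PH": 0.130,
}
global_gain = 1.92  # W14 -> W15 change in global OMA, pp


def top_k_share(totals, k, gain):
    """Sum of the k largest contributions (pp) and their share of the gain (%)."""
    top = sorted(totals.values(), reverse=True)[:k]
    subtotal = sum(top)
    return subtotal, 100 * subtotal / gain


top3_pp, top3_share = top_k_share(market_totals, 3, global_gain)
top5_pp, top5_share = top_k_share(market_totals, 5, global_gain)
print(f"top 3: {top3_pp:+.2f}pp ({top3_share:.0f}%), "
      f"top 5: {top5_pp:+.2f}pp ({top5_share:.0f}%)")
```

Running the same function on the policy totals would give a share above 100% for the top 5, which is what signals that the remaining policies were net negative.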

▶ How does W15 fit in the W13–W16 trajectory?

4-week trajectory

W13: 85.94% baseline
W14: 84.12% (−1.82pp WoW) — significant drop
W15: 86.04% (+1.92pp WoW) — biggest weekly gain
W16: 86.00% (−0.04pp WoW) — flat but turbulent underneath

The W14–W15 swing pattern (−1.82pp → +1.92pp) is unusually large. Such a near-perfect reversal often suggests operational/sampling causes rather than a true two-step quality change: e.g., a labeling guideline correction issued mid-W14 only fully took effect in W15.

Volume context: OMA pipeline grew +21% from W14 (4.66M) to W15 (5.65M). The bigger sample base may have stabilized noisy categories.

By market — top contributors

▲

EMEA delivered 62% of the +1.92pp recovery, and it's broad — not single-market

EMEA aggregate accuracy rebounded 79.59% → 83.29% (+3.71pp). The contributors are spread: EN +9.90pp, MENA1 +5.90pp, SSA +3.49pp, ES +8.40pp. AMS added +24% (LATAM +4.39pp, CA +7.58pp, BR +2.19pp). APAC contributed +15%, with BD +5.44pp and PH +3.10pp leading.

⚠

During this recovery week, MX and PK regressed

MX dropped 88.85% → 82.78% (−6.07pp), the biggest single-market regression, for a drag of −0.25pp. PK dropped 92.40% → 90.69% (−1.71pp); on PK's heavy weight (7.18% in W14) that cost −0.17pp. Both are noteworthy because they moved against the broad recovery.

Market	Acc W14	Acc W15	Δ Acc	Wt W14	Wt W15	Rate	Weight	Inter	Total	% of Δ
EN	78.79%	88.69%	+9.90	3.56%	4.06%	+0.352	−0.027	+0.049	+0.375	−20%
LATAM	81.15%	85.54%	+4.39	7.92%	8.79%	+0.348	−0.026	+0.038	+0.360	−19%
MENA1	80.89%	86.79%	+5.90	5.82%	5.60%	+0.343	+0.007	−0.013	+0.337	−18%
BD	81.41%	86.85%	+5.44	4.92%	5.28%	+0.268	−0.010	+0.020	+0.278	−14%
CA	73.60%	81.18%	+7.58	2.69%	2.31%	+0.204	+0.039	−0.028	+0.215	−11%
SSA	75.94%	79.43%	+3.49	3.66%	3.37%	+0.128	+0.024	−0.010	+0.141	−7%
BR	79.45%	81.64%	+2.19	5.30%	4.46%	+0.116	+0.039	−0.018	+0.137	−7%
PH	84.34%	87.45%	+3.10	3.82%	4.15%	+0.119	+0.007	+0.004	+0.130	−7%
Top-8 positive subtotal									+1.972	−103%
MX	88.85%	82.78%	−6.07	3.94%	4.65%	−0.239	+0.033	−0.043	−0.249	+13%
PK	92.40%	90.69%	−1.71	7.18%	6.43%	−0.123	−0.062	+0.013	−0.172	+9%
KR	92.22%	89.42%	−2.80	1.96%	1.99%	−0.055	+0.003	−0.001	−0.053	+3%
Top-3 negative subtotal									−0.474	+25%

EMEA aggregate +1.18pp (62% of global gain), AMS +0.46pp (24%), APAC +0.28pp (15%). The story is broad — multiple markets in each region rebounded simultaneously, suggesting a global moderation issue (likely policy interpretation or labeling) was resolved between W14 and W15.

▶ EN — biggest single-market gainer (+9.90pp)

EN: clean rate-effect recovery

UK English market accuracy jumped 78.79% → 88.69%, a +9.90pp rebound on slightly growing share (3.56% → 4.06%). Rate effect (+0.352pp) dominates — virtually all of EN's contribution comes from genuine accuracy improvement, not mix shift.

Combined with EN's W14 collapse, this looks like a clean V-shape: something specific to English-language moderation broke in W14 and got fixed by W15.

▶ MX — outlier regression during a recovery week

MX: against the tide

MX accuracy fell from 88.85% → 82.78%, while almost every other market rose. Weight also grew (3.94% → 4.65%, +0.71pp), so the additional volume entered a now-failing market — interaction effect (−0.043pp) compounds the damage.

Possible causes: a Mexico-specific moderation issue (Spanish-language LATAM policy interpretation) that didn't share the W14→W15 fix that helped most other markets. Worth checking against MX-specific policy/labeler rotation.

By policy title — top contributors

▲

5 high-stakes policies drove 126% of the recovery

Combat Sports +8.79pp (contribution +0.59pp), Designated Dangerous Entities +17.43pp (+0.53pp), Violent Behaviors +25.70pp (+0.44pp), Alcohol +1.68pp on heavy weight contraction (+0.44pp), Highly Imitable Acts weight contraction (+0.42pp). Several of these were W14 collapses — W15 is the correction.

⚠

Hidden drags: Dangerous Trends collapsed −9.36pp during the recovery week

Dangerous Trends - Serious Harm went 77.45% → 68.09% on growing share (4.61%→4.83%) — a triple-negative pattern that cost −0.47pp. Adult Sexualized Behaviors dropped −3.67pp on growing weight (drag −0.39pp). Reference to Cannabis, Drugs dropped −18.59pp (drag −0.25pp). These are the policies that didn't share in the recovery.

Policy	Acc W14	Acc W15	Δ Acc	Wt W14	Wt W15	Rate	Weight	Inter	Total	% of Δ
Combat sports, Extreme Sports & Stunts	66.24%	75.02%	+8.79	5.37%	4.04%	+0.472	+0.237	−0.116	+0.592	−31%
Designated Dangerous Entities	50.75%	68.18%	+17.43	2.34%	1.59%	+0.408	+0.252	−0.132	+0.529	−28%
Violent Behaviors ?	51.08%	76.78%	+25.70	1.65%	1.47%	+0.425	+0.062	−0.048	+0.439	−23%
Alcohol	67.99%	69.67%	+1.68	5.84%	3.50%	+0.098	+0.379	−0.039	+0.437	−23%
Highly Imitable Acts	46.45%	40.09%	−6.36	3.00%	1.61%	−0.191	+0.524	+0.088	+0.421	−22%
Personal Information - High Risk ?	42.65%	84.12%	+41.47	0.91%	0.67%	+0.378	+0.101	−0.101	+0.378	−20%
Regulated Goods - Marketing/Trade	33.32%	47.96%	+14.64	1.61%	1.44%	+0.236	+0.088	−0.025	+0.298	−16%
Tobacco and Nicotine ★ heaviest policy	76.07%	79.04%	+2.97	11.06%	12.23%	+0.328	−0.094	+0.035	+0.269	−14%
Top-8 positive subtotal									+3.363	−175%
Dangerous Trends - Serious Harm	77.45%	68.09%	−9.36	4.61%	4.83%	−0.432	−0.015	−0.021	−0.467	+24%
Adult Sexualized Behaviors	58.55%	54.88%	−3.67	5.06%	5.77%	−0.186	−0.180	−0.026	−0.391	+20%
Reference to Cannabis, Drugs	71.95%	53.36%	−18.59	0.94%	1.17%	−0.174	−0.029	−0.044	−0.247	+13%
Firearms & Explosive Weapons	75.86%	71.18%	−4.69	2.80%	3.50%	−0.131	−0.057	−0.032	−0.221	+12%
Top-4 negative subtotal									−1.326	+69%

Pattern: 4 of the 5 top gainers had collapsed in W14. Combat Sports went from 79% (W13) → 66% (W14) → 75% (W15). Violent Behaviors went 63%→51%→77%. Designated Dangerous Entities went ~? → 51% → 68%. This points to a W14 operational issue (likely policy interpretation or labeling) that got resolved. Confirm whether the structural cause was identified — if not, the recovery may not be sticky.

▶ Combat Sports — biggest single contributor (+0.59pp)

Combat Sports: rate + weight both helped

Accuracy rose 66.24% → 75.02% (+8.79pp) AND share contracted 5.37% → 4.04% (−1.33pp). Both effects favorable: rate (+0.47pp) from accuracy improvement, weight (+0.24pp) from below-mean segment shrinking.

4-week trajectory: W13 ~80% → W14 66% → W15 75% → W16 ~83%. The category has a clear W14 trough that is recovering through W16. Suggests a labeling-guideline change for sports content was rolled back or refined.

▶ Highly Imitable Acts — pure weight-effect contribution

Highly Imitable Acts: accuracy fell, but the weight contraction more than offset

Accuracy actually got worse: 46.45% → 40.09% (−6.36pp). But weight nearly halved: 3.00% → 1.61%. Since this segment sat far below the global mean (84%), shrinking its share of the mix is a strong positive even though its quality regressed.

Net: weight effect +0.52pp dwarfs the rate damage −0.19pp, total contribution +0.42pp.

This is a textbook "good-because-it-shrunk-not-because-it-improved" case. The weight change might reflect a sampling/routing decision — verify it's not just a statistical artifact.
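The sign logic behind this case can be sketched in a few lines: the weight effect is positive when a below-mean segment shrinks (or an above-mean one grows) and negative otherwise. The function name is illustrative; inputs are rounded values from the W15 policy table, with Adult Sexualized Behaviors included as the opposite case.

```python
def weight_effect(w0, w1, a0, g0):
    """Mix-shift term in pp: weight change times prior-week gap to the global mean."""
    return (w1 - w0) * (a0 - g0)


G0 = 84.12  # global accuracy in W14, percent

# Highly Imitable Acts: below the mean and shrinking -> helps the global number
hia = weight_effect(0.0300, 0.0161, 46.45, G0)

# Adult Sexualized Behaviors: below the mean and growing -> hurts the global number
asb = weight_effect(0.0506, 0.0577, 58.55, G0)

print(f"Highly Imitable Acts: {hia:+.3f}pp, Adult Sexualized Behaviors: {asb:+.3f}pp")
```

Neither result says anything about moderation quality; both are purely about where the sampled volume landed.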

▶ Dangerous Trends - Serious Harm — biggest single drag

Dangerous Trends: against the recovery

Accuracy fell from 77.45% → 68.09% (−9.36pp) on growing share (4.61% → 4.83%). All three effects negative: rate −0.43pp, weight −0.02pp, interaction −0.02pp.

4-week trajectory: W13 ~69% → W14 77% → W15 68% → W16 65%. The W14 figure looks like the outlier — W13/W15/W16 cluster around 65–70%. So Dangerous Trends didn't really regress in W15; rather, W14 was an anomaly that the W15 reading reverted from.

Implication: this category sits structurally low (~65–70%) and any single-week reading is volatile. The W15 "drag" is partly a methodology artifact of comparing against an unusually high W14 baseline.

Tobacco × Markets — where Tobacco's +2.97pp gain came from

EMEA · biggest Tobacco recovery

recovery

▲ multi-market

SSA, EN, UA, ES, MENA1 all improved 11–74pp

AMS · BR led

55.25%

▲ +42.14pp

BR jumped from 13% → 55% Tobacco accuracy

APAC · ID started declining

88.14%

▼ 8.10pp

ID Tobacco fell on growing share — early W16 warning

▲

Tobacco's +2.97pp gain was EMEA-led — 5 EMEA markets each contributed 0.5–1.5pp within Tobacco

BR +42.14pp (+1.68pp), SSA +56.38pp (+1.52pp), EN +11.37pp (+1.47pp), UA +21.70pp (+1.41pp), ES +73.80pp (+0.88pp). Multiple markets recovered Tobacco accuracy at the same time — consistent with a global Tobacco moderation issue being resolved.

⚠

Three markets foreshadowed the W16 Tobacco picture: regression patterns in MENA2, LATAM, and ID started here

MENA2 Tobacco fell 77.27% → 54.25% (drag −1.34pp) — would recover +31.96pp in W16. LATAM fell 100% → 48.89% (drag −0.99pp) — would also recover in W16. ID fell 96.24% → 88.14% (drag −0.80pp) — would continue falling in W16. The pattern foreshadowed the W16 reversal in some markets and continuation in others.

Market	Region	Acc W14	Acc W15	Δ Acc	Sh W14	Sh W15	Δ Sh	Total	% of Tob Δ	Sample
BR	AMS	13.11%	55.25%	+42.14	3.65%	2.97%	−0.68	+1.682	−57%	7 → 32
SSA	EMEA	26.77%	83.14%	+56.38	2.90%	1.31%	−1.59	+1.524	−51%	8 → 15
EN	EMEA	54.04%	65.42%	+11.37	11.38%	9.74%	−1.64	+1.469	−50%	20 → 39
UA	EMEA	69.92%	91.61%	+21.70	3.42%	7.72%	+4.30	+1.411	−48%	18 → 83
ES	EMEA	0.00%	73.80%	+73.80	1.18%	0.82%	−0.36	+0.875	−29%	6 → 16
MENA1	EMEA	76.83%	88.12%	+11.29	6.85%	5.03%	−1.82	+0.554	−19%	23 → 56
PK	APAC	82.69%	92.17%	+9.48	9.32%	6.10%	−3.22	+0.365	−12%	45 → 103
CA	AMS	23.87%	29.93%	+6.06	1.17%	0.54%	−0.63	+0.364	−12%	3 → 9
Top-8 positive subtotal								+8.245	−278%
MENA2 ?	EMEA	77.27%	54.25%	−23.02	13.21%	5.40%	−7.81	−1.338	+45%	23 → 27
LATAM ?	AMS	100.00%	48.89%	−51.11	0.81%	2.92%	+2.11	−0.987	+33%	1 → 17
ID	APAC	96.24%	88.14%	−8.10	14.22%	17.17%	+2.95	−0.798	+27%	23 → 92
Top-3 negative subtotal								−3.123	+105%

Pattern: the EMEA Tobacco recovery is broad, with 5 EMEA markets each contributing 0.5–1.5pp. This points to a structural cause (e.g., a Tobacco-policy interpretation guidance updated globally between W14 and W15). The APAC piece is more ambiguous: PK improved while ID declined.

▶ ID Tobacco — start of a 3-week decline

ID Tobacco: 96% → 88% in W15, then 88% → 84% in W16

ID Tobacco accuracy declined from 96.24% → 88.14% (−8.10pp) while ID's overall OMA was actually rising. Tobacco share also grew (14.22% → 17.17%, +2.95pp), so more cases entered a now-failing segment.

This was the start of a sustained ID Tobacco decline:

W13: 96.47%
W14: 96.24% (essentially flat)
W15: 88.14% (−8.10pp)
W16: 83.66% (−4.47pp)
Cumulative: −12.81pp from W13 baseline

The W15 drop was the inflection point. Whatever broke ID-Tobacco moderation appears to have started here.

Indonesia (ID) — policy-level breakdown

ID OMA · APAC

90.22%

▲ +0.58pp

small gain — ID was NOT a recovery driver

Top gainer: Youth Body Exposure - Sig & Mod

77.45%

▲ +31.18pp

would flip in W16 narrative — but stayed strong

Youth Sexualized Behaviors

70.00%

▲ +28.86pp

would collapse in W16 (−22.34pp) — fragile gain

Tobacco in ID — early decline

90.47%

▼ 5.77pp

first dip of a 3-week decline that continues to W16

⚙

ID W15 is the mirror image of W16 — same policies, flipped direction

Several policies that collapsed in W16 had jumped up in W15, most clearly Youth Sexualized Behaviors (W15 +28.86pp → W16 −22.34pp). Adult Sexualized Behaviors, by contrast, fell in both weeks (−7.70pp in W15, then −16.86pp in W16). This volatility on small samples (mostly < 30) suggests the policy-level ID data is noise-heavy. Look at the W14→W16 net change for a clearer signal.

04a

Methodology — shift-share applied to ID

Same framework as W16: each ID policy is decomposed against ID's overall accuracy mean (89.63% in W14).

Rate effect = GWt_W14 × (Acc_W15 − Acc_W14) — pure accuracy change at prior weight (within ID)
Weight effect = (GWt_W15 − GWt_W14) × (Acc_W14 − ID Acc_W14) — mix shift relative to ID mean (89.63%)
Interaction = (GWt_W15 − GWt_W14) × (Acc_W15 − Acc_W14) — joint change
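As a worked sketch of the nested version, the snippet below decomposes ID's Tobacco and Nicotine row against the ID mean and, for contrast, against the W14 global mean (84.12%). Weights here are shares within ID. The variable names are illustrative and the inputs are the rounded table values, so results should match the published row to about 0.005pp.

```python
ID_MEAN_W14 = 89.63      # ID overall accuracy in W14, percent
GLOBAL_MEAN_W14 = 84.12  # global OMA in W14, percent (for contrast only)

# Tobacco and Nicotine within ID: share of ID volume and accuracy, W14 -> W15
w0, w1 = 0.2102, 0.2633
a0, a1 = 96.24, 90.47

rate = w0 * (a1 - a0)
interaction = (w1 - w0) * (a1 - a0)
weight_vs_id = (w1 - w0) * (a0 - ID_MEAN_W14)
weight_vs_global = (w1 - w0) * (a0 - GLOBAL_MEAN_W14)

total = rate + weight_vs_id + interaction
print(f"rate={rate:+.3f} weight={weight_vs_id:+.3f} "
      f"interaction={interaction:+.3f} total={total:+.3f}")
print(f"weight effect against the global mean instead: {weight_vs_global:+.3f}")
```

Only the weight term depends on the reference mean, which is why the reference has to be the mean of the population being decomposed.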

+1.14pp

Sum of rate effects

net accuracy improvement

+9.13pp

Sum of weight effects

noisy mix changes; small samples inflate

−3.92pp

Sum of interactions

share & accuracy moving against

⚙

Reconciliation note: shift-share sum is +6.35pp, ID's actual OMA Δ is +0.58pp

Same caveat as W16: this dataset's "Title Accuracy" diverges from OMA accuracy. The mismatch is even larger here because ID's policy-level samples are very small (most < 30). Use this for policy ranking, not absolute accounting.

04b

Policies dragging ID's accuracy in W15

Policy	Acc W14	Acc W15	Δ Acc	Wt W14	Wt W15	Rate	Weight	Inter	Total	% of ID Δ	Sample
High Risk Weight Loss & Muscle Gain ?	100.00%	0.00%	−100.00	1.22%	1.37%	−1.221	+0.015	−0.147	−1.352	−232%	1 → 2
Youth Regulated Goods and Services ?	91.96%	78.32%	−13.65	8.61%	8.87%	−1.175	+0.006	−0.036	−1.205	−207%	10 → 37
Tobacco and Nicotine ?	96.24%	90.47%	−5.77	21.02%	26.33% (+5.31)	−1.213	+0.351	−0.306	−1.168	−200%	23 → 91
Frauds & Scams ?	100.00%	72.22%	−27.78	1.22%	5.60% (+4.38)	−0.339	+0.453	−1.215	−1.101	−189%	2 → 18
Sexualized Animation & Illustration - Suggestive ?	100.00%	0.00%	−100.00	1.22%	0.99%	−1.221	−0.023	+0.226	−1.018	−175%	1 → 2
Youth Non-Sexualized Nudity ?	94.67%	81.80%	−12.87	4.52%	8.36% (+3.84)	−0.581	+0.194	−0.495	−0.882	−151%	27 → 169
Adult Sexualized Behaviors ?	66.08%	58.38%	−7.70	2.47%	4.16% (+1.69)	−0.191	−0.397	−0.130	−0.717	−123%	3 → 25

The credible drag in ID W15 is Tobacco (sample 23→91, accuracy −5.77pp) — this is the start of the ID Tobacco decline that continues to W16. Youth Non-Sexualized Nudity also has decent samples (27→169) and shows a real drop. Most other "drags" rest on samples of 1–10 and are unreliable.

04c

Policies improving in ID — the W15 gains that flipped in W16

Policy	Acc W14	Acc W15	Δ Acc	Wt W14	Wt W15	Rate	Weight	Inter	Total	% of ID Δ	Sample	W16 fate
Youth Body Exposure - Sig & Moderate ?	46.27%	77.45%	+31.18	8.89%	5.19%	+2.771	+1.606	−1.154	+3.223	+553%	23 → 60	held strong (+14pp more)
Youth Sexualized Behaviors ?	41.14%	70.00%	+28.86	9.69%	9.08%	+2.796	+0.293	−0.174	+2.915	+500%	11 → 51	collapsed (W16: −22.34pp)
Personal Information - High Risk ?	20.82%	50.00%	+29.18	5.86%	0.62%	+1.710	+3.606	−1.529	+3.788	+650%	3 → 2	still volatile
Combat Sports, Extreme Sports, & Stunts ?	33.33%	98.86%	+65.53	0.64%	3.46%	+0.422	−1.585	+1.844	+0.682	+117%	6 → 7	held at 100%
Youth Body Exposure - Light (4-17) ?	65.51%	83.02%	+17.51	2.47%	0.44%	+0.433	+0.491	−0.357	+0.568	+97%	3 → 12	collapsed (W16: −39.63pp)

Notice the W16 fate column: two of the W15 top ID gainers — Youth Sexualized Behaviors and Youth Body Exposure - Light — completely reversed in W16. This is the strongest evidence that ID's policy-level data is noise-heavy: real moderation quality doesn't oscillate ±25–30pp between weeks. Treat single-week ID-policy reads as directional, never quantitative.

▶ ID W15 vs W16 — what stayed real, what was noise

Comparing ID across W15 and W16 reveals the noise floor

Real signals (sustained across W15 and W16):

Tobacco decline — started in W15 (96%→88%) and continued in W16 (88%→84%). Sample 23→91→28. Most credible structural finding.
Youth Body Exposure - Sig & Moderate gain — W14 46% → W15 77% → W16 91%. Sample 23→60→39. Sustained improvement.

Likely noise (reversed within one week):

Youth Sexualized Behaviors: W14 41% → W15 70% → W16 47%. Sample 11→51→32. ±25pp swings on samples this size are within noise.
Youth Body Exposure - Light: W14 66% → W15 83% → W16 43%. Sample 3→12→11. Pure bouncing.
Personal Information - High Risk: W14 21% → W15 50% → W16 100%. Samples 3, 2, 2.

Implication: ID-policy crossbreaks need either much larger samples or aggregation across weeks before they yield reliable signal. The market-level OMA (90.22% → 86.81%) is more trustworthy as a signal than any individual policy-level reading.
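To make the noise floor concrete, the sketch below computes an approximate 95% Wilson score interval for a few of the ID readings quoted above. It assumes each accuracy behaves like a simple unweighted proportion over the stated sample. If the report's accuracies are traffic-weighted, the effective sample is smaller and the true uncertainty is wider, so treat the output as indicative only.

```python
import math


def wilson_interval(accuracy, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion observed on n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (accuracy + z * z / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# (label, accuracy as a fraction, sample) taken from the readings above
readings = [
    ("Youth Sexualized Behaviors W14", 0.41, 11),
    ("Youth Sexualized Behaviors W15", 0.70, 51),
    ("Youth Body Exposure - Light W15", 0.83, 12),
    ("Tobacco W15", 0.8814, 91),
]
for label, acc, n in readings:
    low, high = wilson_interval(acc, n)
    print(f"{label}: {acc:.0%} on n={n} -> {low:.0%} to {high:.0%}")
```

On a sample of 11 the interval spans roughly 50 percentage points, so a ±25pp week-over-week swing cannot be told apart from sampling noise. On the Tobacco W15 sample of 91 it narrows to well under 20 points, which is why that reading is treated as more credible.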

▶ ID Tobacco — the W15 inflection point

ID Tobacco started declining in W15

4-week trajectory:

W13: 96.47% (sample 17)
W14: 96.24% (sample 23) — flat
W15: 88.14% (sample 91) — first material drop, share grew 14.22%→17.17%
W16: 83.66% (sample 28) — continued decline, share crashed back to 10.21%

The W15 drop is the most credible single reading because of the sample expansion (23→91). It's also the moment when ID Tobacco started diverging from global Tobacco, which was recovering in W15.

Whatever started breaking ID Tobacco moderation appears to have happened around the W14/W15 boundary. By W16 the share contraction had absorbed most of the rate damage, but the underlying quality issue persists.

▶ ID 4-week OMA trajectory

Indonesia overall accuracy across W13–W16

W13: 92.79%
W14: 89.63% (−3.16pp)
W15: 90.22% (+0.58pp) — small recovery
W16: 86.81% (−3.41pp) — major drop

The W15 recovery was small and didn't restore the W13 baseline. By W16 ID had fallen to a 4-week low (cumulative −5.97pp from W13).

ID was barely participating in the W14→W15 global recovery (most other markets recovered 3–10pp; ID only +0.58pp). This was an early signal that ID had a more persistent issue than the policy-interpretation hiccup that affected most other markets in W14.

Recommended actions

1. Confirm what got fixed between W14 and W15. The recovery is too broad and too policy-specific (Combat Sports, Designated Dangerous Entities, Violent Behaviors all recovered) to be coincidence. Identify the W14 root cause and verify that the W15 fix is structural; otherwise the categories will re-collapse.

2. Investigate Dangerous Trends - Serious Harm (−9.36pp during a recovery week). It went against the global tide, possibly a separate issue from whatever affected W14. W16 saw a further drop to 65.22%, so this isn't a one-week event.

3. Watch ID specifically: it barely participated in the recovery (+0.58pp vs global +1.92pp). The Tobacco decline in ID started here and worsened in W16, an early signal of ID's W16 problems.

4. MX regression (−6.07pp on growing weight): verify whether Mexico-specific Spanish-language LATAM moderation was excluded from the W14→W15 fix.

5. Don't over-celebrate the +1.92pp. 4 of the 5 top contributors had collapsed in W14, so this is largely a correction, not a structural improvement. Net 2-week change vs W13 baseline is only +0.10pp.

▶ Priority matrix — what to verify before W17

Triage prioritization

P0 (root cause): Identify the W14 → W15 swing cause. Was it a labeling-guideline change, a moderator team rotation, a sampling pipeline correction? If unknown, the same issue could recur.

P1 (verification): Confirm 4-week trajectories for the top recovery policies (Combat Sports, Designated Dangerous Entities, Violent Behaviors) — are they sustained in W16? Most are, which is good news.

P1 (early warning): ID Tobacco decline started here. Track whether the W15 inflection persists into W17.

P2 (outliers): MX, PK, Dangerous Trends regressed during the recovery week. Investigate whether they share a common cause or are independent local issues.

P3 (data hygiene): Several extreme single-policy swings (Personal Information +41pp, Sexualized Animation −45pp) rest on samples below 80. Read these directionally, not by magnitude, until samples expand.

Global OMA W13

85.94%

baseline

starting point of the analyzed window

Status

analysis pending

no W12 data for shift-share comparison

📋

W13 report not yet generated

W13 (Mar 28 – Apr 3) is the baseline reference for W14 analysis. A standalone W13 vs W12 RCA would require Overall Moderation Accuracy data for W12, which is not currently available.

W17 — drag essentially gone

87.25%

▼ −0.00pp drag

2 carryover policies, 108 weight (0.002%)

W16 — unresolved drag

86.50%

▼ −0.75pp drag

13 policies, 49,353 weight (0.86%)

W15 — unresolved drag

86.04%

▼ −0.65pp drag

13 policies, 42,510 weight (0.75%)

W14 — unresolved drag

84.12%

▼ −1.00pp drag

11 policies, 54,521 weight (1.17%)

⊘

What this analysis is

A standalone view of policies that read exactly 0% accuracy in every week they appear in the W14–W17 window. Filter rules:

Must appear in at least 2 weeks across W14–W17 (single-week 0%-readings excluded as noise)
Must read 0% in every week of presence — any non-zero week disqualifies the policy
This isolates the structural measurement issue from week-specific noise
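A minimal sketch of the two filter rules, assuming the per-policy readings are available as a mapping from policy to week to accuracy, with a week omitted when the policy has no reading. The function name and input shape are hypothetical. The example rows use readings quoted on this page, plus one made-up single-week policy to exercise the first rule.

```python
def persistent_zero_policies(accuracy_by_week, min_weeks=2):
    """Return policies that appear in at least `min_weeks` weeks and
    read exactly 0% in every week they appear.

    accuracy_by_week: {policy: {week: accuracy_percent}}
    """
    selected = []
    for policy, readings in accuracy_by_week.items():
        if len(readings) < min_weeks:
            continue  # single-week readings are excluded as noise
        if all(acc == 0 for acc in readings.values()):
            selected.append(policy)
    return selected


example = {
    "Human Exploitation - Risk": {"W14": 0.0, "W15": 0.0, "W16": 0.0, "W17": 0.0},
    "Suspected Youth Sexual Abuse - Facilitation and Trade": {"W15": 0.0, "W16": 0.0},
    "Disordered Eating": {"W14": 79.4, "W16": 100.0, "W17": 0.0},  # non-zero weeks disqualify
    "Negative Stereotypes and Generalizations": {"W16": 6.1, "W17": 0.0},
    "Hypothetical single-week policy": {"W17": 0.0},  # made-up row: fails the 2-week rule
}
print(persistent_zero_policies(example))
```

Only the first two policies pass: the others either have a non-zero week or appear in a single week.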

⚙

A persistent measurement-pipeline issue is dragging OMA by 0.65–1.00pp every week (until W17)

13 policies read 0% accuracy in ≥2 weeks of W14–W16, predominantly CSAM/CSAE and high-severity categories: Adult Sexual Abuse, Youth Sexual Abuse, Youth Physical Abuse, Suicide & NSSI - Highly Harmful, Graphic Content, Animal Abuse, Human Exploitation, etc. Real moderator behavior on these categories is overwhelmingly correct (auto-removed at high precision upstream). What lands in the OMA sample is the residual: borderline cases where conservative moderator judgment can be coded against an aggressive ground truth. Tiny samples (1–42 cases) make it easy for the entire policy reading to flip to 0%. Until the metric pipeline is fixed, the headline OMA carries a structural drag of −0.65 to −1.00pp every week.

⚙

Case-grain math: how the drag is quantified at OMA level (production-weight basis)

The Overall Moderation Accuracy (OMA) table reports both an accuracy and a production traffic weight per week. We treat OMA as correct_cases / total_cases, where total_cases = W (the moderation weight) and correct_cases = acc × W. The persistent-0 policies have weight values reported in the policy breakdown (a subset of OMA's total moderation traffic). Because acc = 0 for these policies, they contribute 0 correct cases to the numerator while their full weight sits in the denominator: pure mathematical drag. The drag is the gap between the reported OMA and the OMA we'd see if these policies were excluded (numerator unchanged, denominator shrinks): OMA × W / (W − W_0) − OMA, where W_0 is the weight at 0%. To first order this is (W_0 / W) × OMA.

Week-by-week OMA-level impact (production weight)

Updated to use the May 6 refresh throughout: W13 is dropped (not in the refreshed data), and W14–W17 are computed against the same dataset for cross-week consistency.

Week	0% policies	OMA total weight	0%-policy weight	0% share	OMA reported	OMA if fixed	Drag
W14 (Apr 4–10)	11	4,661,402	54,521	1.17%	84.12%	85.12%	−1.00pp
W15 (Apr 11–17)	13	5,639,965	42,510	0.75%	86.04%	86.69%	−0.65pp
W16 (Apr 18–24)	13	5,761,548	49,353	0.86%	86.50%	87.25%	−0.75pp
W17 (Apr 25–May 1)	2 (carryover)	5,953,725	108	0.0018%	87.25%	87.25%	−0.00pp

"Drag" = how much the persistent-0% policies pull the headline OMA below where it would be if the measurement issue were fixed. Negative values = ongoing cost.
The big W17 finding: the drag essentially vanished. From −0.65 to −1.00pp/week in W14–W16, to just −0.002pp in W17. The 11 policies that previously persistently read 0% (Adult Sexual Abuse, Youth Physical Abuse, Graphic Content, Suicide & NSSI - Highly Harmful, Blood, etc.) either reported non-zero accuracy in W17 or didn't appear at all. The May 6 refresh data is consistent across all weeks — the prior cross-pull confound is resolved. See the W17 panel's "Persistent 0% Issue" tab for full case-grain attribution: this drag resolution accounts for 99% of the W17 OMA gain.
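The 99% attribution can be checked directly from the W16 and W17 rows of the table above. The sketch below treats the drag as the exact gap between "OMA if fixed" and reported OMA, and takes the +0.7530pp headline gain from the W17 overview. The function name is illustrative.

```python
def zero_policy_drag(oma_pct, total_weight, zero_weight):
    """Drag in pp: OMA with the 0% policies excluded, minus reported OMA."""
    oma_if_fixed = oma_pct * total_weight / (total_weight - zero_weight)
    return oma_if_fixed - oma_pct


drag_w16 = zero_policy_drag(86.50, 5_761_548, 49_353)
drag_w17 = zero_policy_drag(87.25, 5_953_725, 108)

drag_resolution = drag_w16 - drag_w17  # pp of headline gain explained by the clearing
headline_gain = 0.7530                 # W17 vs W16 gain, from the W17 overview

print(f"drag W16 {drag_w16:.4f}pp, drag W17 {drag_w17:.4f}pp")
print(f"resolution {drag_resolution:.4f}pp = {drag_resolution / headline_gain:.1%} of the gain")
```

The residual after subtracting the drag resolution is under 0.01pp, which is the basis for reading underlying quality as nearly flat week over week.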

01a

W17 specifically — what changed

✓

11 of 13 prior persistent-0% policies reported non-zero accuracy in W17 (or didn't appear)

Adult Sexual Abuse now reports non-zero accuracy. The other ten are absent from the W17 data: Youth Physical Abuse, Graphic Content, Suicide & NSSI - Highly Harmful, Blood, Animal Abuse & Graphic Content, Highly Harmful Adult Sexual Abuse, Human Exploitation - Facilitation, Suspected Youth Sexual Abuse - Facilitation and Trade, Suspected Youth Sexual Objectification & Fetish, and Suspected Youth Sexual Abuse — Promotion and Admission. Clearing this many policies in a single week points to a structural change in the moderation evaluation pipeline.

⚙

Only 2 carryover persistent-0% policies remain — and both have negligible weight

Suspected Youth Sexual Abuse - Depiction: persisted at 0% from W14 through W17, but W17 weight is just 54 (sample 1). Human Exploitation - Risk: same persistent pattern, W17 weight 54 (sample 1). Combined drag on W17: 108 weight out of 5.95M = 0.002%, about −0.002pp drag on the headline, effectively zero.

Policy	W14	W15	W16	W17	W17 status
Suspected Youth Sexual Abuse - Depiction	0%	0%	0%	0% (s=1, w=54)	still persistent
Human Exploitation - Risk	0%	0%	0%	0% (s=1, w=54)	still persistent
Adult Sexual Abuse	0%	0%	0%	non-zero	recovered
Youth Physical Abuse, Assault & Neglect	0%	0%	0%	absent	cleared
Graphic Content	0%	0%	0%	absent	cleared
Suicide & NSSI - Highly Harmful	0%	0%	0%	absent	cleared
Blood	0%	0%	0%	absent	cleared
Animal Abuse & Graphic Content	0%	0%	0%	absent	cleared
Highly Harmful Adult Sexual Abuse - Visual Depiction	0%	0%	0%	absent	cleared
Human Exploitation - Facilitation	0%	0%	0%	absent	cleared
Suspected Youth Sexual Abuse - Facilitation and Trade	—	0%	0%	absent	cleared
Suspected Youth Sexual Objectification & Fetish	0%	0%	0%	absent	cleared
Suspected Youth Sexual Abuse — Promotion and Admission	—	0%	0%	absent	cleared

"absent" = the policy doesn't appear in the W17 data at all. "non-zero" = the policy appears in W17 but reports a positive accuracy. "still persistent" = the policy continues to read 0% in W17, here on a sample of 1 each. Of the 13 historic persistent-0% categories, only 2 carry over into W17, both at negligible weight.

⚠

W17 also has 2 NEW 0% policies — but both are sample-1 noise, not persistent

Disordered Eating: was 79.4% (W14), 100% (W16), now 0% on a single sample — single-week noise, not a structural issue. Negative Stereotypes and Generalizations: was 6.1% (W16), 0% (W17 sample 1) — also noise. Neither qualifies for the persistent-0% list.

▶ What probably caused the W17 clearing

Three plausible mechanisms for why most categories left the persistent-0 list

The fact that 11 of 13 prior persistent-0% policies cleared simultaneously in one week is too uniform to be organic moderator quality change. The most likely explanations:

Pipeline filter rule change — the May 4 pull's ETL excluded the residual cases that were creating the 0%-readings, OR included previously-excluded auto-action cases that report correctly. Consistent with the dataset shift section above (total weight changed −12% to +10% across weeks).
Ground-truth labeling correction — the data team manually fixed a coding rule that was systematically misclassifying these CSAM/CSAE residual cases. Less likely because the change is across many distinct policies simultaneously.
Sampling logic change — these high-severity categories were previously sampled only from residual (post-auto-action) cases; the May 4 pull samples differently. This would naturally produce non-zero readings.

What to confirm with the data team: ask which of the three explanations (or some other) applies. Knowing the cause tells you whether to expect the change to persist (good: a pipeline fix) or revert in a future pull (bad: a sampling artifact).

▶ What this implies for prior weeks' unresolved drag

Was the W14–W16 −0.7 to −1.0pp drag real or measurement?

If the W17 clearing is a pipeline fix: the W14–W16 drag was a real measurement issue, and the headline numbers reported during those weeks understated true OMA by 0.65–1.00pp every week. W17 simply removed that drag by fixing the pipeline. The "true" OMA across the window is closer to the "OMA if fixed" column in the table above (W14 85.12%, W15 86.69%, W16 87.25%, W17 87.25%), which is flat from W16 to W17 rather than rising as the headline numbers suggest.

If the W17 clearing is a sampling artifact: the prior persistent-0 readings may have reflected real (residual) moderator behavior on hard cases, and the W17 cleared values may be artificially elevated by including easier auto-action cases. In this case the prior weeks' drag wasn't really "drag" at all. It was the actual quality on those residual cases, and the W17 OMA of 87.25% is partially inflated relative to the W14–W16 numbers.

Without knowing the cause, treat the W14→W17 headline trajectory as the upper bound of the real gain. The 0% readings pulled the headline down either way; what's uncertain is whether the W17 clearing actually solved the underlying issue or just papered over it.

The 13 historic persistent 0% policies (new pull, W14–W17)

Policies that read 0% in ≥2 weeks across W14–W17 in the May 6 refresh. Highlighted rows show the 2 still persistent in W17.

Policy	W14 weight	W15 weight	W16 weight	W17 weight	Total weight
Adult Sexual Abuse	10,714	8,984	8,633	non-zero	28,331
Suspected Youth Sexual Abuse - Depiction	1,935	8,055	8,054	54	18,098
Youth Physical Abuse, Assault & Neglect	5,090	5,043	6,069	absent	16,202
Graphic Content	6,233	3,850	5,177	absent	15,260
Suicide & NSSI - Highly Harmful	8,969	2,307	3,917	absent	15,193
Blood	10,661	974	558	absent	12,193
Animal Abuse & Graphic Content	498	3,302	8,127	absent	11,927
Human Exploitation - Risk	3,025	2,268	1,983	54	7,330
Highly Harmful Adult Sexual Abuse - Visual Depiction	352	653	3,271	absent	4,276
Human Exploitation - Facilitation	1,771	645	64	absent	2,480
Suspected Youth Sexual Abuse - Facilitation and Trade	—	1,202	307	absent	1,509
Suspected Youth Sexual Objectification & Fetish	508	294	436	absent	1,238
Suspected Youth Sexual Abuse — Promotion and Admission	—	355	576	absent	931
Total — historically persistent	49,756	37,932	47,172	108	134,968

Highlighted rows = still persistent at 0% in W17. The other 11 either reported non-zero accuracy or didn't appear in W17 at all. "Suspected" prefix in the new pull replaces the prior pull's labels for several CSAM/CSAE categories — likely a relabeling that accompanied the data refresh. Adult Sexual Abuse went from 0% → non-zero in W17, suggesting the underlying measurement issue was addressed.

Why these persist at 0%

▶ The CSAM/CSAE measurement gap — likely root cause

Why these specific policies all read 0%

Most of the 13 policies share a property. They cover content that gets auto-removed by upstream classifiers with very high precision:

CSAM categories (Youth Sexual Abuse — multiple variants)
CSAE categories (Adult Sexual Abuse, Highly Harmful Adult Sexual Abuse)
Other extremely high-severity categories (Suicide & NSSI - Highly Harmful, Human Exploitation, Graphic Content)

Real-world moderator behavior on these categories is overwhelmingly correct, because the obvious cases never reach human review — they're auto-actioned. What ends up in the OHA/OMA sample is the residual: edge cases that escaped automated enforcement, where moderator judgment is genuinely difficult.

On these residuals:

The conservative moderator call (e.g., "approve as benign because content is borderline") may be coded against an aggressive ground truth that says "should remove."
Sample sizes are tiny (1–42 cases per week) so a small number of coding decisions flip the entire policy reading to 0%.
The pattern persists across weeks because the underlying sampling logic is stable.

Bottom line: 0% accuracy on these categories is almost certainly NOT a real moderator quality signal.

▶ The math, in detail

Case-grain calculation, OMA-level

For each week, we have:

OMA accuracy = correct cases / total cases (from the headline number)
OMA total weight (production traffic) = total cases (from the OMA Subtotal row's Weight column)
0%-policy weight = sum of weights for the 13 persistent-0 policies (from policy table)

Excluding 0%-policies:

Numerator change: 0%-policies have acc=0 so they contribute 0 × cases = 0 correct. Removing them doesn't change the numerator.
Denominator change: shrinks by the 0%-policy case count.
New OMA = original_numerator / (original_denominator − excluded_cases) = OMA × N / (N − excluded)

W16 example: OMA = 86.50% on 5,761,548 production weight. 0%-policies = 49,353 weight (acc=0 each). Numerator = 0.8650 × 5,761,548 = 4,983,739 correct. Excluded numerator = 0. New = 4,983,739 / (5,761,548 − 49,353) = 4,983,739 / 5,712,195 = 87.25%. Gain = +0.75pp.

This gives a real OMA-level number, not a within-policy-table approximation.
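As a check on the formula, the sketch below applies it to the inputs from the week-by-week table and compares the exact result with the first-order shortcut of (0% share) × reported OMA. Function names are illustrative. The inputs are the reported OMA, OMA total weight, and 0%-policy weight columns.

```python
# week: (reported OMA %, OMA total weight, 0%-policy weight), from the week-by-week table
WEEKS = {
    "W14": (84.12, 4_661_402, 54_521),
    "W15": (86.04, 5_639_965, 42_510),
    "W16": (86.50, 5_761_548, 49_353),
    "W17": (87.25, 5_953_725, 108),
}


def oma_if_excluded(oma_pct, total_weight, zero_weight):
    """Exact: numerator unchanged, denominator shrinks by the 0% weight."""
    return oma_pct * total_weight / (total_weight - zero_weight)


def drag_first_order(oma_pct, total_weight, zero_weight):
    """Shortcut: (0% share of total weight) x reported OMA."""
    return oma_pct * zero_weight / total_weight


for week, (oma, total, zero) in WEEKS.items():
    fixed = oma_if_excluded(oma, total, zero)
    approx = drag_first_order(oma, total, zero)
    print(f"{week}: if fixed {fixed:.2f}%  drag {fixed - oma:.3f}pp  first-order {approx:.3f}pp")
```

The exact figures round to the "OMA if fixed" and "Drag" columns of the table. The shortcut slightly understates the drag because it divides by the full weight rather than the reduced one, but at 0% shares near 1% the difference is about 0.01pp.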

▶ Sample vs production-traffic view

Two ways to read "exclusion"

The page above uses production traffic weight (raw moderation volume) as the case base.

An alternative is to use the OMA evaluation sample, the population that gets human-reviewed for OMA scoring, as the case base. This is the most direct reading of "headline OMA if these policies were excluded from sampling." The two diverge slightly because OMA samples are not perfectly proportional to production traffic.

Production-traffic basis (page default): W13 +1.09pp, W14 +1.03pp, W15 +0.64pp, W16 +0.89pp.
Sample-base alternative: W13 +0.74pp, W14 +0.66pp, W15 +0.54pp, W16 +0.67pp.

The production-traffic view answers: "if we removed the production volume from the moderation pipeline that becomes 0%-policy in OMA, what would OMA become?" The sample view answers: "if we removed those same cases from the OMA evaluation sample, what would the headline read?"

For most reporting purposes the sample view is the right one. For estimating production-quality impact (e.g., what would happen if upstream classification took over these categories entirely), the production-traffic view is more relevant.

▶ Recommended actions

What to do about the persistent 0%-policy drag

P0 — confirm whether W17 actually fixed it. The drag dropped from −0.75pp (W16) to −0.002pp (W17) in a single week. Ask the data team whether they made an upstream change, or whether the May 4 pull's filtering incidentally removed the issue. If it's a real fix, the historical drag is gone. If it's a sampling artifact, expect it to come back.

P1 — investigate the metric pipeline regardless. Even if W17 cleared the symptoms, confirm the root cause: are these CSAM/CSAE categories surfacing only their residual cases (not the auto-actioned majority)? If the policy table can structurally only show 0% for these, that's a metric definition issue worth solving permanently.

P2 — separate display in reports. Even after the W17 fix, segregate persistent-0 policies from the main accuracy breakdown. The headline number should still include them (because they're real moderation cases) but the per-policy ranking shouldn't surface them at the top of "drag" lists, where they create false alarms.

P3 — sample the underlying population properly. If sampling logic restricts to residuals for these categories, expand sampling to include automated-action cases for ground-truth verification. This would let these categories report meaningful non-zero accuracies and remove the drag at its source.

✓

Dataset cross-pull confound is resolved — May 6 refresh is consistent across W14–W17

Earlier in the W17 RCA we flagged that the May 4 pull restated W14–W16 numbers upward by 0.30–0.64pp. The May 6 refresh restored consistency: W14–W17 are now reported on a single coherent baseline. All numbers in this panel use the May 6 refresh. Prior pull discrepancies are no longer relevant for ongoing analysis.