Project: Okul Database · Hub: Okul Database — Guides

Common Gotchas & Pitfalls


1. Soft Delete Missing — Silent Data Pollution ⚠️

Problem: Nearly every table has a deleted_at column. Records are never physically deleted. Forgetting the filter silently includes deleted records.

-- WRONG — includes deleted schools
SELECT COUNT(*) FROM schools;   -- returns ~8000
 
-- RIGHT
SELECT COUNT(*) FROM schools WHERE deleted_at IS NULL;  -- returns ~5000

Affected tables (non-exhaustive): schools, school_filters, leads, users, customers, customer_agreements, payments, school_comments, school_interactions, campuses, media, discounts, articles.

Rule: Add WHERE deleted_at IS NULL as your first instinct on any query.


2. Using schools Instead of school_filters

Problem: schools is normalized. To get city, school type, product, and engagement data requires joining 10+ tables. school_filters is a pre-computed denormalized table with 100+ columns.

-- WRONG — slow, complex, easy to get wrong
SELECT s.name, st.type, l.name as city, p.name as product_name,
       COUNT(le.id) as leads
FROM schools s
JOIN schools_types st ON st.id = s.school_type_id
JOIN locations l ON l.id = s.city_id
JOIN school_product_prices spp ON spp.school_id = s.id
JOIN products p ON p.id = spp.product_id
LEFT JOIN leads le ON le.school_id = s.id AND le.deleted_at IS NULL
WHERE s.deleted_at IS NULL
GROUP BY s.id
 
-- RIGHT — one table, same result
SELECT name, school_type_name, city_name, product_name, lead_count
FROM school_filters
WHERE deleted_at IS NULL

3. Test Leads Inflating Counts ⚠️

Problem: Leads with is_test = 1 are created for internal testing. They inflate all lead-based metrics significantly.

-- WRONG — includes test leads
SELECT COUNT(*) FROM leads WHERE deleted_at IS NULL;
 
-- RIGHT
SELECT COUNT(*) FROM leads
WHERE deleted_at IS NULL AND (is_test = 0 OR is_test IS NULL);

Note: is_test can be NULL for old records (before the column was added), not just 0. Use the OR is_test IS NULL form.


4. Treating Pipedrive Tables as Real-Time

Problem: pipedrive_deals, pipedrive_organizations, pipedrive_persons sync from an external CRM. There is an inherent sync lag. They may be minutes to hours behind.

For live lead status: use leads.stage_id and leads.school_answered_deal_status.
For historical deal analysis: Pipedrive tables are fine.


5. source_action Column Doesn’t Exist Before April 2026

Problem: leads.source_action was added on 2026-04-02. Any lead created before that date has NULL in this column, not a missing source.

-- WRONG — treats all NULL as "unknown source"
SELECT source_action, COUNT(*) FROM leads
WHERE deleted_at IS NULL
GROUP BY source_action;
-- NULL bucket is massive because of old records
 
-- RIGHT — only look at post-2026-04-02 leads for source_action
SELECT source_action, COUNT(*) FROM leads
WHERE deleted_at IS NULL
  AND created_at >= '2026-04-02'
  AND (is_test = 0 OR is_test IS NULL)
GROUP BY source_action;

Similarly, otp_codes.ip_address and otp_codes.user_agent were added later.


6. configs FK — Must Match BOTH type AND item_id

Problem: The configs table uses a two-column key. Different type groups can have the same item_id. Joining only on item_id returns multiple rows or wrong values.

-- WRONG — returns multiple rows because item_id=1 exists in many types
SELECT c.item_value
FROM customer_users cu
JOIN configs c ON c.item_id = cu.role_id;   -- ambiguous!
 
-- RIGHT
SELECT c.item_value
FROM customer_users cu
JOIN configs c ON c.type = 'customer_user_role'
  AND c.item_id = cu.role_id
  AND c.deleted_at IS NULL;

See ref-configs for all config type names.


7. school_answers_deal_status Uses Large item_ids (2453+)

Problem: Most lookup item_ids are small integers (1-50). The school_answered_deal_status config group uses ids like 2453, 2454… This surprises people comparing to stage_ids.

-- stage_id (Pipedrive) = small numbers: 1, 6, 11, 14...
-- school_answered_deal_status = large numbers: 2453, 2454...
-- They're different fields, don't confuse them
 
-- Enrolled = stage_id = 11 (Pipedrive)
-- Enrolled = school_answered_deal_status = 2453 (school confirmed)
-- Both indicate conversion but are tracked separately

See ref-lead-stages and ref-interaction-values for full value tables.


8. JSON Column Comparisons

Problem: customer_agreements.products, customer_agreements.product_arge, and notifications.options are JSON. String LIKE comparisons are unreliable.

-- WRONG — fragile, matches partial strings
SELECT * FROM customer_agreements WHERE products LIKE '%"premium"%';
 
-- RIGHT — use JSON functions
SELECT * FROM customer_agreements
WHERE JSON_SEARCH(products, 'one', 'premium', NULL, '$[*].type') IS NOT NULL;
 
-- Or extract and compare
SELECT JSON_UNQUOTE(JSON_EXTRACT(products, '$[0].name')) as first_product
FROM customer_agreements WHERE deleted_at IS NULL;

9. payments.status = 'success' is Required for Revenue

Problem: The payments table has 10 status values. Many incomplete, retried, or cancelled payments exist. Summing amount without a status filter massively overstates revenue.

-- WRONG — counts all payment rows
SELECT SUM(amount) as revenue FROM payments WHERE deleted_at IS NULL;
 
-- RIGHT — only collected payments
SELECT SUM(amount) as revenue FROM payments
WHERE deleted_at IS NULL
  AND status = 'success'
  AND paid_at IS NOT NULL;

10. school_filters vs schools — Async Lag

school_filters is updated asynchronously when school data changes. For very recent changes (< 1 min), there may be a brief discrepancy. For any analysis over days/weeks, this is irrelevant.


11. is_latest = 1 in school_interactions Means Only Latest Per School

Problem: school_interactions.is_latest marks only the single most recent interaction per school/customer combination. Filtering WHERE is_latest = 1 gives you a snapshot of current status, not history.

-- Current status per school (latest interaction)
WHERE is_latest = 1 AND deleted_at IS NULL
 
-- Full interaction history for a school
WHERE school_id = ? AND deleted_at IS NULL ORDER BY interacted_at DESC

12. users.name vs users.first_name + users.last_name

Both exist. name is the display name (may be a full name or just first name). first_name + last_name are separate fields but may be NULL for older accounts.

-- Safe name resolution
COALESCE(NULLIF(CONCAT(first_name, ' ', last_name), ' '), name) as full_name

13. school_comments.approved_at IS NOT NULL for Visible Reviews

Comments go through a moderation workflow. Only approved comments appear on the site.

-- Only publicly visible comments
WHERE deleted_at IS NULL AND approved_at IS NOT NULL

14. agreement_type NOT = contract status

customer_agreements.agreement_type is the BILLING MODEL (fixed, subscription, etc.), not the status. customer_agreements.status is the lifecycle state (active, inactive, rejected…).

Don’t confuse them.