Project: Okul Database · Hub: Okul Database — Guides
Common Gotchas & Pitfalls
1. Soft Delete Missing — Silent Data Pollution ⚠️
Problem: Nearly every table has a deleted_at column. Records are never physically deleted. Forgetting the filter silently includes deleted records.
-- WRONG — includes deleted schools
SELECT COUNT(*) FROM schools; -- returns ~8000
-- RIGHT
SELECT COUNT(*) FROM schools WHERE deleted_at IS NULL; -- returns ~5000Affected tables (non-exhaustive): schools, school_filters, leads, users, customers, customer_agreements, payments, school_comments, school_interactions, campuses, media, discounts, articles.
Rule: Add WHERE deleted_at IS NULL as your first instinct on any query.
2. Using schools Instead of school_filters
Problem: schools is normalized. To get city, school type, product, and engagement data requires joining 10+ tables. school_filters is a pre-computed denormalized table with 100+ columns.
-- WRONG — slow, complex, easy to get wrong
SELECT s.name, st.type, l.name as city, p.name as product_name,
COUNT(le.id) as leads
FROM schools s
JOIN schools_types st ON st.id = s.school_type_id
JOIN locations l ON l.id = s.city_id
JOIN school_product_prices spp ON spp.school_id = s.id
JOIN products p ON p.id = spp.product_id
LEFT JOIN leads le ON le.school_id = s.id AND le.deleted_at IS NULL
WHERE s.deleted_at IS NULL
GROUP BY s.id
-- RIGHT — one table, same result
SELECT name, school_type_name, city_name, product_name, lead_count
FROM school_filters
WHERE deleted_at IS NULL3. Test Leads Inflating Counts ⚠️
Problem: Leads with is_test = 1 are created for internal testing. They inflate all lead-based metrics significantly.
-- WRONG — includes test leads
SELECT COUNT(*) FROM leads WHERE deleted_at IS NULL;
-- RIGHT
SELECT COUNT(*) FROM leads
WHERE deleted_at IS NULL AND (is_test = 0 OR is_test IS NULL);Note: is_test can be NULL for old records (before the column was added), not just 0. Use the OR is_test IS NULL form.
4. Treating Pipedrive Tables as Real-Time
Problem: pipedrive_deals, pipedrive_organizations, pipedrive_persons sync from an external CRM. There is an inherent sync lag. They may be minutes to hours behind.
For live lead status: use leads.stage_id and leads.school_answered_deal_status.
For historical deal analysis: Pipedrive tables are fine.
5. source_action Column Doesn’t Exist Before April 2026
Problem: leads.source_action was added on 2026-04-02. Any lead created before that date has NULL in this column, not a missing source.
-- WRONG — treats all NULL as "unknown source"
SELECT source_action, COUNT(*) FROM leads
WHERE deleted_at IS NULL
GROUP BY source_action;
-- NULL bucket is massive because of old records
-- RIGHT — only look at post-2026-04-02 leads for source_action
SELECT source_action, COUNT(*) FROM leads
WHERE deleted_at IS NULL
AND created_at >= '2026-04-02'
AND (is_test = 0 OR is_test IS NULL)
GROUP BY source_action;Similarly, otp_codes.ip_address and otp_codes.user_agent were added later.
6. configs FK — Must Match BOTH type AND item_id
Problem: The configs table uses a two-column key. Different type groups can have the same item_id. Joining only on item_id returns multiple rows or wrong values.
-- WRONG — returns multiple rows because item_id=1 exists in many types
SELECT c.item_value
FROM customer_users cu
JOIN configs c ON c.item_id = cu.role_id; -- ambiguous!
-- RIGHT
SELECT c.item_value
FROM customer_users cu
JOIN configs c ON c.type = 'customer_user_role'
AND c.item_id = cu.role_id
AND c.deleted_at IS NULL;See ref-configs for all config type names.
7. school_answers_deal_status Uses Large item_ids (2453+)
Problem: Most lookup item_ids are small integers (1-50). The school_answered_deal_status config group uses ids like 2453, 2454… This surprises people comparing to stage_ids.
-- stage_id (Pipedrive) = small numbers: 1, 6, 11, 14...
-- school_answered_deal_status = large numbers: 2453, 2454...
-- They're different fields, don't confuse them
-- Enrolled = stage_id = 11 (Pipedrive)
-- Enrolled = school_answered_deal_status = 2453 (school confirmed)
-- Both indicate conversion but are tracked separatelySee ref-lead-stages and ref-interaction-values for full value tables.
8. JSON Column Comparisons
Problem: customer_agreements.products, customer_agreements.product_arge, and notifications.options are JSON. String LIKE comparisons are unreliable.
-- WRONG — fragile, matches partial strings
SELECT * FROM customer_agreements WHERE products LIKE '%"premium"%';
-- RIGHT — use JSON functions
SELECT * FROM customer_agreements
WHERE JSON_SEARCH(products, 'one', 'premium', NULL, '$[*].type') IS NOT NULL;
-- Or extract and compare
SELECT JSON_UNQUOTE(JSON_EXTRACT(products, '$[0].name')) as first_product
FROM customer_agreements WHERE deleted_at IS NULL;9. payments.status = 'success' is Required for Revenue
Problem: The payments table has 10 status values. Many incomplete, retried, or cancelled payments exist. Summing amount without a status filter massively overstates revenue.
-- WRONG — counts all payment rows
SELECT SUM(amount) as revenue FROM payments WHERE deleted_at IS NULL;
-- RIGHT — only collected payments
SELECT SUM(amount) as revenue FROM payments
WHERE deleted_at IS NULL
AND status = 'success'
AND paid_at IS NOT NULL;10. school_filters vs schools — Async Lag
school_filters is updated asynchronously when school data changes. For very recent changes (< 1 min), there may be a brief discrepancy. For any analysis over days/weeks, this is irrelevant.
11. is_latest = 1 in school_interactions Means Only Latest Per School
Problem: school_interactions.is_latest marks only the single most recent interaction per school/customer combination. Filtering WHERE is_latest = 1 gives you a snapshot of current status, not history.
-- Current status per school (latest interaction)
WHERE is_latest = 1 AND deleted_at IS NULL
-- Full interaction history for a school
WHERE school_id = ? AND deleted_at IS NULL ORDER BY interacted_at DESC12. users.name vs users.first_name + users.last_name
Both exist. name is the display name (may be a full name or just first name). first_name + last_name are separate fields but may be NULL for older accounts.
-- Safe name resolution
COALESCE(NULLIF(CONCAT(first_name, ' ', last_name), ' '), name) as full_name13. school_comments.approved_at IS NOT NULL for Visible Reviews
Comments go through a moderation workflow. Only approved comments appear on the site.
-- Only publicly visible comments
WHERE deleted_at IS NULL AND approved_at IS NOT NULL14. agreement_type NOT = contract status
customer_agreements.agreement_type is the BILLING MODEL (fixed, subscription, etc.), not the status.
customer_agreements.status is the lifecycle state (active, inactive, rejected…).
Don’t confuse them.
Related
- guide-data-scientist-quickstart — Full quick start guide
- guide-key-relationships — Relationship maps
- ref-configs — configs table lookup
- ref-lead-stages — stage_id and approved_status_id values