Project: Okul Platform · Hub: Okul Platform Architecture

TÜBİTAK AR-GE (R&D) #7260485: Data & AI Infrastructure Architecture

Current Stack (Baseline)

Layer             Technology
Relational DB     MySQL
Full-text search  Elasticsearch
Filtering         Rule-based (hand-crafted)
AI/ML             None

Data volumes:

  • 16,000+ school profiles
  • 900,000+ parent reviews
  • 750,000+ annual user interactions
  • 145,000+ leads/year
  • 1,200,000+ registered users

Planned Stack (R&D Output)

Vector Database

  • Candidates: Qdrant or Weaviate
  • Purpose: Store embeddings for school profiles, parent queries, reviews, and expert knowledge chunks
  • Use: Semantic similarity search, cross-agent shared latent space, RAG retrieval
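
To illustrate the semantic similarity search this store is meant to serve, here is a minimal brute-force sketch in plain Python. It is a stand-in only: a real deployment would rely on the approximate nearest-neighbour index of Qdrant or Weaviate. The function names, the ID scheme, and the 3-dimensional toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=3):
    """Return the k (item_id, score) pairs most similar to query_vec."""
    scored = [(item_id, cosine_similarity(query_vec, vec))
              for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy index; real embeddings would have hundreds of dimensions.
toy_index = {
    "school:1": [0.9, 0.1, 0.0],
    "school:2": [0.0, 1.0, 0.0],
    "review:7": [0.7, 0.7, 0.0],
}
```

Calling `top_k([1.0, 0.0, 0.0], toy_index, k=2)` ranks `school:1` first and `review:7` second.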

Embedding Pipeline

  • Models: multilingual-e5-large-instruct or BAAI/bge-m3
  • Input: School profiles, parent (veli) queries, reviews, domain expert knowledge
  • Output: Dense vector representations for semantic search and dual-agent shared space
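
Long inputs such as expert knowledge are stored as chunks, so some splitting step has to run before embedding. The sketch below shows one simple option, fixed-size word windows with overlap. The chunk size, the overlap, and the function name are assumptions, not settings from the project; the embedding call itself (e5 or bge-m3) is out of scope here.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into word windows of `size` words that overlap by `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

For example, `chunk_words("a b c d e", size=3, overlap=1)` yields `["a b c", "c d e"]`. The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.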

Feature Store

  • Purpose: Behavioral signal aggregation for predictive analytics (WP4)
  • Tracked events: search, filter, click, lead creation, conversion
  • Feeds into: B2B coaching agent, predictive insight models
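
As a rough sketch of the behavioral aggregation described above, the snippet below counts tracked events per school and derives one example feature. The event dictionary schema (`school_id`, `event`), the event identifiers, and the lead-to-conversion ratio are illustrative assumptions, not the project's actual feature definitions.

```python
from collections import Counter, defaultdict

# Tracked events from the list above, written as hypothetical identifiers.
TRACKED_EVENTS = ("search", "filter", "click", "lead_creation", "conversion")

def aggregate_events(events):
    """Count tracked events per school; untracked event types are ignored."""
    counts = defaultdict(Counter)
    for event in events:
        if event["event"] in TRACKED_EVENTS:
            counts[event["school_id"]][event["event"]] += 1
    return dict(counts)

def lead_conversion_rate(school_counts):
    """Share of created leads that converted (0.0 when there are no leads)."""
    leads = school_counts["lead_creation"]
    return school_counts["conversion"] / leads if leads else 0.0
```

A school with two `lead_creation` events and one `conversion` event gets a rate of 0.5, which a predictive model or the B2B coaching agent could consume as one input signal.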

RAG Pipeline

Architecture: Modular RAG with hybrid search

Component              Detail
Retrieval              Hybrid: semantic (vector) + BM25 keyword
Re-ranking             Cross-encoder re-ranker
Generation             LLM with retrieved context
Hallucination control  3-layer validation: source grounding NLI + consistency check + confidence scoring (Monte Carlo Dropout)
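
The source does not say how the semantic and BM25 result lists are merged before re-ranking. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the project's chosen method. The constant `k=60` is the conventional default, and the document IDs are placeholders.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

semantic_hits = ["a", "b", "c"]   # hypothetical vector-search ranking
bm25_hits = ["b", "d", "a"]       # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
```

Here `fused` is `["b", "a", "d", "c"]`: documents found by both retrievers rise to the top, and the fused list is what a cross-encoder re-ranker would then rescore.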

Hallucination target: ≤ 5%
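
The three validation layers can be combined into a single accept/reject gate. The sketch below assumes each layer has already produced a score: an NLI entailment probability, a consistency score, and a list of confidence samples from repeated forward passes with dropout left on (the Monte Carlo Dropout step). All thresholds and names are placeholders, not project values.

```python
from statistics import mean, pstdev

def mc_dropout_confidence(samples):
    """Summarise stochastic forward-pass scores as (mean, standard deviation)."""
    if not samples:
        raise ValueError("need at least one sample")
    return mean(samples), pstdev(samples)

def passes_validation(entailment_prob, consistency_score, mc_samples,
                      min_entailment=0.5, min_consistency=0.5,
                      min_confidence=0.6, max_spread=0.15):
    """Accept an answer only if all three layers clear their (placeholder) thresholds."""
    confidence, spread = mc_dropout_confidence(mc_samples)
    grounded = entailment_prob >= min_entailment       # layer 1: source grounding (NLI)
    consistent = consistency_score >= min_consistency  # layer 2: consistency check
    confident = confidence >= min_confidence and spread <= max_spread  # layer 3
    return grounded and consistent and confident
```

A high mean with a wide spread across dropout passes is treated as low confidence, which is the main reason to sample several passes instead of trusting a single score.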

Knowledge Gap Detection Pipeline

3-stage pipeline triggered when user queries cannot be answered with sufficient confidence:

  1. Intent Classification: a BERT-türkçe model classifies the query intent
  2. Coverage Analysis: a confidence score below 0.6 raises a gap flag
  3. Question Generation: an LLM auto-generates clarification or knowledge-acquisition prompts

Performance target: F1 ≥ 0.75
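
The coverage check and the F1 target can be sketched together. The snippet assumes the F1 score is computed on the gap flag against human-labelled ground truth, which the source does not state explicitly. Function names and the sample data are hypothetical.

```python
GAP_THRESHOLD = 0.6  # confidence below this value raises a gap flag

def is_knowledge_gap(confidence, threshold=GAP_THRESHOLD):
    """Stage 2 (coverage analysis): flag a gap when confidence is below the threshold."""
    return confidence < threshold

def f1_score(predicted, actual):
    """F1 of boolean gap predictions against boolean ground-truth labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

confidences = [0.9, 0.5, 0.3, 0.7]         # hypothetical per-query confidence scores
labelled_gaps = [False, True, True, True]  # hypothetical human labels
flags = [is_knowledge_gap(c) for c in confidences]
score = f1_score(flags, labelled_gaps)
```

In this toy sample the flags are `[False, True, True, False]`, giving precision 1.0, recall 2/3, and an F1 of 0.8, which would meet the ≥ 0.75 target.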

Tacit Knowledge Extraction (HITL)

3-layer Human-in-the-Loop framework for converting expert tacit knowledge into structured form:

  1. Structured Interview Module: a guided UI for expert input
  2. Rule Extraction: derives explicit rules from expert responses
  3. Embedding-based Learning: a Siamese network trained on expert-validated pairs

Target: ≥ 80% autonomous decision rate after training
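
The source names a Siamese network but not its training objective. A common choice for learning from validated pairs is the contrastive loss, shown below on precomputed embeddings as an assumption. The margin value and function name are illustrative.

```python
import math

def contrastive_loss(emb_a, emb_b, is_match, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Matching pairs are pulled together (loss = d^2); non-matching pairs are
    pushed apart until they are at least `margin` away (loss = max(0, margin - d)^2).
    """
    distance = math.dist(emb_a, emb_b)  # Euclidean distance (Python 3.8+)
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```

An expert-validated "these two belong together" pair would be passed with `is_match=True`, and a rejected pair with `is_match=False`. In training, this loss would be averaged over a batch and back-propagated through the shared encoder.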

Dual-Agent System

Agent                 Audience           Role
B2C Counseling Agent  Parents (veliler)  School discovery, match explanation, Q&A
B2B Coaching Agent    Schools (okullar)  Profile optimization, lead conversion insights

  • Both agents share a latent space (same vector DB, aligned embeddings)
  • Cross-agent cosine similarity target: ≥ 0.85
  • Cross-agent verification: agents validate each other’s outputs to reduce bias
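
How the 0.85 similarity target is measured is not spelled out in the source. One plausible reading, assumed here, is the mean cosine similarity between the two agents' embeddings of the same items. The sketch below checks that reading on toy vectors; all names are hypothetical.

```python
import math

CROSS_AGENT_TARGET = 0.85

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def cross_agent_alignment(b2c_vectors, b2b_vectors):
    """Mean cosine similarity over paired embeddings of the same items."""
    if not b2c_vectors or len(b2c_vectors) != len(b2b_vectors):
        raise ValueError("need two non-empty lists of equal length")
    sims = [cosine(a, b) for a, b in zip(b2c_vectors, b2b_vectors)]
    return sum(sims) / len(sims)

def meets_target(score, target=CROSS_AGENT_TARGET):
    """True when the alignment score reaches the target."""
    return score >= target
```

With the toy pairs `[[1, 0], [0, 1]]` and `[[1, 0], [1, 1]]`, the alignment is about 0.854, which just clears the target.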

Closed-Loop Knowledge Cycle

Every interaction feeds back into the system:

User Interaction
    → Feature Store (behavioral signal)
    → Knowledge Gap Detection (new gap identified?)
    → HITL or LLM fills gap
    → Embedding pipeline (new knowledge chunked + embedded)
    → Vector DB updated
    → Next query benefits from new knowledge
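
The cycle above can be sketched as one function per interaction. Everything here is a simplification under stated assumptions: the feature store is a plain list, the vector DB is a dict, and `fill_gap` and `embed` are hypothetical callables standing in for the HITL/LLM step and the embedding pipeline. The interaction fields are also assumed.

```python
def run_cycle(interaction, feature_store, vector_db, fill_gap, embed,
              gap_threshold=0.6):
    """Process one interaction; return True if a knowledge gap was filled."""
    # 1. Record the behavioral signal.
    feature_store.append({"event": interaction["event"],
                          "query": interaction["query"]})
    # 2. Gap detection: enough confidence means nothing new to learn.
    if interaction["confidence"] >= gap_threshold:
        return False
    # 3. HITL or LLM supplies the missing knowledge.
    knowledge = fill_gap(interaction["query"])
    # 4-5. Embed the new knowledge and write it to the vector DB,
    #      so the next query can retrieve it.
    vector_db[knowledge] = embed(knowledge)
    return True
```

A low-confidence query therefore leaves two traces, a behavioral event and a new vector entry, while a well-covered query only adds the event.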

Compliance & Infrastructure

Concern       Detail
Data privacy  KVKK, GDPR compliant
Security      ISO/IEC 27001, EU AI Act
MLOps         MLflow + Weights & Biases for experiment tracking
CI/CD         GitHub Actions
Containers    Docker + Kubernetes
Cloud         AWS (EC2, S3, RDS)

Integration Strategy

All new AI components integrate via API adapters, so the existing Laravel codebase is not modified directly. The new services expose REST endpoints that the platform consumes.

API response target: < 2 seconds
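
A thin adapter can enforce the response budget on the client side by passing a timeout with every call. The sketch is in Python for illustration only; on the platform side the equivalent adapter would live next to the Laravel code. The base URL, endpoint path, and class name are hypothetical, and using the 2-second target as a hard timeout is an assumption.

```python
import json
import urllib.request

API_TIMEOUT_SECONDS = 2.0  # mirrors the < 2 s response target

class AIServiceAdapter:
    """Minimal JSON-over-REST client for a (hypothetical) AI service."""

    def __init__(self, base_url, opener=urllib.request.urlopen,
                 timeout=API_TIMEOUT_SECONDS):
        self.base_url = base_url.rstrip("/")
        self.opener = opener    # injectable so the adapter can be tested offline
        self.timeout = timeout

    def post(self, path, payload):
        """POST a JSON payload and return the decoded JSON response."""
        request = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # The opener raises an error if the service does not answer in time.
        with self.opener(request, timeout=self.timeout) as response:
            return json.loads(response.read().decode("utf-8"))
```

Keeping the transport injectable means the platform code depends only on `post`, so a service can be swapped or mocked without touching callers.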

Work Package → Component Mapping

WP   Components Built
WP1  Vector DB setup, embedding pipeline, RAG foundation, Knowledge Gap Detection
WP2  B2C agent, B2B agent, dual-agent shared latent space
WP3  HITL framework, closed-loop knowledge cycle
WP4  Feature store, behavioral analytics, predictive models
WP5  Full integration, load testing, optimization