Project: Okul Platform · Hub: Okul Platform — Architecture

TÜBİTAK AR-GE #7260485 — Data & AI Infrastructure Architecture

Current Stack (Baseline)

Layer	Technology
Relational DB	MySQL
Full-text search	Elasticsearch
Filtering	Rule-based (hand-crafted)
AI/ML	None

Data volumes:

16,000+ school profiles
900,000+ parent reviews
750,000+ annual user interactions
145,000+ leads/year
1,200,000+ registered users

Planned Stack (R&D Output)

Vector Database

Candidates: Qdrant or Weaviate
Purpose: Store embeddings for school profiles, parent queries, reviews, and expert knowledge chunks
Use: Semantic similarity search, cross-agent shared latent space, RAG retrieval
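As a minimal sketch of the semantic similarity search named above, the example below ranks stored embeddings by cosine similarity to a query vector. The in-memory `index` dictionary, the ids, and the 3-dimensional vectors are illustrative assumptions; in the planned stack this lookup would be delegated to Qdrant or Weaviate, whose client APIs are not shown here.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=3):
    """Return the k (id, score) pairs most similar to query_vec."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy index: real embeddings would have hundreds of dimensions.
index = {
    "school:1": [0.9, 0.1, 0.0],
    "school:2": [0.0, 1.0, 0.0],
    "review:7": [0.7, 0.7, 0.0],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

A brute-force scan like this is only practical for small collections; a vector database replaces it with an approximate nearest-neighbor index.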

Embedding Pipeline

Models: multilingual-e5-large-instruct or BAAI/bge-m3
Input: School profiles, parent (veli) queries, reviews, domain expert knowledge
Output: Dense vector representations for semantic search and dual-agent shared space
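Long inputs such as school profiles and expert knowledge usually have to be split before embedding. The sketch below shows one possible chunking step using overlapping word windows. The function name `chunk_words` and the `size` and `overlap` values are assumptions, not settings from the project; the call to the embedding model itself is left out.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

# Tiny illustration with 10 placeholder words.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, size=4, overlap=1)
```

Each chunk would then be passed to the chosen embedding model and stored with a reference back to its source document.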

Feature Store

Purpose: Behavioral signal aggregation for predictive analytics (WP4)
Tracked events: search, filter, click, lead creation, conversion
Feeds into: B2B coaching agent, predictive insight models
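A rough sketch of the aggregation step follows: raw events are counted per school and turned into a feature row. The event dictionary shape, the `school_id` key, the snake_case event names, and the derived `lead_conversion_rate` feature are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Event types listed in the text, written as snake_case identifiers (assumed naming).
EVENT_TYPES = ("search", "filter", "click", "lead_creation", "conversion")

def aggregate(events):
    """Count events per school and derive a simple conversion-rate feature."""
    counts = defaultdict(Counter)
    for event in events:
        if event["type"] not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event['type']}")
        counts[event["school_id"]][event["type"]] += 1

    features = {}
    for school_id, counter in counts.items():
        row = {event_type: counter[event_type] for event_type in EVENT_TYPES}
        leads = counter["lead_creation"]
        row["lead_conversion_rate"] = counter["conversion"] / leads if leads else 0.0
        features[school_id] = row
    return features

# Hypothetical event log for one school.
events = [
    {"school_id": "school:42", "type": "click"},
    {"school_id": "school:42", "type": "click"},
    {"school_id": "school:42", "type": "lead_creation"},
    {"school_id": "school:42", "type": "lead_creation"},
    {"school_id": "school:42", "type": "conversion"},
]
features = aggregate(events)
```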

RAG Pipeline

Architecture: Modular RAG with hybrid search

Component	Detail
Retrieval	Hybrid: semantic (vector) + BM25 keyword
Re-ranking	Cross-encoder re-ranker
Generation	LLM with retrieved context
Hallucination control	Three-layer validation: NLI-based source grounding, consistency check, and confidence scoring (Monte Carlo Dropout)
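The table does not say how the semantic and BM25 result lists are merged. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the project's chosen method. The constant `k=60` is the conventional default for RRF, and the document ids are placeholders.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

semantic_hits = ["a", "b", "c"]  # hypothetical vector-search ranking
bm25_hits = ["b", "d", "a"]      # hypothetical keyword ranking
fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
```

The fused list would then go to the cross-encoder re-ranker, which scores only the top candidates because it is far more expensive per document.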

Hallucination rate target: ≤ 5%
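The three validation layers could be combined into a single acceptance gate along the following lines. This is a sketch under stated assumptions: the entailment and consistency scores are taken as given inputs in [0, 1], Monte Carlo Dropout is represented by a list of confidence values from repeated stochastic forward passes, and all four thresholds are hypothetical.

```python
from statistics import mean, pstdev

def mc_dropout_confidence(samples):
    """Summarize repeated stochastic passes as (mean confidence, spread)."""
    return mean(samples), pstdev(samples)

def accept_answer(entailment, consistency, mc_samples,
                  min_entailment=0.8, min_consistency=0.8,
                  min_confidence=0.7, max_spread=0.1):
    """Apply the three checks; thresholds are illustrative, not project values."""
    confidence, spread = mc_dropout_confidence(mc_samples)
    checks = {
        "grounded": entailment >= min_entailment,      # layer 1: NLI source grounding
        "consistent": consistency >= min_consistency,  # layer 2: consistency check
        "confident": confidence >= min_confidence and spread <= max_spread,  # layer 3
    }
    return all(checks.values()), checks

ok, ok_checks = accept_answer(0.9, 0.85, [0.80, 0.82, 0.78])
bad, bad_checks = accept_answer(0.9, 0.85, [0.9, 0.3, 0.6])
```

A high spread across passes signals that the model is unsure even when the average looks acceptable, which is why the third check uses both values.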

Knowledge Gap Detection Pipeline

3-stage pipeline triggered when user queries cannot be answered with sufficient confidence:

Intent Classification — BERT-türkçe; classifies query intent
Coverage Analysis — a confidence score below 0.6 raises the gap flag
Question Generation — LLM-based; auto-generates clarification or knowledge acquisition prompts
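The three stages above can be sketched as a single function that takes each stage as a callable. Only the 0.6 threshold comes from the text; the stand-in functions `fake_intent`, `fake_coverage`, and `fake_question` are hypothetical placeholders for the intent classifier, the coverage scorer, and the LLM.

```python
GAP_THRESHOLD = 0.6  # confidence below this value raises the gap flag

def detect_gap(query, classify_intent, coverage_score, generate_question):
    """Run intent classification, coverage analysis, then question generation."""
    intent = classify_intent(query)
    confidence = coverage_score(query, intent)
    gap = confidence < GAP_THRESHOLD
    follow_up = generate_question(query, intent) if gap else None
    return {"intent": intent, "confidence": confidence,
            "gap": gap, "follow_up": follow_up}

# Hypothetical stand-ins for the real models.
def fake_intent(query):
    return "tuition" if "fee" in query.lower() else "general"

def fake_coverage(query, intent):
    return 0.9 if intent == "tuition" else 0.4

def fake_question(query, intent):
    return f"Which details are missing to answer: {query!r}?"

covered = detect_gap("What is the annual fee?", fake_intent, fake_coverage, fake_question)
uncovered = detect_gap("Is there a robotics club?", fake_intent, fake_coverage, fake_question)
```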

Performance target: F1 ≥ 0.75
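For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it for binary gap flags, on the assumption that the target is measured on gap versus no-gap decisions; the labels are made up for illustration.

```python
def f1_score(predicted, actual):
    """F1 for binary labels, where True means 'gap flagged'."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical predictions against expert-labelled gaps.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
score = f1_score(predicted, actual)
meets_target = score >= 0.75
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3, so this toy run falls short of the 0.75 target.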

Tacit Knowledge Extraction (HITL)

3-layer Human-in-the-Loop framework for converting expert tacit knowledge into structured form:

Structured Interview Module — guided expert input UI
Rule Extraction — derives explicit rules from expert responses
Embedding-based Learning — Siamese network for learning from expert-validated pairs

Target: ≥ 80% autonomous decision rate after training
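The text does not name the training objective for the Siamese network. A contrastive loss is a typical choice for learning from validated pairs, so it is used here as an assumption: matching pairs are pulled together and non-matching pairs are pushed at least `margin` apart. The embeddings would come from the shared encoder, which is not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, is_match, margin=1.0):
    """Loss for one pair: d^2 if the expert marked a match, else max(0, margin - d)^2."""
    distance = euclidean(emb_a, emb_b)
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

# Hypothetical 2-dimensional embeddings of expert-validated pairs.
loss_match_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=True)
loss_nonmatch_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=False)
loss_nonmatch_near = contrastive_loss([0.0, 0.0], [0.0, 0.5], is_match=False)
```

Non-matching pairs that are already farther apart than the margin contribute no loss, so training effort concentrates on the pairs the encoder still confuses.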

Dual-Agent System

Agent	Audience	Role
B2C Counseling Agent	Parents (veliler)	School discovery, match explanation, Q&A
B2B Coaching Agent	Schools (okullar)	Profile optimization, lead conversion insights

Both agents share a latent space (same vector DB, aligned embeddings)
Cross-agent cosine similarity target: ≥ 0.85
Cross-agent verification: agents validate each other’s outputs to reduce bias
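One way to check the similarity target is to embed the same items through each agent and average the pairwise cosine similarity. That reading of the metric is an assumption, as are the function name `alignment_score` and the toy vectors below.

```python
import math

ALIGNMENT_TARGET = 0.85  # cross-agent cosine similarity target from the text

def cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def alignment_score(b2c_vectors, b2b_vectors):
    """Mean cosine similarity over items embedded by both agents, in the same order."""
    if len(b2c_vectors) != len(b2b_vectors) or not b2c_vectors:
        raise ValueError("need two non-empty lists of equal length")
    sims = [cosine(a, b) for a, b in zip(b2c_vectors, b2b_vectors)]
    return sum(sims) / len(sims)

# Hypothetical embeddings of the same two schools from each agent.
b2c = [[1.0, 0.0], [1.0, 0.0]]
b2b = [[1.0, 0.0], [0.8, 0.6]]
score = alignment_score(b2c, b2b)
aligned = score >= ALIGNMENT_TARGET
```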

Closed-Loop Knowledge Cycle

Every interaction feeds back into the system:

User Interaction
    → Feature Store (behavioral signal)
    → Knowledge Gap Detection (new gap identified?)
    → HITL or LLM fills gap
    → Embedding pipeline (new knowledge chunked + embedded)
    → Vector DB updated
    → Next query benefits from new knowledge
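The cycle above can be sketched as one function that threads an interaction through each step. Every component is passed in as a callable or a plain container, and the stand-ins below (`detect_gap`, `fill_gap`, `embed`, the list-based feature store, and the dict-based vector DB) are hypothetical placeholders for the real services.

```python
def run_cycle(interaction, feature_store, detect_gap, fill_gap, embed, vector_db):
    """Process one interaction; return the number of new chunks stored."""
    feature_store.append(interaction)        # behavioral signal
    gap = detect_gap(interaction["query"])   # new gap identified?
    if gap is None:
        return 0
    added = 0
    for chunk in fill_gap(gap):              # HITL or LLM supplies new knowledge
        vector_db[chunk] = embed(chunk)      # embed the chunk and update the store
        added += 1
    return added

# Hypothetical stand-ins.
feature_store, vector_db = [], {}
known_topics = {"fees"}

def detect_gap(query):
    return None if query in known_topics else query

def fill_gap(gap):
    return [f"Expert note about {gap}"]

def embed(text):
    return [float(len(text))]  # placeholder for a real embedding model

added = run_cycle({"user": "u1", "query": "robotics club"},
                  feature_store, detect_gap, fill_gap, embed, vector_db)
skipped = run_cycle({"user": "u2", "query": "fees"},
                    feature_store, detect_gap, fill_gap, embed, vector_db)
```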

Compliance & Infrastructure

Concern	Detail
Data privacy	KVKK and GDPR compliant
Security	ISO/IEC 27001, EU AI Act
MLOps	MLflow + Weights & Biases for experiment tracking
CI/CD	GitHub Actions
Containers	Docker + Kubernetes
Cloud	AWS (EC2, S3, RDS)

Integration Strategy

All new AI components integrate via API adapters — the existing Laravel codebase is not modified directly. New services expose REST endpoints consumed by the platform.

API response target: < 2 seconds
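A thin adapter can keep the call site in the existing platform unchanged while measuring each request against the 2-second budget. The sketch below is illustrative only: the class name `AiServiceAdapter`, the `/v1/match` path, and the injected `transport` callable are assumptions, and a real adapter would wrap an HTTP client with its own timeout.

```python
import time

class AiServiceAdapter:
    """Wraps a transport callable and reports whether a call met the latency budget."""

    def __init__(self, transport, budget_s=2.0):
        self._transport = transport
        self._budget_s = budget_s

    def call(self, path, payload):
        start = time.perf_counter()
        body = self._transport(path, payload)
        elapsed = time.perf_counter() - start
        return {"body": body, "elapsed_s": elapsed,
                "within_budget": elapsed < self._budget_s}

# Hypothetical transport that echoes the request instead of doing network I/O.
def fake_transport(path, payload):
    return {"path": path, "echo": payload}

adapter = AiServiceAdapter(fake_transport)
result = adapter.call("/v1/match", {"query": "bilingual primary school"})
```

Injecting the transport keeps the adapter testable without a running service and lets the platform swap the underlying client later.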

Work Package → Component Mapping

WP	Components Built
WP1	Vector DB setup, embedding pipeline, RAG foundation, Knowledge Gap Detection
WP2	B2C agent, B2B agent, dual-agent shared latent space
WP3	HITL framework, closed-loop knowledge cycle
WP4	Feature store, behavioral analytics, predictive models
WP5	Full integration, load testing, optimization

Related

2026-04-21-tubitak-arge-7260485-ai-ekosistemi — Project decision, rationale, timeline, team
