NEXUS Framework

Quantitative Evaluation

This section compares core metrics across standalone LLMs, the Mirofish baseline, and NEXUS, with supporting visualizations.

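Table 1. Core Metrics – LLMs and Systems

| System | EIS | Report Quality | Hallucination Risk | Relevance | Novelty | Grounding |
|---|---|---|---|---|---|---|
| GPT-4o | - | 0.2929 | - | 0.2016 | 0.3222 | - |
| o1-preview | - | 0.2443 | - | 0.2568 | 0.9331 | - |
| qwen-max | - | 0.2538 | - | 0.1728 | 0.4096 | - |
| deepseek-chat | - | 0.2688 | - | 0.1323 | 0.8928 | - |
| Mirofish | 0.5602 | 0.4861 | 0.5042 | 0.517 | 0.8932 | 0.51 |
| NEXUS | 0.6631 (+18.36%) | 0.5216 (+7.29%) | 0.4852 (-3.78%) | 0.5418 (+4.79%) | 0.9774 (+9.4360%) | 0.5367 (+5.23%) |

“-” denotes a missing value. Percentages in the NEXUS row are changes relative to Mirofish: a positive value is a gain and a negative value is a reduction. Some digits are omitted.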
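The percentages in the NEXUS row can be reproduced as a relative change against the Mirofish baseline. The sketch below assumes the usual formula (NEXUS - Mirofish) / Mirofish. The function name `relative_change` is illustrative and not part of NEXUS. Because the table values are rounded, recomputed percentages may differ from the reported ones in the last digit.

```python
def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (value - baseline) / baseline * 100.0


# Mirofish and NEXUS values copied from Table 1.
mirofish = {
    "EIS": 0.5602,
    "Report Quality": 0.4861,
    "Hallucination Risk": 0.5042,
    "Relevance": 0.517,
    "Novelty": 0.8932,
    "Grounding": 0.51,
}
nexus = {
    "EIS": 0.6631,
    "Report Quality": 0.5216,
    "Hallucination Risk": 0.4852,
    "Relevance": 0.5418,
    "Novelty": 0.9774,
    "Grounding": 0.5367,
}

# Relative change of NEXUS over Mirofish for every metric, in percent.
changes = {name: relative_change(mirofish[name], nexus[name]) for name in mirofish}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

For Hallucination Risk the change is negative, which reads as an improvement because lower risk is better.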
Table 2. Retrieval & Multi‑Agent Metrics

| Metric | Mirofish | NEXUS | Absolute change | Percentage change |
|---|---|---|---|---|
| retrieval_quality | 0.862 | 0.8848 | +0.0228 | +2.65% |
| multi_agent_quality | 0.6035 | 0.6302 | +0.0267 | +2.67% |
| evidence_density | 0.7188 | 0.7321 | +0.0133 | +1.85% |
| agent_agreement | 0.6693 | 0.6922 | +0.0228 | +3.41% |
| evidence_per_claim | 0.3333 | 0.3448 | +0.0114 | +3.43% |
| agent_disagreement_risk | 0.8056 | 0.7675 | -0.0381 | -4.73% |
| retrieval_risk | 0.1398 | 0.1094 | -0.0305 | -21.80% |
| agent_query_relevance | 0.5 | 0.5533 | +0.0533 | +10.66% |
| multi_agent_confidence | 0.7739 | 0.8005 | +0.0267 | +2.67% |

The change columns give the absolute change and the percentage change of NEXUS relative to Mirofish: a positive value is a gain and a negative value is a reduction. Some digits are omitted.
Table 3. Knowledge Graph & Insight Report Metrics

| Metric | Mirofish | NEXUS | Absolute change | Percentage change |
|---|---|---|---|---|
| kg_quality | 0.352 | 0.682 | +0.33 | +93.74% |
| insight_quality | 0.4861 | 0.5216 | +0.0354 | +7.30% |
| kg_risk | 0.3783 | 0.3364 | -0.0419 | -11.07% |
| insight_hallucination_risk | 0.5042 | 0.4852 | -0.0190 | -3.78% |
| relation_consistency | 0.7217 | 0.7901 | +0.0683 | +9.47% |
| relevance | 0.517 | 0.5418 | +0.0248 | +4.80% |
| claim_structurality | 0.1565 | 0.1946 | +0.0381 | +24.34% |
| grounding | 0.51 | 0.5367 | +0.0267 | +5.24% |
| graph_density_proxy | 0.5031 | 0.6376 | +0.1345 | +26.73% |
| report_coherence | 0.0649 | 0.8648 | +0.7999 | +1233.42% |
| graph_reasoning_signal | 0.3091 | 0.3396 | +0.0305 | +9.86% |
| report_length_score | 0.1667 | 0.407 | +0.2404 | +144.23% |
| path_reasoning | 0.2789 | 0.3056 | +0.0267 | +9.56% |
| report_quality | 0.1919 | 0.4001 | +0.2082 | +108.48% |
| additional_supplement_signal | - | 0.3503 | +0.3503 | - |
| report_query_alignment | 0.0763 | 0.0771 | +0.0008 | +1.06% |
| confidence_signal | 0.0343 | 0.0343 | +0.0343 | - |
| report_structure_quality | 0.5034 | 0.8556 | +0.3522 | +69.97% |
| confidence_signal (dup) | - | - | | +69.97% |

“-” denotes a zero or missing value. The duplicate confidence_signal row is preserved as in the original. Some digits are omitted.
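Rows such as additional_supplement_signal show why some percentage cells are “-”: with a zero or missing baseline, a relative change is undefined even though an absolute change can still be reported. The sketch below illustrates one way to produce such cells. The function name `describe_change` and the choice to treat a missing baseline as 0 are assumptions made for illustration, not the authors' documented procedure.

```python
from typing import Optional, Tuple


def describe_change(baseline: Optional[float], value: Optional[float]) -> Tuple[str, str]:
    """Return (absolute change, percentage change) as table cell strings.

    "-" is returned wherever the corresponding change is undefined.
    """
    if value is None:
        # No system value at all: nothing can be computed.
        return "-", "-"
    # Assumption: a missing baseline is treated as 0 for the absolute change.
    base = baseline if baseline is not None else 0.0
    absolute = f"{value - base:+.4f}"
    if base == 0:
        # Division by a zero baseline is undefined, so the percentage is "-".
        return absolute, "-"
    return absolute, f"{(value - base) / base * 100:+.2f}%"


# additional_supplement_signal: missing baseline, NEXUS = 0.3503
print(describe_change(None, 0.3503))
# grounding: Mirofish = 0.51, NEXUS = 0.5367
print(describe_change(0.51, 0.5367))
```

Keeping the absolute change when the percentage is undefined preserves the information that the signal appears only in NEXUS.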
📊 Figure 1. Metric distribution plot (distribution_figure)
🛰️ Figure 2. Grouped radar chart (radar_grouped_academic)
📈 Figure 3. Stability analysis plot (stable_figure)

NEXUS Architecture