Independent Researchers / ACL 2026 · 2026-04-19 · notable
LLaTiSA — Difficulty-Stratified Time Series Reasoning for VLMs
ACL 2026 Findings paper introducing a four-level taxonomy for time series reasoning and the 83k-sample HiTSR benchmark. LLaTiSA beats GPT-4o on L1 localization (86.8% vs 54.2%) and matches GEM's ECG accuracy with 2.5% of its training data. #1 trending paper on HuggingFace with 79 upvotes.
A difficulty-stratified benchmark and a curriculum-trained VLM for time series reasoning: LLaTiSA beats GPT-4o on L1 localization and matches GEM's ECG accuracy with 2.5% of its training data.
What is it?
LLaTiSA is an ACL 2026 Findings paper that defines a four-level cognitive complexity taxonomy for time series reasoning: L1 perceptual localization, L2 pattern recognition, L3 semantic reasoning, and L4 predictive inference. The authors release HiTSR, an 83k-sample benchmark with Chain-of-Thought annotations covering all four levels, plus a VLM fine-tuned with multi-stage curriculum training.
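To make "difficulty-stratified" concrete, here is a minimal sketch of scoring a model separately at each of the four levels. The level names follow the taxonomy above. The `(level, is_correct)` record shape and the toy results are assumptions made for illustration, not the HiTSR data format or the paper's evaluation code.

```python
from collections import defaultdict
from enum import IntEnum


class Level(IntEnum):
    """The four difficulty levels named in the taxonomy."""
    L1_PERCEPTUAL_LOCALIZATION = 1
    L2_PATTERN_RECOGNITION = 2
    L3_SEMANTIC_REASONING = 3
    L4_PREDICTIVE_INFERENCE = 4


def accuracy_by_level(results):
    """Return {level: fraction correct} for an iterable of (Level, bool) pairs.

    The pair format is a hypothetical record shape chosen for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, is_correct in results:
        total[level] += 1
        correct[level] += int(is_correct)
    # Report only the levels that actually appear, easiest first.
    return {level: correct[level] / total[level] for level in sorted(total)}


# Toy results, invented for illustration only.
toy_results = [
    (Level.L1_PERCEPTUAL_LOCALIZATION, True),
    (Level.L1_PERCEPTUAL_LOCALIZATION, True),
    (Level.L3_SEMANTIC_REASONING, True),
    (Level.L3_SEMANTIC_REASONING, False),
]
scores = accuracy_by_level(toy_results)
```

Reporting one score per level, rather than a single aggregate, is what lets a benchmark of this kind show where a model's temporal reasoning breaks down.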
How does it work?
The model combines rendered time series visualizations with precision-calibrated numerical tables as dual inputs to a Vision-Language Model backbone (Qwen3-VL-8B). Multi-stage curriculum training — first on L1 tasks, then progressively harder levels — builds foundational temporal perception before tackling semantic reasoning. For ECG interpretation, the model was trained on 30k samples (2.5% of GEM's 1.186M) and achieved 84.0% lead coverage vs GEM's 71.1%.
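The two ideas in this paragraph, dual inputs and easy-to-hard staging, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the table layout, the dict keys, and the choice of non-cumulative stages are all hypothetical, and the chart image is assumed to be rendered elsewhere.

```python
def format_table(values, decimals=2):
    """Render a series as a plain-text index/value table with fixed precision."""
    rows = ["index | value"]
    rows += [f"{i} | {v:.{decimals}f}" for i, v in enumerate(values)]
    return "\n".join(rows)


def build_dual_input(values, image_path, question, decimals=2):
    """Pair a chart image (by path) with the numeric table and the question.

    The dict keys are placeholders, not a specific VLM's input schema.
    """
    return {
        "image": image_path,
        "text": f"{format_table(values, decimals)}\n\nQuestion: {question}",
    }


def curriculum_stages(samples):
    """Group samples by their 'level' field and order the groups easiest first.

    Whether later stages also replay earlier levels is not stated in the
    summary; this sketch assumes they do not.
    """
    by_level = {}
    for sample in samples:
        by_level.setdefault(sample["level"], []).append(sample)
    return [by_level[level] for level in sorted(by_level)]


# Toy data, invented for illustration only.
table = format_table([0.5, 1.23456, -3.0])
example = build_dual_input([0.5, 1.23456], "series.png", "Where is the peak?")
stages = curriculum_stages([
    {"level": 3, "id": "c"},
    {"level": 1, "id": "a"},
    {"level": 2, "id": "b"},
    {"level": 1, "id": "d"},
])
```

Fixing the number of decimals keeps the table compact and consistent across samples, which is one plausible reading of "precision-calibrated"; the paper may define the term differently.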
Why does it matter?
Time series appear throughout real-world AI — clinical monitoring, industrial sensors, finance, environmental data — but VLMs are systematically weak at temporal perception. HiTSR gives the field a reproducible, difficulty-stratified benchmark to track progress, and the curriculum-trained model shows targeted fine-tuning can beat much larger proprietary baselines on both OOD generalization and data efficiency.
Who is it for?
ML researchers building VLMs for scientific or industrial time-series domains
Try it
Code and dataset: https://github.com/RainingNovember/LLaTiSA
Key numbers
- HiTSR samples: 83k
- L1 localization accuracy: 86.8% vs GPT-4o 54.2%
- L3 semantic reasoning: 67.0% vs ChatTS 59.0%
- ECG training data: 2.5% of GEM baseline
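The derived figures can be re-checked from the raw numbers quoted in this summary. The short calculation below uses only those numbers (30k and 1.186M ECG training samples, and the reported accuracies); the variable names are illustrative.

```python
# ECG training set sizes quoted in the summary.
llatisa_ecg_samples = 30_000
gem_ecg_samples = 1_186_000

# Share of GEM's training data used by LLaTiSA, in percent.
fraction_pct = 100 * llatisa_ecg_samples / gem_ecg_samples

# Accuracy gaps in percentage points, rounded to avoid float noise.
l1_gap_points = round(86.8 - 54.2, 1)  # LLaTiSA vs GPT-4o on L1
l3_gap_points = round(67.0 - 59.0, 1)  # LLaTiSA vs ChatTS on L3

print(f"ECG data fraction: {fraction_pct:.1f}%")
print(f"L1 gap: {l1_gap_points} points, L3 gap: {l3_gap_points} points")
```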