Independent Researchers / ACL 2026 · 2026-04-19 · notable

LLaTiSA — Difficulty-Stratified Time Series Reasoning for VLMs

ACL 2026 Findings paper introducing a four-level taxonomy for time series reasoning and the 83k-sample HiTSR benchmark. LLaTiSA beats GPT-4o on L1 localization (86.8% vs 54.2%) and matches GEM's ECG accuracy with 2.5% of its training data. #1 trending paper on HuggingFace with 79 upvotes.

LLaTiSA stratifies time series reasoning for VLMs into four difficulty levels, beats GPT-4o on L1 perceptual localization, and needs only 2.5% of GEM's training data for ECG interpretation.

What is it?

LLaTiSA is an ACL 2026 Findings paper that defines a four-level cognitive complexity taxonomy for time series reasoning: L1 perceptual localization, L2 pattern recognition, L3 semantic reasoning, and L4 predictive inference. The authors release HiTSR, an 83k-sample benchmark with Chain-of-Thought annotations covering all four levels, plus a VLM fine-tuned with multi-stage curriculum training.

How does it work?

The model feeds two inputs to a Vision-Language Model backbone (Qwen3-VL-8B): a rendered time series visualization and a precision-calibrated numerical table. Multi-stage curriculum training starts with L1 tasks and moves to progressively harder levels, so the model builds foundational temporal perception before tackling semantic reasoning. For ECG interpretation, the model was trained on 30k samples (2.5% of GEM's 1.186M) and achieved 84.0% lead coverage vs GEM's 71.1%.
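
The numerical-table input and the easy-to-hard training order can be illustrated with a minimal Python sketch. It is not the authors' code: the table layout, the `decimals` parameter, the `level` field on each sample, and the function names `series_to_table` and `curriculum_stages` are all assumptions made for illustration, and the real pipeline may mix levels within a stage.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four difficulty levels named in the taxonomy."""
    L1_PERCEPTUAL_LOCALIZATION = 1
    L2_PATTERN_RECOGNITION = 2
    L3_SEMANTIC_REASONING = 3
    L4_PREDICTIVE_INFERENCE = 4


def series_to_table(values, decimals=2):
    """Format a series as a fixed-precision text table (hypothetical layout)."""
    rows = ["index\tvalue"]
    for i, v in enumerate(values):
        rows.append(f"{i}\t{v:.{decimals}f}")
    return "\n".join(rows)


def curriculum_stages(samples):
    """Group samples into training stages ordered from L1 to L4.

    Each sample is a dict with an integer 'level' key (an assumed schema).
    Empty levels are skipped.
    """
    stages = {level: [] for level in Level}
    for sample in samples:
        stages[Level(sample["level"])].append(sample)
    return [stages[level] for level in Level if stages[level]]


if __name__ == "__main__":
    print(series_to_table([1.0, 3.14159]))
    demo = [{"level": 3, "id": "a"}, {"level": 1, "id": "b"}]
    for stage in curriculum_stages(demo):
        print([s["id"] for s in stage])
```

In a full pipeline the text table would be paired with a rendered plot of the same series, and each stage would be passed to the trainer in order.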

Why does it matter?

Time series appear throughout real-world AI (clinical monitoring, industrial sensors, finance, environmental data), but VLMs are systematically weak at temporal perception. HiTSR gives the field a reproducible, difficulty-stratified benchmark to track progress, and the curriculum-trained model shows that targeted fine-tuning can outperform much larger proprietary baselines such as GPT-4o, with gains in both OOD generalization and data efficiency.

Who is it for?

ML researchers building VLMs for scientific or industrial time-series domains

Try it

Code and dataset: https://github.com/RainingNovember/LLaTiSA

Key numbers

HiTSR samples: 83k
L1 localization accuracy: 86.8% vs GPT-4o 54.2%
L3 semantic reasoning: 67.0% vs ChatTS 59.0%
ECG training data: 2.5% of GEM baseline
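
The data-efficiency figure and the accuracy margins follow directly from the numbers quoted above. A short Python check, using only those reported values (the variable names are illustrative):

```python
# Sample counts and accuracies as quoted in this summary.
llatisa_ecg_samples = 30_000
gem_ecg_samples = 1_186_000

# Fraction of GEM's training data used for ECG interpretation.
data_fraction = llatisa_ecg_samples / gem_ecg_samples

# Accuracy margins in percentage points.
l1_margin = round(86.8 - 54.2, 1)  # LLaTiSA vs GPT-4o on L1 localization
l3_margin = round(67.0 - 59.0, 1)  # LLaTiSA vs ChatTS on L3 semantic reasoning

print(f"ECG data fraction: {data_fraction:.1%}")  # 2.5%
print(f"L1 margin: {l1_margin} points")           # 32.6
print(f"L3 margin: {l3_margin} points")           # 8.0
```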