Completeness Analysis

The completeness report answers one question: does your time series contain all the data it should? For sensor data, completeness isn't just about null values — it's about temporal continuity. A perfectly formatted file can still be massively incomplete if large stretches of time are simply absent from the record.

SENTINEL approaches completeness through the lens of time intervals (the gaps between consecutive readings) rather than counting null cells. This reflects how industrial sensor data actually fails: not with NaN placeholders, but with silence.


What the Report Shows

When you open the Completeness modal, you see:

  1. Overall coverage percentage — the headline number: what fraction of expected readings are present
  2. Coverage over time — a bar chart showing data density across the recording period
  3. What's happening — a plain-English description of the gap pattern
  4. Key numbers — sample rate, sampling consistency, total gaps, and longest gap
  5. Gap thresholds — an interactive slider to classify gaps by severity
  6. Technical details — the full statistical breakdown, interval distribution chart, gap list, and sparse periods (collapsed by default)
  7. What to do next — prioritised recommendations

How Overall Completeness is Calculated

For regularly sampled data

When SENTINEL detects a consistent sampling rate (see Expected Interval Detection below), completeness is computed as:

\[ \text{completeness} = \min\!\left(100,\; \frac{\text{actual points}}{\text{expected points}} \times 100\right) \]

where:

\[ \text{expected points} = \left[ \frac{\text{total duration}}{\text{expected interval}} \right] + 1 \qquad \text{total duration} = t_{\text{last}} - t_{\text{first}} \;\text{(seconds)} \]

Example: A sensor recording at 1 Hz for exactly one hour should produce 3,601 readings (0 s through 3,600 s inclusive). If the file contains 3,312 readings, completeness = 3312 / 3601 = 92.0%.
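
The calculation can be sketched in a few lines of Python. This is an illustration only, not SENTINEL's code: the function name `regular_completeness` and its parameters are hypothetical, and the square bracket in the formula is read here as rounding down to a whole number of intervals.

```python
import math

def regular_completeness(n_actual, t_first, t_last, expected_interval):
    """Completeness (%) for regularly sampled data, capped at 100."""
    total_duration = t_last - t_first  # seconds
    # Assumption: the bracket in the formula means "round down".
    expected_points = math.floor(total_duration / expected_interval) + 1
    return min(100.0, n_actual / expected_points * 100.0)

# The 1 Hz, one-hour example from the text: 3,312 of 3,601 readings.
print(round(regular_completeness(3312, 0.0, 3600.0, 1.0), 1))  # 92.0
```

The `min(100, ...)` cap matters when a file holds more readings than expected, for example because of duplicated timestamps.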

For irregularly sampled data

When no consistent sampling rate is detected, SENTINEL falls back to a gap-time estimate:

\[ \text{completeness} = \max\!\left(0,\; \frac{\text{total duration} - \text{gap time}}{\text{total duration}} \times 100\right) \]

where gap time is the sum of durations of all detected gaps (see Gap Detection). This is an approximation — it assumes that every gap represents missing data rather than an intentional pause — but it provides a useful lower-bound estimate.
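
As a minimal sketch of this fallback (the function name `gap_time_completeness` is hypothetical, and the gap durations are assumed to come from the gap detection step):

```python
def gap_time_completeness(total_duration, gap_durations):
    """Completeness (%) estimated from the time not covered by gaps."""
    gap_time = sum(gap_durations)  # seconds spent inside detected gaps
    return max(0.0, (total_duration - gap_time) / total_duration * 100.0)

# One hour of data with two gaps of 2 and 4 minutes.
print(round(gap_time_completeness(3600.0, [120.0, 240.0]), 1))  # 90.0
```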


Delta-T Analysis

The foundation of the completeness report is delta-T (Δt) analysis: the study of the time intervals between consecutive data points.

Given a time series with timestamps \(t_0, t_1, t_2, \ldots, t_n\), the Δt sequence is:

\[ \Delta t_k = t_k - t_{k-1} \quad \text{for } k = 1, 2, \ldots, n \]

Each \(\Delta t\) is expressed in seconds. Readings where the value is absent are excluded before computing intervals — this handles multi-channel files where one channel may be missing readings while others continue.
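
A minimal sketch of the interval extraction for one channel, assuming timestamps are already sorted and expressed in seconds, and that an absent value is represented as `None` (both are assumptions for illustration; the name `delta_t` is hypothetical):

```python
def delta_t(timestamps, values):
    """Intervals (s) between consecutive readings whose value is present."""
    # Drop readings with an absent value first, so that this channel's own
    # dropouts show up as longer intervals.
    kept = [t for t, v in zip(timestamps, values) if v is not None]
    return [b - a for a, b in zip(kept, kept[1:])]

ts = [0.0, 1.0, 2.0, 3.0, 4.0]
vals = [20.1, 20.3, None, 20.2, 20.4]
print(delta_t(ts, vals))  # [1.0, 2.0, 1.0]
```

The reading at 2 s exists in the file but carries no value for this channel, so the channel sees a single 2 s interval instead of two 1 s intervals.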

Analysing the distribution of these intervals reveals far more than any single summary statistic:

  • A perfectly regular sensor produces a spike at a single Δt value
  • Occasional dropouts produce a main spike plus a small tail at multiples of the expected interval
  • Sensors with variable duty cycles produce multiple peaks
  • Data with systematic outages produces a heavy right tail or a bimodal distribution

Statistical Metrics

SENTINEL computes the following statistics over the Δt sequence:

Central tendency

| Metric | Formula | What it tells you |
| --- | --- | --- |
| Mean | \(\bar{\Delta t} = \frac{1}{n}\sum_{k=1}^{n} \Delta t_k\) | Average interval; sensitive to large gaps |
| Median | Middle value of the sorted sequence | Typical interval; robust to outliers |

For gap-affected data, the median is a better estimate of the "true" sampling interval than the mean: a single large gap inflates the mean but barely moves the median.

Spread

| Metric | Formula | What it tells you |
| --- | --- | --- |
| Standard deviation | \(\sigma = \sqrt{\frac{1}{n}\sum_{k=1}^{n}(\Delta t_k - \bar{\Delta t})^2}\) | Overall spread |
| Q1 / Q3 | 25th / 75th percentile | Central range of intervals |
| IQR | \(Q_3 - Q_1\) | Interquartile range; robust spread measure |

Coefficient of Variation (CV)

\[ \text{CV} = \frac{\sigma}{\bar{\Delta t}} \]

CV measures relative variability — how large the spread is compared to the mean. It is the primary indicator of sampling regularity:

| CV range | Sampling pattern | Meaning |
| --- | --- | --- |
| < 0.05 | Very consistent | Near-perfectly uniform sampling |
| 0.05 – 0.15 | Consistent | Small timing jitter, typical of digital sensors |
| 0.15 – 0.30 | Moderate | Noticeable variability; may affect some algorithms |
| ≥ 0.30 | Variable | Highly irregular; resampling likely required |

A regular clock-based sensor will have CV ≈ 0. A sensor with occasional late transmissions might have CV ≈ 0.05–0.10. A sensor whose polling interval drifts significantly might reach CV ≈ 0.3 or above.
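
The metrics above can be reproduced with Python's standard `statistics` module. The sketch below is illustrative: the helper names `interval_stats` and `cv_label` are hypothetical, and the quartile interpolation method SENTINEL uses is not stated, so the "inclusive" method is an assumption.

```python
import statistics

def interval_stats(dts):
    """Summary statistics over a list of intervals in seconds."""
    mean = statistics.fmean(dts)
    sigma = statistics.pstdev(dts)  # population form (divide by n), as in the formula
    q1, median, q3 = statistics.quantiles(dts, n=4, method="inclusive")
    return {"mean": mean, "median": median, "std": sigma,
            "q1": q1, "q3": q3, "iqr": q3 - q1, "cv": sigma / mean}

def cv_label(cv):
    """Map a CV value to the sampling-pattern label from the table."""
    if cv < 0.05:
        return "Very consistent"
    if cv < 0.15:
        return "Consistent"
    if cv < 0.30:
        return "Moderate"
    return "Variable"

# 99 one-second intervals plus a single 30 s gap.
stats = interval_stats([1.0] * 99 + [30.0])
print(stats["median"], round(stats["mean"], 2), cv_label(stats["cv"]))
# 1.0 1.29 Variable
```

The single 30 s gap pulls the mean up to 1.29 s while the median stays at 1 s, which is why the median is the preferred estimate of the typical interval.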

Distribution shape

| Metric | What it indicates |
| --- | --- |
| Skewness > 1.0 | Long-tailed distribution: most intervals are short but some are very long (typical of occasional gaps) |
| Multimodal | Multiple distinct sampling rates present in the same file (e.g., two sensors merged, or a sensor that switches between fast and slow modes) |
| Number of modes | Count of distinct peaks in the interval distribution |

Expected Interval Detection

SENTINEL attempts to detect whether the data has a consistent underlying sampling rate, even if individual intervals vary slightly due to clock jitter or transmission delays.

The method uses Kernel Density Estimation (KDE) on the Δt sequence to smooth the interval distribution and find its dominant peak. If a single clear peak is found, that peak's location is taken as the expected_interval. The detection is confirmed only when the CV is below the regularity threshold (< 0.05), ensuring we don't report a spurious "expected interval" for genuinely irregular data.

The expected_interval is used to:

  • Compute the overall completeness percentage
  • Set the window size for sparse period detection
  • Drive the coverage timeline chart

If no expected interval is detected, the completeness report still runs — it uses the median interval as a proxy where needed.
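
To make the idea concrete, here is a dependency-free sketch of peak finding with a Gaussian KDE. It is not SENTINEL's implementation: the estimator, the bandwidth rule (Silverman's rule of thumb is assumed here), the grid resolution, and the names `kde_peak` and `expected_interval` are all illustrative choices, and the sketch does not test whether the peak is "single and clear".

```python
import math
import statistics

def kde_peak(dts, grid_points=200):
    """Location of the highest peak of a Gaussian KDE over the intervals."""
    lo, hi = min(dts), max(dts)
    if lo == hi:                       # perfectly regular data: nothing to smooth
        return lo
    # Assumption: Silverman's rule-of-thumb bandwidth.
    h = 1.06 * statistics.pstdev(dts) * len(dts) ** (-0.2)
    grid = [lo + i * (hi - lo) / (grid_points - 1) for i in range(grid_points)]

    def density(x):
        # Unnormalised density; the constant factor does not move the peak.
        return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in dts)

    return max(grid, key=density)

def expected_interval(dts, cv_threshold=0.05):
    """KDE peak if the sampling is regular enough, otherwise None."""
    cv = statistics.pstdev(dts) / statistics.fmean(dts)
    return kde_peak(dts) if cv < cv_threshold else None

jittered = [1.0, 1.01, 0.99, 1.0, 1.02, 0.98, 1.0, 1.0]
print(round(expected_interval(jittered), 2))        # 1.0
print(expected_interval([1.0, 1.0, 5.0, 1.0, 9.0]))  # None
```

When the function returns `None`, a caller following the text above would fall back to the median interval.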


Gap Detection

A gap is any Δt interval that significantly exceeds the typical sampling rate. SENTINEL uses Tukey's fence method (IQR-based) by default to set the gap threshold:

\[ \text{gap threshold} = Q_3 + 1.5 \times \text{IQR} \]

Any interval exceeding this threshold is classified as a gap. The threshold adapts automatically to the data — it is not a fixed constant.

Why IQR-based? The Tukey fence is robust to outliers: it uses percentile-based statistics (\(Q_1\), \(Q_3\)) rather than the mean or standard deviation, which can themselves be distorted by the very gaps we are trying to detect.

Two alternative detection methods are available:

| Method | Threshold formula | When to use |
| --- | --- | --- |
| IQR (default) | \(Q_3 + 1.5 \times \text{IQR}\) | Most datasets; robust to skewed distributions |
| Standard deviation | \(\bar{\Delta t} + 3\sigma\) | Symmetric, approximately normal Δt distributions |
| Percentile | 95th percentile | When you want a fixed fraction of intervals flagged |
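
The three thresholds can be sketched as follows. The function names, the `method` strings, and the quartile interpolation ("inclusive") are assumptions made for illustration, not SENTINEL's actual API.

```python
import statistics

def gap_threshold(dts, method="iqr", k=1.5):
    """Gap threshold in seconds for a list of intervals."""
    if method == "iqr":
        q1, _, q3 = statistics.quantiles(dts, n=4, method="inclusive")
        return q3 + k * (q3 - q1)                      # Tukey's upper fence
    if method == "std":
        return statistics.fmean(dts) + 3 * statistics.pstdev(dts)
    if method == "percentile":
        # 99 cut points for n=100; index 94 is the 95th percentile.
        return statistics.quantiles(dts, n=100, method="inclusive")[94]
    raise ValueError(f"unknown method: {method}")

def find_gaps(timestamps, threshold):
    """(start, end, duration) for every interval above the threshold."""
    return [(a, b, b - a)
            for a, b in zip(timestamps, timestamps[1:])
            if b - a > threshold]

ts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 16.0, 17.0, 18.0, 19.0, 20.0]
dts = [b - a for a, b in zip(ts, ts[1:])]
threshold = gap_threshold(dts)
print(threshold)                 # 1.0
print(find_gaps(ts, threshold))  # [(6.0, 16.0, 10.0)]
```

In this jitter-free example the IQR is zero, so the fence collapses to Q3 itself and only the 10 s interval exceeds it.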

Gap properties

Each detected gap records:

  • Start timestamp: the last data point before the gap
  • End timestamp: the first data point after the gap
  • Duration: end_timestamp − start_timestamp, in seconds
  • Classification: isolated or clustered (see below)

Gap classification: isolated vs. clustered

Once gaps are detected, SENTINEL classifies each one based on whether other gaps occur nearby:

  • Isolated — the gap stands alone; there are no other gaps within a time window of 2 × median_interval before or after it
  • Clustered — the gap is part of a group of gaps concentrated in a short period

Clustering indicates that something systematic was happening during that stretch — a sensor fault, a network outage, a scheduled maintenance window — rather than a random individual dropout.
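
One way to express this rule in code is shown below. It is a sketch under an assumption: "nearby" is interpreted as another gap overlapping the span from `start − window` to `end + window`. The function name `classify_gaps` is hypothetical.

```python
def classify_gaps(gaps, median_interval, multiplier=2.0):
    """Label each (start, end) gap as 'isolated' or 'clustered'."""
    window = multiplier * median_interval
    labels = []
    for i, (start, end) in enumerate(gaps):
        # Another gap counts as nearby if it reaches into the window
        # before this gap's start or after this gap's end.
        has_neighbour = any(
            j != i and other_end >= start - window and other_start <= end + window
            for j, (other_start, other_end) in enumerate(gaps)
        )
        labels.append("clustered" if has_neighbour else "isolated")
    return labels

gaps = [(100.0, 110.0), (111.0, 118.0), (500.0, 520.0)]
print(classify_gaps(gaps, median_interval=1.0))
# ['clustered', 'clustered', 'isolated']
```

The first two gaps are separated by a single reading, so each falls inside the other's 2 s window; the third has no neighbour.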


Missing Pattern Classification

SENTINEL classifies the overall character of the data loss into one of five categories:

No gaps detected

No gaps were detected above the threshold. The data is temporally complete.

Random dropouts

Gaps are isolated and scattered without a detectable timing pattern. There is no evidence that the probability of missing data depends on when it occurs or what the values are. (Statisticians refer to this as Missing Completely At Random, or MCAR.)

What this means in practice: Individual transmission failures, occasional sensor resets, or infrequent logging errors. The underlying process continued normally; only the recording was intermittently interrupted.

Recommended next step: Linear interpolation is appropriate for short isolated gaps. The missing data can be treated as a straightforward estimation problem.

Detection criterion: More than 50% of detected gaps are classified as isolated.

Irregular coverage

Gaps occur more frequently during certain periods, but the missingness is not directly linked to the recorded values. The cause is external — machine state, environmental conditions, operator activity — but is not directly observable in the data itself. (Also known as Missing At Random, or MAR.)

What this means in practice: A sensor that drops packets during periods of high radio interference; a logger that slows down when connected to a slow network; gaps that correlate with shift changes.

Recommended next step: Treat isolated and clustered gaps differently. Understand the external conditions before attempting gap-filling.

Detection criterion: Neither the random dropout nor systematic outage criteria are met.

Systematic outages

The majority of gaps are clustered in time, indicating that missingness is caused by a recurring or persistent failure mode. The gaps are not random — they happen for a reason that is at least partly visible in the data. (Also known as Missing Not At Random, or MNAR.)

What this means in practice: Sensor failures triggered by extreme readings (e.g., a temperature sensor that stops recording above 150°C); scheduled maintenance windows; systematic communication failures in a specific time-of-day pattern.

Recommended next step: Do not fill across clustered gaps without first understanding the cause. Each cluster should be investigated separately. Simple interpolation will produce physically implausible values across a long outage.

Detection criterion: More than 50% of detected gaps are classified as clustered.

Extended outage

A special case where at least one gap exceeds 50 times the median sampling interval. For a 1 Hz sensor, this means a gap of more than 50 seconds. For a 1-minute sensor, more than 50 minutes.

Extended outage gaps are qualitatively different from ordinary missing data — they represent periods where the sensor was almost certainly off, disconnected, or physically inaccessible. Standard interpolation methods produce meaningless results across an extended outage.

Detection criterion: Any single gap duration \(> 50 \times\) median \(\Delta t\).
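
The detection criteria for the five categories can be combined into one decision function. The order of the checks is an assumption (the text does not say which category wins when several criteria hold; the sketch gives extended outage precedence), and the name `missing_pattern` is hypothetical.

```python
def missing_pattern(gap_labels, gap_durations, median_interval,
                    outage_multiplier=50):
    """Overall missing-data category from per-gap labels and durations."""
    if not gap_durations:
        return "No gaps detected"
    # Assumption: an extended outage overrides the other categories.
    if max(gap_durations) > outage_multiplier * median_interval:
        return "Extended outage"
    isolated_share = gap_labels.count("isolated") / len(gap_labels)
    clustered_share = gap_labels.count("clustered") / len(gap_labels)
    if isolated_share > 0.5:
        return "Random dropouts"
    if clustered_share > 0.5:
        return "Systematic outages"
    return "Irregular coverage"   # neither majority criterion is met

print(missing_pattern(["isolated", "isolated", "clustered"], [5.0, 4.0, 6.0], 1.0))
# Random dropouts
print(missing_pattern(["isolated", "clustered"], [5.0, 6.0], 1.0))
# Irregular coverage
print(missing_pattern(["isolated"], [60.0], 1.0))
# Extended outage
```

With these criteria, "Irregular coverage" arises only when isolated and clustered gaps are evenly split.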


Sparse Period Detection

Gaps capture individual missing intervals. But a dataset can have no single gap above the threshold while still losing significant amounts of data — if many small gaps are spread across a specific time window, each one just below the detection threshold.

Sparse period detection addresses this with a sliding window scan:

  1. A window of size W seconds slides across the time range in steps of W/2 (50% overlap)
  2. Within each window, SENTINEL counts the actual number of data points and compares to the expected count (W / expected_interval)
  3. Windows where actual < 85% of expected are flagged as sparse
  4. Overlapping flagged windows are merged into contiguous sparse periods

The window size W defaults to max(300, expected_interval × 60) — at least 5 minutes, or 60 sample-periods, whichever is larger. This ensures the window is wide enough to contain a statistically meaningful number of expected points.
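
The scan can be sketched as below. Details the text leaves open are filled in with assumptions: windows are half-open, only full windows are scored, and timestamps are sorted and expressed in seconds. The function name `sparse_periods` is hypothetical.

```python
from bisect import bisect_left

def sparse_periods(timestamps, expected_interval, threshold=0.85):
    """Merged (start, end) spans where windows hold too few readings."""
    w = max(300.0, expected_interval * 60)   # window size W in seconds
    expected = w / expected_interval         # expected readings per window
    periods = []
    start = float(timestamps[0])
    # Assumption: a trailing partial window is skipped.
    while start + w <= timestamps[-1]:
        # Count readings in [start, start + w) on the sorted timestamps.
        actual = bisect_left(timestamps, start + w) - bisect_left(timestamps, start)
        if actual < threshold * expected:
            if periods and start <= periods[-1][1]:
                periods[-1] = (periods[-1][0], start + w)  # merge with previous
            else:
                periods.append((start, start + w))
        start += w / 2                        # 50% overlap
    return periods

# 1 Hz data for 20 minutes; every second reading is lost between 400 s and 700 s.
ts = [float(t) for t in range(1201) if not (400 <= t < 700 and t % 2)]
print(sparse_periods(ts, expected_interval=1.0))  # [(300.0, 900.0)]
```

No single interval in the degraded stretch exceeds 2 s, yet three overlapping windows fall below 85% and merge into one sparse period. The reported span is wider than the affected stretch because it is built from whole windows.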

What sparse periods reveal

A sparse period is shown with a completeness percentage indicating how much data was present in that window. For example:

  • 62% complete in 09:00–10:00 means only 62% of the expected hourly readings arrived in that window
  • 34% complete in 14:30–15:00 indicates a severe localised outage

Sparse periods appear in the completeness panel as a warning badge and are listed in the Technical Details section of the report. They are also visible as low bars in the coverage timeline chart.


Coverage Over Time Chart

The coverage chart provides an immediate visual answer to: when was the data missing?

The time range is divided into up to 60 equal-width buckets (fewer for short datasets, ensuring each bucket contains enough points to be meaningful). For each bucket:

\[ \text{bucket coverage} = \min\!\left(100,\; \frac{\text{actual points in bucket}}{\text{expected points in bucket}} \times 100\right) \]

Bars are colour-coded:

| Colour | Coverage | Interpretation |
| --- | --- | --- |
| Green | ≥ 85% | Good coverage: close to or at full density |
| Orange | 50 – 84% | Degraded coverage: significant data loss |
| Red | < 50% | Severe data loss: more than half the expected data is absent |

The dashed green reference line marks the 85% target — the same threshold used by sparse period detection.
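
A rough sketch of the bucket calculation and colour coding follows. It always uses the requested number of buckets and does not model the reduction for short datasets, which the text mentions but does not specify; the names `coverage_buckets` and `bar_colour` are hypothetical.

```python
def coverage_buckets(timestamps, expected_interval, n_buckets=60):
    """Coverage (%) per equal-width bucket across the recording period."""
    t0, t1 = timestamps[0], timestamps[-1]
    width = (t1 - t0) / n_buckets
    expected = width / expected_interval      # expected readings per bucket
    counts = [0] * n_buckets
    for t in timestamps:
        # The final timestamp sits on the right edge; keep it in the last bucket.
        index = min(int((t - t0) / width), n_buckets - 1)
        counts[index] += 1
    return [min(100.0, c / expected * 100.0) for c in counts]

def bar_colour(coverage):
    """Colour for a bucket, following the table above."""
    if coverage >= 85:
        return "green"
    if coverage >= 50:
        return "orange"
    return "red"

# 1 Hz data for 10 minutes; three of every four readings are lost from 200 s to 300 s.
ts = [float(t) for t in range(601) if not (200 <= t < 300 and t % 4)]
coverage = coverage_buckets(ts, expected_interval=1.0, n_buckets=6)
print(coverage)                           # [100.0, 100.0, 25.0, 100.0, 100.0, 100.0]
print([bar_colour(c) for c in coverage])  # ['green', 'green', 'red', 'green', 'green', 'green']
```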

Reading the chart:

  • A uniformly green chart indicates consistent data collection throughout the recording period
  • A red cluster at a specific time indicates an outage during that window — investigate what was happening then
  • A gradual orange-to-red fade at the edges can indicate that the sensor was powering up or down
  • Alternating green/orange bars may indicate a sensor duty cycle (intentional sleep mode)

Quality Levels

SENTINEL combines all completeness metrics into an overall quality rating:

Excellent

  • CV < 0.05 (very consistent sampling)
  • Zero gaps detected
  • Regular sampling pattern confirmed
  • No sparse periods detected

The data is temporally complete and ready for analysis without any preprocessing.

Good

Either:

  • The global statistics are excellent but at least one localised sparse window was detected (worst window ≥ 70% complete), or
  • CV < 0.15, gap rate < 5%, and worst sparse window ≥ 50% complete

Minor issues exist but the data is suitable for most analytical methods.

Fair

  • CV < 0.30
  • Gap rate < 15%
  • Worst sparse window ≥ 30% complete

Noticeable completeness issues. Some analyses will be unreliable without gap-filling or resampling.

Poor

Everything else — high CV, many gaps, or severe sparse periods. Significant preprocessing is required.
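
The rating rules can be read as a cascade of checks, sketched below. Several points are assumptions: "gap rate" is taken to be the fraction of intervals classified as gaps (the text does not define it), a dataset with no sparse periods is treated as passing every sparse-window check, and the function name and parameters are hypothetical.

```python
def quality_level(cv, gap_rate, n_gaps, regular, worst_window=None):
    """Overall rating.

    gap_rate is a fraction (0.05 means 5%); worst_window is the completeness
    (%) of the worst sparse window, or None if no sparse period was found.
    """
    no_sparse = worst_window is None
    worst = 100.0 if no_sparse else worst_window
    globally_excellent = cv < 0.05 and n_gaps == 0 and regular
    if globally_excellent and no_sparse:
        return "Excellent"
    if globally_excellent and worst >= 70:
        return "Good"
    if cv < 0.15 and gap_rate < 0.05 and worst >= 50:
        return "Good"
    if cv < 0.30 and gap_rate < 0.15 and worst >= 30:
        return "Fair"
    return "Poor"

print(quality_level(0.01, 0.0, 0, True))             # Excellent
print(quality_level(0.01, 0.0, 0, True, 75.0))       # Good
print(quality_level(0.20, 0.10, 12, False, 40.0))    # Fair
print(quality_level(0.50, 0.30, 40, False, 10.0))    # Poor
```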


Gap Threshold Slider

The interactive slider in the report lets you adjust the classification boundary between "normal" and "warning" gaps, and between "warning" and "critical" gaps. This is useful for applying domain knowledge:

  • T1 (Warning) — gaps longer than this are flagged as requiring attention
  • T2 (Critical) — gaps longer than this are flagged as operationally significant failures

For example, for a 1 Hz vibration sensor:

  • T1 = 5 s (more than 5 missing readings is worth investigating)
  • T2 = 60 s (more than a minute offline is a serious problem)

For a 15-minute weather station:

  • T1 = 20 min (one missed reading is borderline)
  • T2 = 60 min (four consecutive missed readings is a failure)

The slider operates on a log scale so that fine-grained control is possible at short intervals and coarse control at long intervals. The counts update in real time using pre-computed histogram bins; exact counts are recalculated server-side after a brief debounce delay.
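
To illustrate the two ideas in this section, the sketch below maps a slider position to a threshold on a log scale and counts gaps per severity class. The slider bounds (1 s to 24 h), the position range of 0 to 1, and both function names are hypothetical; the report's actual ranges are not documented here.

```python
def slider_to_seconds(position, lo=1.0, hi=86400.0):
    """Map a slider position in [0, 1] to seconds on a log scale."""
    # Equal slider steps multiply the threshold by a constant factor.
    return lo * (hi / lo) ** position

def severity_counts(gap_durations, t1, t2):
    """Count gaps classed as normal, warning (> T1) or critical (> T2)."""
    counts = {"normal": 0, "warning": 0, "critical": 0}
    for duration in gap_durations:
        if duration > t2:
            counts["critical"] += 1
        elif duration > t1:
            counts["warning"] += 1
        else:
            counts["normal"] += 1
    return counts

print(round(slider_to_seconds(0.5)))  # 294
# The 1 Hz vibration sensor example: T1 = 5 s, T2 = 60 s.
print(severity_counts([2.0, 4.0, 8.0, 30.0, 90.0, 600.0], t1=5.0, t2=60.0))
# {'normal': 2, 'warning': 2, 'critical': 2}
```

With these bounds the midpoint of the slider corresponds to about 5 minutes rather than 12 hours, which is what gives fine control at short intervals.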


Recommendations

The report generates a prioritised list of recommendations based on what was found. The logic follows this order of concern:

  1. Extended outage — flagged first because it requires manual investigation before any processing
  2. Systematic outages — cluster-pattern gaps that need separate treatment per cluster
  3. Sparse periods — localised density loss with specific time windows identified
  4. Excellent data — no further action needed
  5. Gap severity — specific guidance based on whether gaps are minor, moderate, or severe
  6. Sampling pattern — resampling recommendations for irregular or multimodal data
  7. High CV — general consistency warnings

Technical Reference

Default configuration

| Parameter | Default | Description |
| --- | --- | --- |
| Gap detection method | IQR | Tukey's fence: \(Q_3 + 1.5 \times \text{IQR}\) |
| IQR multiplier | 1.5 | Standard Tukey's fence coefficient |
| Cluster window multiplier | \(2\times\) | A gap is clustered if another gap exists within \(2 \times\) median \(\Delta t\) |
| Skewness threshold | 1.0 | \(\Delta t\) skewness \(> 1.0\) → long-tailed distribution classification |
| Multimodal sensitivity | 0.1 | Peak prominence threshold for multimodal detection |
| Sparse window threshold | 85% | Completeness ratio below which a window is flagged as sparse |
| Extended outage multiplier | \(50\times\) | Any gap \(> 50 \times\) median \(\Delta t\) triggers an extended outage classification |
| Coverage chart buckets | 60 | Maximum time buckets in the coverage timeline chart |

How the analysis works

When you upload a time series or click Re-run Analysis, SENTINEL performs the following steps in sequence:

  1. Interval extraction — Computes the time gap between every pair of consecutive readings.
  2. Statistical profiling — Calculates central tendency, spread, and distribution shape across all intervals.
  3. Gap threshold determination — Sets the boundary between normal timing variation and a true gap, adapting automatically to your data.
  4. Gap detection and classification — Identifies intervals that exceed the threshold and labels each as isolated or clustered.
  5. Sparse period scan — Sweeps a sliding window across the timeline to catch stretches of reduced density that individually fall below the single-gap threshold.
  6. Missing pattern classification — Assigns an overall character to the data loss (random dropouts, irregular coverage, systematic outages, or extended outage).
  7. Overall completeness calculation — Produces the headline coverage percentage.
  8. Coverage chart generation — Divides the timeline into equal-width buckets and calculates per-bucket completeness for the bar chart.

Stored results and re-running analysis

Analysis results are saved automatically after each run. This means reloading the dashboard does not trigger a new analysis — your results are recalled instantly. Use Re-run Analysis to recalculate with the latest analysis engine whenever you want fresh results.

If you see "N/A" for the completeness percentage or a "not available" notice where the coverage chart should appear, the stored result was generated by an older version of SENTINEL. Click Re-run Analysis to update it. All other report sections — the gap list, interval distribution chart, sparse periods, and recommendations — remain fully functional in the meantime.