
Conformal intervals become extreme/null after inverse log transform (expm1) — what's the recommended approach? #1119

@gbgoutha


What happened + What you expected to happen

Conformal intervals produce extreme/null values after inverse log transform (expm1)

Description

When using ConformalIntervals with AutoARIMA on log1p-transformed data, the conformal scores (computed in log-space) produce extreme or null hi values after applying expm1 to inverse-transform back to original scale.

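This happens because conformal scores are additive in log-space: after expm1, an additive offset c in log-space becomes a multiplicative factor of roughly exp(c) in the original space, so a large score inflates the upper bound exponentially.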
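To make the additive-versus-multiplicative point concrete, here is a minimal sketch using the log-space point and hi-70 values reported in this issue. The variable names are illustrative only and are not part of statsforecast.

```python
import numpy as np

# Log-space values taken from the example in this issue.
point_log = 6.03
hi_log = 11.88

# Additive half-width in log1p space (hi minus point).
score = hi_log - point_log

# Back-transform both values to the original scale.
point = np.expm1(point_log)
hi = np.expm1(hi_log)

# Identity: expm1(a + c) + 1 == exp(c) * (expm1(a) + 1),
# so the additive offset c acts as a multiplicative factor exp(c).
factor = np.exp(score)
assert np.isclose(hi + 1, factor * (point + 1))

print(f"point={point:.0f}, hi={hi:.0f}, multiplicative factor={factor:.0f}")
```

A log-space offset near 1 would only roughly triple the forecast, whereas an offset near 6 multiplies it by several hundred.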

Example

| column | log-space value | after expm1 |
| --- | --- | --- |
| point | 6.03 | 414 |
| lo-70 | 5.31 | 202 |
| hi-70 | 11.88 | 144,764 |

The hi-70 bound is roughly 350x the point forecast, which is clearly nonsensical for a demand prediction interval.

Root cause

In _conformity_scores (models.py L122-148), the absolute residuals |mean - y_test| are computed in the transformed (log) space. In _add_conformal_distribution_intervals (models.py L69-96), these scores are added/subtracted from the point forecast, still in log-space:

scores = np.vstack([mean - cs, mean + cs])  # all in log-space
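The effect can be reproduced without fitting a model. The following is a simplified NumPy sketch of the logic described above, not the library's actual code: `conformal_bounds` is a hypothetical helper, and the `mean_log` and `cs_log` values are made up for illustration (three windows, two horizon steps, with one large residual in the second step).

```python
import numpy as np

def conformal_bounds(mean, cs, level):
    """Simplified sketch: per-step quantiles of mean -/+ conformity scores."""
    alpha = (100 - level) / 200
    mean = mean.reshape(1, -1)
    scores = np.vstack([mean - cs, mean + cs])  # shape (2 * n_windows, h)
    lo, hi = np.quantile(scores, [alpha, 1 - alpha], axis=0)
    return lo, hi

# Hypothetical log1p-space point forecasts for h=2 steps.
mean_log = np.array([6.0, 6.0])

# Hypothetical absolute residuals per window (rows) and step (columns);
# the last window has one large miss at the second step.
cs_log = np.array([
    [0.3, 0.5],
    [0.6, 0.4],
    [0.7, 5.0],
])

lo_log, hi_log = conformal_bounds(mean_log, cs_log, level=70)

# In log-space the interval is symmetric around the point forecast.
assert np.allclose(hi_log - mean_log, mean_log - lo_log)

# After expm1 the upper half-width is much larger than the lower one.
point, lo, hi = np.expm1(mean_log), np.expm1(lo_log), np.expm1(hi_log)
print("hi / point ratio per step:", hi / point)
print("upper width:", hi - point, "lower width:", point - lo)
```

With only a few windows, a single large log-space residual dominates the upper quantile, and expm1 then turns that offset into a large multiple of the point forecast.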

### Versions / Dependencies

<details><summary>Click to expand</summary>
Dependencies:
statsforecast 2.0.3
</details>


### Reproducible example

```python
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.utils import ConformalIntervals
import numpy as np
import pandas as pd

np.random.seed(42)
n = 260  # 5 years weekly
y = np.random.lognormal(mean=6, sigma=0.5, size=n)
df = pd.DataFrame({
    'unique_id': ['series_1'] * n,
    'ds': pd.date_range('2020-01-06', periods=n, freq='W-MON'),
    'y': y
})

# Log transform before fitting
df['y'] = np.log1p(df['y'])

sf = StatsForecast(models=[AutoARIMA(season_length=52, D=1)], freq='W-MON')
ci = ConformalIntervals(h=26, n_windows=3)

fcst = sf.forecast(df=df, h=26, prediction_intervals=ci, level=[70, 90])

# Inverse transform — hi values explode
for col in ['AutoARIMA', 'AutoARIMA-lo-70', 'AutoARIMA-hi-70',
            'AutoARIMA-lo-90', 'AutoARIMA-hi-90']:
    fcst[col] = np.expm1(fcst[col])

print(fcst[['AutoARIMA', 'AutoARIMA-lo-70', 'AutoARIMA-hi-70']].describe())
# hi-70 mean will be orders of magnitude larger than the point forecast
```

Issue Severity

High: It blocks me from completing my task.
