Basic Architecture

Core Forecasts
The DICast generates point forecasts
for user-defined locations (e.g., cities,
observation sites, locations along
the highway system, agricultural fields,
etc.). At observational sites, forecast
parameter tuning based on past performance
helps improve the forecasts. This class
of sites is called core forecast sites.
Forecasts at non-core sites are derived
from forecasts at core sites.
The numerical weather model data used
by the DICast has three-hourly resolution.
Since the system is primarily model data
driven, forecasts are initially generated
at three-hour intervals. These times are
called the core forecast lead times.
DICast Forecast Modules
The DICast creates several independent
forecast estimates. Each forecast module
attempts to create the best forecast it
can by applying a specific forecast technique
to its input data set. Each DICast forecast
module uses one of three basic techniques
to generate forecasts. They are:
- Dynamic Model Output Statistics (DMOS),
- Interpolation of NWS MOS site forecasts, and
- Semi-static techniques.
Each forecast module produces an identically
formatted output file. No forecast module
is dependent on another forecast module.
That is, no forecast module's output is
used as input to another forecast module.
2.1 Dynamic MOS forecast modules
The Dynamic MOS (DMOS) forecast modules
are a dynamic variation of the traditional
NWS MOS procedures. DMOS, like traditional
MOS, finds relationships between model
output data and observations using linear
regression methods. However, while MOS
equations are calculated using many years
of data, DMOS uses only the last three months
(configurable) of data. The regression
equations are recalculated once per week.
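The short-history idea can be sketched as follows. This is a minimal
illustration, not the DICast implementation: it fits a single-regressor
least-squares line using only samples from the most recent window. The
function name fit_recent, the 90-day default, and the sample layout are
assumptions made for the example; the actual DMOS uses many regressors.

```python
from datetime import date, timedelta


def fit_recent(samples, today, window_days=90):
    """Fit obs = a + b * regressor using only samples inside the window.

    samples: list of (date, regressor_value, observed_value) tuples
             (a hypothetical layout chosen for this sketch).
    Returns (a, b), or None when the window holds too little usable data.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [(x, y) for d, x, y in samples if cutoff <= d <= today]
    if len(recent) < 2:
        return None
    n = len(recent)
    mean_x = sum(x for x, _ in recent) / n
    mean_y = sum(y for _, y in recent) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in recent)
    if sxx == 0.0:
        return None  # regressor never varied, so the slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in recent)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

Rerunning such a fit weekly with the same window length lets the
equations follow model changes, at the cost of a smaller training sample.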
The DMOS technique has several advantages
over traditional MOS. The reliance on
only a short history allows DMOS equations
to be calculated and DMOS forecasts generated
for newly ingested models or models that
are changing due to enhancements. Traditional
MOS equation generation would require
the model to be stable (no changes) for
several years. Also, the MOS equations
are calculated painstakingly with a large
human quality control effort. This makes
it difficult to add MOS equations for
a new set of forecast sites. DMOS forecasts
can be made at these sites immediately
provided they have an observational history
of at least three months (configurable).
A disadvantage of DMOS is that the equations
it produces are less stable than MOS equations.
For this reason, quality control checks
must be put into place to assure that
the equations produced will not create
nonsensical outlier forecasts.
The DMOS subsystem applied to any model
has three components:
- Regressor Calculation,
- DMOS Empirical Relationships Generator, and
- DMOS Forecast Generator.
The interaction of these three components
is illustrated in Figure 1.

2.2 Regressor Calculation
Regressors are variables extracted or
derived from model data that are likely
to have a relationship to one of the output
forecast variables. These regressors are
calculated at each forecast site for each
forecast lead-time. About 2/3 of the regressors
are variables directly extracted from
the model data. Other regressors are derived
by combining several variables to estimate
meteorological data not explicitly predicted
by the models.
Since the forecast sites are rarely at
model grid points, interpolation techniques
are used to generate forecasts at the
forecast sites. This requires an understanding
of the projection of the model grid and
the terrain assumptions used in each model.
As some of the regressors are estimates
of meteorological variables at the earth's
surface, correcting for the simplified
terrain used by the model is important
and varies from model to model. The regressors
from one model run are all stored in one
file. The regressor files are put into
a regressor history that the DMOS empirics
process uses to calculate regression equations.
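The text does not name the interpolation or terrain-correction methods
used, so the following is only a sketch under stated assumptions: bilinear
interpolation on a regular grid indexed by fractional grid coordinates,
followed by a constant lapse-rate adjustment for the difference between
the model terrain and the real site elevation. The function names and the
0.0065 K/m lapse rate are illustrative choices, not DICast settings.

```python
LAPSE_RATE_K_PER_M = 0.0065  # assumed standard-atmosphere lapse rate


def bilinear(grid, x, y):
    """Interpolate grid[j][i] at fractional grid coordinates (x, y).

    x runs along columns (i) and y along rows (j); both are assumed to
    lie inside the grid.
    """
    i0, j0 = int(x), int(y)
    i1 = min(i0 + 1, len(grid[0]) - 1)
    j1 = min(j0 + 1, len(grid) - 1)
    fx, fy = x - i0, y - j0
    top = grid[j0][i0] * (1 - fx) + grid[j0][i1] * fx
    bottom = grid[j1][i0] * (1 - fx) + grid[j1][i1] * fx
    return top * (1 - fy) + bottom * fy


def site_temperature(temp_grid, terrain_grid, x, y, site_elev_m):
    """Interpolate model temperature to a site, then correct for the gap
    between the model's smoothed terrain and the true site elevation."""
    model_temp = bilinear(temp_grid, x, y)
    model_elev = bilinear(terrain_grid, x, y)
    return model_temp - LAPSE_RATE_K_PER_M * (site_elev_m - model_elev)
```

Converting a site's latitude and longitude into the fractional (x, y)
used here depends on each model's map projection, which is why the
projection must be understood before any interpolation is attempted.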
2.3 DMOS Empirical Relationships Generator
The DMOS Empirical Relationships Generator
attempts to find relationships between
the regressors and the observations at
forecast sites. It does this using a linear
regression technique. There are tradeoffs
involved in determining the best regression
equation. The goodness-of-fit measure
of a regression equation is called its
r-squared value. Typically, adding more
regressors to an equation increases the
r-squared value. However, this also increases
the variance of the output forecasts since
more regressors are included that do not
have a strong relationship to the predictand.
Therefore, the desired set of regressors
has most of the information leading to
a good prediction and does not contain
noisy regressors.
Equations that do not have a sufficiently
high r-squared value are replaced with
a default equation. This default equation
is a predefined combination of regressors
defined by a meteorologist. A default
equation is an attempt to generically
replicate a meteorologist's logic in coming
up with a forecast. Special, usually derived
regressors have been developed for this
specific purpose. These default equations
generally do not produce the erroneous
forecasts that a low r-squared equation
might.
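The fallback logic described above might look like the sketch below.
The function names, the equation representation, and the 0.5 threshold
are assumptions for illustration; the text does not state the actual
r-squared cutoff.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if ss_tot == 0.0:
        return 0.0  # constant observations: treat the fit as uninformative
    return 1.0 - ss_res / ss_tot


def select_equation(fitted_eq, fitted_predictions, observed,
                    default_eq, min_r2=0.5):
    """Keep the fitted equation only if it explains enough variance.

    fitted_predictions are the fitted equation's hindcasts for the same
    cases as observed. min_r2 is a hypothetical threshold.
    """
    if r_squared(observed, fitted_predictions) >= min_r2:
        return fitted_eq
    return default_eq
```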
This best combination of regressors will
vary from site to site, between forecast
lead times, and clearly will be different
for each forecast variable. The relationships
will also vary from season to season and
from model to model. The empirics generator
is run once per week for each model to
find the equations which best fit the
most recent data. These equations are
stored in a DMOS empirics file and used
later by the DMOS forecast generator.
2.4 DMOS Forecast Generator
The DMOS Forecast Generator applies the
empirical relationships generated by the
DMOS Empirical Relationships Generator
to the most recent regressors. This generates
the DMOS forecast. The relationships between
regressors that have done well at predicting
the observations recently are used again
on today's regressor data to make a DMOS
forecast. If any of the regressors that
appear in a regression equation are missing,
a missing forecast is generated.
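Applying a stored equation, including the missing-regressor rule, can be
sketched as below. The dictionary layout (an intercept under the key
"const" plus one coefficient per regressor name) and the use of NaN as
the missing-data flag are assumptions for this example, not the DMOS
empirics file format.

```python
import math

MISSING = float("nan")  # assumed missing-data flag for this sketch


def dmos_forecast(equation, regressors):
    """Evaluate a linear regression equation on today's regressors.

    equation:   dict of regressor name -> coefficient, with the intercept
                stored under "const".
    regressors: dict of regressor name -> value for one site and lead time.
    Returns MISSING if any regressor used by the equation is unavailable.
    """
    total = equation.get("const", 0.0)
    for name, coeff in equation.items():
        if name == "const":
            continue
        value = regressors.get(name)
        if value is None or math.isnan(value):
            return MISSING
        total += coeff * value
    return total
```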
2.5 NWS MOS Forecast Modules
These forecast modules are based on the
MOS products generated by the National
Weather Service. These forecasts are not
a perfect match to the desired MDSS forecasts.
The MOS data consist of point forecasts
at sites chosen by the NWS. These MOS
sites are generally a subset of the MDSS
forecast sites. Also, the variables forecast
in the MOS output vary for each of the
NWS models. In addition, the variables
do not directly match the MDSS forecast
variables and it is possible that the
forecast lead times do not match the MDSS
forecast lead times.
At a site included in any particular
NWS MOS forecast, the forecast module
tries to reproduce the exact forecast.
Where MDSS variables are explicitly forecast
in the MOS product, they are simply copied.
Otherwise, if reasonable, the MDSS forecast
variable is derived from the MOS data.
For some variables, no derivation is reasonable
and these variables are left as missing
data. If the forecast lead times of the
MOS product do not match the MDSS forecast
times, the forecast module makes an interpolated
forecast where possible.
For the majority of the MDSS sites, no
MOS forecasts exist. Forecasts for these
sites are generated by interpolation techniques.
The interpolated forecasts are generated
using the forecasts generated at the MOS
sites. No satisfactory interpolation technique
has been found that works well for all
variables in rough terrain. For example,
the interpolation of surface winds in
the mountains does not work well using
any known technique.
2.6 Semi-static forecast modules
Two forecast modules are called semi-static
in that their forecasts depend only on
historical data, not on any predictive
forecast model. These two are the climatology
and persistence forecast modules. These
two look at the past weather over different
time ranges and base their forecast on
the average weather seen. The climatology
forecast module uses data from up to the
last 30 years. Monthly averages of the
MDSS forecast variables have been computed
and stored in a climatology file. These
monthly climatological values are interpolated
to the forecast date. The persistence
forecast module averages the observations
of the MDSS variables seen in recent days
to come up with its forecast. The persistence
and climatology forecast modules have
more effect at longer forecast lead
times (> 72 hours). These
modules will not provide a significant
contribution in the MDSS FP, which will
be configured to only provide guidance
out to 48 hours.
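The two semi-static techniques can be sketched as follows. The text does
not say how the monthly values are interpolated to the forecast date, so
this example assumes each monthly mean is valid on the 15th of its month
and interpolates linearly between the two nearest anchors. The function
names are hypothetical.

```python
from datetime import date


def climatology_value(monthly_means, day):
    """Interpolate 12 monthly means (index 0 = January) to a date.

    Assumes each monthly mean is valid on the 15th of its month.
    """
    if day.day >= 15:
        start = date(day.year, day.month, 15)
        if day.month == 12:
            end = date(day.year + 1, 1, 15)
        else:
            end = date(day.year, day.month + 1, 15)
    else:
        end = date(day.year, day.month, 15)
        if day.month == 1:
            start = date(day.year - 1, 12, 15)
        else:
            start = date(day.year, day.month - 1, 15)
    frac = (day - start).days / (end - start).days
    v0 = monthly_means[start.month - 1]
    v1 = monthly_means[end.month - 1]
    return v0 + frac * (v1 - v0)


def persistence_value(recent_observations):
    """Average the observations from recent days."""
    return sum(recent_observations) / len(recent_observations)
```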
3. Forecast Integration
3.1 Integration Overview
The DICast forecast modules each generate
as complete a forecast as possible. This
includes a forecast for every forecast
variable at every forecast site for every
forecast lead time. These independent
forecast estimates are combined by the
integrator to generate one final consensus
forecast. Numerous combination techniques
have been developed. Investigation has
led to a decision to use an enhanced Widrow-Hoff
learning method. This method creates its
final forecast using a weighted average
of the individual module forecasts. The
weights are modified daily by nudging
the weights in the gradient direction
of the error in weight space. The effect
of this is that forecast modules that
have been performing well for a particular
forecast (variable, site, and lead time)
get more weight and the poorly performing
modules get less weight. Note that different
weight vectors exist for every forecast
generation time due to differing latencies
in the input data sets. The interaction
of components of the integrator is illustrated
in the figure below.

3.2 Integrator empirics
This DICast process runs once per day
and updates all the weights based on the
performance of the various forecast modules.
It reads the observations from the previous
day and compares them with the forecast
modules' output that predicted those observations.
For each forecast, the errors are computed
and the gradient vector in weight space
is computed. A step proportional to the
size of the combined error is taken in
that gradient direction to compute the
new weights.
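The details of the enhanced Widrow-Hoff method are not given here, so
the sketch below substitutes the textbook normalized least-mean-squares
(LMS) update, which belongs to the same family. The function name and
the learning rate of 0.5 are assumptions for illustration.

```python
def nlms_update(weights, forecasts, observation, learning_rate=0.5):
    """One normalized LMS step for a single variable, site, and lead time.

    weights:     current weight per forecast module.
    forecasts:   each module's forecast for the verified time.
    observation: the value that was actually observed.
    Returns the updated weights; the step scales with the combined error.
    """
    combined = sum(w * f for w, f in zip(weights, forecasts))
    error = observation - combined
    power = sum(f * f for f in forecasts)
    if power == 0.0:
        return list(weights)  # no gradient information in all-zero forecasts
    step = learning_rate * error / power
    return [w + step * f for w, f in zip(weights, forecasts)]
```

With weights [0.5, 0.5], forecasts [10, 20], and an observation of 12,
one step gives weights of about [0.47, 0.44], and the combined forecast
moves from 15 to 13.5, halving the error.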
3.3 Integrator
The integrator creates a final forecast
by making a bias-corrected confidence-weighted
sum of the individual module forecasts.
It reads the forecasts from the forecast
module output files, the weights from
the integrator empirics file, performs
its calculations, and stores its results.
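A bias-corrected weighted combination can be sketched as below. How the
integrator actually estimates bias and handles missing module forecasts
is not described here; this example assumes a per-module additive bias
and renormalizes the weights over whichever modules are available. The
function name and argument layout are hypothetical.

```python
import math


def integrate(forecasts, weights, biases):
    """Combine module forecasts into one consensus value.

    Each forecast has its module's bias subtracted, then the results are
    averaged using the module weights. Modules whose forecast is NaN are
    skipped and the remaining weights are renormalized (an assumption).
    Returns NaN when no module can contribute.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for f, w, b in zip(forecasts, weights, biases):
        if math.isnan(f):
            continue
        weighted_sum += w * (f - b)
        weight_total += w
    if weight_total == 0.0:
        return float("nan")
    return weighted_sum / weight_total
```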
3.4 Non-verifiable Data Extractor
The DICast forecasting techniques described
above apply only to core forecast variables.
These are variables that are regularly
measured and reported in meteorological
observation data. The DMOS forecast modules
and the integrator both require specific
observations to tune themselves. Variables
that are not regularly observed cannot be
tuned this way, so they are forecast by a
weighted combination of model variables. The weights
used in the combination are predetermined
by a meteorologist familiar with the models
and stored in a configuration file. The
model variables to be combined have been
extracted by the DMOS regressor calculation
process and stored in a regressor file.
The Non-Verifiable Data (NVD) extractor
reads in the appropriate models' regressor
files along with the weight configuration
file before creating its weighted combination
output.
3.5 Post processor
The post processor provides a variety
of processing options. It merges the integrator's
forecasts and the NVD forecasts, removes
unreasonable forecasts, derives
other forecast variables, spatially
interpolates the forecasts to non-core
forecast sites, and temporally interpolates
them to the final forecast resolution.
Quality control measures are applied
to the integrator's output to ensure that
no forecasts are well beyond reasonable
ranges. Forecast values slightly beyond the limits
are reset to the bounding values. For
example, forecasts of 101% probability
of precipitation are turned into forecasts
of 100%. Forecasts well beyond the bounds
are replaced with a missing data flag.
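The two-tier check can be sketched as follows. The tolerance that
separates "slightly beyond" from "well beyond" is not given in the text,
so it is a parameter here; the function name and the NaN missing flag
are likewise assumptions.

```python
import math

MISSING = float("nan")  # assumed missing-data flag for this sketch


def quality_control(value, lower, upper, tolerance):
    """Clip small excursions to the bounds and flag large ones as missing.

    Values within `tolerance` of a bound are reset to that bound; values
    farther out are replaced with MISSING.
    """
    if math.isnan(value):
        return MISSING
    if value < lower - tolerance or value > upper + tolerance:
        return MISSING
    return min(max(value, lower), upper)
```

For a probability of precipitation bounded by 0 and 100 with a tolerance
of 5, a forecast of 101 becomes 100, while a forecast of 150 is flagged
as missing.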
Forecast variables required by users
are derived from the core set of MDSS
forecast variables. For example, relative
humidity is derived from temperature and
dew point temperature. The output of the
integrator contains only forecasts for
core forecast sites. Forecasts at the
non-core sites are generated by spatial
interpolation from the core sites' forecasts.
Temporal interpolation of the three-hourly
forecasts to one hour is used to generate
the desired final forecast temporal resolution.
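Two of these post-processing steps can be sketched briefly. The formulas
DICast uses are not stated, so this example assumes the Magnus
approximation for saturation vapor pressure (with commonly used
coefficients) to derive relative humidity, and simple linear
interpolation to go from three-hourly to hourly values. The function
names are hypothetical.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation over water; temperature in degrees Celsius."""
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def relative_humidity(temp_c, dew_point_c):
    """Relative humidity in percent from temperature and dew point."""
    rh = (100.0 * saturation_vapor_pressure_hpa(dew_point_c)
          / saturation_vapor_pressure_hpa(temp_c))
    return min(rh, 100.0)


def to_hourly(three_hourly):
    """Linearly interpolate a three-hourly series to hourly values."""
    if not three_hourly:
        return []
    hourly = []
    for a, b in zip(three_hourly, three_hourly[1:]):
        hourly.extend([a, a + (b - a) / 3, a + 2 * (b - a) / 3])
    hourly.append(three_hourly[-1])
    return hourly
```

Linear interpolation is a reasonable default for smoothly varying
variables such as temperature; accumulated or categorical variables
would need different handling.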