BOOK REVIEW

Principles of Forecasting
Edited by
J. Scott Armstrong

FUTURECASTS online magazine
www.futurecasts.com
Vol. 5, No. 2, 2/1/03.


 

 


  "Principles of Forecasting: A Handbook for Researchers and Practitioners" - J. Scott Armstrong, The  Wharton School, Univ. of Pennsylvania - provides an essential reference work for any serious practitioner or student of forecasting and estimating methods. Most impressive - given the extent to which phony forecasting is used as a propaganda ploy - is the book's objective evaluations of reliability of the various forecasting methods covered, and candid explanations of the degrees of uncertainty.

The "principles" are "advice, guidelines, prescriptions, condition-action statements, and rules" - all supported by empirical evidence of efficacy to the extent that such evidence has been developed by scholarship. They are general recommendations that have to be fine tuned to best apply to individual circumstances.

 

  By "principles," Armstrong explains, the book broadly means "advice, guidelines, prescriptions, condition-action statements, and rules" - all supported by empirical evidence of efficacy to the extent that such evidence has been developed by scholarship. The book also deals with such "principles" supported primarily by expert judgment - and some that are admittedly speculative. However, it clearly provides the bases for evaluation and covers the specific limited "conditions" for which each principle is applicable. These forecasting principles are general recommendations that have to be fine tuned to best apply to individual circumstances.
 &
  As an example of the tendency of even the most authoritative experts to base forecasts on methods that rest on mere opinion with no supporting empirical evidence, Armstrong cites the dubious expertise of Nobel Laureate economist Paul Samuelson. Based on Samuelson's uncritical acceptance of Keynesian theory - (theory based on opinion unsupported by any empirical evidence) - in the many editions of his widely used "Economics" textbook:

  "Samuelson stated that private enterprise is afflicted with periodic acute and chronic cycles in unemployment, output, and prices, which government had a responsibility to 'alleviate.' As late as the 1989 edition, Samuelson said 'the Soviet economy is proof that, contrary to what many skeptics believed, a socialist command economy can function and even thrive.'"

  Belief in the existence of competence in the Soviet economy - and belief in "chronic" capitalist economic malfunctions - are just a couple of the many gross stupidities of Keynesian theory and beliefs.

  Uncertainty is always a factor - overconfidence a constant risk - Armstrong wisely warns. While judgmental forecasting methods are clearly limited to the human skills applied, mathematical forecasting methods are likely to provide a false sense of precision and certainty. With respect to models derived primarily from data - rather than theory: "An immense amount of research effort has so far produced little evidence that data-mining models can improve forecasting accuracy." However, when based on credible theory, econometric models can usefully "integrate judgmental and statistical sources."

    Econometric and other mathematical models are always subject to any weaknesses that may exist in either the theory or the data used. GIGO - "garbage in - garbage out" - is the first law of mathematical forms of analysis. The invalidity of macroeconomic econometric models can be glaringly obvious. For example, researchers have found no econometric model that can reliably explain major currency exchange-rate movements after the fact - much less predict them.

  All of these techniques are still evolving, and there is a pervasive need for further research and testing of both reliability and best practices. Each chapter of the handbook contains suggestions for further study.

Judgmental Estimating and Forecasting Methods

Role Playing:

 


  "Role Playing: A Method to Forecast Decisions" - by Armstrong - covers "a method of predicting the decisions by people or groups engaged in conflicts." Role playing forces participants to walk in the shoes of the various parties - and consider interactions as various parties react to actions taken by other parties.

  Expert judgment is weak at this because of the difficulty of thinking through several rounds of interaction. Role playing is most useful when just two or three parties are involved - more than one, but not many. It is superior to expert judgment for predicting individual behavior, and it is particularly useful for predicting large changes - something else at which expert judgment is frequently weak.

  Next to experimentation - which is generally more expensive or otherwise impractical - "role playing can provide the most realistic representation of interactions among different parties."

  War games and mock trials are typical uses. However, role playing can also be used in commercial evaluations for such things as changes in pricing policy or product design - and for union strike threats or other labor actions.
  War - as usual - provides dramatic examples of success and failure. The Falkland Islands war is cited as an example of a conflict caused by failures by both the Argentines and the British to accurately evaluate each other's responses to actions taken prior to the conflict and in its initial stages. The author notes that war games conducted beforehand indicated the futility of limited bombing of North Vietnam - a prediction that was verified when political leaders decided to go ahead with it anyway.
  Armstrong contrasts use of role playing with such alternatives as expert judgment, field or laboratory experiments, surveys of intentions, extrapolation from analogous situations, and game theory. Evidence of the actual validity of role playing in such things as mock juries and psychology is sparse. For example, mock conventions in universities proved no better than public surveys at predicting the outcome of political conventions in the years prior to the time when primaries began to dictate outcomes.
  However, Armstrong provides evidence from five of his own role playing experiments that demonstrate a high degree of validity in a variety of situations. Even in instances where role playing missed the actual outcome of a conflict, it proved better than expert judgment or arbitrary chance in indicating the possibility of the actual outcome in a substantial majority of instances. Interactions during role playing do make a difference in achieving significantly higher percentages of valid results.
  He also cites several successful uses of role playing to predict the outcome of decisions in personnel hiring and in customer satisfaction with medical services.

  Proper technique includes "realism in casting, role instructions, situation description, and session administration" - factors the author outlines and evaluates.

  "To obtain a reliable prediction, one would want to have a number of decisions, each based on a different group. To obtain a valid prediction, one would also want to vary key elements of the role play."

  Thus, it is advisable to run about 10 sessions, with half using one description and half an alternative description - with more sessions indicated if responses vary substantially.

Surveys of Intentions:

 


  "Methods of Forecasting from Intentions Data" - by Vicki G. Morwitz, Stern School of Business, New York Univ. - covers the use of surveys of what people intend to do as a means of predicting what they will do. Surveys are frequently used to predict the market for commercial goods and services. They are regularly used by the vast majority of market research clients. They can also be used "to develop best- and worst-case scenarios.
 &

  The utility of such surveys varies with the type of product, the people surveyed, and the skill with which the questions are phrased. Morwitz advises:

  • Intentions - plans, goals or expectations - "should be measured using probability scales."

  • Respondents "should be instructed to focus on their own individual characteristics."

  • Respondents should be segmented into functional classifications and their answers should be adjusted in light of their known biases.

  Thus:

  • Answers from people who have previously engaged in similar behavior should be given more credibility than from those who have not.

  • However, those who incorrectly recall the time of their last purchase or other subject action are more likely to make "biased predictions" of future purchases or other actions.

  • Finally, researchers should be aware that asking about and measuring behavior intentions can change behavior.

  Studies have shown that groups of people who were asked about auto or computer purchases actually bought more of them than similar groups not asked about such intentions. However, the impact on certain subgroups varied, with both positive and negative intentions apparently reinforced by the questioning. Brand loyalty seems to be enhanced by survey questions.

  The "survey" propaganda ploy is well known and widely used. Political polls are a widely used example - often conducted - and worded - more to impact behavior than to measure it.

   Biases are a major problem. Respondents frequently tend to overstate or understate intentions, and the causes of such biases are still poorly understood. There are a variety of biases - some obvious and some subtle - that can affect results.
  Biases can arise because the sample surveyed is not representative of the population whose behavior is being measured. Actions generally based on impulse - such as impulse purchases - are hard to predict. Responses about behavior that is either socially desirable or undesirable will be impacted by bias. The author's suggestions are primarily attempts to deal with bias.
  Probability scales - asking respondents to provide a numerical rating of the chances they will purchase an item or take some other action - such as "1 in 100" or "9 in 10" - appear to be more reliable than verbal ratings - such as "definitely" or "probably will" or "probably won't" - and both appear better than merely asking whether a purchase or other action is intended. This may be because some respondents answer "what they would like to do rather than what they are likely to do."
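  As a minimal illustration of why probability scales help - using hypothetical numbers, not figures from the book - responses expressed as chances can be averaged directly, while a yes/no question discards most of that graded information:

```python
# Hypothetical probability-scale responses ("x in 100" chances of purchase).
responses = [0.01, 0.10, 0.90, 0.30, 0.05, 0.30]

expected_rate = sum(responses) / len(responses)
print(f"Expected purchase rate from probability scale: {expected_rate:.1%}")  # ~27.7%

# A simple yes/no intention question loses the graded information: a 30% chance
# and a 5% chance both collapse to "no", understating likely purchases.
yes_answers = [p for p in responses if p >= 0.5]
print(f"Share answering 'yes': {len(yes_answers) / len(responses):.1%}")      # ~16.7%
```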
  Accuracy can also be compromised for major purchases - like buying a car - if even a small percentage of those who answer no because they feel unlikely to make a purchase within the stated period ultimately do make a purchase. One survey for car purchases found that 92% of respondents had no intention to make a purchase - a percentage so large that the 7.3% of them who nevertheless purchased a car in that period accounted for 70% of all sales.
  The content of accompanying instructions and the wording of the questions also affects the outcome. Different results arise from questions about "intentions" and those about "expectations." Different results also arise when asking questions about "goals" and when asking questions about actions actually within the total control of the respondents. Instructions focusing on individual characteristics appear to increase accuracy.
  Appropriate segmentation of responder groups - such as by income, demography, product use, etc. - can also achieve some reduction in error margins.

  Adjustments to eliminate bias are suggested. Studies have shown that purchase probability data for durable goods - like autos - generally substantially understate actual purchases, while data for nondurables overstate actual purchases. Historical data - from the same or similar behavior - where available - can be used to adjust for these systematic inaccuracies, but that will still leave some substantial margin of error.
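  A minimal sketch of such an adjustment - with hypothetical figures, not data from any of the studies cited - applies a historically observed ratio of actual purchases to stated intentions to a new survey result:

```python
# Hypothetical figures for a durable good, where intentions historically
# understate actual purchases.
stated_intention_rate = 0.12   # current survey: 12% say they will buy
historical_actual = 0.18       # past actual purchase rate in comparable periods
historical_stated = 0.10       # past stated intention rate in comparable surveys

bias_ratio = historical_actual / historical_stated       # 1.8
adjusted_forecast = stated_intention_rate * bias_ratio    # 0.216
print(f"Bias-adjusted purchase-rate forecast: {adjusted_forecast:.1%}")
```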
  Morwitz reviews the published studies, but a significant level of uncertainty remains due to difficulties in studies of this subject. Like almost all social forecasting, this is still, at best, an inexact art.
  A more realistic result - even after using applicable adjustment methods - is to accept that a range of results is the most that can be expected. A numerical result gives a false sense of precision. A range forces planners to take into account best-case and worst-case possibilities, and all points in between. Of course, the utility of this technique decreases in relation to the width of the range of possible results.
  Armstrong lists six conditions that determine the accuracy of forecasts based on intentions data.

  1. The importance of the predicted behavior.

  2. Responses can be obtained from the decision makers.

  3. Intentions are advanced enough to include plans.

  4. Reports of respondent intentions are accurate.

  5. Fulfillment of the plans is within the control of the respondents - rather than being goals.

  6. The plan is unlikely to be impacted by new information during the forecast period.

  The accuracy of forecasts involving intentions surveys is frequently improved when used with other data - such as from role playing or sales programs or expert opinions.

Expert opinions:

 

  "Improving Judgment in Forecasting" - by Nigel Harvey, Univ. College of London - discusses methods of reducing the impact of inconsistency and bias in forecasts based on expert judgment. By definition - those with experience, learning and a proven track record in the relevant field - the "experts" - are a good source of reliable forecasts. However, there are reasons why experts fail - and there are techniques to counter those reasons. Among these, the Delphi expert opinion survey techniques are the most powerful.
 &

  Inconsistency and bias are the two primary negative influences affecting expert opinion.
  Inconsistency may arise from variations in the way the forecasting problem is formulated, or variation in choice or application of method.

  • Breaking down the problem into component parts - "decomposition" - and other methods of structuring the forecasting and adjustment process can reduce inconsistency.

  • Multiplicity - using multiple sources for estimates, multiple decomposition methods and multiple estimations for each one, multiple role playing sessions and multiples of other forecasting methods - can reveal problems and provide more reliable combined results.

  • Judgment should be limited to those aspects of forecasting that can benefit from it. Mechanical methods should be used to process information.

  • Careful identification of the most important causal forces can also improve results, since evaluation accuracy declines as the complexity of applicable knowledge - "domain knowledge" - increases.

  • Checklists of relevant information can be highly effective in reducing inconsistency. This is most useful where many variables must be evaluated. Checklists help "because people can rarely bring to mind all the information relevant to a task when they need to do so." Long-term memory is inherently imperfect, and even immediate memory can be overwhelmed. Irrelevant information can confuse judgment. The weight given individual factors can vary from the forecaster's specific judgment of their relative importance. Checklists assure evaluation of possibilities that might otherwise be overlooked. Checklists can be compiled as experience is accumulated.

  Bias may arise from the self interest of the expert, or may be inherent in the judgmental or statistical methods applied.
  Bias may be reduced by using experts with no stake in the outcome. Where bias may be an important influence, care must be taken in evaluating the adjustments used by the experts. Using experts with differing backgrounds can cancel out some bias, and adjustments can be applied when biases are recognized as inherent in the forecasting system.
  Requiring not just an expert opinion, but also the justification for the opinion, is a common method of reducing and/or revealing bias. Response to feedback can also counter the influence of bias, as in Delphi survey techniques. Finally, an ongoing measurement of forecast accuracy undermines biased judgment.
  Using different people to devise plans than those who estimate the likelihood of success can reduce bias in evaluating the effectiveness of proposed actions. People tend to be overconfident about the outcomes of their plans. This has been found true for business, athletics - and medical treatments.
   Bias in judging the degree of uncertainty can be reduced by using multiple judgment methods. Percent probability or probability of falling within given ranges of uncertainty can be used - or both can be used and an average taken. The difficulties lie in averaging results with more than one variable.

  Experts have a variety of forecasting methods to employ, and should rely on the one that achieves the best results for particular problems. Comparisons of the results of different forecasting software should be ongoing. However, performance over a few periods is not assurance of future performance, and performance can change with changes in the quality of the data as well as changes in forecast method or error measure.
  Other criteria should also be considered. Cost - timeliness of results - the needed degree of accuracy and other user needs, such as transparency of forecast method and understanding of uncertainty of results - must be considered beforehand. Since different users within a given organization - sales, planning, production - may have different forecasting needs, the various users should make the needed compromises beforehand to agree on the forecasting methods to be adopted. Also, establishing specific forecasting criteria beforehand helps evaluate methodology by avoiding ad hoc changes over time that undermine comparability.

"People tend to search for information that confirms rather than falsifies their hypotheses."

  Accurate records and feedback are vital for the reduction of both inconsistency and bias. Personal memory of results becomes unreliable over time. "The hindsight bias is likely to cause forecasters to overestimate the quality of their forecasts."
  Even with adequate records, objective evaluation is not always achieved. There may be "confirmation bias." "People tend to search for information that confirms rather than falsifies their hypotheses." Also undermining feedback in some cases are those instances where the forecast itself may be "self fulfilling" by inducing actions that increase the likelihood of the forecast.
  Graphical presentation of data has been shown to facilitate evaluation better than tabular lists of numbers - but primarily for gradual trends, as in business and financial series. Extrapolation from graphical presentations - by drawing a "best fitting line" through and beyond the points on the graph - has been similarly shown to facilitate evaluation.

Information acquisition and processing:

  "Improving Reliability of Judgmental Forecasts" - by Thomas R. Stewart, Univ. of Albany, State Univ. of New York - can be achieved by adopting techniques that deal with the unreliability of information acquisition and information processing.
 &


  The reliability of judgment forecasts is undermined by complexity - an uncertain environment - reliance on perception, memory, or pattern recognition - and reliance on intuition instead of analysis. Indeed, "accuracy of forecasts does not increase as information increases," because of the increase in mental burden and complexity. People instinctively use only a subset of available information in their forecasting and planning. Error is introduced into judgment forecasts "by the natural inconsistency of the human judgment process."
  "Reliability is necessary, but not sufficient, for validity." For validity - the accuracy of forecasts - the process must not only be reliably performed, but all the forecasting techniques and data must also be free of errors.
  Stewart discusses methods of analyzing the extent of unreliability and its impact on judgmental forecasting. Such factors as stress, time pressure, and forecaster confidence levels can impact the reliability of the forecasts made by cognizant professionals. Similarly, difficulties in acquiring or interpreting information can impact reliability. Studies involving medical professionals demonstrate wide variations in interpreting data for diagnostic purposes.
  Stewart notes that judgment exists along a continuum between inexplicable "intuition" and step-by-step explainable analytical process. While rigorous analytical process may experience fewer errors, errors can still be introduced by such things as errors in inputs or various types of system failure - typically causing large, even catastrophic, errors.

  Competent professional "intuition" isn't irrational or completely inexplicable. It usually involves consideration of readily evident relevant inputs - evaluated on the basis of pertinent professional experience - resulting in as great an inferential leap as needed to draw a conclusion. It is a professional level of "common sense." As more deliberate analytical processes are brought to bear, the inferential step is cut up into increasingly smaller and more numerous inferential steps supported by observations of increasing amounts of relevant data. As Stewart points out, this is really a continuum, rather than a comparison of extremes.

  Stewart concludes:

  • The more uncertain the event, the lower the reliability of information acquisition and processing.

  • Increasing the amount of information available may not improve the quality of the forecast.

  • Since all expert forecasts are a combination of analysis and intuition - ("professional judgment") - the role of each should be carefully considered and the forecasting process should be structured to take advantage of the strengths of both processes while avoiding their limitations. (Stewart thus accurately emphasizes that you can't eliminate professional judgment from the forecasting process.)

  • Improvements in information displays can frequently increase accuracy.

  To improve reliability, Stewart suggests:

  • Stress a small number of the most important cues for judgmental forecasting. This may involve segmenting a complex task into several simpler tasks. This suggestion applies when large amounts of information are available or when the subject involves inherent uncertainties and where no analytical method for processing the information is available.

  • Organize and present information in a form that clearly emphasizes the relevant information and separates out irrelevant data. Errors are introduced whenever the cues themselves must be evaluated or forecast or acquired perceptually - as in medical diagnoses, legal opinions and other professional opinions. This factor is less important for numerical data - as in business and economic forecasting.

  Much of the data used in economic forecasting - especially in macroeconomic forecasting - always appears more precise than it is. Much of it is derived from accounting procedures.
  As FUTURECASTS has repeatedly explained - even before the extreme examples of recent accounting scandals - the failure of macroeconomics is due in part to the impossibility of basing a science or any valid mathematical analytical technique on data derived from a nebulous professional art designed for other purposes that are frequently legitimately inconsistent with the production of valid economic data. Failure to properly evaluate economic data is an important factor in macroeconomic forecasting failures.

  • Several independent judgmental forecasts can be combined to increase reliability and average out unsystematic errors. Studies of weather forecasting, sales forecasting, and economic forecasting suggest that group forecasts tend to be more accurate than most individual forecasts. However, the existence of widespread systematic bias can undermine the superiority of aggregated group judgments. Also, when forecasts vary due to disagreements among forecasters, it is better to try to analyze and resolve those disagreements than just average them out. Nor will aggregation help negative or completely invalid forecasts. Stewart gives the example of budget forecasts by a legislature that wants to increase spending and an executive that wants to cut spending. (A brief sketch of this kind of combination follows this list.)

  • Use mechanical methods - especially computerized models - to process information. Even simple linear models can improve the reliability of information processing when mechanical processing doesn't involve the loss of important cues. The process remains dependent on human acquisition and judgment concerning the information to be inserted into the mechanical process - and breaks down as important information proves inappropriate for mechanical processing or is otherwise omitted. The risk of substantial - even catastrophic - failure of the mechanical process must be kept in mind.

  • The methods used for processing information should be justified - especially for tasks with inherently low predictability. The need to justify methods will create a more analytical - less intuitive - process that should increase reliability.
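  A minimal sketch of the combination point above, with hypothetical numbers: averaging several independent forecasts dampens unsystematic error, but a bias shared by all the forecasters survives the averaging.

```python
import statistics

# Hypothetical independent judgmental forecasts of the same quantity.
forecasts = [104.0, 97.0, 101.0, 110.0, 95.0]
print(f"Combined forecast: {statistics.mean(forecasts):.1f}")                # 101.4

# A systematic bias shared by every forecaster (say, +10 units of optimism)
# is not removed by averaging - it passes straight through.
biased = [f + 10 for f in forecasts]
print(f"Combined forecast with shared bias: {statistics.mean(biased):.1f}")  # 111.4
```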

Analyzing components of forecasting problems:

  "Decomposition for Judgmental Forecasting and Estimation" - by Donald G. MacGregor, Decision Research, Eugene, Or. - provides suggestions for when and how to break down complex forecasting problems into their component parts.

  By breaking down complex forecasting problems into average portions or discrete segments - "decomposition" - and then combining the results, forecasters can facilitate certain forecasting estimates. This technique is applicable to either multiplicative or segmented forecasts - although multiplicative decomposition can also greatly enlarge component errors. Decomposition facilitates numerical estimating based on partial or incomplete knowledge. It is a manner of proceeding when time and/or money constraints limit information acquisition - or when much of the information for some segments is simply not accessible.
  MacGregor cites the "enormous" research evidence proving the substantial increases in accuracy achieved by segmenting suitable complex problems. Choice of multiplicative or additive method depends on the characteristics of the forecasting problem. Numerical estimates aggregating differing classes of components - sales of apples plus oranges plus bananas - must be made separately and added together. Numerical estimates of similar components - revenues from sales of oranges in different stores - can be made by averaging results from a representative sample and multiplying. Of course, the two methods may be used together for suitable components of a single problem, and the choice may not always be clear cut.
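  A minimal sketch of the two forms, with hypothetical numbers: unlike components are estimated separately and added, while similar components are estimated from a representative sample average and scaled up.

```python
# Additive decomposition: estimate each distinct class, then sum.
apple_sales, orange_sales, banana_sales = 12_000, 8_500, 5_000
total_fruit_sales = apple_sales + orange_sales + banana_sales     # 25,500

# Multiplicative decomposition: average a representative sample, then scale.
sampled_store_revenues = [41_000, 38_500, 44_200, 39_800]
average_per_store = sum(sampled_store_revenues) / len(sampled_store_revenues)
estimated_chain_revenue = average_per_store * 230                 # 230 stores assumed
print(total_fruit_sales, round(estimated_chain_revenue))
```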

"Multiplicative decomposition is based on the assumption that errors in estimation are unrelated and therefore will tend to cancel each other out."

  Decomposition is useful when overall uncertainty is high. It should not be used merely to refine fairly reliable estimates, since risks of propagating component errors outweigh benefits. "Multiplicative decomposition is based on the assumption that errors in estimation are unrelated and therefore will tend to cancel each other out." When uncertainty levels are low, individual errors may not adequately cancel out.
  Methods of analyzing relative uncertainty are reviewed by MacGregor. Extreme instances are easy, but matters get less clear near medium uncertainty levels. Of course, it is also not useful if the components are more difficult to estimate than the whole.
  Combining different efforts and techniques - as always with estimating or forecasting - tends to increase accuracy. Combining the results of different estimators or forecasters, of different breakdown methods, and of other forecasting or estimating methods all seems to increase accuracy, especially since errors are believed to tend to cancel out in high uncertainty estimating and forecasting problems. However, evidence for this from reliable studies is sparse.

Delphi expert opinion surveys:

  "Expert Opinions in Forecasting: The Role of the Delphi Technique" - by Gene Rowe, Norwich Research Park, U.K., and George Wright, Strathclyde Univ. Business School, U.K. - covers the use of Delphi surveys - one of the most widely used and successful techniques for improving forecasts and planning based on expert judgment. "Delphi is a useful method for eliciting and aggregating expert opinion."

  Delphi techniques provide a feedback mechanism for groups of experts that substantially improves expert judgment. The technique is applicable wherever expert judgment must be relied upon because statistical techniques are not viable or practical.
  Delphi surveys provide an exchange of information among anonymous experts over a number of rounds - "iterations" - permitting panelists to react to the information thus gathered in each round to perfect their forecasts. The estimates on the final round are averaged to provide a group judgment.
  The authors conclude that Delphi groups should include between 5 and 20 experts with disparate knowledge of the subject - should continue through 2 or 3 rounds - with each expert providing estimates and justifications in each round. The procedure includes anonymity, iteration, controlled feedback, and statistical aggregation.

  • Experts should be chosen "whose combined knowledge and expertise reflects the full scope of the problem domain. Heterogeneous experts are preferable to experts focused in a single specialty."

  "Domain knowledge" is "expert's knowledge about a situation" - such as managers' knowledge about a brand and its market, or of episodic events - both in the past and likely in the future. Strikes, government regulations, major product or price changes or advertising campaigns or changes in the competitive environment are typical examples of causal factors that will impact outcomes.

  • Panelists must be permitted to express their opinions and judgments privately to eliminate various social pressures that may arise if they knew who said what. "Ideally, this should allow the individuals to consider each idea based on merit alone."

  • Proper construction of questions is important. "They should be long enough to define the question adequately so that respondents do not interpret it differently, yet they should not be so long and complicated that they result in information overload, or so precisely define a problem that they demand a particular answer. Also, questions should not contain emotive words or phrases" like the term "radicals." They should be balanced - emphasizing both sides of an issue rather than just one - and should avoid irrelevant clutter.

  • Iteration permits panelists to change their opinions in light of the information gathered from other experts "without fear of losing face."

  • For quantifiable forecasts, feedback should be in the form of "a simple statistical summary of the group response, usually a mean or median value, such as the average group estimate of the date by which an event will occur."

  • Inclusion of justification materially improves results. In a form of Delphi called "Nominal Group Technique," the panel assembles after each round to discuss, clarify and justify their responses prior to each succeeding round. However, it is uncertain whether the benefits of this discussion outweigh the problems of loss of anonymity. The authors recommend "eliciting anonymous rationales from all panelists."

  • An initial unstructured round may be used to permit panelists "to help specify the key issues to be addressed, rather than compelling them to answer a set of questions that they might feel were unbalanced, incomplete, or irrelevant."

  •  The procedure should continue until a degree of stability is achieved - something the facilitator must decide upon. Stability of opinions does not necessarily mean consensus. The authors recommend three structured rounds unless a high degree of instability remains. It should be kept in mind that experts are busy people, and may drop out in sufficient numbers to make subsequent rounds unviable.

  The optimal size of Delphi panels is uncertain, with some Delphi surveys including scores or hundreds of members. However, administrative costs in terms of time and money are always considerable and increase with the size of the panel. For quantifiable forecasts, large panels may have no advantage over smaller panels of at least 5 experts.
  Inclusion of non experts with experts in the survey panels will still improve the results if only the estimates of the experts are used for the final average. However, trying to weight the experts is generally of dubious value, so the ultimate average should generally be of equally weighted estimates - perhaps a trimmed mean excluding extreme results.
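  A minimal sketch of the statistical side of the procedure (the panel estimates are hypothetical): anonymous estimates are summarized with a median for controlled feedback, and the final round is aggregated with an equally weighted, lightly trimmed mean.

```python
import statistics

def feedback_summary(estimates):
    """Controlled feedback: report the group median, never who said what."""
    return statistics.median(estimates)

def trimmed_mean(estimates, trim=1):
    """Equally weighted mean after dropping the most extreme value at each end."""
    kept = sorted(estimates)[trim:len(estimates) - trim]
    return sum(kept) / len(kept)

# Hypothetical panel of 7 experts forecasting the year an event will occur.
round_1 = [2025, 2031, 2040, 2028, 2060, 2033, 2029]
print("Median fed back after round 1:", feedback_summary(round_1))   # 2031

round_2 = [2028, 2031, 2036, 2029, 2045, 2032, 2030]                 # revised estimates
print("Final aggregated forecast:", trimmed_mean(round_2))           # 2031.6
```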
  For laboratory studies, simplified Delphi surveys are commonly used with structured questions from round one and with feedback limited to estimates. The authors note the wide array of Delphi techniques in use in actual practice, and the lack of rigorous testing for all but simplified versions. Indeed, Delphi techniques may be more useful for "noncasting - for establishing current plans of action in reaction to current conditions - than for forecasting" the likelihood of change.
  The authors also discuss various means of removing bias from probability estimates, and note that the utility of Delphi surveys can vary greatly with the skill of the facilitator.

Conjoint analysis:

 


  "Forecasting with Conjoint Analysis" - by Dick R. Wittink, Yale School of Management, and Trond Bergestuen, American Express - covers the use of conjoint analysis - a method "used in marketing and other fields to quantify how individuals confront trade-offs when they choose between multidimensional alternatives."

  In order to get realistic responses for market surveys, it is necessary to do more than just ask what customers want. The survey must emphasize the tradeoffs participants will have to make. For example, a range of particular product features will be offered for increasing sums of money. Conjoint analysis is a method for designing appropriate questions, administering the survey, and analyzing responses to quantify various tradeoffs.
  Although practical complexities make it difficult to provide scientific proof of the validity of conjoint analysis, its wide acceptance and use in commerce in various forms indicates widespread satisfaction with its results. Easy-to-use software is now available to facilitate analysis.
  The authors cite the example of determining acceptance of a patented elastic waistband added to disposable diapers at stated increases in price. Respondents were asked to rank their preferences for a package of regular diapers at several different prices and elastic waistband diapers at several different prices. In some methods, respondents are asked to provide information on their intensity of preferences in a series of paired comparisons or on a 10-point scale. Various means can then be used to translate responses into numerical values for analysis.
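  A minimal sketch of the estimation step, with invented profiles and ratings rather than the study's actual data: ordinary least squares recovers part-worths from one respondent's ratings, and the ratio of the two coefficients expresses the waistband's worth in dollars.

```python
import numpy as np

# Columns: intercept, elastic waistband (1 = yes), price in dollars.
profiles = np.array([
    [1, 0,  8.99],
    [1, 0,  9.99],
    [1, 0, 10.99],
    [1, 1,  9.99],
    [1, 1, 10.99],
    [1, 1, 11.99],
])
ratings = np.array([7.0, 6.2, 5.1, 8.3, 7.6, 6.4])   # one respondent, 10-point scale

coef, *_ = np.linalg.lstsq(profiles, ratings, rcond=None)
intercept, waistband_worth, price_slope = coef
print(f"Part-worth of elastic waistband: {waistband_worth:.2f} rating points")
print(f"Rating change per extra dollar:  {price_slope:.2f}")
print(f"Implied dollar value of feature: {-waistband_worth / price_slope:.2f}")
```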

  A well designed survey should also permit breakdowns according to types of potential customers or for other segmentation of interests. With proper segmentations, a respondent's preferences will depend on the pertinent product characteristics and there will be "no doubt about the causal direction --- the relevant variables are known and specified, and the variations within attributes and covariations between attributes are controlled."

  "Of course, the quality of the results will still depend on other study elements, such as the number of attributes, the types of attributes, the definitions of attributes, the number of attribute levels, the ranges of variation and covariation, the allowance for attribute interaction effects in data collection and analysis, and so on."

  The survey must select respondents who

  • are the decision makers for the product market,

  • represent a suitable probability sample,

  • are asked questions designed to induce realistic marketplace responses,

  • about alternatives that "can be fully defined in terms of a modest number of attributes," and

  • are properly motivated to provide valid judgments.

  A distinction is drawn between surveys about changes in existing products - for which the procedure is readily applicable and analytical results can be repeatedly tested against actual market results - and the introduction of totally new products about which respondents lack familiarity. In the latter case, respondents must first be thoroughly familiarized with the product before the survey. The effectiveness of the survey - and the reliability of the analytical efforts - will inevitably vary widely when applied to totally new products.

  The validity of conjoint-based predictions is inherently difficult to determine because of the dynamic nature of pertinent influences. The authors caution that many other factors besides customer response will impact sales and market share. Among other things, product availability may be limited, advertising levels may limit customer awareness of availability, competing offerings may appear in the market, economic conditions may change, word of mouth influences may develop over time. The authors cover the efforts to test validity, the problems inherent in such efforts, and the relative strengths and weaknesses of various testing methods individually and in combination.
  Well designed conjoint studies have been shown to reasonably predict market behavior and have been useful in providing management with new information about potential customers. The authors provide a checklist and citations to applicable literature.
  The authors candidly provide a warning about conjoint analysis literature - (a warning that is really applicable to all estimating and forecasting literature - as well as to the literature about many other types of practical activities). Generally, it is the success stories that get published. Failures frequently go unreported. Nevertheless, the growing use of conjoint analysis in the marketplace over many years is the best evidence of its reliability and usefulness.

  Judgmental bootstrapping:

 


  "Inferring Experts' Rules for Forecasting" - by Armstrong - by examining the inputs and resulting predictions, and the characteristics of the forecasting problem, is called "judgmental bootstrapping." It is a type of limited expert system - also called "policy capturing." Bootstrapping - or "linear" - models resemble econometric models except that they use expert forecasts rather than actual outcomes as the dependent variable.

  Once developed, bootstrap models can provide reliable expert forecasts inexpensively and rapidly, and aid learning and the conduct of experiments. They are less expensive than more elaborate expert systems. They can reveal best practices. They are especially useful when large numbers of forecasts are needed - as in hiring decisions. Armstrong reports that several successful sports franchises use bootstrapping models to assist in selecting athletes.

  "It can help to identify and reduce biases, improve reliability, make the predictions by the best experts available for use by others with less expertise, reduce costs of forecasting, and provide forecasts rapidly."

  Bootstrap models can be combined with conjoint analysis to forecast sales for new products. They can be used to assure that decisions that might be controversial - like university admissions - are being made fairly and consistently. (But, today, many universities prefer political correctness in admissions rather than any fair estimates based on rates of student success.)

  Bootstrapping models are based only on data that experts use. They apply "only to studies in which an expert's rules are inferred by regression analysis."
  Indeed, although not as comprehensive or flexible as an expert, they frequently achieve better results than the unaided experts themselves because the bootstrapping models achieve consistency in application. Complexity may undermine the ability of experts to consistently and efficiently analyze causal relationships, or judge feedback.
  Where relevant data is lacking or of such poor quality as to make econometric models unreliable - (something that occurs far more often than many econometrics analysts would care to admit) - judgmental bootstrapping models may achieve superior results.

  Bootstrapping models are most appropriate "for complex situations, where judgments are unreliable, and where experts' judgments have some validity." They should be considered for complex problems - but not too complex for practical modeling - where reliable expert estimates can be obtained, feedback is sufficient to judge the reliability of the estimates, and where the alternative is to use the judgment of unskilled individuals. Also, for situations where many different experts are needed, a bootstrap model can be cost effective. If relevant historical data is available, models can be tested for reliability.
  Unlike other expert systems, bootstrapping models use only the expert's predictions and cues. It is important to use all the variables the expert might use - something that takes careful interviewing to do since the experts might not be aware of some of the variables that influence their judgment. Interviewing a variety of experts in a variety of ways and examining pertinent literature can reveal such variables.

  "While it is important to include all important variables, the number of variables should be small. Studies that use regression analysis to infer relationships can seldom deal effectively with more than five variables, and often three variables can tax the system."


  Thus, the model must be competently simplified to include only the vital variables. These then must be quantified in meaningful ways, sometimes based on experts' ratings or on evaluation of previous history. Where good feedback is available - such as in weather forecasting and economic market analyses - the best experts can be identified and their evaluation of variables relied upon. If invalid variables are used, the model's consistency will increase unreliability instead of decreasing it. Developing models based on several different types of experts or groups of experts should help reveal errors and biases and establish best practices.
  A substantial number of stimulus cases should be presented to test expert practices and use of variables. Armstrong suggests use of something in excess of 20 examples - and use in excess of 100 examples may sometimes prove useful. By employing a wide range of stimulus cases, the applicability of the bootstrapping model can be broadened to cover a wide range of possibilities. Historical examples should be used. "It is particularly important to introduce variations for factors that have been constant in the past but might change in the future," Armstrong emphasizes.

  "The predictive validity of the bootstrapping model is not highly sensitive to the type of regression analysis used. More surprisingly, it is typically not sensitive to the estimates of the magnitudes of relationships. The key steps then are to (1) obtain the proper variables, (2) use the correct directions for the relationships, and (3) use estimates for relationships that are approximately correct."

  Obviously, if and as actual results become available, the model should be recalibrated to improve results. All models fail when major unexpected changes occur - but since this is infrequent, it has not been a problem in practice.

Combined Judgmental and Statistical Estimating and Forecasting Methods

  All statistical estimating and forecasting methods include major judgmental components. Reliability always depends largely on the skill of the analyst.

Trend analysis:

 


  "Forecasting Analogous Time Series" - by George T. Duncan, Wilpen L. Gorr and Janusz Szczypula, School of Public Policy, Carnegie Mellon Univ. - covers the ancient and widespread use of analogies for trend forecasts - both judgmental and statistical. For commercial purposes, trend analysis can provide highly reliable forecasts.


  Techniques for examining analogous time series - with mechanisms to recognize step jumps, turning points, and time trend shape changes - are discussed.
  By analyzing data from several analogous time series - "pooling" - estimating and forecasting reliability can be substantially increased - especially where sample size is small and data points are few. Pooling methods and grouping methods should be kept simple - expert judgment or co-movement clustering prove superior to sophisticated model-based clustering. Pooling provides additional data, extending the effective sample size.

  "[S]everal phenomena and business practices give rise to analogous products [and other analogous dependent variables] that organizations must forecast. The pooling methods we discuss are intended primarily to draw on pooled time-series data from analogous variables to improve forecast accuracy. The additional benefit of pooling is forecast explanation, especially for univariate forecasts. In explaining a forecast, one can use additional information from equivalence groups that all or most members of analogous products persist in trends, or have similar trends."

  Pooling can include similar products, services or other subjects in the same geographic areas - or the same products, services or other subjects in different geographic areas or time periods. "To be useful in forecasting, analogous time-series should correlate positively - - - over time." This co-variation can enhance precision and quickly adapt to pattern changes such as step jumps or turning points. Analogous groups may be identified by several methods - expert judgment, correlated movements, model-based clustering, and/or total population pooling.
  The authors provide examples of different "equivalence groups" divided spatially or economically to forecast such things as marketing trends, economic trends, or the spread of infectious disease. Suitable equivalence groups for pooling purposes must be identified, and each individual time series must be standardized to eliminate differences in magnitude. Pooling methods must be checked over time to eliminate divergent time series.

  Bayesian pooling is emphasized by the authors. Various statistical methods based on sample means or sample standard deviations or percentages of totals allow the use of less than ideal data by removing differences in magnitudes and variances. Standardized data is constantly recalculated. The construction and combining of local models and group models for Bayesian pooling is explained.
  The "shrinkage" formulas that yield weights for combining local and group parameter estimates are explained and an example provided. The manner in which these weights react to volatility results in rapid and accurate adjustments to new patterns. Time series should be monitored to manually switch shrinkage rates for pattern changes where practical.
  Assessment of pooled forecasting is more complex than for conventional time-series methods, and has never been rigorously done. The authors present their suggested specifications for such studies.
  The authors suggest that pooling for time series analyses is most useful

  • when time series are highly volatile,
  • when "outlier data points" are present, and
  • when time series patterns differ greatly across cluster groups and there is strong co-movement within cluster groups.

Extrapolation:

 


 "Extrapolation for Time Series and Cross Sectional Data" - by Armstrong - is a widely used, reliable forecasting method whenever past behavior is a good predictor of future behavior. It is objective, inexpensive, quick, easily automated, and replicable. Risks arise if relevant variables are overlooked - or if there is a sudden break in trend.
 &

  Extrapolation is typically used for inventory and production forecasts up to two years into the future, and for some long-term forecasts of stable trends such as population forecasting. "Researchers and practitioners have developed many sensible and inexpensive procedures for extrapolating reliable forecasts."
  Extrapolation - like all mathematical reasoning - can only be as good as the data employed. Armstrong notes an example. Income data indicate poverty increased in the U.S. after 1972 - but consumption data demonstrated significant declines in poverty. (This example has been repeatedly referred to in FUTURECASTS.) Developing, recognizing and using appropriate data is essential for reliable analysis, and depends on the skill - and intentions - of the analyst.

  The figures don't lie - but liars can figure! Although this book recognizes bias factors, it properly deals primarily with good faith employment of statistical analytical methods. It is well to keep in mind that there is always ample opportunity for abuse of these methods by advocacy scholars and other propagandists.

  Data may be derived from historical records of the situation studied, or of analogous situations - from laboratory experiments - or from field experiments. Historical data is best for forecasting small changes but worst for forecasting large changes, where field experiments perform best. Drawing data from analogous situations is the worst of the three methods for small changes and the middle choice for large changes.

  Analyst skill (and intentions) also play major roles - in structuring the problem - segmentation of the problem - making inflation and similar adjustments - "cleaning the data" to correct for erroneous data - identifying and eliminating "outlier" data - applying statistical techniques such as various averaging methods - aggregating intermittent data across time or geographic area - adjusting for the impact of sporadic events such as strikes or wars - and adjusting for seasonal influences by a variety of statistical methods.
  The extrapolation process itself may require considerable judgment to deal with

  • multiple and not completely consistent data sources - such as various political polls, or
  • recent major changes that have not yet been reflected in the data - like an advertising campaign or price change for the product or a competing product, or
  • events that will "bend the trend" into some substantially new course - like an advertising campaign or price change.

  Armstrong advises that simplicity usually aids reliability when choosing an extrapolation method. He also advises that the most recent data should be the most heavily weighted unless "the series is subject to substantial random measurement errors or to instabilities" which might be magnified by such weighting. "Exponential smoothing" of moving averages may be especially useful for short period forecasts, but its benefits decline for longer forecasts. He also advises conservatism - especially when dealing with volatile trend lines or periods of trend acceleration in long term forecasts.
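  A minimal sketch of simple exponential smoothing on a hypothetical demand series; the smoothing constant here is an arbitrary choice, and in practice it can be tuned by the kind of grid search mentioned below.

```python
def exponential_smoothing(series, alpha=0.3):
    """Return the one-step-ahead forecast; recent observations weigh most heavily."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

monthly_demand = [120, 132, 118, 140, 151, 149, 160]   # hypothetical
print(f"Next-period forecast: {exponential_smoothing(monthly_demand):.1f}")
```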
  He points out that, while there are statistical methods for evaluating which extrapolation method to use, the "structured judgment" of the analyst remains important. Today, "computerized searches for the best smoothing constants [grid search routines] can help improve accuracy."
  Of course, extrapolations should be recalculated as new data comes in to improve accuracy. Some care must be taken to recognize transient events - like a strike - but such recalculation translates a long term forecast into a series of short term forecasts which will usually be more reliable.
  Armstrong advises against relying on cycle forecasts unless the length and amplitude are fairly certain - as for daily use of electricity. Broader cycle forecasts for economic or sociological trends have proven very inaccurate.

  Methods for estimating forecast reliability - "prediction intervals" - are also reviewed by Armstrong. Reliability problems naturally increase greatly with the length of the forecast period. Prediction intervals generally err on the optimistic side and are set too narrowly. However, where asymmetric errors are common - as in management and social sciences studies - errors will occur on both the narrow and wide side. Domain knowledge must be relied upon to evaluate the likelihood of substantial changes in causal forces.
  Extrapolation can be useful and effective where large numbers of forecasts are needed, as in inventory forecasts - and for stable situations where there is little knowledge of causal changes or where adjustments for known causal changes are possible. Extrapolations are less subject to unintentional forecaster bias, and they are useful for establishing policy outcome benchmarks.

Neural networks:

 

 

&

  "Neural Networks for Time-Series Forecasting" - by William Remus, College of Business Administration, Univ. of Hawaii, and Marcus O'Connor, Univ. of New South Wales, Australia - perform best for (1) monthly and quarterly forecasting, rather than annual forecasting, (2) for forecasting based on discontinuous series, and (3) for forecasts that are several periods out on the forecast horizon. Less complex and less expensive time-series methods should be used for less complex and shorter term forecasting problems.
 &

  Many of the suggestions applicable to traditional extrapolation models apply to neural networks. This is a relatively new kind of technique, and experimentation and development are ongoing.
 &
  The authors provide a commonly used "back propagation" example stressing the weighting of inputs, the inclusion of mathematical adjustments - "node bias" - and feedback mechanisms to sharpen these mathematical techniques. "Learning occurs through the adjustment of the path weights and node bias." Various adjustment techniques are available. Since the networks of neurons contain many "nonlinear neurons," the networks can capture fairly complex phenomena.
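  A bare-bones sketch of those back-propagation mechanics follows - a toy network fitted to an artificial series, not the authors' code. The network size, learning rate, and data are illustrative assumptions; real applications would use the specialized software discussed below.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy series and a lagged-input design: predict y[t] from the previous three values.
    series = np.sin(np.arange(60) / 4.0)
    X = np.array([series[t - 3:t] for t in range(3, len(series))])
    y = series[3:].reshape(-1, 1)

    # One hidden layer of nonlinear (tanh) neurons; path weights and node biases are the parameters.
    W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros((1, 8))
    W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros((1, 1))

    lr = 0.05
    for epoch in range(2000):
        # Forward pass.
        hidden = np.tanh(X @ W1 + b1)
        output = hidden @ W2 + b2
        error = output - y
        # Backward pass: propagate the error to adjust path weights and node biases.
        grad_out = 2 * error / len(X)
        dW2 = hidden.T @ grad_out
        db2 = grad_out.sum(axis=0, keepdims=True)
        grad_hidden = (grad_out @ W2.T) * (1 - hidden ** 2)
        dW1 = X.T @ grad_hidden
        db1 = grad_hidden.sum(axis=0, keepdims=True)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

    # One-step-ahead forecast from the last three observations.
    last = series[-3:].reshape(1, -1)
    print(float(np.tanh(last @ W1 + b1) @ W2 + b2))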
 &
  "They excel in pattern recognition and forecasting from pattern clusters." They have the ability to approximate other forecasting models, but where other models perform well, the other models will be easier to develop and use. Because of their inherent complexity, neural networks need a substantial number of observations. "Neural networks have more parameters to estimate than most time-series forecasting models."
 &

  Particular suggestions include:

  • The data must be inspected and cleaned of "outliers" prior to model building. "Outliers make it difficult for neural networks to model the true underlying functional form."
  • The data should be scaled and deseasonalized prior to estimation of the forecasting model (a minimal sketch follows this list).
  • Available software should be used to help find good starting points and achieve proper modeling results and network size. Oversized networks - besides complicating use - tend to produce poor results. Applicable software is being constantly improved, so recent reviews should be consulted before purchase.
  • Of course, results should be constantly checked to assure that the neural network model is valid.
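  The scaling and deseasonalizing step mentioned above might, in a minimal form, look like the following sketch. The quarterly figures and the ratio-to-average seasonal indexes are invented for illustration.

    import numpy as np

    quarterly = np.array([120., 95., 88., 140., 128., 101., 93., 151.])   # two years of quarterly data
    period = 4

    # Deseasonalize: divide each observation by the average ratio for its quarter.
    seasonal_index = np.array([quarterly[i::period].mean() for i in range(period)])
    seasonal_index /= seasonal_index.mean()
    deseasonalized = quarterly / np.tile(seasonal_index, len(quarterly) // period)

    # Scale to the 0-1 range that many network training routines expect.
    scaled = (deseasonalized - deseasonalized.min()) / (deseasonalized.max() - deseasonalized.min())

    print(np.round(seasonal_index, 3))
    print(np.round(scaled, 3))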

Rule-based time-series extrapolation:

 

&

  "Rule-Based Forecasting Using Judgment in Time-Series Extrapolation" - by Armstrong, Monica Adya, Dep't. of Management, DePaul Univ., and Fred Callopy, The Weatherhead School, Case Western Reserve Univ., -applies judgment from forecasting expertise and domain knowledge to develop and apply rules for combining extrapolations. Expectations about trends - "causal forces" - are identified to assign weights to extrapolations.
 &

  Conditions where rule-based forecasting improves accuracy include: when long interval - annual or longer - data are used, good domain expertise is available, causal forces are clearly identified, causal forces conflict with trend, significant trends exist, uncertainty and instability are not too high, and/or there is a long-range forecast horizon. Domain knowledge seems to be the most important of these.

  "Empirical results on multiple sets of time series show that [rule based forecasting] produces more accurate forecasts than those from traditional extrapolation methods or equal weights combined extrapolations. [Rule-based forecasting] is most useful when it is based on good domain knowledge, the domain knowledge is important, the series is well-behaved [such that patterns can be identified], there is a strong trend in the data, and the forecast horizon is long."

    Even where these conditions don't apply and rule-based forecasting offers little or no advantage, rules based on causal forces still improved "the selection of forecasting methods, the structuring of time series, and the assessment of prediction intervals."
 &
  Cost in time and money - and short forecasting time horizons - are major limitations on the use of rule-based forecasting. On the other hand, econometric models should perform better where large changes are involved and budgets and data on causal variables are adequate.
 &
  Expert systems can also be costly to maintain when domain knowledge is changing rapidly. Changes in pertinent priorities or the loss of development personnel needed for maintenance of the system may lead to abandonment.

  Always, even where mathematical methods are most applicable, the skill and judgment of the analyst determines the reliability of the result.

In commercial forecasting, domain knowledge is managers' knowledge of episodic events - both past and likely in the future.

  In commercial forecasting, domain knowledge is managers' knowledge of episodic events - both past and likely in the future. Strikes, government regulations, major product or price changes or advertising campaigns or changes in the competitive environment are typical examples of causal factors that will impact trends. Expert knowledge is also used to employ such statistical methods as combining forecasts and dampening trends.

  "Using production [if-then] rules, [rule based forecasting] determines what weights to give to the forecasts. Features of the situation are identified in the condition [if] part of the rules, and weights are adjusted to match features to the underlying assumptions of the methods. In effect, [rule-based forecasting] uses structured judgment to tailor extrapolation methods to situations."

  Applicable expert rules and their proper usage can be found in the growing literature on the subject. Research into applicable techniques has been derived not just from interviewing expert practitioners, but from actually observing them in action. There are now roughly 100 conditional statements derived for rule-based forecasting. Some of the most broadly applicable are:

  • Decomposition of time-series into level and trend is useful because "domain knowledge can affect the estimates of level and trend differently."
  • Simple methods are generally as accurate as complex methods, and reduce potential for mistakes. Complex methods should generally be used only for the conditions they were designed for.
  • Combining forecasts from different methods almost always improves accuracy. The authors refer to random walk regression against time, and various linear exponential smoothing techniques, damped trend, and "robust trend" models.
  • Use different methods for short, medium, and long term forecasts. Random walk is most useful for short term, exponential smoothing for medium term, and linear regression for long term, with the various results "blended" by weighting them in relation to the forecast horizon (see the sketch after this list).
  • To reflect uncertainty, trend damping should be applied as forecast horizons lengthen or other uncertainties are identified.
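  The horizon "blending" noted in the list above can be sketched as follows; the weighting scheme and the three input forecasts are purely illustrative.

    # Blend three forecasts by horizon: the random walk dominates the short term,
    # exponential smoothing the medium term, and the regression trend the long term.
    # The weighting scheme below is a simple illustration, not the book's rule base.
    def blended_forecast(rw, smooth, regress, horizon, max_horizon=12):
        w_long = horizon / max_horizon            # grows toward 1 as the horizon lengthens
        w_short = 1.0 - w_long
        w_medium = 4.0 * w_short * w_long         # peaks in the middle of the horizon
        total = w_short + w_medium + w_long
        return (w_short * rw + w_medium * smooth + w_long * regress) / total

    for h in (1, 6, 12):
        print(h, round(blended_forecast(rw=100.0, smooth=104.0, regress=112.0, horizon=h), 1))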

  Factors affecting trends are obtained from the cognizant domain experts and are classified as "growth" or "decay" forces, forces "opposing" the historical trend, forces "regressing" toward the mean, or forces "supporting" the trend. The last category is largely theoretical, since supporting forces are generally already assumed in the identified trend. Major events whose net impact is "unknown" should nevertheless be identified. Adjustments for past or expected episodic events should be made by various statistical methods.
 &
  Decomposing forecasts to reflect causal forces that act in different directions is risky but can increase accuracy in some instances. The value of such decomposition must be tested to ascertain if errors are cumulating.
 &
  The authors provide checklists for the features of time-series - the elements of domain knowledge and applicable historical data - and the appropriate weighting of different extrapolation methods combined for forecasts.
 &
  Thus, traditional extrapolation models do well when the historical trend is reinforced by identified causal forces. However, if identified causal forces are contrary to historically estimated trends, the trends should not be used. Sometimes, differences in basic and recent trends permit use of one but not the other in line with causal forces. Causal forces do not always support the trend.
 &
  Further, when a recent trend differs from the basic trend and causal forces are unknown, the trend extrapolation should be conservative. Uncertainty should generally favor conservative methods.
 &
  Models and weighting must be appropriate for short, medium, and long term forecast horizons. The presence of discontinuities increases the weighting of the latest observations - while instability and uncertainty favor equal weighting. The current level "can be adjusted in the direction implied by the causal forces."
 &

Expert systems:

 

 

&

  "Expert Systems for Forecasting" - by Callopy, Adya, and Armstrong - should be used in cases involving many forecasts - for complex problems where data on the dependent variable are of poor quality - and where problem structure is inadequate for statistical techniques but adequate enough for the development of applicable rules. Examples include oil well drilling, product introductions, and choice of medical drugs.
 &

Expert systems are far less costly to use than employing experts for estimation and forecasting work. And, they assure consistency.

  Establishment and improvement of expert systems requires considerable research involving textbooks, research papers, interviews, surveys, and especially protocol analyses into the methods of cognizant experts. However, once developed, they should be easy to use. They should incorporate best practices and knowledge, and reveal applicable reasoning. They are far less costly to use than employing experts for estimation and forecasting work. And, they assure consistency.
 &

Actual analysis of the experts in action is often needed where the experts themselves cannot explain or have no conscious knowledge of their actual practices.

  The authors discuss the major methods for detailing how experts work. Actual analysis of the experts in action is often needed where the experts themselves cannot explain or have no conscious knowledge of their actual practices. However, this is very costly in time and money. Several experts should be studied. Where econometric studies are available, they can be very helpful. Judgmental bootstrapping and conjoint analysis can also be helpful where applicable.
 &
  Simplification is - here, too - a major virtue. "An expert system should not impose cognitive strain on its users." Simplification can consist of a structure that is intuitive, or that makes "reasonable assumptions about defaults that apply to most common situations" and that is designed to be altered only as needed. Most models use "if-then" production rules.
 &
  Nevertheless, completeness - encoding all key variables of the forecasting problem - is essential to meet the expectations of users. It is also essential because omitted features - even when otherwise widely applied - will tend to be overlooked by the user relying on a model that doesn't include them.
 &
  Disclosure of applicable reasoning is essential to facilitate proper use, to develop further improvements, to resolve anomalies, and to learn best practices. As with all forecasting techniques, testing for reliability should both precede and accompany use.
 &
  Expert systems have many other uses - such as planning and monitoring, design and interpretation, and providing professional advice.
 &

Econometric models:

 

&

  "Econometric Forecasting" - by P. Geoffrey Allen, Dep't. of Resource Economics, Univ. of Massachusetts, and Robert Fildes, The Management School, Univ. of Lancaster, U.K. - has had its problems. However, results seem to have improved since the 1980s with the development of the "vector autoregression" approach - now the most widely used type of model - which increases emphasis on dynamics.
 &

Econometric forecasting methods have been beaten by less sophisticated methods "more often than they should." "The problem is that we are some way from knowing the correct strategy and principles, although we do seem to be making progress."

  Still, model building efforts can go wrong at a number of points, so development of a well-specified model requires high levels of professional skill. As might be expected, broad validation of econometric methods is difficult and scarce since all models are so different.
 &
  Econometric forecasting methods have been beaten by less sophisticated methods "more often than they should." "The problem is that we are some way from knowing the correct strategy and principles, although we do seem to be making progress."

  In macroeconomic forecasting, econometric forecasts are dependent not only on the skill of the model builder, but also on the validity of economic theory - and the accuracy of economic statistics - which are often the cumulative product of varied accounting treatments designed for other purposes than the generating of economic statistics.
 &
  As has been highlighted by recent accounting developments, accounting is a nebulous practical art which - at best - performs remarkably well with very difficult subject matter for its designed business purposes - but in no sense produces data that is as precise as it appears or that is valid for purposes of economic analysis. The economic theory for most macroeconomic forecasts relies on Keynesian concepts that frequently confuse cause and effect and are clearly invalid at many points.

"If your carefully developed and tested econometric model cannot forecast better than such a simple alternative, all your effort has been in vain."

  However, for a wide variety of microeconomic and other narrower purposes - where the quantitative inputs are known and the causal variables can be accurately evaluated - econometric analysis has been performing with increasing reliability.
 &
  Armstrong advises use of econometric methods "when (1) the causal relationships can be estimated accurately, (2) the causal variables change substantially over time, and (3) the change in causal variables can be forecasted accurately." Appropriate testing is essential to determine whether the first and/or third requirements are not being met.
 &
  Here, too, to assure reliability, the model should be simplified as much as possible - but without compromising the sophistication needed to include all substantial variables. The authors provide advice on the following tasks.

  • Defining the objectives of the modeling effort.
  • Determining the applicable variables based on pertinent economic theory or other work, and then reducing the number to the essential few.
  • Collecting all the data, and adjusting it for episodic events like wars, strikes, regulatory changes, etc. Where essential data doesn't exist or is inadequate, proxy data must be identified and collected instead.
  • Constructing the model. The authors stress vector autoregression models (see the sketch after this list).
  • Estimating the model to suit the patterns of the data.
  • Testing the model to confirm its specifications. Since any econometric model will fail some tests, judgment is required when determining the changes called for by the tests.
  • Testing the specifications to aid in simplifying the model.
  • Comparing the relative accuracy of the model to the accuracy of simpler alternatives. "If your carefully developed and tested econometric model cannot forecast better than such a simple alternative, all your effort has been in vain."
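  The following sketch, using invented data and plain least squares, illustrates both the first-order vector autoregression idea stressed by the authors and the comparison against a naive "no change" benchmark urged in the last item. It is an illustration of the general approach, not the authors' procedure.

    import numpy as np

    rng = np.random.default_rng(1)

    # Two related artificial series (say, sales and advertising), 40 quarters.
    n = 40
    y = np.zeros((n, 2))
    for t in range(1, n):
        y[t, 0] = 0.6 * y[t - 1, 0] + 0.3 * y[t - 1, 1] + rng.normal(0, 0.5)
        y[t, 1] = 0.5 * y[t - 1, 1] + rng.normal(0, 0.5)

    # Fit a first-order vector autoregression by ordinary least squares:
    # y[t] = c + A @ y[t-1] + error, estimated over the first 30 observations.
    train = 30
    X = np.hstack([np.ones((train - 1, 1)), y[:train - 1]])
    coefs, *_ = np.linalg.lstsq(X, y[1:train], rcond=None)

    # One-step-ahead forecasts for the hold-out period, from the VAR and
    # from the naive "no change" benchmark.
    var_errors, naive_errors = [], []
    for t in range(train, n):
        var_pred = np.hstack([1.0, y[t - 1]]) @ coefs
        var_errors.append(abs(var_pred[0] - y[t, 0]))
        naive_errors.append(abs(y[t - 1, 0] - y[t, 0]))

    print("VAR MAE  ", round(float(np.mean(var_errors)), 3))
    print("naive MAE", round(float(np.mean(naive_errors)), 3))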

  Other uses for econometric models include as assistance in strategy analysis or policy analysis. 

  A surprising number of users and academics question whether such models need to be valid at all for forecasting purposes. How invalid models can be good for anything more than propaganda to impress the credulous is a good question.

Selecting Methods

Selection:

  "Selecting Forecasting Methods" - by Armstrong - provides guidance in determining the efficacy of the various methods for various problems.

  "Given enough data, quantitative methods are more accurate than judgmental methods. When large changes are expected, causal methods are more accurate than naive methods. Simple methods are preferable to complex methods -- they are easier to understand, less expensive, and seldom less accurate. To select a judgmental method, determine whether there are large changes, frequent forecasts, conflicts among decision makers, and policy considerations. To select a quantitative method, consider the level of knowledge about relationships, the amount of change involved, the type of data, the need for policy analysis, and the extent of domain knowledge. When selection is difficult, combine forecasts from different methods."

Many analysts will force the technique or techniques they are most familiar with on the problem at hand. Selection of an inappropriate method may lead to large errors.

 

Structured judgment - when competently done - has been shown to enhance reliability and validity.

 

The relative success of financial advisers, for instance, is seldom known or accurately evaluated. Also, past success is no guarantee of future success - especially if the relevant past was relatively stable.

  Armstrong examines six general frequently applicable considerations.

  • Convenience - a major consideration where cost is an issue or achieving best levels of precision is not critical. These are common occurrences. 

  • How the client may unwittingly be making the choice when choosing an analyst. Many analysts will force the technique or techniques they are most familiar with on the problem at hand. Selection of an inappropriate method may lead to large errors.

  • Market popularity - using what other people in similar circumstances are using - can simplify the choice process. However, details that might affect the choice are rarely reported. Were long-term or short-term forecasts involved? - industrial or consumer goods? - existing products or new ones? - large firms or small? - what particular questions were asked? - and with what success?

  Usage is frequently unrelated to efficacy. "Forecasters use expert opinions even when ample evidence exists that other methods are more accurate, - - -." Since usage surveys frequently overlook methods such as role playing, judgmental bootstrapping, conjoint analysis, and expert systems, "market popularity is the enemy of innovation."

  • Structured judgment - when competently done - has been shown to enhance reliability and validity. A list of important evaluation criteria is created and weighted. Accuracy, ease of use and ease of interpretation are generally high on the list. Then, experts familiar with the forecasting problem can be independently surveyed to rate the methods. A Delphi survey may be useful.

  • Statistical criteria - can be usefully employed to facilitate the choice from among similar quantitative methods - such as whether to employ seasonal factors or trend dampening methods in extrapolation forecasts - or which econometric model to use. Statistical criteria are of little use - and may be misleading or blind to appropriate alternatives - when the choice is among substantially different methods.

  • Relative track records - assessed in a systematic, unbiased and reliable way - can be useful - but must be more than just user satisfaction. Unfortunately, good records of success are seldom kept, and records of failure are even more scarce. The relative success of financial advisers, for instance, is seldom known or accurately evaluated. Also, past success is no guarantee of future success - especially if the relevant past was relatively stable.

"An operational definition of simple is that an approach can be explained to a manager so clearly that he could then explain it to others."

  Armstrong provides suggestions - "principles" - from published research.

  • The forecasting process should be a structured process developed and tested by competent professional research.

  • Quantitative methods are superior to judgmental methods if enough usable data exists. However, the forecaster must be reasonably competent with the method selected, and unnecessary complexities should be eliminated.

  • Methods evaluating causal influences are superior to simple statistical methods such as extrapolation - especially where large changes are likely or where forecasting horizons are long.

  • Simplicity is a virtue - complexity should be avoided unless essential. "An operational definition of simple is that an approach can be explained to a manager so clearly that he could then explain it to others." 

  The author provides a "selection tree" graphic for forecasting methods - with explanations for usage - much of which duplicates material in earlier chapters - but which is usefully brought together here.
 &

Combining:

 

&

  "Combining Forecasts" - by Armstrong - emphasizes one of the book's most important repetitive themes. Accuracy can be increased by combining forecasts based on different methods and drawing from different sources of information - and also by using different forecasters. Uncertainty - about the situation or best methodology - and a need to avoid large errors - favor the combining of forecasts.
 &

  "Combining can reduce errors arising from faulty assumptions, bias, or mistakes in data." Weather forecasters have found that combining longer term forecasts with short term forecasts as they arise improves results. However, combining similar forecasts risks accentuating positively correlated errors. "When inexpensive, it is sensible to combine forecasts from at least five methods."

  "Even if one can identify the best method, combining may still be useful if other methods contribute some information. If so, differential weights may improve accuracy."

  The various familiar averaging methods and weighting methods are discussed by the author. Unequal weighting risks the introduction of bias, but is useful when track records are strong.
 &
  Armstrong advises:

  "Use combined forecasts when more than one reasonable method is available and when there is uncertainty about the situation and the selection of method. Draw upon forecasts that make use of different information, such as forecasts from a heterogeneous group of experts. Use methods that analyze data in different ways.
 &
  "If your have five or more forecasts, trim the mean, for example, by dropping the highest and lowest forecasts. Equal weighting provides a good starting point, but use differential weights if prior research findings provide guidance or if various methods have reliable track records in your situation. To the extent that these sources of evidence are strong, one can improve accuracy with larger departures from equal weights."

  Generally, the gains from combining increase as the forecasting horizon lengthens and also as the differences in the different forecasts increase. Indeed, a "structured approach for combining independent forecasts is invariably more accurate" than forecasts made by groups of experts.
 &

Integrating judgment with quantitative forecasts:

 

&

  "Judgmental Time-Series Forecasting Using Domain Knowledge" - by Richard Webby, Telecordia Technologies, and O'Connor and Michael Lawrence, School of Information Systems, Univ. of New South Wales, Australia - emphasizes a particular aspect of the combining theme - that reliability and accuracy can be enhanced by making appropriate judgmental adjustments in quantitative methods. Studies demonstrate that "available and valid domain knowledge improves forecast accuracy." This is a theme repeated in all the statistical methods chapters.
 &

Reliability and accuracy can be enhanced by making appropriate judgmental adjustments in quantitative methods.

 

"Forecasters are typically left making judgmental adjustments to objective forecasts for a variety of hard-to-model factors," like a new promotional campaign of a client or competitor - product diffusion rates - government regulations - new technological developments - production problems - things that cause discontinuities or trend changes.

  Expert judgment may be essential in dealing with "soft" information or relatively inaccessible domain knowledge, but risks introducing biases or inefficiencies.

  "Domain knowledge should be used when there is a large amount of relevant information, when experts are deemed to possess it, and when the experts do not appear to have predetermined agendas for the final forecast or the forecast setting process. Forecasters should select only the most important causal information, adjust initial estimates boldly in the light of new domain knowledge, and use decomposition strategies to integrate domain knowledge into the forecast."

  "Forecasters are typically left making judgmental adjustments to objective forecasts for a variety of hard-to-model factors," like a new promotional campaign of a client or competitor - product diffusion rates - government regulations - new technological developments - production problems - things that cause discontinuities or trend changes.
 &
  This is true even though pure quantitative methods frequently outperform pure judgmental methods. By combining the two, bias and inefficient consideration of available data - which increases as the amount of data increases - can be substantially reduced by a structured presentation in quantitative form. Structure can also help experts integrate multiple factors into their judgments. Decomposition of the problem can help deal with particular causal factors. By using independent domain experts, bias can be further reduced.
 &

Adjustments:

 

&

  "Judgmental Adjustments of Statistical Forecasts" - by Nada R. Sanders, Dep't. of Management Sciences and Information Systems, Wright State Univ., and Larry P. Ritzman, Operations and Strategic Management, Boston College - emphasizes the need to adjust statistical forecasts to conform to judgmental factors in appropriate cases.
 &

"Statistical models are only as good as the data upon which they are based."

 

Judgmental adjustments have been shown to be useful for macroeconomic forecasts.

  Judgment must be used to assure valid data inputs. "Statistical models are only as good as the data upon which they are based." (GIGO - "garbage in, garbage out" - is the first law of mathematical reasoning.) As might be expected, studies show significant increases in forecast accuracy when domain knowledge is applied to statistical forecasts. Judgmental adjustments have been shown to be useful for macroeconomic forecasts. (No surprise here!)

  The weaknesses in modern economic theory based on Keynesian concepts, and the weaknesses in available economic data inputs, make reliance on judgment essential in macroeconomic forecasting, and render any appearance of precision in statistical methods illusory. The most that can be expected are indications of trend acceleration or deceleration, turning points, and timing approximations. Nor is any more required for policy purposes, since economic policy is invariably a blunt instrument at best.

Poor record keeping is a widespread forecasting weakness that undermines evaluation and hinders efforts at improvement of forecasting methods.

 

Biases include optimism, wishful thinking, lack of consistency, and manipulation for political, ideological, economic, or other self serving interests. They can be innocent or deliberate.

  Adjustment processes should be structured, documented, and periodically checked for accuracy. The authors mention several methods for structuring the judgmental adjustment process. If the process can be mechanically integrated, that should be considered. The magnitude of adjustments, the process used and the reasoning should all be recorded. Poor record keeping is a widespread forecasting weakness that undermines evaluation and hinders efforts at improvement of forecasting methods.
 &
  Statistical forecasts should be adjusted:

  • When there is important domain knowledge that may materially affect outcomes.

  • When there are discontinuities - pattern changes - or a high degree of uncertainty  in the data.

  • When there are known changes or episodic events in the forecasting environment - such as advertising or promotional campaigns, strikes, production or distribution constraints - past, current or future.

  • When the benefits of improved accuracy justify the expenditure of money and expert time.

  Inefficient consideration of data becomes increasingly likely as the number of judgmental factors increases and causes "judgment overload." However, both this problem and bias can be countered by structured approaches.
 &
  Biases include optimism, wishful thinking, lack of consistency, and manipulation for political, ideological, economic, or other self serving interests. They can be innocent or deliberate. (They can be the result or cause of authoritative myths - a major concern of FUTURECASTS.)

Achieving Best Practices

Evaluation:

 

&

  "Evaluating Forecasting Methods" - by Armstrong - emphasizes the testing of assumptions, data and methods - replicating outputs - and assessing outputs. Evaluation should involve the comparison of the proposed method against reasonable alternatives. The author provides extensive suggestions for best evaluation practices to ascertain and improve reliability.

    Armstrong offers some observations about the intentional bias of forecast analyses for use as propaganda. Such forecasters are not interested in reliability. Forecasting as a propaganda ploy is widespread - and those that employ it are not interested in any objective evaluation of their efforts. However, those who combat them need the credibility of best evaluation practices. Unfortunately, the truth often takes more time to find and is slower to spread than the initial lie.

Uncertainty levels:

Prediction intervals provide an upper and lower limit between which the future value is expected to lie with a prescribed accuracy.

  "Prediction Intervals for Time-Series Forecasting" - by Chris Chatfield, Dep't. of Mathematical Sciences, Univ. of Bath, U.K. - covers a method of indicating the degree of uncertainty in a forecast. Chatfield discusses the various methods used, and ways to overcome tendencies towards overconfidence.
 &
  Predicting "an upper and lower limit
between which the future value is expected to lie with a prescribed accuracy" helps indicate the level of uncertainty - provides a basis for planning for high and low possibilities - and helps assess different forecasting methods.
 &
  Unfortunately, little effort is made to indicate forecast prediction intervals - something especially lacking in econometric forecasts. Chatfield discusses the approaches available for computing prediction intervals.
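  One simple empirical approach - sketched here with invented numbers, and not presented as Chatfield's preferred method - is to build the interval from the distribution of the forecaster's own past errors.

    import numpy as np

    # Past one-step-ahead forecast errors (actual minus forecast), illustrative numbers.
    past_errors = np.array([3.2, -1.5, 0.8, -4.1, 2.6, -0.9, 5.0, -2.2, 1.1, -3.3])

    point_forecast = 250.0

    # Empirical 90% prediction interval from the 5th and 95th percentiles of past errors.
    lower = point_forecast + np.percentile(past_errors, 5)
    upper = point_forecast + np.percentile(past_errors, 95)
    print(round(lower, 1), round(upper, 1))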
 &

Overconfidence:

 

 

 

&

  "Overconfidence in Judgmental Forecasting" - by Hal R. Arkes, Dep't. of Psychology, Ohio State Univ. - leads to a variety of forecasting weaknesses. It not only undermines reliability, it also leads forecasters to neglect available methods that might increase reliability - to overlook significant data - to go against the average occurrence rate [the "base rate"] without adequate evidence - to succumb to "groupthink" where all go along with the most confident member - to tend to become over committed to a favorite method - to overrate the level of expertise - or to become tied to initial impressions - among other things.
 &
  Arkes advises structural requirements - to consider alternatives, especially in new situations - to require an explicit list of the reasons why the forecast might be wrong - to employ a "devil's advocate" to assure consideration of contrary evidence - to test specific predictions against feedback - and to require experiments where possible.
 &

Scenarios:

  "Scenarios and Acceptance of Forecasts" - by W. Larry Gregory and Anne Duran, Dep't. of Psychology, New Mexico State Univ. - notes the many proper uses of scenarios - which don't include as an aid to forecasting.
 &

Scenarios have a very dubious record as a forecasting method.

  Scenarios are "detailed stories about 'what happened in the future.'" They can be used properly as an aid for planning for possible outcomes - to get people to think a situation through - and to increase the likelihood that a forecaster's results will be accepted. They are proven propaganda ploys - effective methods for changing perceptions along desired lines. Studies have confirmed that "imagined scenarios could influence behavior." But they have a very dubious record as a forecasting method.

  "Do not use scenarios to make forecasts. If you do, you are likely to be both wrong and convincing."

Monitoring forecasts:

 

&

  "Learning from Experience: Coping with Hindsight Bias and Ambiguity" - by Baruch Fischhoff, Dep't of Social and Decision Sciences, Carnegie Mellon Univ. - emphasizes that the obtaining of accurate feedback is essential for evaluation, learning and professional development - and for dealing with hindsight bias and ambiguity. Of course, both hindsight bias and ambiguity may be "motivational" rather than "cognitive" - an attempt to achieve or retain personal status. (It might also be part of an intentional effort to further some political, economic or ideological objective.)
 &

The obtaining of accurate feedback is essential for evaluation, learning and professional development - and for dealing with hindsight bias and ambiguity.

  Hindsight bias is "the tendency to exaggerate in hindsight what one was able to predict in foresight -- or would have been able to predict had one been asked." It obscures past forecasts, the bases of those forecasts, and the real achievements of the forecaster.
 &
  Ambiguity is caused by the natural limitations of language. It "makes it difficult to understand the substance or rationale of forecasts." (This is an ancient problem - well known to the clients of the oracle at Delphi.) Verbal quantifiers - such as "rare," "likely," or "severe" - can be a major source of ambiguity. The author advises use of numerical scales.
 &

Feedback should be obtained in a formal way - with accurate written records of forecasts, reasoning, results and analyses.

  Feedback should be obtained in a formal way - with accurate written records of forecasts, reasoning, results and analyses. Detailed records facilitate the testing of methodology and data sources, and evaluation of whether error was avoidable or just within a normal range of variability. Detailed written records also protect the forecaster from Monday morning quarterbacks by setting forth all the known conditions and alternative possibilities that existed in the absence of 20/20 hindsight. Most important of all, detailed written records also provide information to the recipients of forecasts to help proper understanding and use.
 &

Examples of Use of Best Practices

Extrapolation methods:

  "Population Forecasting" - by Dennis A. Ahburg, Carlson School of Management, Univ. of Minnesota - demonstrates how forecasting methods and principles have been applied in demographic studies. Even this most precise of forecasting fields has significant problems.
 &

Experts tend to simply extrapolate trends - and are routinely caught short when trends change.

  Disaggregation is a predominant method. Population groupings such as age, sex and race - per time unit - within particular geographic areas - are common characteristics broken out for separate measurement.
 &
  Decomposition evaluates the population by such factors as fertility, mortality, and migration.
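  In its most stripped-down form, such a decomposition projects population forward from assumed fertility, mortality, and migration inputs, as in this sketch with invented rates:

    # A stripped-down decomposition of population change into fertility,
    # mortality, and net migration.  Rates are invented for illustration.
    def project_population(pop, birth_rate, death_rate, net_migration, years):
        trajectory = [pop]
        for _ in range(years):
            births = pop * birth_rate
            deaths = pop * death_rate
            pop = pop + births - deaths + net_migration
            trajectory.append(round(pop))
        return trajectory

    print(project_population(pop=1_000_000, birth_rate=0.014, death_rate=0.009,
                             net_migration=3_000, years=5))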
 &
  However, studies show no increase in accuracy over aggregate population forecasts. The author discusses when disaggregation and decomposition are likely to be beneficial - and when not. For example, population forecasts will not be accurate if changes in education rates are not taken into account. Forecast error rates are considerably higher for small population groups than for large groups - considerably higher for groups experiencing rapid increase or decline than for more stable groups.
 &
  Expert judgment is also essential - for establishing assumptions about such inputs as fertility, mortality and migration - and for proper use of a variety of methodology factors. However, experts tend to simply extrapolate trends - and are routinely caught short when trends change. Ahlburg cites the failure of the Census and Social Security agencies to forecast either the beginning or the end of the baby boom. Different choices on methodology "can lead to different forecasts based on the same data." Inaccurate assumptions and data inputs about base period populations can lead to significant extrapolation errors.
 &
  Ahlburg reviews studies of econometric models that take socioeconomic factors into account in their population forecasts. As usual, combining forecasts derived from different methods and inputs improves accuracy.
 &

Extrapolation methods:

 

 

&

  "Forecasting the Diffusion of Innovations: Implications for Time-Series Extrapolation" - by Niger Meade, The Management School, Univ. of London, U.K., and Towhidul Islam, Faculty of Management, Univ. of Northern British Columbia, Canada - discusses a variety of diffusion models and their limitations and appropriate application. The authors stress that "no single mathematical model is suitable for heterogeneous diffusion processes." As is often the case, simpler models seem to outperform more complex models.
 &
  Performance monitoring for short segments of the forecast period is suggested as an appropriate way of judging how a model will perform for the whole period. They also state that diffusion models are inappropriate for ordinary consumption forecasts.
 &

Econometric models:

 

&

  "Econometric Models for Forecasting Market Share" - by Roderick J. Brodie and Peter J. Danaher, Univ. of Auckland,  New Zealand, V. Kumar, Univ. of Houston, and Peter S. H. Leeflang, Univ. of Groningen, The Netherlands, - discusses the limitations and appropriate usage of econometric market share models.
 &

  For volatile situations where large changes in established markets may occur, econometric models enable the analyst to incorporate causal reasoning about the nature of competitive effects - of price, advertising, promotion and other competitive activity that may be responsible for the volatility. However, the authors explain that extrapolation forecasting methods may be more appropriate in mature markets where only marginal changes are likely - and judgmental methods like conjoint analysis and intentions surveys may be more appropriate if historic data about the market is sparse, as in the case of new products.
 &
  Even where otherwise indicated, econometric models may be undermined by difficulties in predicting the actions of competitors. Scenarios, role playing, surveys of cognizant experts or of competitor intentions, and/or experimentation may be incorporated into the forecasting process to evaluate competitor actions.
 &

Test market models:

  "Forecasting Trial Sales of New Consumer Packaged Goods" - by Peter S. Fader, The Wharton School, Univ. of Pennsylvania, and Bruce G. S. Hardie, London Business School, U.K. - discusses appropriate usage and best practices for forecasting new packaged goods sales.
 &
  Since promotional expenses can be many times more costly than development expenses - especially for inexpensive packaged goods like shampoo - accurate forecasts of both initial sales and repeat purchase rates are vital factors in the decision to launch. Analogy based methods drawing on prior experience with similar goods, as well as other judgment based methods, are inexpensive and appropriate at certain early stages of product development, but as the need for major commitments arises, more formal pretest market models and test market models become advisable. There have been no reported studies of the reliability of subjective or analogy based methods, and the frequency of new product failure speaks volumes for the need for better forecasting methods.
 &

  Test market models measure both initial and repeat purchases during some initial test marketing period to quickly project the likely success of the product. The applicability of models is important since full fledged test marketing is expensive in time as well as in money - giving competitors time to react.
 &
  Pretest market models attempt to weed out likely failures and avoid the cost of any test marketing. These models are based on surveys of intentions concerning "repeat" purchases by consumers who have been exposed to the product by means of samples or who have voluntarily bought the product from mock test stores. The authors could find no studies of the relative reliability of the various pretest market models - something rendered more complex by the need to evaluate both of the two separate components - initial sales penetration rates and repeat purchase rates.
 &
  But market research firms have set up "controlled test markets" in small cities for application, evaluation and development of test market forecasting methods. They use extrapolation methods to forecast initial penetration rates and extent of penetration as well as repeat purchase rates.
 &
  As usual, simple exponential distribution models appear more successful than more complex efforts as long as the model accounts for marketing decision variables such as advertising and promotion efforts and for different rates of product trial by different types of consumers. The authors provide several other suggestions for best practices.
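  A minimal sketch of an exponential-style trial model - invented parameters, not the authors' calibration - projects the cumulative fraction of households that will have tried the product:

    import math

    # An exponential-style trial model: cumulative trial approaches a ceiling p0
    # at rate lam.  Parameter values and the observation window are illustrative.
    def cumulative_trial(week, p0=0.12, lam=0.15):
        """Fraction of panel households expected to have tried the product by a given week."""
        return p0 * (1.0 - math.exp(-lam * week))

    # Fit to the first weeks of a controlled test market, then project a year out.
    for week in (4, 13, 26, 52):
        print(week, round(cumulative_trial(week), 3))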
 &

Implementation Sources

Books:

 

&

  "Diffusion of Forecasting Principles through Books" - by James E. Cox, Jr., and David G. Loomis, Illinois State Univ. - evaluates 18 forecasting books - and finds that they concentrate on understanding and application of technique with at best limited attention to the identity and evaluation of best practices.
 &

Software:

 

These programs are especially poor in assessment of uncertainty and all have other significant weaknesses that make forecasting expertise essential for proper use.

  "Diffusion of Forecasting Principles through Software" - by Leonard J. Tashman, School of Business Administration, Univ. of Vermont, and Jim Hoover, U.S. Navy Dep't. - evaluates four categories of commercially available statistical forecasting method software that employ time series data - including 15 individual programs: (1) spreadsheet add-ins, (2) forecasting modules of general statistical programs, (3) neural network programs, and (4) dedicated business forecasting programs.
 &
  These offer many advantages, and are rated somewhat higher than the books at covering best practices. However, they are especially poor in assessment of uncertainty and all have other significant weaknesses that make forecasting expertise essential for proper use. Judgment and skill are required for such tasks as "setting objectives, structuring problems and identifying information sources," among others.
 &

"[I]t has yet to be shown that multi-equation models add value to forecasting."

  Dedicated business forecasting programs were rated as the best - and are especially strong in method selection, implementation, and evaluation. Three of them "contain features designed to reconcile forecasts across a product hierarchy, a task this group performs so commendably it can serve as a role model for forecasting engines in demand-planning systems."
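  Reconciliation features differ from package to package; in its simplest proportional form, with invented numbers, the idea looks like this:

    # Reconciling forecasts across a product hierarchy: bottom-up item forecasts
    # are summed, then the family-level forecast is restored by scaling the items
    # proportionally.  Numbers are illustrative.
    item_forecasts = {"shampoo 250ml": 4_200.0, "shampoo 500ml": 2_600.0, "shampoo 1l": 900.0}
    family_forecast = 8_100.0                       # separately produced family-level forecast

    bottom_up_total = sum(item_forecasts.values())  # 7,700
    reconciled = {item: f * family_forecast / bottom_up_total
                  for item, f in item_forecasts.items()}

    print({item: round(f) for item, f in reconciled.items()})   # sums back to 8,100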
 &
  Demand planning software was omitted as unavailable to the reviewers. Econometric software that can be used to develop multi-equation causal models to forecast business and economic series - and artificial neural networks for financial and economic forecasting - can be used by suitably sophisticated forecasters - but these were not evaluated in this article.

"[I]t has yet to be shown that multi-equation models add value to forecasting."

  Software that does not employ time series data - such as conjoint analysis software and other software designed to enhance and support judgmental forecasting - are also not covered. Many of the time series programs covered do offer elements that employ some of these methods.
 &
  Also not evaluated are such factors as ease of learning and use - sophistication required - technical support - availability of training and support. Reviews about individual software packages are available on the "Principles of Forecasting" web site: http://hops.wharton.upenn.edu/forecast

FUTURECASTS Methods

The "Cassandra" forecasting method:

 

&

  FUTURECASTS relies upon judgmental methods for its forecasts since it is concerned predominantly with subject matter in the non scientific practical arts. It relies heavily on trend analysis - but is always alert to the need to recognize and take into account the new influences constantly arising. All historic trends are subject to the rapid changes in the modern world's political, economic, military, societal, and technological environment.
 & 

History opens windows on the future.

 

There is nothing that is completely new under the sun - but nothing ever precisely repeats itself.

 

Crisis is opportunity!

  However, FUTURECASTS has another - unfortunately extremely useful and reliable - forecasting method. The "Cassandra" forecasting method is one of the most ancient in recorded history. Armstrong's otherwise excellent handbook totally ignores this very useful and incredibly accurate forecasting method - which the publisher of FUTURECASTS has been using successfully since the middle of the 1960s.
 &
  Best practices principles for the Cassandra forecasting method include:

  • Examine the people in high decision making posts.
  • Determine the conflicting interests that predominate over organizational or national interests.
  • Examine the concepts of academics and intellectuals influential in the pertinent professions and other pertinent practical arts.
  • Determine what the gross stupidities are in the pertinent authoritative myths.
  • Determine what significant policy errors are currently being pursued, and the reasons for those errors - an important factor in determining the difficulty of correcting those errors prior to some noticeable system failure.
  • Determine what significant reverses will be suffered as a result of those policy errors. Evaluate extent and timing.
  • Practice objectivity in a disciplined manner. Accurately acknowledge feedback - both expected and unexpected - to ascertain whether analytical conclusions are firmly attached to reality and when alterations of conclusions are needed.
  • Always recognize the tremendous strengths of the United States - the incredible steadfastness of the people in the face of periods of adversity - the tremendous resilience and productivity of any relatively flexible capitalist system - and the likelihood of adequate reform once policy failures manifest themselves and become evident. Crisis is opportunity!

Copyright © 2003 Dan Blatt