Solutions built on a software-based analysis framework need to act more "intelligently" and tolerate uncertainty better than traditional software systems. These characteristics are, to some extent, present in the way that humans approach the same kinds of problems. Although the purpose of an analysis framework is not necessarily to mimic biological thought processes, there is sufficient common ground to make human analysis a logical starting point for a design.
From one standpoint, there are basically two ways that human analysts can approach a given situation. Both approaches amount to decomposing the problem into more manageable parts that are either more easily understood or more deterministic than the problem as a whole. The design of the framework must support both forms of analysis.
Top-down – Prediction-driven processing – addresses questions of the type "Why is this?" Here the decomposition process identifies what information is necessary, or at least desirable, before a statistically valid inference can take place. The process repeats for each required piece of information until either all mandatory information has been obtained from atomic values, or a roadblock is hit where one or more required pieces are unavailable and cannot be estimated by other means, in which case the deficiency is simply reported. If all the necessary information can be collected, it is applied according to the system model to produce the resulting "answer".
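As a minimal sketch of how this decomposition might be implemented, the following fragment resolves an item's information requirements recursively, stopping at atomic values and reporting any deficiency it cannot fill. The `requirements` and `atomic_values` structures and all the item names are hypothetical, invented purely for illustration.

```python
# Minimal sketch of prediction-driven (top-down) decomposition.
# `requirements` maps each piece of information to the sub-pieces it
# depends on; `atomic_values` holds directly observable data.

def resolve(item, requirements, atomic_values, missing):
    """Recursively gather the information needed to infer `item`,
    recording any unresolvable dependency in `missing` rather than
    failing outright."""
    if item in atomic_values:              # reached an atomic value
        return {item: atomic_values[item]}
    if item not in requirements:           # roadblock: report and move on
        missing.append(item)
        return {}
    resolved = {}
    for dep in requirements[item]:         # repeat for each required piece
        resolved.update(resolve(dep, requirements, atomic_values, missing))
    return resolved

requirements = {"threat_level": ["speed", "heading"],
                "speed": ["position_t0", "position_t1"]}
atomic_values = {"position_t0": (0, 0), "position_t1": (3, 4), "heading": 53.1}
missing = []
info = resolve("threat_level", requirements, atomic_values, missing)
print(info, "missing:", missing)   # all inputs found; inference can proceed
```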
Bottom-up – Data-driven processing – addresses questions of the type "What does this mean?" The bottom-up process is essentially a sensemaking exercise: there initially exists some amount of basic information (observations) that needs to be processed into progressively more relevant or understandable forms. The figure below illustrates a stack that is equally relevant for either top-down or bottom-up reasoning. Each successive layer is derived from information in the layer below it. The intent of the reasoning framework developed in this investigation is to provide optimal capabilities for producing and operating on information (knowledge) at each level of the stack. Note that this process stack, and the capabilities required to implement it, are extremely generic; a framework that generalizes analysis capabilities across all the layers could be applied to a very broad range of problems.
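A comparable bottom-up sketch, with hypothetical stage functions standing in for real processing elements, chains the layers of the stack so that each stage derives its output solely from the layer beneath it:

```python
# Bottom-up sketch: each stage consumes only the output of the layer
# below, mirroring the process stack. Stage functions are placeholders.

def extract_entities(observations):        # observations -> inferred entities
    return [{"id": i, "value": v} for i, v in enumerate(observations) if v > 1.0]

def relate(entities):                      # entities -> relationships
    return [(a["id"], b["id"]) for a in entities for b in entities
            if a["id"] < b["id"]]

def assess(relationships):                 # relationships -> situation estimate
    return "cluster detected" if relationships else "isolated activity"

observations = [0.2, 3.1, 0.4, 2.7]
print(assess(relate(extract_entities(observations))))   # "cluster detected"
```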
As part of the design process for a software system addressing the analysis process stack, we used a simplified model in which the various capabilities were classified into one of three layers. While it might have been intuitive to assign a separate software model layer to each boundary in the process stack, in reality some of the processing employed by adjacent layers has similar requirements. From this perspective, rather than representing the process stack with 5 layers, a simplified version containing 3 layers (green, yellow, and red) can be used.
An alternate way of interpreting the 3-layer version is that each layer reflects the nature of the processing required to provide a given capability. Each of the three layers (bottom, middle, and top) can be characterized by the nature of the data it operates on and the level of abstraction created by the technology.
The purpose of the bottom layer is to interact with external data and produce corresponding entries in the knowledgebase. Functions such as feature extraction, detection, and identification reside in this layer.
The middle layer is where relationships are identified. Processing here operates only on information present in the knowledgebase, either inferred or directly translated there by the bottom layer. Functions such as clustering and classification occur in the middle layer.
Finally, the responsibility of the top layer is to conceptualize the situation more abstractly than the middle layer does. Like the middle layer, it operates exclusively on information in the knowledgebase, but the nature of its processing is more abstract and specific to a particular domain model. Functionality such as sensemaking, decision support, and behavior simulation resides in the top layer. To identify appropriate technologies for each of the three model layers, it is important to identify their functionality and data characteristics.
Bottom Layer
Support for the higher levels of reasoning requires that various supporting information be produced. The lowest-level components perform functionality involving detection, discrimination, and preliminary identification. Example components might map pixel-level features to abstract model elements or filter time-series data to detect significant events.
The nature of the bottom layer is to provide the basic interface to any information not derived by the framework itself. Although some simple forms of data may be mapped directly into the knowledgebase by processing elements in this layer, in many situations additional analysis is required. This analysis corresponds to the boundary between observations and inferred entities illustrated above. Rather than simply storing a direct representation of the "raw" data values, relevant abstract features are detected, identified, and placed in the knowledgebase as instances of classes in the problem domain ontology. For example, rather than attempting to persist all the pixels in a reconnaissance image, relevant features are extracted and only the feature descriptions are captured. Of course, this does not preclude a system external to the framework from persisting the raw data, and archival data is also a valid source of information for processing by elements in the framework. But a general goal for each element in the framework is to progressively increase the usability of the knowledge present in the data.
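The sketch below illustrates this observation-to-entity boundary for the time-series case mentioned earlier: a bottom-layer element scans raw samples and persists only abstract event descriptions. The `KnowledgeBase` class and the entry fields are placeholders, not part of any existing framework.

```python
from dataclasses import dataclass, field

# Placeholder knowledgebase: in a real framework these entries would be
# instances of classes from the problem-domain ontology.
@dataclass
class KnowledgeBase:
    entries: list = field(default_factory=list)

    def add(self, entry):
        self.entries.append(entry)

def detect_events(samples, threshold, kb):
    """Bottom-layer element: scan raw samples and persist only the
    abstract feature descriptions, never the raw data itself."""
    for t, value in enumerate(samples):
        if value > threshold:              # a "significant event"
            kb.add({"type": "ThresholdEvent", "time": t, "value": value})

kb = KnowledgeBase()
detect_events([0.1, 0.2, 3.5, 0.3, 4.1], threshold=1.0, kb=kb)
print(kb.entries)   # two event descriptions; the raw series is discarded
```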
It is particularly important to note that in each layer of the model, many different types and instances of processing can (and should) operate in parallel. Rather than having a single "very smart" detection element configured to find many different kinds of features, many smaller, tightly focused components should be employed, each one looking only for a specific feature. In fact, in many cases it would be desirable to have multiple processing elements all looking for the same feature but using different techniques, approaches, or technologies. It is then up to the higher layers of the model to fuse or reconcile the results.
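For instance, the following hypothetical fragment runs two narrowly focused detectors over the same data; each reports independently, and reconciling their differing results is deliberately deferred to a higher layer.

```python
# Two detectors looking for the same feature (a significant event) with
# different techniques. No local fusion is performed.

def mean_shift_detector(samples):
    """Flag samples far from the series mean."""
    mean = sum(samples) / len(samples)
    return [i for i, v in enumerate(samples) if abs(v - mean) > 2]

def delta_detector(samples):
    """Flag large jumps between consecutive samples."""
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > 2]

samples = [0.1, 0.2, 3.5, 0.3, 4.1]
candidates = {d.__name__: d(samples)
              for d in (mean_shift_detector, delta_detector)}
print(candidates)   # each detector reports independently
```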
Types of uncertainty present in this layer of the model are primarily data related, as opposed to being more abstract or model related. Missing and noisy input data are two of the most common problems.
Middle Layer
The function of the mid-level processing is to act as a transition layer between the low-level processing, which is more data oriented, and the high-level reasoning, which relates more closely to an abstract model. Clustering and classification are examples of functionality that typically occurs in the middle layer of the analysis model. These operations relate to the second boundary in the process stack, between "inferred entities" and "relationships". For example, once a set of features has been detected by the bottom layer, mid-level components could be employed to identify relationships between them and other information present in the knowledgebase. The resulting relationships are stored back in the knowledgebase for further consideration by other parts of the framework, either at this level or above.
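A minimal sketch of such a mid-level component appears below; it groups hypothetical feature entries by temporal proximity and writes the discovered relationships back for use by other components. The entry fields and the simple greedy clustering are illustrative assumptions.

```python
# Mid-level sketch: cluster feature entries already in the knowledgebase
# and record the resulting relationships.

def cluster_by_proximity(features, max_gap):
    """Greedy 1-D clustering: sort features by time and split on gaps."""
    ordered = sorted(features, key=lambda f: f["time"])
    clusters, current = [], [ordered[0]]
    for f in ordered[1:]:
        if f["time"] - current[-1]["time"] <= max_gap:
            current.append(f)
        else:
            clusters.append(current)
            current = [f]
    clusters.append(current)
    return clusters

features = [{"type": "ThresholdEvent", "time": t} for t in (2, 3, 4, 11, 12)]
kb_relationships = [
    {"type": "CoOccurs", "members": [f["time"] for f in c]}
    for c in cluster_by_proximity(features, max_gap=2) if len(c) > 1
]
print(kb_relationships)   # [{... [2, 3, 4]}, {... [11, 12]}]
```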
Uncertainty often manifests itself in the mid level of the model as vagueness, ambiguity, or incoherence.
Top Layer
Unlike the lower two layers, the top layer is specifically targeted at reasoning over deep knowledge that is generally model specific and often symbolic in nature. The functionality of the top layer needs to support three different types of capabilities (a sketch of the decision-support case follows the list):
• Probabilistic reasoning – Calculates probabilities in order to make predictions or decisions based on otherwise uncertain or insufficient information.
• Decision support – Evaluates the relative merits of alternative decisions or courses of action.
• Behavior simulation – Utilizes models and rules to identify a behavioral directive in response to a recognized situation. Where more than one behavior might be appropriate, the framework needs to identify which one is optimal for given criteria.
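As a minimal sketch of the decision-support capability, the fragment below ranks alternative courses of action by expected utility. The action names, outcome probabilities, and utilities are invented for illustration; in practice they would come from the probabilistic-reasoning capability and the domain model.

```python
# Decision-support sketch: rank alternative courses of action by
# expected utility under uncertain outcomes. All figures are assumed.

def expected_utility(action):
    """Sum of P(outcome) * utility(outcome) over an action's outcomes."""
    return sum(p * u for p, u in action["outcomes"])

actions = [
    {"name": "intercept", "outcomes": [(0.7, 10.0), (0.3, -20.0)]},
    {"name": "monitor",   "outcomes": [(0.9, 2.0),  (0.1, -5.0)]},
]
for a in actions:
    print(a["name"], expected_utility(a))
best = max(actions, key=expected_utility)
print("recommended:", best["name"])   # monitor: 1.3 > intercept: 1.0
```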
Examples of uncertainty present in the top layer include irrelevance, stochastic variance, parametric errors, and modeling problems.
Soft Processing Technologies
There are a number of soft computing technologies that have been shown to be effective on problems involving uncertainty. One of many guiding factors in determining which technology to apply is the nature of the information available to act on. Sometimes we have data that contains a buried wealth of information; other times we have knowledge (rules). Each of these characteristics leads us toward a different technology best suited to acting on that kind of information.
Another factor is the tradeoff between interpretability and precision. It is a characteristic of some soft reasoning technologies that you can either get a simplified but easily understood answer, or a more precise answer whose derivation is not readily available. Sometimes simplification is necessary or useful, but it comes at a cost. Understanding issues like these and finding the correct balance is one of the barriers to entry for providing a high-quality intelligent solution. The table below shows some of the differences between a few of the more commonly used soft technologies and illustrates one possible way in which they might be utilized in the three-layer reasoning model.
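To make the interpretability side of the tradeoff concrete, the sketch below evaluates a single human-readable fuzzy rule; a more precise black-box model might score the same input better, but without an inspectable rationale. The membership function and the rule itself are invented for illustration.

```python
# Fuzzy-rule sketch illustrating the interpretable end of the tradeoff:
# the rule "IF speed IS high THEN risk IS elevated" can be read and
# audited directly, unlike the weights of a black-box model.

def membership_high(speed, lo=40.0, hi=80.0):
    """Shoulder membership function: 0 below `lo`, 1 above `hi`,
    linear in between."""
    if speed <= lo:
        return 0.0
    if speed >= hi:
        return 1.0
    return (speed - lo) / (hi - lo)

for speed in (30.0, 60.0, 90.0):
    risk = membership_high(speed)    # rule strength = membership degree
    print(f"speed={speed}: elevated-risk degree {risk:.2f}")
```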
A point sufficiently important to bear repeating is that no single soft technology is right for all tasks, and there is a strong synergistic effect when several complementary technologies are used together. The problem has been that, with the exception of a few binary hybrid combinations and cooperating-agent technology, little progress has been made on techniques to integrate multiple types of reasoning technologies into a single semantic system in a reusable way. Typically, when more than one soft technology is employed, the solution takes the form of a binary hybrid. The problem with binary hybrids is twofold. First, the use of only two reasoning technologies artificially limits the coverage of diverse problem spaces. Second, the coupling of the technologies is typically tight and targeted at a very specific problem, making the combination difficult to reuse.