"Pay my troops no mind; they're just on a fact-finding mission."

Open Source Intelligence Analysis – Software, Methods, Resources

Research firm Applied Research Associates has just launched a website, Global Crowd Intelligence, that invites the public to sign up and try their hand at intelligence forecasting, BBC Future reports.

The website is part of an effort called Aggregative Contingent Estimation, sponsored by the Intelligence Advanced Research Projects Activity (Iarpa), to understand the potential benefits of crowdsourcing for predicting future events by making forecasting more like a game of spy versus spy.

The new website rewards players who successfully forecast future events by giving them privileged access to certain “missions,” and also allowing them to collect reputation points, which can then be used for online bragging rights. When contributors enter the new site, they start off as junior analysts, but eventually progress to higher levels, allowing them to work on privileged missions.

The idea of crowdsourcing geopolitical forecasting is increasing in popularity, and not just for spies. Wikistrat, a private company touted as “the world’s first massively multiplayer online consultancy,” was founded in 2002, and is using crowdsourcing to generate scenarios about future geopolitical events. It recently released a report based on a crowdsourced simulation looking at China’s future naval power.

Dirk Warnaar, the project’s principal investigator at Applied Research Associates, says that Wikistrat’s approach appears to rely on developing “what-if scenarios,” rather than attaching a probability to a specific event happening, which is the goal of the Iarpa project.

Paul Fernhout put together a good open letter a while back on the need for this, and it seems IARPA has now put some effort toward that purpose:

A first step towards that could be for IARPA to support better free software tools for “crowdsourced” public intelligence work involving using a social semantic desktop for sensemaking about open source data and building related open public action plans from that data to make local communities healthier, happier, more intrinsically secure, and also more mutually secure. Secure, healthy, prosperous, and happy local (and virtual) communities then can form together a secure, healthy, prosperous, and happy nation and planet in a non-ironic way. Details on that idea are publicly posted by me here in the form of a Proposal Abstract to the IARPA Incisive Analysis solicitation: “Social Semantic Desktop for Sensemaking on Threats and Opportunities”

So what kind of tools can an amateur use for making sense of data?

Data Mining and ACH

Here is a basic implementation of ACH:

Analysis of Competing Hypotheses (ACH) is a simple model for how to think about a complex problem when the available information is incomplete or ambiguous, as typically happens in intelligence analysis. The software downloadable here takes an analyst through a process for making a well-reasoned, analytical judgment. It is particularly useful for issues that require careful weighing of alternative explanations of what has happened, is happening, or is likely to happen in the future. It helps the analyst overcome, or at least minimize, some of the cognitive limitations that make prescient intelligence analysis so difficult. ACH is grounded in basic insights from cognitive psychology, decision analysis, and the scientific method. It helps analysts protect themselves from avoidable error, and improves their chances of making a correct judgment.
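The bookkeeping behind ACH can be sketched as a small matrix: each piece of evidence is rated against each hypothesis, and hypotheses are ranked by how much evidence contradicts them rather than by how much supports them. The Python snippet below is only a minimal illustration of that idea, not the downloadable tool; the hypotheses, evidence labels, ratings, and function names are all made up for the example.

```python
# Ratings: "C" = consistent, "I" = inconsistent, "N" = neutral / not applicable.
# Only inconsistencies count against a hypothesis in this simplified scoring.
INCONSISTENCY_WEIGHT = {"C": 0, "N": 0, "I": 1}


def rank_hypotheses(hypotheses, matrix):
    """Return (hypothesis, inconsistency score) pairs, least contradicted first."""
    scores = {h: 0 for h in hypotheses}
    for ratings in matrix.values():
        for h in hypotheses:
            scores[h] += INCONSISTENCY_WEIGHT[ratings[h]]
    return sorted(scores.items(), key=lambda item: item[1])


def diagnostic_evidence(matrix):
    """Keep only evidence whose rating differs across hypotheses.

    Evidence rated the same for every hypothesis cannot help choose between them.
    """
    return [e for e, ratings in matrix.items() if len(set(ratings.values())) > 1]


# Hypothetical example data.
hypotheses = ["H1", "H2", "H3"]
matrix = {
    "E1": {"H1": "C", "H2": "C", "H3": "C"},
    "E2": {"H1": "C", "H2": "I", "H3": "N"},
    "E3": {"H1": "I", "H2": "I", "H3": "C"},
    "E4": {"H1": "C", "H2": "I", "H3": "C"},
}

ranking = rank_hypotheses(hypotheses, matrix)
useful = diagnostic_evidence(matrix)
print(ranking)
print(useful)
```

In this toy matrix E1 is consistent with everything, so it drops out as non-diagnostic, and H2 ends up with the most evidence against it. A real analysis would also weigh the credibility and relevance of each item, which this sketch leaves out.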

RapidMiner – About 6% of data miners use it – Can use R as an extension with a GUI

R – 46% of data miners use this – in some ways better than commercial software – I’m not sure what the limits of this software are; it is incredibly powerful

Network Mapping

Multiple tools – Finding sets of key players in a network – Cultural domain analysis – Network visualization – Software for analyzing ego-network data – Software package for visualizing social networks

NodeXL is a free, open-source template for Microsoft® Excel® 2007 and 2010 that makes it easy to explore network graphs. With NodeXL, you can enter a network edge list in a worksheet, click a button and see your graph, all in the familiar environment of the Excel window.

Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.

*ORA is a dynamic meta-network assessment and analysis tool developed by CASOS at Carnegie Mellon. It contains hundreds of social network, dynamic network metrics, trail metrics, procedures for grouping nodes, identifying local patterns, comparing and contrasting networks, groups, and individuals from a dynamic meta-network perspective. *ORA has been used to examine how networks change through space and time, contains procedures for moving back and forth between trail data (e.g. who was where when) and network data (who is connected to whom, who is connected to where …), and has a variety of geo-spatial network metrics, and change detection techniques. *ORA can handle multi-mode, multi-plex, multi-level networks. It can identify key players, groups and vulnerabilities, model network changes over time, and perform COA analysis. It has been tested with large networks (10^6 nodes per 5 entity classes). Distance based, algorithmic, and statistical procedures for comparing and contrasting networks are part of this toolkit.

NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
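To give a feel for what that looks like in practice, here is a minimal sketch that builds a small undirected network from an edge list and asks NetworkX which node sits on the most shortest paths, one common way of looking for "key players". The names and connections are invented for illustration; `Graph`, `add_edges_from`, `degree_centrality`, and `betweenness_centrality` are standard NetworkX calls.

```python
import networkx as nx

# Hypothetical edge list: who is in contact with whom.
edges = [
    ("ana", "ben"), ("ana", "cai"), ("ben", "cai"),
    ("cai", "dov"), ("dov", "eli"), ("dov", "fay"),
]

G = nx.Graph()
G.add_edges_from(edges)

# Degree centrality: fraction of the other nodes each node is directly tied to.
degree = nx.degree_centrality(G)

# Betweenness centrality: how often a node lies on shortest paths between others.
betweenness = nx.betweenness_centrality(G)

# The node with the highest betweenness is a candidate broker or bottleneck.
top_broker = max(betweenness, key=betweenness.get)
print(top_broker, round(betweenness[top_broker], 3))
```

Here "cai" and "dov" have the same number of direct contacts, but "dov" is the only route to two otherwise isolated nodes, so betweenness separates them where degree does not.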

Social Networks Visualizer (SocNetV) is a flexible and user-friendly tool for the analysis and visualization of Social Networks. It lets you construct networks (mathematical graphs) with a few clicks on a virtual canvas or load networks of various formats (GraphViz, GraphML, Adjacency, Pajek, UCINET, etc) and modify them to suit your needs. SocNetV also offers a built-in web crawler, allowing you to automatically create networks from all links found in a given initial URL.

SUBDUE is a graph-based knowledge discovery system that finds structural, relational patterns in data representing entities and relationships. SUBDUE represents data using a labeled, directed graph in which entities are represented by labeled vertices or subgraphs, and relationships are represented by labeled edges between the entities. SUBDUE uses the minimum description length (MDL) principle to identify patterns that minimize the number of bits needed to describe the input graph after being compressed by the pattern. SUBDUE can perform several learning tasks, including unsupervised learning, supervised learning, clustering and graph grammar learning. SUBDUE has been successfully applied in a number of areas, including bioinformatics, web structure mining, counter-terrorism, social network analysis, aviation and geology.

A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, p* modeling, random graph generation, and 2D/3D network visualization. (R based)

statnet is a suite of software packages for network analysis that implement recent advances in the statistical modeling of networks. The analytic framework is based on Exponential family Random Graph Models (ergm). statnet provides a comprehensive framework for ergm-based network modeling, including tools for model estimation, model evaluation, model-based network simulation, and network visualization. This broad functionality is powered by a central Markov chain Monte Carlo (MCMC) algorithm. (Requires R)

Tulip is an information visualization framework dedicated to the analysis and visualization of relational data. Tulip aims to provide the developer with a complete library, supporting the design of interactive information visualization applications for relational data that can be tailored to the problems he or she is addressing.

GraphChi is a spin-off of the GraphLab project from Carnegie Mellon University. It is based on research by Aapo Kyrola and his advisors.

GraphChi can run very large graph computations on just a single machine, by using a novel algorithm for processing the graph from disk (SSD or hard drive). Programs for GraphChi are written in the vertex-centric model, proposed by GraphLab and Google’s Pregel. GraphChi runs vertex-centric programs asynchronously (i.e., changes written to edges are immediately visible to subsequent computation), and in parallel. GraphChi also supports streaming graph updates and removal of edges from the graph. Section ‘Performance’ contains some examples of applications implemented for GraphChi and their running times on GraphChi.

The promise of GraphChi is to make web-scale graph computation, such as analysis of social networks, available to anyone with a modern laptop. It saves you from the hassle and costs of working with a distributed cluster or cloud services. We find it much easier to debug applications on a single computer than trying to understand how a distributed algorithm is executed.

In some cases GraphChi can solve bigger problems in reasonable time than many other available distributed frameworks. GraphChi also runs efficiently on servers with plenty of memory, and can use multiple disks in parallel by striping the data.
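The vertex-centric, asynchronous style described above can be illustrated with a toy PageRank in plain Python. This is a sketch of the programming model only: it is not GraphChi's API, it keeps everything in memory instead of streaming from disk, and it runs sequentially. The function name, the example graph, and the choice to store each vertex's outgoing rank share on its edges are assumptions made for the illustration.

```python
def pagerank_vertex_centric(out_edges, iterations=100, damping=0.85):
    """Toy vertex-centric PageRank (unnormalized: ranks sum to about N).

    One update function runs per vertex. It reads values from the vertex's
    in-edges and writes new values to its out-edges. A write is visible to any
    vertex updated later in the same sweep, which mimics asynchronous execution.
    """
    vertices = sorted(
        set(out_edges) | {d for targets in out_edges.values() for d in targets}
    )
    in_edges = {v: [] for v in vertices}
    for src, targets in out_edges.items():
        for dst in targets:
            in_edges[dst].append(src)

    rank = {v: 1.0 for v in vertices}
    # Value stored on each edge: the share of rank its source sends along it.
    edge_value = {
        (s, d): 1.0 / len(out_edges[s]) for s in out_edges for d in out_edges[s]
    }

    def update(v):
        incoming = sum(edge_value[(s, v)] for s in in_edges[v])
        rank[v] = (1 - damping) + damping * incoming
        targets = out_edges.get(v, [])
        for d in targets:
            edge_value[(v, d)] = rank[v] / len(targets)

    for _ in range(iterations):
        for v in vertices:
            update(v)
    return rank


# Hypothetical three-page web graph: a links to b and c, b to c, c back to a.
ranks = pagerank_vertex_centric({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({v: round(r, 3) for v, r in ranks.items()})
```

The point of the model is that the programmer only writes `update`; a system like GraphChi decides in what order and from which part of the disk the vertices and edges are loaded.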

Web Based Stuff:

Play amateur Gestapo from the comfort of your living room:

Search Professionals by Name, Company or Title, painfully verbose compared to the above 2 tools

Broad list of search engines


A tool that uses Palantir Government:

connected with the following datasets:
and some misc. others

Database Listings

Analytic Methods:


Morphological Analysis – A general method for non-quantified modeling

Modeling Complex Socio-Technical Systems using Morphological Analysis

CIA Tradecraft Manual

Top 5 Intelligence Analysis Methods: Analysis Of Competing Hypotheses
(the author scores a 4.4 of 5 on , 2.4 on the easiness scale)
Many new analysts find that getting started is the hardest part of their job. Stating the objective, from the consumer’s standpoint, is an excellent starting point. If the analyst cannot define the consumer and his needs, how is it possible to provide analysis that complements what the consumer already knows?

“Ambassador Robert D. Blackwill … seized the attention of the class of some 30 [intelligence community managers] by asserting that as a policy official he never read … analytic papers. Why? “Because they were nonadhesive.” As Blackwill explained, they were written by people who did not know what he was trying to do and, so, could not help him get it done:
“When I was working at State on European affairs, for example, on certain issues I was the Secretary of State. DI analysts did not know that–that I was one of a handful of key decision makers on some very important matters….”

More charitably, he now characterizes his early periods of service at the NSC Staff and in State Department bureaus as ones of “mutual ignorance”

“DI analysts did not have the foggiest notion of what I did; and I did not have a clue as to what they could or should do.”[6]
Blackwill explained how he used his time efficiently, which rarely involved reading general CIA reports. “I read a lot. Much of it was press. You have to know how issues are coming across politically to get your job done. Also, cables from overseas for preparing agendas for meetings and sending and receiving messages from my counterparts in foreign governments. Countless versions of policy drafts from those competing for the President’s blessing. And dozens of phone calls. Many are a waste of time but have to be answered, again, for policy and political reasons.

“One more minute, please, on what I did not find useful. This is important. My job description called for me to help prepare the President for making policy decisions, including at meetings with foreign counterparts and other officials…. Do you think that after I have spent long weeks shaping the agenda, I have to be told a day or two before the German foreign minister visits Washington why he is coming?”

