Viewing the World as Connections

Cataphora’s technology models data as a graph of interconnections, showing the relationships between data items. Data items such as emails, spreadsheets, instant messages, phone calls, and expense reports are characterized by their various forms of content and metadata. All of these elements can be broken apart and recombined to form new evidence types: text contains topics and textblocks, aliases and names belong to people, and people form cliques. These additional items are represented as new vertices in the graph, and new connections (i.e., edges) are formed among them and with the original data items.

The advantage of this approach is that a library of robust, flexible, and combinable graph analysis algorithms lets us pull disparate but important structures out of the evidence graph. Many of these structures are then added back to the evidence graph, where they make yet more structure apparent.
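The graph model described above can be illustrated with a minimal Python sketch. It is not Cataphora's implementation: the `EvidenceGraph` class, the `add_email` helper, the edge labels, and the sample addresses are all hypothetical names chosen for illustration. It shows only how a data item and the elements derived from it (aliases, topics) can become vertices joined by labeled edges.

```python
from collections import defaultdict


class EvidenceGraph:
    """Toy undirected graph. Vertices are (kind, id) pairs; edges carry a label."""

    def __init__(self):
        # vertex -> set of (label, neighbor) pairs
        self.edges = defaultdict(set)

    def connect(self, a, b, label):
        """Add a labeled edge between vertices a and b (both directions)."""
        self.edges[a].add((label, b))
        self.edges[b].add((label, a))

    def neighbors(self, vertex, label=None):
        """Return the sorted neighbors of a vertex, optionally filtered by label."""
        return sorted(n for (l, n) in self.edges[vertex] if label is None or l == label)


def add_email(graph, email_id, sender, recipients, topics):
    """Break one email apart into alias and topic vertices (hypothetical schema)."""
    item = ("email", email_id)
    graph.connect(item, ("alias", sender), "sent_by")
    for recipient in recipients:
        graph.connect(item, ("alias", recipient), "received_by")
    for topic in topics:
        graph.connect(item, ("topic", topic), "mentions")
    return item


graph = EvidenceGraph()
add_email(graph, "m1", "jdoe@example.com", ["asmith@example.com"], ["budget"])
add_email(graph, "m2", "asmith@example.com", ["jdoe@example.com"], ["budget", "travel"])

# Derived vertices connect items that share no direct link of their own.
budget_items = graph.neighbors(("topic", "budget"))
```

Because topics and aliases are vertices in their own right, a question such as "which items mention the budget?" becomes a simple neighbor lookup, and later analyses can add their own vertex kinds in the same way.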


Actor detection: seeing people, not accounts

Emails and other communications don’t directly tell us who their senders and recipients are: all they reveal are addresses, handles, or accounts. The typical office worker has many of these, including at least one per communication channel (email, instant messaging, phone, wikis, bulletin boards, etc.). On many of these channels, a single employee might have numerous accounts, both work-granted and personal. HR and IT systems can often help identify the former, but are no help with the latter. Nor do they help identify the many external persons your employees communicate with, including friends, family, vendors, and customers. Add to this confusion the identities used in systems not normally viewed as communication channels, such as keycard accesses and system logons, and a good deal of information can easily be missed.

Cataphora’s Actor Detection technology uses behavioral and lexical connections to join these various identities together into “actors,” each corresponding to an actual person. This allows us to view as a unit all data created or consumed by a particular individual. It also provides nuance, showing which channels that individual uses for which purposes.
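The text does not say how identities are joined, so the following is only a sketch of one generic way to do it: treat each piece of evidence that two identities belong together as a link, then merge linked identities with a union-find structure. The `merge_aliases` function, the link list, and the sample identities are assumptions made for illustration. In practice the links would come from behavioral and lexical signals that this sketch does not model.

```python
def find(parent, x):
    """Return the representative of x's group, compressing the path as we go."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def merge_aliases(links):
    """Group identities into actors, given (identity, identity) evidence links."""
    parent = {}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    actors = {}
    for identity in list(parent):
        actors.setdefault(find(parent, identity), set()).add(identity)
    # Sort for a stable, readable result.
    return sorted(sorted(group) for group in actors.values())


# Hypothetical evidence that pairs of identities belong to the same person.
links = [
    ("jdoe@example.com", "im:johnny_d"),
    ("im:johnny_d", "+1-555-0100"),
    ("asmith@example.com", "badge:4417"),
]
actors = merge_aliases(links)
```

Here the email address and the phone number end up in the same actor even though no single link joins them directly, because both are linked to the same instant messaging handle.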


Discussions™: understanding documents in context

Discussions can be derived from the evidence graph described above. A Discussion is a set of apparently causally related data items; in other words, it is the electronic record of a conversation. Discussions can cross communication channels (from email to SMS to IM), can include point events such as keycard accesses, can drift to a degree in their topics and actors, and can include items not joined by any aliases. In all of these ways Discussions differ from traditional technologies for joining documents, such as thread reconstruction and various forms of clustering.

Discussions are particularly useful for understanding short or vague communications. Consider the message “Sure, go ahead.” What does this mean? Is this authorization to order lunch, or agreement to commit fraud? By itself, this message conveys little meaning. The sender assumed the recipient could understand it based on previous communications. Rejoining these communications restores that context, and the meaning becomes apparent again.
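As a rough illustration of restoring context across channels, the sketch below greedily attaches each item to an earlier group when it shares an actor with that group and follows its latest item within a time window. This is a deliberately simple heuristic, not the Discussions algorithm, which the text says can also join items that share no aliases. The `group_discussions` function, the two-hour window, and the sample items are all assumptions.

```python
from datetime import datetime, timedelta


def group_discussions(items, window=timedelta(hours=2)):
    """Greedily group items that share an actor and are close together in time.

    items: list of dicts with 'id', 'time' (datetime), and 'actors' (set).
    Returns a list of groups, each a list of item ids in time order.
    """
    groups = []
    for item in sorted(items, key=lambda it: it["time"]):
        target = None
        for group in groups:
            group_actors = {a for member in group for a in member["actors"]}
            recent = item["time"] - group[-1]["time"] <= window
            if recent and item["actors"] & group_actors:
                target = group
                break
        if target is None:
            groups.append([item])
        else:
            target.append(item)
    return [[member["id"] for member in group] for group in groups]


day = datetime(2024, 1, 15)
items = [
    {"id": "email-1", "time": day.replace(hour=9, minute=0), "actors": {"alice", "bob"}},
    {"id": "sms-1", "time": day.replace(hour=9, minute=20), "actors": {"alice", "bob"}},
    {"id": "email-2", "time": day.replace(hour=9, minute=30), "actors": {"carol", "dave"}},
    {"id": "im-1", "time": day.replace(hour=15, minute=0), "actors": {"alice", "bob"}},
]
discussions = group_discussions(items)
```

If `sms-1` is the message “Sure, go ahead.”, grouping it with `email-1` recovers the request it answers, even though the two items travelled over different channels.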

Discussions effectively reveal significant structure within unstructured data, connecting actions with their effects. Hence they allow numerous additional analyses that would not otherwise be possible. Workflows, projects, and relationships all manifest themselves via Discussions.


Seeing the whole organization

Using Actor Detection and Discussions, the evidence graph described above can be viewed as a network of interacting individuals. To this richly detailed network, we apply both standard and proprietary social network analysis (SNA) methods. De facto work units become visible, as do the connections between them. Command structure becomes apparent, showing who makes daily decisions, who delegates to whom, and who is held in high regard by his or her peers. We can also discern effective job functions, showing how individuals relate to topics and workflows.
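One of the standard SNA measures, degree centrality, can be sketched in a few lines. The example assumes each Discussion has already been reduced to the list of actors who took part; the function names and sample data are hypothetical, and the proprietary methods mentioned above are not represented.

```python
from collections import Counter
from itertools import combinations


def interaction_network(discussions):
    """Count, for each pair of actors, the discussions they both took part in."""
    weights = Counter()
    for actors in discussions:
        for a, b in combinations(sorted(set(actors)), 2):
            weights[(a, b)] += 1
    return weights


def degree_centrality(weights):
    """Fraction of the other actors in the network each actor interacts with."""
    peers = {}
    for a, b in weights:
        peers.setdefault(a, set()).add(b)
        peers.setdefault(b, set()).add(a)
    n = len(peers)
    if n < 2:
        return {actor: 0.0 for actor in peers}
    return {actor: len(others) / (n - 1) for actor, others in peers.items()}


# Hypothetical participants of three discussions.
discussions = [
    ["alice", "bob"],
    ["alice", "carol"],
    ["alice", "bob", "dave"],
]
weights = interaction_network(discussions)
centrality = degree_centrality(weights)
```

In this toy network `alice` interacts with every other actor and so has the highest centrality. Measures of this kind are a starting point only: a high score shows who is well connected, not why.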