Using Technology to Access & Analyze
Managing workforce risks with analytics is becoming increasingly complex in the digital age. To successfully address
these risks, contractors must not only understand how to
leverage multiple types of data, but also how to apply the
available technologies: traditional analytics, machine learning, and textual analytics.
Analytics is defined as the methods and procedures used to
extract useful information from a data set in order to answer
a strategic question. Traditional analytics relies on rule-based
methods that follow simple Boolean logic (conditions combined
with the operators “and,” “or,” and “not”) to
detect anomalies. For example, if a subcontractor’s address
matches an employee’s address and its wire transfer account
matches the employee’s bank account, then an improper relationship likely exists.
While effective, rule-based systems are inherently subjective, geared toward known questions, and limited to a few
attributes and exact matching of criteria.
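The address-and-account rule above can be sketched in a few lines of Python. This is only an illustration: the field names (`address`, `wire_account`, `bank_account`) and the sample records are assumptions, not taken from any particular system.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())

def improper_relationship(subcontractor, employee):
    """Boolean rule: same address AND same bank account."""
    same_address = normalize(subcontractor["address"]) == normalize(employee["address"])
    same_account = subcontractor["wire_account"] == employee["bank_account"]
    return same_address and same_account

# Hypothetical sample records, invented for illustration.
subcontractor = {"name": "Acme Drywall", "address": "12 Oak St, Springfield",
                 "wire_account": "000123456"}
employees = [
    {"name": "J. Doe", "address": "12 Oak St,  Springfield", "bank_account": "000123456"},
    {"name": "R. Roe", "address": "98 Elm Ave, Springfield", "bank_account": "000987654"},
]

flagged = [e["name"] for e in employees if improper_relationship(subcontractor, e)]
print(flagged)  # ['J. Doe']
```

Note that writing the same address as "12 Oak Street, Springfield" would slip past this rule, which is the exact-matching limitation described above.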
Machine learning is a useful type of artificial intelligence that
can learn without pre-defined rules, supplementing traditional
decision-making processes with enhanced technologies. One
such type of machine learning, known as supervised learning,
constructs a decision tree based on meta-tagged data (e.g.,
“red flag” or “not”) to determine how red flag transactions are
related. Machine learning applies the learned logic to new data
and has the ability to learn from a complex array of data rather
than just a few variables, which leads to greater accuracy in
analyzing a business problem.
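To make the idea concrete, here is a minimal sketch of a decision tree learned from tagged transactions, written in plain Python. It assumes two invented features (invoice amount and days since the vendor was set up) and a handful of made-up tagged rows; a production system would use an established library and far more data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels (0.0 means every label agrees)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature index, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split the data; a leaf is simply the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk the learned tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical tagged data: (invoice amount, days since vendor was created).
rows = [(500, 400), (900, 350), (12000, 300), (15000, 5),
        (11000, 2), (700, 3), (14000, 250), (13000, 8)]
labels = ["ok", "ok", "ok", "red flag", "red flag", "ok", "ok", "red flag"]

tree = build_tree(rows, labels)
print(predict(tree, (12500, 4)))    # large invoice, brand-new vendor
print(predict(tree, (12500, 200)))  # large invoice, established vendor
```

On this toy data the tree learns, without being told, that only the combination of a new vendor and a large invoice was tagged as a red flag.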
Another type of machine learning, called unsupervised learning, constructs decision trees without meta-tagged data; it
identifies patterns of interest and anomalies using its own
decision-making criteria. (See page 22.) This allows users to
find patterns in data sets not previously identified or codified
into rule-based methods. Both supervised and unsupervised
machine learning systems are self-refining; that is, accuracy
improves as more data is encountered.
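As a rough illustration of finding anomalies without tagged data, the sketch below flags payment amounts that sit far from the median of the set. It uses the modified z-score, a standard statistical technique that is much simpler than a learning system and is swapped in here only to show the idea. The amounts are invented, and the 3.5 cutoff is a commonly used convention rather than a recommendation.

```python
from statistics import median

def robust_outliers(values, cutoff=3.5):
    """Return the positions of values whose modified z-score exceeds the cutoff."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > cutoff]

# Hypothetical payment amounts; no row is tagged in advance.
amounts = [1020, 980, 1005, 995, 1010, 990, 1000, 9800]
print(robust_outliers(amounts))  # [7]
```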
Applications in Structured Data
Machine learning is frequently used to spot red flag patterns
in structured data, which is organized in rows and columns
(typically in a database) and is critical for managing risk and quantifying
exposure. Examples include identifying suspicious change
requests, unusual banking transactions, and credit card activity.
It is also useful in network relationship analysis, which is the
exploration of connections between individuals and/or entities. These complex relationship networks can be quantified
with an unsupervised learning approach called “clustering,”
which allows the user to efficiently identify key relationships,
both known and previously unknown.
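One very simple way to surface relationship groups is to link people who exchange many messages and then collect everyone who is connected. The sketch below does this with a connected-components search, a much simpler stand-in for the clustering methods mentioned above. The names, message counts, and `min_messages` threshold are all hypothetical.

```python
def relationship_groups(contacts, min_messages=5):
    """Group people linked, directly or indirectly, by frequent communication."""
    graph = {}
    for a, b, count in contacts:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if count >= min_messages:  # ignore occasional contact
            graph[a].add(b)
            graph[b].add(a)

    seen, groups = set(), []
    for person in sorted(graph):
        if person in seen:
            continue
        stack, group = [person], set()
        while stack:  # depth-first walk of everyone reachable from this person
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Hypothetical (sender, recipient, message count) records.
contacts = [
    ("estimator", "vendor_a", 42),
    ("vendor_a", "pm_lee", 17),
    ("pm_lee", "payroll", 2),
    ("payroll", "hr", 30),
    ("foreman", "hr", 1),
]
print(relationship_groups(contacts))
# [['estimator', 'pm_lee', 'vendor_a'], ['foreman'], ['hr', 'payroll']]
```

Here the estimator and the project manager never appear in the same record, yet they land in one group through their shared vendor, which is the kind of previously unknown link the analysis is meant to expose.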
In an industry like construction, relationships are vital to success, but also easy to exploit. Further, internal relationships
may lead to distrust and favoritism in the workforce. The
data for this kind of relationship analysis often comes from corporate e-mail, but may also
include text messages, instant messages, and social media.
Machine learning also enhances basic attribute matching.
Rather than creating a complex set of rules for matching
names, addresses, and other identifying attributes, machine
learning-based systems learn what a match looks like and
apply this logic to the data, resulting in better accuracy.
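A heavily simplified sketch of "learning what a match looks like": instead of hand-writing matching rules, the code below scores how similar two strings are and then picks the cutoff that best separates example pairs already tagged as match or non-match. Learning a single threshold is a drastic simplification of a real system, and the example pairs are invented. The similarity score comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher
import string

def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())

def similarity(a, b):
    """Similarity between 0 and 1 for two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def learn_threshold(labeled_pairs):
    """Pick the similarity cutoff that classifies the tagged pairs most accurately."""
    scored = [(similarity(a, b), is_match) for a, b, is_match in labeled_pairs]
    candidates = sorted({score for score, _ in scored})

    def correct(threshold):
        return sum((score >= threshold) == is_match for score, is_match in scored)

    return max(candidates, key=correct)

# Hypothetical tagged examples: (value A, value B, is it the same entity?).
labeled_pairs = [
    ("Acme Drywall LLC", "ACME Drywall, L.L.C.", True),
    ("12 Oak St", "12 Oak Street", True),
    ("Jon Smith", "John Smith", True),
    ("12 Oak St", "98 Elm Ave", False),
    ("Acme Drywall", "Apex Roofing", False),
]

threshold = learn_threshold(labeled_pairs)

def is_match(a, b):
    """Apply the learned cutoff to a new pair of values."""
    return similarity(a, b) >= threshold

print(is_match("Acme Drywal", "Acme Drywall"))   # near-duplicate with a typo
print(is_match("Acme Drywall", "Apex Roofing"))  # different vendors
```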
Applications in Unstructured Data
Machine learning also applies to unstructured data, a
relatively untapped set of data in most organizations.
Unstructured data – such as e-mail, social media, instant
messages, geolocation information, data generated from the
Internet of Things, and other nontraditional data sets – is not
in a tabular format (i.e., not housed in a database).
One example of machine learning applied to unstructured data is a Stanford University study of transcripts from
earnings conference calls of publicly traded companies. The
study revealed that the choice of general vs. specific adjectives and use of third-person language had predictive power
in assessing whether an organization was hiding something
or engaging in questionable activities.¹ The now-infamous
Enron e-mail data set revealed similar patterns.
The most powerful feature of machine learning in analyzing
textual data is its ability to identify emotional expressions.
Common emotions associated with workforce risks include
anger/frustration, tension/anxiety, vagueness, and evasive/
conspiratorial tones. Machine learning can find these types
of communications before an individual reads a single e-mail,
allowing the user to focus on more relevant content early in
the assessment process. Identifying and monitoring emotional expressions over time can be helpful in proactively
addressing workforce issues.
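The prioritization step can be illustrated with a simple keyword approach: count words associated with each tone and put the messages with the most hits at the top of the review pile. This is a stand-in for a trained model, not a description of one. The word lists and sample messages below are hypothetical, and a real system would learn such signals from tagged communications rather than rely on a fixed list.

```python
import re
from collections import Counter

# Hypothetical, hand-picked vocabulary for each tone named in the text.
TONE_LEXICON = {
    "anger/frustration": {"furious", "unacceptable", "ridiculous", "sick"},
    "tension/anxiety": {"worried", "nervous", "pressure", "deadline"},
    "vagueness": {"somehow", "whatever", "stuff", "maybe"},
    "evasive/conspiratorial": {"offline", "delete", "quietly", "nobody"},
}

def tone_counts(message):
    """Count how many words in the message fall into each tone's vocabulary."""
    words = re.findall(r"[a-z']+", message.lower())
    counts = Counter()
    for tone, vocabulary in TONE_LEXICON.items():
        counts[tone] = sum(word in vocabulary for word in words)
    return counts

def rank_messages(messages):
    """Order messages so those with the most tone hits are reviewed first."""
    return sorted(messages, key=lambda m: sum(tone_counts(m).values()), reverse=True)

messages = [
    "Please send the updated schedule by Friday.",
    "Let's take this offline and delete the thread, nobody needs to see it.",
    "I'm worried about the deadline.",
]

ranked = rank_messages(messages)
for message in ranked:
    print(sum(tone_counts(message).values()), message)
```

Running the same counts on each week's messages and comparing the totals is one simple way to monitor how these tones change over time.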