
Techniques Used

Exploratory Data Analysis


Clustering of Criminals based on their profile
Dimensionality reduction on quantitative and qualitative parameters
Factor Analysis for Mixed Data (in R)
Principal component method to explore both continuous and categorical variables
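With K quantitative and Q qualitative variables, FAMD {K, Q} looks for the function z on I (the set of individuals) that is the most related to all K + Q variables, in the sense that it maximizes

\[ \sum_{k=1}^{K} r^2(z, k) + \sum_{q=1}^{Q} \eta^2(z, q) \]

where
r(z, k) is the correlation coefficient between variables z and k,
and
\eta^2(z, q) is the squared correlation ratio between variables z and q.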
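As a rough illustration of the quantity FAMD maximizes, the sketch below scores one candidate function z by adding its squared correlation with each quantitative variable to its squared correlation ratio with each qualitative variable. The function names and the toy data are assumptions made for this example. It only evaluates the criterion for a given z and does not perform the factor analysis itself.

```python
from statistics import mean

def squared_correlation(z, x):
    """Squared Pearson correlation r^2 between two numeric sequences."""
    mz, mx = mean(z), mean(x)
    cov = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    var_z = sum((a - mz) ** 2 for a in z)
    var_x = sum((b - mx) ** 2 for b in x)
    return cov * cov / (var_z * var_x)

def squared_correlation_ratio(z, labels):
    """Squared correlation ratio eta^2: between-group share of the variance of z."""
    mz = mean(z)
    total = sum((a - mz) ** 2 for a in z)
    groups = {}
    for a, g in zip(z, labels):
        groups.setdefault(g, []).append(a)
    between = sum(len(v) * (mean(v) - mz) ** 2 for v in groups.values())
    return between / total

def famd_criterion(z, quantitative, qualitative):
    """Sum of r^2 over quantitative variables plus eta^2 over qualitative ones."""
    return (sum(squared_correlation(z, x) for x in quantitative)
            + sum(squared_correlation_ratio(z, q) for q in qualitative))

# Hypothetical toy data: four individuals, one numeric and one categorical variable.
z = [1.0, 2.0, 3.0, 4.0]
age = [2.0, 4.0, 6.0, 8.0]        # perfectly correlated with z, so r^2 = 1
group = ["a", "a", "b", "b"]      # eta^2 = 4 / 5 = 0.8
score = famd_criterion(z, [age], [group])
```

A higher score means z summarizes the mixed variables better; FAMD picks the z with the largest score.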
Hierarchical Clustering
Each individual starts as its own cluster.
Clusters are aggregated using Ward's method.
At each step, all possible pairs of clusters are tentatively combined and the sum of the squared distances within each cluster
is calculated.
This is then summed over all clusters.
The combination that gives the lowest total sum of squares is chosen.
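The aggregation steps above can be sketched directly in code. This is a deliberately naive version that re-computes every within-cluster sum of squares at each step, assuming distances are measured to the cluster centroid. The function names and sample points are made up for illustration, and a real analysis would normally rely on a library implementation.

```python
from itertools import combinations

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    dim = len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return sum((p[d] - centroid[d]) ** 2 for p in cluster for d in range(dim))

def ward_clustering(points, n_clusters):
    """Merge clusters greedily until n_clusters remain."""
    clusters = [[p] for p in points]      # every individual starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = clusters[i] + clusters[j]
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            # Total within-cluster sum of squares if clusters i and j were merged.
            total = sse(merged) + sum(sse(c) for c in rest)
            if best is None or total < best[0]:
                best = (total, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Hypothetical 2-D profiles.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (5.0, 5.0)]
result = ward_clustering(points, 2)
```

On this toy input the two tight pairs merge first, and the middle point then joins whichever group increases the total sum of squares least.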

Location Analytics
Hotspot Analysis
KDE (Kernel Density Estimation)
Non-parametric estimator of a probability density function.
Kernels (Triangular, Quadratic, Gaussian)
Default Bandwidth (width/height)
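A minimal sketch of kernel density estimation with a Gaussian kernel is shown below. It assumes 2-D incident coordinates and a single fixed bandwidth chosen by hand; the incident points and the bandwidth value are hypothetical, and no default-bandwidth rule is implemented here.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_2d(x, y, points, bandwidth):
    """Density estimate at (x, y) using a product of 1-D Gaussian kernels."""
    total = 0.0
    for px, py in points:
        total += (gaussian_kernel((x - px) / bandwidth)
                  * gaussian_kernel((y - py) / bandwidth))
    return total / (len(points) * bandwidth ** 2)

# Hypothetical incident locations: three close together, one isolated.
incidents = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]
near_cluster = kde_2d(1.0, 1.0, incidents, bandwidth=1.0)
far_from_all = kde_2d(3.0, 3.0, incidents, bandwidth=1.0)
```

Evaluating the estimate on a grid and mapping the values would give the hotspot surface; the density near the three clustered incidents comes out higher than at the empty location.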

Neighbourhood Analysis of Data spatially


Global (Do the values being analyzed exhibit a spatial pattern?)
Spatial Autocorrelation tool (Global Moran's I)
High/Low Clustering tool (Getis-Ord General G)
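To make the global statistic concrete, here is a small sketch of Global Moran's I computed from a list of values and a spatial weights matrix. The binary adjacency weights and the two toy value patterns are assumptions for illustration, and the significance test (z-score and p-value) reported by GIS tools is left out.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and spatial weights w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den

# Hypothetical layout: four areas in a row, each neighbouring the next one.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
clustered = morans_i([1, 1, 5, 5], chain)      # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain)    # neighbours always differ
```

Positive values indicate that similar values sit next to each other, negative values indicate a dispersed pattern, and values near zero suggest no spatial pattern.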

Local (Where do the significant spatial patterns occur?)
Local Moran's I (where spatial outliers occur)
Getis-Ord Gi* (where high/low values cluster)
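The local Getis-Ord statistic can be sketched as a z-score per location, computed from the values in each location's neighbourhood (including the location itself). The weights and counts below are hypothetical, and the formula follows the commonly documented Gi* definition rather than any particular tool's implementation.

```python
import math

def getis_ord_gi_star(values, weights):
    """Gi* z-score for every location; weights[i][i] should be non-zero."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        w_sum = sum(w)
        w_sq = sum(x * x for x in w)
        num = sum(w[j] * values[j] for j in range(n)) - mean * w_sum
        den = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append(num / den)
    return scores

# Hypothetical incident counts for five areas in a row.
counts = [10, 9, 1, 1, 2]
n = len(counts)
# Each area is a neighbour of itself and of the adjacent areas.
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
z_scores = getis_ord_gi_star(counts, weights)
```

Large positive scores mark hot spots (high values surrounded by high values) and large negative scores mark cold spots.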

Text Analytics
Location Extraction
Stanford Named Entity Recognizer
Conditional Random Fields with Gibbs Sampling
Gibbs sampling is an approximate inference algorithm, but it lets the model use non-local
information during inference.

Lucene Index and Gazetteer

Classification
Parameter Extraction (frequently used unigrams)
Document Term Matrix Creation
Logistic Regression based Supervised Machine Learning
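The classification steps above can be sketched end to end on a toy corpus: build a unigram vocabulary, turn documents into a document-term matrix, and fit a logistic regression by stochastic gradient descent. The documents, labels, learning rate and epoch count are all made up for this example, and the frequency filtering of unigrams is skipped for brevity.

```python
import math

def build_vocabulary(docs):
    """Sorted list of all unigrams seen in the training documents."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})

def document_term_matrix(docs, vocab):
    """One row per document, one column per vocabulary term (raw counts)."""
    index = {term: k for k, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.lower().split():
            if tok in index:
                row[index[tok]] += 1
        matrix.append(row)
    return matrix

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights and bias with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - label
            w = [wi - lr * err * xi for wi, xi in zip(w, row)]
            b -= lr * err
    return w, b

def predict(doc, vocab, w, b):
    """Probability that a document belongs to class 1."""
    row = document_term_matrix([doc], vocab)[0]
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))

# Hypothetical training data: 1 = crime report, 0 = sports news.
docs = ["robbery at the bank", "armed robbery near market",
        "cricket match at the stadium", "football match tonight"]
labels = [1, 1, 0, 0]
vocab = build_vocabulary(docs)
w, b = train_logistic(document_term_matrix(docs, vocab), labels)
p_crime = predict("robbery near the bank", vocab, w, b)
p_sport = predict("cricket match tonight", vocab, w, b)
```

Terms that never appeared in training are simply ignored at prediction time, which is one reason the vocabulary is usually restricted to frequently used unigrams.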

Lingo Clustering
Creation of Term Document Matrix
Latent Semantic Indexing (overcomes the problems of lexical
matching by using statistically derived conceptual
indices instead of individual words for retrieval)
A truncated Singular Value Decomposition (SVD) is used to
estimate the structure in word usage across documents
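As a small illustration of the decomposition step, the sketch below recovers only the leading singular triplet of a term-document matrix by power iteration, which corresponds to a rank-1 truncated SVD. The matrix and function name are hypothetical; a real LSI pipeline would keep several singular vectors and use a numerical library.

```python
import math

def top_singular_triplet(A, iterations=100):
    """Leading (u, sigma, v) of A via power iteration on A^T A."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        Av = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        AtAv = [sum(A[i][j] * Av[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in AtAv))
        v = [x / norm for x in AtAv]
    Av = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return u, sigma, v

# Hypothetical term-document matrix: rows are terms, columns are documents.
# Documents 1 and 2 share the same terms; document 3 uses a different one.
A = [[2.0, 2.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
u, sigma, v = top_singular_triplet(A)
```

Here the leading right singular vector puts equal weight on the first two documents and almost none on the third, which is the kind of shared word-usage structure LSI is meant to surface.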

Abstract concepts in LSI are represented using frequent
phrases in the collection:
Terms that occur a minimum number of times
Complete phrases

Comparison with Document and Query Vector (phrases)
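One common way to compare a document vector with a query (phrase) vector is cosine similarity. The outline does not name the measure, so treating it as cosine is an assumption here, as are the term space and the vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical term space: [robbery, bank, cricket, match]
query = [1, 1, 0, 0]
documents = {
    "doc_a": [2, 1, 0, 0],
    "doc_b": [0, 0, 1, 2],
    "doc_c": [1, 0, 0, 1],
}
# Rank documents from most to least similar to the query.
ranked = sorted(documents,
                key=lambda name: cosine_similarity(query, documents[name]),
                reverse=True)
```

Documents whose similarity to a phrase vector exceeds some threshold could then be assigned to the cluster labelled by that phrase.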

Text Related
Entity Disambiguation
DBpedia Spotlight
Generative probabilistic model using
P(e): probability of the entity
P(s|e): probability of the surface text given the entity
P(c|e): probability of the context given the entity
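A toy sketch of how the three probabilities could be combined: each candidate entity is scored by the product of the three terms and the highest-scoring candidate is chosen. The candidate names and all probability values are invented for illustration, and this does not call or reproduce the DBpedia Spotlight API.

```python
def score(candidate):
    """Joint score P(e) * P(s|e) * P(c|e) for one candidate entity."""
    return candidate["p_e"] * candidate["p_s_given_e"] * candidate["p_c_given_e"]

def disambiguate(candidates):
    """Return the name of the candidate with the highest joint score."""
    return max(candidates, key=lambda name: score(candidates[name]))

# Hypothetical candidates for the surface form "Washington" in a text about a city.
candidates = {
    "Washington,_D.C.":   {"p_e": 0.6, "p_s_given_e": 0.5, "p_c_given_e": 0.7},
    "George_Washington":  {"p_e": 0.3, "p_s_given_e": 0.4, "p_c_given_e": 0.1},
    "Washington_(state)": {"p_e": 0.1, "p_s_given_e": 0.3, "p_c_given_e": 0.2},
}
best = disambiguate(candidates)
```

In practice the product is usually computed as a sum of logarithms to avoid numerical underflow when many context words are involved.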

Relationship Extraction
Semantic Role Labelling
PropBank sentences: sentences annotated with PCFG parses and
semantic roles (Agent (A0), Theme (A1), Location (AM-LOC), Time (AM-TMP), Predicate).
Extract features from the sentence, the syntactic parse and other sources
for each candidate constituent.
Train a statistical ML classifier to identify arguments.
Extract features that are the same as or similar to those used for argument identification.
Train a statistical ML classifier to select the appropriate label for
each argument.
One-vs-all, pairwise, or structured multi-label classification.

Deep Learning
Convolutional Neural Networks
Automatic Feature Learning
Identify features from the ImageNet database
Selective Search
SVM Classification
