Running head: DATA ANALYTICS INTERNSHIP: TEXT MINING APPLICATIONS 1

DATA ANALYTICS INTERNSHIP: TEXT MINING APPLICATIONS

5070u10a1

Hal Hagood, Dante Durrman

Group 4

Capella University

We first approached this problem by creating a corpus, or repository of data, on the topic: the use

of data analytics in quality assurance. The corpus of documents was created by searching the university

library and downloading the subject matter pertinent to this investigation of our topic. A file called the

corpus compilation was used to record this information. This file was imported into SAS Enterprise Miner

after being uploaded to the Tool Wire remote server. A project diagram, library, and data source were

created inside SAS Enterprise Miner. Following the Text Import node, we partitioned the data, parsed the

text, and filtered the results. Additional documents can be easily added by moving more documents into

the folder and rerunning the project flow diagram. This can be easily automated inside SAS Enterprise

Miner. A screenshot of the project flow diagram is shown in Figure 1; it displays how the nodes

were connected, which analyses were performed, and their order of operations.

Figure 1. Project Flow
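
The flow in Figure 1 (import, partition, parse, filter) can be approximated outside SAS Enterprise Miner. The following is a minimal Python sketch, not the tool's implementation. It assumes the corpus is a folder of plain-text files; the stop-word list, the 70/30 split, and all function names are hypothetical choices made for illustration.

```python
import random
import re
from pathlib import Path

# Assumed, deliberately tiny stop-word list; a real flow would use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for"}


def import_corpus(folder):
    """Read every .txt file in the folder; files added later are picked up on rerun."""
    paths = sorted(Path(folder).glob("*.txt"))
    return {p.name: p.read_text(encoding="utf-8") for p in paths}


def partition(names, train_share=0.7, seed=42):
    """Shuffle document names reproducibly and split them into train/validation."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    cut = round(len(names) * train_share)
    return names[:cut], names[cut:]


def parse(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_terms(tokens, min_length=3):
    """Drop stop words and very short tokens."""
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_length]


def run_flow(folder):
    """Run the steps in the same order as the project flow diagram."""
    corpus = import_corpus(folder)
    train, validate = partition(corpus)
    terms = {name: filter_terms(parse(text)) for name, text in corpus.items()}
    return train, validate, terms
```

Calling run_flow on a folder (for example, a hypothetical folder named corpus) again after new files are added processes the enlarged corpus, which mirrors rerunning the project flow diagram.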

Text mining is the analytic investigation of information contained in natural-language text. The use of

text mining methods to solve business problems is called text analysis, or text analytics. Text

mining has a wide array of applications that often result in unique insights about a person or an

organization. Frequently discussed domains of text mining are sentiment analysis, natural language

processing, and information extraction. According to Search Business Analytics (2017), “Text mining can

help an organization derive potentially valuable business insights from text-based content such as word

documents, email and postings on social media streams like Facebook, Twitter and LinkedIn.” The

volume of unstructured data keeps increasing, and organizations are becoming increasingly reliant on data

analytics to remain competitive. One reason for this increase in the volume of unstructured data is that

social media is now widely used by many organizations. Because this data can yield valuable insights

about products, customers, and the marketplace, interpreting it is a priority for most organizations.

Search Business Analytics (2017) explains the need to better understand unstructured data

with text mining methods: “Mining unstructured data with natural language processing (NLP), statistical

modeling and machine learning techniques can be challenging, however, because natural language text

is often inconsistent. It contains ambiguities caused by inconsistent syntax and semantics, including

slang, abbreviations, entities, language specific to vertical industries and age groups, double entendres,

and even sarcasm.” Because the information obtained through textual analysis continually proves to be

valuable, more sophisticated computational methods can help offset the difficulty of interpreting its

meaning.

Search Business Analytics (2017) states, “Text analytics software can help by transposing words

and phrases in unstructured data into numerical values which can then be linked with structured data in a

database and analyzed with traditional data mining techniques. With an iterative approach, an

organization can successfully use text analytics to gain insight into content-specific values such as

sentiment, emotion, intensity and relevance. Because text analytics technology is still considered to be an

emerging technology, however, results and depth of analysis can vary wildly from vendor to vendor.”

Therefore, finding the right set of tools for the task at hand is necessary when approaching the business

problem with text analytics software. In the case of our project at Vila Health, we used SAS Enterprise

Miner to accomplish this task. It has an extensive array of text mining features, including a built-in

dictionary to correctly stem words for frequency analysis.
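
To illustrate what transposing words into numerical values can look like, the sketch below builds a small document-term count matrix in Python after mapping inflected words to a parent term. It is only an illustration: the stem dictionary, the sample sentences, and the function names are hypothetical and are not taken from SAS Enterprise Miner or from our corpus.

```python
from collections import Counter

# Hypothetical stem dictionary: maps inflected forms to one parent term.
STEMS = {
    "hospitals": "hospital",
    "nurses": "nurse",
    "analyzed": "analyze",
    "analyzing": "analyze",
}


def stem(token):
    """Return the parent term if the dictionary knows the token, else the token itself."""
    return STEMS.get(token, token)


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    counts = [Counter(stem(t) for t in doc.lower().split()) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


docs = ["Nurses analyzed hospital scores", "Hospitals analyzing hospital data"]
vocabulary, rows = document_term_matrix(docs)
print(vocabulary)  # ['analyze', 'data', 'hospital', 'nurse', 'scores']
print(rows)        # [[1, 0, 1, 1, 1], [1, 1, 2, 0, 0]]
```

Each row is a numeric description of one document, so it can be joined to structured data and analyzed with traditional data mining techniques. Without the stemming step, "hospital" and "hospitals" would be counted as separate terms.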

Analytics India (2017) summarizes the goal of text analytics: “The scope in the field of business

analytics is ever expanding and is helping it become mainstream as companies of all sizes and analytics

skill levels get into the big data game. Exploring business analytics needs the right focus, right

technology, right people, right culture and top management commitment. Companies like IBM, Accenture,

and Deloitte are using business analytics tools and coming up with decisions that are useful and

profitable. Business Analytics plays a very important role here as it uses statistics and tools to decode

consumer insights. This is done based on accrued data, and Business Intelligence that garners key

insights that can help predict future behavior, in effect, helping businesses run better. The latest

developments in Business Analytics’ technology are playing a crucial role in automating the analysis

process.”

“Now a day’s most of the information is available in digital form to get the proper data that is a

challenging task. Most of the researchers focused on these problems and come up with the new model to

retrieving the information from the digital system. In this paper, we learn performance of the different

linguistic patterns and statistical scores considered is carefully studied and evaluated in order to design a

method that maximizes the quality of the results. Our proposal is also evaluated for several well

distinguish domain, offering in all cases, reliable taxonomies considering precision and recall along with

F-measure. In this paper, we propose sequential pattern mining based pattern taxonomy relation, which

discover pattern effectively, to achieve the goal we use some state of art data mining method and popular

algorithms for evolution, for the experimental result we use Reuters (RCV1) dataset and the results show

that we improve the discovering pattern as compared to previous text mining methods. The results of the

experiment setup show that the keyword-based methods not give better performance than pattern-based

method. The results also indicate that removal of meaningless patterns not only reduces the cost of

computation but also improves the effectiveness of the system” (Ieeexplore, 2017).

Figure 2 shows the results of the text filter node. Figure 3 displays the documents returned from

the search term ‘analytics’. Figure 3 also lists a dictionary of terms from the corpus with their frequency,

the number of documents in which they appear, whether or not they are kept, and their term weight. Figure 4

shows a map of concept links for terms related to the search term ‘analytics’. The term “analytics” is

associated with “multiple”, “database”, “doi”, “individual”, “design”, “management”, “technology” and

especially the term “hospital”. It is possible to right-click on any child term and select ‘expand links’ to

display the terms associated with the child term. In this case the term “hospital” was selected, which

reveals its expanded relationships, namely “indicator”, “nurse”, “clinician”, “score”, and “capability”.

Figure 5 includes a map where the node for hospital is expanded in addition to the original search term.

Concept linking is an important part of exploratory analysis in text mining because it allows an initial

interactive view in preparation for a more procedural solution such as text clustering, probability modeling,

or regression modeling.
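
The term table and the concept links described above can be imitated with a short Python sketch. It rests on loud assumptions: the term weight is a simple inverse-document-frequency variant, and link strength is the share of documents containing the search term in which another term also appears. SAS Enterprise Miner uses its own statistics for both, and the sample documents and function names here are hypothetical.

```python
import math
from collections import Counter


def term_table(docs, min_docs=2):
    """One row per term: frequency, document count, keep flag, and an assumed IDF weight."""
    freq, doc_count = Counter(), Counter()
    for tokens in docs:
        freq.update(tokens)            # total occurrences across the corpus
        doc_count.update(set(tokens))  # each document counts at most once
    n = len(docs)
    return {
        term: {
            "freq": freq[term],
            "docs": doc_count[term],
            "keep": doc_count[term] >= min_docs,
            "weight": math.log2(n / doc_count[term]) + 1,
        }
        for term in freq
    }


def concept_links(docs, term, top=3):
    """Rank other terms by the share of `term` documents in which they also appear."""
    with_term = [set(tokens) for tokens in docs if term in tokens]
    together = Counter(t for tokens in with_term for t in tokens if t != term)
    return [(t, c / len(with_term)) for t, c in together.most_common(top)]


# Hypothetical, already parsed and filtered documents.
docs = [
    ["analytics", "hospital", "nurse"],
    ["analytics", "hospital", "score"],
    ["analytics", "database"],
    ["hospital", "nurse"],
]

print(term_table(docs)["database"])
# "hospital" ranks first: it appears in 2 of the 3 documents containing "analytics".
print(concept_links(docs, "analytics", top=1))
# Calling the function on a child term plays the role of 'expand links'.
print(concept_links(docs, "hospital"))
```

The design keeps the two views separate on purpose: the table summarizes each term on its own, while the links describe pairs of terms, which is why concept linking is a natural first step before clustering or modeling.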

Reflecting on this experience of group work, we both thought that we worked exceptionally well

together. Hal did an excellent job of putting together the initial document with a lot of cited material, and

Dante was able to add the necessary content to coincide with the directives, make adjustments to the

initial document, and add sufficient commentary to submit a polished paper. We were able to complete

the project rather efficiently and enjoyed working together.

Figure 2. Text Filter results



Figure 3. Interactive Filter Viewer

Figure 4. Concept links



Figure 5. Expanded Concept Links



References

Analytics India. (2017). Scope and Future Trends of Business Analytics. Retrieved from

http://analyticsindiamag.com/scope-and-future-trends-of-business-analytics-in-india/

Capella. (2017). Analytics Internship: Text Mining. Retrieved from

http://media.capella.edu/CourseMedia/ANLT5070/TextMining/transcript.asp

Ieeexplore. (2017). Electronics and Communication...Operational pattern detection in text mining using

pattern taxonomy. Retrieved from

http://ieeexplore.ieee.org/abstract/document/6892780/?reload=true

SAS. (2015). Support.sas.com. Retrieved from

https://support.sas.com/resources/papers/proceedings15/SAS4084-2015.pdf

Search Business Analytics. (2017). Analytics technologies lend enterprise content management a hand.

Retrieved from http://searchbusinessanalytics.techtarget.com/definition/text-mining