How To Analyze Text with Amazon Comprehend

You can perform real-time analysis on text to derive customer sentiment, pick out key phrases that get used often and determine which entities (people, places and so on) are being discussed.

If a business is to be successful, it absolutely must stay aware of what's being said about it online. User-generated content such as customer reviews, social media posts and YouTube videos can make or break a company.

I recently stumbled upon an Amazon Web Services (AWS) natural language processing service that seems to be designed to help companies address this very challenge. It's called Amazon Comprehend. The basic idea is that the service can analyze large volumes of text, derive customer sentiment, pick out frequently used key phrases and determine which entities (people, places and so on) are being discussed.

AWS provides two different options for using Amazon Comprehend. One option is to perform real-time analysis. The other option is to create an Analysis job. Real-time analysis is the best option to use if you want to analyze a relatively small amount of text, or if you just want to see how the service works. If you need to analyze a larger volume of text (which would typically be the case for real-world usage), then you're going to be better off using the Analysis job option.

Performing a real-time analysis on a selection of text couldn't be easier. Just go into the Amazon Comprehend service, click on Real-time analysis, paste the text into the Input text field, and then click Analyze. You can see what this looks like in Figure 1.

Figure 1. Paste your text into the space provided, and click Analyze.
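If you'd rather run this kind of analysis from a script than from the console, Amazon Comprehend exposes the same kinds of insights through its API. The following is a minimal Python sketch, not a reproduction of what the console does internally. It assumes the boto3 SDK is installed, AWS credentials are configured and the text is English. `analyze_text` is a hypothetical helper name. The client is passed in as a parameter, so in practice you would create it with something like `boto3.client("comprehend")`.

```python
def analyze_text(client, text, language_code="en"):
    """Run each real-time Amazon Comprehend detector on one block of text.

    `client` is assumed to be a boto3 Comprehend client, for example
    boto3.client("comprehend", region_name="us-east-1").
    Returns the raw API responses keyed by insight type.
    """
    return {
        # Entities: people, places, organizations and so on.
        "entities": client.detect_entities(Text=text, LanguageCode=language_code),
        # Key phrases.
        "key_phrases": client.detect_key_phrases(Text=text, LanguageCode=language_code),
        # Language: no language code is passed, since that's what is being detected.
        "language": client.detect_dominant_language(Text=text),
        # Sentiment: an overall label plus a score for each possible label.
        "sentiment": client.detect_sentiment(Text=text, LanguageCode=language_code),
        # Syntax: a part-of-speech tag for each token.
        "syntax": client.detect_syntax(Text=text, LanguageCode=language_code),
    }
```

Each real-time request can only carry a limited amount of text. That limit is one more reason to switch to an Analysis job once the volume grows.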

When Amazon Comprehend finishes analyzing the text, it displays various insights further down the screen. By default, the insights related to entities are shown. As you can see in Figure 2, words within the text are underlined in various colors to indicate what type of entity Amazon Comprehend has determined each one to be. For instance, Amazon.com, Inc. was determined to be an Organization, while Seattle, WA, was determined to be a Location.

Figure 2. Amazon Comprehend picks key pieces of information out of the text.
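Through the API, the entity insights come back from `DetectEntities` as a list of entries. Each entry has the matched `Text`, a `Type` such as `ORGANIZATION` or `LOCATION`, a confidence `Score` and character offsets. Grouping that list by type gives roughly the same view as the colored underlines. The sketch below is illustrative only; the helper name and the optional score cut-off are assumptions.

```python
from collections import defaultdict


def entities_by_type(response, min_score=0.0):
    """Group a DetectEntities response by entity type.

    min_score is an illustrative, hypothetical cut-off for dropping
    low-confidence matches; the default keeps everything.
    """
    grouped = defaultdict(list)
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:
            # e.g. grouped["ORGANIZATION"].append("Amazon.com, Inc.")
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)
```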

As you look at Figure 2, you'll notice that the Insights section is divided into a series of tabs. The Key phrases tab, for instance, picks out what Amazon Comprehend determines to be the most important phrases within the text. The results aren't especially meaningful for the short sample text shown in the screen capture, but they would likely be much more useful if a longer block of text were being analyzed.
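Key phrases become more useful when you look across many pieces of text, such as a pile of reviews, and see which phrases keep coming up. `DetectKeyPhrases` returns a `KeyPhrases` list whose entries include `Text` and `Score`. The minimal sketch below counts phrases across several such responses. The helper name, the lowercasing and the 0.9 score cut-off are all assumptions made for illustration.

```python
from collections import Counter


def top_key_phrases(responses, n=10, min_score=0.9):
    """Count key phrases across several DetectKeyPhrases responses.

    Phrases are lowercased so that "Battery life" and "battery life" are
    counted together. The 0.9 default score cut-off is arbitrary.
    Returns up to n (phrase, count) pairs, most frequent first.
    """
    counts = Counter()
    for response in responses:
        for phrase in response["KeyPhrases"]:
            if phrase["Score"] >= min_score:
                counts[phrase["Text"].lower()] += 1
    return counts.most_common(n)
```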

The Language tab shows which language Amazon Comprehend thinks the text is written in. Normally the language should be pretty obvious, but I can imagine at least a couple of use cases in which the Language tab might be helpful.
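In the API, `DetectDominantLanguage` returns a `Languages` list of candidates, each with a `LanguageCode` and a `Score`. One plausible scripted use is to check the language before passing a `LanguageCode` to the other calls. The minimal sketch below only picks the top candidate. The helper name and the 0.5 threshold are assumptions.

```python
def dominant_language(response, min_score=0.5):
    """Pick the most likely language from a DetectDominantLanguage response.

    Returns (language_code, score), or None when there are no candidates or
    the best one falls below min_score (an arbitrary, illustrative threshold).
    """
    candidates = response["Languages"]
    if not candidates:
        return None
    best = max(candidates, key=lambda lang: lang["Score"])
    if best["Score"] < min_score:
        return None
    return best["LanguageCode"], best["Score"]
```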

The Sentiment tab is probably the most useful of the Insights tabs. It gives you a sense of whether the text is saying something positive or negative. If you look at Figure 3, you can see that this particular sample text was deemed to be neither positive nor negative, but rather neutral.

Figure 3. Amazon Comprehend tries to determine whether the text expresses a positive or negative sentiment.
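The API equivalent is `DetectSentiment`, which returns an overall `Sentiment` label (`POSITIVE`, `NEGATIVE`, `NEUTRAL` or `MIXED`) plus a `SentimentScore` dictionary with a score for each of the four labels. As a sketch, with a hypothetical helper name, the response can be condensed into a single line that lists the scores in the same order as the columns in Figure 3:

```python
def sentiment_summary(response):
    """Condense a DetectSentiment response into one readable line.

    The overall label comes first, followed by the score for every label,
    listed in the same order as the console columns.
    """
    label = response["Sentiment"]
    scores = response["SentimentScore"]  # keys: Positive, Negative, Neutral, Mixed
    columns = ", ".join(
        f"{name} {scores[name]:.2f}" for name in ("Neutral", "Positive", "Negative", "Mixed")
    )
    return f"{label} ({columns})"
```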

If you do use the Sentiment feature, be sure to pay attention to the way the output is presented. As you can see in Figure 3, there are separate columns for Neutral, Positive, Negative and Mixed, each showing a confidence score. All four columns are displayed regardless of the overall sentiment. I'm not personally a fan of the interface, but I will say that the Sentiment feature does seem to work as intended. While writing this blog post, I pasted some random product reviews into the Input text section, and the sentiment seemed to be correct for each review that I evaluated.
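Pasting reviews in one at a time gets tedious. The API also offers `BatchDetectSentiment`, which takes a `TextList` and returns a `ResultList` in which each entry's `Index` refers to its position in that list. Failed items go into a separate `ErrorList`. The minimal sketch below labels a list of reviews. It assumes a boto3 Comprehend client is passed in, and `classify_reviews` is a hypothetical name. The default chunk size of 25 is assumed to match the service's per-call document limit, so check the current AWS documentation before relying on it.

```python
def classify_reviews(client, reviews, language_code="en", batch_size=25):
    """Label each review POSITIVE, NEGATIVE, NEUTRAL or MIXED.

    Reviews are sent to BatchDetectSentiment in chunks of batch_size (25 is
    assumed to be the per-call maximum). Any review that the service reports
    in ErrorList keeps the placeholder label "ERROR".
    """
    labels = ["ERROR"] * len(reviews)
    for start in range(0, len(reviews), batch_size):
        chunk = reviews[start:start + batch_size]
        response = client.batch_detect_sentiment(TextList=chunk, LanguageCode=language_code)
        # "Index" is the position of the document within this chunk.
        for result in response["ResultList"]:
            labels[start + result["Index"]] = result["Sentiment"]
    return labels
```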

The last tab in the Insights section is the Syntax tab. Basically, this tab just breaks down the text and identifies each word as a noun, verb and so on. The tab might be a source of amusement for those who are grammatically inclined, but I'm honestly not sure what the real-world use case would be for this tab.
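If you do want to experiment with the syntax output in a script, `DetectSyntax` returns `SyntaxTokens`. Each token carries its `Text` and a `PartOfSpeech` entry with a `Tag` such as `NOUN`, `VERB` or `ADJ`. The sketch below, with hypothetical helper names, tallies the tags and pulls out the words that have a given tag, for example just the nouns.

```python
from collections import Counter


def part_of_speech_counts(response):
    """Tally the part-of-speech tags (NOUN, VERB, ADJ, ...) in a DetectSyntax response."""
    return Counter(token["PartOfSpeech"]["Tag"] for token in response["SyntaxTokens"])


def words_with_tag(response, tag):
    """Return, in order, the words that Comprehend tagged with the given part of speech."""
    return [
        token["Text"]
        for token in response["SyntaxTokens"]
        if token["PartOfSpeech"]["Tag"] == tag
    ]
```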

As I mentioned earlier, real-time analysis is most suitable for evaluating short blocks of text. If you have a more involved project, you'll probably want to use the Analysis job feature instead. Creating an Analysis job works similarly to what you've already seen, but with a couple of key differences.

For starters, you'll need to upload the document or documents that you want to analyze to an Amazon Simple Storage Service (S3) bucket. You'll also need an S3 bucket for the output. Another key difference is that you have to choose the type of analysis that you want to perform (sentiment, key phrases and so on), rather than Amazon Comprehend applying all of the available analytical methods. Aside from that, the process works very similarly to real-time analysis.
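In the API, each analysis type has its own asynchronous call, such as `StartSentimentDetectionJob`, `StartKeyPhrasesDetectionJob` or `StartEntitiesDetectionJob`. You can check a job's progress with the matching `Describe...` call. The minimal sketch below starts a sentiment job and polls until it finishes. It assumes a boto3 Comprehend client is passed in. The S3 URIs, job name and role ARN are placeholders, and the role must be an IAM role that Amazon Comprehend can assume to read the input bucket and write to the output bucket. `run_sentiment_job` is a hypothetical helper name.

```python
import time


def run_sentiment_job(client, input_s3_uri, output_s3_uri, role_arn,
                      job_name="review-sentiment", language_code="en",
                      poll_seconds=30, sleep=time.sleep):
    """Start an asynchronous sentiment analysis job and wait for it to end.

    input_s3_uri, output_s3_uri, role_arn and job_name are placeholders.
    ONE_DOC_PER_FILE treats each file under the input prefix as one
    document; ONE_DOC_PER_LINE is the alternative input format.
    Returns the job properties once the job reaches a final status.
    """
    job = client.start_sentiment_detection_job(
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
        JobName=job_name,
        LanguageCode=language_code,
    )
    while True:
        props = client.describe_sentiment_detection_job(JobId=job["JobId"])[
            "SentimentDetectionJobProperties"
        ]
        # SUBMITTED and IN_PROGRESS mean the job is still running.
        if props["JobStatus"] in ("COMPLETED", "FAILED", "STOPPED"):
            return props
        sleep(poll_seconds)
```

When the job completes, `props["OutputDataConfig"]["S3Uri"]` should point to the results that Amazon Comprehend wrote to the output bucket.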

About the Author

Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.
