Dan's Take

Lucidworks Fusion and the Never-Ending Search for Insight

Leveraging the power of the Apache Solr project for crunching Big Data.

Lucidworks, long a champion of the Apache Solr project, recently launched Fusion, a platform designed to help companies turn massive pools of data into actionable insights faster. In other words, it is a Big Data tool for extracting insights from large volumes of unstructured data, such as word processing documents, PDF files, spreadsheets and presentations.

Lucidworks says that Fusion builds on technology from the Apache Solr project and uses machine learning and signal processing techniques to make the never-ending search for insight in a company's structured and unstructured data assets faster and much easier than before.

What is Apache Solr?
Solr is a project managed by the Apache Software Foundation. It is an open source enterprise search engine built on the Lucene Java search library. Solr supports full-text, near real-time search, is highly scalable and fault tolerant, and powers the search and navigation features of many large Web sites.

The Apache Software Foundation describes Solr as a standalone enterprise search server with a REST-like API. REST, by the way, stands for "representational state transfer," an architectural style for coordinating functions across the Web in a defined, controlled way: clients act on resources identified by URLs using standard HTTP methods such as GET and POST.

In the case of Solr, developers add documents to the search index, a step called indexing, by sending them over an HTTP (Hypertext Transfer Protocol) connection as XML (Extensible Markup Language), JSON (JavaScript Object Notation), CSV (comma-separated values) or even binary data. Queries can be issued as HTTP GET requests.
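To make that request/response pattern concrete, here is a minimal Python sketch using only the standard library. It assumes a Solr server on its default port (8983) and a hypothetical collection named "docs" whose schema accepts the illustrative "id", "title" and "body" fields; it builds a JSON POST to Solr's /update handler and a GET to its /select handler. It is a sketch under those assumptions, not Lucidworks' or Apache's reference client.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: Solr listening on its default port 8983 and a hypothetical
# collection named "docs". Adjust both for a real deployment.
SOLR_BASE = "http://localhost:8983/solr/docs"


def build_index_request(docs):
    """Build an HTTP POST that sends a list of JSON documents to /update.

    commit=true asks Solr to make the new documents searchable right away.
    """
    url = SOLR_BASE + "/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(query, rows=10):
    """Build an HTTP GET against the /select handler.

    q is the search expression, rows caps the number of hits returned,
    and wt=json requests a JSON response.
    """
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return urllib.request.Request(SOLR_BASE + "/select?" + params, method="GET")


def send(request):
    """Send a request and decode Solr's JSON reply (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Illustrative documents; the field names are hypothetical.
    docs = [
        {"id": "1", "title": "Quarterly report", "body": "Revenue grew in Q3"},
        {"id": "2", "title": "Support email", "body": "Please add a new service tier"},
    ]
    index_req = build_index_request(docs)
    query_req = build_query_request("body:service")
    print(index_req.get_method(), index_req.full_url)
    print(query_req.get_method(), query_req.full_url)
    # With a Solr instance running, these lines would index and then search:
    # send(index_req)
    # print(send(query_req)["response"]["numFound"])
```

Keeping request construction separate from `send` means the indexing and query steps can be inspected without a server; only `send` touches the network.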

Solr has these characteristics, according to Apache:

  • Advanced full-text search
  • Optimized for high volume Web traffic
  • Standards-based open interfaces (XML, JSON and HTTP)
  • Comprehensive HTML administration interfaces
  • Server statistics exposed for monitoring
  • Scalable distributed server architecture that provides automatic index replication, failover and recovery
  • Near real-time indexing
  • Flexible XML configuration
  • Extensible plugin architecture

What is Lucidworks saying about Fusion?
Lucidworks explains that Fusion is designed to help companies search through huge repositories of unstructured data and transform them into useful content. The company describes Fusion's features as follows:

  • Modular Integration: allows current Solr users to overlay Lucidworks' search and discovery capabilities on their existing Solr configuration.
  • Big Data Discovery Engine: actively prepares and enriches data, adding features such as language identification and analysis, geospatial processing and synonym identification.
  • Connector Framework: makes it possible to import and use data from many external sources and database types.
  • Intelligent Search Services: lets developers incorporate Lucidworks' advanced search features into their applications, delivering real-time results that are personalized, relevant and context-aware.
  • Signal Processing: harnesses machine learning to transform clicks, social, device, geo and other signals into a contextually relevant data experience.
  • Advanced Analytics: deeply integrated analytics and dashboards make understanding and modifying users' search experience intuitive and flexible.
  • Natural Language Search: delivers full-text search that simplifies discovery for users at any level of an organization.

Dan's Take
A large share of an organization's data assets is kept in forms not suited to traditional databases, even though many relational database products can store large binary objects. Compared with an organization's list of search requirements, the tools for searching inside those objects can seem primitive.

The Apache Solr project developed a powerful tool to help organizations search through their data repositories, find things and gain a better understanding of business realities, customer requirements and the like. Like many open source projects, Solr is powerful but can demand a significant investment of time and resources to 1) understand the tool, 2) understand what data is lurking in the organization's inventory of data assets, and 3) use the tool to gain usable, actionable insights.

Lucidworks wants to reduce the skill requirements, reduce the time it takes to go from initial question to actionable insight, and make Solr usable throughout an organization. Wouldn't it be nice, for example, if a business analyst could delve into all the email messages received by the company's customer service department to pick out requests for new services or for improvements to today's service offerings?

If you haven't met the folks from Lucidworks, it might be worth the time to see them demonstrate what Fusion can do. I'm sure you'll think of many ways its search capabilities can be used.

About the Author

Daniel Kusnetzky, a reformed software engineer and product manager, founded Kusnetzky Group LLC in 2006. He's literally written the book on virtualization and often comments on cloud computing, mobility and systems software. He has been a business unit manager at a hardware company and head of corporate marketing and strategy at a software company.