Marpex Inc.'s key tech man, Doug Lowry, was the sole inventor of the FindIt search engine that was marketed by Reteaco Inc. in Canada and the United States in the mid-to-late 1980s. Marpex Inc. owns his more recent U.S. Patent No. 7,433,893, entitled "Method and System for Compression Indexing and Efficient Proximity Search of Text Data." Branded MarpX, this technology started out as a search engine for relatively small collections of up to 3,200,000 words. (To put this in perspective, the 37 plays of Shakespeare together amount to 2,200,000 words.) Try these examples: Shakespeare's play Hamlet; the blog archive Grateful Convert by Kevin Lowry; or the Hobby Lobby Decision, a Supreme Court ruling posted when it was news in 2014.
In these small collections you can use wildcards (an asterisk for a few characters, a question mark for one character) to stand in for parts of the words you are looking for. Notice that you control the filter that weeds out hits in which the words you are after are not close enough together. The filtering saves you the time and bother of sorting through useless results to get at the hits that are meaningful to you. These small collections may also be used for reading or browsing content.
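To make the wildcard and closeness ideas concrete, here is a minimal Python sketch. It is not the MarpX code. The function names (`wildcard_to_regex`, `term_positions`, `proximity_hits`) and the `max_gap` parameter are hypothetical, and the sketch assumes that an asterisk matches any run of word characters, since the exact meaning of "a few" is not spelled out above.

```python
import re

def wildcard_to_regex(term):
    """Turn a search term containing * and ? into a compiled regex."""
    parts = []
    for ch in term.lower():
        if ch == "*":
            parts.append(r"\w*")  # assumption: any run of word characters
        elif ch == "?":
            parts.append(r"\w")   # exactly one word character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def term_positions(words, term):
    """Word indexes at which the (possibly wildcarded) term matches a whole word."""
    pattern = wildcard_to_regex(term)
    return [i for i, word in enumerate(words) if pattern.fullmatch(word)]

def proximity_hits(text, terms, max_gap):
    """Positions of the first term that have every other term within max_gap words."""
    words = re.findall(r"\w+", text.lower())
    positions = [term_positions(words, t) for t in terms]
    hits = []
    for anchor in positions[0]:
        if all(any(abs(p - anchor) <= max_gap for p in plist)
               for plist in positions[1:]):
            hits.append(anchor)
    return hits

line = "To be, or not to be, that is the question: whether 'tis nobler in the mind to suffer"
print(proximity_hits(line, ["quest*", "nob?er"], 3))  # [9]
print(proximity_hits(line, ["quest*", "nob?er"], 2))  # []
```

Tightening `max_gap` from 3 to 2 removes the hit, which is the kind of control over the filter described above.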
The super-compressed bit tree method that underlay FindIt from November 1984 onward has long since found its way into the public domain. Combining that method with MarpX and adding some bells and whistles has produced the ability to scale up MarpX precision search from a capacity of 3.2 million words to 13 billion, then 53 trillion words, and well beyond. Built with efficiency in mind, the design filters out irrelevant hits, shows results quickly, and ranks them by how close together the desired words appear. By the way, words in headings are taken into account. For two or three years we posted search across every word of a collection of 4080 books from Project Gutenberg. At the moment, there is a demonstration of content discovery for publishers that shows the style for a modest collection of 236 books. Note in particular the advanced search features and the way results are displayed, which is particularly convenient for researchers.
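Ranking by closeness can be pictured with a short Python sketch. This is an illustration under our own assumptions, not the patented method: `smallest_span` and `rank_by_closeness` are hypothetical names, each document is represented only by lists of word positions (one list per search term), and closeness is taken to mean the narrowest window of words that contains every term.

```python
def smallest_span(position_lists):
    """Width in words of the narrowest window holding one position from
    each list, or None if any term has no positions at all."""
    if any(not plist for plist in position_lists):
        return None
    # Merge all positions, remembering which term each one belongs to.
    merged = sorted((pos, i) for i, plist in enumerate(position_lists) for pos in plist)
    counts = [0] * len(position_lists)
    covered = 0
    best = None
    left = 0
    for pos, idx in merged:
        counts[idx] += 1
        if counts[idx] == 1:
            covered += 1
        # While every term is inside the window, try shrinking it from the left.
        while covered == len(position_lists):
            left_pos, left_idx = merged[left]
            width = pos - left_pos
            if best is None or width < best:
                best = width
            counts[left_idx] -= 1
            if counts[left_idx] == 0:
                covered -= 1
            left += 1
    return best

def rank_by_closeness(docs):
    """docs maps a name to its per-term position lists.
    Returns names with the tightest match first; documents missing a term are dropped."""
    scored = [(smallest_span(plists), name) for name, plists in docs.items()]
    return [name for span, name in sorted(s for s in scored if s[0] is not None)]

docs = {
    "a": [[1, 50], [40]],  # closest pair is 40 and 50: span 10
    "b": [[3], [5]],       # span 2
    "c": [[7], []],        # second term never occurs
}
print(rank_by_closeness(docs))  # ['b', 'a']
```

Document "c" is filtered out because one term is missing, and "b" outranks "a" because its terms sit closer together.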
At Marpex Inc., we have experimented a lot with text data mining. Because of the filtering design, and our (obsessive-compulsive?) fixation on efficiency, automated scanning for patterns turns out to be extremely rapid.
The files underlying MarpX search contain no text. It's simply not there. (It took a trip to the USPTO in Alexandria, VA, to convince the patent examiner and his supervisor.) All that those files hold is individual search terms, their frequencies, and their locations in an array of position values.
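A toy Python version of such a layout may help picture it. This is only a sketch under our own assumptions and not the patented file format: `build_index` and `lookup` are hypothetical names, each term maps to a (frequency, offset) pair, and the offset points into one flat array of word positions.

```python
import re
from array import array

def build_index(text):
    """Return (terms, positions): terms maps each word to (frequency, offset),
    and positions is one flat array of word positions grouped by term."""
    words = re.findall(r"\w+", text.lower())
    by_term = {}
    for pos, word in enumerate(words):
        by_term.setdefault(word, []).append(pos)
    positions = array("I")  # unsigned integers, no text
    terms = {}
    for term in sorted(by_term):
        plist = by_term[term]
        terms[term] = (len(plist), len(positions))  # (frequency, offset into array)
        positions.extend(plist)
    return terms, positions

def lookup(index, term):
    """Positions at which a term occurs, read back from the flat array."""
    terms, positions = index
    if term not in terms:
        return []
    freq, start = terms[term]
    return list(positions[start:start + freq])

index = build_index("the cat sat on the mat")
print(index[0]["the"])       # (2, 4): two occurrences, stored from slot 4
print(lookup(index, "the"))  # [0, 4]
```

The structure keeps a term list, a count per term, and numbers; no sentence from the source is stored as running text.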
As a consequence, encrypted content can be stored securely in the cloud while authorized persons use offline computers to search, or to set up and run text data mining sequences. Search results tell the user which files are most likely to contain the desired content. Those can be downloaded and decrypted as needed.