
Patent Issued for Methods and Apparatus for Automated Redaction of Content in a Document

February 20, 2014



By a News Reporter-Staff News Editor at Computer Weekly News -- Adobe Systems Incorporated (San Jose, CA) has been issued patent number 8645812, according to news reporting originating out of Alexandria, Virginia, by VerticalNews editors.

The patent's inventor is Leeds, Bennett (Los Gatos, CA).

This patent was filed on August 17, 2010 and was published online on February 4, 2014.

From the background information supplied by the inventors, news correspondents obtained the following quote: "Conventional computer systems operate software applications that assist users in document processing and modifying information contained in such documents. Such software applications are commonly used to perform tasks for computer users such as word processing, graphic design, image processing and the like. Typically, these software applications provide users with a variety of tools that facilitate the modification of data within a document. More specifically, conventional software applications provide tools enabling a user to select data or other content, such as text or image data, within a document and to manipulate and/or delete the selected data (e.g., highlighting a text string in a word processing document and subsequently deleting the highlighted text, or changing the font of the highlighted text).

"As another example, various conventional software applications include conventional redaction tools that allow a user to modify, or mark-up, text data within a document such that the data is unrecognizable and/or irretrievable by other users who have subsequent access to the document, but that keep the document's original structure (e.g., pagination, paragraph sizes, etc.) intact. Generally, such conventional redaction tools modify text within a document resulting in a 'black box' or similar rectangular graphical barrier that serves as a place-filler in lieu of the redacted text. An example application of a conventional software redaction tool involves the redaction of sensitive information contained in electronic documents as part of the discovery phase during litigation in a lawsuit or the removal of classified information from government documents that are released to the public."

Supplementing the background information on this patent, VerticalNews reporters also obtained the inventor's summary information for this patent: "Conventional software applications that enable a user to redact data in a document suffer from a number of drawbacks. In particular, conventional document processing software applications that contain digital redaction techniques are limited in that these applications only provide means for redacting words that have been previously identified for redaction (vis-a-vis phrases, number sequences, etc.). Desktop redaction applications almost always enable a user to manually select what should be redacted, with some applications having an automation-assist function where words previously identified are either redacted or marked for redaction. Furthermore, these conventional software applications lack various contextual capabilities. For instance, such conventional software applications are not capable of determining when words should be redacted (or should not be redacted) depending on whether those words appear in certain phrases or other contextual structures. As a specific example, such conventional redaction techniques do not provide a mechanism for a user to specify a set of content such as a list of words to be kept in the document. As a result, manual review is often necessary to ensure accuracy after the automated redaction of a document by conventional means to ensure that words the user meant to retain in the document are not redacted.

"Embodiments disclosed herein significantly overcome such deficiencies and provide a method for the automated redaction of documents by executing a redaction process that uses pre-configured lists containing content that should be redacted (e.g., redaction data) and content that should not be redacted (e.g., non-redaction data). In one configuration, the pre-configured lists are progressively developed and tailored to a specific user's preferences after every execution of the redaction process. As an example, the non-redaction data list may be a dictionary containing most common words and phrases. This non-redaction data specifies data or content that is not to be removed from the document. Conversely, the redaction data is a list of words or phrases that the user wants to have redacted from the document content. In operation, the redaction process applies the content (e.g. lists of words) in the redaction data and non-redaction data against a document to produce two intermediary lists. One intermediary list (e.g., a redact list) provides information regarding instances of content in the document that match content in the redaction list or, in other words, content (e.g., words, phrases, objects, etc.) that the user has previously identified for redaction. In addition, a second list (e.g., a potential list) displays content in the document that was in neither (i.e., that did not match) the redaction data nor the non-redaction data. As such, the potential list identifies foreign content (e.g., content that the user has not identified for redaction or non-redaction) in the document that the user may desire to redact.
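
As an illustration of the two intermediary lists described above (this is not code from the patent), a minimal Python sketch might look like the following. The function name `build_lists`, the word-level tokenization, and the case-insensitive matching are all assumptions made for the example; the patent text does not specify them.

```python
import re

def build_lists(document, redaction_data, non_redaction_data):
    """Return (redact_list, potential_list) for a document.

    Hypothetical helper: each list holds (offset, word) pairs.
    """
    # Assumption: matching is per word and ignores case.
    redact_words = {w.lower() for w in redaction_data}
    keep_words = {w.lower() for w in non_redaction_data}

    redact_list, potential_list = [], []
    for match in re.finditer(r"[A-Za-z0-9']+", document):
        word = match.group().lower()
        if word in redact_words:
            # Content the user previously identified for redaction.
            redact_list.append((match.start(), match.group()))
        elif word not in keep_words:
            # Foreign content: in neither list, so offered for review.
            potential_list.append((match.start(), match.group()))
    return redact_list, potential_list
```

Words found in the non-redaction data are simply skipped, so only the previously identified terms and the unfamiliar terms are surfaced to the user.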

"In one configuration, the user can also supply (i.e., input) proximity data (e.g., as another list) indicating proximate expressions to be matched against the document. The redaction process processes the proximity data against the document to identify content that may be selected for redaction and adds content that matches the proximity data to the potential list as well. Thus, after initial processing, the potential list includes content from the document that matched neither the redaction data (i.e., content the user specifically indicates to redact) nor the non-redaction data (i.e., content that should not be redacted), as well as content that matches expressions in the proximity data. In this manner, a user can specify, for example, proximity data in the form of a regular expression that may match strings of text that the user may potentially want to redact. The redaction process adds these to the potential list, which allows the user to review content in this potential list for further addition to the content to actually be redacted. Once selections from the redact and potential lists are complete, the user can commit the redaction process to redact the selected content from these two lists. As a result, the user may select content from both the redact list and potential list for redaction. With the redaction data and non-redaction data modules, a user need not painstakingly create and/or update a new redaction scheme for subsequent applications of the redaction process to the same and/or other documents.
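
To make the proximity-data step concrete, the following Python sketch treats each proximity expression as a regular expression, which is the example form the summary itself mentions. The function name `match_proximity` and the sample patterns in the usage note are hypothetical and are not taken from the patent.

```python
import re

def match_proximity(document, proximity_patterns):
    """Return sorted, de-duplicated (offset, text) hits for the given patterns.

    Hypothetical helper: each pattern is assumed to be a regular expression.
    """
    hits = set()
    for pattern in proximity_patterns:
        for match in re.finditer(pattern, document):
            hits.add((match.start(), match.group()))
    # These hits would be appended to the potential list for user review.
    return sorted(hits)
```

A caller might pass patterns such as `r"\b\d{3}-\d{2}-\d{4}\b"` to surface number sequences shaped like identification numbers; the hits are candidates for review, not automatic redactions.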

"Furthermore, in configurations disclosed herein, the redaction process can include conflict resolution processing for words and/or phrases that appear in both the redaction data and non-redaction data. For example, the conflict resolution processing allows a user to select words that should (or should not) be redacted if those words appear in certain phrases that are identified in a document. Embodiments of the redaction process disclosed thus substantially overcome the aforementioned drawbacks.
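
The summary does not say how conflict resolution is implemented. One possible reading, sketched below in Python under that assumption, is to flag every occurrence of a redaction word that falls inside a phrase the user wants to keep, so that the user can decide case by case. The name `find_conflicts` and the span-containment rule are illustrative choices, not the patented method.

```python
import re

def find_conflicts(document, redaction_words, keep_phrases):
    """Split redaction-word hits into (clear, conflicts).

    Hypothetical rule: a hit is a conflict when it lies entirely inside
    an occurrence of a phrase from the non-redaction data.
    """
    keep_spans = [
        m.span()
        for phrase in keep_phrases
        for m in re.finditer(re.escape(phrase), document, re.IGNORECASE)
    ]
    clear, conflicts = [], []
    for word in redaction_words:
        pattern = r"\b" + re.escape(word) + r"\b"
        for m in re.finditer(pattern, document, re.IGNORECASE):
            inside = any(s <= m.start() and m.end() <= e for s, e in keep_spans)
            (conflicts if inside else clear).append((m.start(), m.group()))
    return sorted(clear), sorted(conflicts)
```

With the redaction word "Washington" and the keep phrase "Washington Post", a standalone "Washington" would be a clear hit, while the occurrence inside "Washington Post" would be returned as a conflict for the user to resolve.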

"Generally, as in one embodiment disclosed herein, a redaction process obtains redaction data indicating content to be redacted in a document. In addition, the redaction process obtains non-redaction data indicating content not to be redacted in the document. The redaction process also obtains proximity data indicating proximate expressions to be matched against the document. Furthermore, the redaction process processes the redaction data and the non-redaction data against the document to produce a redacted version of the document.

"In another embodiment, the redaction process renders a redact list comprising instances of content in the document that match the content to be redacted in the redaction data. The redaction process also renders a potential list comprising instances of content in the document that did not match the content to be redacted in the redaction data and did not match the content not to be redacted in the non-redaction data. Furthermore, the redaction process receives, from a user, a redact selection of content to be redacted in the document from the redact list. In turn, the redaction process applies a redaction function to the redact selection to redact content matching the redact selection in the document. Similarly, the redaction process receives, from a user, a potential selection of content from the potential list from the user. Accordingly, the redaction process applies a redaction function to the potential selection to redact content matching the potential selection in the document.

"In yet another embodiment, the redaction process obtains proximity data indicating proximate expressions to be matched against the document. Upon obtaining the proximity data, the redaction process processes the proximity data against the document to identify content that may be selected for redaction. In processing the proximity data, the redaction process matches proximity expressions against content in the document to identify proximate content. Furthermore, the redaction process renders the proximate content in the potential list.

"In still yet another embodiment, the redaction process obtains redaction data indicating content that may be selected for redaction in an original document. Further, the redaction process obtains non-redaction data indicating content not to be redacted in the original document. In its operation, the redaction process renders a redact list comprising instances of content in the original document that match the content to be redacted in the redaction data. Additionally, the redaction process renders a potential list comprising instances of content in the original document that did not match the content to be redacted in the redaction data and did not match the content not to be redacted in the non-redaction data. In this manner, the redaction process receives, from a user, a redact selection that comprises content to be redacted in the original document from the redact list and the potential list. Furthermore, the redaction process applies a redaction function to the redact selection to redact content matching the redact selection in the original document, wherein the redaction function produces a redacted version of the document.
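
The final step, applying a redaction function to the user's selection, can be sketched as follows. This is a minimal illustration that assumes plain text and whole-word matching; the name `apply_redaction` and the block-character filler are assumptions, chosen to mimic the "black box" place-filler described in the background while keeping the text length unchanged.

```python
import re

def apply_redaction(document, selection, filler="\u2588"):
    """Return a copy of document with each selected term overwritten.

    Hypothetical helper: every matched character is replaced by `filler`,
    so line lengths and pagination of plain text are preserved.
    """
    if not selection:
        return document
    # Longest terms first so that longer alternatives win over shorter ones.
    terms = sorted(selection, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(
        pattern,
        lambda m: filler * len(m.group()),
        document,
        flags=re.IGNORECASE,
    )
```

Because the original characters are replaced rather than covered, the redacted text cannot be recovered from the output string. A real document format would also need its metadata and embedded objects handled, which this sketch does not attempt.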

"Other embodiments disclosed herein include any type of computerized device, workstation, handheld or laptop computer, or the like configured with software and/or circuitry (e.g., a processor) to process any or all of the method operations disclosed herein. In other words, a computerized device such as a computer or a data communications device or any type of processor that is programmed or configured to operate as explained herein is considered an embodiment disclosed herein. Other embodiments disclosed herein include software programs to perform the steps and operations summarized above and disclosed in detail below. One such embodiment comprises a computer program product that has a computer-readable medium including computer program logic encoded thereon that, when performed in a computerized device having a coupling of a memory and a processor, programs the processor to perform the operations disclosed herein. Such arrangements are typically provided as software, code and/or other data (e.g., data structures) arranged or encoded on a computer readable medium such as an optical medium (e.g., CD-ROM), floppy or hard disk or other medium such as firmware or microcode in one or more ROM or RAM, PROM or FPGA chips or as an Application Specific Integrated Circuit (ASIC). The software or firmware or other such configurations can be installed onto a computerized device to cause the computerized device to perform the techniques explained as embodiments disclosed herein.

"It is to be understood that the system disclosed herein may be embodied strictly as a software program, as software and hardware, or as hardware alone. The embodiments disclosed herein may be employed in data communications devices and other computerized devices and software systems for such devices such as those manufactured by Adobe Systems Incorporated® of San Jose, Calif."

For the URL and additional information on this patent, see: Leeds, Bennett. Methods and Apparatus for Automated Redaction of Content in a Document. U.S. Patent Number 8645812, filed August 17, 2010, and published online on February 4, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=17&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=824&f=G&l=50&co1=AND&d=PTXT&s1=20140204.PD.&OS=ISD/20140204&RS=ISD/20140204

Keywords for this news article include: Software, Legal Issues, Word Processing, Adobe Systems Incorporated.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC



Source: Computer Weekly News

