How to add an index rendition to your documents in Documentum?

When speaking of document transformation, people mostly think about converting a document from one type to another, while preserving all the data from the source document. However, this is not always required. DocShifter thinks one step beyond basic transformation and answers specific needs you might encounter. In this post I will explain how DocShifter can be used to generate an index rendition, which contains all the textual content from conventional and less conventional documents stored in Documentum.

Index rendition

Let’s start with what an index rendition is. Indexing servers, like xPlore, do not always provide all the functionality you want. Some file formats are not supported, and other documents need OCR first or need to be bundled for indexing. For example, when archiving an email, you store the body and the attachment in Documentum and create relations between the email and the attachment. When you later search for emails, the system must be able to search the content of both the email and the attachment. One solution is to merge the text of the email and the attachment into one text file and add it to the email object as a rendition. This is what’s called an index rendition.

DocShifter

DocShifter offers an Index Rendition module that allows you to automatically create these kinds of renditions. The module extracts the text of one or more documents by leveraging the functionality of other modules, and by merging the results into one file.

A DocShifter workflow that creates such a rendition consists of three steps: first the Documentum input module reads documents from Documentum, then the index rendition is created, and finally the rendition is added to the Documentum object.

Input

The input module queries the queue of the Documentum user for requests of a specific type. These requests can be created by TBOs, SBOs, xCP workflows, and so on. The input module polls the Documentum queue at regular intervals and extracts the files that have an associated queue item. This module can also be configured to export related documents.
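To make the queue filtering concrete, here is a minimal sketch of how requests of one type could be picked out of a user's queue. It works on in-memory data only and does not talk to Documentum. The `QueueItem` class, its field names and the `select_requests` function are hypothetical names introduced for this illustration, not part of DocShifter or of the Documentum API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueItem:
    """Simplified stand-in for a Documentum queue item (illustrative field names)."""
    item_id: str      # ID of the object the request refers to
    queue_owner: str  # user whose queue holds the item
    message: str      # message that identifies the type of request


def select_requests(items, queue_owner, rendition_type):
    """Return the object IDs of the items in queue_owner's queue whose message
    equals rendition_type, in queue order and without duplicates."""
    seen = set()
    selected = []
    for item in items:
        if item.queue_owner != queue_owner or item.message != rendition_type:
            continue
        if item.item_id not in seen:
            seen.add(item.item_id)
            selected.append(item.item_id)
    return selected


# Made-up sample data: two requests for the same object, one request of
# another type, and one request that sits in another user's queue.
queue = [
    QueueItem("0900000180001234", "docshifter", "index"),
    QueueItem("0900000180005678", "docshifter", "pdf"),
    QueueItem("0900000180001234", "docshifter", "index"),
    QueueItem("0900000180009abc", "other_user", "index"),
]
print(select_requests(queue, "docshifter", "index"))
```

Dropping duplicates is a design choice of this sketch: an object that was queued twice for the same rendition only needs to be processed once.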

The Documentum input module has the following parameters:

| Name | Type | Mandatory | Description |
| --- | --- | --- | --- |
| dctm_repository | STRING | YES | The name of the Documentum repository to poll. |
| dctm_user | STRING | YES | Name of the user used to connect to Documentum. This is a technical user and requires at least READ access to the objects. |
| dctm_pass | PASSWORD | YES | Password of the user in the dctm_user parameter. |
| renditionType | STRING | YES | The Documentum queue_item message that identifies the queue requests. |
| dataFields | STRING | NO | Some DocShifter renditions require Documentum metadata for processing. The fields that need to be exported from Documentum can be listed here. |
| relationName | STRING | NO | The name of the relation whose documents also need to be read from Documentum. In the example, this is the relation linking the email to its attachments. |
| frequency | INTEGER | YES | The polling interval: how often the poller queries Documentum. |
| start_date | DATETIME | NO | Start date of the polling period. If not provided, the poller always starts. |
| end_date | DATETIME | NO | End date of the polling period. If not provided, the poller does not stop. |
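The parameters above can be pictured as a small configuration object. The sketch below is only an illustration of the mandatory/optional split and of the polling period bounded by start_date and end_date. The `InputConfig` class and its methods are hypothetical, the values are made up, and the unit of frequency is left open because the table does not state it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class InputConfig:
    """Illustrative container for the input module parameters (not a DocShifter class)."""
    dctm_repository: str
    dctm_user: str
    dctm_pass: str
    renditionType: str
    frequency: int                      # polling interval; unit not stated in the table
    dataFields: Optional[str] = None
    relationName: Optional[str] = None
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None

    def validate(self):
        """Raise ValueError if a mandatory value is empty or the values are inconsistent."""
        for name in ("dctm_repository", "dctm_user", "dctm_pass", "renditionType"):
            if not getattr(self, name):
                raise ValueError(f"missing mandatory parameter: {name}")
        if self.frequency <= 0:
            raise ValueError("frequency must be a positive integer")
        if self.start_date and self.end_date and self.end_date < self.start_date:
            raise ValueError("end_date must not be before start_date")

    def is_polling_active(self, now):
        """True if 'now' lies inside the polling period.
        A missing start_date or end_date leaves that side of the period open."""
        if self.start_date is not None and now < self.start_date:
            return False
        if self.end_date is not None and now > self.end_date:
            return False
        return True


# Made-up example values.
config = InputConfig(
    dctm_repository="archive",
    dctm_user="docshifter",
    dctm_pass="secret",
    renditionType="index",
    frequency=60,
    relationName="email_attachment",
    start_date=datetime(2024, 1, 1),
)
config.validate()
print(config.is_polling_active(datetime(2023, 12, 31)))
print(config.is_polling_active(datetime(2024, 6, 1)))
```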

Transform

The transformation is handled by the Index Rendition module. This module processes the input and saves the content to one text file. The input of this module can be a file or a folder. When the input is a folder, all content is merged into one text document, with the document names as separators. To extract as much content as possible, the Index Rendition module uses the other modules available in the DocShifter instance. For example, when an image PDF or a TIFF file enters the module, you might believe that these documents do not contain any textual data. However, when the OCR module is installed, the Index Rendition module will use it to extract text from the image.

The Index Rendition module does not have any parameters.
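The merge step described in this section can be sketched as follows. This is not DocShifter's implementation: the function names, the separator layout and the idea of a dictionary that maps file extensions to extractor functions are assumptions made for the illustration. The dictionary plays the role of the installed modules, so a format without an extractor is simply skipped.

```python
import tempfile
from pathlib import Path


def read_plain_text(path):
    """Extractor for files that already are plain text."""
    return path.read_text(encoding="utf-8")


def build_index_text(folder, extractors):
    """Merge the text of all supported files in 'folder' into one string.

    'extractors' maps a lowercase file extension (for example ".txt") to a
    function that takes a Path and returns text. Every document is preceded
    by a separator line that contains its file name."""
    parts = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        extractor = extractors.get(path.suffix.lower())
        if extractor is None:
            continue  # no extractor registered for this format
        parts.append(f"===== {path.name} =====\n{extractor(path).strip()}")
    return "\n\n".join(parts) + ("\n" if parts else "")


# Demo with made-up files: an email body, an attachment, and a scan for
# which no extractor is registered.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "email.txt").write_text("Please find the report attached.\n", encoding="utf-8")
    Path(tmp, "report.txt").write_text("Quarterly figures\n", encoding="utf-8")
    Path(tmp, "scan.tiff").write_bytes(b"\x00")
    index_text = build_index_text(tmp, {".txt": read_plain_text})

print(index_text)
```

Registering an additional entry such as ".tiff" with an OCR-backed function would make the scan contribute text as well, which mirrors the behaviour described for the OCR module.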

Output

The Documentum release module saves the result of the workflow to Documentum. This module supports multiple types of releases: as a new rendition, as an update of the content, or as a new version of the object. For the index rendition, a new rendition is the correct choice.

The Documentum release module has 4 parameters:

| Name | Type | Mandatory | Description |
| --- | --- | --- | --- |
| dctm_repository | STRING | YES | The name of the Documentum repository to save the result to. |
| dctm_user | STRING | YES | Name of the user used to connect to Documentum. This is a technical user and requires at least WRITE access to the objects. |
| dctm_pass | PASSWORD | YES | Password of the user in the dctm_user parameter. |
| dctm_update_type | STRING | YES | The type of release. This can be rendition, update, minor or major. |
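The four values of dctm_update_type differ in what happens to the version of the object. The sketch below illustrates this under one assumption that the text does not confirm: that minor and major follow the usual Documentum numeric version labels (1.0 to 1.1 for a minor version, 1.0 to 2.0 for a major version), while rendition and update leave the version untouched. `UpdateType` and `next_version_label` are hypothetical names.

```python
from enum import Enum


class UpdateType(Enum):
    """The values that the table lists for dctm_update_type."""
    RENDITION = "rendition"
    UPDATE = "update"
    MINOR = "minor"
    MAJOR = "major"


def next_version_label(current, update_type):
    """Return the numeric version label after a release of the given type.

    Assumption: 'minor' and 'major' bump the label the way Documentum
    versioning usually does; 'rendition' and 'update' keep it unchanged.
    An unknown update_type raises ValueError."""
    kind = UpdateType(update_type.strip().lower())
    major, minor = (int(part) for part in current.split("."))
    if kind is UpdateType.MINOR:
        return f"{major}.{minor + 1}"
    if kind is UpdateType.MAJOR:
        return f"{major + 1}.0"
    return current


for value in ("rendition", "update", "minor", "major"):
    print(value, "->", next_version_label("1.2", value))
```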

Configuring Documentum

Documentum must be configured to request renditions. This can be done with a workflow, a TBO, an SBO, or any other mechanism that creates queue_items.

Configure indexing of (only) the index renditions

For the index rendition, a custom format called index is created. This format is then configured to be indexed by setting two attributes on the format.

The first attribute is can_index (“Full-Text Indexing” in DA). This Boolean enables a format for indexing, but it only means that the format can be indexed. To require that a format is indexed even when it is a rendition, add “ft_always” to the format’s format_class attribute. The ft_always class implies that a rendition is always indexed. The class can also be set to ft_preferred, which gives the rendition preference for indexing over other renditions whose format_class is not set. If multiple renditions have a format class of ft_preferred, only the first will be indexed.
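The rules in this paragraph can be modelled in a few lines. The sketch below is a model of the text above, not of the actual index server logic. The `Rendition` class and `select_renditions_to_index` are hypothetical, and the fallback to the first indexable rendition when nothing is marked ft_preferred is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """Illustrative model of a rendition and the settings of its format."""
    format_name: str
    can_index: bool
    format_class: tuple = ()  # for example ("ft_always",) or ("ft_preferred",)


def select_renditions_to_index(renditions):
    """Return the format names that would be indexed under the rules in the text."""
    indexable = [r for r in renditions if r.can_index]
    # ft_always: every such rendition is indexed.
    selected = [r for r in indexable if "ft_always" in r.format_class]
    others = [r for r in indexable if "ft_always" not in r.format_class]
    # ft_preferred: wins over renditions without a class, but only the first one.
    preferred = [r for r in others if "ft_preferred" in r.format_class]
    if preferred:
        selected.append(preferred[0])
    elif others:
        selected.append(others[0])  # assumption: fall back to the first indexable one
    return [r.format_name for r in selected]


# Made-up renditions of one object.
renditions = [
    Rendition("pdf", True),
    Rendition("index", True, ("ft_always",)),
    Rendition("crtext", True, ("ft_preferred",)),
    Rendition("text", True, ("ft_preferred",)),
    Rendition("jpeg", False),
]
print(select_renditions_to_index(renditions))
```

In this model the index rendition is picked up no matter which other renditions exist, which is why ft_always is the class to use for the custom index format.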

Conclusion

The combination of Documentum and DocShifter allows users to find documents quickly and conveniently. DocShifter fills the gaps of your index server by transforming the documents to a simple text format. If you have any questions, you can contact us through our contact page or find more information on the DocShifter website.
