Sunday, April 3, 2016

Oracle Endeca 11.x: How Is the Last-Mile Crawl Used?

The Endeca baseline update process invokes the last-mile crawl to create Dgidx-compatible data, then passes that data to Dgidx, which generates the binary index files for the MDEX Engine.

The last-mile crawl performs the following operations:

Merges the index-config.json settings for the system user and the ATG user.
Merges the multiple record stores defined in the crawl configuration XML file.
Processes product, article, and store records. Custom manipulators can modify the data if required.
Writes the schema and records in the MDEX-compatible format.

How the Last-Mile Crawl Gets Created

The Endeca application instance must be created using the deployment template. The following command then creates the last-mile crawl:

<<Endeca_App>>/control/initialize_services.sh
In turn, initialize_services.sh runs the following command:
${CAS_ROOT}/bin/cas-cmd.sh createCrawls -h ${CAS_HOST} -p ${CAS_PORT} -f ${WORKING_DIR}/../config/cas/last-mile-crawl.xml

Features

1. Record Store Joins
The <<Endeca_App>>/config/cas/last-mile-crawl.xml file sets up the CAS crawl with the names of the CAS record stores:

<moduleProperties>
  <moduleProperty>
    <key>dataRecordStores</key>
    <value>CRS-data</value>
    <value>CRS-External-data</value>
  </moduleProperty>
  <moduleProperty>
    <key>dimensionValueRecordStores</key>
    <value>CRS-dimvals</value>
  </moduleProperty>
</moduleProperties>

As the XML snippet above shows, multiple record stores can be listed for further processing. CAS-based indexing supports only a switch join between multiple record stores.
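To make the idea of a switch join concrete, here is a minimal conceptual sketch in Python. It is not CAS code: the record layout (a dict of property name to list of values), the function name switch_join, and the sample records are all assumptions made for illustration. The point it shows is that a switch join is a union: records from each store pass through side by side and are not combined with one another.

```python
from typing import Dict, Iterable, List

# Assumed record shape for illustration: property name -> list of values.
Record = Dict[str, List[str]]


def switch_join(*record_stores: Iterable[Record]) -> List[Record]:
    """Return the union of all records from all stores, in order.

    Records are passed through unchanged; no two records are combined.
    """
    joined: List[Record] = []
    for store in record_stores:
        joined.extend(store)
    return joined


# Hypothetical contents of the two data record stores named in the XML.
crs_data = [{"record.id": ["prod-1"], "product.name": ["Shirt"]}]
crs_external_data = [{"record.id": ["article-7"], "article.title": ["Care guide"]}]

merged = switch_join(crs_data, crs_external_data)
print(len(merged))  # 2
```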

2. Add Manipulators
Java manipulators can be added to the last-mile crawl when data manipulation is required, for example removing commas and creating multi-valued properties.

<manipulatorConfig>
  <moduleId>
    <id>com.endeca.cas.extension.sample.manipulator.substring.SubstringManipulator</id>
  </moduleId>
  <moduleProperties>
    <moduleProperty>
      <key>sourceProperty</key>
      <value>Endeca.Document.Text</value>
    </moduleProperty>
    <moduleProperty>
      <key>targetProperty</key>
      <value>Short.Truncated.Text</value>
    </moduleProperty>
    <moduleProperty>
      <key>length</key>
      <value>20</value>
    </moduleProperty>
  </moduleProperties>
  <id>Create short truncated text property</id>
</manipulatorConfig>
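The effect of such manipulators can be sketched outside CAS. The Python below is a conceptual illustration only, not the CAS Extension API: the record shape, both function names, and the product.colors property are assumptions. The first function mirrors the configuration above (copy the first 20 characters of the source property into the target property); the second shows the comma example by splitting one comma-separated value into several values.

```python
from typing import Dict, List

# Assumed record shape for illustration: property name -> list of values.
Record = Dict[str, List[str]]


def truncate_to_property(record: Record, source: str, target: str, length: int) -> Record:
    """Copy the first `length` characters of the source value into the target."""
    values = record.get(source, [])
    if values:
        record[target] = [values[0][:length]]
    return record


def split_on_commas(record: Record, prop: str) -> Record:
    """Turn comma-separated values of `prop` into a multi-valued property."""
    parts: List[str] = []
    for value in record.get(prop, []):
        parts.extend(p.strip() for p in value.split(",") if p.strip())
    if parts:
        record[prop] = parts
    return record


record: Record = {
    "Endeca.Document.Text": ["Last mile crawl output for Dgidx"],
    "product.colors": ["red, green,blue"],  # hypothetical property
}
truncate_to_property(record, "Endeca.Document.Text", "Short.Truncated.Text", 20)
split_on_commas(record, "product.colors")
print(record["product.colors"])  # ['red', 'green', 'blue']
```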

3. Merges index-config
initialize_services.sh runs the following command to update the Endeca Configuration Repository with the properties and dimensions defined in index-config.json:
"${WORKING_DIR}/index_config_cmd.sh" set-config -f "${WORKING_DIR}/../config/index_config/index-config.json" -o all
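As a rough mental model of merging configuration from two owners (system and ATG), the Python sketch below combines two attribute maps. It does not reproduce the real index-config.json schema or Endeca's actual merge rules: the flat attribute maps, the setting names, and the rule that the second owner's settings win on conflict are all assumptions made for illustration.

```python
import copy
from typing import Any, Dict

# Assumed, simplified shape: attribute name -> settings for that attribute.
Attributes = Dict[str, Dict[str, Any]]


def merge_attributes(system_attrs: Attributes, atg_attrs: Attributes) -> Attributes:
    """Union two attribute maps; on conflict the ATG setting wins (assumption)."""
    merged = copy.deepcopy(system_attrs)
    for name, settings in atg_attrs.items():
        merged.setdefault(name, {}).update(copy.deepcopy(settings))
    return merged


# Hypothetical attribute definitions for each owner.
system_attrs: Attributes = {"record.id": {"type": "ALPHA"}}
atg_attrs: Attributes = {
    "product.displayName": {"type": "ALPHA", "searchable": True},
    "record.id": {"filterable": True},
}

merged_attrs = merge_attributes(system_attrs, atg_attrs)
print(sorted(merged_attrs))  # ['product.displayName', 'record.id']
```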

It's Your Turn

Was this post helpful? What do you think about it? Are there other topics you would like covered in detail?

Provide your valuable comments or response below.

Baseline Update Process, CAS Based Indexing, Core Endeca

9 comments

Oracle Admin said...: This comment has been removed by the author.; May 4, 2016 at 7:22 AM
Ajay Agrawal said...: I will review article one by one and will do the needful immediately. Thanks for notifying me.; May 4, 2016 at 8:48 AM
Oracle Admin said...: This comment has been removed by the author.; May 11, 2016 at 2:04 AM
Ajay Agrawal said...: Sure. I have deleted all the post and images related to oracle documentation, support and images. Let me know you still see anything. I can remove those as well. I have emailed to copyright_us@oracle.com id as well about this incident last week to apologize.; May 11, 2016 at 10:05 PM
Unknown said...: Hi Ajay, I have one question. I have different sources from which I am creating record stores. The -data record store has record.id, but the new one doesn't; it has Endeca.Id instead. These two are not getting merged. Can you please help me join those two record stores?; July 11, 2016 at 8:20 AM
Ajay Agrawal said...: Hi Sumit, all records should have record.id in order to join the records across all record stores.

Thanks,
Ajay Agrawal; July 17, 2016 at 11:58 AM
Unknown said...: Yes, correct, but the issue is that I have used a crawl of type Endeca Record File, and whenever I try to explicitly set record.id in the configuration, it throws an error saying it expected Endeca.Id but found record.id.
I tried using a modifier manipulator to add record.id as a new property, but I am not sure whether that is the right thing to do.

Apart from that, how can I ensure that my records are getting indexed?

Regards,
Sumit Saurabh; July 19, 2016 at 6:06 AM
Unknown said...: Adding a modifying script manipulator to the crawl and adding record.id as a new property resolved my issue.

idPropertyValue = record.getPropertySingleValue("Endeca.Id");
record.addPropertyValue(new PropertyValue("record.id", idPropertyValue.value));
logger.info("Processed Record:" + idPropertyValue.value);

Thanks; August 25, 2016 at 5:33 PM

Enterprise Search online tutorial
