Sunday, April 3, 2016

Oracle Endeca 11.x: How Is the Last Mile Crawl Used?

The Endeca baseline update process invokes the last-mile crawl to create Dgidx-compatible data, which Dgidx then uses to generate the binary files for the MDEX engine.

The last-mile crawl performs the following operations:

  • Merges the index-config.json of the system user and the ATG user.
  • Merges the multiple record stores defined in the crawl configuration XML file.
  • Processes product/article/store records; data manipulation can be done, if required, using custom manipulators.
  • Writes the schema and records in the MDEX-compatible format.

How the Last Mile Crawl Gets Created

An Endeca application instance needs to be created using the deployment template; its initialization scripts create the last-mile crawl.

In turn, the initialization runs the following command:
${CAS_ROOT}/bin/ createCrawls -h ${CAS_HOST} -p ${CAS_PORT} -f ${WORKING_DIR}/../config/cas/last-mile-crawl.xml 
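To confirm that the crawl was registered, the same CAS command-line utility can be used to list the crawls on the server. A minimal sketch, assuming the utility also supports a listCrawls task (verify against your CAS version's documentation):

${CAS_ROOT}/bin/ listCrawls -h ${CAS_HOST} -p ${CAS_PORT}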


1. Record Store Joins
The <<Endeca_App>>/config/cas/last-mile-crawl.xml file sets up the CAS crawl with the names of the CAS record stores, as in the sketch below.
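Below is a minimal, illustrative snapshot of last-mile-crawl.xml. The application name MyApp and the record store names MyApp-data and MyApp-dimvals are assumptions for this sketch, and the exact module id and property keys can vary by CAS version, so compare against the file the deployment template generates for your application:

<crawlConfig>
  <crawlId>
    <id>MyApp-last-mile-crawl</id>
  </crawlId>
  <sourceConfig>
    <moduleId>
      <!-- module id of the record store merger source; verify against your generated file -->
      <id>Record Store Merger</id>
    </moduleId>
    <moduleProperties>
      <moduleProperty>
        <!-- record stores to merge; each additional value adds another source to the join -->
        <key>recordStores</key>
        <value>MyApp-data</value>
        <value>MyApp-dimvals</value>
      </moduleProperty>
    </moduleProperties>
  </sourceConfig>
</crawlConfig>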

As the XML snapshot above shows, multiple record stores can be added for further processing. Note that CAS-based indexing only supports a switch join between multiple record stores.

2. Add Manipulators
Java manipulators can be added to the last-mile crawl if any data manipulation is required, e.g. removing commas and creating multi-valued properties. Each manipulator is declared in the crawl configuration with an id, such as the one below; a sketch of the record-processing logic follows.
<id>Create short truncated text property</id>
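The snippet below sketches the comma-splitting example in the same script style as the modifying script manipulator shown in the comments at the end of this post; both property names (product.features.raw and product.features) are hypothetical:

// Read a hypothetical comma-separated single-valued property and
// re-emit each piece as one value of a multi-valued property.
rawValue = record.getPropertySingleValue("product.features.raw");
if (rawValue != null) {
    for (part in rawValue.getValue().split(",")) {
        // trim whitespace around each piece before adding it as its own value
        record.addPropertyValue(new PropertyValue("product.features", part.trim()));
    }
}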

3. Merge index-config
The baseline update then runs the following command to update the Endeca configuration repository with the properties and dimensions defined in index-config.json:
"${WORKING_DIR}/" set-config -f "${WORKING_DIR}/../config/index_config/index-config.json" -o all

It's your Turn

Was this blog helpful for you? What do you think about this post? Are there any other topics you want covered in detail?

Provide your valuable comments or responses below.


  1. This comment has been removed by the author.

    1. I will review the articles one by one and will do the needful immediately. Thanks for notifying me.

    2. This comment has been removed by the author.

    3. Sure. I have deleted all the posts and images related to Oracle documentation and support. Let me know if you still see anything; I can remove those as well. I emailed that id last week as well to apologize for this incident.

  2. Hi Ajay, I have one question. I have different sources from which I am creating record stores. Now, the -data record store has its own id property, but the new one doesn't have it; it has Endeca.Id. These two are not getting merged. Can you please help me with how I can join those two record stores.

    1. Hi Sumit, all records should have the same id property in order to join the records in all record stores.

      Ajay Agrawal

    2. Yes correct, but the issue is that I have used a crawl of type Endeca Record File, and whenever I try to explicitly set it in the configuration, it throws an error saying it expected Endeca.Id but found a different property.
      I tried to use a modifier manipulator to add it as a new prop, but I am not sure whether it is the right thing to do.

      Apart from that, how can I ensure that my records are getting indexed?

      Sumit Saurabh

    3. Adding a modifying script manipulator to the crawl and adding the id as a new prop resolved my issue.

      idPropertyValue = record.getPropertySingleValue("Endeca.Id");
      record.addPropertyValue(new PropertyValue("", idPropertyValue.getValue()));
      println("Processed Record: " + idPropertyValue.getValue());


