The Endeca baseline update process invokes the last-mile crawl to create Dgidx-compatible data, then passes that data to Dgidx to generate the binary files for the MDEX Engine.
The last-mile crawl performs the following operations:
- Merges index-config.json, using the system user and the ATG user.
- Merges the multiple record stores defined in the crawl configuration XML file.
- Processes product/article/store records. Custom manipulators can modify the data if required.
- Writes the schema and records in the MDEX-compatible format.
How the Last-Mile Crawl Gets Created
The Endeca application instance must be created using the deployment template. The following command creates last-mile-crawl:
<<Endeca_App>>/control/initialize_services.sh
In turn, initialize_services.sh runs the following command:
${CAS_ROOT}/bin/cas-cmd.sh createCrawls -h ${CAS_HOST} -p ${CAS_PORT} -f ${WORKING_DIR}/../config/cas/last-mile-crawl.xml
Features
1. Record Store Joins
The <<Endeca_App>>/config/cas/last-mile-crawl.xml file sets up the CAS crawl with the names of the CAS record stores, inside a moduleProperties element.
<moduleProperties>
  <moduleProperty>
    <key>dataRecordStores</key>
    <value>CRS-data</value>
    <value>CRS-External-data</value>
  </moduleProperty>
  <moduleProperty>
    <key>dimensionValueRecordStores</key>
    <value>CRS-dimvals</value>
  </moduleProperty>
</moduleProperties>
As the XML snippet above shows, multiple record stores can be added for further processing. CAS-based indexing supports only a switch join between multiple record stores.
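To make the switch join concrete, here is a minimal sketch in plain Python. It is not CAS code: it assumes a switch join behaves as a simple union, passing every record from every source through without comparing or combining them. The `switch_join` function and the sample records are hypothetical; only the record store names come from the XML above.

```python
from itertools import chain


def switch_join(*record_stores):
    """Return every record from every store, in store order, without combining them.

    Hypothetical helper that models a switch join as a plain union of sources.
    """
    return list(chain.from_iterable(record_stores))


# Illustrative stand-ins for the CRS-data and CRS-External-data record stores.
crs_data = [{"record.id": "p1", "product.name": "Kettle"}]
crs_external_data = [{"record.id": "p2", "product.name": "Toaster"}]

joined = switch_join(crs_data, crs_external_data)
for record in joined:
    print(record["record.id"], record["product.name"])
```

In this model the output simply contains the records of both stores one after the other, which is why a switch join suits sources that each contribute whole records.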
2. Add Manipulators
Java manipulators can be added to last-mile-crawl when data manipulation is required, for example to remove commas and create multi-valued properties.
<manipulatorConfig>
  <moduleId>
    <id>com.endeca.cas.extension.sample.manipulator.substring.SubstringManipulator</id>
  </moduleId>
  <moduleProperties>
    <moduleProperty>
      <key>sourceProperty</key>
      <value>Endeca.Document.Text</value>
    </moduleProperty>
    <moduleProperty>
      <key>targetProperty</key>
      <value>Short.Truncated.Text</value>
    </moduleProperty>
    <moduleProperty>
      <key>length</key>
      <value>20</value>
    </moduleProperty>
  </moduleProperties>
  <id>Create short truncated text property</id>
</manipulatorConfig>
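The comma-splitting case mentioned above can be sketched as follows. This is plain Python that only illustrates the transformation a custom manipulator might perform. It does not use the CAS extension API, and the `split_multi_valued` function, the dict-of-lists record model, and the `product.color` property are all assumptions made for illustration.

```python
def split_multi_valued(record, source_property, delimiter=","):
    """Return a copy of record where source_property holds one value per delimited item.

    A record is modelled here as a dict mapping property names to lists of values.
    """
    result = {name: list(values) for name, values in record.items()}
    if source_property not in result:
        return result  # Nothing to split; leave the record unchanged.

    split_values = []
    for raw in record[source_property]:
        # Drop the delimiter, trim whitespace, and skip empty fragments.
        split_values.extend(part.strip() for part in raw.split(delimiter) if part.strip())
    result[source_property] = split_values
    return result


record = {"record.id": ["p1"], "product.color": ["red, green,blue"]}
processed = split_multi_valued(record, "product.color")
print(processed["product.color"])
```

The function returns a new record rather than mutating its input, which keeps the sketch easy to test. A real manipulator would work on the record object that the crawl hands to it.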
3. Merge index-config
initialize_services.sh runs the following command to update the Endeca Configuration Repository with the properties and dimensions defined in index-config.json:
"${WORKING_DIR}/index_config_cmd.sh" set-config -f "${WORKING_DIR}/../config/index_config/index-config.json" -o all
It's Your Turn
Was this blog post helpful? What do you think about it? Are there any other topics you would like covered in detail?
Leave your comments or responses below.
9 comments
I will review the articles one by one and make the necessary changes immediately. Thanks for notifying me.
Sure. I have deleted all the posts and images related to Oracle documentation and support. Let me know if you still see anything, and I will remove that as well. I also emailed copyright_us@oracle.com last week to apologize for this incident.
Hi Ajay, I have one question. I am creating record stores from different sources. The -data record store has record.id, but the new one does not; it has Endeca.Id instead. The two are not getting merged. Can you please help me join these two record stores?
Hi Sumit, all records must have record.id so that the records in all record stores can be joined.
Thanks,
Ajay Agrawal
Yes, correct, but the issue is that I used a crawl of type Endeca Record File, and whenever I try to explicitly set record.id in the configuration, it throws an error saying it expected Endeca.Id but found record.id.
I tried using a modifier manipulator to add record.id as a new property, but I am not sure whether that is the right thing to do.
Apart from that, how can I ensure that my records are getting indexed?
Regards,
Sumit Saurabh
Adding a modifying script manipulator to the crawl and adding record.id as a new property resolved my issue.
// Copy the value of Endeca.Id into a new record.id property.
idPropertyValue = record.getPropertySingleValue("Endeca.Id");
record.addPropertyValue(new PropertyValue("record.id", idPropertyValue.value));
logger.info("Processed Record:" + idPropertyValue.value);
Thanks