Split queue items into manageable batches.
9 unresolved threads
Closes #3487487
Activity
requested review from @kevinquillen, @Marcus_Johansson, @scotteuser, and @wouters_frederik
   *   The AI VDB provider plugin manager.
   */
  public function __construct(
    array $configuration,
    $plugin_id,
    $plugin_definition,
    LoggerInterface $logger,
    EntityTypeManagerInterface $entity_type_manager,
    EmbeddingStrategyPluginManager $embedding_strategy_manager,
    AiVdbProviderPluginManager $vdb_provider_manager,
  ) {
    parent::__construct($configuration, $plugin_id, $plugin_definition);
    $this->logger = $logger;
    $this->entityTypeManager = $entity_type_manager;
    $this->embeddingStrategyProviderManager = $embedding_strategy_manager;
    $this->vdbProviderManager = $vdb_provider_manager;

changed this line in version 4 of the diff
  ): void {
    $item_id = $item->getId();
    $chunk_batches = array_chunk($chunks, $chunk_threshold);
    $operations = [];
    $is_not_cron = \Drupal::routeMatch()->getRouteName() !== 'system.cron';
    $is_not_command_line = PHP_SAPI !== 'cli';
    if ($is_not_cron && $is_not_command_line) {
      foreach ($chunk_batches as $chunk_batch) {
        $operations[] = [
          [__CLASS__, 'processChunk'],
          [$search_index, $item, $chunk_batch, $configuration],
        ];
      }

      // Define the batch.
      $batch = [

changed this line in version 4 of the diff

      // Define the batch.
      $batch = [
        'title' => $this->t(
          'Processing chunks for item @id',
          [
            '@id' => $item_id,
          ]
        ),
        'operations' => $operations,
        'init_message' => $this->t('Initializing background processing...'),
        'progress_message' => $this->t('Processed @current out of @total.'),
        'error_message' => $this->t('An error occurred during processing.'),
      ];

      batch_set($batch);

changed this line in version 4 of the diff
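To make the size of the resulting batch concrete, here is a minimal sketch of the operation-building loop outside Drupal. The function name `build_chunk_operations` and the class name `ExampleBackend` are hypothetical and used only for illustration; the sketch passes only the chunk batch as an argument, whereas the merge request also passes the index, item, and configuration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the operation-building loop in the diff.
 *
 * Each operation is a [callback, arguments] pair, which is the shape the
 * 'operations' key of a Drupal batch definition expects.
 */
function build_chunk_operations(array $chunks, int $chunk_threshold, array $callback): array {
  $operations = [];
  // array_chunk() requires a size of at least 1.
  foreach (array_chunk($chunks, $chunk_threshold) as $chunk_batch) {
    $operations[] = [$callback, [$chunk_batch]];
  }
  return $operations;
}

// 25 placeholder chunks with a threshold of 10 give three operations
// holding 10, 10 and 5 chunks.
$operations = build_chunk_operations(range(1, 25), 10, ['ExampleBackend', 'processChunk']);
```

With a threshold of 10, an item that produces 25 chunks therefore becomes three batch operations rather than one long request.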
      IndexInterface $index,
      array $items,
      EmbeddingStrategyInterface $embedding_strategy,
+     array $chunks,
+     bool $reindex,

      if (!empty($configuration['chunk_min_overlap'])) {
        $this->chunkMinOverlap = (int) $configuration['chunk_min_overlap'];
      }
+
+     if (!empty($configuration['chunk_threshold'])) {
+       $this->chunkThreshold = (int) $configuration['chunk_threshold'];
+     }
+     else {
+       $this->chunkThreshold = 10;

I think we should have the default be -1 for unlimited/no threshold and explain to users:
- If they have very long content and are running into timeouts, they can enable it.
- If they enable it, they should also make sure cron runs frequently enough to process the queue items (e.g. Simple Cron or Ultimate Cron).
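A minimal sketch of what the suggested default could look like, assuming -1 is taken to mean "no threshold". The helper name `resolve_chunk_threshold` is hypothetical and not part of the merge request. Note that a `!empty()` check treats 0 and '0' as unset, so the sketch uses `isset()` to keep an explicit 0 distinguishable from a missing value.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: resolves the chunk threshold from configuration.
 *
 * Assumption: -1 means "no threshold", as proposed in the review comment.
 */
function resolve_chunk_threshold(array $configuration): int {
  // A missing, NULL or empty-string value falls back to the proposed default.
  if (!isset($configuration['chunk_threshold']) || $configuration['chunk_threshold'] === '') {
    return -1;
  }
  // Form values usually arrive as strings, so cast explicitly.
  return (int) $configuration['chunk_threshold'];
}
```

For example, `resolve_chunk_threshold([])` gives -1, while `resolve_chunk_threshold(['chunk_threshold' => '25'])` gives 25.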
     * Returns array of default configuration values for given strategy.
     *
     * @return array
-    *   List of configuration values set for given model.
+    *   List of configuration values set for given strategy.
     */
    public function getDefaultConfigurationValues(): array {
      return [
        'chunk_size' => 500,
        'chunk_min_overlap' => 100,
+       'chunk_threshold' => 10,

    foreach ($items as $item_id => $item) {
      $fields = $item->getFields();
      if (empty($fields)) {
        $this->messenger->addStatus(
          $this->t(
            'Item @id has been skipped, it has no fields to be indexed.',
            [
              '@id' => $item_id,
            ]
          )
        );
        $processed[] = $item_id;
        continue;
      }
      $chunks[$item_id] = $embedding_strategy->getChunks($fields, $index);
      if (count($chunks[$item_id]) <= $chunk_threshold) {

changed this line in version 4 of the diff
   * @param array $chunks
   *   Chunks of the item content.
   * @param int $chunk_threshold
   *   The number of chunks to process in a single queue item.
   * @param array $configuration
   *   Configuration of the server.
   */
  protected function enqueueItem(
    IndexInterface $search_index,
    ItemInterface $item,
    array $chunks,
    int $chunk_threshold,
    array $configuration,
  ): void {
    $item_id = $item->getId();
    $chunk_batches = array_chunk($chunks, $chunk_threshold);

See above re the threshold perhaps being -1, i.e. they never want to queue. What about if the threshold is set to 0, i.e. they always want to queue and never process immediately?
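One way the -1 and 0 cases could be handled is sketched below. The semantics are assumptions taken from the review comments, not behaviour confirmed by the merge request, and both function names and the fallback batch size of 10 are hypothetical. The second helper exists because `array_chunk()` rejects a size below 1 (it throws a `ValueError` on PHP 8), so a threshold of 0 or -1 cannot be passed to it directly.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decides whether an item's chunks should be queued.
 *
 * Assumed semantics (from the review discussion):
 *   -1 => never queue, always process immediately;
 *    0 => always queue, never process immediately;
 *   >0 => queue only when the item has more chunks than the threshold.
 */
function should_queue_chunks(int $chunk_count, int $chunk_threshold): bool {
  if ($chunk_threshold < 0) {
    return FALSE;
  }
  if ($chunk_threshold === 0) {
    return TRUE;
  }
  return $chunk_count > $chunk_threshold;
}

/**
 * Hypothetical helper: splits chunks into batches with a safe size.
 *
 * A non-positive threshold is replaced by $fallback_size (an assumed
 * value) so that array_chunk() never receives a size below 1.
 */
function split_chunks(array $chunks, int $chunk_threshold, int $fallback_size = 10): array {
  $size = $chunk_threshold > 0 ? $chunk_threshold : $fallback_size;
  return array_chunk($chunks, $size);
}
```

With these assumptions, `should_queue_chunks(11, 10)` is true and `should_queue_chunks(10, 10)` is false, which matches the `<=` comparison already used in the diff for positive thresholds.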
You still have to manually enable "Index items immediately" on the index for indexing (including chunking and embedding) to run on node update/save. This merge request does not change that behaviour.
Edited by Michal Gow
added 117 commits
- 781b6a92...080f1937 - 116 commits from branch project:1.0.x
- 38eb5655 - Merge branch '1.0.x' into 3487487-improve-ai-search
added 13 commits
- 38eb5655...bb469c26 - 12 commits from branch project:1.0.x
- 8a6f813f - Merge branch '1.0.x' into 3487487-improve-ai-search
added 34 commits
- 1658d80b...3221e5aa - 33 commits from branch project:1.0.x
- ce213f52 - Merge branch '1.0.x' into 3487487-improve-ai-search