
Split Queue item to manageable batches.

Open Michal Gow requested to merge issue/ai-3487487:3487487-improve-ai-search into 1.0.x
9 unresolved threads

Closes #3487487

  •  * The AI VDB provider plugin manager.
     */
    public function __construct(
      array $configuration,
      $plugin_id,
      $plugin_definition,
      LoggerInterface $logger,
      EntityTypeManagerInterface $entity_type_manager,
      EmbeddingStrategyPluginManager $embedding_strategy_manager,
      AiVdbProviderPluginManager $vdb_provider_manager,
    ) {
      parent::__construct($configuration, $plugin_id, $plugin_definition);
      $this->logger = $logger;
      $this->entityTypeManager = $entity_type_manager;
      $this->embeddingStrategyProviderManager = $embedding_strategy_manager;
      $this->vdbProviderManager = $vdb_provider_manager;
  • ): void {
      $item_id = $item->getId();
      $chunk_batches = array_chunk($chunks, $chunk_threshold);
      $operations = [];
      $is_not_cron = \Drupal::routeMatch()->getRouteName() !== 'system.cron';
      $is_not_command_line = PHP_SAPI !== 'cli';
      if ($is_not_cron && $is_not_command_line) {
        foreach ($chunk_batches as $chunk_batch) {
          $operations[] = [
            [__CLASS__, 'processChunk'],
            [$search_index, $item, $chunk_batch, $configuration],
          ];
        }

        // Define the batch.
        $batch = [
  • // Define the batch.
    $batch = [
      'title' => $this->t(
        'Processing chunks for item @id',
        [
          '@id' => $item_id,
        ]
      ),
      'operations' => $operations,
      'init_message' => $this->t('Initializing background processing...'),
      'progress_message' => $this->t('Processed @current out of @total.'),
      'error_message' => $this->t('An error occurred during processing.'),
    ];

    batch_set($batch);
  •   IndexInterface $index,
      array $items,
      EmbeddingStrategyInterface $embedding_strategy,
    + array $chunks,
    + bool $reindex,
    • I think this will be okay: Pinecone and Milvus don't override this method from AiVdbProviderClientBase. I'm not 100% sure about the Postgres one, though, as I haven't seen it yet, so we may need to let Josh Hayter from Numiko know.

    • Author Developer

      This is still an experimental module, but we should put this in the log for sure.

  •   if (!empty($configuration['chunk_min_overlap'])) {
        $this->chunkMinOverlap = (int) $configuration['chunk_min_overlap'];
      }
    +
    + if (!empty($configuration['chunk_threshold'])) {
    +   $this->chunkThreshold = (int) $configuration['chunk_threshold'];
    + }
    + else {
    +   $this->chunkThreshold = 10;
    • I think we should have the default be -1 for unlimited/no threshold and explain to users:

      • If they have very long content and are running into timeouts, they can enable it.
      • If they enable it, they should also make sure cron runs frequently enough to process the queue items (e.g. with Simple Cron or Ultimate Cron).
    • Author Developer

      I have used '0' to keep all chunks in a single batch item and amended the description to say so.
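
      The threshold semantics described in this reply can be sketched as a standalone function. This is only an illustration, assuming that a threshold of 0 (or any negative value) means "keep everything in one batch item"; the function name `split_chunks_into_batches` is hypothetical and is not part of the module. Note that passing 0 straight to `array_chunk()` throws a `ValueError` on PHP 8, which is why the guard comes first.

      ```php
      <?php

      declare(strict_types=1);

      /**
       * Splits chunks into batches (hypothetical helper, not the module's code).
       *
       * @param array $chunks
       *   Chunks of the item content.
       * @param int $chunk_threshold
       *   Maximum chunks per batch. Assumption: 0 or less means a single batch.
       *
       * @return array[]
       *   List of chunk batches.
       */
      function split_chunks_into_batches(array $chunks, int $chunk_threshold): array {
        // Nothing to process, so there are no batches.
        if ($chunks === []) {
          return [];
        }
        // Guard first: array_chunk() rejects a length below 1.
        if ($chunk_threshold <= 0) {
          return [$chunks];
        }
        return array_chunk($chunks, $chunk_threshold);
      }

      // Five chunks with a threshold of 2 are split across several batches.
      $batches = split_chunks_into_batches(['a', 'b', 'c', 'd', 'e'], 2);
      // A threshold of 0 keeps all chunks together in one batch.
      $single = split_chunks_into_batches(['a', 'b', 'c', 'd', 'e'], 0);
      ```

      Under the reviewer's alternative proposal the cut-off would differ (negative for "no limit", 0 treated as 1), so the guard condition is the part to adjust if that convention is preferred.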

  •    * Returns array of default configuration values for given strategy.
       *
       * @return array
    -  *   List of configuration values set for given model.
    +  *   List of configuration values set for given strategy.
       */
      public function getDefaultConfigurationValues(): array {
        return [
          'chunk_size' => 500,
          'chunk_min_overlap' => 100,
    +     'chunk_threshold' => 10,
  • foreach ($items as $item_id => $item) {
      $fields = $item->getFields();
      if (empty($fields)) {
        $this->messenger->addStatus(
          $this->t(
            'Item @id has been skipped, it has no fields to be indexed.',
            [
              '@id' => $item_id,
            ]
          )
        );
        $processed[] = $item_id;
        continue;
      }
      $chunks[$item_id] = $embedding_strategy->getChunks($fields, $index);
      if (count($chunks[$item_id]) <= $chunk_threshold) {
  •  * @param array $chunks
     *   Chunks of the item content.
     * @param int $chunk_threshold
     *   The number of chunks to process in a single queue item.
     * @param array $configuration
     *   Configuration of the server.
     */
    protected function enqueueItem(
      IndexInterface $search_index,
      ItemInterface $item,
      array $chunks,
      int $chunk_threshold,
      array $configuration,
    ): void {
      $item_id = $item->getId();
      $chunk_batches = array_chunk($chunks, $chunk_threshold);
    • See above re: the threshold perhaps being -1, i.e. they never want to queue. What about when the threshold is set to 0, i.e. they always want to queue and never process immediately?

      Suggested change
      - $chunk_batches = array_chunk($chunks, $chunk_threshold);
      + if ($chunk_threshold >= 0) {
      +   $chunk_batches = array_chunk($chunks, max($chunk_threshold, 1));
      + }
      + else {
      +   $chunk_batches = [$chunks];
      + }
    • Author Developer

      You have to manually enable "Index items immediately" on the index for indexing (and chunking and embedding) to happen on node update/save. This change doesn't alter that behaviour.

      Edited by Michal Gow
  • Michal Gow added 117 commits

  • Michal Gow added 13 commits

  • Michal Gow added 4 commits

    • 2fbdb3bc - 1 commit from branch project:1.0.x
    • e0d49907 - Peer Review fixes
    • d3207480 - CodeSniffer changes, logs removed
    • 7621d2af - Merge branch '1.0.x' into 3487487-improve-ai-search

  • Michal Gow added 1 commit

  • Michal Gow added 1 commit

  • Michal Gow added 1 commit

    • 1658d80b - Fixed AI Search API index form

  •    */
      protected int $chunkSize;

    + /**
    +  * The chunk threshold.
    +  *
    +  * @var int
    +  */
    + protected int $chunkThreshold;
  • Michal Gow added 34 commits
