Skip to content
Snippets Groups Projects
Claudiu Cristea's avatar
Issue #2844521 by zaporylie, claudiu.cristea: Do not suggest adding composer requirements manually
Claudiu Cristea authored
9b9ce259
History

Migrate Spreadsheet

Overview

The module provides a migrate source plugin for importing data from spreadsheet files. This source plugin uses the PhpOffice/PhpSpreadsheet library to read from the spreadsheet files.

The supported source files includes .ods, .xls, .xlsx, .csv.

Installing PHPSpreadsheet

Usually we use Composer to install dependencies in Drupal 8. Unfortunately, the composer.json file provided in the module root directory is not automatically parsed when the module gets enabled. For this reason, before installing the module, relevant lines from that file should be copied in the project top composer.json file. The entries under "repositories", "require" and "autoload" > "psr-4" should be copied under the same arrays, in the top project's composer.json. Then a $ composer update should be issued.

Usage

In the migration file add the plugin provided by this module as source plugin:

source:
  plugin: spreadsheet

  # The source file. The path can be either relative to Drupal root but it can
  # be a also an absolute reference such as a stream wrapper.
  file: ../resources/source_file.xlsx

  # The name of the worksheet to read from.
  worksheet: 'Personnel list'

  # The top-left cell where data area starts (excluding the header, if exists).
  # It should use a spreadsheet representation such as B4, A3, etc. The data
  # area does NOT include the header. If this configuration is missed, the
  # assumption is that the first row contains the table header and the data
  # origin is the first cell of the second row. And that is A2. In this example
  # the data area starts from the second column of the third row.
  origin: B3

  # The row where the header is placed, if any. If this configuration is missed,
  # there's no table header and the spreadsheet columns (A, B, C, ...) will be
  # automatically used as table header. If the table header is on the first row,
  # this configuration should be 1. The header cell values will act as column
  # names. The value of 2 means that the table header is on the second row.
  header_row: 2

  # The list of columns to be returned. Is basically a list of table header cell
  # values, if a header has been defined with `header_row`. If there's no table
  # header (i.e. `header_row` is missing), it should contain a list/sequence of
  # column letters (A, B, C, ...). If this configuration is missed, all columns
  # that contain data will be be returned (not recommended).
  columns:
    - ID
    - Revision
    - 'First name'
    - 'Sure name'
    - Gender

  # The name to be given to the column containing the row index. If this setting
  # is specified, the source will return also a pseudo-column, with this name,
  # containing the row index. In this example 'Row no.' can be used later in
  # `keys` list to make this column a primary key column. This name doesn't
  # need to be appended to the `columns` list, it will be added automatically.
  row_index_column: 'Row no.'

  # The primary key as a list of keys. It's a list of source columns that are
  # composing the primary key. The list is keyed by column name and has the
  # field storage definition as value. If the table have a header (i.e.
  # `header_row` is set) the keys will be set as the name of header cells acting
  # as primary index. Otherwise the column letters (A, B, C, ...) can be used.
  # If no keys are defined here, the current row position will be returned as
  # primary key, but in this case, `row_index_column` must have a value.
  keys:
    ID:
      type: integer
      size: big
    Revision:
      type: string
      max_length: 32
      is_ascii: true

Author

Claudiu Cristea (claudiu.cristea)