1 | <?xml version="1.0" encoding="UTF-8"?> |
---|
2 | <!DOCTYPE chapter PUBLIC |
---|
3 | "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN" |
---|
4 | "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd" |
---|
5 | [ |
---|
6 | <!ENTITY runplugin.configure.common |
---|
7 | "The top of the window displays the names of the selected plug-in and |
---|
8 | configuration, a list with parameters to the left, an area for input fields to the |
---|
9 | right and buttons to proceed with at the bottom. |
---|
10 | Click on a parameter in the parameter list to show the form fields |
---|
11 | for entering values for the parameter to the right. Parameters |
---|
12 | with an <guilabel>X</guilabel> in front of their names already have a |
---|
13 | value. Parameters marked with a blue rectangle are required and must |
---|
14 | be given a value before it is possible to proceed." |
---|
15 | > |
---|
16 | ]> |
---|
17 | <!-- |
---|
18 | $Id: import_data.xml 5372 2010-06-24 12:28:06Z nicklas $ |
---|
19 | |
---|
20 | Copyright (C) 2007 Peter Johansson, Nicklas Nordborg, Martin Svensson |
---|
21 | Copyright (C) 2008 Jari Häkkinen |
---|
22 | |
---|
23 | This file is part of BASE - BioArray Software Environment. |
---|
24 | Available at http://base.thep.lu.se/ |
---|
25 | |
---|
26 | BASE is free software; you can redistribute it and/or |
---|
27 | modify it under the terms of the GNU General Public License |
---|
28 | as published by the Free Software Foundation; either version 3 |
---|
29 | of the License, or (at your option) any later version. |
---|
30 | |
---|
31 | BASE is distributed in the hope that it will be useful, |
---|
32 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
---|
33 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
---|
34 | GNU General Public License for more details. |
---|
35 | |
---|
36 | You should have received a copy of the GNU General Public License |
---|
37 | along with BASE. If not, see <http://www.gnu.org/licenses/>. |
---|
38 | --> |
---|
39 | <chapter id="import_data" chunked="0"> |
---|
40 | <?dbhtml dir="import"?> |
---|
41 | <title>Import of data</title> |
---|
42 | <para> |
---|
43 | In some places the only way to get data into BASE is to import it |
---|
44 | from a file. This typically includes raw data, array design |
---|
45 | features, reporters and other things, which would be inconvenient |
---|
46 | to enter by hand due to the large number of data items. There are |
---|
47 | also convenience batch importers for importing other items such as |
---|
48 | biosources, samples, and extracts. The batch importers are |
---|
49 | described later in this chapter after the general import |
---|
50 | description. |
---|
51 | </para> |
---|
52 | <para> |
---|
53 | Normally, a plug-in handles one type of items and may require a |
---|
54 | configuration, for example, the import plug-ins need some |
---|
55 | information about how to find headers and data lines in |
---|
56 | files. BASE ships with a number of import plug-ins as a part of |
---|
57 | the core plug-ins package, cf. <xref linkend="coreplugins.import" |
---|
58 | />. The core plug-in section links to configuration examples for |
---|
59 | some of the plug-ins. Go to |
---|
60 | <menuchoice> |
---|
61 | <guimenu>Administrate</guimenu> |
---|
62 | <guimenuitem>Plugins</guimenuitem> |
---|
63 | <guisubmenu>Definitions</guisubmenu> |
---|
64 | </menuchoice> |
---|
65 | to check which plug-ins are installed on your BASE server. When |
---|
66 | BASE finds a plug-in that supports import of a certain type of |
---|
67 | item an &gbImport; button is displayed in the toolbar on either |
---|
68 | the list view or the single-item view. |
---|
69 | </para> |
---|
70 | <note> |
---|
71 | <title>Missing/unavailable button</title> |
---|
72 | <para> |
---|
73 | If the import button is missing from a page where you would expect |
---|
74 | to find it, this usually means that: |
---|
75 | </para> |
---|
76 | <itemizedlist> |
---|
77 | <listitem> |
---|
78 | <simpara> |
---|
79 | The logged in user does not have permission to use the plug-in. |
---|
80 | </simpara> |
---|
81 | </listitem> |
---|
82 | <listitem> |
---|
83 | <simpara> |
---|
84 | The plug-in requires a configuration, but none has been |
---|
85 | created or the logged in user does not have permission to |
---|
86 | use any of the existing configurations. |
---|
87 | </simpara> |
---|
88 | </listitem> |
---|
89 | </itemizedlist> |
---|
90 | <para> |
---|
91 | Contact the server administrator or another user who has permission to |
---|
92 | administrate the plug-ins. |
---|
93 | </para> |
---|
94 | </note> |
---|
95 | |
---|
96 | <sect1 id="import_data.import"> |
---|
97 | <title>General import procedure</title> |
---|
98 | |
---|
99 | <para> |
---|
100 | A data import is started through a wizard-like interface. There |
---|
101 | are a number of steps you have to go through: |
---|
102 | </para> |
---|
103 | |
---|
104 | <orderedlist> |
---|
105 | <listitem> |
---|
106 | <simpara> |
---|
107 | Select a plug-in and file format to use, or select the |
---|
108 | auto detect option. |
---|
109 | </simpara> |
---|
110 | </listitem> |
---|
111 | <listitem> |
---|
112 | <simpara> |
---|
113 | If you selected the auto detection function, you must select |
---|
114 | a file to use. |
---|
115 | </simpara> |
---|
116 | </listitem> |
---|
117 | <listitem> |
---|
118 | <simpara> |
---|
119 | Specify plug-in parameters. |
---|
120 | </simpara> |
---|
121 | </listitem> |
---|
122 | <listitem> |
---|
123 | <simpara> |
---|
124 | Add the import job to the job queue. |
---|
125 | </simpara> |
---|
126 | </listitem> |
---|
127 | <listitem> |
---|
128 | <simpara> |
---|
129 | Wait for the job to finish. |
---|
130 | </simpara> |
---|
131 | </listitem> |
---|
132 | </orderedlist> |
---|
133 | |
---|
134 | <sect2 id="import_export_data.import.plugin_fileformat"> |
---|
135 | <title>Select plug-in and file format</title> |
---|
136 | <para> |
---|
137 | Click on the &gbImport; button |
---|
138 | in the toolbar to start the import wizard. The first step is to |
---|
139 | select which plug-in and, if supported, which |
---|
140 | file format to use. There is also an <guilabel>auto detect</guilabel> |
---|
141 | option that lets you select a file and have BASE try to find a suitable |
---|
142 | plug-in/file format to use. |
---|
143 | </para> |
---|
144 | |
---|
145 | <figure id="import_export_data.figures.select_import_plugin"> |
---|
146 | <title>Select plug-in and file format</title> |
---|
147 | <screenshot> |
---|
148 | <mediaobject> |
---|
149 | <imageobject><imagedata fileref="figures/select_import_plugin.png" format="PNG" /></imageobject> |
---|
150 | </mediaobject> |
---|
151 | </screenshot> |
---|
152 | </figure> |
---|
153 | |
---|
154 | |
---|
155 | <helptext external_id="import.selectplugin" |
---|
156 | title="Select plug-in and file format for data import"> |
---|
157 | |
---|
158 | <variablelist> |
---|
159 | <varlistentry> |
---|
160 | <term><guilabel>Plugin + file format</guilabel></term> |
---|
161 | <listitem> |
---|
162 | <para> |
---|
163 | This is a combined list of plug-ins and their |
---|
164 | respective file format configurations. The list only |
---|
165 | includes combinations that |
---|
166 | the logged in user has permission to use. If you select |
---|
167 | an entry, a short description of the plug-in and configuration |
---|
168 | is displayed |
---|
169 | below the lists. More information about the plug-ins can |
---|
170 | be found under the menu choices |
---|
171 | <menuchoice> |
---|
172 | <guimenu>Administrate</guimenu> |
---|
173 | <guimenuitem>Plugins</guimenuitem> |
---|
174 | <guisubmenu>Definitions</guisubmenu> |
---|
175 | </menuchoice> |
---|
176 | and |
---|
177 | <menuchoice> |
---|
178 | <guimenu>Administrate</guimenu> |
---|
179 | <guimenuitem>Plugins</guimenuitem> |
---|
180 | <guisubmenu>Configurations</guisubmenu> |
---|
181 | </menuchoice>. |
---|
182 | </para> |
---|
183 | <note> |
---|
184 | <title>File format vs. Configuration</title> |
---|
185 | <simpara> |
---|
186 | A file format is the same thing as a plug-in configuration. |
---|
187 | It may be confusing that the interface sometimes uses |
---|
188 | <emphasis>file format</emphasis> and sometimes uses |
---|
189 | <emphasis>configuration</emphasis>, but for now, we'll have |
---|
190 | to live with it. |
---|
191 | </simpara> |
---|
192 | </note> |
---|
193 | </listitem> |
---|
194 | </varlistentry> |
---|
195 | </variablelist> |
---|
196 | |
---|
197 | <para> |
---|
198 | Proceed to the next step by clicking on the |
---|
199 | &gbNext; button. |
---|
200 | </para> |
---|
201 | |
---|
202 | <seeother> |
---|
203 | <other external_id="import.autodetect">The auto detect function</other> |
---|
204 | </seeother> |
---|
205 | </helptext> |
---|
206 | |
---|
207 | <sect3 id="import_export_data.import.plugin_fileformat.autodetect"> |
---|
208 | <title>The auto detect function</title> |
---|
209 | |
---|
210 | <helptext |
---|
211 | external_id="import.autodetect" |
---|
212 | title="The auto detect function"> |
---|
213 | |
---|
214 | <para> |
---|
215 | The auto detect function lets you select a file and have |
---|
216 | BASE try to find a suitable plug-in and file format. This option is |
---|
217 | selected by default in the combined plug-in and file format list when there is |
---|
218 | at least one plug-in that supports auto detection. |
---|
219 | </para> |
---|
220 | <note> |
---|
221 | <title>Support of auto detect</title> |
---|
222 | <para> |
---|
223 | Not all plug-ins support auto detection. The ones that do are marked in |
---|
224 | the list with <guilabel>×</guilabel>. |
---|
225 | </para> |
---|
226 | </note> |
---|
227 | |
---|
228 | <para> |
---|
229 | Select the <guilabel>auto detect (all)</guilabel> option to search for a file format |
---|
230 | in all plug-ins that support the feature, or select the <guilabel>auto detect (plugin)</guilabel> |
---|
231 | option to only search the file formats for a specific plug-in. |
---|
232 | Continue to the next step by clicking on the &gbNext; button. |
---|
233 | </para> |
---|
234 | |
---|
235 | <seeother> |
---|
236 | <other external_id="import.selectplugin">Select plug-in and file format for data import</other> |
---|
237 | <other external_id="import.autodetect.selectfile">Select file for auto detection</other> |
---|
238 | </seeother> |
---|
239 | |
---|
240 | </helptext> |
---|
241 | |
---|
242 | <para> |
---|
243 | You must now select a file to import from. |
---|
244 | </para> |
---|
245 | |
---|
246 | <figure id="import_export_data.figures.select_autodetect_file"> |
---|
247 | <title>Select file for auto detection</title> |
---|
248 | <screenshot> |
---|
249 | <mediaobject> |
---|
250 | <imageobject><imagedata fileref="figures/select_autodetect_file.png" format="PNG" /></imageobject> |
---|
251 | </mediaobject> |
---|
252 | </screenshot> |
---|
253 | </figure> |
---|
254 | |
---|
255 | <helptext external_id="import.autodetect.selectfile" |
---|
256 | title="Select file for auto detection"> |
---|
257 | |
---|
258 | <variablelist> |
---|
259 | <varlistentry> |
---|
260 | <term><guilabel>Plugin</guilabel></term> |
---|
261 | <listitem> |
---|
262 | <para> |
---|
263 | Displays the selected plug-in or <guilabel>all</guilabel> if the |
---|
264 | auto-detection is used on all supporting plug-ins. |
---|
265 | </para> |
---|
266 | </listitem> |
---|
267 | </varlistentry> |
---|
268 | <varlistentry> |
---|
269 | <term><guilabel>File</guilabel></term> |
---|
270 | <listitem> |
---|
271 | <para> |
---|
272 | Enter the path and file name for the |
---|
273 | file you want to use. Use the <guibutton>Browse…</guibutton> |
---|
274 | button to browse for the file in BASE's file system. |
---|
275 | If the file does not exist in the file system you have the option |
---|
276 | to upload it. |
---|
277 | <nohelp>Read more about this in <xref linkend="file_system" />.</nohelp> |
---|
278 | </para> |
---|
279 | </listitem> |
---|
280 | </varlistentry> |
---|
281 | <varlistentry> |
---|
282 | <term><guilabel>Character set</guilabel></term> |
---|
283 | <listitem> |
---|
284 | <para> |
---|
285 | The character set used in text files. If the selected file has been configured |
---|
286 | with a character set the correct option is automatically selected. In all |
---|
287 | cases, you have the option to override the default selection. Most files |
---|
288 | use either the UTF-8 or the ISO-8859-1 character set. |
---|
289 | </para> |
---|
290 | </listitem> |
---|
291 | </varlistentry> |
---|
292 | <varlistentry> |
---|
293 | <term><guilabel>Recently used</guilabel></term> |
---|
294 | <listitem> |
---|
295 | <para> |
---|
296 | A list of files you have recently used |
---|
297 | for auto detection. |
---|
298 | </para> |
---|
299 | </listitem> |
---|
300 | </varlistentry> |
---|
301 | </variablelist> |
---|
302 | |
---|
303 | <para> |
---|
304 | Click on the &gbNext; button |
---|
305 | to start the auto detection. |
---|
306 | </para> |
---|
307 | |
---|
308 | <para> |
---|
309 | If the auto detection finds exactly one plug-in and file format |
---|
310 | the next step is to configure any additional parameters needed |
---|
311 | by the plug-in. This is the same step as if you had selected |
---|
312 | the same plug-in and file format in the first step. |
---|
313 | If no plug-in can be found an error message is displayed. |
---|
314 | </para> |
---|
315 | |
---|
316 | <note> |
---|
317 | <title>More than one compatible plug-in/file format</title> |
---|
318 | <para> |
---|
319 | If more than one matching plug-in or file format is found |
---|
320 | you will be taken back to the first step. This time |
---|
321 | the lists will only include the matching plug-ins/file formats |
---|
322 | and the auto detect option is not present. |
---|
323 | </para> |
---|
324 | </note> |
---|
325 | |
---|
326 | <seeother> |
---|
327 | <other external_id="import.selectplugin">Select plug-in and file format for data import</other> |
---|
328 | <other external_id="import.autodetect">The auto detect function</other> |
---|
329 | </seeother> |
---|
330 | |
---|
331 | </helptext> |
---|
332 | |
---|
333 | </sect3> |
---|
334 | |
---|
335 | </sect2> |
---|
336 | |
---|
337 | <sect2 id="import_export_data.import.pluginparameters"> |
---|
338 | <title>Specify plug-in parameters</title> |
---|
339 | <para> |
---|
340 | When you have selected a plug-in and file format or used |
---|
341 | the auto detect function to find one, a form where you |
---|
342 | can enter additional parameters for the plug-in is displayed. |
---|
343 | </para> |
---|
344 | |
---|
345 | <figure id="import_export_data.figures.confiure_plugin"> |
---|
346 | <title>Specify plug-in parameters</title> |
---|
347 | <screenshot> |
---|
348 | <mediaobject> |
---|
349 | <imageobject> |
---|
350 | <imagedata |
---|
351 | scalefit="1" width="100%" |
---|
352 | fileref="figures/plugin_parameters.png" format="PNG" /> |
---|
353 | </imageobject> |
---|
354 | </mediaobject> |
---|
355 | </screenshot> |
---|
356 | </figure> |
---|
357 | |
---|
358 | <helptext external_id="runplugin.configure.import" |
---|
359 | title="Specify plug-in parameters"> |
---|
360 | <para> |
---|
361 | &runplugin.configure.common; |
---|
362 | </para> |
---|
363 | |
---|
364 | <para> |
---|
365 | The parameter list is very different from plug-in to plug-in. |
---|
366 | Common parameters for import plug-ins are: |
---|
367 | </para> |
---|
368 | |
---|
369 | <variablelist> |
---|
370 | <varlistentry> |
---|
371 | <term><guilabel>File</guilabel></term> |
---|
372 | <listitem> |
---|
373 | <para> |
---|
374 | The file to import data from. A value is already set if |
---|
375 | you used the auto detect function. |
---|
376 | </para> |
---|
377 | </listitem> |
---|
378 | </varlistentry> |
---|
379 | |
---|
380 | <varlistentry> |
---|
381 | <term><guilabel>Error handling</guilabel></term> |
---|
382 | <listitem> |
---|
383 | <para> |
---|
384 | A section that contains options for how to |
---|
385 | handle errors when parsing the file. Normally you can |
---|
386 | select whether the import should fail as a whole or whether |
---|
387 | the line with the error should be skipped. |
---|
388 | </para> |
---|
389 | </listitem> |
---|
390 | </varlistentry> |
---|
391 | </variablelist> |
---|
392 | |
---|
393 | <para> |
---|
394 | Continue to the next step by clicking the |
---|
395 | &gbNext; button. |
---|
396 | </para> |
---|
397 | |
---|
398 | <seeother> |
---|
399 | <other external_id="runplugin.configure">The plug-in configuration wizard</other> |
---|
400 | </seeother> |
---|
401 | </helptext> |
---|
402 | |
---|
403 | </sect2> |
---|
404 | |
---|
405 | <sect2 id="import_export_data.import.jobqueue"> |
---|
406 | <title>Add the import job to the job queue</title> |
---|
407 | |
---|
408 | <para> |
---|
409 | In this window, fill in information about the job, such as its name and |
---|
410 | description. The name is required and must be a valid string. There |
---|
411 | are also two check boxes on this page. |
---|
412 | <variablelist> |
---|
413 | <varlistentry> |
---|
414 | <term> |
---|
415 | <guilabel>Send message</guilabel> |
---|
416 | </term> |
---|
417 | <listitem> |
---|
418 | <para> |
---|
419 | Tick this check box if the job should send you a message when it is |
---|
420 | finished; otherwise untick it. |
---|
421 | </para> |
---|
422 | </listitem> |
---|
423 | </varlistentry> |
---|
424 | <varlistentry> |
---|
425 | <term> |
---|
426 | <guilabel>Remove job</guilabel> |
---|
427 | </term> |
---|
428 | <listitem> |
---|
429 | <para> |
---|
430 | If this check box is ticked, the job will be marked as removed when |
---|
431 | it is finished, provided that it finished successfully. This option |
---|
432 | is only available for import and export plug-ins. |
---|
433 | </para> |
---|
434 | </listitem> |
---|
435 | </varlistentry> |
---|
436 | </variablelist> |
---|
437 | </para> |
---|
438 | <para> |
---|
439 | Clicking on |
---|
440 | &gbFinish; |
---|
441 | when everything is set will end the job configuration and place the job in the job queue. |
---|
442 | A self-refreshing window appears with information about the |
---|
443 | job's status and execution time. How long it takes before the job starts to run |
---|
444 | depends on its priority and the priority of the other jobs in the queue. The job does not |
---|
445 | depend on the status window to be able to run and the window can be |
---|
446 | closed without interrupting the execution. |
---|
447 | </para> |
---|
448 | <tip> |
---|
449 | <title>View job status</title> |
---|
450 | <para> |
---|
451 | A job's status can be viewed at any time by opening it from the job list page, |
---|
452 | <menuchoice> |
---|
453 | <guimenuitem>View</guimenuitem> |
---|
454 | <guimenuitem>Jobs</guimenuitem> |
---|
455 | </menuchoice>. |
---|
456 | </para> |
---|
457 | </tip> |
---|
458 | </sect2> |
---|
459 | |
---|
460 | </sect1> |
---|
461 | |
---|
462 | <sect1 id="import_data.batch"> |
---|
463 | <title>Batch import of data</title> |
---|
464 | |
---|
465 | <para> |
---|
466 | There are in general several possibilities to import data into |
---|
467 | BASE. Bulk data such as reporter information and raw data |
---|
468 | imports are handled by plug-ins created for these tasks. For |
---|
469 | item types that are imported in more moderate quantities a |
---|
470 | suite of batch item importers is available |
---|
471 | (<xref linkend="coreplugins.import.batch" />). These importers |
---|
472 | allow the user to create new items in BASE and define item |
---|
473 | properties and associations between items using tab-separated |
---|
474 | (or equivalent) files. |
---|
475 | </para> |
---|
476 | |
---|
477 | <para> |
---|
478 | The batch importers are available for most users and they may |
---|
479 | have been pre-configured but there is no requirement to |
---|
480 | configure the batch importer plug-ins. Here we assume that no |
---|
481 | plug-in configuration exists for the batch |
---|
482 | importers. Pre-configuration of the importers is really only |
---|
483 | needed for facilities that perform the same imports regularly |
---|
484 | whereas for occasional use the provided wizard is |
---|
485 | sufficient. Configuring the importers follows the route |
---|
486 | described in <xref linkend="plugins.configuration" />. |
---|
487 | </para> |
---|
488 | |
---|
489 | <para> |
---|
490 | The batch importers either create new items or update |
---|
491 | already existing items. In either mode the plug-in can set |
---|
492 | values for |
---|
493 | <itemizedlist> |
---|
494 | <listitem> |
---|
495 | <para> |
---|
496 | Simple properties, <emphasis>e.g.</emphasis>, string |
---|
497 | values, numeric values, dates, etc. |
---|
498 | </para> |
---|
499 | </listitem> |
---|
500 | <listitem> |
---|
501 | <para> |
---|
502 | Single-item references, <emphasis>e.g.</emphasis>, |
---|
503 | protocol, label, software, owner, etc. |
---|
504 | </para> |
---|
505 | </listitem> |
---|
506 | <listitem> |
---|
507 | <para> |
---|
508 | Multi-item references are references to several other |
---|
509 | items of the same type. The labeled extracts of a |
---|
510 | hybridization or pooled samples are two examples of |
---|
511 | items that refer to several other items; a hybridization |
---|
512 | may contain several labeled extracts and a sample may be |
---|
513 | a pool of several samples. In some cases a multi-item |
---|
514 | reference is bundled with simple |
---|
515 | values, <emphasis>e.g.</emphasis>, used quantity of a |
---|
516 | source biomaterial, the array index a labeled extract is |
---|
517 | used on, etc. Multi-item references are never removed by |
---|
518 | the importer, only added or updated. Removing an item |
---|
519 | from a multi-item reference is a manual procedure to be |
---|
520 | done using the web interface. |
---|
521 | </para> |
---|
522 | </listitem> |
---|
523 | </itemizedlist> |
---|
524 | The batch importers do not set values for annotations since |
---|
525 | this is handled by the already existing annotation importer |
---|
526 | plug-in (<xref linkend="annotations.massimport" />). However, |
---|
527 | the annotation importer and batch item importers have similar |
---|
528 | behaviour and functionality to minimize the learning cost for |
---|
529 | users. |
---|
530 | </para> |
---|
531 | |
---|
532 | <para> |
---|
533 | The importer works on only one item type per run and can be |
---|
534 | used in a <emphasis>dry-run</emphasis> mode where everything |
---|
535 | is performed as if a real import is taking place, but the work |
---|
536 | (transaction) is not committed to the database. The result of |
---|
537 | the test can be stored to a log file and the user can examine |
---|
538 | the output to see how an actual import would perform. Summary |
---|
539 | results such as the number of items imported and the number of |
---|
540 | failed items are reported after the import is finished, and in |
---|
541 | the case of non-recoverable failure the reason is reported. |
---|
542 | </para> |
---|
543 | |
---|
544 | <sect2 id="import_data.batch.fileformat"> |
---|
545 | <title>File format</title> |
---|
546 | |
---|
547 | <para> |
---|
548 | For proper and efficient use of the batch importers users |
---|
549 | need to understand how the files to be imported should be |
---|
550 | formatted. For users who wish to get hands-on |
---|
551 | experience there is |
---|
552 | an <ulink url="http://base.thep.lu.se/attachment/wiki/DocBookSupport/batchimport_sample.ods?format=raw">OpenOffice |
---|
553 | spreadsheet with sample sheets that work with the batch |
---|
554 | importers</ulink> available for download. This file can be |
---|
555 | used to import a set of data from the biosource level down |
---|
556 | to hybridizations with proper associations and properties |
---|
557 | simply by using the batch importers. |
---|
558 | </para> |
---|
559 | |
---|
560 | <para> |
---|
561 | The input file must be organised into columns separated by a |
---|
562 | specified character such as a tab or comma character. The |
---|
563 | data header line contains the column headers, which define |
---|
564 | the contents of each column, and marks the beginning of |
---|
565 | item data in the file. The item data block continues until |
---|
566 | the end of the file or to an optional data footer line |
---|
567 | defining the end of the data block. |
---|
568 | </para> |
---|
569 | |
---|
570 | <para> |
---|
571 | When reading data for an item the plug-in must use some |
---|
572 | information for identifying items. Depending on item type |
---|
573 | there are two or three options for selecting the item identifier: |
---|
574 | <itemizedlist> |
---|
575 | <listitem> |
---|
576 | <para> |
---|
577 | Using the internal <property>id</property>. This is |
---|
578 | always unique for a specific BASE server. |
---|
579 | </para> |
---|
580 | </listitem> |
---|
581 | <listitem> |
---|
582 | <para> |
---|
583 | Using the <property>name</property>. This may or may |
---|
584 | not be unique. |
---|
585 | </para> |
---|
586 | </listitem> |
---|
587 | <listitem> |
---|
588 | <para> |
---|
589 | Some items have |
---|
590 | an <property>externalId</property>. This may or may |
---|
591 | not be unique. |
---|
592 | </para> |
---|
593 | </listitem> |
---|
594 | <listitem> |
---|
595 | <para> |
---|
596 | Array slides may have a <property>barcode</property> |
---|
597 | which is similar to |
---|
598 | the <property>externalId</property>. |
---|
599 | </para> |
---|
600 | </listitem> |
---|
601 | </itemizedlist> |
---|
602 | It is important that the identifier selected |
---|
603 | is <emphasis>unique</emphasis> in the file used, or if the |
---|
604 | file is used to update items already existing in BASE the |
---|
605 | identifier should also be unique in BASE for the user |
---|
606 | performing the update. The plug-in will check uniqueness |
---|
607 | when default parameters are used but the user may change the |
---|
608 | default behaviour. |
---|
609 | </para> |
---|
610 | |
---|
611 | <para> |
---|
612 | Data for a single item may be split into multiple lines. The |
---|
613 | first line contains simple properties and single-item |
---|
614 | references, and the first multi-item reference. If there are |
---|
615 | more multi-item references they should be on the following |
---|
616 | lines with empty values in all other columns, except for the |
---|
617 | column holding the item identifier. The item identifier must |
---|
618 | have the same value on all lines associated with the |
---|
619 | item. Lines containing other data than multi-item references |
---|
620 | will be ignored or may be considered as an error depending |
---|
621 | on plug-in parameter settings. The reason for treating |
---|
622 | copied data entries as an error is to catch situations where |
---|
623 | two items are given the same item identifier by accident. |
---|
624 | </para> |
---|
625 | |
---|
626 | </sect2> |
---|
627 | |
---|
628 | <sect2 id="import_data.batch.running"> |
---|
629 | <title>Running the item batch importer</title> |
---|
630 | |
---|
631 | <para> |
---|
632 | This section discusses specific parameters and features of the |
---|
633 | batch importers. The general use of the batch importers |
---|
634 | follows the description outlined in |
---|
635 | <xref linkend="import_data.import" /> and the setting of |
---|
636 | column mapping parameters is assisted with |
---|
637 | the <guilabel>Test with file</guilabel> function described |
---|
638 | in <xref linkend="plugins.configuration.testwithfile" |
---|
639 | />. The column headers are mapped to item properties at each |
---|
640 | use of the plug-in but, as pointed out above, they can also |
---|
641 | be predefined by saving settings as a plug-in |
---|
642 | configuration. The configuration also includes separator |
---|
643 | character and other information that is needed to parse |
---|
644 | files. The ability to save configurations depends on user |
---|
645 | credentials and is by default only granted to administrators. |
---|
646 | </para> |
---|
647 | |
---|
648 | <para> |
---|
649 | The plug-in parameter dialog follows the standard BASE plug-in |
---|
650 | layout and shows help information for selected |
---|
651 | parameters. The list below comments on some of the |
---|
652 | parameters available. |
---|
653 | <variablelist> |
---|
654 | <varlistentry> |
---|
655 | <term> |
---|
656 | <guilabel>Mode</guilabel> |
---|
657 | </term> |
---|
658 | <listitem> |
---|
659 | <para> |
---|
660 | Select the mode of the plug-in. The plug-in can |
---|
661 | create new items and/or update items already |
---|
662 | existing in BASE. This setting is available to allow |
---|
663 | the user to make a conscious choice of how to treat |
---|
664 | missing or already existing items. For example, if |
---|
665 | the user selects to only update items already |
---|
666 | existing the plug-in will complain if an item in the |
---|
667 | file does not exist in BASE (using default error |
---|
668 | condition treatment). This adds an extra layer of |
---|
669 | security and diagnostics for the user during import. |
---|
670 | </para> |
---|
671 | </listitem> |
---|
672 | </varlistentry> |
---|
673 | <varlistentry> |
---|
674 | <term> |
---|
675 | <guilabel>Identification method</guilabel> |
---|
676 | </term> |
---|
677 | <listitem> |
---|
678 | <para> |
---|
679 | This parameter defines the method to use to find |
---|
680 | already existing items. The parameter can only be |
---|
681 | set to a set of item properties listed in the |
---|
682 | plug-in parameter dialog. The property selected by |
---|
683 | the user must be mapped to a column in the file. If |
---|
684 | it is not set there is obviously no way for the |
---|
685 | plug-in to identify if an item already exists. |
---|
686 | </para> |
---|
687 | </listitem> |
---|
688 | </varlistentry> |
---|
689 | <varlistentry> |
---|
690 | <term> |
---|
691 | <guilabel>Owned by me</guilabel>, <guilabel>Shared to |
---|
692 | me</guilabel>, <guilabel>In current |
---|
693 | project</guilabel>, and <guilabel>Owned by |
---|
694 | others</guilabel> |
---|
695 | </term> |
---|
696 | <listitem> |
---|
697 | <para> |
---|
698 | Defines the set of items the plug-in should look in |
---|
699 | when it checks whether an item already exists. The |
---|
700 | options are the same that are available in list |
---|
701 | views and the actual set of parameters depends on |
---|
702 | user credentials. |
---|
703 | </para> |
---|
704 | <para> |
---|
705 | When <property>id</property> is used as |
---|
706 | the <guilabel>Identification method</guilabel>, the |
---|
707 | plug-in looks for the item irrespective of the setting |
---|
708 | of these parameters. Of course, the user still must |
---|
709 | have proper access to the item referenced. |
---|
710 | </para> |
---|
711 | </listitem> |
---|
712 | </varlistentry> |
---|
713 | <varlistentry> |
---|
714 | <term> |
---|
715 | <guilabel>Column mapping expressions</guilabel> |
---|
716 | </term> |
---|
717 | <listitem> |
---|
718 | <para> |
---|
719 | Use the <guilabel>Test with file</guilabel> function |
---|
720 | described in |
---|
721 | <xref linkend="plugins.configuration.testwithfile" |
---|
722 | /> to set the column mapping parameters. |
---|
723 | </para> |
---|
724 | <para> |
---|
725 | When creating pooled items, |
---|
726 | the <property>pooled</property> property is used to |
---|
727 | tell the plug-in that an item is pooled. Pooled in |
---|
728 | BASE language really means that the item parent is |
---|
729 | of the same type as the item itself. If an item is |
---|
730 | not pooled then the parent is of another type |
---|
731 | following a predefined hierarchy in BASE. In |
---|
732 | ascending order the BASE ordering |
---|
733 | of <emphasis>parent - child - grandchild - |
---|
734 | ...</emphasis> item relation is <emphasis>biosource |
---|
735 | - sample - extract - labeled extract</emphasis>. |
---|
736 | </para> |
---|
737 | <para> |
---|
738 | The values accepted for <property>pooled</property> |
---|
739 | are <constant>empty (' ')</constant>, |
---|
740 | <constant>0</constant>, <constant>1</constant>, |
---|
741 | <constant>no</constant>, <constant>yes</constant>, |
---|
742 | <constant>false</constant>, |
---|
743 | and <constant>true</constant>. Any other string is |
---|
744 | interpreted as meaning that the item is pooled. Sometimes all |
---|
745 | items in a file to be imported are pooled but there |
---|
746 | is no column that marks the pooled status. This can |
---|
747 | be resolved by setting |
---|
748 | the <property>pooled</property> mapping to a |
---|
749 | constant string |
---|
750 | <constant>'1'</constant>, which causes all items to be |
---|
751 | treated as pooled in the import (no backslash '\' |
---|
752 | character, compare with column header mapping |
---|
753 | strings that contain backslash characters |
---|
754 | like <constant>'\pool column\'</constant>). |
---|
755 | </para> |
---|
756 | </listitem> |
---|
757 | </varlistentry> |
---|
758 | </variablelist> |
---|
759 | After setting the parameters, |
---|
760 | select <guilabel>Next</guilabel>. Another parameter dialog |
---|
761 | will appear where error handling options can be set along |
---|
762 | with the following parameters: |
---|
763 | <variablelist> |
---|
764 | <varlistentry> |
---|
765 | <term> |
---|
766 | <guilabel>Log file</guilabel> |
---|
767 | </term> |
---|
768 | <listitem> |
---|
769 | <para> |
---|
770 | Setting this parameter will turn on logging. The |
---|
771 | plug-in will give detailed information about how the |
---|
772 | file is parsed. This is useful for resolving file |
---|
773 | parsing issues. |
---|
774 | </para> |
---|
775 | </listitem> |
---|
776 | </varlistentry> |
---|
777 | <varlistentry> |
---|
778 | <term> |
---|
779 | <guilabel>Dry run</guilabel> |
---|
780 | </term> |
---|
781 | <listitem> |
---|
782 | <para> |
---|
783 | Enable or disable a test run of the plug-in. If |
---|
784 | enabled, the plug-in will parse the file and simulate an |
---|
785 | import. When enabling this option you should also set |
---|
786 | the <guilabel>Log file</guilabel> parameter. The dry run |
---|
787 | mode allows testing of large imports and updates by |
---|
788 | creating a log file that can be examined for |
---|
789 | inconsistencies before actually performing the action |
---|
790 | without a safety net. |
---|
791 | </para> |
---|
792 | </listitem> |
---|
793 | </varlistentry> |
---|
794 | </variablelist> |
---|
795 | </para> |
---|
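<para>
  The interpretation of <property>pooled</property> values described
  under <guilabel>Column mapping expressions</guilabel> can be
  illustrated with a minimal Python sketch. This is not BASE code:
  the function name <function>is_pooled</function> is hypothetical,
  and the sketch assumes that the empty value,
  <constant>0</constant>, <constant>no</constant>,
  and <constant>false</constant> mean "not pooled", and that
  surrounding whitespace and letter case are ignored.
</para>
<programlisting>
```python
# Hypothetical helper, not part of BASE.
# Assumption: these accepted values mean "not pooled"; the accepted values
# 1, yes and true, and any other string, mean "pooled".
NOT_POOLED_VALUES = {"", "0", "no", "false"}


def is_pooled(value):
    """Return True if a mapped 'pooled' value marks the item as pooled."""
    # Assumption: whitespace and letter case are not significant.
    text = (value or "").strip().lower()
    return text not in NOT_POOLED_VALUES


if __name__ == "__main__":
    for raw in ["", "0", "no", "false", "1", "yes", "true", "pool A"]:
        print(repr(raw), is_pooled(raw))
```
</programlisting>
<para>
  Under these assumptions, mapping <property>pooled</property> to the
  constant string <constant>'1'</constant> gives every line the same
  value, so every item in the file is treated as pooled.
</para>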
796 | |
---|
797 | <para> |
---|
798 | During file parsing the plug-in will look for items |
---|
799 | referenced on each line. There are three outcomes of this |
---|
800 | item search: |
---|
801 | <itemizedlist> |
---|
802 | <listitem> |
---|
803 | <para> |
---|
804 | No item is found. Depending on parameter settings this |
---|
805 | may abort the plug-in, the plug-in may ignore the |
---|
806 | line, or a new item is created. |
---|
807 | </para> |
---|
808 | </listitem> |
---|
809 | <listitem> |
---|
810 | <para> |
---|
811 | One item is found. This is the item that is going to |
---|
812 | be updated. |
---|
813 | </para> |
---|
814 | </listitem> |
---|
815 | <listitem> |
---|
816 | <para> |
---|
817 | More than one item is found. Depending on parameter |
---|
818 | settings this may abort the plug-in or the plug-in may |
---|
819 | ignore the line. |
---|
820 | </para> |
---|
821 | </listitem> |
---|
822 | </itemizedlist> |
---|
823 | </para> |
---|
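<para>
  The three outcomes of the item search can be summarised as a small
  decision function. The following Python sketch is only an
  illustration: the function <function>resolve_line</function>, its
  parameters, and the action names <constant>abort</constant>,
  <constant>ignore</constant>, <constant>create</constant>,
  and <constant>update</constant> are hypothetical and do not
  correspond to actual BASE parameter names or values.
</para>
<programlisting>
```python
# Hypothetical sketch, not BASE code. The action names are illustrative.
def resolve_line(matches, on_not_found="create", on_multiple="abort"):
    """Decide what happens to one line, given the items it matched."""
    if len(matches) == 1:
        # Exactly one item found: this is the item that is updated.
        return "update"
    if not matches:
        # No item found: abort, ignore the line, or create a new item.
        allowed, choice = {"abort", "ignore", "create"}, on_not_found
    else:
        # More than one item found: abort or ignore the line.
        allowed, choice = {"abort", "ignore"}, on_multiple
    if choice not in allowed:
        raise ValueError("unsupported setting: " + repr(choice))
    return choice


if __name__ == "__main__":
    print(resolve_line([]))
    print(resolve_line(["sample A"]))
    print(resolve_line(["sample A", "sample A"], on_multiple="ignore"))
```
</programlisting>
<para>
  Note that creating a new item is only an option when no item is
  found; when several items match, the line can only be ignored or
  the plug-in aborted.
</para>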
824 | |
---|
825 | </sect2> |
---|
826 | |
---|
827 | <sect2 id="import_data.batch.comments"> |
---|
828 | <title>Comments on the item batch importers</title> |
---|
829 | |
---|
830 | <para> |
---|
831 | The item batch importers are not designed to change or |
---|
832 | create annotations. There is another plug-in for this, see |
---|
833 | <xref linkend="annotations.massimport" /> for an |
---|
834 | introduction to the annotation importer. |
---|
835 | </para> |
---|
836 | |
---|
837 | <para> |
---|
838 | There is no need to map all columns when running the |
---|
839 | importer. When new items are created usually the only |
---|
840 | mandatory entry is <property>Name</property>, and when |
---|
841 | running the plug-in in update mode only the column defining |
---|
842 | the item identification property needs to be defined. This |
---|
843 | can be utilized when only one or a few properties need to |
---|
844 | be updated; map only columns that should be changed and the |
---|
845 | plug-in will ignore the other properties and leave them as |
---|
846 | they are already stored in BASE. This also means that if one |
---|
847 | property should be deleted then that property must be mapped |
---|
848 | and the value must be empty in the file. Note that multi-item |
---|
849 | references cannot be deleted with the batch importer; |
---|
850 | they must be deleted using the web |
---|
851 | interface. |
---|
852 | </para> |
---|
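<para>
  The update behaviour described above (unmapped properties are left
  alone, mapped properties with an empty value are deleted) can be
  sketched in Python as follows. The
  function <function>apply_update</function>, the dictionary
  representation of an item, and the use of <constant>None</constant>
  for a deleted value are assumptions made for this illustration
  only; they are not how BASE stores items.
</para>
<programlisting>
```python
# Hypothetical sketch, not BASE code.
def apply_update(item, row, mapping):
    """Return a copy of item updated from one file row.

    item    -- stored property values, keyed by property name
    row     -- values from one line of the file, keyed by column name
    mapping -- property name to column name, for mapped properties only
    """
    updated = dict(item)
    for prop, column in mapping.items():
        value = row.get(column, "")
        if value == "":
            # Mapped but empty in the file: the stored value is deleted.
            updated[prop] = None
        else:
            updated[prop] = value
    # Properties that are not in the mapping are left untouched.
    return updated


if __name__ == "__main__":
    stored = {"name": "Sample 1", "description": "old text", "externalId": "X1"}
    line = {"Name": "Sample 1", "Description": "", "External ID": "X2"}
    print(apply_update(stored, line, {"description": "Description"}))
```
</programlisting>
<para>
  In this example only <property>description</property> is mapped, so
  it is cleared because the file value is empty, while
  <property>externalId</property> keeps its stored value even though
  the file contains a different one.
</para>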
853 | |
---|
854 | <para> |
---|
855 | When parent and other relations are created using the |
---|
856 | plug-in the referenced items are properly linked and |
---|
857 | updated. This means that when a quantity that decreases a |
---|
858 | referenced item is used, the referenced item is updated |
---|
859 | accordingly. In consequence, if the relation is removed in a |
---|
860 | later update (for example, because the wrong parent was referenced), the |
---|
861 | referenced item is restored and any decrease in quantity |
---|
862 | is also reset. |
---|
863 | </para> |
---|
864 | |
---|
865 | <para> |
---|
866 | A common mistake is to forget to make sure that some of the |
---|
867 | referenced items already exist in BASE, or at least are |
---|
868 | accessible for the user performing the import. Items such as |
---|
869 | protocols and labels must be added before referencing |
---|
870 | them. This is of course also true for other items but during |
---|
871 | batch import one usually follows the natural order of first |
---|
872 | importing biosources, samples, extracts, and so on. In this |
---|
873 | way the parents are always present and may be referenced |
---|
874 | without any issues. |
---|
875 | </para> |
---|
876 | |
---|
877 | </sect2> |
---|
878 | |
---|
879 | </sect1> |
---|
880 | |
---|
881 | </chapter> |
---|