<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC
  "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
  "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd"
[
  <!ENTITY runplugin.configure.common
"The top of the window displays the names of the selected plug-in and
configuration. Below that is a list of parameters on the left, an area for input fields on the
right, and buttons for proceeding at the bottom.
Click on a parameter in the parameter list to show the form fields
for entering values for the parameter on the right. Parameters
with an <guilabel>X</guilabel> in front of their names already have a
value. Parameters marked with a blue rectangle are required and must
be given a value before it is possible to proceed."
>
]>
<!--
  $Id: import_data.xml 5798 2011-10-11 16:32:48Z nicklas $

  Copyright (C) 2007 Peter Johansson, Nicklas Nordborg, Martin Svensson
  Copyright (C) 2008 Jari Häkkinen

  This file is part of BASE - BioArray Software Environment.
  Available at http://base.thep.lu.se/

  BASE is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 3
  of the License, or (at your option) any later version.

  BASE is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with BASE. If not, see <http://www.gnu.org/licenses/>.
-->
<chapter id="import_data" chunked="0">
  <title>Import of data</title>
  <para>
    In some places the only way to get data into BASE is to import it
    from a file. This typically includes <emphasis>raw data</emphasis>,
    <emphasis>array design features</emphasis>, <emphasis>reporters</emphasis>
    and other things that would be inconvenient
    to enter by hand due to the large number of data items. There are
    also convenience batch importers for importing other items such as
    <emphasis>biosources</emphasis>, <emphasis>samples</emphasis>, and
    <emphasis>annotations</emphasis>. The batch importers are
    described later in this chapter, after the general import
    description.
  </para>
  <para>
    Normally, a plug-in handles one type of item and may require a
    configuration. For example, most import plug-ins need some
    information about how to find headers and data lines in
    files. BASE ships with a number of import plug-ins as a part of
    the core plug-ins package, cf. <xref linkend="coreplugins.import"
    />. The core plug-in section links to configuration examples for
    some of the plug-ins. Go to
    <menuchoice>
      <guimenu>Administrate</guimenu>
      <guimenuitem>Plug-ins &amp; extensions</guimenuitem>
      <guisubmenu>Plug-in definitions</guisubmenu>
    </menuchoice>
    to check which plug-ins are installed on your BASE server. When
    BASE finds a plug-in that supports import of a certain type of
    item, an &gbImport; button is displayed in the toolbar on either
    the list view or the single-item view.
  </para>
  <note>
    <title>No "Import" button?</title>
    <para>
      If the import button is missing from a page where you would expect
      to find it, this usually means that:
    </para>
    <itemizedlist>
      <listitem>
        <simpara>
          The logged in user does not have permission to use the plug-in.
        </simpara>
      </listitem>
      <listitem>
        <simpara>
          The plug-in requires a configuration, but none has been
          created or the logged in user does not have permission to
          use any of the existing configurations.
        </simpara>
      </listitem>
    </itemizedlist>
    <para>
      Contact the server administrator or another user who has permission to
      administrate the plug-ins.
    </para>
  </note>

  <sect1 id="import_data.import">
    <title>General import procedure</title>

    <para>
      A data import is started through a wizard-like interface. There
      are a number of steps you have to go through:
    </para>

    <orderedlist>
      <listitem>
        <simpara>
          Select a plug-in and file format to use, or use the
          <emphasis>auto detect</emphasis> option.
        </simpara>
      </listitem>
      <listitem>
        <simpara>
          If you selected the auto detection function, you must select
          a file to use.
        </simpara>
      </listitem>
      <listitem>
        <simpara>
          Specify plug-in parameters.
        </simpara>
      </listitem>
      <listitem>
        <simpara>
          Add the import job to the job queue.
        </simpara>
      </listitem>
      <listitem>
        <simpara>
          Wait for the job to finish.
        </simpara>
      </listitem>
    </orderedlist>

    <sect2 id="import_export_data.import.plugin_fileformat">
      <title>Select plug-in and file format</title>
      <para>
        Click on the &gbImport; button
        in the toolbar to start the import wizard. The first step is to
        select which plug-in and, if supported, which
        file format to use. There is also an <guilabel>auto detect</guilabel>
        option that lets you select a file and have BASE try to find a suitable
        plug-in/file format to use.
      </para>

      <figure id="import_export_data.figures.select_import_plugin">
        <title>Select plug-in and file format</title>
        <screenshot>
          <mediaobject>
            <imageobject><imagedata fileref="figures/select_import_plugin.png" format="PNG" /></imageobject>
          </mediaobject>
        </screenshot>
      </figure>


      <helptext external_id="import.selectplugin"
        title="Select plug-in and file format for data import">

        <variablelist>
          <varlistentry>
            <term><guilabel>Plugin + file format</guilabel></term>
            <listitem>
              <para>
                This is a combined list of plug-ins and their
                respective file format configurations. The list only
                includes combinations that
                the logged in user has permission to use. If you select
                an entry, a short description of the plug-in and configuration
                is displayed
                below the list. More information about the plug-ins can
                be found under the menu choices
                <menuchoice>
                  <guimenu>Administrate</guimenu>
                  <guimenuitem>Plug-ins &amp; extensions</guimenuitem>
                  <guisubmenu>Plug-in definitions</guisubmenu>
                </menuchoice>
                and
                <menuchoice>
                  <guimenu>Administrate</guimenu>
                  <guimenuitem>Plug-ins &amp; extensions</guimenuitem>
                  <guisubmenu>Plug-in configuration</guisubmenu>
                </menuchoice>.
              </para>
              <note>
                <title>File format vs. Configuration</title>
                <simpara>
                  A file format is the same thing as a plug-in configuration.
                  It may be confusing that the interface sometimes uses
                  <emphasis>file format</emphasis> and sometimes uses
                  <emphasis>configuration</emphasis>, but for now, we'll have
                  to live with it.
                </simpara>
              </note>
            </listitem>
          </varlistentry>
        </variablelist>

        <para>
          Proceed to the next step by clicking on the
          &gbNext; button.
        </para>

        <seeother>
          <other external_id="import.autodetect">The auto detect function</other>
        </seeother>
      </helptext>

      <sect3 id="import_export_data.import.plugin_fileformat.autodetect">
        <title>The auto detect function</title>

        <helptext
          external_id="import.autodetect"
          title="The auto detect function">

          <para>
            The auto detect function lets you select a file and have
            BASE try to find a suitable plug-in and file format. This option is
            selected by default in the combined plug-in and file format list when there is
            at least one plug-in that supports auto detection.
          </para>
          <note>
            <title>Support of auto detect</title>
            <para>
              Not all plug-ins support auto detection. The ones that do are marked in
              the list with <guilabel>×</guilabel>.
            </para>
          </note>

          <para>
            Select the <guilabel>auto detect (all)</guilabel> option to search for a file format
            in all plug-ins that support the feature, or select the <guilabel>auto detect (plugin)</guilabel>
            option to only search the file formats for a specific plug-in.
            Continue to the next step by clicking on the &gbNext; button.
          </para>

          <seeother>
            <other external_id="import.selectplugin">Select plug-in and file format for data import</other>
            <other external_id="import.autodetect.selectfile">Select file for auto detection</other>
          </seeother>

        </helptext>

        <para>
          You must now select a file to import from.
        </para>

        <figure id="import_export_data.figures.select_autodetect_file">
          <title>Select file for auto detection</title>
          <screenshot>
            <mediaobject>
              <imageobject><imagedata fileref="figures/select_autodetect_file.png" format="PNG" /></imageobject>
            </mediaobject>
          </screenshot>
        </figure>

        <helptext external_id="import.autodetect.selectfile"
          title="Select file for auto detection">

          <variablelist>
            <varlistentry>
              <term><guilabel>Plugin</guilabel></term>
              <listitem>
                <para>
                  Displays the selected plug-in or <guilabel>all</guilabel> if
                  auto detection is used on all supporting plug-ins.
                </para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term><guilabel>File</guilabel></term>
              <listitem>
                <para>
                  Enter the path and file name of the
                  file you want to use. Use the <guibutton>Browse…</guibutton>
                  button to browse for the file in BASE's file system.
                  If the file does not exist in the file system you have the option
                  to upload it.
                  <nohelp>Read more about this in <xref linkend="file_system" />.</nohelp>
                </para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term><guilabel>Character set</guilabel></term>
              <listitem>
                <para>
                  The character set used in text files. If the selected file has been configured
                  with a character set, the correct option is automatically selected. In all
                  cases, you have the option to override the default selection. Most files
                  use either the UTF-8 or the ISO-8859-1 character set.
                </para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term><guilabel>Recently used</guilabel></term>
              <listitem>
                <para>
                  A list of files you have recently used
                  for auto detection.
                </para>
              </listitem>
            </varlistentry>
          </variablelist>

          <para>
            Click on the &gbNext; button
            to start the auto detection. There are three possible outcomes:
          </para>

          <itemizedlist>
            <listitem>
              <para>
                Exactly one matching plug-in and file format is found. The next step is
                to configure any additional parameters needed
                by the plug-in. This is the same step as if you had selected
                the same plug-in and file format in the first step.
              </para>
            </listitem>
            <listitem>
              <para>
                If no matching plug-in and file format is found, an error message
                is displayed. If you are logged in with sufficient permissions, there
                is an option to create a new file format/configuration.
              </para>
            </listitem>
            <listitem>
              <para>
                If multiple matching plug-ins and file formats are found,
                you will be taken back to the first step. This time
                the list will only include the matching plug-ins/file formats
                and the auto detect option is not present.
              </para>
            </listitem>
          </itemizedlist>

          <seeother>
            <other external_id="import.selectplugin">Select plug-in and file format for data import</other>
            <other external_id="import.autodetect">The auto detect function</other>
          </seeother>

        </helptext>

      </sect3>

    </sect2>

    <sect2 id="import_export_data.import.pluginparameters">
      <title>Specify plug-in parameters</title>
      <para>
        When you have selected a plug-in and file format, or used
        the auto detect function to find one, a form where
        you can enter additional parameters for the plug-in is displayed.
      </para>

      <figure id="import_export_data.figures.configure_plugin">
        <title>Specify plug-in parameters</title>
        <screenshot>
          <mediaobject>
            <imageobject>
              <imagedata
                scalefit="1" width="100%"
                fileref="figures/plugin_parameters.png" format="PNG" />
            </imageobject>
          </mediaobject>
        </screenshot>
      </figure>

      <helptext external_id="runplugin.configure.import"
        title="Specify plug-in parameters">
        <para>
          &runplugin.configure.common;
        </para>

        <para>
          The parameter list varies considerably from plug-in to plug-in.
          Common parameters for import plug-ins are:
        </para>

        <variablelist>
          <varlistentry>
            <term><guilabel>File</guilabel></term>
            <listitem>
              <para>
                The file to import data from. A value is already set if
                you used the auto detect function.
              </para>
            </listitem>
          </varlistentry>

          <varlistentry>
            <term><guilabel>File parser regular expressions</guilabel></term>
            <listitem>
              <para>
                Various regular expressions that are used when parsing the file
                to ensure that the data is found. In most cases, all values
                are taken from the matched configuration and can be left as is.
              </para>
            </listitem>
          </varlistentry>

          <varlistentry>
            <term><guilabel>Error handling</guilabel></term>
            <listitem>
              <para>
                A section that contains options for how to
                handle errors when parsing the file. Normally you can
                select if the import should fail as a whole or if
                only the line with the error should be skipped.
              </para>
            </listitem>
          </varlistentry>
        </variablelist>

        <para>
          Continue to the next step by clicking the
          &gbNext; button.
        </para>

        <seeother>
          <other external_id="runplugin.configure">The plug-in configuration wizard</other>
        </seeother>
      </helptext>
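
      <para>
        As an illustration of the <guilabel>File parser regular expressions</guilabel>
        parameter, the values for a tab-separated file might look something like the
        listing below. The labels and patterns are hypothetical and are not taken from
        any configuration shipped with BASE; the actual parameter names and values
        depend on the plug-in and on the configuration in use.
      </para>
      <programlisting>
Data header line:  Name\tExternal ID\tDescription
Column separator:  \t
Data footer line:  END OF DATA
</programlisting>
      <para>
        With values along these lines, the parser would treat the line matching the
        header pattern as the start of the data block, split each following line at
        tab characters, and stop reading data at the footer line.
      </para>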

    </sect2>

    <sect2 id="import_export_data.import.jobqueue">
      <title>Add the import job to the job queue</title>

      <figure id="import_export_data.figures.finish_job">
        <title>Job name and options</title>
        <screenshot>
          <mediaobject>
            <imageobject>
              <imagedata
                fileref="figures/finish_job.png" format="PNG" />
            </imageobject>
          </mediaobject>
        </screenshot>
      </figure>

      <helptext external_id="runplugin.finshjob"
        title="Set job name and options">
        <para>
          Fill in information about the job in this window, such as its name and
          description. The name is required and must be a valid string. There
          are also two check boxes on this page.
          <variablelist>
            <varlistentry>
              <term>
                <guilabel>Name</guilabel>
              </term>
              <listitem>
                <para>
                  Most plug-ins should suggest a name for the job, but you can change it if
                  you want to.
                </para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term>
                <guilabel>Use job agent</guilabel>
              </term>
              <listitem>
                <para>
                  This option is only available if the BASE system has been configured with
                  job agents and the logged in user has <constant>SELECT_JOBAGENT</constant>
                  permission. Select the <guilabel>automatic</guilabel> option to let
                  BASE automatically select a job agent, or select a specific option
                  to force the use of that particular job agent.
                </para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term>
                <guilabel>Send message</guilabel>
              </term>
              <listitem>
                <para>
                  Tick this check box if the job should send you a message when it is
                  finished, otherwise untick it.
                </para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term>
                <guilabel>Remove job</guilabel>
              </term>
              <listitem>
                <para>
                  If this check box is ticked, the job will be marked as removed when
                  it is finished, provided that it finished successfully. This
                  is only available for import and export plug-ins.
                </para>
              </listitem>
            </varlistentry>
          </variablelist>
        </para>
        <para>
          Clicking on
          &gbFinish;
          when everything is set will end the job configuration and place the job in the job queue.
          A self-refreshing window appears with information about the
          job's status and execution time. How long it takes before the job starts to run
          depends on its priority relative to the other jobs in the queue. The job does not
          depend on the status window to be able to run and the window can be
          closed without interrupting the execution.
        </para>
        <tip>
          <title>View job status</title>
          <para>
            A job's status can be viewed at any time by opening it from the job list page,
            <menuchoice>
              <guimenuitem>View</guimenuitem>
              <guimenuitem>Jobs</guimenuitem>
            </menuchoice>.
          </para>
        </tip>
      </helptext>
    </sect2>

  </sect1>

  <sect1 id="import_data.batch">
    <title>Batch import of data</title>

    <para>
      In general, there are several ways to import data into
      BASE. Bulk data such as reporter information and raw data
      imports are handled by plug-ins created for these tasks. For
      item types that are imported in more moderate quantities, a
      suite of batch item importers is available
      (<xref linkend="coreplugins.import.batch" />). These importers
      allow the user to create new items in BASE and define item
      properties and associations between items using tab-separated
      (or equivalent) files.
    </para>

    <para>
      The batch importers are available for most users and they may
      have been pre-configured, but there is no requirement to
      configure the batch importer plug-ins. Here we assume that no
      plug-in configuration exists for the batch
      importers. Pre-configuration of the importers is really only
      needed for facilities that perform the same imports regularly,
      whereas for occasional use the provided wizard is
      sufficient. Configuring the importers follows the route
      described in <xref linkend="plugins.configuration" />.
    </para>

    <para>
      The batch importers either create new items or update
      already existing items. In either mode the plug-in can set
      values for
      <itemizedlist>
        <listitem>
          <para>
            Simple properties, <emphasis>e.g.</emphasis>, string
            values, numeric values, dates, etc.
          </para>
        </listitem>
        <listitem>
          <para>
            Single-item references, <emphasis>e.g.</emphasis>,
            protocol, label, software, owner, etc.
          </para>
        </listitem>
        <listitem>
          <para>
            Multi-item references, which are references to several other
            items of the same type. The extracts of a
            physical bioassay or pooled samples are two examples of
            items that refer to several other items; a physical bioassay
            may contain several extracts and a sample may be
            a pool of several samples. In some cases a multi-item
            reference is bundled with simple
            values, <emphasis>e.g.</emphasis>, used quantity of a
            source biomaterial, the position an extract is
            used on, etc. Multi-item references are never removed by
            the importer, only added or updated. Removing an item
            from a multi-item reference is a manual procedure to be
            done using the web interface.
          </para>
        </listitem>
      </itemizedlist>
      The batch importers do not set values for annotations since
      this is handled by the annotation importer
      plug-in (<xref linkend="annotations.massimport" />). However,
      the annotation importer and batch item importers have similar
      behaviour and functionality to minimize the learning cost for
      users.
    </para>

    <para>
      The importer only works with one type of item at each use and can be
      used in a <emphasis>dry-run</emphasis> mode where everything
      is performed as if a real import is taking place, but the work
      (transaction) is not committed to the database. The result of
      the test can be stored to a log file and the user can examine
      the output to see how an actual import would perform. Summary
      results such as the number of items imported and the number of
      failed items are reported after the import is finished, and in
      the case of non-recoverable failure the reason is reported.
    </para>

    <sect2 id="import_data.batch.fileformat">
      <title>File format</title>

      <para>
        For proper and efficient use of the batch importers, users
        need to understand how the files to be imported should be
        formatted. The input file must be organised into columns separated by a
        specified character such as a tab or comma character. The
        data header line contains the column headers, which define
        the contents of each column, and marks the beginning of
        item data in the file. The item data block continues until
        the end of the file or to an optional data footer line
        defining the end of the data block.
      </para>
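
      <para>
        A minimal sketch of such a file, here using a comma as the column
        separator, could look as follows. The column headers, the values and the
        footer text are made up for illustration only; how the lines around the
        data block are recognised depends on the parser settings in use.
      </para>
      <programlisting>
Name,External ID,Description
Sample 1,S-001,Liver biopsy
Sample 2,S-002,Kidney biopsy
END OF DATA
</programlisting>
      <para>
        In this sketch the first line is the data header line, the two following lines
        hold item data, and the last line acts as the optional data footer.
      </para>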

      <para>
        When reading data for an item the plug-in must use some
        information for identifying items. Depending on item type
        there are two or three options to select the item identifier:
        <itemizedlist>
          <listitem>
            <para>
              Using the internal <property>id</property>. This is
              always unique for a specific BASE server.
            </para>
          </listitem>
          <listitem>
            <para>
              Using the <property>name</property>. This may or may
              not be unique.
            </para>
          </listitem>
          <listitem>
            <para>
              Some items have
              an <property>externalId</property>. This may or may
              not be unique.
            </para>
          </listitem>
          <listitem>
            <para>
              Array slides may have a <property>barcode</property>
              which is similar to
              the <property>externalId</property>.
            </para>
          </listitem>
        </itemizedlist>
        It is important that the identifier selected
        is <emphasis>unique</emphasis> in the file used. If the
        file is used to update items already existing in BASE, the
        identifier should also be unique in BASE for the user
        performing the update. The plug-in will check uniqueness
        when default parameters are used, but the user may change the
        default behaviour.
      </para>

      <para>
        Data for a single item may be split into multiple lines. The
        first line contains simple properties and single-item
        references, and the first multi-item reference. If there are
        more multi-item references they should be on the following
        lines with empty values in all other columns, except for the
        column holding the item identifier. The item identifier must
        have the same value on all lines associated with the
        item. If the following lines contain data other than multi-item references,
        that data is either ignored or treated as an error, depending
        on plug-in parameter settings. The reason for treating
        duplicated data entries as an error is to catch situations where
        two items are given the same item identifier by accident.
      </para>
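
      <para>
        The multi-line layout described above can be sketched as follows, assuming a
        comma-separated file in which the <property>name</property> is used as the item
        identifier. All column headers and values are hypothetical.
      </para>
      <programlisting>
Name,Description,Protocol,Parent,Used quantity
Pool A,Pooled liver samples,Pooling,Sample 1,2.5
Pool A,,,Sample 2,2.5
Pool A,,,Sample 3,1.0
Sample X,Single extract source,Extraction,Biosource 7,5.0
</programlisting>
      <para>
        Here <literal>Pool A</literal> spans three lines. The first line carries the
        simple properties, the single-item reference and the first parent, while the two
        following lines repeat only the identifier and add one more parent each,
        together with the quantity used.
      </para>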
---|
674 | |
---|
675 | </sect2> |
---|
676 | |
---|
677 | <sect2 id="import_data.batch.running"> |
---|
678 | <title>Running the item batch importer</title> |
---|
679 | |
---|
680 | <para> |
---|
681 | This section discuss specific parameters and features of the |
---|
682 | batch importers. The general use of the batch importers |
---|
683 | follow the description outlined in |
---|
684 | <xref linkend="import_data.import" /> and the setting of |
---|
685 | column mapping parameters is assisted with |
---|
686 | the <guilabel>Test with file</guilabel> function described |
---|
687 | in <xref linkend="plugins.configuration.testwithfile" |
---|
688 | />. The column headers are mapped to item properties at each |
---|
689 | use of the plug-in but, as pointed out above, they can also |
---|
690 | be predefined by saving settings as a plug-in |
---|
691 | configuration. The configuration also includes separator |
---|
692 | character and other information that is needed to parse |
---|
693 | files. The ability to save configurations depends on user |
---|
694 | credential and is by default only granted to administrators. |
---|
695 | </para> |
---|
696 | |
---|
697 | <para> |
---|
698 | The plug-in parameter follows the standard BASE plug-in |
---|
699 | layout and shows help information for selected |
---|
700 | parameters. The list below comments on some of the |
---|
701 | parameters available. |
---|
702 | </para> |
---|
703 | |
---|
704 | <variablelist> |
---|
705 | <varlistentry> |
---|
706 | <term> |
---|
707 | <guilabel>Mode</guilabel> |
---|
708 | </term> |
---|
709 | <listitem> |
---|
710 | <para> |
---|
711 | Select the mode of the plug-in. The plug-in can |
---|
712 | create new items and/or update items already |
---|
713 | existing in BASE. This setting is available to allow |
---|
714 | the user to make a conscious choice of how to treat |
---|
715 | missing or already existing items. For example, if |
---|
716 | the user selects to only update items already |
---|
717 | existing the plug-in will complain if an item in the |
---|
718 | file does not exist in BASE (using default error |
---|
719 | condition treatment). This adds an extra layer of |
---|
720 | security and diagnostics for the user during import. |
---|
721 | </para> |
---|
722 | </listitem> |
---|
723 | </varlistentry> |
---|
724 | <varlistentry> |
---|
725 | <term> |
---|
726 | <guilabel>Data directory</guilabel> |
---|
727 | </term> |
---|
728 | <listitem> |
---|
729 | <para> |
---|
730 | This option is only available for items that has support for |
---|
731 | attaching files (eg. array design, derived bioassay, etc.). |
---|
732 | This setting is used to resolve file references that doesn't |
---|
733 | include a complete absolute path. |
---|
734 | </para> |
---|
735 | </listitem> |
---|
736 | </varlistentry> |
---|
737 | <varlistentry> |
---|
738 | <term> |
---|
739 | <guilabel>Identification method</guilabel> |
---|
740 | </term> |
---|
741 | <listitem> |
---|
742 | <para> |
---|
743 |               This parameter defines the method used to find
---|
744 |               already existing items. The parameter can only be
---|
745 |               set to one of the item properties listed in the
---|
746 |               plug-in parameter dialog. The property selected by
---|
747 |               the user must be mapped to a column in the file. If
---|
748 |               it is not mapped, the plug-in has no way to
---|
749 |               determine whether an item already exists.
---|
750 | </para> |
---|
751 | </listitem> |
---|
752 | </varlistentry> |
---|
753 | <varlistentry> |
---|
754 | <term> |
---|
755 | <guilabel>Item subtypes</guilabel> |
---|
756 | </term> |
---|
757 | <listitem> |
---|
758 | <para> |
---|
759 |               Only look for existing items among the selected subtypes. If no subtype
---|
760 |               is selected, all items are searched. If exactly one subtype is selected,
---|
761 |               new items are automatically created with this subtype (unless it is overridden
---|
762 |               by specific subtype values in the import file).
---|
763 | </para> |
---|
764 | </listitem> |
---|
765 | </varlistentry> |
---|
766 | <varlistentry> |
---|
767 | <term> |
---|
768 | <guilabel>Owned by me</guilabel>, <guilabel>Shared to |
---|
769 | me</guilabel>, <guilabel>In current |
---|
770 | project</guilabel>, and <guilabel>Owned by |
---|
771 | others</guilabel> |
---|
772 | </term> |
---|
773 | <listitem> |
---|
774 | <para> |
---|
775 |               Defines the set of items the plug-in should look in
---|
776 |               when it checks whether an item already exists. The
---|
777 |               options are the same as those available in list
---|
778 |               views and the actual set of parameters depends on
---|
779 |               user credentials.
---|
780 | </para> |
---|
781 | <para> |
---|
782 | When <property>id</property> is used as |
---|
783 | the <guilabel>Identification method</guilabel>, the |
---|
784 |               plug-in looks for the item irrespective of the settings
---|
785 |               of these parameters. Of course, the user must still
---|
786 | have proper access to the item referenced. |
---|
787 | </para> |
---|
788 | </listitem> |
---|
789 | </varlistentry> |
---|
790 | <varlistentry> |
---|
791 | <term> |
---|
792 | <guilabel>Column mapping expressions</guilabel> |
---|
793 | </term> |
---|
794 | <listitem> |
---|
795 | <para> |
---|
796 | Use the <guilabel>Test with file</guilabel> function |
---|
797 | described in |
---|
798 | <xref linkend="plugins.configuration.testwithfile" |
---|
799 | /> to set the column mapping parameters. |
---|
800 | </para> |
---|
801 | <para> |
---|
802 | When working with biomaterial items, the |
---|
803 | <guilabel>Parent type</guilabel> property is used to |
---|
804 | tell the plug-in how to find parent items. This only |
---|
805 | has to be set if the parent item is of the same type |
---|
806 |               as the biomaterial being imported, since the default
---|
807 |               is to look for the nearest parent type in the predefined hierarchy.
---|
808 |               From parent to child, the BASE ordering
---|
809 | of <emphasis>parent - child - grandchild - |
---|
810 | ...</emphasis> item relation is <emphasis>biosource |
---|
811 | - sample - extract</emphasis>. |
---|
812 | </para> |
---|
813 | <para> |
---|
814 | The values accepted for <guilabel>Parent type</guilabel> |
---|
815 | are <constant>BIOSOURCE</constant>, |
---|
816 | <constant>SAMPLE</constant> or <constant>EXTRACT</constant>. |
---|
817 | Sometimes all items in a file to be imported have the same parent |
---|
818 | type but there is no column with this information. This can |
---|
819 | be resolved by setting |
---|
820 | the <guilabel>Parent type</guilabel> mapping to a |
---|
821 |               constant string (i.e. a string without any backslash '\' character).
---|
822 | </para> |
---|
823 | </listitem> |
---|
824 | </varlistentry> |
---|
825 | <varlistentry> |
---|
826 | <term> |
---|
827 | <guilabel>Permissions</guilabel> |
---|
828 | </term> |
---|
829 | <listitem> |
---|
830 | <para> |
---|
831 | This is a column mapping that can be used to update the permissions |
---|
832 | set on items. Normally, new items are only shared to the active project |
---|
833 | (if any). By naming a permission template, new items are shared using |
---|
834 | the permissions from that template instead. Permissions on already existing |
---|
835 |               items are merged with the permissions from the template.
---|
836 | </para> |
---|
837 | </listitem> |
---|
838 | </varlistentry> |
---|
839 | </variablelist> |
---|
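      <para>
        As an illustration of the column mapping expressions
        discussed in the list above, the following sketch shows two
        mappings for a sample import. It is only an example: the
        column heading <constant>Name</constant> is hypothetical,
        and the sketch assumes that a column in the file is
        referenced by enclosing its heading in backslash characters.
        The first mapping reads the value from a column in the file,
        while the second is a constant string that gives every
        imported item the same parent type.
      </para>
      <programlisting>
Name:         \Name\
Parent type:  SAMPLE
      </programlisting>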
840 | |
---|
841 | <para> |
---|
842 | After setting the parameters, |
---|
843 | select <guilabel>Next</guilabel>. Another parameter dialog |
---|
844 |         will appear where error handling options can be set along
---|
845 |         with the following parameters:
---|
846 | </para> |
---|
847 | |
---|
848 | <variablelist> |
---|
849 | <varlistentry> |
---|
850 | <term> |
---|
851 | <guilabel>Log file</guilabel> |
---|
852 | </term> |
---|
853 | <listitem> |
---|
854 | <para> |
---|
855 | Setting this parameter will turn on logging. The |
---|
856 | plug-in will give detailed information about how the |
---|
857 | file is parsed. This is useful for resolving file |
---|
858 | parsing issues. |
---|
859 | </para> |
---|
860 | </listitem> |
---|
861 | </varlistentry> |
---|
862 | <varlistentry> |
---|
863 | <term> |
---|
864 | <guilabel>Dry run</guilabel> |
---|
865 | </term> |
---|
866 | <listitem> |
---|
867 | <para> |
---|
868 |               Enable or disable a test run of the plug-in. If
---|
869 |               enabled, the plug-in will parse the file and simulate
---|
870 |               an import. When enabling this option you should also
---|
871 |               set the <guilabel>Log file</guilabel>. The dry run
---|
872 | mode allows testing of large imports and updates by |
---|
873 | creating a log file that can be examined for |
---|
874 | inconsistencies before actually performing the action |
---|
875 | without a safety net. |
---|
876 | </para> |
---|
877 | </listitem> |
---|
878 | </varlistentry> |
---|
879 | </variablelist> |
---|
880 | |
---|
881 | |
---|
882 | <para> |
---|
883 | During file parsing the plug-in will look for items |
---|
884 |         referenced on each line. There are three possible outcomes
---|
885 |         of this item search:
---|
886 | </para> |
---|
887 | |
---|
888 | <itemizedlist> |
---|
889 | <listitem> |
---|
890 | <para> |
---|
891 |             No item is found. Depending on parameter settings,
---|
892 |             the plug-in may abort, ignore the line, or create
---|
893 |             a new item.
---|
894 | </para> |
---|
895 | </listitem> |
---|
896 | <listitem> |
---|
897 | <para> |
---|
898 | One item is found. This is the item that is going to |
---|
899 | be updated. |
---|
900 | </para> |
---|
901 | </listitem> |
---|
902 | <listitem> |
---|
903 | <para> |
---|
904 |             More than one item is found. Depending on parameter
---|
905 |             settings, the plug-in may abort or
---|
906 |             ignore the line.
---|
907 | </para> |
---|
908 | </listitem> |
---|
909 | </itemizedlist> |
---|
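      <para>
        The three outcomes above can be summarised as a small
        decision function. The following Python sketch is only an
        illustration and is not part of BASE: the names
        <constant>Action</constant>, <constant>decide</constant>,
        <constant>on_not_found</constant> and
        <constant>on_multiple</constant> are hypothetical, and the
        default values chosen here are assumptions rather than the
        defaults of the plug-in.
      </para>
      <programlisting>
```python
from enum import Enum


class Action(Enum):
    """Hypothetical names for what the importer does with one line."""
    ABORT = "abort"
    SKIP_LINE = "skip line"
    CREATE = "create"
    UPDATE = "update"


def decide(matches, on_not_found=Action.CREATE, on_multiple=Action.ABORT):
    """Pick an action from the number of existing items matching a line.

    on_not_found and on_multiple stand in for the error handling
    options of the parameter dialog; the names are assumptions.
    """
    if on_not_found not in (Action.ABORT, Action.SKIP_LINE, Action.CREATE):
        raise ValueError("invalid choice for on_not_found")
    if on_multiple not in (Action.ABORT, Action.SKIP_LINE):
        raise ValueError("invalid choice for on_multiple")
    if len(matches) == 0:
        return on_not_found   # no item found
    if len(matches) == 1:
        return Action.UPDATE  # exactly one item found: update it
    return on_multiple        # ambiguous: more than one item found


print(decide([]).value)
print(decide(["Sample A"]).value)
print(decide(["Sample A", "Sample A"], on_multiple=Action.SKIP_LINE).value)
```
      </programlisting>
      <para>
        Note that creating a new item is only a valid choice when no
        item is found. When several items match, the sketch offers
        only aborting or ignoring the line, which mirrors the list
        above.
      </para>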
910 | |
---|
911 | </sect2> |
---|
912 | |
---|
913 | <sect2 id="import_data.batch.comments"> |
---|
914 | <title>Comments on the item batch importers</title> |
---|
915 | |
---|
916 | <para> |
---|
917 | The item batch importers are not designed to change or |
---|
918 | create annotations. There is another plug-in for this, see |
---|
919 | <xref linkend="annotations.massimport" /> for an |
---|
920 | introduction to the annotation importer. |
---|
921 | </para> |
---|
922 | |
---|
923 | <para> |
---|
924 | There is no need to map all columns when running the |
---|
925 | importer. When new items are created usually the only |
---|
926 | mandatory entry is <property>Name</property>, and when |
---|
927 | running the plug-in in update mode only the column defining |
---|
928 | the item identification property needs to be defined. This |
---|
929 |         can be utilized when only one or a few properties need to
---|
930 |         be updated; map only columns that should be changed and the
---|
931 |         plug-in will ignore the other properties and leave them as
---|
932 |         they are already stored in BASE. This also means that if a
---|
933 |         property should be deleted, that property must be mapped
---|
934 |         and the value must be empty in the file. Note that multi-item
---|
935 |         references cannot be deleted with the batch importer;
---|
936 |         deletion of multi-item references must be done using the web
---|
937 |         interface.
---|
938 | </para> |
---|
939 | |
---|
940 | <para> |
---|
941 | When parent and other relations are created using the |
---|
942 | plug-in the referenced items are properly linked and |
---|
943 |         updated. This means that when a quantity that decreases a
---|
944 |         referenced item is used, the referenced item is updated
---|
945 |         accordingly. Consequently, if the relation is removed in a
---|
946 |         later update (perhaps the wrong parent was referenced), the
---|
947 |         referenced item is restored and any decrease in quantity
---|
948 |         is also reset.
---|
949 | </para> |
---|
950 | |
---|
951 | <para> |
---|
952 |         A common mistake is to forget to make sure that all
---|
953 |         referenced items already exist in BASE and are
---|
954 |         accessible to the user performing the import. Items such as
---|
955 | protocols and labels must be added before referencing |
---|
956 | them. This is of course also true for other items but during |
---|
957 | batch import one usually follows the natural order of first |
---|
958 | importing biosources, samples, extracts, and so on. In this |
---|
959 | way the parents are always present and may be referenced |
---|
960 | without any issues. |
---|
961 | </para> |
---|
962 | |
---|
963 | </sect2> |
---|
964 | |
---|
965 | </sect1> |
---|
966 | |
---|
967 | </chapter> |
---|