Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation

1

Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, ’t Hoen PAC, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone S-A, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016; 3:160018. https://doi.org/10.1038/sdata.2016.18.

2

Jacobsen A, de Miranda Azevedo R, Juty N, Batista D, Coles S, Cornet R, Courtot M, Crosas M, Dumontier M, Evelo CT, Goble C, Guizzardi G, Hansen KK, Hasnain A, Hettne K, Heringa J, Hooft RWW, Imming M, Jeffery KG, Kaliyaperumal R, Kersloot MG, Kirkpatrick CR, Kuhn T, Labastida I, Magagna B, McQuilton P, Meyers N, Montesanti A, van Reisen M, Rocca-Serra P, Pergl R, Sansone S-A, da Silva Santos LOB, Schneider J, Strawn G, Thompson M, Waagmeester A, Weigel T, Wilkinson MD, Willighagen EL, Wittenburg P, Roos M, Mons B, Schultes E. FAIR principles: Interpretations and implementation considerations. Data Intell. 2020; 2(1-2):10–29. https://doi.org/10.1162/dint_r_00024.

3

Yu F, Zhou B, Lu T, Gu N. Research on data provenance model for multidisciplinary collaboration. In: Computer Supported Cooperative Work and Social Computing. Singapore: Springer: 2018. p. 32–49. https://doi.org/10.1007/978-981-13-3044-5_3.

4

Moreau L, Groth P. Provenance: An introduction to PROV. Synth Lect Semant Web Theory Technol. 2013; 3(4):1–129. https://doi.org/10.2200/s00528ed1v01y201308wbe007.

5

Belhajjame K, B’Far R, Cheney J, Coppens S, Cresswell S, Gil Y, Groth P, Klyne G, Lebo T, McCusker J, Miles S, Myers J, Sahoo S, Tilmes C. PROV-DM: The PROV data model. Project report, World Wide Web Consortium. 2013. https://www.w3.org/TR/2013/REC-prov-dm-20130430/.

6

Hogan A, Blomqvist E, Cochez M, d’Amato C, de Melo G, Gutierrez C, Gayo JEL, Kirrane S, Neumaier S, Polleres A, Navigli R, Ngomo A-CN, Rashid SM, Rula A, Schmelzeisen L, Sequeda J, Staab S, Zimmermann A. Knowledge Graphs. 2020. http://arxiv.org/abs/2003.02320.

7

Staehlke S, Koertge A, Nebe B. Intracellular calcium dynamics dependent on defined microtopographical features of titanium. Biomaterials. 2015; 46:48–57. https://doi.org/10.1016/j.biomaterials.2014.12.016.

8

Schröder M, Staehlke S, Nebe B, Krüger F. Towards in-situ knowledge acquisition for research data provenance from electronic lab notebooks. In: Proceedings of the 1st Workshop on Research Data Management for Linked Open Science (DaMaLOS) Co-located with 19th International Semantic Web Conference. Athens: 2020. https://doi.org/10.4126/FRL01-006423288.

9

Bar Y. Calcium imaging v1. ZappyLab, Inc. 2018. https://doi.org/10.17504/protocols.io.tqfemtn.

10

CARPi N, Minges A, Piel M. eLabFTW: An open source laboratory notebook for research labs. J Open Source Softw. 2017; 2(12):146. https://doi.org/10.21105/joss.00146.

11

Ram S, Liu J. Understanding the semantics of data provenance to support active conceptual modeling. In: Lecture Notes in Computer Science. Berlin: Springer: 2007. p. 17–29. https://doi.org/10.1007/978-3-540-77503-4_3.

12

Herschel M, Diestelkämper R, Lahmar HB. A survey on provenance: What for? what form? what from? VLDB J. 2017; 26(6):881–906. https://doi.org/10.1007/s00778-017-0486-1.

13

Lim C, Lu S, Chebotko A, Fotouhi F. Prospective and retrospective provenance collection in scientific workflow environments. In: 2010 IEEE International Conference on Services Computing. IEEE: 2010. https://doi.org/10.1109/scc.2010.18.

14

Belhajjame K, Wolstencroft K, Corcho O, Oinn T, Tanoh F, William A, Goble C. Metadata management in the Taverna workflow system. In: 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID). IEEE: 2008. https://doi.org/10.1109/ccgrid.2008.17.

15

Altintas I, Barney O, Jaeger-Frank E. Provenance collection support in the Kepler scientific workflow system. In: Provenance and Annotation of Data. Berlin: Springer: 2006. p. 118–32. https://doi.org/10.1007/11890850_14.

16

Goecks J, Nekrutenko A, Taylor J, Team TG. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010; 11(8):86. https://doi.org/10.1186/gb-2010-11-8-r86.

17

Samuel S, König-Ries B. Provbook: Provenance-based semantic enrichment of interactive notebooks for reproducibility. In: International Semantic Web Conference (P&D/Industry/BlueSky): 2018. http://ceur-ws.org/Vol-2180/paper-57.pdf.

18

Soldatova LN, Nadis D, King RD, Basu PS, Haddi E, Baumlé V, Saunders NJ, Marwan W, Rudkin BB. EXACT2: the semantics of biomedical protocols. BMC Bioinformatics. 2014; 15(S14). https://doi.org/10.1186/1471-2105-15-s14-s5.

19

Giraldo OL, Castro AG, Corcho O. SMART Protocols: SeMAntic RepresenTation for Experimental Protocols. In: LISC @ ISWC: 2014. p. 36–47. https://oa.upm.es/36778/1/INVE_MEM_2014_194359.pdf.

20

Hughes G, Mills H, Roure DD, Frey JG, Moreau L, m.c.schraefel, Smith G, Zaluska E. The semantic smart laboratory: a system for supporting the chemical eScientist. Org Biomol Chem. 2004; 2(22):3284. https://doi.org/10.1039/b410075a.

21

Moreau L, Batlajery BV, Huynh TD, Michaelides D, Packer H. A templating system to generate provenance. IEEE Trans Softw Eng. 2018; 44(2):103–21. https://doi.org/10.1109/tse.2017.2659745.

22

Curcin V, Fairweather E, Danger R, Corrigan D. Templates as a method for implementing data provenance in decision support systems. J Biomed Inform. 2017; 65:1–21. https://doi.org/10.1016/j.jbi.2016.10.022.

23

Vogt L, D’Souza J, Stocker M, Auer S. Toward representing research contributions in scholarly knowledge graphs using knowledge graph cells. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020. New York: ACM: 2020. https://doi.org/10.1145/3383583.3398530.

24

Samuel S, König-Ries B. The Semantic Web: ESWC 2017 Satellite Events In: Blomqvist E, Hose K, Paulheim H, Ławrynowicz A, Ciravegna F, Hartig O, editors. Cham: Springer: 2017. p. 17–20. https://doi.org/10.1007/978-3-319-70407-4_4.

25

Moreau L, Groth P, Cheney J, Lebo T, Miles S. The rationale of PROV. J Web Semant. 2015; 35:235–57. https://doi.org/10.1016/j.websem.2015.04.001.

26

Ciccarese P, Soiland-Reyes S, Belhajjame K, Gray AJ, Goble C, Clark T. PAV ontology: provenance, authoring and versioning. J Biomed Semant. 2013; 4(1):37. https://doi.org/10.1186/2041-1480-4-37.

27

Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone S-A, Scheuermann RH, Shah N, Whetzel PL, Lewis S. The OBO foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnol. 2007; 25(11):1251–5. https://doi.org/10.1038/nbt1346.

28

Smith B, Kumar A, Bittner T. Basic Formal Ontology for Bioinformatics: IFOMIS Reports; 2005.

29

Bartusch F, Hanussek M, Krüger J. Automatic generation of provenance metadata during execution of scientific workflows In: Atkinson M, Gesing S, editors. Proceedings of the 10th International Workshop on Science Gateways (IWSG 2018): 2018. http://ceur-ws.org/Vol-2357/paper8.pdf.

30

Murta L, Braganholo V, Chirigati F, Koop D, Freire J. noworkflow: Capturing and analyzing provenance of scripts In: Ludäscher B, Plale B, editors. Provenance and Annotation of Data and Processes. Cham: Springer: 2015. p. 71–83. https://doi.org/10.1007/978-3-319-16462-5_6.

31

Bose R, Frew J. Lineage retrieval for scientific data processing: A survey. ACM Comput Surv. 2005; 37(1):1–28. https://doi.org/10.1145/1057977.1057978.

32

Davidson SB, Freire J. Provenance and scientific workflows: Challenges and opportunities. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD ’08). New York: Association for Computing Machinery: 2008. p. 1345–1350. https://doi.org/10.1145/1376616.1376772.

33

Deelman E, Gannon D, Shields M, Taylor I. Workflows and e-science: An overview of workflow system features and capabilities. Futur Gener Comput Syst. 2009; 25(5):528–40. https://doi.org/10.1016/j.future.2008.06.012.

34

Budde K, Zimmermann J, Neuhaus E, Schroder M, Uhrmacher AM, van Rienen U. Requirements for documenting electrical cell stimulation experiments for replicability and numerical modeling. In: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE: 2019. https://doi.org/10.1109/embc.2019.8856863.

35

Rasmussen KB, Blank G. The data documentation initiative: a preservation standard for research. Arch Sci. 2007; 7(1):55–71. https://doi.org/10.1007/s10502-006-9036-0.

36

Weibel S, Kunze JA, Lagoze C, Wolf M. Dublin Core Metadata for Resource Discovery. RFC Editor. 1998. https://doi.org/10.17487/RFC2413.

37

Kunze JA, Baker T. The Dublin Core Metadata Element Set. RFC Editor. 2007. https://doi.org/10.17487/RFC5013.

38

Data catalog vocabulary (DCAT) - version 2. Project report, World Wide Web Consortium. 2020. https://www.w3.org/TR/2020/REC-vocab-dcat-2-20200204/.

39

Buchanan EM, Crain SE, Cunningham AL, Johnson HR, Stash H, Papadatou-Pastou M, Isager PM, Carlsson R, Aczel B. Getting started creating data dictionaries: How to create a shareable data set. Adv Methods Pract Psychol Sci. 2021; 4(1):251524592092800. https://doi.org/10.1177/2515245920928007.

40

Rashid SM, McCusker JP, Pinheiro P, Bax MP, Santos HO, Stingone JA, Das AK, McGuinness DL. The semantic data dictionary – an approach for describing and annotating data. Data Intell. 2020; 2(4):443–86. https://doi.org/10.1162/dint_a_00058.

41

Kunze JA, Littman J, Madden L, Scancella J, Adams C. The BagIt File Packaging Format (V1.0). RFC Editor. 2018. https://doi.org/10.17487/RFC8493.

42

Hankinson A, Brower D, Jefferies N, Metz R, Morley J, Warner S, Woods A. The Oxford Common File Layout: A common approach to digital preservation. Publications. 2019; 7(2):39. https://doi.org/10.3390/publications7020039.

43

Carragáin EO, Goble C, Sefton P, Soiland-Reyes S. A lightweight approach to research object data packaging. 2019. https://doi.org/10.5281/ZENODO.3250687.

44

Chard K, Gaffney N, Jones MB, Kowalik K, Ludascher B, McPhillips T, Nabrzyski J, Stodden V, Taylor I, Thelen T, Turk MJ, Willis C. Application of BagIt-serialized research object bundles for packaging and re-execution of computational analyses. In: 2019 15th International Conference on eScience (eScience). IEEE: 2019. https://doi.org/10.1109/escience.2019.00068.

45

Musen MA. The Protégé project: A look back and a look forward. AI Matters. Assoc Comput Mach Specif Interest Group Artif Intell. 2015;1(4). https://doi.org/10.1145/2557001.25757003.

46

Vrandečić D, Krötzsch M. Wikidata. Commun ACM. 2014; 57(10):78–85. https://doi.org/10.1145/2629489.

47

Heath T, Bizer C. Linked data: Evolving the web into a global data space. Synth Lect Semant Web Theory Technol. 2011; 1(1):1–136. https://doi.org/10.2200/s00334ed1v01y201102wbe001.

48

Gkoutos GV, Schofield PN, Hoehndorf R. The units ontology: a tool for integrating units of measurement in science. Database. 2012; 2012(0):033. https://doi.org/10.1093/database/bas033.

49

Gremse M, Chang A, Schomburg I, Grote A, Scheer M, Ebeling C, Schomburg D. The BRENDA tissue ontology (BTO): the first all-integrating ontology of all organisms for enzyme sources. Nucleic Acids Res. 2010; 39(Database):507–13. https://doi.org/10.1093/nar/gkq968.

50

Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, et al. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 2007; 36(Database):344–50. https://doi.org/10.1093/nar/gkm791.

51

Sarntivijai S, Lin Y, Xiang Z, Meehan TF, Diehl AD, et al. CLO: The cell line ontology. J Biomed Semant. 2014; 5(1):37. https://doi.org/10.1186/2041-1480-5-37.

52

Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, et al. The ontology for biomedical investigations. PLoS ONE. 2016; 11(4):0154556. https://doi.org/10.1371/journal.pone.0154556.

53

Kulkarni C, Xu W, Ritter A, Machiraju R. An annotated corpus for machine reading of instructions in wet lab protocols. In: Proc. of NAACL. ACL: 2018. https://doi.org/10.18653/v1/n18-2016.

54

Schröder M, LeBlanc H, Spors S, Krüger F. Intra-consortia data sharing platforms for interdisciplinary collaborative research projects. IT Inf Technol. 2020; 62(1):19–28. https://doi.org/10.1515/itit-2019-0039.

55

Mons B, Neylon C, Velterop J, Dumontier M, da Silva Santos LOB, Wilkinson MD. Cloudy, increasingly FAIR; revisiting the FAIR data guiding principles for the European Open Science Cloud. Inf Serv Use. 2017; 37(1):49–56. https://doi.org/10.3233/ISU-170824.

56

Staehlke S, Nebe JB. Research data of Calcium Imaging after electrical stimulation. Zenodo. 2021. https://doi.org/10.5281/ZENODO.4923173.
