Authors: Egon Willighagen and Iseult Lynch
Text mining for (nano)materials is far from trivial. The proprietary SpringerNature Nano database extracts information for many materials, but it is not easy to find materials of a certain size in it. Using identifiers in text, for example compact identifiers [1], supports text mining. In a small project, we have started mining the nanosafety literature for mentions of the JRC representative industrial nanomaterials, which have easily identifiable reference names, such as NM-100 and NM-101, that are used as identifiers in publications. The resulting associations between JRC nanomaterials and articles are stored in Wikidata, which we can then query to list all articles and to create a histogram of the number of articles using the JRC test nanomaterials by year (see Figure 1). To facilitate this search, each JRC nanomaterial has been given an ontology identifier or Internationalized Resource Identifier (IRI), as described in this article. For each material, we can then link out to Scholia to list the literature for that specific nanomaterial [2,3].
Figure 1: The increasing number of publications citing the JRC test nanomaterials. Because these nanomaterials have specific and widely used unique identifiers, the literature on the health and environmental implications of a given nanomaterial can be mined easily.
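The idea of mining text for JRC reference names and counting articles per year can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used in the project: the pattern assumes names of the form "NM-" followed by three digits (as in NM-100 and NM-101), the function names are hypothetical, and the sample abstracts are invented.

```python
import re
from collections import Counter

# Assumed pattern: "NM-" followed by exactly three digits (e.g. NM-100, NM-101).
# Real JRC names may take other forms; adjust the pattern as needed.
JRC_NAME = re.compile(r"\bNM-\d{3}\b")


def find_jrc_names(text):
    """Return the sorted, de-duplicated JRC-style names mentioned in text."""
    return sorted(set(JRC_NAME.findall(text)))


def articles_per_year(articles):
    """Count, per year, the articles that mention at least one JRC-style name.

    `articles` is an iterable of (year, text) pairs.
    """
    counts = Counter()
    for year, text in articles:
        if find_jrc_names(text):
            counts[year] += 1
    return dict(sorted(counts.items()))


# Hypothetical abstracts, invented for illustration only.
sample = [
    (2015, "We exposed cells to NM-100 and NM-101 titanium dioxide."),
    (2016, "No reference materials were used in this study."),
    (2016, "Dissolution of NM-110 was compared across media."),
    (2016, "NM-101 was dispersed following a standard protocol."),
]

if __name__ == "__main__":
    for year, count in articles_per_year(sample).items():
        print(year, count)
```

A fixed, well-used name makes the match a simple pattern lookup, which is exactly what free-text descriptions of a nanomaterial do not allow.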
The JRC list, however, covers only a small number of representative test nanomaterials. A more general identifier is needed to apply the text mining approach more widely, and also to support database searching, dataset integration, nanomaterials modelling and grouping, and the separation of nanomaterials into different nanoforms in regulatory settings. While such identifiers have long existed for chemicals (e.g. SMILES and InChI), encoding nanomaterial properties (such as composition, including stabilizers, coatings and dopants, as well as size, shape and crystal phase) has proven a challenge. Over the last year, NanoCommons worked closely with the H2020 nanoinformatics project NanoSolveIT to develop a proposal for extending the InChI used for small molecules to nanomaterials, leading to the InChI for nano, or NInChI [4], shown schematically in Figure 2. A preliminary version of a tool to generate NInChIs has also been developed, although it will likely evolve as the NInChI becomes formalised as a standard. The proposal has been very well received by the community and will now undergo a series of iterations to establish it as part of the InChI standard [5], which will facilitate its adoption into databases worldwide, including PubChem.
Figure 2: Schematic illustration of how the nanomaterials InChI (NInChI) will be constructed from three layers. The first layer indicates the version number of the standard (currently 0, as it is an alpha version). The middle layer indicates the specifics of the various components, listed alphabetically, together with details such as morphology, size, shape and crystal structure that are linked to their biological activity. The final layer indicates the order in which the constituents are put together to form the final nanomaterial (core, shell, coating etc.).
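The three-layer idea in Figure 2 can be illustrated with a small data structure. The sketch below is not the NInChI syntax and does not produce NInChI strings: the class names, field names and the example material are all hypothetical. It only shows how a version, an alphabetically ordered component list and a separate assembly order could fit together.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One constituent of a nanomaterial (hypothetical structure)."""
    name: str                      # composition label, e.g. "gold"
    role: str                      # e.g. "core", "shell", "coating"
    properties: dict = field(default_factory=dict)  # size, shape, crystal structure...


@dataclass
class NanoDescription:
    """Three-layer description: version, components, assembly order."""
    version: int
    components: list               # given in assembly order, innermost first

    def layers(self):
        # Middle layer: components listed alphabetically by name.
        ordered = sorted(self.components, key=lambda c: c.name.lower())
        # Final layer: assembly order, expressed as 1-based positions
        # in the alphabetical component list.
        assembly = [ordered.index(c) + 1 for c in self.components]
        return {
            "version": self.version,
            "components": [(c.name, c.role, c.properties) for c in ordered],
            "assembly": assembly,
        }


# Hypothetical example: a 20 nm spherical gold core with a citrate coating.
example = NanoDescription(
    version=0,
    components=[
        Component("gold", "core", {"size_nm": 20, "shape": "sphere"}),
        Component("citrate", "coating"),
    ],
)

if __name__ == "__main__":
    print(example.layers())
```

Keeping the alphabetical component list separate from the assembly order means the same set of constituents always appears in the same order, while the final layer still records which constituent sits at the core and which is added on top.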
- Wimalaratne, S.M., et al., Uniform resolution of compact identifiers for biomedical data. Scientific Data volume 5, Article number: 180029 (2018).
- Nielsen F.Å., Mietchen D., Willighagen E. (2017) Scholia, Scientometrics and Wikidata. In: Blomqvist E., Hose K., Paulheim H., Ławrynowicz A., Ciravegna F., Hartig O. (eds) The Semantic Web: ESWC 2017 Satellite Events. ESWC 2017. Lecture Notes in Computer Science, vol 10577. Springer, Cham. https://doi.org/10.1007/978-3-319-70407-4_36
- Willighagen, Egon; Jahn, Najko; Nielsen, Finn Årup (2018): The EU NanoSafety Cluster as Linked Data visualized with Scholia. figshare. Journal contribution. https://doi.org/10.6084/m9.figshare.6727931.v2
- Heller SR, Pletnev I, Stein S, Tchekhovskoi D, InChI, the IUPAC International Chemical Identifier. J. Cheminformatics, 2015, 7: 23 https://doi.org/10.1186/s13321-015-0068-4
- Lynch I, Afantitis A, Exner T, Himly M, Lobaskin V, Doganis P, Maier D, Sanabria N, Papadiamantis AG, Rybinska-Fryca A, Gromelski M, Puzyn T, Willighagen E, Johnston BD, Gulumian M, Matzke M, Green Etxabe A, Bossa N, Serra A, Liampa I, Harper S, Tämm K, Jensen ACØ, Kohonen P, Slater L, Tsoumanis A, Greco D, Winkler DA, Sarimveis H, Melagraki G. Can an InChI for nano address the need for a simplified representation of complex nanomaterials across experimental and nanoinformatics studies? Nanomaterials, 2020, 10(12), 2493. https://www.mdpi.com/2079-4991/10/12/2493