Metadata Archeology – Finding Affiliations in Crossref Metadata
One of the interesting challenges in migrating Dryad's metadata technology to the California Digital Library is identifying the organizations associated with the research data archived in the Dryad Repository. This information is not part of the original Dryad metadata, so we have to dig around for it: “metadata archeology”.
Dryad records include the DOIs of the papers associated with the data, so Crossref is the first place to look for affiliation data. We can use the Crossref Participation Reports to find out which Crossref collections include affiliations.
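As a rough illustration of that first step, the sketch below looks up a single DOI through the public Crossref REST API (`https://api.crossref.org/works/{doi}`) and lists each author together with any affiliation names attached. It assumes the JSON layout the API returns for works, where each entry in `author` may carry an `affiliation` list of objects with a `name` field. The function names and the placeholder DOI in the usage note are our own. This is not the tooling used for the analysis in this post.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def work_url(doi, mailto=None):
    """Build a Crossref REST API URL for one DOI, optionally with a mailto contact."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    if mailto:
        # Crossref asks API users to identify themselves; a mailto parameter is one way.
        url += "?" + urllib.parse.urlencode({"mailto": mailto})
    return url


def fetch_work(doi, mailto=None, timeout=30):
    """Fetch the 'message' part of a Crossref works response (needs network access)."""
    with urllib.request.urlopen(work_url(doi, mailto), timeout=timeout) as resp:
        return json.load(resp)["message"]


def affiliations_by_author(work):
    """Return (author name, [affiliation names]) pairs from a Crossref work record."""
    pairs = []
    for author in work.get("author", []):
        # Join whichever of given/family names are present.
        name = " ".join(p for p in (author.get("given"), author.get("family")) if p)
        # Keep only affiliation entries that actually carry a name.
        names = [a["name"] for a in author.get("affiliation", []) if a.get("name")]
        pairs.append((name, names))
    return pairs
```

For example, `affiliations_by_author(fetch_work("10.1234/placeholder"))` (a made-up DOI) would return one pair per author. For most Crossref records today, the affiliation list in each pair would simply be empty.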
A recent analysis of Crossref metadata showed that roughly 11% of Crossref metadata records include affiliation information and that this share has almost doubled over the last several years. Affiliations are therefore a small but growing part of these metadata. It is well known that affiliation data are generally poorly normalized and hence difficult to work with. Nevertheless, identifying Crossref members that provide consistent affiliation information and automating the identification of organizations are important steps towards assigning identifiers (e.g. ROR IDs) to these organizations. Starting this process now will set the stage for improvements across the broader scientific publishing community.
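Because affiliation strings are rarely normalized, a cheap first pass before any identifier matching is to fold away differences in case, accents, punctuation and spacing. Trivially different spellings then group together. The sketch below is a hypothetical helper of our own, not a Crossref or ROR tool. It deliberately makes no attempt to resolve abbreviations such as “Univ.” versus “University”.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_affiliation(name):
    """Fold case, accents, punctuation and whitespace out of an affiliation string."""
    text = unicodedata.normalize("NFKD", name)
    # Drop the combining marks left over from decomposed accented characters.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.casefold()
    # Replace punctuation (anything that is not a word character or whitespace) with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def group_variants(names):
    """Map each normalized key to the original spellings that produce it."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_affiliation(name)].append(name)
    return dict(groups)
```

Grouping “University of Chitral”, “UNIVERSITY OF CHITRAL.” and “Univ. of Chitral” yields two keys. The first two strings merge, while the abbreviated form stays on its own. That leftover is exactly the kind of variant that needs identifier matching rather than string cleanup.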
Crossref members that include affiliations in 100% of their metadata records fall into three groups. Members in the first group include affiliations along with many of the other elements tracked in the Crossref Participation Reports. These collections resemble the Korean Association of Quality Assurance for Clinical Laboratory collection shown in Figure 1 and generally have high backfile and current values in the Table below. Their metadata include content in many elements in addition to affiliations. Besides being complete for affiliations, this collection shows the largest increase in completeness across all elements of any collection in this sample, from 1.17 to 6.91. The increase is clear in Figure 1: the backfile period (orange) shows content for just two elements, affiliations and open references, while the current period (blue) is complete for six.
The second group of collections has complete affiliation information but is missing some of the other elements included in the Crossref Participation Reports. The Information Processing Society of Japan (Figure 2) is an example from this group. Like the first example, this member has recently greatly increased its completeness for affiliations (and similarity checking). However, it has not yet taken advantage of many other metadata elements and their associated services.
Finally, there are the new kids on the block: recently joined members with no backfile (zero in the backfile column of the Table) but very complete metadata in their new collections. These members joined Crossref after the services enabled by these metadata elements were created, and they are taking advantage of them. The University of Chitral (Figure 3) is a good example of this group. Its collection has complete metadata for seven of the Participation Report elements and is nearly complete (0.93) for the remaining one. Keep up the great work!
The slideshow and Table below show a selection of the members with complete affiliations during the current period (2017 – present). Scroll through the slideshow or click a member name to see its Participation Report plot. The Table gives the completeness indices for the backfile and current periods and the change in the index between them.
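The post does not spell out how the completeness index is computed. One reading is consistent with the figures quoted: the University of Chitral's seven complete elements plus 0.93 for the remaining one, against a backfile of zero. Under that reading, the index is the sum of per-element completeness fractions (each between 0 and 1) from the Participation Report. The sketch below encodes that assumption only. The element keys are placeholders, not the report's actual field names.

```python
from typing import Mapping


def completeness_index(fractions: Mapping[str, float]) -> float:
    """ASSUMED definition: sum of per-element completeness fractions, each in 0..1."""
    for element, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{element}: completeness {value} is outside 0..1")
    return sum(fractions.values())


def index_change(backfile: Mapping[str, float], current: Mapping[str, float]) -> float:
    """Change in the (assumed) index from the backfile period to the current period."""
    return completeness_index(current) - completeness_index(backfile)


# Placeholder elements mirroring the University of Chitral description:
# seven elements fully complete, one at 0.93, and no backfile at all.
current = {f"element_{i}": 1.0 for i in range(1, 8)}
current["element_8"] = 0.93
backfile = {}

print(round(completeness_index(current), 2))      # 7.93
print(round(index_change(backfile, current), 2))  # 7.93
```

If the real index turns out to be weighted or averaged instead, only `completeness_index` would need to change.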
Every collection in this Table includes affiliation content in all of its records and can therefore provide examples. However, a few provisos are worth keeping in mind. First, many papers have multiple authors, and only one of them needs affiliation information for the record as a whole to be counted as having affiliations. Second, the presence of affiliations is not a measure of their quality or usability. The next challenge is converting this affiliation information into standard organization identifiers, which is the target of current work.
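The first proviso matters whenever a coverage number is read. The sketch below assumes Crossref-style work records in which each `author` entry may carry an `affiliation` list of `{"name": ...}` objects. It computes two figures:

- Record-level coverage: a record counts if any one author has an affiliation, the rule described above.
- Author-level coverage: every author is counted individually.

The helper names and sample records are hypothetical.

```python
def author_has_affiliation(author):
    """True if this author entry lists at least one named affiliation."""
    return any(a.get("name") for a in author.get("affiliation", []))


def record_has_affiliation(work):
    """Record-level rule: one author with an affiliation is enough."""
    return any(author_has_affiliation(a) for a in work.get("author", []))


def coverage(works):
    """Return record-level and author-level affiliation coverage for a list of works."""
    authors = [a for w in works for a in w.get("author", [])]
    records_with = sum(record_has_affiliation(w) for w in works)
    authors_with = sum(author_has_affiliation(a) for a in authors)
    return {
        "records": records_with / len(works) if works else 0.0,
        "authors": authors_with / len(authors) if authors else 0.0,
    }


# Two hypothetical records: a three-author paper where only the first author has an
# affiliation, and a single-author paper with an affiliation.
works = [
    {"author": [
        {"family": "A", "affiliation": [{"name": "Example University"}]},
        {"family": "B", "affiliation": []},
        {"family": "C"},
    ]},
    {"author": [{"family": "D", "affiliation": [{"name": "Example Institute"}]}]},
]
print(coverage(works))  # {'records': 1.0, 'authors': 0.5}
```

Both records count as having affiliations, so record-level coverage is 100% even though half of the authors have none.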
Crossref Metadata for Selected Members with Complete Affiliations
Note: Data in this Table were collected on June 18, 2019