
Web of Science XML Data: Overview


The Web of Science XML Data includes metadata from over 12,500 journals spanning over 250 science, social science and humanities disciplines. Conference proceedings and book metadata are also available.

Data are available back to 1900 and, to date, include more than 63 million article records and more than 1 billion cited references.

Some key data elements:

  • ORCID identifiers are included in over 6.2 million records to support author disambiguation
  • funding acknowledgements, including agency and grant numbers, are indexed
  • full author and institutional affiliation information is indexed to enhance attribution of research and collaboration analysis
  • institution names are extensively unified to aggregate complex naming variations and sub-organizations


The full dataset totals 226 GB.

The following editions and date ranges are included in the 2021 Web of Science XML Data refresh:

  • Science Citation Index Expanded XML (SCIE XML) – 1900-2021
  • Social Sciences Citation Index XML (SSCI XML) – 1900-2021
  • Arts & Humanities Citation Index XML (AHCI XML) – 1975-2021
  • Conference Proceedings Citation Index-Science & Technical XML (CPCI-S XML) – 1990-2021
  • Conference Proceedings Citation Index-Social Sciences & Humanities XML (CPCI-SSH XML) – 1990-2021
  • Book Citation Index-Science XML (BKCI-S XML) – 2005-2021
  • Book Citation Index-Social Sciences & Humanities XML (BKCI-SSH XML) – 2005-2021
  • Emerging Sources Citation Index XML (ESCI XML) – 2005-2021


PubMed identifiers (PMIDs) have been integrated into all editions.

Cited Reference files are provided for all editions.

A standalone Times Cited text file and DAIS ID (Distinct Author Identification System ID) text file are also provided.