The UK government is currently running a survey to elicit ideas on how it should update data.gov.uk. As one of the oldest such portals it is, unsurprisingly, showing signs of age, despite various stages of evolution and upgrade. Yesterday’s blog post by Owen Boswarva offers a good summary of the issues that arise when considering the primary and secondary functions of a data portal. Boswarva emphasizes the need for discovery metadata (title, description, issue date, subject matter etc.), which is certainly crucial. But so too are structural metadata (use our Tabular Metadata standards to describe your CSV, for example), licensing information, the use of URIs as identifiers for and within datasets, information about data quality, location information, update cycle, contact point, feedback loops, usage information and more.
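To make "structural metadata" a little more concrete, here is a minimal sketch of a CSV on the Web (CSVW) metadata document, built as a Python dict and serialized to JSON. The file name, title and columns are hypothetical, chosen only for illustration; a real description would usually carry more (primary keys, licence, publisher and so on).

```python
import json


def csvw_metadata(csv_url, title, columns):
    """Build a minimal CSVW metadata document for a single CSV file.

    columns is a list of (name, title, datatype) tuples. All values passed
    in below are hypothetical examples, not a real dataset.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "dc:title": title,  # 'dc' is a prefix predefined by the CSVW context
        "tableSchema": {
            "columns": [
                {"name": name, "titles": col_title, "datatype": datatype}
                for name, col_title, datatype in columns
            ]
        },
    }


# Hypothetical CSV describing schools, one row per school.
meta = csvw_metadata(
    "school-census.csv",
    "School census 2016 (example)",
    [
        ("urn", "URN", "integer"),
        ("name", "School name", "string"),
        ("opened", "Opened", "date"),
    ],
)
print(json.dumps(meta, indent=2))
```

Saved next to the CSV as school-census.csv-metadata.json, the description can be found by a CSVW processor's default lookup, so the column types travel with the data rather than living only in a human-readable web page.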
It’s questions like these that gave rise to the Data on the Web Best Practices WG, whose primary document is now at Candidate Recommendation. Of course, we need help from the likes of Owen Boswarva and data.gov.* portals around the world to gather evidence of implementation. The work is part of a bigger picture that includes: two ancillary vocabularies that can be used to provide structured information about data quality and dataset usage; the outputs of the Spatial Data on the Web Working Group, in which we’re collaborating with fellow standards body OGC; and the Permissions and Obligations Expression WG, which is developing machine-readable license terms and more, beginning with the output of the ODRL Community Group.
A more policy-oriented view is provided by a complementary set of Best Practices developed by the EU-funded Share-PSI project. It was under the aegis of that project that the role of the portal was discussed at great length at a workshop I ran back in November last year. That showed that a portal must be a lot more than a catalog: it should be the focus of a community.
Last year’s workshop took place a week after the launch of the European Data Portal, itself a relaunch informed by experience gained from running earlier versions. One of the aims of that particular portal is to act as a gateway to datasets available throughout Europe. That implies commonly agreed discovery metadata standards, for which W3C recommends the Data Catalog Vocabulary, DCAT. However, DCAT alone is not enough. Which profile of DCAT should you use? The EU’s DCAT-AP is a good place to start, but how do you validate a description against it? Enter SHACL, for example.
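Real validation would load the data as RDF and run a SHACL processor against DCAT-AP shapes. The sketch below is only a toy, standard-library analogue of a SHACL sh:minCount constraint, meant to show the idea of checking a description against a profile. It assumes the dataset description is already a JSON-LD-style dict with compact dct: and dcat: keys, and it checks only dct:title and dct:description, the two properties DCAT-AP marks as mandatory for a dcat:Dataset; the recommended properties are ignored. The dataset itself is hypothetical.

```python
# Toy analogue of SHACL minCount checks; this is NOT a SHACL processor.
# Hypothetical "shape": property -> minimum number of values required.
DATASET_SHAPE = {
    "dct:title": 1,        # mandatory for dcat:Dataset in DCAT-AP
    "dct:description": 1,  # mandatory for dcat:Dataset in DCAT-AP
}


def values(node, prop):
    """Return the values of prop as a list (JSON-LD allows a single value or a list)."""
    v = node.get(prop)
    if v is None:
        return []
    return v if isinstance(v, list) else [v]


def validate(node, shape):
    """Return human-readable violations; an empty list means the node conforms."""
    report = []
    if "dcat:Dataset" not in values(node, "@type"):
        report.append("node is not typed dcat:Dataset")
    for prop, min_count in shape.items():
        found = len(values(node, prop))
        if found < min_count:
            report.append(f"{prop}: expected at least {min_count} value(s), found {found}")
    return report


# Hypothetical description that forgets its dct:description.
dataset = {
    "@id": "http://example.org/dataset/school-census",
    "@type": "dcat:Dataset",
    "dct:title": {"@value": "School census 2016", "@language": "en"},
    "dcat:keyword": ["education", "schools"],
}

for problem in validate(dataset, DATASET_SHAPE):
    print(problem)
```

The point of expressing the profile as data (the shape) rather than as code is the same one SHACL makes: the constraints can be published, shared and reused by every portal that adopts the profile.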
Those last points highlight the need for further work in this area, which is one of the motivations for the Smart Descriptions & Smarter Vocabularies (SDSVoc) workshop we’re running later this year in collaboration with the VRE4EIC project. We also want to talk in more general terms about vocabulary development and management at W3C.
As with any successful activity, if data is to be discoverable, usable, useful and used, it needs to be shared among the people who have a stake in the whole process. We need to think in terms of an ecosystem, not a shop window.