As we usher in this new era of open data and government transparency by making raw data available to the public in platform-independent formats on Data.gov, we could be overlooking one of the simplest and most sincere opportunities to show what the open data movement is about: Linked Data.
I am hoping to build interest in a community-led project to prototype an RDF-enabled dataset for Data.gov and link it to other related data sources in the Linked Data Cloud. This community project would follow the direction prescribed by Tim Berners-Lee (see Putting Government Data Online) and countless other semantic web experts to provide a meaningful proof of concept of linked government data.
Linked Data is the key to connecting government datasets to other government data and to external data sources on the web. We can establish a more widely usable and universally valuable dataset on the emerging Linked Data cloud by publishing an example set of data from Data.gov as RDF (Resource Description Framework). Looking at the latest Linking Open Data Cloud Diagram by Chris Bizer, you can see that several government resources are already available, but Data.gov is not among them, and there are not nearly enough.
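To make the idea concrete, here is a minimal sketch of what "publishing as RDF and linking" could look like. It turns a few rows of tabular data into N-Triples and adds an `owl:sameAs` link from each record to the matching DBpedia resource. Everything specific in it is an assumption for illustration: the `example.org` namespace and vocabulary, the column names, and the sample values are all made up and do not come from any actual Data.gov dataset.

```python
import csv
import io

# Made-up sample rows; the values are placeholders, not real statistics.
SAMPLE_CSV = """state,population
Ohio,100
Texas,200
"""

# Hypothetical namespaces for this sketch (not real Data.gov URIs).
BASE = "http://example.org/datagov/"
VOCAB = "http://example.org/datagov/vocab#"

# Existing, well-known identifiers.
DBPEDIA = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_ntriples(row):
    """Turn one CSV row into three N-Triples statements."""
    name = row["state"]
    # Simplification: only spaces are replaced when building the URI.
    slug = name.replace(" ", "_")
    population = int(row["population"])
    subject = f"<{BASE}state/{slug}>"
    return [
        f'{subject} <{VOCAB}name> "{escape_literal(name)}" .',
        f'{subject} <{VOCAB}population> "{population}"^^<{XSD_INTEGER}> .',
        # The link that connects this record to the wider Linked Data cloud.
        f"{subject} <{OWL_SAME_AS}> <{DBPEDIA}{slug}> .",
    ]


def csv_to_ntriples(text):
    """Convert CSV text with 'state' and 'population' columns to N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_ntriples(row))
    return lines


if __name__ == "__main__":
    print("\n".join(csv_to_ntriples(SAMPLE_CSV)))
```

The `owl:sameAs` statement is the part that does the linking: it asserts that the record in our hypothetical dataset and the DBpedia resource describe the same thing, so a consumer can follow it to find related data. A real project would need to choose stable URIs and an agreed vocabulary rather than the placeholders used here.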
Data.gov does currently provide an access point for raw data in non-proprietary, useful and widely adopted standards (e.g. XML web services). Rather than just playing catch-up with current calls for more useful and accessible formats, there is an opportunity to demonstrate and acknowledge a better and more forward-looking standard. Data.gov could be improved by establishing a more widely connected and universally valuable dataset, in line with Tim Berners-Lee's call for RDF-enabled Linked Data.
Currently, government data is distributed across, and buried within, agency sites and various aggregated sites. Data.gov is a good start towards a central clearinghouse of available data, but the technical community has recognized and critiqued shortcomings in the format, frequency, accessibility and usefulness of the data currently being published. Data.gov contains many candidate datasets that could be RDF-enabled to overcome these issues. As a highly visible website, Data.gov would also provide a good platform on which such a project could be built.
Newly appointed Federal CIO Vivek Kundra stated that he is "...deeply committed to opening up data to make it machine readable and easier for people to use, mix and mash..." While many formats may serve the need of making data easier to use, RDF is the most widely accepted standard that allows data to be both linked and "machine readable". As the government progresses in its adoption of Web 2.0, it is important that it consider building data relationships through the addition of RDF and head towards the semantic web vision of Web 3.0.
The Linked Data.gov Experiment Project will call upon committed individuals, government and business to collaborate, using their creativity and passion to advance the goals of transparency and efficiency through this community-led project. If a community of specialists can come together and produce an RDF-enabled dataset that is linked, we will prove several concepts fundamental to the goals of the new administration and the technical community at the same time. Citizens win too, by gaining a more optimized set of government data from which to draw actionable information.
The barriers to success on this project are very low. Four simple things are required:
- Getting together those in the community with the commitment and know-how to attempt this project. This challenge could be mitigated by a call to action from groups like the Sunlight Foundation and our thriving open source/open data community, where existing interest can be leveraged.
- Finding well-structured data on Data.gov to serve our goals. This challenge could be mitigated by the planned addition of more data, which should be available soon.
- Finding the time to organize and work. This challenge could be mitigated by good online collaboration tools and/or a bar camp style gathering, again potentially drawing from existing groups.
- Publishing and documenting the results and findings of the project. A public blog or website would serve nicely.

The result of this project would provide a meaningful proof of concept and an example of the benefits of future RDF publishing efforts. It would also highlight the contributions of a committed community of forward-thinking data advocates.