Open Software

Open Science

In this presentation I try to explain a bidirectional relationship between Open Software and Open Science. The first direction goes from Open Software to Open Science. The idea is that open software is an important inspiration for the current vision of open science, both in terms of collaborative development and in terms of technological infrastructure. The keywords are collaborative development and a technological idea, the Version Control System. This is mainly the past and the present. The opposite direction, from Open Science to Open Software, is perhaps less well investigated. The growth of Open Science will make an important contribution to the use and development of open source software. The keywords are computational science and reproducibility.

Open software as collaborative development model

An example: evolution of QGIS

Generally, behind an open source software project there is an open community of developers, users, people who write documentation, and so on. To understand this better, let's watch a video about the evolution of an open source project like QGIS. At the beginning there was only one developer. If we go on a bit ... the project has grown. There are many developers (about 10) who may be working on the project, and perhaps on the same files, at the same time. Let's move on again ... a bigger project with more developers. And now it is beginning to be a real community. They need to choose a leader and a committee to take decisions, and so on. We can call this a self-structuring process of the software community. And there is a tool that allows the collaborative development of the software, and that also makes it possible to create this video after more than 10 years.

Version Control System

The long history: CVS, SVN, Mercurial, Git

records changes over time
you can recall specific versions later
lets multiple users simultaneously edit their own copies
strategies: merging and conflict resolution
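To make "merging and conflict resolution" concrete, here is a minimal sketch of a three-way merge. It is not how Git or any other tool actually implements merging: it assumes, for simplicity, that the common ancestor and both edited copies have the same number of lines, and the function name `merge_three_way` and the sample lines are invented for illustration.

```python
def merge_three_way(base, ours, theirs):
    """Merge two edited copies of `base`, line by line.

    Simplifying assumption (not true of real tools): all three versions
    have the same number of lines, so no diff alignment is needed.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles versions of equal length")

    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed the line
            merged.append(t)
        elif t == b:      # only "ours" changed the line
            merged.append(o)
        else:             # both changed it differently: a conflict
            merged.append(f"<<< {o} ||| {t} >>>")
            conflicts.append(number)
    return merged, conflicts


# Hypothetical example: two people edit the same three-line file.
base = ["title", "intro", "method"]
ours = ["Title", "intro", "method v2"]
theirs = ["title", "introduction", "method v3"]

merged, conflicts = merge_three_way(base, ours, theirs)
print(merged)
print("conflicting lines:", conflicts)
```

Changes that touch different lines are combined automatically; only the line that both people edited differently is reported as a conflict and left for a human to resolve.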

Nothing that is committed to version control is ever lost. Since all old versions of files are saved, it’s always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results. As we have this record of who made what changes when, we know who to ask if we have questions later on, and, if needed, we can revert to a previous version, much like the “undo” feature in an editor. When several people collaborate on the same project, it’s possible to accidentally overlook or overwrite someone’s changes: the version control system automatically notifies users whenever there’s a conflict between one person’s work and another’s.
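The idea that nothing is lost, and that a revert is itself recorded, can be sketched with a toy history store. This is only an illustration under simple assumptions (one file, full snapshots kept in memory); the class names `Revision` and `History` and the authors are hypothetical and do not correspond to any real tool.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    message: str
    content: str  # full snapshot of the (single) file


@dataclass
class History:
    revisions: list = field(default_factory=list)

    def commit(self, author, message, content):
        """Record a new snapshot and return its revision number."""
        self.revisions.append(Revision(author, message, content))
        return len(self.revisions) - 1

    def checkout(self, number):
        """Recall the content of a specific earlier revision."""
        return self.revisions[number].content

    def revert(self, number, author):
        """Go back to an old version by committing it again.

        Nothing is deleted: the revert is one more entry in the history.
        """
        return self.commit(author, f"Revert to revision {number}", self.checkout(number))

    def log(self):
        """Who changed what, in order."""
        return [(i, r.author, r.message) for i, r in enumerate(self.revisions)]


history = History()
first = history.commit("anna", "first draft", "x = 1")
second = history.commit("marco", "change x", "x = 2")
third = history.revert(first, "anna")

print(history.checkout(third))
for entry in history.log():
    print(entry)
```

Note that the revert does not erase the second revision: all three entries remain in the log, so it is still possible to see who made which change.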

GitHub.com

2008: Git + Social tools = GitHub.

Code, documentation, discussions, reviews.

Transparency, visible feedback

The lessons of the open source

Version Control System and collaborative web platforms
Self-structuring communities
Cooperation between volunteers and professionals
Licenses: what exactly "Open" means

Computational science

Our view is that we have reached the point that, with some exceptions, anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility.

Roger D. Peng, Reproducible Research in Computational Science, Science, 2011 http://www.sciencemag.org/content/334/6060/1226

One aim of the reproducibility standard is to fill the gap in the scientific evidence-generating process between full replication of a study and no replication. Between these two extreme end points, there is a spectrum of possibilities, and a study may be more or less reproducible than another depending on what data and code are made available.

Software is science, in the sense that when developed in the course of computational science research, it can't be classified as "infrastructure" or "support". If a telescope is infrastructure for astronomy, then its equivalent for computational mathematics is a supercomputer. On the other hand, code produced to prove a theorem, simulate an explosion or study an image denoising algorithm is the subject-matter of the science being investigated. We need this work to be recognized, and the time and effort invested in it must be rewarded by scholarship standards.

On the other hand, when science is based on software foundations, the computational part of the research should be subject to validation by the scientific method. This implies that every program used in the elaboration of a research paper should, at least, be usable by referees, and if possible be publicly available in source form. Open Source software is part of the equation, but it is not sufficient to answer the questions raised here.

Computational Science Wheel

For a growing number of scientists, the process looks like this:
- The data that the scientist collects is stored in an open access repository like figshare or Zenodo, possibly as soon as it’s collected.
- The scientist creates a new repository on GitHub to hold her work.
- As she does her analysis, she pushes changes to her scripts (and possibly some output files) to that repository. She also uses the repository for her paper; that repository is then the hub for collaboration with her colleagues.
- When she’s happy with the state of her paper, she posts a version to arXiv or some other preprint server to invite feedback from peers.
- Based on that feedback, she may post several revisions before finally submitting her paper to a journal.
- The published paper includes links to her preprint and to her code and data repositories, which makes it much easier for other scientists to use her work as a starting point for their own research.

This open model accelerates discovery: the more open work is, the more widely it is cited and re-used. However, people who want to work this way need to make some decisions about what exactly “open” means and how to do it.

The conceptual stages of your work are documented, including who did what and when. Every step is stamped with an identifier (the commit ID) that is, for most intents and purposes, unique. You can tie documentation of rationale, ideas, and other intellectual work directly to the changes that spring from them. You can refer to what you used in your research to obtain your computational results in a way that is unique and recoverable. With a distributed version control system such as Git, the version control repository is easy to archive for perpetuity, and contains the entire history.
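Why the commit ID is "for most intents and purposes unique" can be illustrated with a small sketch: the identifier is a cryptographic hash of the content together with its metadata and the identifier of the previous step. The layout of the hashed text below is an assumption made for illustration and is not Git's real object format; the function name `commit_id` and the sample values are hypothetical.

```python
import hashlib


def commit_id(parent_id, author, message, content):
    """Derive an identifier from a step's content, metadata and parent.

    Illustrative layout only: real Git hashes a differently structured object.
    """
    payload = "\n".join([parent_id or "", author, message, content])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Hypothetical two-step history of an analysis script.
root = commit_id(None, "anna", "add analysis script", "print(1 + 1)")
child = commit_id(root, "anna", "fix sum", "print(1 + 2)")

print(root)
print(child)
```

Because each identifier depends on its parent's identifier, citing a single commit ID in a paper pins down not only the code at that point but the whole history that led to it.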

The growth of Open Science will make an important contribution to Open Source software.

Open Science increasingly needs Open Source software

A concrete proposal

There are excellent IT solutions
There are great experiences around the world
We (OPERAS, ISMAR, CNR?) could start to collaborate with "experts":
- Software Carpentry
- Data Carpentry

software-carpentry.org

DataCarpentry.org

Collaborative Free Open Source Development

by S. Menegon is licensed under a Creative Commons Attribution 4.0 International License.