Nowadays astronomical data is recorded in two ways: either remotely, for example with space telescopes, or at ground-based observatories. In the latter case, however, astronomers do not usually perform the observations themselves; instead, they propose which object should be observed and how. Then, if the proposal is approved, a mission control centre or a resident astronomer at the observatory sets up the instrument to perform the observations. What happens next?
Usually (with few exceptions), the data is moved almost immediately to a server, and access is granted to the proposing astronomer via a password-protected download link. The data is then kept private for a period that can vary but is almost always one year, although some observatories keep a lifetime proprietary policy (which I consider a true scientific scandal). After that year, the data is unlocked and the entire astronomical community can access the now-public data.
In my opinion, this process is today overly slow and inefficient. Before the internet era, the data was collected, recorded on various media, and then physically delivered to the astronomer. The data was then transferred to a hard drive and processed with relatively slow computers. If a group of collaborators was working on the same data, a physical copy had to be made and delivered to each of them, in what appears today as a tedious, never-ending process. Then, when the results were finally collected, cross-checked, and approved, the paper had to be written. I have been lucky enough never to have performed a bibliographic search without the internet (and ADS), but you can guess what that meant back then.
It is easy to appreciate how much the internet has improved and sped up the entire process of data reduction, analysis, and results validation. So why is a one-year proprietary period still needed today? It is certainly true that it might take about a year before a peer-reviewed paper is accepted and published, even today. But the data analysis process itself rarely lasts much more than a few weeks. There are recent examples of space telescopes that release their data publicly right away, and the process seems to work very well. In my experience these data remain "hot" for longer, as many different groups can work on them at almost the same time. This usually leads to more publications per observation and to a richer scientific debate. If other groups can access the data at the same time, it also becomes possible to cross-check the results of other groups on the fly and possibly correct mistakes before they spread too far. Moreover, it has happened quite often that public data is used for a purpose different from the one originally proposed, triggering unexpected and/or serendipitous discoveries. Using public data also allows more efficient planning of future related observations.
I have worked with such satellites (e.g., Swift and the now decommissioned RXTE), and the downside of all this is certainly the stressful, constant fear of being scooped at any moment by someone just slightly faster than you. It also means constantly working under pressure, which increases the chances of errors in the data analysis.
Despite this, I believe that a one-year proprietary window is still not justified. One or two months is more than sufficient to all but guarantee that the proposer will be the first to publish a specific result obtained from the data. Furthermore, the common practice of posting papers to the arXiv as soon as they are submitted to refereed journals would increase that chance even more. The possibility of cross-checking the results of other groups in "real time" far outweighs the increased risk of mistakes due to a faster analysis process (a risk that is overstated anyway, since basically no one nowadays really works on the same data for a full year).
The lack of public data access also prevents other scientists from scheduling their own observations efficiently, as the inaccessible information contained in those data could be crucial for planning a different observation with other observatories.
In short, my proposal is to reduce the proprietary window to a minimal period of one month, just enough to avoid the heavy stress that immediately public data can cause. This would sharply increase both the quantity and quality of scientific output and decrease the costs of research in our field. Is there any strong reason why we shouldn't do this?