We need to find a way of quantifying the benefits of "cyberinfrastructure"--the technology that underpins and enables eScience. We need this information if we are to justify spending on infrastructure (or not), decide what infrastructure to build, and understand how to improve the infrastructures that we have.
But quantifying benefits is hard.
An anecdote: In building the Globus-based Earth System Grid (ESG: see the picture for participating sites), we put considerable effort into instrumentation and quantifying usage. Thus we know that our more than 3,000 registered users have downloaded more than 100 terabytes of climate simulation data. Yet these numbers provide no real insight into whether the people downloading that data found it useful, or did anything useful with it. We did survey users, and got useful information, but response rates were low.
Fortunately, one of the two data collections made accessible via ESG was the Intergovernmental Panel on Climate Change (IPCC) assessment simulation data, and the IPCC team was able to document that over 300 scientific papers had been produced [by early 2006] from data downloaded from ESG.
However, we cannot always obtain such convenient evidence. Thus we may ask: What metrics are important? What data do we need? What is feasible to collect? How do we get it? What can it tell us (and what can it not)?
I think we need to learn how to build infrastructures that can collect this sort of information automatically. We should involve social scientists in designing such systems and in assessing their effectiveness.
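The basic mechanism for such automatic collection can be sketched in a few lines. The sketch below is a hypothetical illustration, not ESG's actual logging code; the class and method names are my own assumptions. It records download events and then aggregates them into the kind of headline numbers quoted above (distinct users, total bytes served).

```python
import time

class UsageLog:
    """Hypothetical sketch: record download events so that aggregate
    usage statistics can be derived automatically."""

    def __init__(self):
        self.events = []

    def record_download(self, user_id, dataset, bytes_transferred):
        # One row per download: who, what, how much, and when.
        self.events.append({
            "user": user_id,
            "dataset": dataset,
            "bytes": bytes_transferred,
            "time": time.time(),
        })

    def summary(self):
        # Aggregate raw events into headline numbers of the sort
        # quoted above: distinct users and total bytes downloaded.
        users = {e["user"] for e in self.events}
        total_bytes = sum(e["bytes"] for e in self.events)
        return {"distinct_users": len(users), "total_bytes": total_bytes}
```

Of course, this captures only *activity*; linking those events to outcomes such as published papers is exactly the harder problem, and a place where input from social scientists would help.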