Abstract
Over time, I have worked on a suite of data infrastructure projects,
ranging from small to large and simple to complex. Some have been
successful, others less so, and a couple are so notorious they are
generally not mentioned in polite company. Sometimes the outcome was
predictable—well-organized and well-managed projects are likely to
succeed, and those with clear flaws often don’t rise above them—but
sometimes it was unexpected. Building effective data infrastructure is
not a solved problem; it is an area of research in and of itself, and
the larger the project, the more difficult and uncertain it is. For each
of my current and past projects, I consider its greatest strength,
its showcase best practice, its weaknesses, and its epic fails, as
well as the external and internal factors that contributed to positive
or negative outcomes. Despite the heterogeneity, certain patterns emerge
as lessons learned: Listening to your users is critical, and there are
no shortcuts; it requires a long-term investment, including cultivating
individual relationships. Good is better than more; hardening
infrastructure is time-consuming but critical for adoption. Have a clear
mission with concrete benefits to a defined user community, and expand
thoughtfully from there. In spite of our best intentions, meeting
emerging community expectations usually requires a catalyst. External
groups identifying best practices, mini-grants for implementation, and
multi-project groups collaborating to rise together give needed nudges.
Interestingly, I have never been involved with a project that failed
because it picked the wrong technology. A poor choice can cause pain and
consume resources, but I have not seen it prove fatal. Finally, invest in the
people behind the infrastructure. Committed, engaged staff supported by
ongoing professional development, rational management, and sufficient
autonomy can, and regularly do, accomplish the impossible.