Why Data Quality Matters
Originally published at http://stateofinnovation.thomsonreuters.com/why-data-quality-matters
Data takes the guesswork out of management and provides clear guidance on nearly every university-related issue facing leaders today. That’s why it’s so important to make sure the data that you’re using is high quality. Poor data quality equates directly to poor visibility of key trends and emerging fields.
Quality data enables staff and management teams to trust the accuracy of reports and analysis they are given. Without that confidence, apparent trends or new opportunities will always leave users wondering whether the picture presented by the data is accurate. With a complete and accurate view of the research landscape, though, comes the confidence to make well-informed business decisions and commit fully to strategic planning.
“Data quality and what we call a reliable data foundation is the first thing you need,” says Ramon Chen, chief marketing officer of the master data management firm Reltio. “If the data’s not accurate, you can’t connect the dots.”
Whether you’re seeking donations, hiring faculty or investing in research, quality data allows you to make the right decision with confidence.
What is quality data?
Not every piece of data is immediately identifiable. A seemingly simple matter – sorting out people's names – can present profound complications. In literature searches, for example, a persistent problem is author-name ambiguity. The name of a given author might be recorded in publications in different ways (e.g., with or without a middle initial), or two or more people might share the same name. Similarly, two unrelated research trends in completely different fields might share author names or key phrases, or two research fields could be so closely related that search engines might not be able to easily distinguish them.
There’s also the issue of noise. Low-quality data may give irrelevant information equal status with the information you want to analyze, which can produce a useless or even erroneous result. If you’re counting citations, for instance, without filtering out citations in non-peer-reviewed publications, you’ll end up with a vastly different result than if you count only citations in quality journals.
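To make the citation example concrete, here is a minimal sketch in Python. The records, author names, venues and the `peer_reviewed` flag are all invented for illustration; a real citation database would supply its own fields and its own definition of a quality journal.

```python
from collections import Counter

# Hypothetical citation records: (cited author, citing venue, peer reviewed?)
citations = [
    ("A. Rivera", "Journal A", True),
    ("A. Rivera", "Blog B", False),
    ("A. Rivera", "Newsletter C", False),
    ("B. Okafor", "Journal A", True),
    ("B. Okafor", "Journal D", True),
]


def count_citations(records, peer_reviewed_only=False):
    """Count citations per author, optionally skipping non-peer-reviewed venues."""
    counts = Counter()
    for author, _venue, peer_reviewed in records:
        if peer_reviewed_only and not peer_reviewed:
            continue  # treat this citation as noise
        counts[author] += 1
    return dict(counts)


raw_counts = count_citations(citations)
filtered_counts = count_citations(citations, peer_reviewed_only=True)
```

In this made-up sample the unfiltered count puts A. Rivera ahead, three citations to two, while the filtered count puts B. Okafor ahead, two to one. The same data gives opposite answers depending on whether the noise is removed.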
None of these situations is unusual, and quality data accounts for them. Quality data has been thoroughly “cleaned,” meaning that any analytical program can quickly and easily identify which information is relevant to its analysis.
“If you just took one student, 'John Smith,' he’s in our course system, also recorded in a separate system showing payment for tuition, also indicated in another file as a son of a big donor – there’s no way for someone to gather that information and connect the dots,” says Chen. “The first thing you have to do is try to figure out whether it’s the same John Smith in all these systems, and to do that you have to clean the data.”
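A rough sketch of what “figuring out whether it’s the same John Smith” can look like is shown below in Python. It is an illustration only, not Reltio’s method: the three systems, their field names and the choice of a birth date as the second identifier are all assumptions, and production master data management tools use far more sophisticated matching.

```python
from collections import defaultdict


def normalize_name(name):
    """Reduce common name variants to one form, e.g. 'Smith, John Q.' -> 'john smith'."""
    if "," in name:
        # Assume "Last, First" ordering when a comma is present.
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    parts = name.lower().replace(".", " ").split()
    # Drop single-letter middle initials but keep the first and last tokens.
    kept = [p for i, p in enumerate(parts) if len(p) > 1 or i in (0, len(parts) - 1)]
    return " ".join(kept)


def link_records(systems):
    """Group records from several systems under a (normalized name, birth date) key."""
    linked = defaultdict(dict)
    for system_name, records in systems.items():
        for record in records:
            key = (normalize_name(record["name"]), record["birth_date"])
            linked[key].setdefault(system_name, []).append(record)
    return dict(linked)


# Hypothetical records from three separate university systems.
systems = {
    "courses": [
        {"name": "John Smith", "birth_date": "2001-04-12", "course": "BIO 101"},
        {"name": "John Smith", "birth_date": "1999-11-02", "course": "HIS 210"},
    ],
    "payments": [
        {"name": "Smith, John", "birth_date": "2001-04-12", "tuition_paid": True},
    ],
    "donor_family": [
        {"name": "John Q. Smith", "birth_date": "2001-04-12", "relation": "son of donor"},
    ],
}

linked = link_records(systems)
```

With this sample data, the three differently formatted entries for the John Smith born in 2001 end up under one key, while the other John Smith in the course system stays separate. Matching on name alone would have merged the two students, which is why a second identifier is assumed here.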
What can data do?
Data can help with virtually everything in university management. It can help identify which faculty to recruit, or decide between applicants for a job. It can help an institution more effectively solicit donations by analyzing donor activity and connections.
At the research level, it can help identify key research trends and leaders in emerging fields. It can also help you compare your institution with others to learn where you can grow and improve. These insights can inform hiring, funding and PhD recruitment decisions.
“The beauty of a pool of data is that if you continuously maintain it, any time you want to ask it a question you’re not going and buying a separate application and reassembling facts,” says Chen. “You’re just adding the extra bit of insight you need to augment the question.”
A database can only do this, though, if the data has been cleaned and curated. University growth requires strong strategy, building on strengths to improve the institution as a whole. Poor quality data makes that much harder, meaning you could end up using valuable resources in unproductive ways.