The MDM Institute defines Data Governance as “the formal orchestration of people, process, and technology to enable an organization to leverage data as an enterprise asset.” Yet, despite the critical importance of Data Governance to an organization’s success, little support has been available from data product vendors with their legacy tools. This tool gap impedes even the most aggressive Data Governance initiatives, as the MDM Institute noted in its Enterprise Master Data Management market review: “In fact, most vendors will point to their data steward console as the acme of their data governance capabilities. In reality, what’s needed are formal processes, assisted by workflow software, to enable formalized decision making, documentation, and delegation regarding the rules rendered as part of the governance lifecycle. Another gaping hole in the data governance capabilities of the majority of MDM vendors is their inability to directly store and execute such governance-generated procedures as part of the MDM logic that controls the software, which in turn should enforce the governance. True data governance mandates the integration of people, process, and technologies via a formalized framework. These formal structures are inevitable as they are the key enablers of data governance policy functions, much more so than paper-based methodologies and accelerators/frameworks.”
TECHi2 designed and built a high-performance, innovative Enterprise Data Environment (EDE) as a fully integrated set of tools, processes, and specifications that process and manage data over a full lifecycle of data collection, transformation, integration, unification, and publishing in its SOA framework. Data quality is built into the EDE at every level and every step of processing, with direct traceability of technical functions to governance-defined business processes and rules. It is built entirely with open-architecture tools, methods, and specifications using modern, proven, high-performance techniques for web clients, web services, application services, database servers, metadata registries, metadata files, and secure data handling. It uses our innovative technology that provides a single integrated semantic framework with a high-performance data integration and cleansing engine, all tied together by a single managed set of data specifications. The KORS™ semantic unification framework seamlessly organizes and combines the key knowledge, concepts, rules, metadata, and specifications of an organization into a scalable unified data and metadata system. Never before have the separate and complicated technologies of knowledge engineering, ontologies, rules, metadata, data models, and high-performance computing been brought together in a practical engineering solution.
We used our in-depth knowledge of the technologies used in legacy systems and of those newly available for SOA, metadata, and data engineering. The challenges of low data quality and expensive, disjointed systems are not new; indeed, the DoD and industry have attempted to solve them with technology advances in databases, applications, and networks for decades. However, these attempts failed to realize their potential because they made a critical error: they relied on technology alone to solve what is inherently an integrated governance, process, and technology problem.
TECHi2 recognized this critical failure through our years of service as system and EA developers, and as technical experts reviewing systems in many Federal agencies. We also drew on our extensive expertise in the technologies themselves, arising from our key personnel who were scientists in these technology areas and who worked with the leading research sponsors as these technologies were being matured. This afforded us broad and deep knowledge of a new arsenal of technologies based on open standards and modular, extensible methods that offer a good opportunity to solve the data quality and sharing problem in an affordable, maintainable framework that can evolve and extend as new requirements, technologies, and processes inevitably emerge.
The operational overview is shown in the figure as an Enterprise Architecture (DODAF OV-1) diagram, which highlights the SOA architecture but also shows the critical design strategy of linking technology, process, and governance throughout the EDE. Indeed, the EDE name was chosen to emphasize that it is not just technology: it is an environment of integrated but modular technology tools operating according to authoritative rules specified by formally decreed organizational governance and business authorities. This is most clearly seen in the EDE’s explicit data QA/QC component in the OV-1, which acts as a quality gateway.
Key aspects of this technology are:
  • Governance based: Detailed guidance is collected from published documents and by working with governance groups following the EDE Data Unification process. These groups include the decreed authorities in each business domain as well as higher-level organizational authorities. This guidance is converted into a set of operational rules that are documented in EA views and EDE specifications such as code encyclopedias and data dictionaries. All subsequent EDE work is traceable and aligned to these rules.
  • Standards based: Only industry and Government standard techniques and tools are used in the EDE open architecture. These standards include: XML files, XML schemas, JavaScript Object Notation (JSON) data format, DHTML, HTTP and HTTPS protocols, Digital Object Identifiers (DOI), metadata schema, FIPS 140 encryption, and many others.
  • Conforms to Govt and industry policies and guidance: EDE follows relevant policies for data sharing, XML Naming and Design Rules, Security Technical Implementation Guides (STIG), Security Reference Architecture, Protection of Sensitive Agency Information, and others as directed by governance groups.
  • Open architecture: The EDE software architecture is built in modules and services, with internal functions using the same methods for data access that are exposed to external users. This allows additions and changes to methods through individual modules without complicated changes to the entire system. Even different programming languages can be used for different modules, since they are linked in their compiled form, allowing seamless integration of C#, Java, and other languages. The data and metadata use standard storage and I/O methods (e.g., databases, XML files) with the details of their access hidden from the service customer, so that customers do not need to tie their code to the physical structure of EDE components. EDE functions are accessed through a service using an API that follows industry standards for object-based software methods.
  • Scalable: EDE uses scalable techniques throughout and has been designed for high-performance distributed web operations. EDE uses low-overhead database commands (i.e., no complicated SQL predicates) and file commands to save and retrieve data and metadata, yielding very high data I/O, which is critical in high-performance transactional systems. This has been measured with a large multi-megabyte data load being processed round trip from web client to database and back to web client in 200 msec. The transactional database connection time is minimized, and the physical database connections are pooled per industry best practice, ensuring that a high-speed conduit is always available for a transaction without the slow opening and closing of network connections each time. Additionally, the web client request is transmitted using an AJAX callback, which minimizes both the data load between the web client and server and the web page refresh time. For web scalability, the number of calls to the server is reduced to the minimum necessary to retrieve or save data by performing application logic on the web client itself with JavaScript application functions, which are programmed with high-performance DHTML methods.
  • Unified data semantics: The TECHi2 EDE technology uses a data framework that semantically integrates an organization’s core knowledge into a unified data model with very little design and development cost, through ontology templates that allow rapid adaptation to a specific organization and application domain. This is a major technology advance over standard data modeling methods, which require extensive manual effort and typically cannot scale to a large organization’s need to unite many disparate but equally important functional perspectives. In fact, this is the single greatest source of failure in attempting to use older data technology in an SOA architectural approach. The conceptual model is mapped to the EDE logical data model, an object-based model using an entity-attribute-value (EAV) object structure that provides repeatable, consistent, flexible, extensible, and comprehensive coverage of data requirements with direct traceability to governance rules and the conceptual model.
  • Secure: We built multiple security mechanisms into EDE in all three tiers (web client, application server, data server) and in network transmissions. All users are assigned one or more domain-based roles that control their access and application privileges per business domain and per application function, such as for workflow processing of authoritative data artifacts (e.g., data dictionaries). The application and database servers require encrypted (FIPS 140) transactional credentials placed in the web service SOAP header or protected server memory. Sensitive data, including all PII data, is stored encrypted and decrypted only after all access control gates have been passed on the server, avoiding the risk of client-side hacking of credentials. Network transmission is secured with an SSL connection.
  • Metadata repository/registry: The Integrated Metadata Repository (IMR) stores rich semantic metadata in XML files, as required by policy and industry specifications for some types of content objects, as well as data information and artifacts such as data dictionaries, code encyclopedias, and workflow status. The IMR conforms to these policies and specifications. This metadata is accessible through a data access service method and can be viewed and edited in a portal.
  • Light-weight application services: The EDE design enables small but highly functional application modules offered as services. This builds on the foundation of integrated, harmonized data, with reusable functions provided as units of business logic, which is a central approach in an SOA. We build the application services following well-defined requirements and software engineering plans documented in DODAF BPMN and UML models. These application services can have sophisticated business logic with very small software footprints (typically <50 kB, versus >100 MB for a monolithic application with comparable functionality).
  • Rules-based data quality processing: A major component of the TECHi2 EDE technology is the integration of comprehensive data quality processing. Data quality is a foundational part of EDE and is required to be performed on all data. The data quality process is part of the overall Data Unification Process: the rules defining what is acceptable and unacceptable are collected from governance and business authorities and then placed in XML rules files. The EDE data quality engine uses these rules to confirm or modify data element values as they come into EDE, in either transaction or batch mode; values that pass are stored, and values that fail produce an exception report. These rules can be sophisticated multi-variable analyses, in contrast to the simple data quality processes found in most systems that merely compare spelling or perform table lookups against predefined transformations.
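The open-architecture principle above, in which internal modules and external consumers call the same data-access methods, can be sketched as follows. This is an illustrative sketch only; the interface and class names are invented for this example and are not the actual EDE API.

```python
from abc import ABC, abstractmethod

class DataAccessService(ABC):
    """Hypothetical shared data-access contract: internal modules and
    external service customers both program against this interface,
    so storage details stay hidden behind it."""
    @abstractmethod
    def get(self, object_id: str) -> dict: ...
    @abstractmethod
    def save(self, object_id: str, record: dict) -> None: ...

class InMemoryStore(DataAccessService):
    """Stand-in backend; a database or XML file store would plug in
    behind the same interface without changing any caller."""
    def __init__(self):
        self._data = {}
    def get(self, object_id):
        return self._data[object_id]
    def save(self, object_id, record):
        self._data[object_id] = dict(record)

# An internal module and an external client use the identical API:
store: DataAccessService = InMemoryStore()
store.save("emp-001", {"name": "Example"})
print(store.get("emp-001"))
```

Because callers depend only on `DataAccessService`, a module written in another language or backed by a different store can be swapped in without rippling changes through the system, which is the point of the open-architecture claim.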
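The connection pooling described under the scalability bullet can be illustrated with a minimal sketch: connections are opened once, handed out per transaction, and returned, avoiding the slow per-request open/close cycle. The `Connection` class here is a hypothetical stand-in for a real database driver; production pools also add validation, timeouts, and sizing policy.

```python
import queue

class Connection:
    """Toy stand-in for a physical database connection."""
    def execute(self, command: str) -> str:
        return f"ok: {command}"

class ConnectionPool:
    """Fixed-size pool: the expensive open happens once per connection,
    up front, and transactions reuse the already-open conduits."""
    def __init__(self, size: int):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(Connection())   # opened once, reused forever
    def acquire(self) -> Connection:
        return self._pool.get()            # blocks until one is free
    def release(self, conn: Connection) -> None:
        self._pool.put(conn)

pool = ConnectionPool(size=4)
conn = pool.acquire()
result = conn.execute("SELECT 1")          # transaction on a warm conduit
pool.release(conn)                         # back in the pool for the next caller
```

The thread-safe `queue.Queue` makes `acquire` block when all connections are busy, which naturally throttles load instead of opening new network connections per transaction.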
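The entity-attribute-value (EAV) structure named under the unified-data-semantics bullet stores each fact as an (entity, attribute, value) row, so new attributes extend the model without any schema change. A minimal sketch with invented example data:

```python
# Each fact is one (entity, attribute, value) row.
rows = [
    ("emp-001", "name",  "Jane Doe"),
    ("emp-001", "grade", "GS-12"),
    ("emp-002", "name",  "John Roe"),
]

def entity_view(entity_id, triples):
    """Pivot the EAV rows for one entity into a plain attribute dict."""
    return {attr: val for ent, attr, val in triples if ent == entity_id}

# Extending the model is just another row -- no ALTER TABLE needed:
rows.append(("emp-001", "clearance", "Secret"))
print(entity_view("emp-001", rows))
# {'name': 'Jane Doe', 'grade': 'GS-12', 'clearance': 'Secret'}
```

This flexibility is what lets one logical model cover many functional perspectives; the trade-off, handled in EDE by its governance traceability, is that the meaning of each attribute must be controlled somewhere other than the physical schema.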
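The server-side access gates described under the security bullet can be sketched as follows: the sensitive value stays encrypted until every check has passed, so a compromised client never sees a decryptable form. The role names, grant table, and toy `decrypt` placeholder are all hypothetical; a real deployment would call a FIPS 140 validated cryptographic module instead.

```python
# Hypothetical per-domain role grants (invented names for illustration).
ROLE_GRANTS = {"hr-steward": {"personnel": {"read_pii"}}}

def decrypt(ciphertext: bytes) -> str:
    # Placeholder only: reverses the bytes. A real system would invoke
    # a FIPS 140 validated decryption routine here.
    return ciphertext.decode("utf-8")[::-1]

def read_pii(user_roles, domain, ciphertext):
    """Gate first, decrypt second: the plaintext never exists until
    the server-side access checks have all passed."""
    allowed = any("read_pii" in ROLE_GRANTS.get(r, {}).get(domain, set())
                  for r in user_roles)
    if not allowed:
        raise PermissionError("access denied before decryption")
    return decrypt(ciphertext)

print(read_pii(["hr-steward"], "personnel", b"eoD enaJ"))  # prints: Jane Doe
```

The ordering is the design point: because decryption happens only after the gate, stolen ciphertext or a hijacked client session yields nothing readable.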
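The rules-based quality engine described in the last bullet can be sketched as a loop that applies XML-defined rules to incoming values and emits an exception report for failures. The XML element and attribute names below are invented for illustration and are not the EDE rules schema; real EDE rules can also be multi-variable, which this single-field sketch omits.

```python
import xml.etree.ElementTree as ET

# Governance-supplied rules live in an XML file (schema invented here).
RULES_XML = """
<rules>
  <rule element="age" type="range" min="0" max="120"/>
  <rule element="grade" type="allowed" values="GS-11,GS-12,GS-13"/>
</rules>
"""

def load_rules(xml_text):
    """Index each <rule> by the data element it governs."""
    return {r.get("element"): r.attrib for r in ET.fromstring(xml_text)}

def check(record, rules):
    """Apply the rules to one record; return the exception report."""
    exceptions = []
    for field, value in record.items():
        rule = rules.get(field)
        if rule is None:
            continue
        if rule["type"] == "range" and not (
                float(rule["min"]) <= float(value) <= float(rule["max"])):
            exceptions.append((field, value, "out of range"))
        elif rule["type"] == "allowed" and value not in rule["values"].split(","):
            exceptions.append((field, value, "not an allowed code"))
    return exceptions  # empty list means the record passes and is stored

rules = load_rules(RULES_XML)
print(check({"age": "34", "grade": "GS-12"}, rules))   # passes -> []
print(check({"age": "150", "grade": "GS-9"}, rules))   # two exceptions
```

Keeping the rules in data rather than code is the governance link: business authorities change the XML file, and the engine enforces the new policy on the next transaction or batch run without a software release.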