Future Directions

for the

University of California Corporate Information Systems

(Draft)

 

   

Information Resources & Communications
University of California

November 30, 1998

 

 

 

 Table of Contents

 


Summary

This document describes a proposed new environment for the University of California Office of the President Corporate Systems and lays out an approach and timeline for implementing it. Deloitte & Touche, in their 1997 study of the Corporate Systems, recommended that the existing systems be replaced with a new data warehouse environment that enables end users to make better informed decisions by providing them with easy access to integrated, accurate, and timely information.

The vision of the new Corporate Data Warehouse environment includes data from at least eight subject areas: student applicants, students, employees, annuitants, financial and budget, contracts and grants, facilities, and equipment. It is anticipated that the campuses will be the primary source of data and that the input files to the current corporate systems can be used to populate the majority of the data warehouse. The task will be to identify data no longer needed, data missing or otherwise incomplete or inaccurate, data integrity problems across source systems, algorithms or techniques to clean up bad data, new data needed from campus systems, and new data needed from external sources.

The proposed Corporate Data Warehouse will integrate the subject areas on common related data fields and will significantly expand the amount of historical data maintained online.

The cornerstone of Corporate Data Warehouse will be a new relational database. Unlike the current systems, the architecture will be open, will be supported by a large number of 3rd party vendors so that "best of breed" query tools can be selected, and will employ industry standards. In order to provide easy access to information to UCOP staff for strategic decision making, the Corporate Data Warehouse will support a suite of query tools that will satisfy the needs of the majority of casual, power, and expert users. IR&C and the user community will evaluate tools and identify a common set to be supported. The Corporate Data Warehouse environment will include a central data dictionary to help all users understand the data and exploit the power the data warehouse.

The recommended approach for implementing the Corporate Data Warehouse is to develop a high level road map or model of the entire data warehouse and then build the pieces of it from the bottom up in a staged implementation. The goal will be to add value to some groups of users as soon as possible. It is recommended that the financial and budget subject area be the first or pilot project with the other subjects areas following.

As recognized by Deloitte & Touche in their study of the Corporate Systems, the implementation of the new Corporate Data Warehouse will require the support of management, IR&C, and the user community. The commitment of an appropriate level of resources will be necessary to ensure that the project succeeds. 


Background

Current Systems

The current Corporate Systems, developed in the early 1980's, are repositories of data that support the policy planning and evaluation, resource acquisition and allocation, and external reporting functions for which the of the Office of the President is responsible. The systems are widely depended upon to support the analytical and reporting needs of almost every department. They include the following:

Common data definitions are used at the point of interface between the campus systems and the corporate systems and, where possible, the same data elements, formats and definitions are used across systems. For each system there is a set of "Input Specifications" and data element definitions to which all campuses must adhere. Campuses are required to map local code values to the corporate definitions prior to transmitting files.

The systems contain the data determined to meet the corporate responsibilities of the Office of the President which, in most cases, is a subset of the data already collected by campuses for local operational needs. The quality of data received from the campuses has a high correlation to the degree to which the data is needed on the campuses. The input files from the campuses are edited to determine if the data is acceptable. If the data is not acceptable, the files must either be replaced or the data corrected in the following submission cycle depending on the system. For some of the systems, the edit process includes the review of a number of diagnostic reports which identify errors that are undetected by the edit.

A large number of standard hard copy reports are produced from the corporate systems. The reports are run in the production environment in batch mode and are distributed to users. With the exception of the equipment portion of the Corporate Equipment and Facilities System, the systems are available to authorized users for running ad hoc queries. FOCUS is the standard report generator. Detail and summary data can be downloaded to the user’s desktop for further manipulation.

Deloitte & Touche Study

In the summer of 1997 Information Resources & Communications engaged Deloitte & Touche to conduct an independent assessment of the corporate systems. Deloitte & Touche was specifically asked to evaluate the ease of access to the corporate data by management and staff for reporting and ad hoc retrieval of information; to assess whether the data currently captured in the systems meet the needs of Office of the President staff; and to recommend directions, including retrieval and reporting tools, that the University should explore to enhance the ability of the Office of the President to use corporate data more effectively in analysis and decision-making. The evaluation was conducted through a series of interviews with managers and analytical staff who are the primary users of the corporate data and with staff in Information Resources and Communications who are responsible for the maintenance, technology infrastructure, and data quality of the systems.

Deloitte & Touche completed the study in February 1998 and in it identified technology and process improvement findings and made recommendations to address these findings.

To address the technology and process findings, Deloitte & Touche recommended that, through a variety of short and long-term actions, the existing systems be replaced with a new data warehouse environment that consolidates data from the campuses into an integrated, reliable, and secure environment and that implements new business intelligence tools. IR&C has already taken steps to improve systems documentation and user training and continues its efforts to identify and resolve data quality issues. IR&C will review the recharge structure for the current and proposed systems.

Deloitte & Touche recognized that the steps to implement a new environment will require in-depth study of user data, access, and reporting requirements, analysis of data warehouse product offerings, the development of a detailed work plan, and the commitment of resources to develop the Corporate Data Warehouse.

This document, a step in this process, describes a proposed new environment and lays out a time-line and action plan for implementing it.

The Proposed New Environment

It is proposed that IR&C and the user community undertake the development activities necessary to migrate the current corporate information systems to a modern corporate data warehouse with the objective of enabling end users to make better informed decisions by providing them with easy access to integrated, accurate, and timely information.

A data warehouse is a read-only repository of corporate (enterprise-wide) information that has been extracted from transactional or operational systems for the purpose of analysis and reporting. By this definition, the current corporate systems comprise a data warehouse. Data in all of the systems has been extracted from campus operational systems and the systems are used exclusively for analysis and reporting. The systems, however, are not integrated, nor are they perceived as being easy to use. They reside in a proprietary database format (FOCUS) which precludes the introduction of simple, easy to use query tools. In contrast, a modern data warehouse environment generally includes a relational or multidimensional database in an open architecture that permits a wide variety of query tools to be selected and used.

A data warehouse includes many components that can be implemented in many ways. The components include:

A discussion of these components as they are envisioned in the new Corporate Data Warehouse follows.

Subject Areas

The Corporate Data Warehouse will contain a broad range of subject areas that closely follows the subject areas of the current corporate systems. In addition, at least two new subject areas, annuitant data and equipment inventory data, will be required. The subject areas in the warehouse will closely match the functional organization of information on University activities. Once consultation with users has begun, additional subject areas may be identified.

Corporate Data Warehouse Subject Areas

Student Applicants

Demographics
Academic History
Students/Post Docs

Demographics
Performance
Degrees
Financial Aid
Major Code Tables
School Code Tables
Employees

Demographics
Appointments
Pay
H&W Benefits
Dependent Data
Retirement Benefits
Title Code Table
Account/Fund Table
Annuitants

Demographics
H&W Benefits
Retirement Benefits
Financial & Budget

Revenues
Expenditures
Plant Asset
Budgeted $ & FTE
Staffing
Account/Fund Table
Contracts & Grants

Proposals
Awards
Account/Fund Table
Facilities

Buildings
Rooms
Equipment

IR&C will be responsible for acquiring data and building, maintaining, and updating the Corporate Data Warehouse.

Sources of Data

In most data warehouse implementations the identification and analysis of the sources of data is a major undertaking. In the case of implementing the Corporate Data Warehouse, the task will be far easier as it is anticipated that the input files to the current corporate systems can be used to populate the majority of the data warehouse. For the past fifteen years, the campuses have created these files by extracting data from their operational systems and IR&C has consolidated and edited the data from these 100+ source systems to build the corporate systems. The task will be to identify data no longer needed, data missing or otherwise incomplete or inaccurate, data integrity problems across source systems, algorithms or techniques to clean up bad data, new data needed from campus systems, and new data needed from external sources.

In the course of the Deloitte & Touche review, users identified a number of specific data items which should be added to the data warehouse. For example, the Contracts and Grants data should be expanded to include the identification of multiple principal/co-investigators with their employee numbers so that a link can be made to the personnel data. The proposal to present expanded information to the Board of Regents each year on budget and actual data by campus for all fund sources will require new information from the campuses. It is also possible that the financial subject area will need to be expanded to include data from the Treasurers' Investment System and the Endowment Database. Each of the proposed new data items will need to be evaluated and a source identified. If necessary, campuses will be consulted and revised data input specifications issued.

Sources of Data

Subject Area Source of Input
Student Applicants
UG Applicant
Grad Applicant
School Code

Campuses, ETS
Campuses
UCOP - School Code Table
Student/Post Doctoral Scholars
3rd Week Enrollment
End of Term Enrollment
Degrees Conferred
Post Doctoral Scholars
School Code
Financial Aid
Account/Fund Attributes

Campuses
Campuses
Campuses
Campuses
UCOP - School Code Table
Campuses
Campuses
Employees
Employee
Activity
Appointment
Pay Distribution
Mail
Dependents
W-2
Academic
Account/Fund Attributes
Retirement
Title Code

Campuses, Labs
Campuses
Campuses, Labs
Campuses, Labs
Campuses, Labs
Campuses, Labs
Campuses, Labs
Campuses
Campuses
UCOP - UCRS or BCS
UCOP - TCS
Annuitants UCOP - UCRS
Financial & Budget
GL Balance
Plant Asset
Budget
Staffing
Account/Fund Attributes

Campuses, UCOP
Campuses
Campuses
Campuses
Campuses
Contracts & Grants
Proposals
Awards
Account/Fund Attributes

Campuses
Campuses
Campuses
Facilities
Buildings
Rooms

Campuses, UCOP
Campuses, UCOP
Equipment Campuses

 

Data Content

Data content refers to the level of detail and the amount of historical data that the Corporate Data Warehouse will contain for each broad subject area. The Corporate Data Warehouse will include data at the level of detail needed for analysis and reporting by UCOP staff. In most cases, at the lowest level this will be the same as in the current corporate systems. Applicants and enrolled students will be captured at the individual student level, employees at the pay distribution level for each employee, equipment at the item level, and facilities at the room level. Financial and budget data will be recorded at various levels depending on user requirements. For reporting efficiency and performance, the data warehouse will also contain commonly used aggregations of data.

The major differences between the current systems and the proposed Corporate Data Warehouse will be in the integration of the subject areas on common related data fields and in the significant expansion of the amount of historical data maintained online. The current systems are standalone databases and it is not possible for end users to easily link the databases on shared fields. Contract and grant project data in the Corporate Contracts and Grants database cannot be linked to expenditure data in the Corporate Financial System on the shared field "fund". The Corporate Data Warehouse will provide this type of integration. All of these interrelationships across the subject areas will need to be identified.

Users' needs will dictate the amount of historical information available. The following chart displays the current database cycles available and those proposed.

Subject Area

Current

Proposed

Student Applicants
Undergrad Admission Current year + 2 prior Fall Term Current year + 3 prior years
Grad Admission Current Fall + 1 prior Fall Current Fall + 1-5 prior Fall
Student/Post Docs
CSS Registration Database Current year + 1 prior Current year + 2 prior
CSS Financial Aid Current year + 1 prior Current year + 2 prior
CSS UG Longitudinal 7 entering cohorts for 7 years 7 entering cohorts for 7 years
CSS Grad Longitudinal 12 entering cohorts for 12 years 12 entering cohorts for 12 years
Employees
CPS October 1 cycle 5 cycles
CPS April 1 cycle 1 cycle
CPS Fiscal Year Not available in User Access Current year + previous year
CPS Monthly Current month + previous month Current month + 2-5 previous months
CPS Academic File Not available in User Access 2 years
CPS Dependent Data Not available in User Access Current month + 2-5 previous months
Annuitants
Membership Current month  
Insurance Current month  
Financial/Budget
CFS Master (GL Bal) Current month + 2 prior quarters + 1 prior year Current month + current year quarters and prior year quarters + 5 prior years
CFS Budget Current + prior cycle Current month + current year quarters and prior year quarters + 5 prior years
Staffing Current year + prior year Current year + prior year
Contracts & Grants
Proposals Current year + prior year Current year + prior year
Awards Active awards until no activity for 5 years Active awards until no activity for 5 years
Facilities
Facilities Database 3 Fall cycles 3 Fall cycles
Equipment
Equipment Database Not available in User Access 2 years

 

Database

The central data warehouse database will be the cornerstone of the Corporate Data Warehouse environment. Characteristics of the warehouse, its size, user ad hoc reporting and business requirements, and the UCOP computing environment, will drive the selection of the database/s. While the data warehouse is estimated to be small to medium in size, the database needs to be scalable. It should also be open in its architecture, be supported by a large number of 3rd party vendors so that "best of breed" tools can be selected, employ industry standards, and fit within the overall IR&C environment. It should support interactive query with rapid response for 15-30 concurrent users.

The database needs to also support "batch" processing in production mode of input file edits, database loads, and the running of standard reports. The needs of many users today are satisfied entirely or in part by standard hardcopy reports produced from the Corporate Systems. The Corporate Personnel System has 345 standard production reports produced on varying frequencies (monthly, quarterly, annually). Corporate Student, Financial, and Contracts & Grants are sources of large numbers of standard reports as well. The standard production reports will need to be reviewed and a determination made as to whether they are still required in hardcopy form. It is anticipated that the improved user access environment envisioned for the Corporate Data Warehouse will decrease the need for hardcopy standard production reports.

A careful review of the strengths and weaknesses of various relational database management systems and specialized multidimensional data warehouse products offered by vendors will be undertaken to find a suitable fit for the Corporate Data Warehouse.

Access Tools

The principal purpose of creating the Corporate Data Warehouse is to provide easy access to information to UCOP staff for strategic decision making. Since the data warehouse will have a broad range of subject areas and users across many departments, there will inevitably be different ways of looking at the data and different reporting requirements. It is highly unlikely that any single query tool on the market can handle the access needs of all users. The Corporate Data Warehouse will, therefore, support a suite of tools that will satisfy the needs of the majority of casual, power, and expert users.

The query or business intelligence tools available today fall into the following categories:

It is anticipated that the Corporate Data Warehouse environment will support a variety of access tools and provide a rich array of reports through a managed reporting or EIS environment. IR&C and the user community will evaluate tools and identify a common set to be supported. Tools selected should be easy to use with an intuitive, consistent interface, have a short learning curve, provide high performance, and be Web-enabled where possible. It will also be possible for users to choose other tools to meet specialized needs such as advanced statistical processing.

Metadata

Metadata is data about the data in the warehouse. The Corporate Data Warehouse will contain hundreds of data elements in a complex database structure. The metadata repository will make up an online data dictionary or directory. It will provide interactive access to users to understand the content and to find data. The metadata repository will answer such questions as

It will also provide information about the data from a user perspective.

The Corporate Data Warehouse environment will include a central metadata repository to help all users exploit the power the data warehouse.

Architecture

Several of the components of the Corporate Data Warehouse architecture, the sources, the database, the access tools, the metadata repository, have been discussed. A few other components need to be mentioned as they will add to the complexity of the environment.

The environment will include tools that extract and transform data from source files and load data into the database (ETL tools) and an administrative layer that will provide services such as security and reporting of usage and status.

At some point in the data warehouse implementation the environment might also include one or more "data marts." A data mart is a subset of data of the central data warehouse designed to support the needs of a specific department or set of users. For example, it may be appropriate to identify a subset of financial, student, and employee data to combine with budget data to support the needs of the Budget Office. A data mart is often tuned for specific reporting purposes and consists of aggregate or multidimensional data. A mart may physically reside in the same database as the central data warehouse or it may reside in a separate physical database. Data marts supporting the particular needs of UCOP departments could be created by either IR&C or by the departments themselves.

 

Implementation Plan

Approach

Many data warehouse implementations have failed because they employed a centralized top-down approach. Implementers attempted to collect the entire span of subject area information into a unified view. These implementations required a long time and large expenditures and end users gained no value until the entire warehouse was developed, if it ever was.

The recommended approach for the Corporate Data Warehouse is to develop a high level road map or data model of the entire data warehouse as it is envisioned and then build the pieces of it from the bottom up in a staged implementation. The corporate data model will identify the boundaries and linkages between subject areas, provide consistency, and will guide the development as various pieces are added. The goal will be to add value to some groups of users as soon as possible.

The financial and budget data as it is currently captured in the Corporate Financial System is a good candidate subject area for a first or pilot project for a number of reasons:

Once the financial/budget data has been added to the Corporate Data Warehouse, the staged implementation of other subject areas will begin.

As recognized by Deloitte & Touche in their study of the Corporate Systems, the implementation of the new Corporate Data Warehouse will require the support of management, IR&C, and the user community. The commitment of an appropriate level of resources will be necessary to ensure that the project succeeds.

Timeline

The implementation of the Corporate Data Warehouse, like the original development of the corporate systems, is a major systems development project. The proposed implementation of a pilot project followed by the staged implementation of other subject areas will likely extend over several years. Implementation of the pilot application will provide experience with the database product, the extraction, transformation, and load tools, user query tools, and the development of a managed reporting environment. With this experience, realistic estimates can be made on the resources and time required to fully implement the Corporate Data Warehouse. The less complex subject areas, facilities, equipment, and contracts and grants, will take less time to develop than employee and student. A draft timeline for the project is included below.

Task

Start Date

End Date

Project Planning Phase 11/2/98 1/1/99
    Approval "Vision" & Implementation Plan
12/1498 1/15/99
    Identify Staff Resources
11/30/98 12/31/98
    Develop Enterprise Data Model
11/2/98 1/1/99
Identify Data Warehouse Architecture 11/3/98 3/19/99
    Determine Overall Architecture & Flow
11/3/98 11/30/98
    Select Database Vendor and ETL Tools
11/2/98 1/1/99
    Install Database Products
1/4/99 2/26/99
    Complete Training on Vendor Products
12/3/98 3/19/99
Stage 1 - Pilot Project 12/14/98 10/1/99
    Identify and Confirm Pilot Application
12/14/98 12/31/98
    Identify Primary User Group
12/14/98 1/15/99
    Identify User Requirements
1/18/99 6/24/99
    Design Data Repositories
1/4/99 4/30/99
    Evaluate User Access Tools
2/2/99 4/30/99
    Load Data Repositories for Testing
5/3/99 6/4/99
    Develop EIS Reporting
3/1/99 7/30/99
    Test User Access Tools and EIS
5/3/99 9/3/99
    User Training
8/3/99 10/1/99
Stage 2 - 2nd Subject Area 5/3/99 2/18/00
    Implementation of 2nd Subject Area
5/3/99 2/18/00
Stage 3 - 3rd Subject Area 10/1/99 7/20/00
    Implementation of 3rd Subject Area
10/1/99 7/20/00
Stage 4 - 4th Subject Area 3/1/00 12/19/00
    Implementation of 4th Subject Area
3/1/00 12/19/00
Stage 5 - 5th Subject Area 10/2/00 4/30/01
    Implementation of 5th Subject Area
10/2/00 4/30/01
Stage 6- 6th Subject Area 1/1/01 7/31/01
    Implementation of 6th Subject Area
1/1/01 7/31/01



November 30, 1998