Future Directions
for the
University of California Corporate Information Systems
(Draft)
Information Resources & Communications
University of California
November 30, 1998
Table of Contents
This document describes a proposed new environment for the University of California Office of the President Corporate Systems and lays out an approach and timeline for implementing it. Deloitte & Touche, in their 1997 study of the Corporate Systems, recommended that the existing systems be replaced with a new data warehouse environment that enables end users to make better informed decisions by providing them with easy access to integrated, accurate, and timely information.
The vision of the new Corporate Data Warehouse environment includes data from at least eight subject areas: student applicants, students, employees, annuitants, financial and budget, contracts and grants, facilities, and equipment. It is anticipated that the campuses will be the primary source of data and that the input files to the current corporate systems can be used to populate the majority of the data warehouse. The task will be to identify data no longer needed, data missing or otherwise incomplete or inaccurate, data integrity problems across source systems, algorithms or techniques to clean up bad data, new data needed from campus systems, and new data needed from external sources.
The proposed Corporate Data Warehouse will integrate the subject areas on common related data fields and will significantly expand the amount of historical data maintained online.
The cornerstone of Corporate Data Warehouse will be a new relational database. Unlike the current systems, the architecture will be open, will be supported by a large number of 3rd party vendors so that "best of breed" query tools can be selected, and will employ industry standards. In order to provide easy access to information to UCOP staff for strategic decision making, the Corporate Data Warehouse will support a suite of query tools that will satisfy the needs of the majority of casual, power, and expert users. IR&C and the user community will evaluate tools and identify a common set to be supported. The Corporate Data Warehouse environment will include a central data dictionary to help all users understand the data and exploit the power the data warehouse.
The recommended approach for implementing the Corporate Data Warehouse is to develop a high level road map or model of the entire data warehouse and then build the pieces of it from the bottom up in a staged implementation. The goal will be to add value to some groups of users as soon as possible. It is recommended that the financial and budget subject area be the first or pilot project with the other subjects areas following.
As recognized by Deloitte & Touche in their study of the Corporate Systems, the implementation of the new Corporate Data Warehouse will require the support of management, IR&C, and the user community. The commitment of an appropriate level of resources will be necessary to ensure that the project succeeds.
The current Corporate Systems, developed in the early 1980's, are repositories of data that support the policy planning and evaluation, resource acquisition and allocation, and external reporting functions for which the of the Office of the President is responsible. The systems are widely depended upon to support the analytical and reporting needs of almost every department. They include the following:
Common data definitions are used at the point of interface between the campus systems and the corporate systems and, where possible, the same data elements, formats and definitions are used across systems. For each system there is a set of "Input Specifications" and data element definitions to which all campuses must adhere. Campuses are required to map local code values to the corporate definitions prior to transmitting files.
The systems contain the data determined to meet the corporate responsibilities of the Office of the President which, in most cases, is a subset of the data already collected by campuses for local operational needs. The quality of data received from the campuses has a high correlation to the degree to which the data is needed on the campuses. The input files from the campuses are edited to determine if the data is acceptable. If the data is not acceptable, the files must either be replaced or the data corrected in the following submission cycle depending on the system. For some of the systems, the edit process includes the review of a number of diagnostic reports which identify errors that are undetected by the edit.
A large number of standard hard copy reports are produced from the corporate systems. The reports are run in the production environment in batch mode and are distributed to users. With the exception of the equipment portion of the Corporate Equipment and Facilities System, the systems are available to authorized users for running ad hoc queries. FOCUS is the standard report generator. Detail and summary data can be downloaded to the user’s desktop for further manipulation.
In the summer of 1997 Information Resources & Communications engaged Deloitte & Touche to conduct an independent assessment of the corporate systems. Deloitte & Touche was specifically asked to evaluate the ease of access to the corporate data by management and staff for reporting and ad hoc retrieval of information; to assess whether the data currently captured in the systems meet the needs of Office of the President staff; and to recommend directions, including retrieval and reporting tools, that the University should explore to enhance the ability of the Office of the President to use corporate data more effectively in analysis and decision-making. The evaluation was conducted through a series of interviews with managers and analytical staff who are the primary users of the corporate data and with staff in Information Resources and Communications who are responsible for the maintenance, technology infrastructure, and data quality of the systems.
Deloitte & Touche completed the study in February 1998 and in it identified technology and process improvement findings and made recommendations to address these findings.
Technology Findings
Three of the six technology findings relate to the user interface to the corporate systems. Users and potential users of the systems find that the lack of an easy-to-use interface for querying and reporting, the inability to drill-up/down on corporate system information, and the lack a multi-dimensional querying capability are impediments and discourage them from using the systems. Users also feel that not enough historical information is maintained online and that the lack of integration between the corporate database limits the analyses possible and results in increased work.
Process Findings
Deloitte & Touche found that data quality in the corporate systems is generally good even though the University operates in a complex, decentralized systems environment. Users did identify specific areas where campuses fail to comply with the required data input specifications. Late or incomplete campus data submissions were also cited as a problem that results in delayed availability of data for analyses and reporting.
User documentation for the corporate systems is inadequate. Several of the databases are very complex with many data elements and users who do not frequently access the systems are often confused by this complexity. More documentation with business-like definitions, comments, and usage hints would be helpful to them.
There is user interest in making two additional databases available in the user access environment, the equipment database and the Corporate Personnel System Fiscal Year Database.
The final process improvement finding is the chargeback mechanism for access to the systems. Users state that charges for CPU cycles for accessing the data results in them either not using the systems to the fullest, delaying analyses by executing jobs at cheaper, off-hour rates, or in unnecessarily replicating large amounts of data just in case they might need it at a later date.
To address the technology and process findings, Deloitte & Touche recommended that, through a variety of short and long-term actions, the existing systems be replaced with a new data warehouse environment that consolidates data from the campuses into an integrated, reliable, and secure environment and that implements new business intelligence tools. IR&C has already taken steps to improve systems documentation and user training and continues its efforts to identify and resolve data quality issues. IR&C will review the recharge structure for the current and proposed systems.
Deloitte & Touche recognized that the steps to implement a new environment will require in-depth study of user data, access, and reporting requirements, analysis of data warehouse product offerings, the development of a detailed work plan, and the commitment of resources to develop the Corporate Data Warehouse.
This document, a step in this process, describes a proposed new environment and lays out a time-line and action plan for implementing it.
It is proposed that IR&C and the user community undertake the development activities necessary to migrate the current corporate information systems to a modern corporate data warehouse with the objective of enabling end users to make better informed decisions by providing them with easy access to integrated, accurate, and timely information.
A data warehouse is a read-only repository of corporate (enterprise-wide) information that has been extracted from transactional or operational systems for the purpose of analysis and reporting. By this definition, the current corporate systems comprise a data warehouse. Data in all of the systems has been extracted from campus operational systems and the systems are used exclusively for analysis and reporting. The systems, however, are not integrated, nor are they perceived as being easy to use. They reside in a proprietary database format (FOCUS) which precludes the introduction of simple, easy to use query tools. In contrast, a modern data warehouse environment generally includes a relational or multidimensional database in an open architecture that permits a wide variety of query tools to be selected and used.
A data warehouse includes many components that can be implemented in many ways. The components include:
A discussion of these components as they are envisioned in the new Corporate Data Warehouse follows.
The Corporate Data Warehouse will contain a broad range of subject areas that closely follows the subject areas of the current corporate systems. In addition, at least two new subject areas, annuitant data and equipment inventory data, will be required. The subject areas in the warehouse will closely match the functional organization of information on University activities. Once consultation with users has begun, additional subject areas may be identified.
Corporate Data Warehouse Subject Areas
| Student Applicants Demographics Academic History |
Students/Post Docs Demographics Performance Degrees Financial Aid Major Code Tables School Code Tables |
Employees Demographics Appointments Pay H&W Benefits Dependent Data Retirement Benefits Title Code Table Account/Fund Table |
Annuitants Demographics H&W Benefits Retirement Benefits |
| Financial & Budget Revenues Expenditures Plant Asset Budgeted $ & FTE Staffing Account/Fund Table |
Contracts & Grants Proposals Awards Account/Fund Table |
Facilities Buildings Rooms |
Equipment |
IR&C will be responsible for acquiring data and building, maintaining, and updating the Corporate Data Warehouse.
In most data warehouse implementations the identification and analysis of the sources of data is a major undertaking. In the case of implementing the Corporate Data Warehouse, the task will be far easier as it is anticipated that the input files to the current corporate systems can be used to populate the majority of the data warehouse. For the past fifteen years, the campuses have created these files by extracting data from their operational systems and IR&C has consolidated and edited the data from these 100+ source systems to build the corporate systems. The task will be to identify data no longer needed, data missing or otherwise incomplete or inaccurate, data integrity problems across source systems, algorithms or techniques to clean up bad data, new data needed from campus systems, and new data needed from external sources.
In the course of the Deloitte & Touche review, users identified a number of specific data items which should be added to the data warehouse. For example, the Contracts and Grants data should be expanded to include the identification of multiple principal/co-investigators with their employee numbers so that a link can be made to the personnel data. The proposal to present expanded information to the Board of Regents each year on budget and actual data by campus for all fund sources will require new information from the campuses. It is also possible that the financial subject area will need to be expanded to include data from the Treasurers' Investment System and the Endowment Database. Each of the proposed new data items will need to be evaluated and a source identified. If necessary, campuses will be consulted and revised data input specifications issued.
Sources of Data
| Subject Area | Source of Input |
| Student Applicants UG Applicant Grad Applicant School Code |
Campuses, ETS Campuses UCOP - School Code Table |
| Student/Post Doctoral Scholars 3rd Week Enrollment End of Term Enrollment Degrees Conferred Post Doctoral Scholars School Code Financial Aid Account/Fund Attributes |
Campuses Campuses Campuses Campuses UCOP - School Code Table Campuses Campuses |
| Employees Employee Activity Appointment Pay Distribution Dependents W-2 Academic Account/Fund Attributes Retirement Title Code |
Campuses, Labs Campuses Campuses, Labs Campuses, Labs Campuses, Labs Campuses, Labs Campuses, Labs Campuses Campuses UCOP - UCRS or BCS UCOP - TCS |
| Annuitants | UCOP - UCRS |
| Financial & Budget GL Balance Plant Asset Budget Staffing Account/Fund Attributes |
Campuses, UCOP Campuses Campuses Campuses Campuses |
| Contracts & Grants Proposals Awards Account/Fund Attributes |
Campuses Campuses Campuses |
| Facilities Buildings Rooms |
Campuses, UCOP Campuses, UCOP |
| Equipment | Campuses |
Data content refers to the level of detail and the amount of historical data that the Corporate Data Warehouse will contain for each broad subject area. The Corporate Data Warehouse will include data at the level of detail needed for analysis and reporting by UCOP staff. In most cases, at the lowest level this will be the same as in the current corporate systems. Applicants and enrolled students will be captured at the individual student level, employees at the pay distribution level for each employee, equipment at the item level, and facilities at the room level. Financial and budget data will be recorded at various levels depending on user requirements. For reporting efficiency and performance, the data warehouse will also contain commonly used aggregations of data.
The major differences between the current systems and the proposed Corporate Data Warehouse will be in the integration of the subject areas on common related data fields and in the significant expansion of the amount of historical data maintained online. The current systems are standalone databases and it is not possible for end users to easily link the databases on shared fields. Contract and grant project data in the Corporate Contracts and Grants database cannot be linked to expenditure data in the Corporate Financial System on the shared field "fund". The Corporate Data Warehouse will provide this type of integration. All of these interrelationships across the subject areas will need to be identified.
Users' needs will dictate the amount of historical information available. The following chart displays the current database cycles available and those proposed.
|
Subject Area |
Current |
Proposed |
| Student Applicants | ||
| Undergrad Admission | Current year + 2 prior Fall Term | Current year + 3 prior years |
| Grad Admission | Current Fall + 1 prior Fall | Current Fall + 1-5 prior Fall |
| Student/Post Docs | ||
| CSS Registration Database | Current year + 1 prior | Current year + 2 prior |
| CSS Financial Aid | Current year + 1 prior | Current year + 2 prior |
| CSS UG Longitudinal | 7 entering cohorts for 7 years | 7 entering cohorts for 7 years |
| CSS Grad Longitudinal | 12 entering cohorts for 12 years | 12 entering cohorts for 12 years |
| Employees | ||
| CPS October | 1 cycle | 5 cycles |
| CPS April | 1 cycle | 1 cycle |
| CPS Fiscal Year | Not available in User Access | Current year + previous year |
| CPS Monthly | Current month + previous month | Current month + 2-5 previous months |
| CPS Academic File | Not available in User Access | 2 years |
| CPS Dependent Data | Not available in User Access | Current month + 2-5 previous months |
| Annuitants | ||
| Membership | Current month | |
| Insurance | Current month | |
| Financial/Budget | ||
| CFS Master (GL Bal) | Current month + 2 prior quarters + 1 prior year | Current month + current year quarters and prior year quarters + 5 prior years |
| CFS Budget | Current + prior cycle | Current month + current year quarters and prior year quarters + 5 prior years |
| Staffing | Current year + prior year | Current year + prior year |
| Contracts & Grants | ||
| Proposals | Current year + prior year | Current year + prior year |
| Awards | Active awards until no activity for 5 years | Active awards until no activity for 5 years |
| Facilities | ||
| Facilities Database | 3 Fall cycles | 3 Fall cycles |
| Equipment | ||
| Equipment Database | Not available in User Access | 2 years |
The central data warehouse database will be the cornerstone of the Corporate Data Warehouse environment. Characteristics of the warehouse, its size, user ad hoc reporting and business requirements, and the UCOP computing environment, will drive the selection of the database/s. While the data warehouse is estimated to be small to medium in size, the database needs to be scalable. It should also be open in its architecture, be supported by a large number of 3rd party vendors so that "best of breed" tools can be selected, employ industry standards, and fit within the overall IR&C environment. It should support interactive query with rapid response for 15-30 concurrent users.
The database needs to also support "batch" processing in production mode of input file edits, database loads, and the running of standard reports. The needs of many users today are satisfied entirely or in part by standard hardcopy reports produced from the Corporate Systems. The Corporate Personnel System has 345 standard production reports produced on varying frequencies (monthly, quarterly, annually). Corporate Student, Financial, and Contracts & Grants are sources of large numbers of standard reports as well. The standard production reports will need to be reviewed and a determination made as to whether they are still required in hardcopy form. It is anticipated that the improved user access environment envisioned for the Corporate Data Warehouse will decrease the need for hardcopy standard production reports.
A careful review of the strengths and weaknesses of various relational database management systems and specialized multidimensional data warehouse products offered by vendors will be undertaken to find a suitable fit for the Corporate Data Warehouse.
The principal purpose of creating the Corporate Data Warehouse is to provide easy access to information to UCOP staff for strategic decision making. Since the data warehouse will have a broad range of subject areas and users across many departments, there will inevitably be different ways of looking at the data and different reporting requirements. It is highly unlikely that any single query tool on the market can handle the access needs of all users. The Corporate Data Warehouse will, therefore, support a suite of tools that will satisfy the needs of the majority of casual, power, and expert users.
The query or business intelligence tools available today fall into the following categories:
It is anticipated that the Corporate Data Warehouse environment will support a variety of access tools and provide a rich array of reports through a managed reporting or EIS environment. IR&C and the user community will evaluate tools and identify a common set to be supported. Tools selected should be easy to use with an intuitive, consistent interface, have a short learning curve, provide high performance, and be Web-enabled where possible. It will also be possible for users to choose other tools to meet specialized needs such as advanced statistical processing.
Metadata is data about the data in the warehouse. The Corporate Data Warehouse will contain hundreds of data elements in a complex database structure. The metadata repository will make up an online data dictionary or directory. It will provide interactive access to users to understand the content and to find data. The metadata repository will answer such questions as
What data does the warehouse contain?
Where did it come from?
How was it changed or mapped from the original source?
How has it changed over time?
When was it last updated?
It will also provide information about the data from a user perspective.
The Corporate Data Warehouse environment will include a central metadata repository to help all users exploit the power the data warehouse.
Several of the components of the Corporate Data Warehouse architecture, the sources, the database, the access tools, the metadata repository, have been discussed. A few other components need to be mentioned as they will add to the complexity of the environment.
The environment will include tools that extract and transform data from source files and load data into the database (ETL tools) and an administrative layer that will provide services such as security and reporting of usage and status.
At some point in the data warehouse implementation the environment might also include one or more "data marts." A data mart is a subset of data of the central data warehouse designed to support the needs of a specific department or set of users. For example, it may be appropriate to identify a subset of financial, student, and employee data to combine with budget data to support the needs of the Budget Office. A data mart is often tuned for specific reporting purposes and consists of aggregate or multidimensional data. A mart may physically reside in the same database as the central data warehouse or it may reside in a separate physical database. Data marts supporting the particular needs of UCOP departments could be created by either IR&C or by the departments themselves.

Many data warehouse implementations have failed because they employed a centralized top-down approach. Implementers attempted to collect the entire span of subject area information into a unified view. These implementations required a long time and large expenditures and end users gained no value until the entire warehouse was developed, if it ever was.
The recommended approach for the Corporate Data Warehouse is to develop a high level road map or data model of the entire data warehouse as it is envisioned and then build the pieces of it from the bottom up in a staged implementation. The corporate data model will identify the boundaries and linkages between subject areas, provide consistency, and will guide the development as various pieces are added. The goal will be to add value to some groups of users as soon as possible.
The financial and budget data as it is currently captured in the Corporate Financial System is a good candidate subject area for a first or pilot project for a number of reasons:
Once the financial/budget data has been added to the Corporate Data Warehouse, the staged implementation of other subject areas will begin.
As recognized by Deloitte & Touche in their study of the Corporate Systems, the implementation of the new Corporate Data Warehouse will require the support of management, IR&C, and the user community. The commitment of an appropriate level of resources will be necessary to ensure that the project succeeds.
The implementation of the Corporate Data Warehouse, like the original development of the corporate systems, is a major systems development project. The proposed implementation of a pilot project followed by the staged implementation of other subject areas will likely extend over several years. Implementation of the pilot application will provide experience with the database product, the extraction, transformation, and load tools, user query tools, and the development of a managed reporting environment. With this experience, realistic estimates can be made on the resources and time required to fully implement the Corporate Data Warehouse. The less complex subject areas, facilities, equipment, and contracts and grants, will take less time to develop than employee and student. A draft timeline for the project is included below.
|
Task |
Start Date |
End Date |
| Project Planning Phase | 11/2/98 | 1/1/99 |
|
12/1498 | 1/15/99 |
|
11/30/98 | 12/31/98 |
|
11/2/98 | 1/1/99 |
| Identify Data Warehouse Architecture | 11/3/98 | 3/19/99 |
|
11/3/98 | 11/30/98 |
|
11/2/98 | 1/1/99 |
|
1/4/99 | 2/26/99 |
|
12/3/98 | 3/19/99 |
| Stage 1 - Pilot Project | 12/14/98 | 10/1/99 |
|
12/14/98 | 12/31/98 |
|
12/14/98 | 1/15/99 |
|
1/18/99 | 6/24/99 |
|
1/4/99 | 4/30/99 |
|
2/2/99 | 4/30/99 |
|
5/3/99 | 6/4/99 |
|
3/1/99 | 7/30/99 |
|
5/3/99 | 9/3/99 |
|
8/3/99 | 10/1/99 |
| Stage 2 - 2nd Subject Area | 5/3/99 | 2/18/00 |
|
5/3/99 | 2/18/00 |
| Stage 3 - 3rd Subject Area | 10/1/99 | 7/20/00 |
|
10/1/99 | 7/20/00 |
| Stage 4 - 4th Subject Area | 3/1/00 | 12/19/00 |
|
3/1/00 | 12/19/00 |
| Stage 5 - 5th Subject Area | 10/2/00 | 4/30/01 |
|
10/2/00 | 4/30/01 |
| Stage 6- 6th Subject Area | 1/1/01 | 7/31/01 |
|
1/1/01 | 7/31/01 |