From:  Larry Woods <Larry.Woods at ucop.edu>
To:  Ma, Ying <yma at ais.ucla.edu>
Cc:  Chet Burgess <Chet.Burgess at ucop.edu>, David.Walker at ucop.edu, David L. Wasley <David.Wasley at ucop.edu>, Wu, Albert <albertwu at ucla.edu>
Subject:  RE: Re: UCNetID integrity issue at UCLA
Date:  Wed, 05 May 2004 07:36:19 -0700


Hi Ying,

Thanks.  I will begin correcting the 17 employee records. 

Regarding the treatment of uppercase and lowercase hashed values: my processing runs on UNIX, and my code specified the lowercase hashed ssn values to bypass (nine blanks, etc.).  It is probably a moot point now, since once we make the change in your student input file, all the hashed values will be lowercase.

Foreign students with nine blanks or nine zeroes as an ssn will still be assigned UC netids.  The processing will just not try to find a match on the hashed ssn if it is equivalent to nine blanks or zeroes.   Foreign students with blank or zero ssn's who later become UC employees and are assigned a real ssn will, I think, be assigned a new UC netid.   We'll just have to live with that.  The goal is to have one UC netid follow a person throughout their UC career, from student to employee to retiree.  However, realistically, there will be some cases like these foreign students where this will not happen.

We'll discuss the student lowercase hashed ssn resolution later.

 --Larry

At 10:01 PM 5/4/2004 -0700, you wrote:

Hi Larry,

 

Attached is a spreadsheet about the 17 ucla employees who have been assigned two different netids. This is a portion of the data in ftpusr4.get.udiremp as of yesterday. I have marked the correct ssn associated with each ucla_id in red, to the best of my knowledge. Please note there is one case where neither ssn is correct. However, today's employee udir file (ftpusr4.put.udirtele.daily) does contain the correct ssn for this particular ucla_id.

 

Thanks for explaining the cause of the problem in such detail. I basically don't have any questions about your explanation, and have no problem with using lowercase hashed ssn and student ID in my student file.  However, I do have a question about lowercase and uppercase hashed values being treated differently in the netid generation process. Since they are equivalent to the same value in plain text, regardless of whether they are lowercased or uppercased, they should be processed under the same rule.

 

Another issue I would like to bring up has to do with the nine-blank/zero SSNs in the student file. I haven't had a chance to check our student record system, but I assume many of these are international students who are not eligible to work in the US. Since April 2002, international students have no longer been eligible to apply for an SSN unless they can prove they are eligible to work in the US. Consequently, UCNetID will not cover these students even though they are actually enrolled students here at UCLA. In the employee file, there are also around 50 records with nine-blank/zero SSNs. A quick lookup in our payroll records shows most of these are new employees from foreign countries. They are either waiting for an SSN to be issued, or their payroll records have not yet been updated with the SSN they obtained after their starting date at UCLA.

 

Anyway, please let me know if you have a plan for correcting the current problematic records, and for a mechanism to prevent future problems if possible. Thanks a lot!

 

Ying

 


From: Larry Woods
Sent: Monday, May 03, 2004 2:45 PM
To: Ma, Ying
Cc: Chet Burgess; David.Walker at ucop.edu; David L. Wasley
Subject: Fwd: Re: UCNetID integrity issue at UCLA


 


Ying,

I may not have been too clear when I was explaining the effect of the uppercase hashed ssn in the student processing.  Normally, a hashed ssn that equated to nine blanks, nine zeroes, or nine 9's would not cause the same netid to be assigned to everyone with that hashed ssn, because my student processing bypasses that step if the hashed ssn was equivalent to one of those values.  However, my code used the lowercase hashed ssn values in testing whether to bypass that step.  In UCLA's case, since the uppercase hashed ssn for nine blanks did not match my lowercase hashed ssn for nine blanks, my code went ahead and tried to find a match (and was successful) in my high-level root table.  Thus, you ended up with 1132 records with the same netid based on an early record that had a hashed ssn equivalent to nine blanks.
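The failure described above can be reproduced in a few lines. This is only a minimal sketch, not the actual student processing: the function name `assign_netids`, the in-memory `root_table` dictionary, and the small netid numbers are hypothetical, and only the two hashed values quoted elsewhere in this thread are used as bypass values.

```python
from itertools import count

# Bypass values stored in lowercase, as the processing code had them.
# Both hashes are the ones quoted in this thread.
BYPASS_HASHES = frozenset({
    "302369263f8c7e2b64b62e3307e164d7e77802bb",  # nine blanks (per this thread)
    "0f58d5a5515f1a8a9d179aa58858b67b2f8a3388",  # nine zeroes (per this thread)
})


def assign_netids(hashed_ssns, normalize_case, first_netid=1):
    """Assign a netid to each record (hypothetical stand-in for the real job).

    Records whose hashed ssn is a bypass value always get a fresh netid.
    All other records are matched against a root table keyed on hashed ssn.
    """
    root_table = {}            # hashed ssn -> netid
    fresh = count(first_netid)  # source of new netids
    assigned = []
    for hashed in hashed_ssns:
        key = hashed.lower() if normalize_case else hashed
        if key in BYPASS_HASHES:
            # Placeholder ssn: skip the match step entirely.
            assigned.append(next(fresh))
            continue
        if key not in root_table:
            root_table[key] = next(fresh)
        assigned.append(root_table[key])
    return assigned


upper_blank = "302369263F8C7E2B64B62E3307E164D7E77802BB"

# Without case normalization the uppercase hash misses the bypass test,
# so every record collapses onto the first record's netid.
print(assign_netids([upper_blank] * 3, normalize_case=False))  # [1, 1, 1]

# With case normalization each placeholder record gets its own netid.
print(assign_netids([upper_blank] * 3, normalize_case=True))   # [1, 2, 3]
```

Folding the case before the bypass test makes the result independent of how a campus formats the hex string.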

Actually, I'm not certain whether my explanation clarified things.  Let me know if you still have questions.   We'll still have to coordinate later on when to make the switch to using lowercase hashed ssn (and hashed student ID) in your student file.

 --Larry


Date: Mon, 03 May 2004 08:17:24 -0700
To: "Ma, Ying" <yma at AIS.UCLA.EDU>
From: Larry Woods <Larry.Woods at ucop.edu>
Subject: Fwd: Re: UCNetID integrity issue at UCLA
Cc: "David L. Wasley" <david.wasley at ucop.edu>, UCFEDAUTH-L at LISTSERV.UCOP.EDU, Chet.Burgess at ucop.edu, "Bruce James" <bruce.james at ucop.edu>, "Wu, Albert" <albertwu at ucla.edu>

Ying,

First, let me address the issue of 17 employees with two different netids.  Send me a list of those, and I can make corrections.  In all cases in the past, this has been because of an incorrect ssn being submitted by the campus at some point.  The employee processing tries to catch this, but can't in all cases.  I need to know the two netids, employee name, employee ID, and which netid has the correct ssn for the person (if you can determine that from your end).

The student netids are more problematic.  Since the registrars would not allow a clear ssn to be sent to us for students, we only get a 40-byte one-way hashed ssn for them, which makes troubleshooting difficult.   I see now that netid 582011 has a hashed ssn of 302369263F8C7E2B64B62E3307E164D7E77802BB, which is the equivalent of nine blanks.  Netid 585216 has a hashed ssn of 0F58D5A5515F1A8A9D179AA58858B67B2F8A3388, which equates to nine zeroes.  So, I assume that all of these student records have blank or zero ssn's.  If they were assigned blank or zero ssn's initially, but they now have real ssn's, then that's another issue that we will have to address.

Besides this, though, there is a problem with your student hashed ssn's.  Normally our student processing bypasses records where the hashed ssn is equivalent to nine blanks, nine zeroes, nine 9's, or another unknown value that occurred many times (da39a3ee5e6b4b0d3255bfef95601890afd80709).  I see now that UCLA student records are being sent to us with the hashed ssn all uppercase.  All of the other campuses are sending the hashed ssn as lowercase, and the employee processing has lowercase hashed ssn's.  I just looked at the original specs for the student files.  The specs say that the SSN is to be passed as a 40-byte character field, one-way hashed value.  In the "General Notes" near the end of the specs, it does say that all alpha character data is to be passed as uppercase, but the hashed ssn is alphanumeric.   In any case, the uppercase hashed ssn is the reason that we are assigning two netids to UCLA student employees: we can't match the uppercase student hashed ssn with the lowercase employee hashed ssn's.
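A note on the unknown value: it is the SHA-1 digest of empty input, which would be consistent with records whose ssn field was empty when hashed. That reading assumes the one-way hash is SHA-1; the thread never names the algorithm, although a 40-character hex value fits it. A quick check, with the variable names being illustrative only:

```python
import hashlib

# The frequently occurring "unknown" hashed ssn quoted in this thread.
UNKNOWN_HASH = "da39a3ee5e6b4b0d3255bfef95601890afd80709"

# SHA-1 of zero bytes of input (assumes the campus hash is plain SHA-1).
empty_digest = hashlib.sha1(b"").hexdigest()

print(empty_digest == UNKNOWN_HASH)
```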

The problem of uppercase hashed ssn's on your student file should have been caught in the original testing, but wasn't.  I don't want you to change the student hashed ssn's to lowercase right now.  I will have to investigate how best to approach this problem.  In any case, student employees should not have two netids.  I will have to get back to you later about correcting the student hashed ssn's.

Let me know if you have questions about my explanation.  We can work together to straighten out the student employee records (but perhaps only for future student employees).

 --Larry


X-Sender: dwasley at popserv.ucop.edu
Date: Fri, 30 Apr 2004 17:23:52 -0700
To: "Ma, Ying" <yma at AIS.UCLA.EDU>
From: "David L. Wasley" <david.wasley at ucop.edu>
Subject: Re: UCNetID integrity issue at UCLA
Cc: larry.woods at ucop.edu, "Bruce James" <bruce.james at ucop.edu>,
        UCFEDAUTH-L at LISTSERV.UCOP.EDU

Wow.  That's awful!  Would you be willing to help us investigate how this might be happening?

I don't know if we receive UCLA_IDs as part of the nightly feed.  If we did, perhaps it would help disambiguate the identity matching heuristics.

Thanks for pointing this out.

        David Wasley
        IR&C
-----
At 4:06 PM -0700 on 4/30/04, Ma, Ying wrote:


Hello,
 
We are in the process of loading UCNetID into our Enterprise Directory for the purpose of UCFY/YBO Shibboleth pilot support. However, we have discovered a serious integrity issue with the UCNetIDs that we download from UCOP as the result of two daily processes:
 
-         FTPUSR4.GET.UDIREMP: Contains netid / ucla_id mapping for employees
-         FTPUSR4.GET.UDIRSTU: Contains netid / hashed ucla_id mapping for students
 
An analysis on these two files received as of yesterday shows: 
 
FTPUSR4.GET.UDIREMP - employee
-         17 ucla_ids are mapped to two different netids (BAD)
-         Each netid is mapped to exactly one ucla_id (GOOD)
 
FTPUSR4.GET.UDIRSTU - student
-         Each ucla_id is mapped to exactly one netid (GOOD)
-         Two netids are mapped to a number of different ucla_ids (VERY BAD). They are:
o       netid 582011 is mapped to 1132 different ucla_ids
o       netid 585216 is mapped to 3 different ucla_ids
 
Union of the two files after cleaning up all one-to-many mappings stated above
-         13430 ucla_ids are mapped to two different netids (OK? - These are probably student employees at UCLA. Should they get two NetIDs?)
-         Each netid is mapped to exactly one ucla_id (GOOD)
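
A check of this kind can be sketched as follows. This is not the analysis that was actually run: the helper name `one_to_many` and the sample rows are hypothetical, and the real files would first have to be parsed into (netid, ucla_id) pairs.

```python
from collections import defaultdict


def one_to_many(pairs):
    """Return {left: sorted rights} for each left value mapped to 2+ right values."""
    seen = defaultdict(set)
    for left, right in pairs:
        seen[left].add(right)
    return {left: sorted(rights) for left, rights in seen.items() if len(rights) > 1}


# Hypothetical (netid, ucla_id) rows standing in for a parsed download file.
rows = [
    ("582011", "A1"),
    ("582011", "A2"),  # one netid shared by two ucla_ids
    ("585216", "B1"),
    ("600001", "C1"),
    ("600002", "C1"),  # one ucla_id holding two netids
]

# netids mapped to more than one ucla_id
print(one_to_many(rows))
# ucla_ids mapped to more than one netid
print(one_to_many((ucla_id, netid) for netid, ucla_id in rows))
```

Running the same helper in both directions covers the GOOD/BAD checks for each file, and running it on the concatenated rows covers the union case.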
 
For the purpose of this pilot, we probably don't care about students, which puts us in a better position, since the worst scenario seems to be related to the student NetIDs. For people owning two different netids, either due to some sort of misassignment or because they are both student and employee, we can store both in ED and have Shib return both netid values. This is just a wild suggestion, as I'm not aware of any system-wide standards defined for this UCNetID attribute. (Is it going to be a required UC-wide attribute that must be included in the campus person schema? Is there going to be an auxiliary class of ucPerson with a UCNetID attribute?) Anyway, I thought I'd better bring it to the attention of this group before this integrity issue gets in the way of future implementation.
 
Ying Ma
Administrative Information Systems
UCLA