Manager

FS#178 - Export to Stata - missing values incorrect

Attached to Project: Manager
Opened by Jamie Hockin (jhockin) - Tuesday, 24 October 2017, 19:29 GMT
Task Type: Bug Report
Category: Other
Status: Assigned
Assigned To: Torsten Bonde Christiansen (torstenchr)
Operating System: Mac
Severity: Medium
Priority: Normal
Reported Version: 4.2
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 0
Private: No

Details

See attached .epx

Export to Stata (tested with V8 and V12) does not use Stata missing values for declared missing data (i.e. data matching one of the missing value label values).

System missing fields are always exported properly and recognized as system missing in Stata.

In this test file:
a declared missing value of 9 / 99 in a 1 / 2 digit integer has the value 100 in Stata, while the value labels use much larger values for missing
*** 100 is the largest legal number in byte data; the first missing value (corresponding to EpiData system missing) is 101
a declared missing value of 999 / 9999 in a 3 / 4 digit integer has the value 32740, with the same large values in the value labels
*** 32740 is the largest legal number in small integer data; the first missing value (corresponding to EpiData system missing) is 32741
the value labels have the value 134217700 for the missing values (8 / 88 / 888 / 8888)
and the value 134217701 for the missing values (9 / 99 / 999 / 9999)
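
A quick arithmetic check on the two label codes, as a Python sketch. It uses the codes as they appear in the Analysis output further down (134217700 and 134217701) and the .dta limits for long data quoted in the feature request. The constant names are made up for the illustration, and the check says nothing about why the exporter writes these numbers.

```python
# Label codes as shown in the Analysis output of this report.
LABEL_NA = 134217700
LABEL_MISSING = 134217701

# Limits for Stata "long" data, as quoted from the .dta specification.
LONG_MAX_NONMISSING = 0x7FFFFFE4  # 2,147,483,620
LONG_SYSMIS = 0x7FFFFFE5          # 2,147,483,621, the code for "."

print(hex(LABEL_NA))       # 0x7ffffe4
print(hex(LABEL_MISSING))  # 0x7ffffe5

# Both label codes lie below the largest nonmissing long value.
print(LABEL_NA < LONG_MAX_NONMISSING)       # True
print(LABEL_MISSING < LONG_MAX_NONMISSING)  # True
```

Both codes fall inside the valid range for long data, so Stata would read them as ordinary numbers rather than as missing values.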

With larger integers (5+ digits), the value labels for missing also follow this pattern, but at least the data are coded to match the new missing values.

**The bug**: declared missing values in the data should match the value label for that type of missing data.
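
The mismatch can be expressed as a small consistency check. The Python sketch below is only an illustration built from the values listed in this report; `codes_without_label` is a hypothetical helper, not part of EpiData or Stata, and it assumes a field in which every legitimate value carries a label (as in the attached test file).

```python
from typing import Dict, Iterable, List, Optional


def codes_without_label(values: Iterable[Optional[int]],
                        labels: Dict[int, str]) -> List[int]:
    """Return the distinct codes in `values` that have no value label.

    None stands for system missing (".") and is skipped.
    """
    return sorted({v for v in values if v is not None and v not in labels})


# Value-label set shared by V1..V5 in the Analysis output of this report.
labels = {1: "OK", 2: "OK too", 134217700: "N/A", 134217701: "Missing"}

# Column contents as listed in the report.
v1 = [1, 2, 100, None, None]                   # 1 digit integer
v3 = [1, 2, 32740, None, None]                 # 3 digit integer
v5 = [1, 2, 134217700, 134217701, None]        # 5 digit integer

print(codes_without_label(v1, labels))  # [100]
print(codes_without_label(v3, labels))  # [32740]
print(codes_without_label(v5, labels))  # []
```

Only the 5 digit field comes back clean; in the shorter fields the exported data code has no matching label.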

While the user should be able to adapt to this by deleting the missing data values in analysis, it would make most sense to retain the original values.

**The feature request**: perhaps as an option, transform all declared missing values to Stata's extended missing values (.a, .b, .c, etc.).
So for byte data, system missing remains 101, the first declared missing value becomes 102 (.a), the next becomes 103 (.b), etc.
For small integers (3-4 digits), system missing remains 32741, the first declared missing value becomes 32742 (.a), etc.
For long integers (5 digits and up), system and declared missing values should match the .dta specification:

maximum nonmissing +2,147,483,620 (0x7fffffe4)
code for . +2,147,483,621 (0x7fffffe5)
code for .a +2,147,483,622 (0x7fffffe6)
code for .b +2,147,483,623 (0x7fffffe7)
...
While value labels for floats are not exported, they can also follow the same pattern for missing values (e.g. properties of the field include a range and declared missing values)
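
The codes requested above all follow one rule: the system missing value is the largest nonmissing value plus one, and .a, .b, ... continue from there. A minimal Python sketch of that rule, assuming the three integer storage types and limits named in this report; the function names are hypothetical and do not describe how EpiData Manager implements its export.

```python
import string

# Largest nonmissing value per Stata integer storage type (from this report).
MAX_NONMISSING = {"byte": 100, "int": 32740, "long": 2147483620}


def stata_missing_code(storage_type: str, index: int = 0) -> int:
    """Numeric code for a Stata missing value.

    index 0 is system missing ("."), 1 is ".a", 2 is ".b", up to 26 for ".z".
    """
    if not 0 <= index <= 26:
        raise ValueError("index must be between 0 ('.') and 26 ('.z')")
    return MAX_NONMISSING[storage_type] + 1 + index


def stata_missing_name(index: int) -> str:
    """Stata notation for the missing value with the given index."""
    if not 0 <= index <= 26:
        raise ValueError("index must be between 0 ('.') and 26 ('.z')")
    return "." if index == 0 else "." + string.ascii_lowercase[index - 1]


for storage_type in ("byte", "int", "long"):
    for index in range(3):
        print(storage_type, stata_missing_name(index),
              stata_missing_code(storage_type, index))
```

With this rule, .z lands on the largest value each signed integer type can hold (127, 32767 and 2147483647), so at most 26 declared missing values per field can be mapped this way.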

----------- Output from Analysis after reading the Stata 8/9 export file -------------
.list v;
Name  Type        Length  Decimal  Label       Valuelabels          Missing
V1    Integer     3       0        1 digit     1 = OK
                                               2 = OK too
                                               134217700 = N/A
                                               134217701 = Missing
V2    Integer     3       0        2 digits    134217700 = N/A
                                               134217701 = Missing
                                               1 = OK
                                               2 = OK too
V3    Integer     5       0        3 digits    1 = OK
                                               2 = OK too
                                               134217700 = N/A
                                               134217701 = Missing
V4    Integer     5       0        4 digits    1 = OK
                                               2 = OK too
                                               134217700 = N/A
                                               134217701 = Missing
V5    Integer     10      0        5 digits    1 = OK
                                               2 = OK too
                                               134217700 = N/A
                                               134217701 = Missing
V6    Float       18      4        float 5.2
V7    Float       18      4        float 12.8
V8    String      207     0        text
V9    Date (DMY)  10      0        date
V10   Float       18      4        time
V11   Integer     3       0        boolean



.list vl;
_V1 (Integer)
Value Label Missing
1 OK
2 OK too
134217700 N/A
134217701 Missing


_V2 (Integer)
Value Label Missing
134217700 N/A
134217701 Missing
1 OK
2 OK too


_V3 (Integer)
Value Label Missing
1 OK
2 OK too
134217700 N/A
134217701 Missing


_V4 (Integer)
Value Label Missing
1 OK
2 OK too
134217700 N/A
134217701 Missing


_V5 (Integer)
Value Label Missing
1 OK
2 OK too
134217700 N/A
134217701 Missing

.list data !vl; // data as read into Analysis; this matches exactly data as read into R
Obs. No  1 digit   2 digits  3 digits  4 digits  5 digits           float 5.2  float 12.8    text     date        time      boolean
1        1 OK      1 OK      1 OK      1 OK      1 OK               1          1             abcdefg  24/10/2017  43200000  1
2        2 OK too  2 OK too  2 OK too  2 OK too  2 OK too           2          2             ccc      11/10/2017  43200000  0
3        100       100       32740     32740     134217700 N/A      99.99      999.99999999           .           .         .
4        .         .         .         .         134217701 Missing  .          .                      .           .         .
5        .         .         .         .         .                  .          .                      .           .         .