Task Description
See attached .epx

Export to Stata (tested with V8 and V12) does not use Stata missing values for declared missing data (i.e. data matching one of the missing value label values). System missing fields are always exported properly and are recognized as system missing in Stata.

In this test file:

- A missing value of 9 / 99 in a 1 / 2 digit integer has the value 100 in Stata, while the value labels include large values as missing.
  *** 100 is the largest legal number in byte data; the first missing value (corresponding to Epidata system missing) is 101.
- A missing value of 999 / 9999 in a 3 / 4 digit integer has the value 32740, with the same large values as missing.
  *** 32740 is the largest legal number in small integer data; the first missing value (corresponding to Epidata system missing) is 32741.
- The value labels have the value 134217700 for the missing values (8 / 88 / 888 / 8888) and the value 134217701 for the missing values (9 / 99 / 999 / 9999).

With larger integers (5+ digits), the value labels for missing also follow this pattern, but at least the data are coded to match the new missing values.

**The bug**: declared missing values in the data should match the value label for that type of missing data. While the user should be able to adapt to this by deleting the missing data values in analysis, it would make most sense to retain the original values.

**The feature request**: perhaps as an option, transform all declared missing values to Stata's missing values that represent .a, .b, .c, etc., so that:

- for byte data, system missing remains 101, the first missing value becomes 102 (.a), the next becomes 103 (.b), etc.
- for small integers (3-4 digits), system missing remains 32741, the first missing value becomes 32742 (.a), etc.
- for long integers (5 digits and up), system and declared missing should match the .dta specification:

  maximum nonmissing  +2,147,483,620 (0x7fffffe4)
  code for .          +2,147,483,621 (0x7fffffe5)
  code for .a         +2,147,483,622 (0x7fffffe6)
  code for .b         +2,147,483,623 (0x7fffffe7)
  ...
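The mapping proposed in the feature request above could look roughly like the following Python sketch. It is only an illustration of the arithmetic, not the exporter's actual code: the function name `stata_missing_codes`, the `SYSTEM_MISSING` table, and the rule that declared missing values receive letters in the order they are passed in are all assumptions made for this example. The system-missing codes themselves are the ones quoted in this report.

```python
import string

# Raw integer stored in a .dta file for system missing (.), per storage type.
# Numbers are taken from this report; the table itself is a hypothetical helper.
SYSTEM_MISSING = {
    "byte": 101,
    "int": 32741,
    "long": 2147483621,
}


def stata_missing_codes(storage_type, declared_missing):
    """Map each declared missing value to (Stata letter, raw .dta code).

    Assumption: letters are handed out in the order the values are given,
    so the first declared missing value becomes .a, the second .b, etc.
    """
    if len(declared_missing) > 26:
        raise ValueError("Stata has only 26 extended missing values (.a to .z)")
    base = SYSTEM_MISSING[storage_type]
    codes = {}
    for offset, (value, letter) in enumerate(
        zip(declared_missing, string.ascii_lowercase), start=1
    ):
        # .a is system missing + 1, .b is system missing + 2, and so on.
        codes[value] = ("." + letter, base + offset)
    return codes


if __name__ == "__main__":
    print(stata_missing_codes("byte", [8, 9]))
    print(stata_missing_codes("int", [888, 999]))
    print(stata_missing_codes("long", [88888, 99999]))
```

With this scheme the value label for each declared missing value would be attached to the same code that is written into the data, which is the consistency the bug report asks for.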
While value labels for floats are not exported, they can also follow the same pattern for missing values (e.g. properties of the field include a range and declared missing values).

----------- Output from Analysis after reading the Stata 8/9 export file -------------

.list v;

Name  Type        Length  Decimal  Label       Valuelabels          Missing
V1    Integer     3       0        1 digit     1 = OK
                                               2 = OK too
                                               134217700 = N/A
                                               134217701 = Missing
V2    Integer     3       0        2 digits    134217700 = N/A
                                               134217701 = Missing
                                               1 = OK
                                               2 = OK too
V3    Integer     5       0        3 digits    1 = OK
                                               2 = OK too
                                               134217700 = N/A
                                               134217701 = Missing
V4    Integer     5       0        4 digits    1 = OK
                                               2 = OK too
                                               134217700 = N/A
                                               134217701 = Missing
V5    Integer     10      0        5 digits    1 = OK
                                               2 = OK too
                                               134217700 = N/A
                                               134217701 = Missing
V6    Float       18      4        float 5.2
V7    Float       18      4        float 12.8
V8    String      207     0        text
V9    Date (DMY)  10      0        date
V10   Float       18      4        time
V11   Integer     3       0        boolean

.list vl;

_V1 (Integer)
Value      Label    Missing
1          OK
2          OK too
134217700  N/A
134217701  Missing

_V2 (Integer)
Value      Label    Missing
134217700  N/A
134217701  Missing
1          OK
2          OK too

_V3 (Integer)
Value      Label    Missing
1          OK
2          OK too
134217700  N/A
134217701  Missing

_V4 (Integer)
Value      Label    Missing
1          OK
2          OK too
134217700  N/A
134217701  Missing

_V5 (Integer)
Value      Label    Missing
1          OK
2          OK too
134217700  N/A
134217701  Missing

.list data !vl; // data as read into Analysis; this matches exactly data as read into R

Obs. No  1 digit   2 digits  3 digits  4 digits  5 digits           float 5.2  float 12.8    text     date        time      boolean
1        1 OK      1 OK      1 OK      1 OK      1 OK               1          1             abcdefg  24/10/2017  43200000  1
2        2 OK too  2 OK too  2 OK too  2 OK too  2 OK too           2          2             ccc      11/10/2017  43200000  0
3        100       100       32740     32740     134217700 N/A      99.99      999.99999999           .           .         .
4        .         .         .         .         134217701 Missing  .          .                      .           .         .
5        .         .         .         .         .                  .          .                      .           .         .