Techniques on importing vertical column data into singular record sets.
Sample of original Data:
| Number |
Kid's name |
foster parent |
bio parents (1-2) |
| 1 |
Schmo, Joe Kid |
John Smith |
Eleanor Schmo |
|
dob: 9/22/90 |
43 Oleander Blvd |
99 Broadway |
|
EntranceDate: 4/30/06 |
Brooklyn, NY 11389 |
NY, NY |
|
AnotherDate: 7/7/05 |
|
|
|
|
|
Bill Schmo |
|
|
|
49 Elm St |
|
|
|
Grenwich, CT 89898 |
To distinguishing between foster and bio parent address fields,
I used Excel's autofil feature to quickly create delimiter columns:
| Number |
Kid's name |
|
foster parent |
|
bio parents (1-2) |
| 1 |
Schmo, Joe Kid |
FOSTER |
John Smith |
BIO |
Eleanor Schmo |
|
dob: 9/22/90 |
FOSTER |
43 Oleander Blvd |
BIO |
99 Broadway |
|
EntranceDate: 4/30/06 |
FOSTER |
Brooklyn, NY 11389 |
BIO |
NY, NY |
|
AnotherDate: 7/7/05 |
FOSTER |
|
BIO |
|
|
|
FOSTER |
|
BIO |
Bill Schmo |
|
|
FOSTER |
|
BIO |
49 Elm St |
|
|
FOSTER |
|
BIO |
Grenwich, CT 89898 |
|
|
FOSTER |
|
BIO |
|
Export to tab delimited:
1 TAB "Schmo, Joe Kid" TAB FOSTER TAB "John Smith" TAB BIO TAB "Eleanor Schmo"
TAB "dob: 9/22/90" TAB FOSTER TAB "43 Oleander Blvd" TAB BIO TAB "99 Broadway"
TAB "EntranceDate: 4/30/06" TAB FOSTER TAB "Brooklyn, NY 11389" TAB BIOTAB "NY, NY"
TAB "AnotherDate: 7/7/05" TAB FOSTER TAB TAB BIO TAB
TAB TAB FOSTER TAB TAB BIO TAB "Bill Schmo"
TAB TAB FOSTER TAB TAB BIO TAB "49 Elm St"
TAB TAB FOSTER TAB TAB BIO TAB "Grenwich, CT 89898"
TAB TAB FOSTER TAB TAB BIO TAB
Edit with Vim to remove superfluous end-of-line characters
and predictably distinguish records from each other:
1) Replace the last line of each set of
data with an
end-of-record cue. I chose "NEWRECORD" .
:%s/^IFOSTER^I^IBIO^I$/NEWRECORD/c
2) Remove all end-of-line characters:
:%s/\n//c
3) Restore end-of-line characters where
records end:
:%s/NEWRECORD/^M/gc
(Although
in mixed windows and linux environments end-of-line characters are
tricky. I think that to create end-of-line characters to read on a
windows system you want to use ^M.
I think it's something else in unix. But I can't seem to recall exactly
at the moment.
Remember that to identify tab or enter characters in vim, you must hit
CTRL-V
and then either the tab or enter buttons. The text turns blue then.)
This is the end of what I know I can safely manipulate with vim. I
*had* been working with re-importing the file into excel so I could
easily drag fields into alignment under the appropriate column headers
(foster Address1, bio Address2, etc). This part's a pain in the neck. I
just can't safely predict how many fields are used and the alignment is
terrible. There are all these extraneous notes everywhere. But once I have column headers established, it's simple to
import into MS access or save as tab-delimited and insert with sql.
I suppose I could further implement Excel's autofill to also attach an
increasing integer to give me an idea of what row I was on. (eg BIO1
BIO2, etc) Then I could insert BIO1 into the field for the primary
biological parent's name...and not concern myself with whether it was
the correct field or not. This way I could push this job back
onto them to move the data into the correct fields.