Time | Nick | Message
06:42
Virgile joined #dataverse
07:55
juancorr joined #dataverse
11:01
Virgile joined #dataverse
11:18
Virgile joined #dataverse
12:20
yoh joined #dataverse
12:26
donsizemore joined #dataverse
14:31
pdurbin joined #dataverse
14:50
donsizemore
@pdurbin good morning
14:52
donsizemore
@pdurbin you know how you don't like marathon commits?
14:53
pameyer joined #dataverse
14:59
donsizemore
O ye @pdurbin @pameyer hive mind. To export a dataset I'm calling /api/datasets/export?exporter=dataverse_json&persistentId=doi:10.5072/FK2/AT8YFM. Does this look right?
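For reference, a complete call along those lines might look like this (a sketch; $SERVER_URL is a placeholder for the installation's base URL, and published datasets typically export without a token):
# fetch the dataverse_json export and save it to a file
curl "$SERVER_URL/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.5072/FK2/AT8YFM" > export.json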
15:00
donsizemore
then I call http://guides.dataverse.org/en/latest/api/native-api.html#import-a-dataset-into-a-dataverse (and I'm going to submit a PR to fix that example)
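Per that guide page, the import is a POST to the target collection; a hedged sketch ($API_TOKEN, $SERVER_URL, and the "root" collection alias are assumptions):
# import a dataset JSON file under an existing PID, without releasing it
curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/dataverses/root/datasets/:import?pid=doi:10.5072/FK2/AT8YFM&release=no" --upload-file dataset.json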
15:00
donsizemore
which dies "Error parsing dataset json. Json:"
15:02
pameyer
my first thought is that it might be an upload-file vs. post-body thing - I get those mixed up pretty frequently
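The mix-up pameyer means, illustrated with curl (the endpoint and form-field name here are hypothetical placeholders):
# -F sends a multipart/form-data form field named "file"
curl -F "file=@dataset.json" "$SERVER_URL/api/some/endpoint"
# --upload-file (or -T) sends the file contents as the raw request body
curl --upload-file dataset.json "$SERVER_URL/api/some/endpoint"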
15:02
pameyer
might be a dumb question - why go through the exporter vs the native API GET?
15:02
donsizemore
that's the PR part. The example says upload-file, but -F data= seems to send it
15:03
donsizemore
to answer your question: because I couldn't find instructions on how to do this, so I'm stumbling through step by step
15:05
donsizemore
API GET produces similar-looking JSON at a glance and import throws the same error: Unexpected char 45 at (line no=1, column no=2, offset=1)
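For comparison, the native API GET in question would be roughly (placeholders as above):
# fetch the native JSON representation of the dataset
# add -H "X-Dataverse-key: $API_TOKEN" if the dataset is unpublished
curl "$SERVER_URL/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/AT8YFM" > native.json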
15:13
pdurbin
error parsing dataset... that's no good
15:14
donsizemore
the exported dataverse_json is totally different from the desired example import JSON in the guide
15:14
pdurbin
I think the import docs provide a sample JSON file.
15:14
pdurbin
Is it XML?
15:15
* pdurbin
ducks
15:15
pdurbin
They should at least resemble each other. Cousins if not siblings.
15:17
donsizemore
stripping up to the leading `{"datasetVersion` seemed like a good first step but import dies on line 1 col 2
15:17
pdurbin
Can you run the JSON through jq to make sure it's valid?
15:18
pdurbin
(or similar)
15:18
pdurbin
cat foo.json | jq .
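Since jq exits nonzero on a parse error, a quick pass/fail check could be:
# "jq empty" parses the file and prints nothing; the exit code signals validity
jq empty foo.json && echo "valid JSON" || echo "parse error"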
15:18
donsizemore
looks right to me
15:18
donsizemore
just the ordering of elements is totally different from what import wants
15:19
pdurbin
well, order doesn't matter in json
15:19
pameyer
I wouldn't have thought that the order of elements in a JSON dictionary would make a difference
15:19
pameyer
... messages crossing
15:20
pdurbin
I document the stripping out of datasetVersion in the docs on how to edit a dataset via the native API. It's a pain.
15:20
pdurbin
(Which is partially why we now support partial edits.)
15:21
pdurbin
donsizemore: for fun do you want to try importing the sample file from the docs? Maybe import is broken.
15:21
donsizemore
oh oh, if I cat the exported dataverse_json through jq, redirect that into a file, and call it with --upload-file, I get further (but now a ton of null pointers)
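In other words, something like this pipeline (a sketch of the step that got further; variables as before, $PID standing in for the persistent identifier):
# normalize the export with jq, then send it as the raw request body
curl "$SERVER_URL/api/datasets/export?exporter=dataverse_json&persistentId=$PID" | jq . > clean.json
curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/dataverses/root/datasets/:import?pid=$PID&release=no" --upload-file clean.json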
15:22
* pameyer
looking for my import scripts
15:22
donsizemore
java.lang.IllegalStateException: During synchronization a new object was found through a relationship that was not marked cascade PERSIST: [Dataset id:261 ].
15:23
donsizemore
and max(id) is 260
15:23
donsizemore15 joined #dataverse
15:23
pdurbin
yikes
15:24
donsizemore15
i stole Danny's dataset from https://dataverse5.odum.unc.edu/dataset.xhtml?persistentId=doi:10.5072/FK2/AT8YFM
15:24
pdurbin
If you can get the API to emit a 500 error, that's a paddlin'. Please feel free to create an issue. And please upload a file that produces it.
15:25
donsizemore15
and am trying to import it into an EC2 instance I was using to test those dataverse-ansible merges, just so I can tell Thu-Mai and Jon whether we can give a group of folks their datasets once they stand up Dataverse #64(!)
15:26
pdurbin
Mmmm, Dataverse #64.
15:26
pdurbin
donsizemore15: wait, did you try the sample file from the docs?
15:28
donsizemore15
i will, lemme make a dummy storage identifier to match
15:29
pdurbin
k
15:29
pameyer
definitely was --upload-file
15:30
donsizemore15
that succeeds
15:31
donsizemore15
so, Leonid thinks I should export with export?exporter=dataverse_json (which was the first thing I did)
15:31
donsizemore15
if sample.json succeeds and dataverse_json doesn't... does that mean an issue?
15:32
pdurbin
Maybe the issue is about how to get the equivalent of sample.json?
15:32
donsizemore15
that was my question.
15:33
donsizemore15
I've tried the native API GET and export?exporter=dataverse_json. If someone could (at some point) steer me toward the proper workflow I'll be happy to write this up in the guides for future generations
15:33
donsizemore15
today I mostly want to know what to tell Jon and Thu-Mai
15:36
pdurbin
Well, there are two ways to export that JSON. The most straightforward way is to click on JSON in the UI under Metadata exports. The other way is documented in the API Guide somewhere. I'm not sure if they produce the exact same output or not.
15:37
donsizemore15
they're off by 9 characters.
15:40
pdurbin
:)
15:41
pdurbin
Off By 9 Characters is the name of my band!
15:42
pameyer
another dumb question - `curl $exporter_api | jq .datasetVersion > importer_input.json` doesn't do the trick?
15:43
pameyer
I'm not sure if the top-level "citation" key under datasetVersion would cause problems for the import api or not; but I'd guess it wouldn't be needed
15:43
donsizemore15
i tried stripping down to datasetVersion, will try again with your command after lunch (and thanks)
15:44
pameyer
good luck (and enjoy lunch)
15:45
pameyer
I found my importer stuff; but the input json was a custom endpoint rather than one of the dataverse ones - so may not be as helpful here as it could be
16:26
donsizemore15
it occurred to me over lunch I wasn't setting Content-Type: application/json
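Adding that header would look something like this (same assumed variables as above):
# declare the request body as JSON explicitly
curl -H "Content-Type: application/json" -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/dataverses/root/datasets/:import?pid=$PID&release=no" --upload-file clean.json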
16:41
donsizemore15
@pameyer trying to import the json produced by 'jq .datasetVersion' throws a 500 error
16:51
donsizemore15
yah, `cat sample.json | jq .datasetVersion | jq -r 'keys[]' | wc -l` gives 7
16:51
donsizemore15
yee, `cat D8BEAL_full.json | jq .datasetVersion | jq -r 'keys[]' | wc -l` gives 16
17:02
donsizemore15
removed the citation contents and still got java.lang.IllegalStateException: During synchronization a new object was found through a relationship that was not marked cascade PERSIST: [Dataset id:272 ].
17:19
pdurbin
woof
17:19
pdurbin
So what are you gonna tell Jon and Thu-Mai?
17:22
pameyer
`cat 499-export_f.json | jq -r 'keys[]' | wc -l` -> 1
17:23
pameyer
so my guess is that my initial `jq .datasetVersion` bit was wrong
17:26
donsizemore15
your .datasetVersion prints the sub-keys. The example and my "full" dataverse_json export (which Leonid says is the best JSON Dataverse can export) were closer, key-wise, though wildly disparate
17:27
pameyer
yeah; it was bad - definitely needs the top-level datasetVersion key
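A corrected jq invocation that keeps that wrapper key might be (an inference from this exchange, not a verified recipe):
# keep datasetVersion as the top-level key instead of unwrapping it
curl "$exporter_api" | jq '{datasetVersion: .datasetVersion}' > importer_input.json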
17:30
pameyer
will take a bit for my dev environment to spin up to see if the old json still works - I'd guess it would, but don't know for sure
17:34
donsizemore15
you don't need to go to that trouble
17:35
pameyer
_shouldn't_ be too much trouble
17:37
pdurbin
famous last words
17:43
pameyer
yeah
17:47
donsizemore15
https://www.hashicorp.com/blog/announcing-waypoint
17:49
pdurbin
Developers *do* just want to deploy.
18:18
pameyer
once I cut the custom block, my old examples still work on 7112-payara5_aio-5a77f49f7
18:18
pameyer
so not the latest code, but not too far from develop, in between 5.0 and 5.1
18:19
pdurbin
that's good
18:38
nightowl313 joined #dataverse
18:39
nightowl313
hi all ... got a question for anyone who can answer ... we just made our dataverse live and we already have a request for adding a large amount of data (i.e., 10TB or more) ... is the best method of getting that data in fast to use direct upload?
18:41
pdurbin
10TB is a lot, but yes, direct upload to S3, I'd think.
18:42
donsizemore15
@nightowl313 have them upgrade to RoadRunner extra-zippy first!
18:42
nightowl313
eeks ... wasn't quite ready to just jump in with that much data but we do have the request ... now I just need to get direct upload working :-|
18:43
nightowl313
haha don will ask them to!
18:43
donsizemore15
if you're already on S3, you turn on direct upload as a JVM option, and you'll need to set CORS on the bucket
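Roughly, and hedged since the option name depends on the store's id (here "s3" is assumed) and the exact CORS rules are in the big-data-support guide:
# enable direct upload for the S3 store via a JVM option
./asadmin create-jvm-options "-Ddataverse.files.s3.upload-redirect=true"
# apply a CORS policy to the bucket (cors.json would allow GET/PUT with ETag exposed)
aws s3api put-bucket-cors --bucket my-dataverse-bucket --cors-configuration file://cors.json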
18:43
nightowl313
has any other org used it for this amount of data? probably won't be all that at first, but probably a lot
18:43
donsizemore15
http://guides.dataverse.org/en/latest/developers/big-data-support.html?highlight=upload%20direct#s3-direct-upload-and-download
18:44
nightowl313
not sure yet who is paying for it but we're working on that
18:44
donsizemore15
I understand Harvard has 43TB+
18:44
nightowl313
i've done all of this except for the cors ...
18:45
nightowl313
i think you were helping me with that before! I gave up just to get dataverse installed .. time to pick it back up very quickly!
18:45
donsizemore15
for some reason CORS keeps getting removed from my test bucket... haven't pinned down how that's happening, but I don't own the bucket, so...
18:48
pdurbin
nightowl313: especially since pameyer is here I'd feel remiss if I didn't point out that another option for big data is rsync. Not sure if you've seen the docs on that. They're in the dev guide.
18:49
nightowl313
oh ... of course ... I have used rclone ... but, if I transfer from the current source to the s3 bucket using rclone .. how does it then get associated with a dataset?
18:51
pameyer
pdurbin: I appreciate the hat tip for rsync, but for an s3 dataverse install and 10TB of data it might not be the way to go (would need 10TB temp space for the DCM, and _might_ need public-only - I don't remember if s3/DCM requires that or not)
18:52
pdurbin
I don't remember either. I don't think anyone is using that combo.
18:52
pameyer
and 10TB of client-side checksum calculations will take a while
18:53
pdurbin
yeah
18:53
pdurbin
nightowl313: maybe just stick with direct to s3 upload :)
18:54
pameyer
I'm out of the loop for direct s3 upload, but if that lets you use the aws cli and s3 sync, I'd go with that over a browser
18:55
pdurbin
It's supported by DVUploader, which is a command line app.
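A typical DVUploader invocation, sketched from memory of its README (the jar name/version and flags here are assumptions):
# upload everything under mydatafiles/ to the given dataset
java -jar DVUploader-v1.1.0.jar -server=$SERVER_URL -key=$API_TOKEN -did=doi:10.5072/FK2/AT8YFM mydatafiles/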
18:58
pameyer
I should, but don't, remember if that's got a sync primitive
19:02
pdurbin
Hmm, dunno. I'm sure it's calling S3 APIs under the hood. I could try to summon Jim if you want.
19:03
nightowl313
thinking since this is our first dataverse ever ... we will ask the research team to take it slow with uploading things ... i think they are willing to work with us to figure things out
19:03
nightowl313
but would love direction on the best course to try with it ... i'm such a newbie! this is all exciting and scary
19:04
pdurbin
Published research data! It's a thing! :)
19:06
nightowl313
=)
19:09
pameyer
pdurbin: thanks, but not important enough to try a Jim summoning - just curious
19:11
pdurbin
Ok. He's in Slack these days. Was letting him know earlier that donsizemore15 broke his pull request. :)
19:11
pameyer
nightowl313: having a testing/staging setup is very helpful for figuring out "am I going to break production with this" type things
19:13
pameyer
I'm not sure if it's something you have available or not (sometimes it's more trouble than it's worth); but it was an unexpected thing I learned a while back
19:13
nightowl313
so just reading through all of this ... I should configure S3 direct upload and then use DVUploader (which utilizes command-line utilities such as the AWS CLI and rclone) to work with the direct upload?
19:14
nightowl313
we do have a staging site... will def try there first!
19:14
nightowl313
I told the research team it will be a few days before we are ready to start uploading anything
19:14
pdurbin
With direct S3 upload you can use either DVUploader or the normal Dataverse web interface.
19:16
nightowl313
ah okay ... looking at how it works now .. thank you all! so thankful for this chat/community
19:17
pdurbin
Sure. And it's been nice to see you on community calls.
19:18
nightowl313
i hope i can help with answering some of them some day! for now I am just a taker! lol
19:20
pdurbin
nah, you're good
19:21
pdurbin
We'd be bored without the questions. :)
19:23
nightowl313
oh i'm sure I'll have a lot more! =)
19:26
pdurbin
You've made it so I don't have to answer my standard question... When do you want your installation on the map? ... Thanks for that.
19:26
* pdurbin
looks at pameyer
19:27
nightowl313
i think it's on there! yay! We added it last week! that was the most exciting part of this for me!
19:28
pdurbin
Please feel free to help us fill in any more details about your installation. There's a spreadsheet linked from https://github.com/IQSS/dataverse-installations
19:28
nightowl313
yes, there's one in AZ ... although if you click our link there is no data! (that's what we're working on now)
19:29
nightowl313
okay, will add more to that ...
19:29
pdurbin
here's a direct link to the spreadsheet: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0
19:30
pdurbin
please feel free to request edit access
19:30
pdurbin
and make a pull request, but that comes later
19:32
nightowl313
oh i mean, we need to get some data into our dataverse! haha ... requesting edit access for the spreadsheet
19:34
pdurbin
Hmm, I don't see the request yet. Maybe it takes a while.
19:56
nightowl313
i did hit "send" a couple of times ... I think this happened before
19:58
pdurbin
🤷
20:01
nightowl313 left #dataverse
20:36
nightowl313 joined #dataverse
20:54
nightowl313 left #dataverse
21:02
pdurbin
I hate to end on a shrug so I'll say goodnight all. See ya tomorrow.
21:02
pdurbin left #dataverse