IRC log for #dataverse, 2018-03-29

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time	Nick	Message
00:06		kevin joined #dataverse
01:08		jri joined #dataverse
04:08		jri joined #dataverse
07:35		jri joined #dataverse
07:40		jri joined #dataverse
07:40		jri joined #dataverse
08:45		jri joined #dataverse
08:56		jri joined #dataverse
13:29	andrewSC	morning all!
13:30	andrewSC	So I'm trying to design some sort of action plan for an ask that was made of me and i wanted to run it past you guys to get some thoughts (if any).
13:31	andrewSC	We have some datasets (probably around 400ish?) in our DV instance that need to have some normalization applied to them (changing first names of authors to full first names (mike -> michael), updating keywords, casing on titles, etc.)
13:32	andrewSC	What i was planning on doing is getting all the datasets using the mydata endpoint (since all our datasets are unpublished atm), handing them off to some folks to do the normalization, then taking those modified datasets and editing/updating the current DV instance.
13:33	andrewSC	Something I noticed yesterday is that the mydata endpoint doesn't seem to include keywords in its dump of each dataset?
13:33	andrewSC	but the solr results do?
13:33	andrewSC	so i'm kinda trying to figure that part out
13:34	andrewSC	then it seems the only way to do any sort of "re-ingest" would be using the SWORD interface
13:34	andrewSC	which seems to imply that i may want to be extracting these using that interface in the first place so there isn't a funky translation step from SWORD to JSON
13:34	andrewSC	or vice versa
13:34	andrewSC	any thoughts or possible suggestions for what i'm trying to accomplish?
14:05		pameyer joined #dataverse
14:06	pameyer	hi andrewSC
14:07	pameyer	one thing that occurs to me would be to use the native API to get info for all the datasets, possibly convert to something more user friendly than JSON, and use that edited JSON to update the unpublished datasets
14:08	pameyer	should have all the information in it; but be aware that what's sometimes referred to as an "edit" native API is actually an "overwrite everything with the new values" API
14:12	andrewSC	mmmmmmmmmm
14:13	andrewSC	ahhh
14:13	andrewSC	for some reason i thought the undocumented mydata endpoint was the only way to get draft datasets via api these days..
14:14	andrewSC	seems i can just specify :draft on the native api, provide the persistent id, and work it out
14:14	andrewSC	i'll dig into that and see what comes of it!
14:14	andrewSC	BTW
14:15	andrewSC	the final solution to that email situation i was having and the google suite control policy enforced at ncsu had to do with not having the system email set... and it actually wasnt the sending that was the issue, it was the delivery to the ncsu.edu email address
14:16	andrewSC	not even sure why that changed? because i set the root email address in DV as well as the two locations in glassfish where the email should be set
14:16	andrewSC	but that system email had to be set via curl command
14:20	andrewSC	and i mean that's my own damn fault because it is _very_ clearly stated in the documentation that without setting that system email, email just won't work lol
14:21	andrewSC	so yeah.. just thought i'd update you guys on what happened :)
14:21	andrewSC	i did end up getting mailgun working though so we have that going for us as well
14:22	pameyer	andrewSC: thanks for the update, and good to hear that you finally got it working
14:23	andrewSC	mhmmm :)
14:23	pameyer	I've never dug into the mydata endpoint, so I don't have a great idea what it does
16:52	pdurbin	andrewSC: I agree with pameyer that you should serialize your datasets to JSON, do edits to that JSON, and then overwrite the draft datasets with the edited JSON, all using the native API. The only problem with this plan is that the process is not well documented: https://github.com/IQSS/dataverse/issues/3777
16:53	andrewSC	lol nice timing
16:53	andrewSC	was trying to get to the bottom of a nullpointer just now
16:53	pdurbin	And thanks for letting us know that your email issue is resolved. Good stuff.
16:53	andrewSC	mhmm
16:54	pdurbin	Hopefully some tips in that issue help. Please please consider making a pull request against doc/sphinx-guides/source/api/native-api.rst if you have time to improve the docs.
16:55	andrewSC	:)))
17:00	pameyer	andrewSC: in my experience, nullpointer with native API means that I typo'd my JSON
17:01	pameyer	the "best" approach I've found has been to hook up a debugger to glassfish to find out where the typo is
17:01	andrewSC	pameyer: that was my initial reaction too.. turns out it was just what the ticket described re: having the files key in the JSON i was trying to PUT
17:01	andrewSC	removing that key and sending again was successful!
17:03	pdurbin	phew
17:03	andrewSC	;)
17:04	andrewSC	also found out postman, the http/api client/debugger does something to the request that curl doesn't.. i switched over to insomnia for now. all seems to be working as expected
17:08	pdurbin	cool
18:39	andrewSC	you guys ever seen this during reindexing? https://gist.github.com/andrewSC/c6a1ca33baef42a04516a6d443f77c0f
18:40	pdurbin	pameyer has
18:40	andrewSC	i tried to get the dataset explicitly (assuming the id given the sequence of what the prev id and next id was)
18:40	pdurbin	and I have too
18:40	pdurbin	andrewSC: which release or branch are you on?
18:40	andrewSC	and i got a 404
18:40	andrewSC	4.8.5
18:41	pdurbin	dsDescriptionDate. hmm
18:41	pdurbin	andrewSC: would you be willing to open an issue about this?
18:41	pameyer	yup that does look familiar
18:42	andrewSC	added two comments to that gist
18:43	andrewSC	with what i see in the log and what i tried to curl and got a 404 with
18:44	andrewSC	if you delete a draft, are all references removed from the db? because only thing i can think of is dv trying to reindex datasets that were deleted but there's still a ref somewhere
18:45	andrewSC	draft that was never published*
18:45	pdurbin	I just added a comment too.
18:46	andrewSC	huh interesting
18:46	pdurbin	I wouldn't guess it has anything to do with deleting a draft but who knows.
18:46	pameyer	andrewSC: I don't think that's related to deleting drafts
18:46	andrewSC	yeah npnp i can open an issue
18:46	andrewSC	touche
18:46	pdurbin	thanks!
19:08	andrewSC	mmmmmm interesting
19:09	andrewSC	turned up the logging on edu.harvard.iq.dataverse.search.IndexServiceBean
19:09	andrewSC	looking through the output now
19:16	andrewSC	https://gist.github.com/andrewSC/a47ac332b4cbd89f1dd42a5bc9ba0c75
19:16	andrewSC	smoking gun!
19:18	andrewSC	although that seems obvious from the output without fine logging.. it seems like it doesn't like parens?
19:29	pameyer	hmmm - but where are the parens coming from?
19:29	andrewSC	right? that's what i'm trying to hunt down now
19:29	pameyer	weird
19:45	andrewSC	https://github.com/IQSS/dataverse/issues/4558
19:45	andrewSC	Hopefully that's somewhat descriptive of what I've done so far
19:46	andrewSC	just interesting
19:47	andrewSC	one of the datasets in question, when i pulled it from the api as well as looked at it in the dv frontend didn't have parens
19:48	andrewSC	got to here https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java#L211 and the next thing i would have done is checked what datasetFields contained
19:48	andrewSC	gotta run though
22:01		pameyer left #dataverse
22:14	pdurbin	thanks for opening that issue
22:14		pdurbin left #dataverse
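
Editor's note (not part of the original log): the workflow discussed above (fetch a draft dataset as JSON via the native API, edit it, drop the "files" key, then PUT it back) might be sketched roughly as follows. This is a minimal sketch and not the Dataverse project's own tooling. The function names, the example host, and the example DOI are hypothetical. The URL shape is an assumption based on the ":draft" and persistent id conventions mentioned in the conversation, so check it against the API guide for your release (4.8.5 in this log). The code only prepares the URL and the request body; it makes no HTTP request.

```python
import copy
from urllib.parse import urlencode


def draft_version_url(base_url, persistent_id):
    """Build the URL for the draft version of a dataset.

    ASSUMPTION: the path below follows the native API's ":persistentId" and
    ":draft" conventions; verify it against the API guide for your release.
    """
    query = urlencode({"persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/:persistentId/versions/:draft?{query}"


def prepare_for_put(version_json):
    """Return a copy of the dataset version JSON that is safe to send back.

    Per the log, a PUT whose body still contained the "files" key failed
    with a NullPointerException, and removing that key fixed it. The input
    is left unmodified so the original export can be kept for comparison.
    """
    cleaned = copy.deepcopy(version_json)
    cleaned.pop("files", None)  # no error if the key is already absent
    return cleaned
```

A caller would pass the (hypothetical) base URL and DOI to draft_version_url, send prepare_for_put(edited_json) as the PUT body along with an API token, and keep in mind pameyer's warning that the update overwrites everything with the new values, so the body should carry the complete metadata and not just the changed fields.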
