
IRC log for #dataverse, 2018-03-29

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time Nick Message
00:06 kevin joined #dataverse
01:08 jri joined #dataverse
04:08 jri joined #dataverse
07:35 jri joined #dataverse
07:40 jri joined #dataverse
07:40 jri joined #dataverse
08:45 jri joined #dataverse
08:56 jri joined #dataverse
13:29 andrewSC morning all!
13:30 andrewSC So I'm trying to design some sort of action plan for an ask that was made of me and i wanted to run it past you guys to get some thoughts (if any).
13:31 andrewSC We have some datasets (probably around 400ish?) in our DV instance that need some normalization applied to them: changing author first names to full first names (mike -> michael), updating keywords, fixing the casing on titles, etc.
13:32 andrewSC What i was planning on doing is getting all the datasets using the mydata endpoint (since all our datasets are unpublished atm), handing them off to some folks to do the normalization, then taking those modified datasets and editing/updating the current DV instance.
13:33 andrewSC Something I noticed yesterday is that the mydata endpoint doesn't seem to include keywords in its dump of each dataset?
13:33 andrewSC but the solr results do?
13:33 andrewSC so i'm kinda trying to figure that part out
13:34 andrewSC then it seems the only way to do any sort of "re-ingest" would be using the SWORD interface
13:34 andrewSC which seems to imply that i may want to be extracting these using that interface in the first place so there isn't a funky translation step from SWORD to JSON
13:34 andrewSC or vice versa
13:34 andrewSC any thoughts or possible suggestions for what i'm trying to accomplish?
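For reference, the mydata call described above would look roughly like the following. This is only a sketch: $SERVER_URL and $API_TOKEN are placeholders, and the role_ids value (and exactly which parameters are required) depends on the installation, since this endpoint was undocumented at the time.

    # List the caller's unpublished datasets via the (then-undocumented) mydata endpoint
    curl -H "X-Dataverse-key: $API_TOKEN" \
      "$SERVER_URL/api/mydata/retrieve?dvobject_types=Dataset&published_states=Unpublished&role_ids=1"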
14:05 pameyer joined #dataverse
14:06 pameyer hi andrewSC
14:07 pameyer one thing that occurs to me would be to use the native API to get info for all the datasets, possibly convert it to something more user-friendly than JSON for the people doing the edits, and then use the edited JSON to update the unpublished datasets
14:08 pameyer should have all the information in it; but be aware that what's sometimes referred to as an "edit" native API is actually an "over-write everything with the new values" API
14:12 andrewSC mmmmmmmmmm
14:13 andrewSC ahhh
14:13 andrewSC for some reason i thought the undocumented mydata endpoint was the only way to get draft datasets via api these days..
14:14 andrewSC seems i can just specify :draft on the native api, provide the persistent id, and work it out
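A minimal curl sketch of that retrieval, with $SERVER_URL, $API_TOKEN, and $PERSISTENT_ID (the dataset DOI) as placeholders:

    # Fetch the draft version of a dataset as JSON via the native API
    curl -H "X-Dataverse-key: $API_TOKEN" \
      "$SERVER_URL/api/datasets/:persistentId/versions/:draft?persistentId=$PERSISTENT_ID" \
      > draft.json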
14:14 andrewSC i'll dig into that and see what comes of it!
14:14 andrewSC BTW
14:15 andrewSC the final solution to that email situation i was having with the google suite control policy enforced at ncsu had to do with not having the system email set... and it actually wasn't the sending that was the issue, it was the delivery to the ncsu.edu email address
14:16 andrewSC not even sure why that changed? because i set the root email address in DV as well as the two locations in glassfish where the email should be set
14:16 andrewSC but that system email had to be set via curl command
14:20 andrewSC and i mean that's my own damn fault because it is _very_ clearly stated in the documentation that without setting that system email, email just won't work lol
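The curl command in question is presumably the :SystemEmail database setting from the installation guide; a sketch, run on the Dataverse server itself, with a placeholder address:

    # Set the system email address; without it Dataverse will not send mail
    curl -X PUT -d "Support Team <support@example.edu>" \
      http://localhost:8080/api/admin/settings/:SystemEmail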
14:21 andrewSC so yeah.. just thought i'd update you guys on what happened :)
14:21 andrewSC i did end up getting mailgun working though so we have that going for us as well
14:22 pameyer andrewSC: thanks for the update, and good to hear that you finally got it working
14:23 andrewSC mhmmm :)
14:23 pameyer I've never dug into the mydata endpoint, so I don't have a great idea what it does
16:52 pdurbin andrewSC: I agree with pameyer that you should serialize your datasets to JSON, do edits to that JSON, and then overwrite the draft datasets with the edited JSON, all using the native API. The only problem with this plan is that the process is not well documented: https://github.com/IQSS/dataverse/issues/3777
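The overwrite step of that plan would look roughly like this with curl (same placeholders as above). Note that the body is the edited draft-version JSON itself, typically the contents of the "data" key from the earlier GET:

    # Overwrite the draft version's metadata with the edited JSON;
    # the native API replaces all metadata values with what is sent
    curl -H "X-Dataverse-key: $API_TOKEN" -X PUT \
      "$SERVER_URL/api/datasets/:persistentId/versions/:draft?persistentId=$PERSISTENT_ID" \
      --upload-file edited-draft.json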
16:53 andrewSC lol nice timing
16:53 andrewSC was trying to get to the bottom of a nullpointer just now
16:53 pdurbin And thanks for letting us know that your email issue is resolved. Good stuff.
16:53 andrewSC mhmm
16:54 pdurbin Hopefully some tips in that issue help. Please please consider making a pull request against doc/sphinx-guides/source/api/native-api.rst if you have time to improve the docs.
16:55 andrewSC :)))
17:00 pameyer andrewSC: in my experience, nullpointer with native API means that I typo'd my JSON
17:01 pameyer the "best" approach I've found has been to hook up a debugger to glassfish to find out where the typo is
17:01 andrewSC pameyer: that was my initial reaction too.. turns out it was just what the ticket described re: having the files key in the JSON i was trying to PUT
17:01 andrewSC removing that key and sending again was successful!
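A sketch of that workaround with jq, assuming draft.json is the response saved from the earlier GET (the exact envelope shape is an assumption):

    # Pull the version object out of the GET envelope and drop the "files" key
    # before PUTting it back
    jq '.data | del(.files)' draft.json > edited-draft.json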
17:03 pdurbin phew
17:03 andrewSC ;)
17:04 andrewSC also found out Postman, the HTTP/API client/debugger, does something to the request that curl doesn't... i switched over to Insomnia for now. all seems to be working as expected
17:08 pdurbin cool
18:39 andrewSC you guys ever seen this during reindexing? https://gist.github.com/andrewSC/c6a1ca33baef42a04516a6d443f77c0f
18:40 pdurbin pameyer has
18:40 andrewSC i tried to get the dataset explicitly (guessing its id from the sequence of the previous and next ids)
18:40 pdurbin and I have too
18:40 pdurbin andrewSC: which release or branch are you on?
18:40 andrewSC and i got a 404
18:40 andrewSC 4.8.5
18:41 pdurbin dsDescriptionDate. hmm
18:41 pdurbin andrewSC: would you be willing to open an issue about this?
18:41 pameyer yup that does look familiar
18:42 andrewSC added two comments to that gist
18:43 andrewSC with what i see in the log and what i tried to curl and got a 404 with
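The lookup that returned the 404 would have been along these lines, with $DATASET_DB_ID standing in for the numeric database id guessed from the neighboring ids in the log:

    # Look up a dataset by its database id via the native API
    curl -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/$DATASET_DB_ID"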
18:44 andrewSC if you delete a draft, are all references removed from the db? because the only thing i can think of is dv trying to reindex datasets that were deleted but still have a ref somewhere
18:45 andrewSC draft that was never published*
18:45 pdurbin I just added a comment too.
18:46 andrewSC huh interesting
18:46 pdurbin I wouldn't guess it has anything to do with deleting a draft but who knows.
18:46 pameyer andrewSC: I don't think that's related to deleting drafts
18:46 andrewSC yeah npnp i can open an issue
18:46 andrewSC touche
18:46 pdurbin thanks!
19:08 andrewSC mmmmmm interesting
19:09 andrewSC turned up the logging on edu.harvard.iq.dataverse.search.IndexServiceBean
19:09 andrewSC looking through the output now
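Raising that logger's level is done through Glassfish; a sketch, assuming asadmin is on the path:

    # Log per-dataset indexing detail from IndexServiceBean to server.log
    asadmin set-log-levels edu.harvard.iq.dataverse.search.IndexServiceBean=FINE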
19:16 andrewSC https://gist.github.com/andrewSC/a47ac332b4cbd89f1dd42a5bc9ba0c75
19:16 andrewSC smoking gun!
19:18 andrewSC although that seems obvious from the output without fine logging.. it seems like it doesn't like parens?
19:29 pameyer hmmm - but where are the parens coming from?
19:29 andrewSC right? that's what i'm trying to hunt down now
19:29 pameyer weird
19:45 andrewSC https://github.com/IQSS/dataverse/issues/4558
19:45 andrewSC Hopefully that's somewhat descriptive of what I've done so far
19:46 andrewSC just interesting
19:47 andrewSC one of the datasets in question didn't have parens when i pulled it from the api or looked at it in the dv frontend
19:48 andrewSC got to here https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java#L211 and the next thing i would have done is check what datasetFields contained
19:48 andrewSC gotta run though
22:01 pameyer left #dataverse
22:14 pdurbin thanks for opening that issue
22:14 pdurbin left #dataverse
