Time
S
Nick
Message
00:06
kevin joined #dataverse
01:08
jri joined #dataverse
04:08
jri joined #dataverse
07:35
jri joined #dataverse
07:40
jri joined #dataverse
07:40
jri joined #dataverse
08:45
jri joined #dataverse
08:56
jri joined #dataverse
13:29
andrewSC
morning all!
13:30
andrewSC
So I'm trying to design some sort of action plan for an ask that was made of me and i wanted to run it past you guys to get some thoughts (if any).
13:31
andrewSC
We have some datasets (probably around 400ish?) in our DV instance that need to have some normalization applied to them (changing first names of authors to full first names (mike -> michael), updating keywords, casing on titles, etc.)
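[Editor's sketch] The kind of normalization described above could look roughly like this in Python. Everything here is an assumption for illustration: the nickname table, the "Last, First" author format, and the list of small words are hypothetical and not taken from the actual datasets.

```python
# Hypothetical lookup table; a real one would be built by the people doing the cleanup.
NICKNAMES = {"mike": "Michael"}

# Words kept lowercase inside a title (assumed house style, not a Dataverse rule).
SMALL_WORDS = {"a", "an", "and", "of", "the", "in", "on", "for", "to"}


def expand_first_name(author, nicknames=NICKNAMES):
    """Expand a short first name, assuming the author is stored as 'Last, First'."""
    last, sep, first = author.partition(",")
    if not sep:
        # No comma: format unknown, leave the value untouched.
        return author
    first = first.strip()
    full = nicknames.get(first.lower(), first)
    return f"{last.strip()}, {full}"


def normalize_title(title):
    """Capitalize lowercase words, keep small words lowercase, leave mixed/upper case alone."""
    result = []
    for index, word in enumerate(title.split()):
        if index > 0 and word.lower() in SMALL_WORDS:
            result.append(word.lower())
        elif word.islower():
            result.append(word[0].upper() + word[1:])
        else:
            # Already capitalized or an acronym such as "DNA": keep as-is.
            result.append(word)
    return " ".join(result)
```

Acronyms are deliberately left alone rather than passed through `str.title()`, which would turn "DNA" into "Dna".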
13:32
andrewSC
What i was planning on doing is getting all the datasets using the mydata endpoint (since all our datasets are unpublished atm), handing them off to some folks to do the normalization, then taking those modified datasets and editing/updating the current DV instance.
13:33
andrewSC
Something I noticed yesterday is that the mydata endpoint doesn't seem to include keywords in its dump of each dataset?
13:33
andrewSC
but the solr results do?
13:33
andrewSC
so i'm kinda trying to figure that part out
13:34
andrewSC
then it seems the only way to do any sort of "re-ingest" would be using the SWORD interface
13:34
andrewSC
which seems to imply that i may want to be extracting these using that interface in the first place so there isn't a funky translation step from SWORD to JSON
13:34
andrewSC
or vice versa
13:34
andrewSC
any thoughts or possible suggestions for what i'm trying to accomplish?
14:05
pameyer joined #dataverse
14:06
pameyer
hi andrewSC
14:07
pameyer
one thing that occurs to me would be to use the native API to get info for all the datasets, possibly convert to something more user friendly than JSON, and use that edited JSON to update the unpublished datasets
14:08
pameyer
should have all the information in it; but be aware that what's sometimes referred to as an "edit" native API is actually an "over-write everything with the new values" API
14:12
andrewSC
mmmmmmmmmm
14:13
andrewSC
ahhh
14:13
andrewSC
for some reason i thought the undocumented mydata endpoint was the only way to get draft datasets via api these days..
14:14
andrewSC
seems i can just specify :draft on the native api, provide the persistent id, and work it out
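[Editor's sketch] Fetching a draft by persistent ID might look like the following, assuming the native API path `/api/datasets/:persistentId/versions/:draft` and the `X-Dataverse-key` header behave as described in the Dataverse API guide. The base URL, DOI, and token are placeholders, and the `"data"` wrapper in the response is an assumption about the reply format.

```python
import json
import urllib.parse
import urllib.request


def draft_url(base_url, persistent_id):
    """Build the native API URL for the draft version of a dataset."""
    # Keep ':' and '/' readable so the DOI appears as it does in the guides.
    query = urllib.parse.urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{base_url.rstrip('/')}/api/datasets/:persistentId/versions/:draft?{query}"


def fetch_draft(base_url, persistent_id, api_token):
    """GET the draft version as a dict (performs a network call; not run here)."""
    request = urllib.request.Request(
        draft_url(base_url, persistent_id),
        headers={"X-Dataverse-key": api_token},
    )
    with urllib.request.urlopen(request) as response:
        # Assumes the reply is wrapped as {"status": ..., "data": {...}}.
        return json.load(response)["data"]
```

For example, `draft_url("https://dv.example.edu", "doi:10.5072/FK2/ABCDEF")` produces the URL to request; `fetch_draft` would then need a token belonging to a user who can see the unpublished dataset.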
14:14
andrewSC
i'll dig into that and see what comes of it!
14:14
andrewSC
BTW
14:15
andrewSC
the final solution to that email situation i was having and the google suite control policy enforced at ncsu had to do with not having the system email set... and it actually wasnt the sending that was the issue, it was the delivery to the ncsu.edu email address
14:16
andrewSC
not even sure why that changed? because i set the root email address in DV as well as the two locations in glassfish where the email should be set
14:16
andrewSC
but that system email had to be set via curl command
14:20
andrewSC
and i mean that's my own damn fault because it is _very_ clearly stated in the documentation that without setting that system email, email just won't work lol
14:21
andrewSC
so yeah.. just thought i'd update you guys on what happened :)
14:21
andrewSC
i did end up getting mailgun working though so we have that going for us as well
14:22
pameyer
andrewSC: thanks for the update, and good to hear that you finally got it working
14:23
andrewSC
mhmmm :)
14:23
pameyer
I've never dug into the mydata endpoint, so I don't have a great idea what it does
16:52
pdurbin
andrewSC: I agree with pameyer that you should serialize your datasets to JSON, do edits to that JSON, and then overwrite the draft datasets with the edited JSON, all using the native API. The only problem with this plan is that the process is not well documented: https://github.com/IQSS/dataverse/issues/3777
16:53
andrewSC
lol nice timing
16:53
andrewSC
was trying to get to the bottom of a nullpointer just now
16:53
pdurbin
And thanks for letting us know that your email issue is resolved. Good stuff.
16:53
andrewSC
mhmm
16:54
pdurbin
Hopefully some tips in that issue help. Please please consider making a pull request against doc/sphinx-guides/source/api/native-api.rst if you have time to improve the docs.
16:55
andrewSC
:)))
17:00
pameyer
andrewSC: in my experience, nullpointer with native API means that I typo'd my JSON
17:01
pameyer
the "best" approach I've found has been to hook up a debugger to glassfish to find out where the typo is
17:01
andrewSC
pameyer: that was my initial reaction too.. turns out it was just what the ticket described re: having the files key in the JSON i was trying to PUT
17:01
andrewSC
removing that key and sending again was successful!
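[Editor's sketch] The fix described above, dropping the `files` key before sending the edited JSON back with PUT, could be wrapped in a small helper like this. The function name is hypothetical, and the shape of the draft JSON in the usage note is only an assumed outline of what the native API returns.

```python
import copy


def prepare_update_payload(draft_version):
    """Return a copy of the draft version JSON without the 'files' key.

    The original dict is left untouched so it can still be used for comparison.
    """
    payload = copy.deepcopy(draft_version)
    # Per the discussion above, a 'files' key in the PUT body triggered the null pointer.
    payload.pop("files", None)
    return payload
```

A payload such as `{"metadataBlocks": {...}, "files": [...]}` comes back with only `metadataBlocks`. Since the update overwrites everything with the new values, the remaining metadata should be complete before it is sent.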
17:03
pdurbin
phew
17:03
andrewSC
;)
17:04
andrewSC
also found out postman, the http/api client/debugger does something to the request that curl doesn't.. i switched over to insomnia for now. all seems to be working as expected
17:08
pdurbin
cool
18:39
andrewSC
you guys ever seen this during reindexing? https://gist.github.com/andrewSC/c6a1ca33baef42a04516a6d443f77c0f
18:40
pdurbin
pameyer has
18:40
andrewSC
i tried to get the dataset explicitly (assuming the id given the sequence of what the prev id and next id was)
18:40
pdurbin
and I have too
18:40
pdurbin
andrewSC: which release or branch are you on?
18:40
andrewSC
and i got a 404
18:40
andrewSC
4.8.5
18:41
pdurbin
dsDescriptionDate. hmm
18:41
pdurbin
andrewSC: would you be willing to open an issue about this?
18:41
pameyer
yup that does look familiar
18:42
andrewSC
added two comments to that gist
18:43
andrewSC
with what i see in the log and what i tried to curl and got a 404 with
18:44
andrewSC
if you delete a draft, are all references removed from the db? because only thing i can think of is dv trying to reindex datasets that were deleted but there's still a ref somewhere
18:45
andrewSC
draft that was never published*
18:45
pdurbin
I just added a comment too.
18:46
andrewSC
huh interestingg
18:46
pdurbin
I wouldn't guess it has anything to do with deleting a draft but who knows.
18:46
pameyer
andrewSC: I don't think that's related to deleting drafts
18:46
andrewSC
yeah npnp i can open an issue
18:46
andrewSC
touche
18:46
pdurbin
thanks!
19:08
andrewSC
mmmmmm interesting
19:09
andrewSC
turned up the logging on edu.harvard.iq.dataverse.search.IndexServiceBean
19:09
andrewSC
looking through the output now
19:16
andrewSC
https://gist.github.com/andrewSC/a47ac332b4cbd89f1dd42a5bc9ba0c75
19:16
andrewSC
smoking gun!
19:18
andrewSC
although that seems obvious from the output without fine logging.. it seems like it doesn't like parens?
19:29
pameyer
hmmm - but where are the parens coming from?
19:29
andrewSC
right? that's what i'm trying to hunt down now
19:29
pameyer
weird
19:45
andrewSC
https://github.com/IQSS/dataverse/issues/4558
19:45
andrewSC
Hopefully that's somewhat descriptive of what I've done so far
19:46
andrewSC
just interesting
19:47
andrewSC
one of the datasets in question, when i pulled it from the api as well as looked at it in the dv frontend didn't have parens
19:48
andrewSC
got to here https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java#L211 and the next thing i would have done is checked what datasetFields contained
19:48
andrewSC
gotta run though
22:01
pameyer left #dataverse
22:14
pdurbin
thanks for opening that issue
22:14
pdurbin left #dataverse