
IRC log for #dataverse, 2021-04-02

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
13:56 pdurbin joined #dataverse
15:04 pameyer joined #dataverse
15:08 nightowl313 joined #dataverse
15:13 nightowl313 hi all .. lurking at yesterday's notes ... per the conversation about OS ... i'm very curious what others are switching to ... we are currently on Centos 8 but have tested stream
15:13 nightowl313 i'm thinking we may want to switch back to centos 7 ... things were fine there
15:14 nightowl313 we also are considering ubuntu
15:14 nightowl313 as all of our other things run on ubuntu
15:15 nightowl313 but kind of waiting to see what others do =)
15:16 pameyer nightowl313: we've had good luck with ubuntu LTS for some of our (non-dataverse) servers, but we're probably not going to move everything to it
15:17 pameyer we're also waiting to see what others do :) but part of that is for research workstations / HPC stuff.  switching back to centos 7 (or "downgrading" things that have been updated to cent8) gives more time for things to shake out
15:18 pameyer I'm a little concerned about ubuntu's move to snap packages, but so far it hasn't impacted us for server stuff
15:19 nightowl313 yea i think for us it would be better to use what is "recommended" or at least tested and others in the community are using ... i think i will try a downgrade to centos 7 on a test instance
15:20 pameyer I'm not sure if it matters for your site, but shibboleth is something that might need double-checking for an ubuntu switch.  java/apache/postgres I'd expect to be less of an issue
15:21 pameyer and sticking with the "recommended" platform is usually a good idea.  sometimes there are reasons not to, but I usually try to stick with the recommendations
15:23 nightowl313 agreed! i guess this will be a good test of our disaster recovery process! will have to install centos 7 fresh and recreate from db dump and copied files, right? does that preserve the history/versions?
15:29 nightowl313 or no ... i guess i would just have to point to the s3 and rds with new install .. sorry thinking this all through =) have only done new installs up until now, other than testing DR
15:30 nightowl313 on another note ... does anyone know if the unc number in the citation changes when you make metadata changes?
15:30 nightowl313 i'm all over the place! =D
15:30 pdurbin I assume you mean the UNF and no, it only changes when contents of tabular files change.
15:31 nightowl313 yes unf
15:31 nightowl313 thanks
15:31 pdurbin UNF is a checksum of data, not metadata. :)
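To illustrate pdurbin's point, here is a minimal sketch of a fingerprint computed over data values only. It is not the actual UNF algorithm: SHA-256 stands in for the real procedure, and `data_fingerprint` and the record layout are hypothetical names made up for this example.

```python
import hashlib

def data_fingerprint(rows):
    """Hash only the tabular values; metadata never enters the digest."""
    digest = hashlib.sha256()
    for row in rows:
        # Separate cells so that ("ab", "c") and ("a", "bc") hash differently.
        digest.update("\x1f".join(str(cell) for cell in row).encode("utf-8"))
        digest.update(b"\x1e")  # row terminator
    return digest.hexdigest()

rows = [(1, "a"), (2, "b")]
v1 = {"title": "My datset", "rows": rows}                    # typo in metadata
v2 = {"title": "My dataset", "rows": rows}                   # metadata fixed, same data
v3 = {"title": "My dataset", "rows": [(1, "a"), (2, "c")]}   # data changed

same = data_fingerprint(v1["rows"]) == data_fingerprint(v2["rows"])
changed = data_fingerprint(v2["rows"]) != data_fingerprint(v3["rows"])
print(same, changed)  # True True
```

Fixing the title leaves the fingerprint alone; changing a cell value does not.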
15:31 pameyer testing backups is always a good thing - I think most people have had the experience at some point where "we have backups" turns into - nope, we never tested the restore and it didn't work
15:32 pameyer good to check it before you need it
15:33 nightowl313 i did do some testing of restoring backups for our DR plan (using the magical dataverse-ansible as well!) .. it was a lot of work!
15:36 pdurbin That's why so many shops don't do it. :)
15:36 nightowl313 but i did actually get it all to work ... brand new instance with db restored from backup and backups of s3 data copied to new s3 buckets .. i hope we never have to do that though
15:37 nightowl313 thanks for the info on the unf number ... i really had no idea what that was =)
15:39 pdurbin Sure. For what it's worth some folks in the community aren't a big fan of UNF: https://github.com/IQSS/dataverse/issues/7328
15:40 nightowl313 oh that is interesting!
15:41 nightowl313 we've had a few requests from folks to be able to remove the version number from the citation .. or at least have more flexibility over whether a new version gets created
15:41 nightowl313 or shows on the citation .. but I know that kind of defeats the purpose of version history! =D
15:42 nightowl313 they publish something with the citation info and they don't want it to not match what they've published ... we are just trying to work with folks and make sure they don't publish until they are really ready
15:43 pdurbin Do you know about CuratePublishedDatasetVersionCommand?
15:44 nightowl313 no ... looking it up
15:44 pdurbin That's what it's called in code. Here are the docs: https://guides.dataverse.org/en/5.3/admin/dataverses-datasets.html#make-metadata-updates-without-changing-dataset-version
15:45 pdurbin Good for fixing metadata typos, etc.
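For reference, the linked guide section describes this as a POST to the dataset publish endpoint with `type=updatecurrent`, sent with a superuser API token in the `X-Dataverse-key` header. The sketch below only builds that URL; the server name and DOI are placeholders, `update_current_url` is a hypothetical helper, and the endpoint should be checked against the guide for your Dataverse version.

```python
from urllib.parse import urlencode

def update_current_url(server_url, persistent_id):
    """Build the publish URL that updates the current version in place."""
    query = urlencode({"persistentId": persistent_id, "type": "updatecurrent"})
    return f"{server_url.rstrip('/')}/api/datasets/:persistentId/actions/:publish?{query}"

# Placeholder server and DOI, for illustration only.
url = update_current_url("https://demo.example.edu", "doi:10.5072/FK2/EXAMPLE")
print(url)
```

The request itself is left out on purpose, since it publishes changes on a live dataset.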
15:46 nightowl313 omg how did i miss that? i think i need to just read straight through the docs again from start to finish ... i do search for these things!
15:46 pdurbin Nah, just ask. It's easier. :)
15:48 nightowl313 well, i did know that the option to republish current version would appear if it is just a minor version change ... is this a different option
15:49 nightowl313 i didn't realize that was only for superuser, though
15:49 pameyer there could be a bit of a circular situation - dataset citation goes into a manuscript, but can't add the manuscript citation to the dataset until it gets published
15:57 pameyer ... which appears to be completely avoided (at least for superusers) by the documentation pdurbin linked
15:58 pdurbin hopefully
16:02 nightowl313 okay, we have used this before for just that reason (although didn't realize it was just superuser function) ... but will probably not advertise it widely =)
16:05 pdurbin good to have in your back pocket
16:09 nightowl313 for sure! ;D
16:33 nightowl313 argh .. i tried to publish the draft and keep the current version, and i'm getting an error "Command edu.harvard.iq.dataverse.engine.command.impl.CuratePublishedDatasetVersionCommand@53aa3b failed: Cannot merge an Entity that has been removed: edu.harvard.iq.dvn.core.study."
16:38 nightowl313 and ... "Response has already been committed, and further write operations are not permitted. This may result in an IllegalStateException being triggered by the underlying application. To avoid this situation, consider adding a Rule `.when(Direction.isInbound().and(Response.isCommitted())).perform(Lifecycle.abort())`, or figure out where the response is being incorrectly committed and correct the bug in the offending code.|#]"
16:40 nightowl313 yikes .. have not made changes since update to 5.3
16:59 nightowl313 tried deleting the draft and recreating it .. same thing .. anyone know what that error might indicate? have not had any issues with publishing until now ... should i put in a support ticket?
17:17 pdurbin nightowl313: yes, a support ticket please: support@dataverse.org
17:19 nightowl313 okay i submitted one (although i used dataverse_support@help.hmdc.harvard.edu, is that an old email?) ... this does not happen on our test dataverse instance, so it may be related to the dataset itself
17:20 nightowl313 i also sent a community email ... lol ... covered all bases! just worried it is system-wide ... since it is a prod dv instance I can't do a lot of testing adding new datasets
17:31 pdurbin it looks like it came through ok: https://help.hmdc.harvard.edu/Ticket/Display.html?id=301437
17:35 nightowl313 +1 ... thanks for the link!
20:09 nightowl313 joined #dataverse
20:11 nightowl313 well, another friday question, which may or may not be related to my last question ... any idea why the metrics that appear on the dataset pages would not be updating? (ie: views, downloads, citations)?
20:11 nightowl313 If I download a file from our test instance, and refresh the page, the “# Downloads” metric increases by 1. On our prod instance, they all show 0 even after downloading a file.
20:12 nightowl313 i'm having problems today
20:13 pameyer do both have the same MDC config?
20:13 pdurbin For citations you have to have all the Make Data Count stuff installed.
20:15 nightowl313 they should ... i usually test everything on test and then do the same on prod, but it is possible ... i will check that .. i think i thought make data count was separate from the site stats
20:17 pameyer I'm not sure - it seems at least remotely plausible that if one had MDC on, the other didn't, and MDC configuration made some changes to the metrics display
20:17 pameyer but this is low confidence speculation on my part, so if there are better ideas probably worth investigating those first
20:20 nightowl313 i had make data count installed and was getting metrics, but something stopped along the way ... the main site "Downloads" is reporting okay, just not the datasets .. re-checking everything now
20:23 pdurbin If you're using MDC, the views and downloads don't show up until you run Counter Processor on the logs. I think in the guides we suggest doing this nightly.
20:23 pdurbin Here's the crazy diagram I made: https://guides.dataverse.org/en/5.3/_images/make-data-count.png
20:32 nightowl313 i was just looking at that .. i admit i need to spend some time understanding it better ... i had set up a cron job, but looks like it isn't running
20:35 pdurbin Ah, that would do it.
20:35 pdurbin Lots more moving parts with MDC than the out of the box metrics that Dataverse does.
20:36 pameyer cron jobs not running when they should trip things up
20:37 nightowl313 it is writing daily logs
20:37 nightowl313 so that one is running!
20:38 pdurbin Well, the logs should be written by Dataverse.
20:38 pameyer I've been bitten a few times by cron jobs that run, have an error, and try to email it somewhere.  if the system cron email isn't set right, or it gets lost in a filter, it can complicate troubleshooting
20:38 pdurbin Then CP crunches the logs.
20:38 pdurbin The Dataverse slurps up the result of that crunching (a SUSHI file in JSON format).
20:39 pdurbin Then*
20:39 pameyer I've also tripped over shell/path differences in cron jobs a time or two
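One way to guard against the silent failures pameyer describes is to have cron call a small wrapper that records the exit code and all output in a log file instead of relying on cron email. This is a minimal sketch, not something from the Dataverse guides; `run_and_log` and the log location are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from datetime import datetime, timezone

def run_and_log(cmd, log_path, env=None):
    """Run cmd, append its exit code and output to log_path, return the exit code."""
    # Pass env explicitly (for example with a full PATH) to avoid cron's minimal environment.
    result = subprocess.run(cmd, capture_output=True, text=True, env=env)
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp} exit={result.returncode} cmd={cmd}\n")
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode

# Demo: a child process that fails with exit code 3.
log_file = os.path.join(tempfile.mkdtemp(), "job.log")
code = run_and_log([sys.executable, "-c", "import sys; sys.exit(3)"], log_file)
print(code)  # 3
```

Using absolute paths for the interpreter and the script in the real cron entry sidesteps most shell and PATH differences.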
20:40 nightowl313 it is the counter logs ... i can see those for every day since i set it up in october ... but i understand about the various places for adding cron jobs! =)
20:41 pdurbin If you have the logs, you have the data. :)
20:43 nightowl313 i stopped there though and didn't go through the rest of the steps to send to datacite .. but the site should be updating shouldn't it? we have metrics on the main page, but none of the datasets show downloads or views
20:43 nightowl313 granted it doesn't get very much action yet! ;-D
20:46 nightowl313 it is the main.py that is not running nightly
20:51 pameyer can you tell where it's failing?
20:55 pdurbin Yes, it should work.
20:55 pdurbin But I'm heading out. Happy to chat more next week.
20:55 nightowl313 sorry i'm not making any sense ... it looks like it should work, but if i upload a file to a published dataset, the "Downloads" value on the page doesn't increment ... they are all 0
20:56 pameyer if you manually run the cron job (outside cron), does that update the counts?
20:57 nightowl313 will try that now ... was just going through everything to make sure it was still set up .. i do get counter logs everyday
20:58 pdurbin left #dataverse
21:33 nightowl313 i think it is a permission error ... i apparently originally installed it as root
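A quick way to confirm a suspicion like this is to list what the job's user cannot write to. The sketch below is a generic check, assuming it is run as the same user the cron job uses; `unwritable_paths` is a hypothetical helper and the directory to scan is whatever the install actually uses.

```python
import os
import tempfile

def unwritable_paths(root):
    """Return directories and files under root that the current user cannot write to."""
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in candidates:
            if not os.access(path, os.W_OK):
                blocked.append(path)
    return blocked

# Demo on a fresh temporary directory owned by the current user.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "state.txt"), "w", encoding="utf-8") as fh:
    fh.write("ok\n")
print(unwritable_paths(demo_dir))  # []
```

Files created by root in a directory that the job later runs in as another user would show up in the returned list.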
21:35 nightowl313 going to start over :)
22:32 pameyer good luck - I think I'm going to disconnect for the weekend too
22:35 nightowl313 happy weekend to all! thanks for the help!
