
IRC log for #dataverse, 2019-08-26

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
06:46 jri joined #dataverse
07:29 poikilotherm joined #dataverse
10:51 pdurbin joined #dataverse
10:52 pdurbin poikilotherm: welcome back!
11:16 jri joined #dataverse
11:25 poikilotherm Hey pdurbin :-)
11:25 poikilotherm Thx
11:27 pdurbin_m joined #dataverse
11:29 pdurbin_m poikilotherm: my trip to PIDapalooza and FOSDEM was approved. I'm hoping to book flights today. Want to try to meet up?
11:29 poikilotherm Wow, cool!
11:29 poikilotherm I still need to ask my boss if I may go to Pidapalooza...
11:29 poikilotherm Fosdem is not far away... ;-)
11:46 pdurbin_m Right. I think I'll book hotels for 3 nights in each city. Any suggestions for where to stay in Brussels?
12:34 pdurbin pmauduit: hi! Did you see the comment I left at https://github.com/IQSS/dataverse-ansible/issues/99#issuecomment-524398212 ?
13:38 donsizemore joined #dataverse
14:00 pmauduit not yet, sorry I've been on a meeting all the afternoon
14:00 pmauduit I'll have a look asap
14:08 pmauduit I guess it's kindof normal to have prometheus on /prometheus/ now, even on the 9090 port
14:08 pmauduit that I'd have expected
14:12 pmauduit pdurbin: what is the user on the ec2 instance already ?
14:13 pmauduit centos, right
14:15 pmauduit ok, I replied
14:16 pdurbin pmauduit: yes, "centos"
14:18 pmauduit next thing I'd like to do is to configure JMX, but no idea where you can tweak the Java variables for the J2EE container
14:20 donsizemore @pmauduit this is where i limited the scope of the initial issue =)
14:22 pdurbin pmauduit: JMX first or Grafana first? I was thinking maybe we could start with a very basic Grafana dashboard of operating system level stuff like disk usage, CPU, memory, etc.
14:29 pdurbin I'm fine with whatever. :)
14:29 pdurbin donsizemore: I'm glad to see you're still following along even though you've got a busy week coming up. Upgrades and such, right? I forget. :)
14:30 donsizemore we're on a custom-patched 4.11 now; re-indexing just finished, i'm restoring original file sizes and tracking down dataset index failures
14:31 * pdurbin looks at https://dataverse.unc.edu
14:31 pdurbin awesome
14:32 pdurbin How long does reindexing take? Roughly?
14:41 pmauduit pdurbin: it's egal to me also
14:41 pmauduit *equal
14:42 pdurbin pmauduit: ok, if it's ok with you, I think I'd like to have a basic Grafana dashboard first. I'm not even ssh'ed in anymore. Do you want to try to install it? :)
14:42 pmauduit but we already have some material to be ported to the ansible playbook
14:42 pmauduit I can see if it's yum packaged
14:43 pdurbin True and we definitely do want to put all this in Ansible but I think donsizemore is quite busy with other stuff at the moment.
14:46 pmauduit :( seems that a package is available but not sure it's grafana in itself
14:46 pmauduit pcp-webapp-grafana.noarch : Grafana web application for Performance Co-Pilot
14:49 pmauduit ok done, using the yum repo advised by the grafana project
14:51 pdurbin perfect
14:52 pdurbin We already use custom repos for EPEL and Postgres.
14:52 pdurbin custom yum repos
14:52 pmauduit I commented on the ticket
14:52 pdurbin Looks good. The next step is to create and export a dashboard?
14:53 pmauduit provide it via the http conf before ?
14:53 pdurbin sure!
14:53 pdurbin Do you need a hand with that? Please feel free to make whatever edits you want. :)
14:55 pmauduit http://ec2-3-81-53-52.compute-1.amazonaws.com/grafana/login
14:55 pmauduit that's ok ;)
14:56 pdurbin Nice! Can you configure it so a login isn't required?
14:58 pmauduit probably, I cannot find the option for now though, but admin/admin is the default
14:59 pdurbin Ok. We'll find it later. I know it's possible. :)
15:01 donsizemore @pdurbin re-indexing on our current hardware and number of datasets takes ~90 minutes
15:02 pdurbin donsizemore: cool, I expected much worse :)
15:02 donsizemore 404 datasets don't want to re-index under solr 7.3.1, so far the majority are deaccessioned and the rest display in the web interface
15:03 donsizemore since we had to blow away solr for the 4.11 upgrade, i moved the retroactive original filesize and ReExportAll steps for the last
15:03 pdurbin By 404 do you mean unpublished?
15:04 donsizemore then i'll re-index one more time to see if that number changes. no, just coincidence that 404 datasets threw indexing failures
15:04 donsizemore i didn't think reordering the filesize or JSON-LD export steps should've affected solr
15:04 donsizemore and reordering those steps cut our downtime drastically
15:05 pdurbin gotcha, you seem to be back up. that's good :)
15:06 donsizemore Mandy hasn't started using it yet =) she was the anti-King Midas during our testing
15:06 donsizemore features that worked just fine for me failed consistently for her
15:07 pdurbin It's good to have people around like that. :)
15:07 pmauduit pdurbin: http://ec2-3-81-53-52.compute-1.amazonaws.com/grafana/d/COPs6vKZk/overall-metrics?orgId=1&from=now-30m&to=now
15:09 pdurbin pmauduit: CPU Load! Looks great! https://i.imgur.com/CcIEgO9.png
15:09 pmauduit I'm adding the memory graphs, once I find out in which unit collect is providing the info ;)
15:10 pmauduit should be in bytes(2GB of used ram seems legit ?)
15:11 pdurbin well, we could compare it to what munin is reporting for memory
15:11 pmauduit then reload the grafana dashboard
15:11 pdurbin Please see http://ec2-3-81-53-52.compute-1.amazonaws.com/munin/localhost/localhost/memory.html
15:12 pmauduit seems in sync
15:12 pdurbin cool
15:13 pdurbin Meanwhile I found https://grafana.com/blog/2019/05/16/worth-a-look-public-grafana-dashboards/ but not the config on how to make grafana dashboards public.
15:13 pmauduit the dashboard might be public by default, but would require admin account to set up your dashboard as well as datasources
15:13 pdurbin Ah, there's something called [auth.anonymous]
15:13 pdurbin please see https://stackoverflow.com/questions/33111835/how-to-set-up-grafana-so-that-no-password-is-necessary-to-view-dashboards
15:14 pmauduit http://ec2-3-81-53-52.compute-1.amazonaws.com/grafana/admin/settings
15:14 pmauduit it's over here
15:14 pmauduit so stored somewhere in the db I guess, but might be overridden in the grafana.ini
15:15 pdurbin Ok, please feel free to go ahead and make the graphs public.
15:16 pmauduit found !
15:17 pmauduit http://ec2-3-81-53-52.compute-1.amazonaws.com/grafana/d/COPs6vKZk/overall-metrics?orgId=1 accessible in "privacy mode" now
15:18 pmauduit (with no auth)
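
The log does not record which setting pmauduit changed. Going by the [auth.anonymous] section pdurbin mentions above, a minimal sketch of the override could look like the following. The helper name `anonymous_viewer_overrides` is hypothetical, and the organization name "Main Org." is an assumption (it is Grafana's default organization name, which the instance may have changed).

```python
import configparser
import io


def anonymous_viewer_overrides(org_name: str = "Main Org.") -> str:
    """Render a grafana.ini fragment that allows read-only anonymous access.

    Assumption: org_name matches an organization that exists in Grafana.
    """
    config = configparser.ConfigParser()
    config["auth.anonymous"] = {
        "enabled": "true",      # let visitors in without a login
        "org_name": org_name,   # organization the anonymous user is placed in
        "org_role": "Viewer",   # read-only: no editing of dashboards or datasources
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(anonymous_viewer_overrides())
```

The printed fragment would be merged into grafana.ini by hand, followed by a restart of the Grafana service. Keeping the role at Viewer matches the discussion above: dashboards are public, while setting up datasources still needs the admin account.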
15:37 pdurbin pmauduit: perfect! I was just at standup and showed Danny afterwards. I'm not sure if you know Danny but he's our project manager.
15:38 pmauduit no I don't think so, but great !
15:38 pmauduit (if he's not around on irc, we probably never exchanged together)
15:39 pmauduit pdurbin: I notice that centos does not start / enable the services by default after a yum install
15:39 pmauduit I don't know if it's a problem (as we might provision once and let the VM run until its "death")
15:40 pdurbin pmauduit: yes, that's a feature of centos. It drives me crazy that debian does the opposite. :)
15:40 pmauduit but it's good to know
15:43 pdurbin Don't worry, when donsizemore and I get all this stuff added to dataverse-ansible, we'll make sure all the services start on boot.
15:44 pdurbin pmauduit: should we move on to monitoring Glassfish or Solr via JMX? What's our next move? :)
15:45 pdurbin Or should we export the dashboard as is? CPU and memory is a good start!
15:48 pmauduit pdurbin: I checked the ansible module for grafana, there is everything to each json as input and get ansible configured as output (datasource + dashboard)
15:49 pdurbin pmauduit: oh! That might be nice. I'll let donsizemore decide though. I'm fine with whatever. :)
15:50 bjonnh pdurbin: what may I know ?
15:52 pdurbin bjonnh: heh. Nevermind. There's an "about" page coming for Harvard Dataverse that will explain it all some day. :)
15:53 pmauduit pdurbin: https://github.com/IQSS/dataverse-ansible/issues/99#issuecomment-524916069
15:54 pdurbin pmauduit: cool. JSON format works for me!
16:01 pdurbin pmauduit: do you think it would be easier to monitor Glassfish or Solr?
16:07 pdurbin pmauduit: I just found this if it helps: https://lucene.apache.org/solr/guide/7_3/using-jmx-with-solr.html (Dataverse supports Solr 7.3.x right now)
16:08 pdurbin This is even more specific: https://lucene.apache.org/solr/guide/7_3/monitoring-solr-with-prometheus-and-grafana.html
19:32 donsizemore joined #dataverse
19:33 donsizemore @pdurbin knock knock, o ye of the institutional knowledge?
19:35 pdurbin donsizemore: hit me
19:39 donsizemore @pdurbin so, on our production dataverse we have 404 datasets which cause solr to throw an index failure
19:40 donsizemore @pdurbin on our test server, same warfile, same solr version, import of production database, we only have our accustomed 22 dataset indexing failures that we've never tracked down
19:41 donsizemore schema.xml is identical. i do note a couple differences in solrconfig.xml (which... i thought i used the copy from dvinstall.zip for both, but looks like i dropped the ball on the test server there (and it only fails on 22))
19:41 pdurbin oh! 404 datasets, I thought you were talking about the http code... I'm with you now :)
19:42 pdurbin When you try to index one of those datasets individually, is there a stacktrace in server.log?
19:42 donsizemore correct. i'm trying to suss out the difference. one of those being solrconfig.xml. may i send you a diff?
19:42 donsizemore they all say Exception info: null
19:43 pdurbin right but there are probably line numbers in there
19:43 pdurbin that's what I want to see, the line numbers and which java classes
19:44 donsizemore at edu.harvard.iq.dataverse.__EJB31_Generated__DataverseServiceBean__Intf____Bean__.findRootDataverse(Unknown Source)
19:44 pdurbin no line numbers?
19:44 donsizemore there are several, i can collate them. is it because we renamed the root dataverse to 'unc' years ago?
19:45 donsizemore javax.ejb.TransactionRolledbackLocalException: Exception thrown from bean at com.sun.ejb.containers.EJBContainerTransactionManager.checkExceptionClientTx(EJBContainerTransactionManager.java:662)
19:45 pdurbin Can you please email the whole server.log file to support@dataverse.org? More of the stacktrace would help a lot.
19:45 donsizemore yes yes thank you.
19:46 donsizemore the "problem" datasets show up in the web interface, which i take to mean that solr didn't completely blow up during indexing
19:46 donsizemore unless the dataset page itself is populated by the database rather than solr
19:46 donsizemore most of the problem datasets are deaccessioned but not all of them
19:46 pdurbin Can you please link me to one of the problem datasets?
19:47 donsizemore here's one that isn't deaccessioned https://dataverse.unc.edu/dataset.xhtml?persistentId=doi:10.15139/S3/12410
19:49 donsizemore and if i try to index them manually through the API the server log doesn't barf
19:50 pdurbin Thanks for the link. This dataset seems to be properly indexed. Is it?
19:50 donsizemore the only difference i can find is the boost logic in solrconfig.xml, which is current on prod but out of date on test
19:50 donsizemore it was our third failure during index-all
19:51 pdurbin Are you suffering from this issue? index all fails to index some datasets but they can be indexed individually #5575 https://github.com/IQSS/dataverse/issues/5575
19:51 donsizemore so far they all look properly indexed in the web interface. do you suppose solr is tripping on some field and we get a blue screen of indexing in the glassfish log?
19:51 donsizemore this is exactly the stack trace, o ye of the institutional knowledge
19:53 donsizemore so, the solution is... upgrade to 4.14+ (which we're not ready to do yet) or simply re-index them manually?
19:53 pdurbin Yeah, I think so. We don't know where the bug is. Reindexing manually seems to be the workaround.
19:55 pdurbin donsizemore: please feel free to leave a comment on that issue
19:57 donsizemore the memory/resource angle makes perfect sense because our test server succeeds and it's otherwise doing jack
19:59 pdurbin Oh, was there a comment about memory or resources in the issue?
19:59 donsizemore yes, from jim myers
20:01 pdurbin Ah, I see it. Man, this bug has bit everyone. :(
20:15 donsizemore i scripted a re-index with a sleep 1 each time. no errors! (even the 22 we /should've/ seen)
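
donsizemore's script is not shown in the log. A rough sketch of the same idea (reindex datasets one at a time with a one-second pause between requests) might look like this. The endpoint path `/api/admin/index/datasets/{id}` and the localhost base URL are assumptions based on a default Dataverse install, and the function names are hypothetical.

```python
import time
import urllib.request
from typing import Callable, Iterable, List

# Assumption: the admin API is reachable on localhost, as in a default install.
BASE_URL = "http://localhost:8080"


def index_url(dataset_id: int, base_url: str = BASE_URL) -> str:
    """Build the (assumed) URL that reindexes a single dataset by database id."""
    return f"{base_url}/api/admin/index/datasets/{dataset_id}"


def http_get(url: str) -> int:
    """Issue a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url) as response:
        return response.status


def reindex_slowly(
    dataset_ids: Iterable[int],
    fetch: Callable[[str], int] = http_get,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Reindex datasets one by one, pausing after each; return ids that failed."""
    failed = []
    for dataset_id in dataset_ids:
        try:
            status = fetch(index_url(dataset_id))
        except OSError:  # urllib's URLError and HTTPError both derive from OSError
            status = None
        if status != 200:
            failed.append(dataset_id)
        sleep(pause)  # give Solr and Glassfish a moment before the next request
    return failed
```

Calling `reindex_slowly([...])` with the ids of the datasets that failed during index-all would return whichever ids still did not come back with HTTP 200. The pause is the whole point of the workaround: it spaces out requests, which fits the memory/resource explanation mentioned in issue #5575.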
20:15 pdurbin nice
20:15 pdurbin hooray for workarounds, I guess
20:15 pdurbin sorry for all the bugs :)
20:16 donsizemore kasha and i absconded to a bar to celebrate the upgrade; i'm re-indexing with bold rock
20:16 pdurbin :)
20:17 pdurbin cheers
