
IRC log for #dataverse, 2015-07-22

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
15:40 axfelix joined #dataverse
16:57 donsizemore joined #dataverse
18:11 donsizemore Hello! just curious if anyone is around and interested in troubleshooting pegged CPU?
18:16 pdurbin donsizemore: well, I can try a bit. Is this for DVN 3.x?
18:17 donsizemore correct - a Dataverse 4 upgrade is in our near future. and thank you for your help!
18:19 donsizemore i've been poring through the domain server, access and export logs but don't see a particular culprit. Akio and I went down a few rabbit holes earlier this afternoon
18:22 pdurbin donsizemore: I'm going to guess you don't want to simply restart glassfish
18:23 donsizemore already done, happy to do it again
18:23 pdurbin huh. and still pegged
18:24 pdurbin what's the load from `uptime`?
18:24 donsizemore we were getting a number of harvesting errors, but they had me disable the schedule for that
18:24 donsizemore it's been hovering around 5 all day. i found one IP in the access log that had sent an enormous number of requests earlier but that stopped around 1230 EDT
18:25 pdurbin 5 isn't crazy high at least
18:29 pdurbin donsizemore: I assume you have symptoms as well. site is slower than usual or whatever
18:29 donsizemore this particular VM has a ton of resources thrown at it, so the machine (and Glassfish) are perfectly responsive. but the load is usually closer to 0
18:30 pdurbin ok. responsive is good :)
18:31 donsizemore we spent part of the morning trying to correlate export errors / nullpointer exception to bad data of any sort, but can't pin anything down
18:36 pdurbin I wonder if that's related or not.
18:41 donsizemore on launch the glassfish log gets a string of javax.enterprise.system.container.ejb.com.sun.ejb.containers|_ThreadID=17;_ThreadName=Thread-2;|EJB5184:A system exception occurred during an invocation on EJB StudyServiceBean, method: public void edu.harvard.iq.dvn.core.study.StudyServiceBean.exportStudy(java.lang.Long) errors, but once exporting finishes the CPU doesn't die down
18:42 pdurbin donsizemore: so if you restart glassfish and *don't* try to export everything is fine? it's only when you restart glassfish and try an export that the CPU gets pegged?
18:43 donsizemore it seems to be launching export on its own
18:44 pdurbin huh
18:45 pdurbin I assume you don't want that.
18:46 pdurbin donsizemore: this *might* help: http://wiki.greptilian.com/java/glassfish/howto/purge-jms-queue/
18:47 pdurbin if export is done via a JMS queue, which I'm not sure of
18:55 pdurbin donsizemore: I'd start by trying to list anything in any JMS queues
18:58 donsizemore DSBIngest     Queue  RUNNING  0      -         1      -         0      0       0      0.0
18:58 donsizemore IndexMessage  Queue  RUNNING  0      -         1      -         5      0       5      366.0
18:58 donsizemore mq.sys.dmq    Queue  RUNNING  0      -         0      -         0      0       0      0.0
19:00 pdurbin so a count of 5 in the IndexMessage queue?
19:00 donsizemore correct
19:00 pdurbin is that expected?
19:11 donsizemore Akio is checking documentation. if I try to view the jms/DSBIngest Connection Factory in the Glassfish console, I get SEVERE|glassfish3.1.2|org.glassfish.admingui|_ThreadID=37;_ThreadName=Thread-2;|RestResponse.getResponse() gives FAILURE. endpoint = 'https://localhost:4848/management/domain/resources/connector-connection-pool/jms%2FIndexMessage'; attrs = '{}'
19:12 pdurbin ok
19:18 pdurbin donsizemore: my only point is that when unexpected stuff starts happening when you start glassfish it may be because stuff is in the JMS queue.
19:21 donsizemore point absolutely well taken - just to be sure, the JMS queue is safe to purge?
19:22 pdurbin well, I would think you'd be able to restart indexing anyway
19:23 donsizemore 10-4 -- please forgive me for being squeamish; i'm new =)
19:24 pdurbin probably good to be a bit squeamish
19:24 pdurbin :)
19:24 pdurbin donsizemore: please feel free to open a ticket for all this. I'm just trying to figure out why your CPU is pegged. support@dataverse.org
19:26 donsizemore #224689 from this morning
19:26 donsizemore the purge returned successful but i still have a consumer count of 1 in DSBIngest and 5 in IndexMessage - should I purge IndexMessage specifically?
19:27 pdurbin huh. sounds like it didn't do anything
19:34 donsizemore Lucene seems to be the CPU hog
19:37 pdurbin donsizemore: yeah? how can you tell?
19:38 donsizemore turned on threading in "top" (which I should've done in the first place but I'm learning)
19:38 donsizemore then stracing those processes shows Lucene contending over write-locks
19:39 donsizemore stat("/usr/DVN/lucene/index-dir/write.lock", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
19:39 pdurbin how do you turn on threading in top?
19:39 donsizemore on Linux, with a capital H
19:39 donsizemore i meant to look that up earlier and got sidetracked messing with jstat etc
19:40 donsizemore i had also seen Lucene complaining about write locks timing out earlier, but thought that was a symptom (timeout) rather than a cause
19:40 pdurbin in the ticket you opened I see this: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/usr/DVN/lucene/index-dir/write.lock
19:41 donsizemore (actually, it may still be a symptom)
19:41 donsizemore will stopping Glassfish and manually removing the lock cause things to go any wonkier than present?
19:42 pdurbin not sure
19:43 pdurbin sorry, I've spent more time on Dataverse 4 by now
19:46 donsizemore it was Lucene's write lock. right in front of my (blushing) face
19:47 donsizemore and we, too, hope to be on Dataverse 4 soon
19:48 pdurbin donsizemore: you removed the lock and you're all set now?
19:51 donsizemore things look wonderfully normal. i apologize for my many dumb questions but i'm learning new-to-me technologies, the Dataverse among them
19:51 pdurbin no no, it's all good. good reminder of how useful strace is :)
20:13 * pdurbin replies on https://help.hmdc.harvard.edu/Ticket/Display.html?id=224689
22:00 garnett joined #dataverse
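
Editor's note: the per-thread view that donsizemore enabled in top (capital H, at 19:39) can be approximated on Linux by reading /proc directly. The sketch below is not from the conversation; the function names are hypothetical, and it ranks threads by cumulative CPU ticks since thread start rather than by the current usage rate that top displays.

```python
import os
from typing import List, Tuple


def parse_stat_line(line: str) -> Tuple[int, str, int]:
    """Return (thread id, name, utime + stime in clock ticks) from one stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left, rest = line.split("(", 1)
    name, tail = rest.rsplit(")", 1)
    fields = tail.split()
    # 'tail' starts at field 3 (state); utime is field 14 and stime is field 15.
    utime, stime = int(fields[11]), int(fields[12])
    return int(left), name, utime + stime


def busiest_threads(pid: int, limit: int = 5) -> List[Tuple[int, str, int]]:
    """List the threads of a process that have used the most CPU time (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    rows = []
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as handle:
                rows.append(parse_stat_line(handle.read()))
        except OSError:
            continue  # the thread exited between listdir() and open()
    return sorted(rows, key=lambda row: row[2], reverse=True)[:limit]
```

Calling busiest_threads() with the Glassfish process ID would give thread IDs that could then be passed to strace, as was done by hand in the log above.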
