katalin susztak: thank you. so, thank the organizers for inviting me, and i am fully aware of the fact that my talk, which is going to be half-an-hour exactly, is between you and getting lunch, and i know it’s been a long session, so i want to tell a little bit about the work that we have been doing over the last five years. my lab was part of the roadmap project, so if you’ve been working on this just a way of an introduction. i’m actually a physician scientist, just like the first speaker, so i am a nephrologist. so, what i do on a daily basis, i have patients who are on dialysis, so we have about half a million patients in the united states. and they spend an excessive amount of time over there, so it’s four hours three times a week, and it’s not the
best to have it.

so, in general, just by way of introduction to the kidney, the kidney is basically a, you know, a [unintelligible] organ, right, and on a microscopic basis it consists of -- i think you can see this -- so it consists of the structure which is called the glomerulus, where you just basically filter your blood. actually you filter a lot, so it’s about 100 cc per minute. so, you filter about one coffee every two minutes, and then you -- as you probably noticed you don’t pee out 10 buckets, actually 18 buckets of water every day, so that’s because you have these long and convoluted and different parts of a tubular system which basically reabsorbs the water and electrolytes, and then there’s
some form of a secretion, so the function of the kidney is measured by the filtering function of this glomerulus, and this is a fibrotic kidney, so in kidney disease that we study and of course advanced renal disease is basically you get this scarring of the organ where you lose the epithelial cells and then the glomerulus as well. so, the function is measured how much you filter and nephrologists are really simple people so you filter 100 cc per minute. you know we like that round number 100. that’s having measured it, so i know you -- many of the people have this notion that, well why to care about kidney disease? you have dialysis and transplantation. indeed we do have it, but i just want to tell you that if you have end-stage kidney disease and you are on dialysis, you have about 20,
25 percent chance of living through five years, and that’s just a little bit better than getting lung cancer or aml and then it’s actually largely worse than many of the common cancers, and just a way of putting renal cancer on this bar -- on this graph as well, so actually the survival of [unintelligible] cancer is slightly better than being on dialysis, so it’s not a trivial problem, and also it costs about $30 billion a year which is actually 10 percent of the medicare budget despite these patients actually i think consist of only 1 percent of the total population of it. so, it’s quite costly. you do better if you get a transplant, but very few people are able to get a transplant.

so, why do people develop kidney disease and how can we solve it, and that’s what my
lab is trying to understand. so, as nancy cox kind of introduced us, it’s a complex trait. we have a contribution -- a genetic contribution and then we have these numbers for heritability and you can see that this reaches .3 to .7. right now, we believe the heritability of gfr among europeans is somewhere around .3, and .7 comes in for african americans for end-stage kidney disease, and i’m going to show an example of what could explain that actually very high heritability, and then a bunch of environmental factors, aging. that’s why most people -- it contributes very strongly for kidney disease development, diabetes and smoking. and then, here you are with kidney disease.

so, how to understand the genetics of kidney disease; we have gwas and i think people have
kind of talked about this quite extensively. this is the data for the most updated gwas paper from ckdgen that my lab collaborates with quite significantly and there is a new one in the pipeline. this one has about 67,000 participants in it, and the new one is going to have about -- more than 100,000 cases of european descent and what you see here that some of the loci of these come out to be significant and then we were able to increase the significance and i actually don’t know how many on this part -- this graph, but right now we have about 67 curated loci that we work on that have shown reproducible association in people of european descent in chronic kidney disease development.

i will talk a little bit about this top locus over here on chromosome 16, and as you know,
you know we all love geneticists -- we already gave a name, so we don’t have nothing to do with -- after that they know what the genes that cause kidney disease indeed as it was explained in the very beginning. we really don’t know whether these are the actual genes that underlie the association or are causally related to disease development. so, as for many other traits, for kidney disease also these snps are in the non-coding area of the genome, so 80 percent are non-coding and then we have the questions that have been discussed before: how do these snps actually lead to kidney disease development? so we would just like to know which one is the causal snp, which one is the target cell type. really because i’m a cell biologist mostly so we really would like to know the target genes,
and then maybe the mode of this regulation would not be as bad as well.

so, what my lab -- so this is the framework, the way i -- we think about it and then i think many of the people in the [unintelligible] think about this of how we could understand and make sense of these gwas. so, we think that this causal variant is somehow localized to the regulatory region in a disease-relevant cell type. i’m going to give data and there are papers from john stam and brad bernstein also looking at the kidney associated traits that we believe that actually these cell types are somewhere localized in the kidney. it’s not really an [unintelligible] phenotype. that’s what we talked about already as well. so, the variant should alter the target gene, especially in this disease relevant cell type
via most likely altering transcription factor binding although we could maybe accept other mechanisms.

what we add to this is that we believe that the target expression -- the target should be expressed in the kidney, and then we also think that the target expression should change in disease states, and then we would like to have a correlation how the genotype and the disease states change the target expression, so if the risk allele increases the target expression, we hope that we find the same kind of correlation if you look at samples from patients with chronic kidney disease. and obviously the target expression should somehow cause kidney disease and therefore should be functional, so i will go through
a couple of examples, so the first one is that this should be localized in the regulatory region in the kidney.

so, to understand that, my lab physically started to develop this fairly large kidney bank, so we have more than 1,000 samples at the moment, 1,200 on the last count and then what we have here is slightly similar to other gwas data, so this is actually updated with clinical data in real time, so mostly these are collected from the unaffected part of tumor nephrectomies and in those patients the disease -- kidney disease incidence is fairly high, 20 percent of them, and since the common conditions causing kidney disease are diabetes and hypertension, so these are actually quite highly prevalent conditions in people who are getting nephrectomies,
who are, you know, the usual 58 year old males or females, and what we have built in is this data is updated itself, so we have not just the static clinical update, but it updates over the years as -- so we have information for functional decline.

we have done a fairly detailed histopathological examination, which is not just like whether you have a disease or you don’t have disease, but we use many parameters that are -- we hope to use as maybe as endophenotypes as we score different things that people under the microscopes can score off of the differentiation of epithelial cells, the scarring, the inflammatory cells, and so on just by visually looking, so we have large efforts to do transcriptome analysis, and i think we are about 500 samples that we have done already. and because i’ve
told you that there are two different segments in the kidney, one is this glomerulus which is the filter and the tubules that kind of process the filtrate, so these -- we micro-dissect all samples to glomeruli and tubules. we have epigenome analysis, mostly methylation, and we are working on what i will show later to isolate different cell types out of the kidney and make chip-seq based chromatin annotation for them and then we have genotyped all the samples that we have processed using biobank because it is much cheaper and then obviously we tried to integrate all of that together to figure out what’s causing kidney disease, so the causal variants should be somewhere in the kidney, so to do that we get this kind of organ transplant of kidneys where we use just the kidney cortex itself or we separate
different cell types out of it, and using the encode based chromatin -- i mean chip-seq marks, the h3k27 acetylation and then k4 monomethylation as enhancer marks and k4 trimethylation as promoters and k36 trimethylation as transcribed regions to annotate regions in different cell types.

so, now if you look at the snps, so we could look at in the kidney, so this is just a so-called adult kidney of what you find is -- what we find is -- and that’s fairly similar to what’s published, is that a large percentage of the snps of the 67 loci are actually localized to enhancers, so this actually -- there are several ways to do this -- this is mapping just the leading snp that is published in the paper, and then we can
kind of enhance this to about 65 percent if you take all the tagging snps in the ld block and then you accept that if one of the ld snps is actually in an enhancer, then you call it as an enhancer, but not more than that for the kidney, and that’s -- there is a significant enrichment if you compare it to like an h1 stem cell and the fibroblast and this is actually encode data, and then we looked at multiple encode cell types, so indicating that kidney disease associated polymorphisms are localized to enhancer regions in the kidney.

so now we can do a little bit better than that because we have now these multiple cell types that we make out of the kidney, and then we make the maps for these cell types as well, and then we can also say that this is actually not just somewhere in the kidney,
but maybe in some enrichment. although, i would take this with a grain of salt, but you see an enrichment that it is somewhere in the tubule epithelial cells from all the places when we compare it to other cell types that are in the kidney, of glomerular epithelial cells and epithelial fibroblasts in mesangial cells, that seems to be the cell type where we see kind of more clustering of these ckd associated polymorphisms. so, that’s very nice, but that’s computational, and then obviously my lab is very interested in the mechanism, so we have to actually do the hard work so we have to screen through these enhancers and then show that they are actually localized and then act as a regulatory region in the kidney, so to do that, we actually use the zebra fish system and this very nice reporter
system where you have an mcherry flanked by two tol2 sites, and then you can do large-scale cloning into it which we got via [unintelligible] fisher who has helped us quite a bit. so we clone all these, so putative enhancers over here and then we use a fish where we have -- it’s a transgenic fish where we labeled the tubule, so the zebra fish has actually just one filter by two little tubes on the side, so we label this with green and therefore if the clone in the mcherry, we could see that whether it’s in the -- you could screen very efficiently whether you see that.

so, here it is in real life, so this is the tube which is green and this is the mcherry of this -- this is actually that chromosome 16 locus which we are working on dissecting
which had the highest peak on the gwas and then we are dissecting into multiple regions, and you see that that actually localizes again to the tubules, so the histone-based annotation and now a validation coincide that both of them -- this region, somewhere in this region is able to drive expression to the kidney, so it’s a kidney specific regulatory element.

so, that’s very nice. the question is obviously which -- because we are somewhat biology based is, what are the target genes of these variants, so this is nice that it’s in regulatory region, but you know, what are the target transcripts? and to do that we toyed a little bit with in vitro transfection and luciferase [unintelligible] looking at them, but many of these genes that actually are putative targets are not expressed in these cell lines that
we can easily transfect, so we mostly use, looking at -- working through -- using eqtls which have been introduced before, so basically you’re looking at the genetic variations and the transcript expression, and then so we have -- because we have a lot of kidneys that are genotyped and we have transcript level data, then we can use now a kidney specific data to annotate the variants, so depending on the genotype, you see variation in gene expression.

so, this is a result -- so this is 100 of the kidneys that we have because this is more of a homogeneous ceu descent. we feel that’s important and then you find, you know, large number of so called e genes that are genes that have snps that are associated with transcript
level changes in the kidney, so just to probably -- i should have introduced that, that somehow the kidney is left out of all these big efforts, so gtex is not very good at collecting kidneys and that big science paper that just came out, they had three kidneys. although, i have to say that they made a major conclusion out of it that i’m not 100 percent sure, and i think kidney is being transplanted, so it’s hard to collect them, so i think it’s a quite useful, unique resource, and also in roadmap john and brad bernstein had some kidney data here and there, but it really was not well represented even in the roadmap data and it’s not really part of really encode, so maybe in a way of advertising should be included and so i feel that these efforts are actually quite important.
so, we have a number of e genes which is quite consistent with what gtex is finding and that many of them are -- seem to be quote, unquote, shared genes, but one-third of these is shared what’s not published in gtex, so this is the ccq2 [unintelligible], so with 100 samples we cannot really do trans, so this is the snp location, this is the transcript location and each spot is represented here if that snp significantly regulates the target gene expression, and in real life it looks like this. this is i think one of the best eqtl plots that we have, so this particular variant which could be c/c, or c/t, and t/t, and then you see that this solute carrier, you know, the tubules are mainly -- you know, express high number of salt carriers, because that’s what its function is; it has to reabsorb salt
and water, and you see that this variant has a very nice strong effect on the transcript level of this particular salt carrier.

and then this is another one. i showed this because this is being proposed by the ckdgen consortium. and they did functional studies indicating that this variant actually influences the level of this gene. they did not have eqtl data in the paper; what they did is they did a morpholino-based knock-down of this gene and that showed a phenotype, but indeed looking at the eqtl now, this effect is not as great as this one. i guarantee you, but there is an association between the genotype of this and the target gene of this, and that seems to validate what is inside there.
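
a minimal sketch of the eqtl idea described above, in python: each sample’s genotype is coded as the number of t alleles (c/c = 0, c/t = 1, t/t = 2) and a straight line of expression against that dosage is fitted by least squares. the additive model, the function names and all of the numbers are illustrative assumptions, not the speaker’s data or pipeline; a real analysis would also adjust for covariates and correct for multiple testing.

```python
# minimal eqtl sketch: additive model, expression ~ intercept + slope * dosage.
# all genotypes and expression values in this block are made-up illustration data.


def encode_dosage(genotype, alt="t"):
    """count copies of the alt allele in a genotype string such as 'c/t'."""
    return sum(1 for allele in genotype.lower().split("/") if allele == alt)


def fit_additive_eqtl(genotypes, expression, alt="t"):
    """ordinary least squares of expression on alt-allele dosage.

    returns (slope, intercept, r_squared).
    """
    x = [encode_dosage(g, alt) for g in genotypes]
    y = list(expression)
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need matching genotype/expression lists with at least 3 samples")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all samples share one genotype; the slope is undefined")

    slope = sxy / sxx                      # expression change per extra alt allele
    intercept = mean_y - slope * mean_x    # expected expression for zero alt alleles
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# hypothetical samples: genotype at one snp and expression of one transcript
example_genotypes = ["c/c", "c/c", "c/c", "c/t", "c/t", "c/t", "t/t", "t/t"]
example_expression = [5.1, 4.8, 5.3, 6.9, 7.2, 6.6, 9.0, 8.7]

slope, intercept, r_squared = fit_additive_eqtl(example_genotypes, example_expression)
print("slope per t allele:", round(slope, 3))
print("intercept:", round(intercept, 3))
print("r squared:", round(r_squared, 3))
```

a slope far from zero with a high r squared corresponds to the kind of clean c/c, c/t, t/t step pattern described for the salt carrier; a small slope corresponds to the weaker second example.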
so doing this obviously you can see very small fraction of overlap with the ckd gwas hits and what you could do you could obviously you can just look at the gwas snps whether you can find an association for any type of target gene. so to be very transparent, right now i think we have three or four where we have good statistical significance and then hopefully we will have more maybe by dissection, or other methods that we are doing. just in a way of introducing, indeed these e snps are enriched and they are more on enhancers and specifically this is an overlap of the tubule cell line h3k4 monomethylation and the e snp location and this is -- e snp is out control snps, and you see an enrichment, and that is not there if you use other type of regulatory marks and then actually this is
also not there if you are looking at other cell types, and thinking so this glomerulus epithelial cells and mesangial cells, so again somehow indicating that the tubule epithelial cells may be the important cell type for the kidney and [unintelligible] development.

so, i’m going to show you an example of that. so, this is that [unintelligible] chromosome 16 and what you see is this is the snps that are showing the highest significance and then these are the genes under here similarly that have been shown previously by the other speakers, and well, you probably saw the first plot on the disk. it is something called umod. umod has a urinary gene, has a name urine in it. so, it has something to do with the kidney, so that’s why this spot is actually -- was labeled with a big sign umod in the
kidney and that’s believed to be -- this snp is actually -- seems to increase the expression of this gene by some studies, and what we know that the gene expression actually decreases in disease development. so, the snp should increase the expression of this gene, but in disease the gene expression goes down. so, if we look at this locus again because now we have eqtl data, but you see it is actually quite broader, so there are couple of other genes around it as well.

so, this is the locus again, so these are the snps here. this is that umod. these are the other genes over here, and then here is how the eqtl looks. so, this is the transcript expression of the umod genes. there is a little trend for increased expression, what has been
described in the literature, but it didn’t reach statistical significance in our data. then you’re looking at the next gene over here, which is actually a gene family, acsm, something to do with acyl-coa medium-chain. i really -- it’s not really well annotated in the literature, but there are five of them, and they are right here together. and this one did not show a change, but this one if you look at it, there is a very nice change between the genotype and an expression of this gene and actually there are -- pkm values for this gene is fairly decent showing as an e gene. this one did not, and this one again shows some association and here is not as nice as for this one, and expression of this gene is actually much lower, so indicating that for us when we look at this snp, it was
associated that this gene as a target gene, now, maybe one gene away is where we find the significant effect on gene expression.

so, we included two additional criteria that the target should be expressed in the disease-relevant tissue in the kidney so this is actually an illumina body map rna-seq data, and what you see is the expression of these genes of that area in the kidney. what you see is this gene umod that’s proposed to be -- is highly expressed, but our target is also fairly nicely expressed in the kidney. maybe some expression in the liver, but it’s indeed it is very nicely expressed, and then if you look at the protein expression, indeed, again, it’s fairly nicely expressed in the kidney as well. now, we also added that target expression
should change in kidney disease development. so, because we have a 1,000 samples, we can actually look at the correlation of the gene and kidney function because that’s a kidney function [unintelligible] changes, so going from 100 to zero, you still see that there is quite nice r square and correlation, and then that’s not just rna expression, but you can pick random samples from the top and on the bottom and then the protein expression correlates with disease development as well.

so, alteration of the target can cause kidney disease, so the target should be functional in the kidney. so, for this again we use the zebra fish system, and the morpholino knock-down. so, as i discussed the function of the kidney is to get rid of salt and water. if the kidney
doesn’t function, you don’t get rid of salt and water and that’s represented in the fish as having an edema, so they puff up and then they have a lot of -- it’s probably called [unintelligible], so they have salt and water in excess. and, that’s what you see if you knock down the orthologue of this acsm gene in zebra fish. so, in kind of -- and that’s kind of the proposed function of this acsm is something to do with acyl-coa and fatty acid metabolism, somewhere not much known in the literature.

so, in conclusion, so we have this roadmap to understand gwas associated hits. i think human tissue samples and especially large number of human tissue samples are really critical to get to this; we used the epigenome maps to identify regulatory regions, model
organisms to validate the causal variants, eqtl maps for target gene identification, and then we look at -- in addition to that we also look at the correlation of the genes, the kidney function because we feel that should also be present, and then use model organisms, and the zebra fish seems to be a fairly quick screening tool to figure this out, and then i showed you this out of the three that we have as a hit, but mainly this is limited by the eqtls because right now these identify, i think just very few variants with significant effect because our sample size is small. and a couple of other issues that’s -- so that -- and the gene; maybe that has to do something with fatty acid metabolism.

i don’t know how i am about time, but i have a few other things that i wanted to share,
so i will go through that quickly. so, you know that the snps actually explain 2 percent of the heritability and then we have about 30 to 70 percent, so what about the others? so these variants you know explain very little. so, where is the missing heritability, and then there are several things to think about this: more samples, deeper sequencing, ethnic groups, and epigenetics. i will show you an example for two of these. one is i think is absolutely tangential to the meeting, but i think it’s a beautiful example of genetics, so i cannot skip that, so -- and that’s about different ethnic groups. so, the first slide that i showed you gwas was europeans and then you have the 67 regions, each of them adding together maybe explaining 2 percent of heritability. now, if you do the same,
admixture study in a black population for kidney disease, you get this one and only beautiful, big hit on chromosome 22, one hit, and that turns out to be a variant, a coding region variant in a gene called apol1, so that’s very, very rare for any kind of complex trait, and that turns out to be that there was an evolutionary pressure to maintain that coding region variant because that variant protects people from trypanosomiasis, which is the african sleeping sickness.

so, i guess shows similarities to malaria and sickle cell, so this is the same exact story. the heterozygote form of this variant protects you from trypanosome and then this is the lysis of the trypanosome by this g1 variant, but if you have two copies of this
variant, you get kidney disease and then [unintelligible] ratios for kidney disease is not insignificant, go from two to 100x and if you actually get hiv on top of getting this variant, it’s almost like sure to develop this disease with these two alleles. so, just in a way of that, so we -- my lab contributed to this by making a mouse model for the variant, and indeed if we produce variants into specific cell type in the kidney which is these glomerulus epithelial cells, you get disease development. so, indicating that indeed this coding region variant is disease causing, so that’s one way of finding those rare variants with large effect size going into a different population, but as part of the roadmap for five years we were looking at whether epigenetic differences could explain this missing heritability. so,
this is actually -- just this part of my talk is pretty much published so if we looked at samples of 100 micro-dissected human patient samples, kidney samples with different conditions of kidney disease, and then this is what [unintelligible] dissected, and then we looked at changes in these tubular epithelial cells that we micro-dissected from patient samples of 100 kidneys, and with the genome-wide methylation analysis using a method -- i would say it’s a -- something like an mre-chi like a methylation-sensitive -- [unintelligible] digestion was developed by john greally at einstein, and of course this illumina 450k arrays. and what we find is that indeed you can identify these epigenetic changes in healthy and diseased kidneys that are able to cluster normal and disease samples quite nicely and separately,
and if you look at validation cohort, again, you’ll see that these methylation differences cluster and different in control samples and disease samples, but i just would like to show some of the other things.

so, we got fantastic p values with even fairly small samples, but what you see is the difference in methylation differences in absolute values scale is small, so what you see in kidney disease, and i think i see that in multiple other disease conditions. there are changes, there are very consistent changes; we can replicate it in different samples the same changes, but the absolute difference in methylation level is fairly small, unlike in cancer when you can see a difference going from zero methylation to 100 percent methylation, these methylation
differences are small and of course the future should tell whether they are actually significant going through that route. we looked at whether these methylation differences are randomly distributed to the genome or they are maybe on promoters. there is a lot of data on promoter methylation differences influencing gene expression, but when we looked at by [unintelligible] mapping, these differentially methylated regions were depleted on promoter regions. we could hardly find any methylation difference in a promoter, and when we looked at -- by chip-seq based annotation where they are, they were actually on enhancers, and they were on kidney
specific enhancers when we were able to -- we looked at the nine encode cell lines again. so, these are small differences on enhancers; therefore, we could look at whether they could potentially influence transcription factor binding, so we looked at the same computational analysis, and we find that they influence several transcription factors. one of them was for example, six2, and we found a bunch of others, and then i -- very few of you are nephrologists, probably, in the audience, but this is actually a very important kidney developmental transcription factor, so are these two others. so, it seems that there was some sort of an enrichment on these enhancers that they can computationally bind kidney specific developmental transcription factors over here.
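
the enrichment statements in this part of the talk (snps or differentially methylated regions falling on enhancers more often than expected) can be illustrated with a one-sided hypergeometric test, sketched here in python. this is a sketch under stated assumptions and not the analysis used in the talk: the function names are hypothetical, the counts are invented, and a real comparison would use matched control regions rather than a simple universe of equal-weight regions.

```python
# sketch of an overlap-enrichment test using only the python standard library.
# every count in the example at the bottom is invented for illustration.
from math import comb


def hypergeom_enrichment_p(overlap, foreground, annotated, universe):
    """one-sided p-value: probability of seeing at least `overlap` annotated items
    when `foreground` items are drawn without replacement from `universe` items,
    of which `annotated` carry the annotation (for example, lie in an enhancer)."""
    if not (0 < foreground <= universe and 0 < annotated <= universe):
        raise ValueError("foreground and annotated must be between 1 and universe")
    upper = min(foreground, annotated)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap cannot exceed foreground or annotated")

    total = comb(universe, foreground)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, foreground - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def fold_enrichment(overlap, foreground, annotated, universe):
    """observed fraction of foreground items that are annotated, divided by the
    fraction expected by chance (annotated / universe)."""
    return (overlap / foreground) / (annotated / universe)


# hypothetical counts: 1,000 regions tested, 200 of them in enhancers,
# 60 differentially methylated regions, 30 of which fall in enhancers
p_value = hypergeom_enrichment_p(overlap=30, foreground=60, annotated=200, universe=1000)
fold = fold_enrichment(overlap=30, foreground=60, annotated=200, universe=1000)
print("fold enrichment:", round(fold, 2))
print("one-sided p-value:", p_value)
```

the same two functions apply to the earlier enhancer comparisons as well, for example the share of ckd-associated snps landing in kidney enhancers versus enhancers from other cell types.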
now, looking at the other way of whether these differential methylation is actually functional, we looked at gene expression by mapping them to the nearby genes, and indeed we find correlation between differential methylations and transcript level differences, so maybe these differential methylations actually drive gene expression, and if they drive gene expression maybe they are of course important in disease development, so we have some of like -- about 40 percent of them were correlating with gene expression and this is going to be my last slide. and, they were also again enriched for developmental processes. the same you find it when you do enhancers for h3k4 monomethylation; again, they are enriched for developmental processes. so, that correlates with some of the data in the literature that kidney disease may be
developmentally programmed. this is a slide i borrowed from francine einstein from einstein, so if you feed rats on a control diet and look at the pups, versus if you feed rats on a calorie restricted diet, then you look at these pups and what you see is that these pups with a calorie restricted diet developed one measure of kidney disease which [unintelligible] in there and that correlates with the differences in their epigenome and cytosine methylation levels, indicating that maybe indeed they are programmed somewhere early on.

so, this second set of conclusions is that you find small, but highly consistent cytosine methylation changes in kidney disease. tubule samples they’re isolated, the methylation changes are enriched on kidney specific enhancers, and then they are enriched on fibrosis and
developmental genes are affected more commonly and maybe that’s consistent that somehow this kidney disease has some sort of developmental origin which has been proposed in the literature in the past, and i would like to say that most of the work has been done by really talented graduate students, yi-an ko; she will be here tomorrow, and huigang yi, who is an informatics person in the lab, and the second half of the project is published and that was part of this roadmap epigenomics project and we have lots of collaborators who helped us with the gwas studies or eqtl analysis and many of the other work we have been doing.

thanks so much.

[applause]
[end of transcript]