A model established using marital status and other factors from the Surveillance, Epidemiology, and End Results database for early stage gastric cancer
  1. Lixiang Zhang,
  2. Baichuan Zhou,
  3. Panquan Luo,
  4. Aman Xu,
  5. Wenxiu Han,
  6. Zhijian Wei
  1. General Surgery, First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
  1. Correspondence to Professor Aman Xu, General Surgery, First Affiliated Hospital of Anhui Medical University, Hefei 230022, Anhui, China; xuaman{at}; Dr Zhijian Wei, General Surgery, First Affiliated Hospital of Anhui Medical University, Hefei 230022, Anhui, China; 305801533{at}


Currently, the postoperative prognosis of early stage gastric cancer (GC) is difficult to predict accurately. In particular, social factors are rarely used in the prognostic assessment of early stage GC. Therefore, this study aimed to combine clinical indicators and social factors to establish a predictive model for early stage GC based on a new scoring system. A total of 3647 patients with early stage GC from the Surveillance, Epidemiology, and End Results database were included in this study. A Kaplan-Meier survival analysis was used to compare prognosis among marital status groups, with marital status serving as an innovative prognostic indicator. Univariate and multivariate analyses were used to screen candidate predictors, and a nomogram was then built using the Cox proportional hazards regression model. The univariate and multivariate analyses revealed that age at diagnosis, sex, histology, stage_T, surgery, tumor size, and marital status were independent prognostic factors of overall survival. Both the C-index and calibration curves confirmed that the nomogram had good predictive performance for patient prognosis in the training and testing sets. This nomogram, based on clinical indicators and marital status, can effectively help assess the prognosis of patients with early stage GC in the future.

  • marital status
  • prognosis
  • cancer

Data availability statement

Data are available in a public, open access repository.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, an indication of whether changes were made, and the use is non-commercial. See:

Significance of this study

What is already known about this subject?

  • The prognosis of early stage gastric cancer (GC) has always been the focus of GC research.

  • According to National Comprehensive Cancer Network guidelines, the prognosis of patients with early stage GC is correlated with age, tumor site, pathological stage, and other factors, but social factors were not taken into account and the impact of these factors on prognosis has not been quantified and comprehensively applied.

  • The prognostic evaluation ability of marital status has been fully recognized for patients with liver cancer, lung cancer, and other tumors.

What are the new findings?

  • Based on previous studies, we innovatively introduced the indicator of marital status, which has been proved to have an impact on the prognosis of patients with tumor in a number of studies.

  • A variety of factors, including race, gender, treatment style, pathological stage, and marital status were summarized, and their influence was comprehensively quantified.

How might these results change the focus of research or clinical practice?

  • All patients diagnosed with early stage GC can use our nomogram to assess the prognostic risk after receiving corresponding treatment.

  • Patients with high risk may receive relevant adjuvant therapy and moderately increase the frequency of physical examination.

  • In relevant policies, we should provide more social help and care to the widowed or single people.


Gastric cancer (GC) is the fifth most common cancer and the third leading cause of cancer-related deaths globally, with over 1 million new cases and about 780 000 deaths in 2018.1–3 In the past few decades, GC has been a main contributor to the increase in disability-adjusted life years globally, especially in areas with a high GC incidence, such as Japan, China, and other Asian regions.4 5 The incidence of GC is approximately twofold to threefold higher in men than in women, and GC is uniformly rare in people aged <50 years,6 with incidence rates increasing after 50 years of age. Early stage GC is defined as GC limited to the lamina propria, mucosa, or submucosa, regardless of lymph node metastasis. Early stage GC is more likely than advanced GC to be removed successfully through radical resection and consequently has a better prognosis. Therefore, it is essential to diagnose and treat GC early to improve prognosis.

Even for patients with early stage GC who have undergone systematic treatment, accurately predicting prognosis is difficult. Therefore, it is meaningful to establish a reliable predictive model that incorporates post-treatment indicators. We obtained clinical data on patients with early stage GC from the Surveillance, Epidemiology, and End Results (SEER) database, which provides a large sample size, reliable data, and a wide range of research indicators. According to current research, tumor size, stage_N, histology, age at diagnosis, tumor location, and other factors can affect survival.2 7 8 However, these indicators do not fully reflect recent clinical developments, such as the wide application of endoscopic surgery (endoscopic mucosal resection and endoscopic submucosal dissection (ESD)). These organ-sparing therapies remain problematic with respect to residual cancer cells and metastatic lymph nodes9–11; they also allow fast recovery.12 13 Therefore, the treatment used for early GC may also be a prognostic factor. Moreover, other indicators are also related to the prognosis of patients with cancer. Marital status, which is associated with prostate, cervical, and rectal cancer,14–16 has emerged as an innovative risk factor in recent years. Some reports have studied the impact of race on survival17 18; these reports covered multiple races and differed from traditional studies that focused on only one race. Chemotherapy has been proven to be effective for treating GC in long-term clinical applications. In recent years, neoadjuvant chemotherapy has become an important part of the treatment of advanced GC,19 but the effect of chemotherapy on early stage GC remains controversial.

To the best of our knowledge, few studies have focused on the effect of these indicators in early stage GC. Therefore, we constructed a nomogram that comprehensively assesses the impact of various indicators to provide a basis for predicting the overall survival (OS) of patients with early stage GC.

Materials and methods

Data source and patient selection

In this study, we acquired data from the SEER database on patients with GC to evaluate the influence of the aforementioned factors. A nomogram, which is stable and visual, was used in our data analysis. A nomogram is based on multivariate analyses and integrates multiple predictive indicators; it can be used to diagnose diseases and predict their incidence or progression. We built a prediction nomogram based on independent GC predictors. This gave us the ability to select an optimal therapeutic regimen for individual patients. The study was restricted to tumors limited to the lamina propria, mucosa, and submucosa. Exclusion criteria were as follows: (a) benign or stromal tumors; (b) distant metastasis or distant lymph node metastases; (c) second malignant primary indicator; (d) unknown chemotherapy; and (e) unknown survival time. Finally, 3647 cases were screened and included in this study, as shown in figure 1. They were randomly divided into two groups, training and testing sets, at a ratio of approximately 3:1, with 2719 patients in the training set and 928 patients in the testing set.
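The random partition described above can be sketched as follows. This is a minimal illustration only: the function name `split_train_test`, the seed, and the use of integer patient IDs are assumptions, and the authors' actual procedure is not described. An exact 75% split of 3647 patients gives 2735 and 912, whereas the study reports 2719 and 928, so the original split evidently followed a slightly different rule.

```python
import random

def split_train_test(patient_ids, train_fraction=0.75, seed=0):
    """Randomly partition patient IDs into training and testing sets.

    Hypothetical helper: train_fraction and seed are illustrative choices,
    not parameters reported in the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 3647 placeholder IDs standing in for the screened SEER cases
train_ids, test_ids = split_train_test(range(3647))
print(len(train_ids), len(test_ids))  # 2735 912
```

Fixing the seed keeps the two sets stable across runs, which matters when the testing set is later used to validate a model fitted on the training set.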

Data collection and end point

The following variables were included in our study: age at diagnosis, race, gender, tumor location, histology, grade, stage_T and stage_N, surgery in the primary site, lymph node dissection, chemotherapy, radiation, tumor size, insurance, and marital status. The main end point was OS, which was defined as the time from diagnosis until death due to any reason.

Age was divided into seven subgroups: ≤40, 40–50, 50–60, 60–70, 70–80, 80–90, and 90–100 years. Race was divided into three subgroups: white, black, and other; the ‘other’ subgroup included Indian, Asian, and other minorities. Tumor site was classified into eight subgroups according to the anatomy of the stomach: fundus, body, antrum, pylorus, lesser curvature, greater curvature, gastric overlapping area, and not otherwise specified (NOS). Based on the International Classification of Diseases for Oncology, third edition, histology was divided into five subgroups: adenocarcinoma; signet ring cell carcinoma; special-type carcinoma (including carcinoid tumor, goblet cell carcinoid, and squamous cell carcinoma); other carcinomas (including neoplasms, diffuse type carcinoma, and linitis plastica); and unknown. Surgery was divided into five subgroups: no cancer-directed surgery, endoscopic surgery, partial gastrectomy, total gastrectomy, and unknown. Lymph node dissection was divided into four subgroups: none, one to three regional lymph nodes removed, four or more regional lymph nodes removed, and unknown. Tumor size was divided into eight subgroups: invisible to the naked eye, ≤1 cm, 1–2 cm, 2–3 cm, 3–4 cm, 4–5 cm, >5 cm or widespread, and unknown. Marital status was divided into six subgroups: married (including domestic partner), divorced, separated, widowed, single (never married), and unknown.

Statistical methods

Continuous variables were expressed as mean±SD and compared using Student’s t-test. Categorical variables were expressed as frequencies and proportions and compared using Pearson’s χ2 or Fisher’s exact test. A Kaplan-Meier analysis was performed to describe and compare survival among the levels of each variable, and the reported parameters included mean and median survival times with 95% CIs. We also performed the log-rank test to assess the significance of differences between survival curves. In the Cox proportional hazards regression analysis, variables that were significant in the univariate analysis were entered into the multivariate analysis. The variables that remained significant were used to establish a nomogram to predict 3-year and 5-year OS. The parameters of the Cox proportional hazards regression analysis included HRs and 95% CIs. The C-index was employed to measure the reliability of the nomogram. We also built calibration curves to compare predicted and observed outcomes. All data were analyzed using SPSS (V.23.0) and R software (V.3.4.3).
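To make the Kaplan-Meier step concrete, the product-limit estimator can be sketched in a few lines of plain Python. This is not the authors' code (the study used SPSS and R); the function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show how censored patients stay in the risk set until their last follow-up.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up time for each patient (for example, months)
    events : 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the fraction surviving t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve

# Toy data (hypothetical, not SEER records)
toy_times = [5, 8, 8, 12, 20, 20, 30]
toy_events = [1, 1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(t, round(s, 3))
```

Running the estimator separately for each marital status group yields the curves that the log-rank test then compares.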


Baseline characteristics

A total of 141 954 patients were extracted from the SEER database, and 3647 suitable patients with early stage GC were included in this study. We divided patients into two cohorts: training (n=2719, 75% of data) and testing (n=928, 25% of data) sets. Of the included patients, 1793 (49.2%) were male and 1854 (50.8%) were female. Moreover, 2231 were white, 607 were black, and 809 were in the ‘other’ race subgroup. Regarding marital status, 1957 were married, 274 were divorced, 41 were separated, 630 were widowed, 512 were single, and 233 were classified into the ‘unknown’ group. Baseline characteristics of the training set are shown in online supplemental table 1. Age at diagnosis (p<0.001), race (p<0.001), gender (p<0.001), histology (p=0.007), grade (p=0.009), stage_T (p=0.025) and stage_N (p<0.001), surgery (p<0.001), tumor size (p=0.005), and insurance (p<0.001) were significantly different among marital status groups.

Kaplan-Meier survival analysis of marital status groups

To explore the influence of marital status on OS, a Kaplan-Meier survival analysis was performed in all patients in the training set. As shown in figure 2, married patients had the best prognosis (average OS=72.084, 95% CI=70.847 to 73.321), and widowed patients had the worst OS (average OS=60.150, 95% CI=57.057 to 63.244). To verify whether gender is related to the above-mentioned results, we conducted the Kaplan-Meier survival analysis separately in male and female patients with GC. As shown in figure 3A,B, there were significant differences in OS among marital status groups in both sexes (p<0.001). In both male and female patients with early stage GC, survival was highest for married patients (male average OS=69.187, 95% CI=67.446 to 70.928; female average OS=76.357, 95% CI=74.783 to 77.930) and worst for widowed patients (male average OS=51.704, 95% CI=45.206 to 58.202; female average OS=61.885, 95% CI=58.476 to 65.293). It is worth noting that survival was significantly better in divorced female patients than in divorced male patients. We also performed the Kaplan-Meier analysis comparing genders within each known marital status group, except for the ‘separated’ group, which had a small sample size and was consequently of limited reference value. As shown in figure 4, there were significant differences between male and female patients in each marital status group (married, p<0.001, figure 4A; divorced, p=0.020, figure 4B; widowed, p=0.025, figure 4C; and single, p=0.026, figure 4D). Survival was better in female patients than in male patients.

Figure 2

Kaplan-Meier survival analysis of overall survival among different marital status groups in patients with early stage gastric cancer (p<0.001).

Figure 3

Kaplan-Meier survival analysis of overall survival among different marital status groups in genders. Overall survival among different marital status groups in (A) male patients (p<0.001) and (B) female patients (p<0.001) with early stage gastric cancer.

Figure 4

Kaplan-Meier survival analysis of each known marital status group among different genders, except for the ‘separated’ group. Overall survival (A) between married male and female patients (p<0.001), (B) between divorced male and female patients (p=0.008), (C) between widowed male and female patients (p=0.009), and (D) between single male and female patients (p=0.029).

Prognostic factors of patients with GC

Univariate analysis results are shown in table 1. The analysis showed that age at diagnosis, gender, histology, stage_T and stage_N, surgery, lymph node dissection, chemotherapy, radiation, tumor size, and marital status were significant prognostic factors. These univariate analysis factors were included in the multivariate analysis. Multivariate results showed that age at diagnosis, sex, histology, stage_T, surgery, tumor size, and marital status were independent prognostic factors for OS (table 2).
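For readers less familiar with the model behind tables 1 and 2, the standard Cox proportional hazards formulation is sketched below. The symbols $h_0$, $x_j$, $\beta_j$, and $p$ are generic textbook notation introduced here for illustration; they are not notation taken from the study.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\sum_{j=1}^{p} \beta_j x_j\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

Under this formulation, an HR above 1 for a subgroup indicates a higher hazard of death than in the reference subgroup with the other covariates held fixed, and an HR below 1 indicates a lower hazard.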

Table 1

Univariate analysis of patients with early stage gastric cancer

Table 2

Multivariate analysis of patients with early stage GC

Prognostic nomogram for OS

Based on Cox regression models, a nomogram was constructed to predict the 3-year and 5-year OS of patients with early stage GC (figure 5). This nomogram created a scoring system in which each included variable receives a score of 0–100 according to its contribution to OS. After these scores were added to calculate the total score, the corresponding OS was predicted based on the scale at the bottom of the figure. This nomogram showed that tumor size was the most important prognostic factor, followed by age at diagnosis and surgery. Stage_T, marital status, gender, and histology also had a moderate impact on the prognosis of patients with early stage GC. The nomogram obtained in our study had good predictive ability and reliability.
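The scoring idea can be illustrated with a small sketch that uses the usual nomogram convention: each level's Cox coefficient is rescaled so that the variable with the widest coefficient range spans 0 to 100 points. The function name `nomogram_points` and every coefficient below are hypothetical and are not the values fitted in this study.

```python
def nomogram_points(coefs):
    """Map Cox coefficients to nomogram points.

    coefs: {variable: {level: log hazard ratio vs the reference level}}
    Returns {variable: {level: points}}, where the variable with the
    widest coefficient range spans exactly 0 to 100 points.
    """
    ranges = {var: max(lv.values()) - min(lv.values()) for var, lv in coefs.items()}
    widest = max(ranges.values())
    points = {}
    for var, levels in coefs.items():
        low = min(levels.values())
        points[var] = {lvl: 100.0 * (b - low) / widest for lvl, b in levels.items()}
    return points

# Hypothetical coefficients for illustration only (not the values in table 2).
toy_coefs = {
    "tumor_size": {"<=1 cm": 0.0, "2-3 cm": 0.4, ">5 cm": 1.0},
    "marital_status": {"married": 0.0, "single": 0.2, "widowed": 0.5},
    "sex": {"female": 0.0, "male": 0.25},
}
points = nomogram_points(toy_coefs)

# Total score for one hypothetical patient: sum the points of each level.
patient = {"tumor_size": "2-3 cm", "marital_status": "widowed", "sex": "male"}
total_points = sum(points[var][lvl] for var, lvl in patient.items())
print(round(total_points, 1))
```

In a real nomogram the total score is then read against a survival scale derived from the baseline survival function, a step this sketch leaves out.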

Figure 5

Nomogram predicting the overall survival of patients with early stage gastric cancer.

Validation of the nomogram

In this study, we built a model that can predict the prognosis of patients with GC based on the SEER database; this model was validated using the testing set. The C-index was 0.791 and 0.685 in the training and testing sets, respectively, demonstrating that our nomogram was useful for patients with GC. A calibration curve was also used to examine the nomogram’s ability to predict the 3-year and 5-year OS of patients in the training and testing sets. As shown in online supplemental figure 1, the predictions of the nomogram agreed closely with the observed results. We also performed a receiver operating characteristic curve analysis; the 3-year survival area under the curve (AUC) was 0.774 and 0.717 in the training and testing sets, respectively, and the 5-year AUC was 0.773 and 0.722, as shown in online supplemental figure 2.
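As a rough guide to what the C-index measures, Harrell's concordance index can be written in plain Python as below. This is an illustrative sketch rather than the routine used in the study; the function name `concordance_index` and the toy data are hypothetical. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when patient i died (events[i] == 1) and
    did so before patient j's follow-up time. The pair is concordant when
    the patient who died earlier has the higher predicted risk; ties in
    risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): higher risk score should mean earlier death.
c_times = [5, 8, 12, 20, 30]
c_events = [1, 1, 0, 1, 0]
c_risk = [0.9, 0.4, 0.7, 0.3, 0.1]
c_index = concordance_index(c_times, c_events, c_risk)
print(c_index)  # 0.875
```

In the toy data, seven of the eight comparable pairs are ordered correctly, which gives 0.875.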

Kaplan-Meier curves for nomogram

According to scoring results, we divided patients into high-risk and low-risk groups (high-risk group and low-risk group were bounded by the median of the risk score) and performed the Kaplan-Meier survival analysis on these groups. As shown in online supplemental figure 3, there was a significant difference between the Kaplan-Meier curves of the high-risk group and those of the low-risk group, further demonstrating the reliability of the nomogram.
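The median split described above can be sketched as follows. The function name `split_by_median` and the scores are hypothetical, and assigning scores exactly equal to the median to the low-risk group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median

def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    Assumption: scores equal to the median are placed in the low-risk group.
    """
    cutoff = median(risk_scores)
    groups = ["high" if score > cutoff else "low" for score in risk_scores]
    return cutoff, groups

# Hypothetical total nomogram scores for six patients
toy_scores = [115, 40, 90, 160, 65, 130]
cutoff, groups = split_by_median(toy_scores)
print(cutoff, groups)
```

Each group's follow-up data can then be passed to a Kaplan-Meier estimator and compared with a log-rank test.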


GC, which originates from the gastric mucosal epithelium, has some of the highest morbidity and fatality rates among cancers.2 It can arise at various sites in the stomach and can easily develop hematogenous or lymphatic metastases.3 In recent years, GC has started to occur in young patients.1 Even at early stages, GC may recur or metastasize; therefore, it is important to maintain routine treatment and reviews to prolong patient survival.20 However, excessive treatment and examination increase the financial burden on patients, which may in turn affect GC prognosis. For example, enhanced CT, which is effective in diagnosing GC, is expensive and carries health risks. Therefore, it is important to build a reliable nomogram that can accurately evaluate the postoperative recurrence risk of patients with GC. Many studies have identified only a few GC prognostic factors, such as tumor size and invasion depth. However, these factors were limited and focused only on tumor growth and not on the patients’ general condition and treatment information. Our research was based on the SEER database and included different races, innovatively adding some indicators that have been proven to be associated with many kinds of cancer14 16; such indicators, for example marital status, are rarely used for GC. Although some studies have used nomograms to predict the prognosis of patients with GC,21–23 we attempted to establish a prognostic nomogram combining multiple clinical indicators, pathological characteristics, and treatment information to evaluate the probability of 3-year and 5-year OS of such patients.

In our study, the nomogram was more credible and persuasive because the outcomes were obtained from the training set and then validated using the testing set. First, we performed a univariate analysis including all factors; of these factors, we selected those that were significant, including age, sex, histology, and surgery, and entered them into the multivariate analysis. The multivariate analysis revealed that age at diagnosis, sex, histology, stage_T, surgery, tumor size, and marital status were independent prognostic factors of OS. A nomogram was constructed based on these factors, and the C-index was 0.791. Calibration curves showed good consistency between predicted and observed results, and there was a significant difference between the high-risk and low-risk groups. Moreover, the AUCs of 3-year and 5-year survival were 0.774 and 0.773, respectively.

The nomogram has been continuously proven to be a reliable and accurate prognostic prediction tool in recent studies. It can evaluate survival using various comprehensive indicators and acquire a better prediction effect than other prediction tools. For patients with early stage GC, based on the nomogram obtained in this study, combined with clinical information, we can obtain a postoperative patient risk rating. For high-risk patients, review frequency and follow-up times should be increased. Patients themselves should pay more attention to symptom fluctuation and improvements in lifestyle.

Of the seven factors included in our nomogram, tumor size was the largest contributor to OS. This is in line with the usual perception that a larger tumor is more aggressive and that a barely visible carcinoma in situ is indolent. Ohashi et al suggested that both tumor size and depth could be used as combined prognostic indicators.24 Our scoring system also included tumor invasion depth, and the T1b score was moderately higher than the T1a score. Tumors with a higher T stage have deeper infiltration, and there are more blood and lymphatic vessels in the submucosa than in the mucosa, allowing tumor cells to spread further and making them more difficult to remove; this directly worsens patient survival. Age was also an OS risk factor. Overall, older patients had higher nomogram scores and a worse prognosis than younger patients. This might be attributed to the fact that elderly patients have a worse general condition, weaker immune clearance of tumor cells, and more underlying diseases than younger patients.

It is worth noting that different surgical methods also had an impact on prognosis. Patients who did not undergo surgery had the worst prognosis, indicating that surgery is still the most effective treatment for GC. Patients who underwent partial gastrectomy had the most favorable nomogram scores, while patients who underwent endoscopic surgery and total gastrectomy had similar scores. This does not mean that total gastrectomy is not effective for treating GC because the condition of patients who need total gastrectomy might be more serious. Whether partial gastrectomy or endoscopic surgery is better for early stage small-diameter GC has remained a controversial issue in clinical practice, and a few studies have been dedicated to providing references for choosing the correct treatment. Nishizawa and Yahagi indicated that patients receiving ESD generally had a better quality of life postoperatively; however, they also had a higher incidence of metachronous GC.25 Mun et al reported that endoscopic surgery has fewer complications than traditional surgery, with an OS that is no lower than that of traditional surgery.12 Nomura and Okajima suggested that the extent of gastrectomy should be reduced where possible, provided that GC can still be cured.26 Our nomogram showed that partial gastrectomy was generally better than endoscopic surgery, indicating that, at the current level of medical practice, we should be cautious in using endoscopic surgery instead of traditional surgery. For patients with large tumor diameters and poor histological types, partial gastrectomy should be preferred to ensure radical resection. Because it is less traumatic and safer, endoscopic surgery can be applied to patients who cannot tolerate surgery due to advanced age or underlying diseases.

Marital status, a factor that is rarely included in GC research, also showed a moderate influence on survival in our nomogram. Married patients had the best prognosis, followed by single patients, and the prognosis of separated patients was the worst. This result was consistent with the Kaplan-Meier survival curve. We speculate that this might be because married patients had better financial conditions and emotional encouragement, while separated patients might be more likely to experience financial difficulties and emotional loss. Previous studies have shown that patients with cancer who have lower social support have a poorer prognosis,27 and marriage, one of the most important social factors, was related to the prognosis of early GC in our study; this could be attributed to earlier diagnosis resulting from the reminders and supervision of a partner. Our research also showed that married patients had a higher proportion of T1a and N0 disease than the other groups. In addition, one study reported that unmarried patients with cancer were less likely to receive chemotherapy (787/1360, 57.9%).28 Therefore, single and widowed patients need more attention and social help. Our study showed that the prognosis of female patients was better than that of male patients, which is consistent with another study29; this might be related to genetic differences between men and women. Li et al 30 reported that the expression of different core genes and differences in pathways were associated with the variation observed among patients with GC of different races and sexes.28 Different lifestyles between the sexes might also affect prognosis. Careful support may be required to improve the prognosis of male patients. Regarding histology, signet ring cell carcinoma had the worst survival according to the nomogram. This result is also consistent with traditional clinical knowledge. Riihimäki et al demonstrated that signet ring adenocarcinomas had a higher probability of metastasis within the peritoneum, bone, and ovaries than adenocarcinomas.31 The higher risk of metastasis may be the reason for the worse prognosis in patients with signet ring cell carcinoma.

To the best of our knowledge, this is the first SEER-based nomogram combining comprehensive clinical indicators to predict OS in patients with early stage GC. However, our research has some limitations. First, to improve the reliability of our study, we divided the screened SEER data into training and testing sets at a ratio of 3:1; however, external validation using data from a local medical center is still lacking. Second, retaining unclearly classified data or data recorded as ‘unknown’ broadened the scope of application of the nomogram but increased mutual interference between data to a certain extent, which affected the accuracy of the nomogram. Third, some well-known risk factors of GC, such as family history, alcohol consumption, and Helicobacter pylori (HP) infection, were not included. These indicators are scarce in the SEER database because they are difficult to acquire. For example, there is no clear standard to determine whether a patient has a history of drinking alcohol based on the amount of alcohol consumed, the frequency of drinking, and the time of abstinence. Moreover, HP infection examinations are not routinely performed in many areas, making it difficult for such data to be collected in large databases.


In conclusion, our nomogram, which included age at diagnosis, sex, T stage, histology, tumor size, surgery, and marital status as risk factors, effectively predicted the prognosis of early stage GC. This nomogram can help assess the prognosis and treatment of patients with GC.

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.


Supplementary material

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • LZ and BZ are joint first authors.

  • LZ and BZ contributed equally.

  • Contributors LZ and BZ contributed equally to the main work and designed the study. LZ provided the databases and prepared supplemental figures 1–3. PL, AX, WH, and ZW provided help and analyzed the data. BZ prepared figures 1–4 and wrote the manuscript. LZ acts as guarantor and is responsible for the overall content. All authors contributed to the article and approved the submitted version.

  • Funding This study was funded by the Natural Science Foundation of Anhui Province, No. 2108085QH337.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.