In many research disciplines, including epidemiology, it is common to compare occupational categories, such as office workers and non-office workers. When only self-reported occupation titles are available, individuals must be categorized on the basis of those titles. Being able to identify office workers from self-reported occupation titles therefore supports research on the health and well-being of office workers in large population-based epidemiological studies, even when no specific questions about office work were asked. This paper introduces data and R code that can be used to assign a proxy variable for office-worker status based on responses to an open-ended question (OEQ) about occupation in Swedish questionnaires. The proxy variable is based on the Swedish Standard Classification of Occupations 2012 (SSYK 2012), which includes 8946 occupation titles. Using a translation key, the titles have been categorized into three groups: managers, white-collar workers, and blue-collar workers. Managers and white-collar workers are considered office workers, while blue-collar workers are classified as non-office workers. The proxy variable has been refined using pilot data from the Swedish population-based epidemiological resource LifeGene. The R code and the proxy variable can be applied to any dataset containing a Swedish OEQ about occupation to categorize respondents as white-collar or blue-collar workers, and thereby as office or non-office workers. The R code can also be used for OEQs in other languages, provided that a dataset with a standard classification of occupations is available in the desired language.
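The assignment logic described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the R code introduced in this paper. The three titles, their group labels, and the names `TITLE_GROUPS`, `normalize`, and `office_worker_proxy` are hypothetical stand-ins for the full SSYK 2012 title list and translation key. Exact matching after simple normalization is likewise an assumption, not a description of the published matching procedure.

```python
from typing import Optional

# Hypothetical excerpt of a title-to-group key. The real SSYK 2012 list
# contains 8946 titles; these entries and labels are illustrative only.
TITLE_GROUPS = {
    "ekonomichef": "manager",
    "systemutvecklare": "white-collar",
    "snickare": "blue-collar",
}

# Managers and white-collar workers count as office workers.
OFFICE_GROUPS = {"manager", "white-collar"}


def normalize(answer: str) -> str:
    """Lower-case a free-text answer and collapse surrounding/internal whitespace."""
    return " ".join(answer.lower().split())


def office_worker_proxy(answer: str) -> Optional[bool]:
    """Return True (office worker), False (non-office worker),
    or None when the answer matches no title in the key."""
    group = TITLE_GROUPS.get(normalize(answer))
    if group is None:
        return None  # unmatched answers are left unclassified
    return group in OFFICE_GROUPS


if __name__ == "__main__":
    for answer in ["  Snickare ", "Ekonomichef", "astronaut"]:
        print(repr(answer), office_worker_proxy(answer))
```

Returning `None` for unmatched answers keeps unclassifiable responses distinct from non-office workers, so they can be reviewed or excluded rather than silently misclassified.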